Prof. Alex Bronstein
Fast nonlinear vector quantile regression
Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable Y given explanatory features X. A limitation of QR is that it is only defined for scalar target variables, both because of the formulation of its objective function and because the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for high-dimensional target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target Y given the features X; (ii) its exact formulation is intractable even for modestly sized problems in terms of target dimensions, the number of regressed quantile levels, or the number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) we extend VQR to the nonlinear case, showing substantial improvement over linear VQR; (ii) we propose vector monotone rearrangement, a method which ensures that the estimates obtained by VQR relaxations are monotone functions; (iii) we provide fast, GPU-accelerated solvers for linear and nonlinear VQR whose memory footprint remains fixed as the number of samples and quantile levels grows, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) we release an optimized Python package of our solvers to encourage the widespread use of VQR in real-world applications.
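To make the scalar restriction concrete, here is a minimal Python sketch of the check (pinball) loss that QR minimizes; the function names are ours and are not taken from the released package. For a constant model the loss is piecewise linear with breakpoints at the observations, so searching over the observed values recovers the empirical τ-quantile. The argument relies on the sign of a scalar residual y − q, which has no canonical analogue when Y is a vector.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average check (pinball) loss of a constant prediction q at quantile level tau."""
    r = y - q
    # Under-prediction (r > 0) costs tau * r; over-prediction costs (1 - tau) * |r|.
    return np.mean(np.maximum(tau * r, (tau - 1) * r))

def scalar_quantile_by_qr(y, tau):
    """Empirical tau-quantile as the minimizer of the QR objective over constants.

    The objective is convex and piecewise linear with kinks at the samples,
    so a minimizer is always attained at one of the observed values.
    """
    losses = [pinball_loss(y, q, tau) for q in y]
    return y[int(np.argmin(losses))]
```

For example, with `y = np.arange(1.0, 11.0)` the 0.75-quantile found this way is 8.0.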
Towards predicting fine finger motions from ultrasound images via kinematic representation
A central challenge in building robotic prostheses is the creation of a sensor-based system able to read physiological signals from the forearm and instruct a robotic hand to perform various tasks. Existing systems typically perform discrete gestures such as pointing or grasping, employing electromyography (EMG) or ultrasound (US) technologies to analyze the state of the muscles. In this work, we study the inference problem of identifying the activation of specific fingers from a sequence of US images when performing dexterous tasks such as keyboard typing or playing the piano. While finger gestures have previously been estimated by detecting prominent gestures, we are interested in classification in the context of fine motions that evolve over time. We consider this task an important step towards higher adoption rates of robotic prostheses among arm amputees, as it has the potential to dramatically increase functionality in performing daily tasks. Our key observation, motivating this work, is that modeling the hand as a robotic manipulator allows us to encode an intermediate representation wherein US images are mapped to the manipulator's configurations. Given a sequence of such learned configurations, coupled with a neural-network architecture that exploits temporal coherence, we are able to infer fine finger motions. We evaluated our method by collecting data from a group of subjects and demonstrating how our framework can be used to replay the music played or the text typed. To the best of our knowledge, this is the first study demonstrating these downstream tasks within an end-to-end system.
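The "hand as a robotic manipulator" view can be illustrated with a minimal Python sketch of planar forward kinematics for one finger treated as a serial chain: a configuration is a vector of joint angles, and it determines where every joint and the fingertip lie. The three-joint layout and the link lengths are illustrative assumptions, not the paper's kinematic model.

```python
import numpy as np

def finger_forward_kinematics(joint_angles, link_lengths):
    """Planar forward kinematics of a serial chain (a simplified finger model).

    joint_angles: relative flexion angles in radians, one per joint.
    link_lengths: phalanx lengths, one per link (same length as joint_angles).
    Returns an array of shape (n_joints + 1, 2): the base, each joint, and the fingertip.
    """
    absolute = np.cumsum(np.asarray(joint_angles, dtype=float))  # orientation of each link
    steps = np.stack([np.cos(absolute), np.sin(absolute)], axis=1)
    steps *= np.asarray(link_lengths, dtype=float)[:, None]      # scale unit directions
    return np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])
```

A straight finger with links of 4, 3, and 2 units ends at (9, 0); bending only the first joint by 90° moves the fingertip to (0, 9).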
Physical passive patch adversarial attacks on visual odometry systems
Deep neural networks are known to be susceptible to adversarial perturbations: small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model’s output on a set of inputs. Universal perturbations present a more realistic case of adversarial attacks, as awareness of the model’s exact input is not required. In addition, the universal attack setting raises the subject of generalization to unseen data, where, given a set of inputs, the universal perturbations aim to alter the model’s output on out-of-sample data. In this work, we study physical passive patch adversarial attacks on visual odometry-based autonomous navigation systems. A visual odometry system aims to infer the relative camera motion between two corresponding viewpoints, and is frequently used by vision-based autonomous navigation systems to estimate their state. For such navigation systems, a patch adversarial perturbation poses a severe security issue, as it can be used to steer a system onto a collision course. To the best of our knowledge, we show for the first time that the error margin of a visual odometry model can be significantly increased by deploying patch adversarial attacks in the scene. We provide evaluation on synthetic closed-loop drone navigation data and demonstrate that a comparable vulnerability exists in real data.
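The "universal" aspect can be illustrated with a toy Python sketch. A single patch, shared by every input frame, is optimized by signed projected gradient ascent and clipped to the valid intensity range [0, 1] so that it remains printable. The linear score standing in for the visual odometry network, and all names, are hypothetical. Because the toy model is linear, its gradient does not depend on the input. For a real network, the gradient would instead be averaged over a batch of frames.

```python
import numpy as np

def apply_patch(images, patch, mask):
    """Paste the same patch into every image wherever mask is True."""
    return np.where(mask, patch, images)

def toy_score(images, weights):
    """Hypothetical stand-in for the attacked model: one linear score per image."""
    return (images * weights).sum(axis=(-2, -1))

def universal_patch_attack(weights, mask, steps=20, step_size=0.1):
    """Signed projected gradient ascent on one patch shared by all inputs."""
    patch = np.full(mask.shape, 0.5)
    for _ in range(steps):
        # For the linear toy score, d(mean score)/d(patch) equals the weights inside
        # the mask, independent of the particular images the patch is pasted into.
        grad = np.where(mask, weights, 0.0)
        patch = np.clip(patch + step_size * np.sign(grad), 0.0, 1.0)  # project onto [0, 1]
    return patch
```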
Machine learning approaches demonstrate that protein structures carry information about their genetic coding
Synonymous codons translate into the same amino acid. Although the identity of synonymous codons is often considered inconsequential to the final protein structure, there is mounting evidence for an association between the two. Our study examined this association using regression and classification models, finding that codon sequences predict protein backbone dihedral angles with a lower error than amino acid sequences, and that models trained with true dihedral angles classify synonymous codons given structural information better than models trained with random dihedral angles. Using this classification approach, we investigated local codon-codon dependencies and tested whether synonymous codon identity can be predicted more accurately from codon context than from amino acid context alone, and, more specifically, which codon context position carries the most predictive power.
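Because dihedral angles are periodic, prediction error has to be measured on the circle: predicting 179° for a true angle of −179° is off by 2°, not 358°. A minimal Python sketch of a wrapped mean absolute error follows. The study's exact error metric is not stated here, so treat this choice as an assumption.

```python
import numpy as np

def wrapped_angle_diff(pred_deg, true_deg):
    """Signed difference pred - true, mapped into [-180, 180) degrees."""
    return (np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float) + 180.0) % 360.0 - 180.0

def mean_absolute_angular_error(pred_deg, true_deg):
    """Mean absolute error between angles that respects the 360-degree periodicity."""
    return float(np.mean(np.abs(wrapped_angle_diff(pred_deg, true_deg))))
```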
Defining amino acid pairs as structural units suggests mutation sensitivity to adjacent residues
Proteins fold from chains of amino acids, forming secondary structures (α-helices and β-strands) that, at least for globular proteins, subsequently fold into a three-dimensional structure. A large-scale analysis of high-resolution protein structures suggests that amino acid pairs constitute another layer of ordered structure, more local than these conventionally defined secondary structures. We develop a cross-peptide-bond Ramachandran plot that captures the 15 conformational preferences of the amino acid pairs and show that the effect of a particular mutation on the stability of a protein depends in a predictable manner on the adjacent amino acid context.
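Ramachandran plots, including the cross-peptide-bond variant described above, are built from backbone dihedral angles such as φ (C of the previous residue, N, Cα, C) and ψ (N, Cα, C, N of the next residue). The minimal Python sketch below computes a dihedral angle from four atom positions, using the standard atan2 formulation and the IUPAC sign convention. How the paper pairs angles across the peptide bond is not specified here.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], defined by four points.

    The rotation axis is the bond p1 -> p2; the sign follows the IUPAC convention.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1  # component of b0 perpendicular to the axis
    w = b2 - np.dot(b2, b1) * b1  # component of b2 perpendicular to the axis
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```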
Codon-specific Ramachandran plots show amino acid backbone conformation depends on identity of the translated codon
Synonymous codons translate into chemically identical amino acids. Although codon identity was once considered inconsequential to the formation of the protein product, there is now significant evidence that codon usage affects co-translational protein folding and the final structure of the expressed protein. Here we develop a method for computing and comparing codon-specific Ramachandran plots and demonstrate that the backbone dihedral angle distributions of some synonymous codons are distinguishable with statistical significance for some secondary structures. This shows that there exists a dependence between codon identity and the backbone torsion of the translated amino acid. Although these findings cannot pinpoint the causal direction of this dependence, we discuss the far-reaching biological implications should coding be shown to directly shape protein conformation, and demonstrate the usefulness of this method as a tool for probing associations between codon usage and protein structure. Finally, we urge the inclusion of exact genetic information in structural databases.
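Comparing two codon-specific Ramachandran plots amounts to a two-sample test on (φ, ψ) pairs. The Python sketch below uses a permutation test whose statistic is the L1 distance between normalized, periodic 2D histograms. The statistic, the 10° bin size, and the function names are assumptions for illustration and need not match the paper's method.

```python
import numpy as np

def _wrap(angles):
    """Map angles in degrees into [-180, 180)."""
    return (np.asarray(angles, dtype=float) + 180.0) % 360.0 - 180.0

def rama_histogram(phi, psi, bins=36):
    """Normalized 2D histogram of (phi, psi) on the periodic square [-180, 180)^2."""
    h, _, _ = np.histogram2d(_wrap(phi), _wrap(psi), bins=bins,
                             range=[[-180.0, 180.0], [-180.0, 180.0]])
    return h / h.sum()

def permutation_test(angles_a, angles_b, n_perm=500, seed=0):
    """p-value for H0: both (n, 2) arrays of (phi, psi) come from one distribution."""
    rng = np.random.default_rng(seed)

    def l1_distance(a, b):
        return np.abs(rama_histogram(a[:, 0], a[:, 1])
                      - rama_histogram(b[:, 0], b[:, 1])).sum()

    observed = l1_distance(angles_a, angles_b)
    pooled = np.vstack([angles_a, angles_b])
    n_a = len(angles_a)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # random relabeling is valid under H0
        if l1_distance(pooled[idx[:n_a]], pooled[idx[n_a:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```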
Inverse design of spontaneous parametric downconversion for generation of high-dimensional qudits
Spontaneous parametric down-conversion (SPDC) in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process by a fully differentiable algorithm that takes into account all interaction effects, progress has been limited. Here, we overcome these limitations and introduce a physically constrained and differentiable model, validated against experimental results for shaped pump beams and structured crystals, capable of learning every interaction parameter in the process. We avoid any restrictions induced by the stochastic nature of our physical model and integrate the dynamic equations governing the evolution under the SPDC Hamiltonian. We solve the inverse problem of designing a nonlinear quantum optical system that achieves the desired quantum state of down-converted photon pairs. The desired states are defined either by the second-order correlations between different spatial modes or by specifying the required density matrix. By learning nonlinear volume holograms as well as different pump shapes, we successfully show how to generate maximally entangled states. Furthermore, we simulate all-optical coherent control over the generated quantum state by actively changing the profile of the pump beam. Our work can be useful for applications such as novel designs of high-dimensional quantum key distribution and quantum information processing protocols. In addition, our method can be readily applied for controlling other degrees of freedom of light in the SPDC process, such as the spectral and temporal properties, and may even be used in condensed-matter systems having a similar interaction Hamiltonian.
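One way to specify a target such as a maximally entangled qudit pair is through its density matrix. The minimal Python sketch below assumes a truncated basis of d spatial modes per photon. It builds |Φ⟩ = (1/√d) Σ_k |k⟩|k⟩ and checks maximal entanglement through the purity of the reduced single-photon state, which equals 1/d. It also scores a candidate density matrix by its fidelity ⟨Φ|ρ|Φ⟩ with that target. The function names are hypothetical, and this is not the paper's loss.

```python
import numpy as np

def maximally_entangled_state(d):
    """|Phi> = (1/sqrt(d)) * sum_k |k>|k> for two qudits of dimension d."""
    psi = np.zeros(d * d, dtype=complex)
    psi[[k * d + k for k in range(d)]] = 1.0 / np.sqrt(d)
    return psi

def density_matrix(psi):
    """Pure-state density matrix |psi><psi|."""
    return np.outer(psi, psi.conj())

def reduced_purity(rho, d):
    """Purity Tr(rho_A^2) of photon A after tracing out photon B."""
    rho_a = np.trace(rho.reshape(d, d, d, d), axis1=1, axis2=3)  # indices (iA, iB, jA, jB)
    return float(np.real(np.trace(rho_a @ rho_a)))

def fidelity_pure_target(rho, psi_target):
    """Fidelity <psi|rho|psi> of a (possibly mixed) state with a pure target."""
    return float(np.real(psi_target.conj() @ rho @ psi_target))
```

A product state has reduced purity 1, while the maximally entangled state reaches the minimum possible value 1/d.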
Mint: An Accelerator For Mining Temporal Motifs
A variety of complex systems, including social and communication networks, financial markets, biology, and neuroscience, are modeled using temporal graphs that contain a set of nodes and directed timestamped edges. Temporal motifs in temporal graphs generalize subgraph patterns in static graphs in that they also account for edge ordering and time duration, in addition to the graph structure. Mining temporal motifs is a fundamental problem used in several application domains. However, existing software frameworks offer suboptimal performance due to the high algorithmic complexity and irregular memory accesses of temporal motif mining. This paper presents Mint, a novel accelerator architecture and programming model for mining temporal motifs efficiently. We first divide this workload into three fundamental tasks: search, book-keeping, and backtracking. Based on this, we propose a task-centric programming model that enables decoupled, asynchronous execution. This model unlocks massive opportunities for parallelism and allows storing task context information on-chip. To best utilize the proposed programming model, we design a domain-specific hardware accelerator whose data path and memory subsystem are designed to cater to the unique workload characteristics of temporal motif mining. To further improve performance, we propose a novel optimization called search index memoization that significantly reduces memory traffic. We comprehensively compare the performance of Mint with state-of-the-art temporal motif mining software frameworks (both approximate and exact) running on both CPU and GPU, and show 9×–2576× benefit in performance.
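The three tasks named above can be seen in a plain software sketch of exact δ-temporal motif counting. Search scans time-ordered edges for the next motif edge, book-keeping records the node bindings and the first timestamp, and backtracking discards a binding once its subtree is explored. This brute-force Python version assumes the common definition: motif edges are matched in time order, the node mapping is injective, and the total span is at most δ. It illustrates the workload only and is not Mint's algorithm or hardware.

```python
def count_temporal_motifs(edges, motif, delta):
    """Count delta-temporal motif instances by exhaustive backtracking.

    edges: list of (src, dst, t) tuples sorted by t (ties are taken in list order).
    motif: list of (a, b) pairs over motif node labels, in the required time order.
    delta: maximum time span between the first and last matched edge.
    """
    count = 0

    def bind(mapping, label, node):
        # Book-keeping: extend the label -> node mapping, keeping it injective.
        if label in mapping:
            return mapping[label] == node
        if node in mapping.values():
            return False
        mapping[label] = node
        return True

    def extend(start, depth, mapping, t_first):
        nonlocal count
        if depth == len(motif):
            count += 1
            return
        a, b = motif[depth]
        for i in range(start, len(edges)):  # search over strictly later edges
            u, v, t = edges[i]
            if depth > 0 and t - t_first > delta:
                break  # edges are time-sorted, so no later edge can fit either
            trial = dict(mapping)  # backtracking amounts to discarding this copy
            if bind(trial, a, u) and bind(trial, b, v):
                extend(i + 1, depth + 1, trial, t if depth == 0 else t_first)

    extend(0, 0, {}, None)
    return count
```

For the edges `[(1, 2, 1), (2, 3, 2), (3, 1, 3), (1, 2, 10)]` and the cyclic triangle motif `[("a", "b"), ("b", "c"), ("c", "a")]`, one instance fits within δ = 5 and a second appears once δ is raised to 10.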
Interpretable deep learning unveils structure-property relationships in polybenzenoid hydrocarbons
In this work, interpretable deep learning was used to identify structure-property relationships governing the HOMO-LUMO gap and relative stability of polybenzenoid hydrocarbons (PBHs). To this end, a ring-based graph representation was used. In addition to affording reduced training times and excellent predictive ability, this representation could be combined with a subunit-based perception of PBHs, allowing chemical insights to be presented in terms of intuitive and simple structural motifs. The resulting insights agree with conventional organic chemistry knowledge and electronic structure-based analyses, and also reveal new behaviors and identify influential structural motifs. In particular, we evaluated and compared the effects of linear, angular, and branching motifs on these two molecular properties, and explored the role of dispersion in mitigating the torsional strain inherent in non-planar PBHs. Hence, the observed regularities and the proposed analysis contribute to a deeper understanding of the behavior of PBHs and form the foundation for design strategies for new functional PBHs.
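A ring-based graph makes annulation patterns easy to read off. In the Python sketch below, each benzenoid ring is a node placed on a hexagonal lattice in axial coordinates, and two rings are connected if they share a bond. This placement is an assumption that breaks down for helicene-like systems whose rings would overlap on the lattice. A ring with two fused neighbors is classified as linear when the neighbors are opposite, as in the middle ring of anthracene. It is angular when they are 120° apart, as in the middle ring of phenanthrene. A ring with three or more neighbors is branching. These rules are our illustration, not the paper's featurization.

```python
# Axial-coordinate offsets of the six neighbors of a hexagon, in angular order.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def ring_graph(rings):
    """Map each ring (q, r) to the rings fused to it (lattice neighbors share a bond)."""
    ring_set = set(rings)
    return {ring: [(ring[0] + dq, ring[1] + dr) for dq, dr in HEX_DIRS
                   if (ring[0] + dq, ring[1] + dr) in ring_set]
            for ring in rings}

def annulation_type(graph, ring):
    """Classify a ring by how its fused neighbors are arranged around it."""
    nbrs = graph[ring]
    if len(nbrs) < 2:
        return "terminal"
    if len(nbrs) > 2:
        return "branching"
    i, j = (HEX_DIRS.index((q - ring[0], r - ring[1])) for q, r in nbrs)
    turn = min(abs(i - j), 6 - abs(i - j))  # angle between the neighbors, in 60-degree units
    return {3: "linear", 2: "angular"}.get(turn, "peri-fused")
```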
Contrast to divide: Self-supervised pre-training for learning with noisy labels
The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a “warm-up obstacle”: the inability of standard warm-up stages to train high-quality feature extractors and avert memorization of noisy labels. We propose “Contrast to Divide” (C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches by drastically reducing the warm-up stage’s susceptibility to noise level, shortening its duration, and improving extracted feature quality. C2D works out of the box with existing methods and demonstrates markedly improved performance, especially in the high-noise regime, where we get a boost of more than 27% for CIFAR-100 with 90% noise over the previous state of the art. In real-life noise settings, C2D trained on mini-WebVision outperforms previous works on both the WebVision and ImageNet validation sets by 3% top-1 accuracy. We perform an in-depth analysis of the framework, including investigating the performance of different pre-training approaches and estimating the effective upper bound of the LNL performance with semi-supervised learning.
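The self-supervised pre-training stage can use a contrastive objective. As an example, here is a minimal NumPy sketch of the SimCLR-style NT-Xent loss, which pulls two augmentations of the same image together and pushes them away from the other images in the batch. Which pre-training objective and temperature C2D uses in each experiment is not stated in this abstract, so this choice is an assumption.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for two (n, d) batches of embeddings.

    Row i of z1 and row i of z2 are two augmentations of the same image (the
    positive pair); every other row in the combined 2n-row batch is a negative.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity below
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # a sample is not its own negative
    n = len(z1)
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Row-wise log-softmax, evaluated at each row's positive partner.
    log_prob = sim - np.logaddexp.reduce(sim, axis=1, keepdims=True)
    return float(-log_prob[np.arange(2 * n), positives].mean())
```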
MetAdapt: Meta-learned task-adaptive architecture for few-shot classification
Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter during test time. This is also referred to as meta-learning. Another topic closely related to meta-learning, and one that attracts considerable interest in the community, is Neural Architecture Search (NAS): automatically finding an optimal architecture instead of engineering it manually. In this work we combine these two aspects of meta-learning. So far, meta-learning FSL methods have focused on optimizing the parameters of pre-defined network architectures in order to make them easily adaptable to novel tasks. Moreover, it was observed that, in general, larger architectures perform better than smaller ones up to a certain saturation point (where they start to degrade due to over-fitting). However, little attention has been given to explicitly optimizing the architectures for FSL, or to adapting the architecture at test time to particular novel tasks. In this work, we propose to employ tools inspired by the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting. Additionally, to make the architecture task adaptive, we propose the concept of ‘MetAdapt Controller’ modules. These modules are added to the model and are meta-trained to predict the optimal network connections for a given novel task. Using the proposed approach we observe state-of-the-art results.
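The D-NAS tools referred to above rest on a continuous relaxation. Each edge of a searched cell computes a softmax-weighted mixture of candidate operations, so the architecture weights α can be optimized by gradient descent. The minimal NumPy sketch below shows such a mixed operation. It also includes a hypothetical linear controller that shifts α per task, loosely in the spirit of the MetAdapt Controller. The candidate operations and the controller's form are assumptions, not the paper's design.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Toy candidate operations on one edge of the searched cell (hypothetical set).
CANDIDATE_OPS = [
    lambda x: x,                   # identity / skip connection
    lambda x: np.zeros_like(x),    # "zero" op: no connection
    lambda x: np.maximum(x, 0.0),  # stand-in for a conv + ReLU block
]

def mixed_op(x, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate ops."""
    weights = softmax(np.asarray(alpha, dtype=float))
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS))

def task_adaptive_alpha(alpha, task_embedding, controller_weights):
    """Hypothetical controller: shift the shared alpha using a task summary vector."""
    return np.asarray(alpha, dtype=float) + controller_weights @ task_embedding
```

With equal α the edge averages the three operations; as one entry of α dominates, the edge approaches that single operation, which is how a discrete architecture is read off after the search.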
Joint optimization of system design and reconstruction in MIMO radar imaging
Multiple-input multiple-output (MIMO) radar is one of the leading depth sensing modalities. However, the use of multiple receive channels leads to relatively high costs and prevents the penetration of MIMO radars into many areas, such as the automotive industry. In recent years, a few studies have concentrated on designing reduced measurement schemes and image reconstruction schemes for MIMO radars; however, these problems have so far been addressed separately. On the other hand, recent works in optical computational imaging have demonstrated the growing success of simultaneous learning-based design of the acquisition and reconstruction schemes, manifesting significant improvement in reconstruction quality. Inspired by these successes, in this work, we propose to learn MIMO acquisition parameters, in the form of receive (Rx) antenna element locations, jointly with a neural-network-based image reconstruction. To this end, we propose an algorithm for training the combined acquisition-reconstruction pipeline end-to-end in a differentiable way. We demonstrate the significance of using our learned acquisition parameters with and without the neural-network reconstruction.
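Learning Rx element locations end-to-end requires a forward model that is differentiable in those locations. The minimal NumPy sketch below assumes ideal point scatterers, a stepped-frequency waveform, and no noise or attenuation. Each measurement is a phase term exp(−j2πf(R_tx + R_rx)/c), and a delay-and-sum (back-projection) image serves as a classical baseline reconstruction. The array geometry and all names are illustrative. In an autodiff framework, `rx` would become a trainable parameter.

```python
import numpy as np

C = 3e8  # propagation speed (m/s)

def steering(tx, rx, freqs, points):
    """Phase of every (Tx, Rx, frequency) measurement for unit scatterers at points.

    tx: (n_tx, 2), rx: (n_rx, 2), points: (m, 2) positions in meters; freqs: (n_f,) in Hz.
    Returns an array of shape (n_tx, n_rx, n_f, m).
    """
    d_tx = np.linalg.norm(points[None, :, :] - tx[:, None, :], axis=-1)  # (n_tx, m)
    d_rx = np.linalg.norm(points[None, :, :] - rx[:, None, :], axis=-1)  # (n_rx, m)
    path = d_tx[:, None, :] + d_rx[None, :, :]                           # (n_tx, n_rx, m)
    return np.exp(-2j * np.pi * freqs[:, None] * path[:, :, None, :] / C)

def measure(tx, rx, freqs, targets, reflectivity):
    """Noise-free measurements of shape (n_tx, n_rx, n_f): superposed point echoes."""
    return steering(tx, rx, freqs, targets) @ reflectivity

def back_project(s, tx, rx, freqs, grid):
    """Delay-and-sum image: correlate the data with each pixel's steering vector."""
    return np.abs(np.einsum("trfm,trf->m", steering(tx, rx, freqs, grid).conj(), s))
```

For a single target placed on a grid point, every term adds coherently at that pixel, so the back-projected image peaks there.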
Loss aware post-training quantization
Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods.
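A common first step in post-training quantization is a per-tensor search for the clipping range, and minimizing an Lp-norm quantization error is a natural initialization for the joint, loss-aware search described above. The NumPy sketch below shows only that per-tensor step under a symmetric uniform quantizer. It is an illustration under those assumptions, not the paper's method, which optimizes the quantization parameters of all layers jointly against the network loss.

```python
import numpy as np

def quantize(x, clip, bits):
    """Symmetric uniform quantizer: clip to [-clip, clip], then round to the grid."""
    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for INT4
    scale = clip / levels
    return np.round(np.clip(x, -clip, clip) / scale) * scale

def best_clip_lp(x, bits, p=2.0, n_grid=100):
    """Grid-search the clipping value that minimizes the mean L_p quantization error."""
    max_abs = float(np.max(np.abs(x)))
    candidates = np.linspace(max_abs / n_grid, max_abs, n_grid)
    errors = [np.mean(np.abs(x - quantize(x, c, bits)) ** p) for c in candidates]
    return float(candidates[int(np.argmin(errors))])
```

For Gaussian-like tensors at 4 bits, the chosen clip lies well inside the observed range: coarser rounding makes it worthwhile to sacrifice the rare outliers. At 8 bits the trade-off is much milder.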
Water stabilizes an alternate turn conformation in horse heart myoglobin
Comparison of myoglobin structures reveals that protein isolated from horse heart consistently adopts an alternate turn conformation compared with its homologues. Analysis of hundreds of high-resolution structures discounts crystallization conditions or the surrounding amino acid environment of the protein as explanations for this difference, which is also not captured by the AlphaFold prediction. Rather, a water molecule is identified as stabilizing the conformation in the horse heart structure, which immediately reverts to the whale conformation in molecular dynamics simulations that exclude this structural water.
GRASP: Graph Alignment through Spectral Signatures
What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare poorly in the pure form of the problem, in which only graph structures are given. Other proposals translate the problem to one of aligning node embeddings, yet, by doing so, provide only a single-scale view of the graph. In this paper, we transfer the shape-analysis concept of functional maps from the continuous to the discrete case, and treat the graph alignment problem as a special case of the problem of finding a mapping between functions on graphs. We present GRASP, a method that first establishes a correspondence between functions derived from Laplacian matrix eigenvectors, which capture multiscale structural characteristics, and then exploits this correspondence to align nodes. Our experimental study, featuring noise levels higher than anything used in previous studies, shows that GRASP outperforms state-of-the-art methods for graph alignment across noise levels and graph types.
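The multiscale structural information carried by Laplacian eigenvectors can be illustrated with a simpler relative of GRASP. Heat kernel signatures (HKS) summarize each node by how heat diffusing from it decays over several time scales, and a minimum-cost assignment between the two graphs' node descriptors then produces an alignment. This Python sketch is a spectral baseline for intuition, not GRASP itself: it builds no functional map and performs no basis alignment. The time scales are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def heat_kernel_signature(adj, times):
    """HKS(v, t) = sum_k exp(-t * lambda_k) * phi_k(v)^2 from the graph Laplacian.

    The sum over each eigenspace is basis-independent, so HKS is well defined
    even when eigenvalues repeat.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    evals, evecs = np.linalg.eigh(lap)
    return (evecs ** 2) @ np.exp(-np.outer(evals, times))  # (n_nodes, n_times)

def align_by_hks(adj_a, adj_b, times=np.logspace(-1, 1, 8)):
    """Match node i of graph A to node match[i] of graph B by minimum-cost assignment."""
    ha = heat_kernel_signature(adj_a, times)
    hb = heat_kernel_signature(adj_b, times)
    cost = np.linalg.norm(ha[:, None, :] - hb[None, :, :], axis=-1)
    _, cols = linear_sum_assignment(cost)
    return cols
```

On a noise-free, relabeled copy of a graph, this recovers an edge-preserving matching. Unlike the setting evaluated in the paper, it has no mechanism for coping with structural noise.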
Deep fused two-step cross-modal hashing with multiple semantic supervision
Existing cross-modal hashing methods ignore the informative multimodal joint information and cannot fully exploit the semantic labels. In this paper, we propose a deep fused two-step cross-modal hashing (DFTH) framework with multiple semantic supervision. In the first step, DFTH learns unified hash codes for instances by a fusion network. Semantic label and similarity reconstruction are introduced to acquire binary codes that are informative, discriminative, and semantic-similarity preserving. In the second step, two modality-specific hash networks are learned under the supervision of common hash code reconstruction, label reconstruction, and intra-modal and inter-modal semantic similarity reconstruction. The modality-specific hash networks can generate semantics-preserving binary codes for out-of-sample queries. To deal with the vanishing gradients of binarization, the continuous, differentiable tanh function is introduced to approximate the discrete sign function, allowing the networks to be trained by back-propagation with automatic gradient computation. Extensive experiments on MIRFlickr25K and NUS-WIDE show the superiority of DFTH over state-of-the-art methods.
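The tanh relaxation described above can be sketched in a few lines of NumPy. tanh(βh) is smooth and approaches sign(h) for h ≠ 0 as β grows, while retrieval uses the hard ±1 codes, whose Hamming distance follows directly from an inner product. The scaling factor β and the function names are illustrative assumptions.

```python
import numpy as np

def relaxed_codes(features, beta=1.0):
    """Differentiable surrogate for sign(features), with values in (-1, 1)."""
    return np.tanh(beta * features)

def binary_codes(features):
    """Hard {-1, +1} codes used at retrieval time (zeros are mapped to +1)."""
    return np.where(features >= 0, 1, -1)

def hamming_distance(b1, b2):
    """For {-1, +1} codes of length K, d_H = (K - <b1, b2>) / 2."""
    return (b1.shape[-1] - b1 @ b2) // 2
```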
Intra-class low-rank regularization for supervised and semi-supervised cross-modal retrieval
Cross-modal retrieval aims to retrieve related items across different modalities, for example, using an image query to retrieve related text. Existing deep methods ignore both the intra-modal and inter-modal intra-class low-rank structures when fusing various modalities, which degrades retrieval performance. In this paper, two deep models (denoted as ILCMR and Semi-ILCMR) based on intra-class low-rank regularization are proposed for supervised and semi-supervised cross-modal retrieval, respectively. Specifically, ILCMR integrates the image network and text network into a unified framework to learn a common feature space by imposing three regularization terms to fuse the cross-modal data. First, to align the modalities in the label space, we utilize semantic consistency regularization to convert the data representations to probability distributions over the classes. Second, we introduce an intra-modal low-rank regularization, which encourages the intra-class samples that originate from the same space to be more relevant in the common feature space. Third, an inter-modal low-rank regularization is applied to reduce the cross-modal discrepancy. To enable the low-rank regularization to be optimized using automatic gradients during network back-propagation, we propose the rank-r approximation and specify the explicit gradients for theoretical completeness. In addition to the three regularization terms that rely on label information incorporated by ILCMR, we propose Semi-ILCMR for the semi-supervised regime, which introduces a low-rank constraint before projecting the general representations into the common feature space. Extensive experiments on four public cross-modal datasets demonstrate the superiority of ILCMR and Semi-ILCMR over other state-of-the-art methods.
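One way to make a rank penalty trainable by back-propagation is to penalize only the singular values beyond the r largest. This penalty is zero exactly when the matrix of intra-class features has rank at most r, and it has a closed-form gradient. The NumPy sketch below shows this surrogate and its explicit gradient. It is an assumption for illustration and may differ from the paper's rank-r approximation.

```python
import numpy as np

def low_rank_penalty(features, r):
    """Sum of the singular values beyond the r largest (zero iff rank <= r)."""
    s = np.linalg.svd(features, compute_uv=False)  # sorted in descending order
    return float(s[r:].sum())

def low_rank_penalty_grad(features, r):
    """Explicit gradient sum_{i > r} u_i v_i^T.

    Valid when the singular values involved are distinct and nonzero.
    """
    u, _, vt = np.linalg.svd(features, full_matrices=False)
    return u[:, r:] @ vt[r:, :]
```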
Delta-GAN-Encoder: Encoding semantic changes for explicit image editing, using few synthetic samples
Understanding and controlling generative models’ latent space is a complex task. In this paper, we propose a novel method for learning to control any desired attribute in a pre-trained GAN’s latent space, for the purpose of editing synthesized and real-world data samples accordingly. We perform Sim2Real learning, relying on a minimal number of samples to achieve an unlimited amount of continuous, precise edits. We present an Autoencoder-based model that learns to encode the semantics of changes between images as a basis for editing new samples later on, achieving precise desired results (an example is shown in Fig. 1). While previous editing methods rely on a known structure of latent spaces (e.g., linearity of some semantics in StyleGAN), our method inherently does not require any structural constraints. We demonstrate our method in the domain of facial imagery: editing different expressions, poses, and lighting attributes, achieving state-of-the-art results.
CAT: Compression-aware training for bandwidth reduction
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a major energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value.
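The link between low-entropy feature maps and compression is Shannon's bound. When values are coded independently according to their empirical frequencies, an entropy coder needs on average about H bits per value. A minimal NumPy sketch of that bits-per-value estimate for quantized activations follows. It ignores the transform-coding stage and is not the paper's training loss.

```python
import numpy as np

def empirical_entropy_bits(values):
    """Shannon entropy, in bits per value, of the empirical distribution of values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A feature map whose quantized values are spread evenly over four symbols costs 2 bits per value, while one dominated by a single symbol (for example, many zeros after ReLU) costs well under 1 bit.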
Self-supervised classification network
We present Self-Classifier, a novel self-supervised end-to-end classification neural network. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To avoid degenerate solutions (i.e., solutions where all samples are assigned to the same class), a uniform prior is asserted on the labels. We show mathematically that, unlike the regular cross-entropy loss, our approach avoids such solutions. Self-Classifier is simple to implement and scalable to practically unlimited amounts of data. Unlike other unsupervised classification approaches, it does not require any form of pre-training or the use of expectation maximization algorithms, pseudo-labelling, or external clustering. Unlike other contrastive representation learning approaches, it does not require a memory bank or a second network. Despite its relative simplicity, our approach achieves results comparable to the state of the art on ImageNet, CIFAR10, and CIFAR100 for its two objectives: unsupervised classification and unsupervised representation learning. Furthermore, it is the first unsupervised end-to-end classification network to perform well on the large-scale ImageNet dataset. Code will be made available.
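A simplified NumPy sketch shows why a uniform prior blocks collapse. Assuming uniform priors p(y) = 1/C and p(x) = 1/N, Bayes' rule gives p(y|x) ∝ p(x|y). One view's logits are therefore first normalized over the batch for each class, giving p(x|y), and then renormalized over classes. The other view supplies ordinary softmax targets. If every sample is pushed to one class, the batch normalization makes all classes look alike and the loss is stuck at log C. This illustrates the principle under those assumptions. The paper's exact loss (temperatures, symmetrization) may differ.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def uniform_prior_cross_view_loss(logits_view1, logits_view2):
    """Cross-entropy between two views of an (N, C) batch of class logits.

    View 1: p(x | y) by normalizing each class column over the batch, then
    Bayes' rule with uniform p(y) and p(x) turns it back into p(y | x).
    View 2: an ordinary row softmax, used as the target distribution.
    """
    p_x_given_y = softmax(logits_view1, axis=0)
    p_y_given_x = p_x_given_y / p_x_given_y.sum(axis=1, keepdims=True)
    target = softmax(logits_view2, axis=1)
    return float(-(target * np.log(p_y_given_x)).sum(axis=1).mean())
```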
Learning to localize objects using limited annotation with applications to thoracic diseases
Motivation: The localization of objects in images is a longstanding objective within the field of image processing. Most current techniques are based on machine learning approaches, which typically require careful annotation of training samples in the form of expensive bounding box labels. The need for such large-scale annotation has only been exacerbated by the widespread adoption of deep learning techniques within the image processing community: deep learning is notoriously data-hungry. Method: In this work, we attack this problem directly by providing a new method for learning to localize objects with limited annotation: most training images can simply be annotated with their whole image labels (and no bounding box), with only a small fraction marked with bounding boxes. The training is driven by a novel loss function, which is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning. Care is taken to ensure that the loss is numerically well-posed. Additionally, we propose a neural network architecture which accounts for both patch dependence, through the use of Conditional Random Field layers, and shift-invariance, through the inclusion of anti-aliasing filters. Results: We demonstrate our method on the task of localizing thoracic diseases in chest X-ray images, achieving state-of-the-art performance on the ChestX-ray14 dataset. We further show that with a modicum of additional effort our technique can be extended from object localization to object detection, attaining high quality results on the Kaggle RSNA Pneumonia Detection Challenge. Conclusion: The technique presented in this paper has the potential to enable high accuracy localization in regimes in which annotated data is either scarce or expensive to acquire. Future work will focus on applying the ideas presented in this paper to the realm of semantic segmentation.
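For images annotated only with a whole-image label, one standard continuous relaxation of the discrete rule "the image is positive if and only if at least one patch is positive" is the noisy-OR: P(image positive) = 1 − Π(1 − p_i). Computing the product naively loses precision for large patch grids and small probabilities, so the NumPy sketch below works in log space with log1p and expm1. The noisy-OR form is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def image_level_nll(patch_probs, image_is_positive, eps=1e-12):
    """Negative log-likelihood of a whole-image label under a noisy-OR over patches.

    patch_probs: per-patch probabilities that the patch contains the object.
    """
    p = np.clip(np.asarray(patch_probs, dtype=float), 0.0, 1.0 - eps)
    log_p_none = np.sum(np.log1p(-p))  # log prod(1 - p_i), accurate for tiny p_i
    if image_is_positive:
        p_any = -np.expm1(log_p_none)  # 1 - prod(1 - p_i) without cancellation
        return float(-np.log(max(p_any, eps)))
    return float(-log_p_none)
```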
PILOT: Physics-Informed Learned Optimal Trajectories for accelerated MRI
Magnetic Resonance Imaging (MRI) has long been considered to be among “the gold standards” of diagnostic medical imaging. The long acquisition times, however, render MRI prone to motion artifacts, not to mention their adverse contribution to the relatively high cost of MRI examination. Over the last few decades, multiple studies have focused on the development of both physical and post-processing methods for accelerated acquisition of MRI scans. These two approaches, however, have so far been addressed separately. On the other hand, recent works in optical computational imaging have demonstrated growing success of the concurrent learning-based design of data acquisition and image reconstruction schemes. Such schemes have already demonstrated substantial effectiveness, leading to considerably shorter acquisition times and improved quality of image reconstruction. Inspired by this initial success, in this work, we propose a novel approach to the learning of optimal schemes for conjoint acquisition and reconstruction of MRI scans, with the optimization carried out simultaneously with respect to the time-efficiency of data acquisition and the quality of the resulting reconstructions. To be of practical value, the schemes are encoded in the form of general k-space trajectories, whose associated magnetic gradients are constrained to obey a set of predefined hardware requirements (as defined in terms of, e.g., peak currents and maximum slew rates of magnetic gradients). With this proviso in mind, we propose a novel algorithm for the end-to-end training of a combined acquisition-reconstruction pipeline using a deep neural network with differentiable forward- and backpropagation operators. We also demonstrate the effectiveness of the proposed solution in application to both image reconstruction and image segmentation, reporting substantial improvements in terms of acceleration factors as well as the quality of these end tasks.
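The hardware requirements on a k-space trajectory follow from k(t) = γ̄∫G(τ)dτ. The gradient waveform is therefore G = (1/γ̄) dk/dt, and the slew rate is dG/dt. The minimal NumPy sketch below checks a sampled 2D trajectory by finite differences, per axis. The limits of 40 mT/m and 200 T/m/s are typical clinical values used here as assumptions. The abstract describes building such constraints into the optimization itself rather than checking them after the fact as this helper does.

```python
import numpy as np

GAMMA_BAR = 42.577e6  # gyromagnetic ratio of 1H divided by 2*pi, in Hz/T

def gradient_and_slew(k, dt):
    """Gradient (T/m) and slew rate (T/m/s) implied by a sampled trajectory.

    k: (n_samples, 2) k-space positions in cycles/m; dt: sampling interval in s.
    """
    g = np.diff(k, axis=0) / (GAMMA_BAR * dt)
    s = np.diff(g, axis=0) / dt
    return g, s

def satisfies_hardware(k, dt, g_max=40e-3, s_max=200.0):
    """True if every axis stays within the (assumed) peak-gradient and slew limits."""
    g, s = gradient_and_slew(k, dt)
    return bool(np.abs(g).max() <= g_max and np.abs(s).max() <= s_max)
```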
Detector-free weakly supervised grounding by separation
Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing ‘text to image-regions’ associations by random alpha-blending of arbitrary image pairs and using the corresponding texts of the pair as conditions to recover the alpha map from the blended image via a segmentation network. At test time, this allows using the query phrase as a condition for a non-blended query image, thus interpreting the test image as a composition of a region corresponding to the phrase and the complement region. Using this approach we demonstrate a significant accuracy improvement, of up to 8.5% over previous DF-WSG SotA, for a range of benchmarks including Flickr30K, Visual Genome, and ReferIt, as well as a significant complementary improvement (above 7%) over the detector-based approaches for WSG.
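The training-data synthesis behind Grounding by Separation can be sketched in NumPy. Two images are alpha-blended with a random spatial map. The segmentation network, conditioned on one image's text, is then asked to recover that image's alpha map, and, when conditioned on the other text, its complement. The coarse-grid alpha generator below is a hypothetical choice for illustration. The abstract does not specify how the random alpha maps are drawn.

```python
import numpy as np

def random_alpha_map(h, w, cells=4, rng=None):
    """Hypothetical alpha map: a coarse random grid upsampled to (h, w).

    Assumes h and w are divisible by cells.
    """
    rng = np.random.default_rng() if rng is None else rng
    coarse = rng.random((cells, cells))
    return np.kron(coarse, np.ones((h // cells, w // cells)))

def blend_pair(img1, img2, alpha):
    """Return the blended image and the targets for text1 (alpha) and text2 (1 - alpha)."""
    a = alpha[..., None]  # broadcast over the color channels of (h, w, 3) images
    return a * img1 + (1.0 - a) * img2, alpha, 1.0 - alpha
```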
Meeting the unmet needs of clinicians from AI systems showcased for cardiology with deep-learning-based ECG analysis
Despite their great promise, artificial intelligence (AI) systems have yet to become ubiquitous in the daily practice of medicine, largely due to several crucial unmet needs of healthcare practitioners. These include the lack of explanations in clinically meaningful terms, the handling of unknown medical conditions, and transparency regarding the system’s limitations, both in terms of statistical performance and in recognizing situations for which the system’s predictions are irrelevant. We articulate these unmet clinical needs as machine-learning (ML) problems and systematically address them with cutting-edge ML techniques. We focus on electrocardiogram (ECG) analysis as an example domain in which AI has great potential and tackle two challenging tasks: the detection of a heterogeneous mix of known and unknown arrhythmias from ECG and the identification of underlying cardio-pathology from segments annotated as normal sinus rhythm recorded in patients with an intermittent arrhythmia. We validate our methods by simulating a screening for arrhythmias in a large-scale population while adhering to statistical significance requirements. Specifically, our system 1) visualizes the relative importance of each part of an ECG segment for the final model decision; 2) upholds specified statistical constraints on its out-of-sample performance and provides uncertainty estimation for its predictions; 3) handles inputs containing unknown rhythm types; and 4) handles data from unseen patients while also flagging cases in which the model’s outputs are not usable for a specific patient. This work represents a significant step toward overcoming the limitations currently impeding the integration of AI into clinical practice in cardiology and medicine in general.
StarNet: towards weakly supervised few-shot detection and explainable few-shot classification
In this paper, we propose a new few-shot learning method called StarNet, which is an end-to-end trainable non-parametric star-model few-shot classifier. While being meta-trained using only image-level class labels, StarNet learns not only to predict the class labels for each query image of a few-shot task, but also to localize (via a heatmap) what it believes to be the key image regions supporting its prediction, thus effectively detecting the instances of the novel categories. The localization is enabled by StarNet’s ability to find large, arbitrarily shaped, semantically matching regions between all pairs of support and query images of a few-shot task. We evaluate StarNet on multiple few-shot classification benchmarks, attaining significant improvements over the state of the art on CUB and ImageNetLOC-FS and smaller improvements on other benchmarks. At the same time, in many cases, StarNet provides plausible explanations for its class label predictions by highlighting the correctly paired novel category instances on the query and on its best matching support (for the predicted class). In addition, we test the proposed approach on the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), obtaining significant improvements over the baselines.
Noise estimation using density estimation for self-supervised multimodal learning
One of the key factors in enabling machine learning models to comprehend and solve real-world tasks is leveraging multimodal data. Unfortunately, the annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods ignore the presence of high levels of noise and thus yield sub-optimal results. In this work, we show that the problem of noise estimation for multimodal data can be reduced to a multimodal density estimation task. Using multimodal density estimation, we propose a noise estimation building block for multimodal representation learning that is based strictly on the inherent correlation between different modalities. We demonstrate how our noise estimation can be broadly integrated and achieves results comparable to the state of the art on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval.
Digital Gimbal: End-to-end deep image stabilization with learnable exposure times
Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off between motion blur at long exposures and low SNR at short exposures, we train a CNN that estimates a sharp high-SNR image by aggregating a burst of noisy short-exposure frames, related by unknown motion. We further suggest learning the burst’s exposure times in an end-to-end manner, thus balancing the noise and blur across the frames. We demonstrate this method’s advantage over the traditional approach of deblurring a single image or denoising a fixed-exposure burst.
Spectral geometric matrix completion
Deep Matrix Factorization (DMF) is an emerging approach to the problem of reconstructing a matrix from a subset of its entries. Recent works have established that gradient descent applied to a DMF model induces an implicit regularization on the rank of the recovered matrix. Despite these promising theoretical results, empirical evaluation of vanilla DMF on real benchmarks exhibits poor reconstructions, which we attribute to the extremely low number of available samples. We propose an explicit spectral regularization scheme that is able to make DMF models competitive on real benchmarks, while still maintaining the implicit regularization induced by gradient descent, thus enjoying the best of both worlds.
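For reference, the vanilla DMF baseline the abstract starts from, a product of several factor matrices fitted to the observed entries by plain gradient descent from a small initialization, can be sketched as follows. The explicit spectral regularizer proposed in the paper is deliberately not included; the hidden width, learning rate, and function name are illustrative assumptions.

```python
import numpy as np


def dmf_complete(Y, mask, depth=3, lr=0.02, steps=3000, init_scale=0.1, seed=0):
    """Vanilla DMF: X = W_depth @ ... @ W_1 fitted to entries of Y where mask == 1.

    Full-batch gradient descent on 0.5 * ||mask * (X - Y)||^2 from a small random
    initialization (the regime associated with implicit low-rank bias). depth >= 2.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    d = max(n, m)  # hidden width, an illustrative choice
    shapes = [(d, m)] + [(d, d)] * (depth - 2) + [(n, d)]
    Ws = [init_scale * rng.standard_normal(s) for s in shapes]

    def forward():
        # prefix[i] = W_i @ ... @ W_1, with prefix[0] the identity
        prefix = [np.eye(m)]
        for W in Ws:
            prefix.append(W @ prefix[-1])
        return prefix

    losses = []
    for _ in range(steps):
        prefix = forward()
        R = mask * (prefix[-1] - Y)  # residual on observed entries only
        losses.append(0.5 * float(np.sum(R ** 2)))
        G = R  # at step i below: G = (Ws[-1] @ ... @ Ws[i+1]).T @ R
        grads = [None] * len(Ws)
        for i in range(len(Ws) - 1, -1, -1):
            grads[i] = G @ prefix[i].T
            G = Ws[i].T @ G
        for W, g in zip(Ws, grads):
            W -= lr * g
    return forward()[-1], losses
```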
Inverse design of quantum holograms in three-dimensional nonlinear photonic crystals
We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs.
Early-stage neural network hardware performance analysis
UNIQ: Uniform noise injection for non-uniform quantization of neural networks
We present a novel method for training a neural network amenable to inference in low-precision arithmetic with quantized weights and activations. The training is performed in full precision with random noise injection emulating quantization noise. To circumvent the need to simulate realistic quantization noise distributions, the weight distributions are uniformized by a non-linear transformation, and uniform noise is injected. This procedure emulates a non-uniform k-quantile quantizer at inference time, which adapts to the specific distribution of the quantized parameters. As a by-product of injecting noise into the weights, we find that activations can also be quantized to as low as 8 bits with only minor accuracy degradation. The method achieves state-of-the-art results for training low-precision networks on ImageNet. In particular, we observe no degradation in accuracy for MobileNet and ResNet-18/34/50 on ImageNet with weights quantized to as low as 4 bits. Our solution also achieves state-of-the-art accuracy in the low computational budget regime compared to similar models.
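The uniformize, quantize, de-uniformize idea can be made concrete with a short NumPy sketch. It assumes the non-linear transformation is the empirical CDF of the weights (the paper may use a different, e.g. parametric, CDF), and the helper names are hypothetical. Equal-width bins in the uniformized domain hold equal numbers of weights, which is what makes the inference-time quantizer a k-quantile one; at training time, uniform noise of one bin width stands in for the rounding step.

```python
import numpy as np


def uniformize(w):
    """Map weights into (0, 1) through their empirical CDF (assumed choice of transform)."""
    sorted_w = np.sort(w.ravel())
    n = sorted_w.size
    levels = (np.arange(n) + 0.5) / n
    return np.interp(w, sorted_w, levels), sorted_w, levels


def deuniformize(u, sorted_w, levels):
    """Inverse transform: the empirical quantile function."""
    return np.interp(u, levels, sorted_w)


def kquantile_quantize(w, k):
    """Inference-time k-quantile quantizer: snap each weight to one of k values."""
    u, sw, lv = uniformize(w)
    u_q = (np.floor(u * k).clip(0, k - 1) + 0.5) / k  # center of each equal-mass bin
    return deuniformize(u_q, sw, lv)


def noisy_weights(w, k, rng):
    """Training-time surrogate: uniform noise of one bin width in the uniformized domain."""
    u, sw, lv = uniformize(w)
    u_noisy = np.clip(u + rng.uniform(-0.5 / k, 0.5 / k, size=u.shape), 0.0, 1.0)
    return deuniformize(u_noisy, sw, lv)
```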
3D FLAT: Feasible Learned Acquisition Trajectories for Accelerated MRI
Magnetic Resonance Imaging (MRI) has long been considered to be among the gold standards of today’s diagnostic imaging. The most significant drawback of MRI is its long acquisition time, which prohibits its use in standard practice for some applications. Compressed sensing (CS) proposes to subsample the k-space (the Fourier domain dual to the physical space of spatial coordinates), leading to significantly accelerated acquisition. However, the benefit of compressed sensing has not been fully exploited; most of the sampling densities obtained through CS do not produce a trajectory that obeys the stringent constraints imposed by the MRI machine in practice. Inspired by the recent success of deep learning-based approaches for image reconstruction and by ideas from computational imaging on learning-based design of imaging systems, we introduce 3D FLAT, a novel protocol for data-driven design of 3D non-Cartesian accelerated trajectories in MRI. Our proposal leverages the entire 3D k-space to simultaneously learn a physically feasible acquisition trajectory together with a reconstruction method. Experimental results, performed as a proof-of-concept, suggest that 3D FLAT achieves higher image quality for a given readout time compared to standard trajectories such as radial, stack-of-stars, or 2D learned trajectories (trajectories that evolve only in the 2D plane while fully sampling along the third dimension). Furthermore, we demonstrate evidence supporting the significant benefit of performing MRI acquisitions using non-Cartesian 3D trajectories over 2D non-Cartesian trajectories acquired slice-wise.
Towards learned optimal q-space sampling in diffusion MRI
Fiber tractography is an important tool of computational neuroscience that enables reconstructing the spatial connectivity and organization of the white matter of the brain. Fiber tractography takes advantage of diffusion Magnetic Resonance Imaging (dMRI), which allows measuring the apparent diffusivity of cerebral water along different spatial directions. Unfortunately, collecting such data comes at the price of reduced spatial resolution and substantially elevated acquisition times, which limits the clinical applicability of dMRI. This problem has thus far been addressed using two principal strategies. Most efforts have been directed towards improving the quality of signal estimation for an arbitrary but fixed sampling scheme (defined through the choice of diffusion encoding gradients). On the other hand, optimization over the sampling scheme has also proven effective. Inspired by these results, the present work consolidates the two strategies into a unified estimation framework, in which the optimization is carried out with respect to both the estimation model and the sampling design concurrently. The proposed solution offers substantial improvements in the quality of signal estimation as well as in the accuracy of the ensuing analysis by means of fiber tractography. While proving the optimality of the learned estimation models would probably require more extensive evaluation, we nevertheless claim that the learned sampling schemes can be of immediate use, offering a way to improve dMRI analysis without the need to deploy the neural network used for their estimation. We present a comprehensive comparative analysis based on the Human Connectome Project data.
Self-supervised learning for large-scale unsupervised image clustering
Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument for representation learning in computer vision. However, those methods have not been evaluated in a fully unsupervised setting.
In this paper, we propose a simple scheme for unsupervised classification based on self-supervised representations. We evaluate the proposed approach with several recent self-supervised methods showing that it achieves competitive results for ImageNet classification (39% accuracy on ImageNet with 1000 clusters and 46% with overclustering). We suggest adding the unsupervised evaluation to a set of standard benchmarks for self-supervised learning.
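Accuracy in a fully unsupervised setting requires mapping clusters to classes before comparing with ground truth. A minimal sketch of two common protocols: a one-to-one Hungarian matching for the equal-count case, and a majority-class mapping that is one usual way to score overclustering. Which protocol the paper uses is not stated here, so both functions (hypothetical names) are illustrations only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cluster_accuracy(y_true, y_pred):
    """One-to-one cluster-to-class mapping maximizing agreement (Hungarian algorithm)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)  # counts[cluster, class]
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


def overcluster_accuracy(y_true, y_pred):
    """Many-to-one mapping: each cluster is labeled with its majority class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = sum(np.bincount(y_true[y_pred == c]).max() for c in np.unique(y_pred))
    return correct / y_true.size
```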
Generating adversarial surfaces via band-limited perturbations
Adversarial attacks have demonstrated remarkable efficacy in altering the output of a learning model by applying a minimal perturbation to the input data. While increasing attention has been placed on the image domain, the study of adversarial perturbations for geometric data has been notably lagging behind. In this paper, we show that effective adversarial attacks can be concocted for surfaces embedded in 3D, under weak smoothness assumptions on the perceptibility of the attack. We address the case of deformable 3D shapes in particular, and introduce a general model that is not tailored to any specific surface representation, nor does it assume access to a parametric description of the 3D object. In this context, we consider targeted and untargeted variants of the attack, demonstrating compelling results in either case. We further show how discovering adversarial examples, and then using them for adversarial training, leads to an increase in both robustness and accuracy. Our findings are confirmed empirically over multiple datasets spanning different semantic classes and deformations.
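One way to read "band-limited perturbation" is a per-vertex displacement restricted to the span of the lowest-frequency Laplacian eigenvectors of the surface. A minimal sketch under that assumption, using the combinatorial graph Laplacian of the mesh connectivity as a stand-in for whatever surface Laplacian the paper uses; function names are hypothetical. In an attack, `coeffs` would be the optimization variable driven by the classifier's loss.

```python
import numpy as np


def graph_laplacian(n_vertices, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph (stand-in for a mesh Laplacian)."""
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def band_limited_perturbation(L, coeffs):
    """Displacement spanned by the k = len(coeffs) lowest-frequency Laplacian eigenvectors.

    coeffs: (k, 3) array, one coefficient per eigenvector and spatial axis.
    Returns an (n_vertices, 3) per-vertex displacement.
    """
    _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    phi = eigvecs[:, : coeffs.shape[0]]  # low-frequency basis, shape (n, k)
    return phi @ coeffs
```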
Self-Supervised Object Detection and Retrieval Using Unlabeled Videos
Unlabeled video in the wild presents a valuable, yet so far unharnessed, source of information for learning vision tasks. We present the first attempt at fully self-supervised learning of object detection from subtitled videos without any manual object annotation. To this end, we use the How2 multi-modal collection of instructional videos with English subtitles. We pose the problem as learning from weakly- and noisily-labeled data, and propose a novel training model that can cope with high noise levels and still train a classifier to localize the object of interest in the video frames, without any manual labeling involved. We evaluate our approach on a set of 11 manually annotated objects in over 5000 frames and compare it to an existing weakly-supervised approach as a baseline. Benchmark data and code will be released upon acceptance of the paper.
Data-driven prediction of embryo implantation probability using IVF time-lapse imaging
The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), IVF has an average success rate of a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the patient, a process typically conducted manually and without any universally accepted and standardized criteria. In this paper, we describe a novel data-driven system trained to directly predict embryo implantation probability from embryogenesis time-lapse imaging videos. Using retrospectively collected videos from 272 embryos, we demonstrate that, when compared to an external panel of embryologists, our algorithm yields a 12% increase in positive predictive value and a 29% increase in negative predictive value.
Horizontal flows and manifold stochastics in geometric deep learning
We introduce two constructions in geometric deep learning for 1) transporting orientation-dependent convolutional filters over a manifold in a continuous way and thereby defining a convolution operator that naturally incorporates the rotational effect of holonomy; and 2) allowing efficient evaluation of manifold convolution layers by sampling manifold-valued random variables that center around a weighted Brownian motion maximum likelihood mean. Both methods are inspired by stochastics on manifolds and geometric statistics, and provide examples of how stochastic methods, here horizontal frame bundle flows and non-linear bridge sampling schemes, can be used in geometric deep learning. We outline the theoretical foundation of the two methods, discuss their relation to Euclidean deep networks and existing methodology in geometric deep learning, and establish important properties of the proposed constructions.
Over-parameterized models for vector fields
Vector fields arise in a variety of measurement and visualization techniques, such as fluid flow imaging, motion estimation, deformation measurement, and color imaging, leading to a better understanding of physical phenomena. Recent progress in vector field imaging technologies has emphasized the need for efficient noise removal and reconstruction algorithms. A key ingredient in the success of extracting signals from noisy measurements is prior information, which can often be represented as a parameterized model. In this work, we extend the over-parameterization variational framework in order to perform model-based reconstruction of vector fields. The over-parameterization methodology combines local modeling of the data with global model parameter regularization. By considering the vector field as a linear combination of basis vector fields with appropriate scale and rotation coefficients, the denoising problem reduces to a simpler form of coefficient recovery. We introduce two versions of the over-parameterization framework: a total-variation-based method and a sparsity-based method relying on the co-sparse analysis model. We demonstrate the efficiency of the proposed frameworks for two- and three-dimensional vector fields with linear and quadratic over-parameterization models.
Intrinsic multi-scale evaluation of generative models
Generative models are often used to sample high-dimensional data points from a manifold with small intrinsic dimension. Existing techniques for comparing generative models focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop the first, to our knowledge, intrinsic and multi-scale method for characterizing and comparing underlying data manifolds, based on comparing all data moments by lower-bounding the spectral notion of the Gromov-Wasserstein distance between manifolds. In a thorough experimental study, we demonstrate that our method effectively evaluates the quality of generative models; further, we showcase its efficacy in discerning the disentanglement process in neural networks.
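The spectral lower bound on the Gromov-Wasserstein distance compares heat kernel traces, hkt(t) = Σᵢ exp(−tλᵢ), of the two manifolds: d_GW ≥ sup_t exp(−2(t + 1/t)) |hkt_X(t) − hkt_Y(t)|. A small-scale sketch under assumptions: each point cloud is approximated by a symmetrized kNN graph with a normalized Laplacian, traces come from an exact eigendecomposition (practical only for a few thousand points; large-scale use needs a stochastic trace estimator), and dividing by the number of points is a normalization chosen here so clouds of different sizes are comparable. Function names are hypothetical.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Normalized Laplacian I - D^-1/2 A D^-1/2 of a symmetrized kNN graph on points X (n, dim)."""
    n = len(X)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)        # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    A = np.maximum(A, A.T)              # symmetrize
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def heat_kernel_trace(L, ts):
    """hkt(t) = sum_i exp(-t * lambda_i) for each t in ts, via exact eigenvalues."""
    lam = np.linalg.eigvalsh(L)
    return np.exp(-np.outer(ts, lam)).sum(axis=1)


def hkt_discrepancy(X, Y, k=5, ts=None):
    """sup over a log-spaced t grid of exp(-2(t + 1/t)) * |hkt_X(t) - hkt_Y(t)| (size-normalized)."""
    if ts is None:
        ts = np.logspace(-1, 1, 50)
    hx = heat_kernel_trace(knn_laplacian(X, k), ts) / len(X)
    hy = heat_kernel_trace(knn_laplacian(Y, k), ts) / len(Y)
    return float(np.max(np.exp(-2.0 * (ts + 1.0 / ts)) * np.abs(hx - hy)))
```

Because the graph depends only on pairwise distances, a rigidly moved copy of a cloud scores (numerically) zero, which is the intrinsic behavior the abstract refers to.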
HCM: Hardware-aware complexity metric for neural network architectures
Convolutional Neural Networks (CNNs) have become common in many fields, including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, the task of achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that need to be balanced to achieve an efficient solution. Quantization techniques, when applied to the network parameters, lead to a reduction of power and area and may also change the ratio between communication and computation. As a result, some algorithmic solutions may suffer from a lack of memory bandwidth or computational resources and fail to achieve the expected performance due to hardware constraints. Thus, the system designer and the micro-architect need to understand, at early development stages, the impact of their high-level decisions (e.g., the architecture of the CNN and the number of bits used to represent its parameters) on the final product (e.g., the expected power saving, area, and accuracy). Unfortunately, existing tools fall short of supporting such decisions. This paper introduces a hardware-aware complexity metric that aims to assist the system designer of neural network architectures throughout the entire project lifetime (especially at its early stages) by predicting the impact of architectural and micro-architectural decisions on the final product. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices such as real-time embedded systems, and help avoid design mistakes at early stages.
Colored noise injection for training adversarially robust neural networks
Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work, we extend the idea of adding independent Gaussian noise to weights and activations during adversarial training (PNI) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on the CIFAR-10 dataset. In addition, we provide an extensive ablation study of the proposed method justifying the chosen configurations.
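The difference between independent (white) and colored noise is the covariance of the injected perturbation. A minimal sketch, assuming a diagonal-plus-low-rank covariance diag(σ²) + UUᵀ, which is one convenient way to parameterize correlated noise and not necessarily the paper's exact parameterization; function names are hypothetical. Sampling uses the factor directly, so no d×d matrix is ever formed.

```python
import numpy as np


def sample_colored_noise(diag_std, factor, n_samples, rng):
    """Zero-mean Gaussian noise with covariance diag(diag_std**2) + factor @ factor.T.

    diag_std: (d,) standard deviations of the independent part.
    factor:   (d, r) low-rank factor producing the correlated (colored) part.
    """
    d, r = factor.shape
    white = rng.standard_normal((n_samples, d)) * diag_std
    colored = rng.standard_normal((n_samples, r)) @ factor.T
    return white + colored


def inject(weights, diag_std, factor, rng):
    """Perturb a flattened weight vector with one colored-noise draw (training time only)."""
    return weights + sample_colored_noise(diag_std, factor, 1, rng)[0]
```

Setting `factor` to zeros recovers the independent-noise case, so the white-noise baseline is a special case of this parameterization.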
Do we need depth in state-of-the-art face authentication?
Some face recognition methods are designed to utilize geometric features extracted from depth sensors to handle the challenges of single-image based recognition technologies. However, calculating the geometrical data is an expensive and challenging process. Here, we introduce a novel method that learns distinctive geometric features from stereo camera systems without the need to explicitly compute the facial surface or depth map. The raw face stereo images along with coordinate maps allow a CNN to learn geometric features. This way, we keep the simplicity and cost-efficiency of recognition from a single image, while enjoying the benefits of geometric data without explicitly reconstructing it. We demonstrate that the suggested method outperforms both existing single-image and explicit depth-based methods on large-scale benchmarks. We also provide an ablation study to show that the suggested method uses the coordinate maps to encode more informative features.
Robust Quantization: One Model to Rule Them All
Neural network quantization methods often involve simulating the quantization process during training. This makes the trained model highly dependent on the precise way quantization is performed. Since low-precision accelerators differ in their quantization policies and their supported mix of data types, a model trained for one accelerator may not be suitable for another. To address this issue, we propose KURE, a method that provides the model with intrinsic robustness against a broad range of quantization implementations. We show that KURE yields a generic model that may be deployed on numerous inference accelerators without a significant loss in accuracy.
Deep matrix factorization with spectral geometric regularization
We address the problem of reconstructing a matrix from a subset of its entries. Current methods, branded as geometric matrix completion, augment classical rank regularization techniques by incorporating geometric information into the solution. This information is usually provided as graphs encoding relations between rows/columns. In this work, we propose a simple spectral approach for solving the matrix completion problem, via the framework of functional maps. We introduce the zoomout loss, a multiresolution spectral geometric loss inspired by recent advances in shape correspondence, whose minimization leads to state-of-the-art results on various recommender systems datasets. Surprisingly, for some datasets, we were able to achieve comparable results even without incorporating geometric information. This puts into question both the quality of such information and current methods’ ability to use it in a meaningful and efficient way.
Code is available either as a Google Colab notebook or via https://github.com/amitboy/SGMC
Loss aware post-training quantization
Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. On the other hand, we show that with more aggressive quantization, the loss landscape becomes highly non-separable with sharp minima, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. A reference implementation accompanies the paper.
Smoothed inference for adversarially-trained models
Deep neural networks are known to be vulnerable to inputs with maliciously constructed adversarial perturbations aimed at forcing misclassification. We study randomized smoothing as a way to both improve performance on unperturbed data and increase robustness to adversarial attacks. Moreover, we extend the method proposed by arXiv:1811.09310 by adding low-rank multivariate noise, which we then use as a base model for smoothing. The proposed method achieves 58.5% top-1 accuracy on CIFAR-10 under PGD attack and outperforms previous works by 4%. In addition, we consider a family of attacks, which were previously used for training purposes in the certified robustness scheme. We demonstrate that the proposed attacks are more effective than PGD against both smoothed and non-smoothed models. Since our method is based on sampling, it lends itself well to trading off model inference complexity against performance. A reference implementation of the proposed techniques is provided.
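Smoothed inference replaces a single forward pass with a vote over noisy copies of the input, and the number of copies is the knob behind the complexity/performance trade-off mentioned above. A minimal sketch with isotropic Gaussian input noise and a generic classifier callable; the low-rank noise of the base model described in the abstract is not reproduced, and `smoothed_predict` is a hypothetical name.

```python
import numpy as np


def smoothed_predict(classifier, x, sigma, n_samples, rng, n_classes):
    """Majority vote of `classifier` over Gaussian-perturbed copies of input x.

    classifier: maps a batch of shape (n_samples, *x.shape) to integer labels (n_samples,).
    n_samples controls the cost/accuracy trade-off of inference.
    """
    noise = rng.standard_normal((n_samples,) + x.shape) * sigma
    labels = classifier(x[None] + noise)
    votes = np.bincount(labels, minlength=n_classes)
    return int(np.argmax(votes)), votes
```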
MetAdapt: Meta-learned task-adaptive architecture for few-shot classification
Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter at test time. This is also referred to as meta-learning. So far, meta-learning FSL methods have focused on optimizing parameters of pre-defined network architectures, in order to make them easily adaptable to novel tasks. Moreover, it was observed that, in general, larger architectures perform better than smaller ones up to a certain saturation point (beyond which they may even degrade due to over-fitting). However, little attention has been given to explicitly optimizing the architectures for FSL, or to adapting the architecture at test time to particular novel tasks. In this work, we propose to employ tools borrowed from the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting. Additionally, to make the architecture task-adaptive, we propose the concept of ‘MetAdapt Controller’ modules. These modules are added to the model and are meta-trained to predict the optimal network connections for a given novel task. Using the proposed approach, we observe state-of-the-art results on two popular few-shot benchmarks: miniImageNet and FC100.
Localization with limited annotation for chest X-rays
Localization of an object within an image is a common task in medical imaging. Learning to localize or detect objects typically requires the collection of data which has been labelled with bounding boxes or similar annotations, which can be very time consuming and expensive. A technique which could perform such learning with much less annotation would, therefore, be quite valuable. We present such a technique for localization with limited annotation, in which the number of images with bounding boxes can be a small fraction of the total dataset (e.g. less than 1%); all other images only possess a whole image label and no bounding box. We propose a novel loss function for tackling this problem; the loss is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning and is numerically well-posed. Furthermore, we propose a new architecture which accounts for both patch dependence and shift-invariance, through the inclusion of CRF layers and anti-aliasing filters, respectively. We apply our technique to the localization of thoracic diseases in chest X-ray images and demonstrate state-of-the-art localization performance on the ChestX-ray14 dataset.
Learning beamforming in ultrasound imaging
RepMet: Representative-based metric learning for classification and one-shot object detection
Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML, featuring joint learning of the embedding space and of the data distribution of the training categories in a single training process. Our method improves upon leading algorithms for DML-based object classification. Furthermore, it opens the door to a new task in computer vision, few-shot object detection, since the proposed DML architecture can be naturally embedded as the classification head of any standard object detector. In numerous experiments, we achieve state-of-the-art classification results on a variety of fine-grained datasets and offer the community a benchmark on the few-shot detection task, performed on the ImageNet-LOC dataset.
Self-supervised learning of dense shape correspondence
We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in the pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it with a purely geometric criterion. The resulting learning model is class-agnostic and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize in the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
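The geometric criterion can be illustrated by the metric distortion of a correspondence: a map between near-isometric shapes should preserve pairwise (e.g. geodesic) distances. A minimal sketch, assuming precomputed distance matrices; the squared-difference form and the soft, row-stochastic relaxation are illustrative choices rather than the paper's exact loss, and the function names are hypothetical.

```python
import numpy as np


def metric_distortion(D_x, D_y, corr):
    """Mean squared distortion of a point-to-point map.

    D_x, D_y: pairwise distance matrices of shapes X and Y.
    corr: corr[i] is the index on Y matched to vertex i on X.
    """
    return float(np.mean((D_x - D_y[np.ix_(corr, corr)]) ** 2))


def soft_metric_distortion(D_x, D_y, P):
    """Differentiable variant for a soft (row-stochastic) correspondence matrix P (n_x, n_y)."""
    return float(np.mean((D_x - P @ D_y @ P.T) ** 2))
```

A one-hot `P` built from `corr` makes both functions agree, which is why the soft version is a natural training surrogate for the discrete criterion.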
LaSO: Label-Set Operations networks for multi-label few-shot learning
Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the (previously unaddressed) multi-label few-shot classification scenario. We propose to combine pairs of given examples in feature space, so that the resulting synthesized feature vectors will correspond to examples whose label sets are obtained through certain set operations on the label sets of the corresponding input pairs. Thus, our method is capable of producing a sample containing the intersection, union, or set-difference of labels present in two input samples. As we show, these set operations generalize to labels unseen during training. This enables performing augmentation on examples of novel categories, thus facilitating multi-label few-shot classifier learning. We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using classification and retrieval metrics) and in the context of performing data augmentation for multi-label few-shot learning. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines.
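The label-set arithmetic that defines the target for each synthesized feature vector can be written directly on multi-hot label vectors. A minimal sketch of those targets only; the feature-space networks that would be trained to produce matching feature vectors are not shown, and `label_set_targets` is a hypothetical name.

```python
import numpy as np


def label_set_targets(labels_a, labels_b):
    """Multi-hot targets for the three set operations on a pair of examples."""
    a = np.asarray(labels_a, dtype=bool)
    b = np.asarray(labels_b, dtype=bool)
    return {
        "intersection": a & b,
        "union": a | b,
        "difference": a & ~b,  # labels present in a but not in b
    }
```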
Intel RealSense SR300 Coded light depth Camera
Intel RealSense SR300 is a depth camera capable of providing a VGA-size depth map at 60 fps with 0.125 mm depth resolution. In addition, it outputs an infrared VGA-resolution image and a 1080p color texture image at 30 fps.
The SR300 form factor enables it to be integrated into small consumer products and used as a front-facing camera in laptops and Ultrabooks. The SR300 depth camera is based on a coded-light technology, in which triangulation between projected patterns and images captured by a dedicated sensor is used to produce the depth map. Each projected line is coded by a special temporal optical code, which enables dense depth map reconstruction from its reflection. The solid mechanical assembly of the camera allows it to stay calibrated through temperature and pressure changes, drops, and hits. In addition, active dynamic control maintains a calibrated depth output. An extended API, LibRS, released with the camera allows developers to integrate the camera into various applications. Algorithms for 3D scanning, facial analysis, hand gesture recognition, and tracking are within reach for applications using the SR300. In this paper, we describe the underlying technology, hardware, and algorithms of the SR300, as well as its calibration procedure, and outline some use cases. We believe that this paper provides a full case study of a mass-produced depth sensing product and technology.
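To make the idea of a temporal code concrete, the following minimal Python sketch uses one common choice for coded-light systems, a binary-reflected Gray code, in which each projector column is identified by the sequence of bits it receives over successive patterns. The abstract does not say which code the SR300 uses, so the Gray code, the function names, and the column and bit counts are assumptions for illustration only; triangulation from the decoded column to depth is not shown.

```python
# Assumption: binary-reflected Gray code as the temporal code (not necessarily the SR300's).

def gray_encode(index: int) -> int:
    """Gray code of a projector column index; neighbouring columns differ in one bit."""
    return index ^ (index >> 1)


def gray_decode(code: int) -> int:
    """Invert the Gray code by XOR-ing together all right shifts of the code."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


def temporal_patterns(num_columns: int, num_bits: int) -> list[list[int]]:
    """Pattern k holds bit k (most significant first) of every column's Gray code."""
    return [
        [(gray_encode(c) >> (num_bits - 1 - k)) & 1 for c in range(num_columns)]
        for k in range(num_bits)
    ]


def decode_pixel(observed_bits: list[int]) -> int:
    """Recover the projector column from the bits one camera pixel saw over time."""
    code = 0
    for bit in observed_bits:
        code = (code << 1) | bit
    return gray_decode(code)


if __name__ == "__main__":
    patterns = temporal_patterns(num_columns=8, num_bits=3)
    seen = [patterns[k][5] for k in range(3)]  # bits observed by a pixel lit by column 5
    print(decode_pixel(seen))
```

A Gray code is often preferred over plain binary because a decoding error at a stripe boundary then shifts the column estimate by at most one.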
Towards learning of filter-level heterogeneous compression of convolutional neural networks
Recently, deep learning has become a de facto standard in machine learning, with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization, which relies on low-precision arithmetic representations for the weights and the activations. Another popular method is pruning the number of filters in each layer. While mainstream deep learning methods train the neural network weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter amenable to training as well. In this paper, we formulate optimal arithmetic bit length allocation and neural network pruning as a NAS problem, searching for configurations that satisfy a computational complexity budget while maximizing accuracy. We use a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al. (2019a). We show, by grid search, that heterogeneous quantized networks suffer from high variance, which renders the benefit of the search questionable. For pruning, improvement over homogeneous cases is possible, but it is still challenging to find such configurations with the proposed method. The code is publicly available at https://github.com/yochaiz/Slimmable and https://github.com/yochaiz/darts-UNIQ.
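The continuous relaxation of Liu et al. (2019a) replaces a discrete choice among candidate operations with a softmax-weighted mixture, which makes the choice itself differentiable. The NumPy sketch below illustrates this for per-layer bit-width selection under stated assumptions: the toy uniform quantizer on [-1, 1], the candidate bit widths, and the per-candidate costs are hypothetical, and the gradient updates of the architecture weights `alphas` (which would require an autodiff framework) are omitted.

```python
import numpy as np


def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Toy uniform quantizer on [-1, 1] with 2**bits levels (hypothetical stand-in)."""
    levels = 2 ** bits - 1
    u = (np.clip(x, -1.0, 1.0) + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
    return np.round(u * levels) / levels * 2.0 - 1.0


def softmax(a: np.ndarray) -> np.ndarray:
    e = np.exp(a - np.max(a))  # subtract the max for numerical stability
    return e / e.sum()


def mixed_op(x: np.ndarray, alphas: np.ndarray, bit_choices: list[int]) -> np.ndarray:
    """Relaxed layer: softmax(alphas)-weighted sum over the candidate quantizers."""
    weights = softmax(alphas)
    return sum(w * quantize(x, b) for w, b in zip(weights, bit_choices))


def expected_cost(alphas: np.ndarray, costs: list[float]) -> float:
    """Differentiable surrogate for the complexity budget: softmax-weighted cost."""
    return float(softmax(alphas) @ np.asarray(costs, dtype=float))


def derive(alphas: np.ndarray, bit_choices: list[int]) -> int:
    """After the search, keep the candidate with the largest architecture weight."""
    return bit_choices[int(np.argmax(alphas))]
```

During search, a loss such as task loss plus a penalty on `expected_cost` would be minimized over both the network weights and `alphas`; the discrete configuration is read off with `derive` at the end.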
Joint learning of Cartesian undersampling and reconstruction for accelerated MRI
Magnetic Resonance Imaging (MRI) is considered today the gold-standard modality for soft tissues. Its long acquisition times, however, make it prone to motion artifacts and contribute to the relatively high cost of the examination. Over the years, multiple studies have concentrated on designing reduced measurement schemes and image reconstruction schemes for MRI; however, these problems have so far been addressed separately. Meanwhile, recent works in optical computational imaging have demonstrated growing success in the simultaneous learning-based design of acquisition and reconstruction schemes, showing significant improvement in reconstruction quality under a constrained time budget. Inspired by these successes, in this work we propose to learn accelerated MR acquisition schemes (in the form of Cartesian trajectories) jointly with the image reconstruction operator. To this end, we propose an algorithm for training the combined acquisition-reconstruction pipeline end-to-end in a differentiable way. We demonstrate the significance of using the learned Cartesian trajectories at different speed-up rates.
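For readers unfamiliar with Cartesian undersampling, the acquisition side can be simulated as keeping only whole lines of k-space. The NumPy sketch below assumes single-coil data, treats rows of `np.fft.fft2` output as phase-encode lines, and uses a fixed, hypothetical line selection with naive zero-filled reconstruction; in the work described above, both the kept lines and the reconstruction operator are learned instead.

```python
import numpy as np


def cartesian_mask(shape: tuple[int, int], kept_rows) -> np.ndarray:
    """Binary k-space mask that keeps whole phase-encode lines (rows)."""
    mask = np.zeros(shape, dtype=bool)
    mask[np.asarray(kept_rows), :] = True
    return mask


def undersample_and_zero_fill(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Simulate a Cartesian acquisition and the naive zero-filled reconstruction."""
    kspace = np.fft.fft2(image)           # fully sampled k-space (unshifted layout)
    measured = np.where(mask, kspace, 0)  # lines that were not acquired are zero
    return np.fft.ifft2(measured)         # complex-valued zero-filled image


if __name__ == "__main__":
    n = 64
    image = np.random.default_rng(0).standard_normal((n, n))
    # Hypothetical fixed trajectory: low-frequency lines plus every 4th line.
    low = np.flatnonzero(np.abs(np.fft.fftfreq(n)) <= 4 / n)
    rows = np.union1d(low, np.arange(0, n, 4))
    recon = undersample_and_zero_fill(image, cartesian_mask((n, n), rows))
    print(f"acceleration ~{n / len(rows):.1f}x, "
          f"error {np.linalg.norm(recon.real - image) / np.linalg.norm(image):.2f}")
```

The acceleration factor is simply the total number of lines divided by the number acquired, which is what the "speed-up rate" controls.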
Feature map transform coding for energy-efficient CNN inference
Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their relatively high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory bandwidth consumed by storing intermediate activation results. Our method exploits the high correlations between feature maps and between adjacent pixels, and allows halving the data transfer volumes to the main memory without re-training. We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet18 with our approach reduces the memory energy footprint by around 40% compared to a quantized network, with negligible impact on accuracy. A reference implementation accompanies the paper.
Baby steps towards few-shot learning with multiple semantics
Learning from one or a few visual examples is a key human capability from early infancy, but it is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. In this paper, we focus on the role of additional semantics that can significantly facilitate few-shot visual learning. Building upon recent advances in few-shot learning with additional semantic information, we demonstrate that further improvements are possible using richer semantics and multiple semantic sources. Using these ideas, we offer the community a new result on the one-shot test of the popular miniImageNet benchmark, comparing favorably to the previous state-of-the-art results for both visual-only and visual plus semantics-based approaches. We also performed an ablation study investigating the components and design choices of our approach.
Correspondence-free region localization for partial shape similarity via Hamiltonian spectrum alignment
We consider the problem of localizing relevant subsets of non-rigid geometric shapes given only a partial 3D query as the input. Such problems arise in several challenging tasks in 3D vision and graphics, including partial shape similarity, retrieval, and non-rigid correspondence. We phrase the problem as one of alignment between short sequences of eigenvalues of basic differential operators, which are constructed upon a scalar function defined on the 3D surfaces. Our method therefore seeks a scalar function that achieves this alignment. Unlike existing approaches, we do not require solving for a correspondence between the query and the target, which greatly simplifies the optimization process; our core technique is also descriptor-free, as it is driven by the geometry of the two objects as encoded in their operator spectra. We further show that our spectral alignment algorithm provides a remarkably simple alternative to recent shape-from-spectrum reconstruction approaches. For both applications, we demonstrate improvement over the state of the art in terms of either accuracy or computational cost.
Self-supervised learning of inverse problem solvers in medical imaging
In the past few years, deep learning-based methods have demonstrated enormous success in solving inverse problems in medical imaging. In this work, we address the following question: given a set of measurements obtained from real imaging experiments, what is the best way to use a learnable model and the physics of the modality to solve the inverse problem and reconstruct the latent image? Standard supervised learning-based methods approach this problem by collecting datasets of known latent images and their corresponding measurements. However, these methods are often impractical due to the lack of appropriately sized training sets and, more generally, due to the inherent difficulty in measuring the “ground-truth” latent image. In light of this, we propose a self-supervised approach to training inverse models in medical imaging in the absence of aligned data. Our method only requires access to the measurements and the forward model at training time. We showcase its effectiveness on inverse problems arising in accelerated magnetic resonance imaging (MRI).
Beholder-GAN: Generation and beautification of facial images with conditioning on their beauty level
Beauty is in the eye of the beholder. This maxim, emphasizing the subjectivity of the perception of beauty, has enjoyed a wide consensus since ancient times. In the digital era, data-driven methods have been shown to be able to predict human-assigned beauty scores for facial images. In this work, we augment this ability and train a generative model that generates faces conditioned on a requested beauty score. In addition, we show how this trained generator can be used to beautify an input face image. By doing so, we achieve an unsupervised beautification model, in the sense that it relies on no ground truth target images.
DIMAL: Deep isometric manifold learning using sparse geodesic sampling
This paper explores a fully unsupervised deep learning approach for computing distance-preserving maps that generate low-dimensional embeddings for a certain class of manifolds. We use a Siamese configuration to train a neural network to solve the problem of least squares multidimensional scaling, generating maps that approximately preserve geodesic distances. By training with only a few landmarks, we show significantly improved local and nonlocal generalization of the isometric mapping compared to analogous non-parametric counterparts. Importantly, the combination of a deep-learning framework with a multidimensional scaling objective enables a numerical analysis of network architectures to aid in understanding their representation power. This provides a geometric perspective on the generalizability of deep learning.
Partial single- and multi-shape dense correspondence using functional maps
Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture mapping, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. In this chapter we describe recent techniques designed to tackle such problems. Specifically, we explain how the renowned functional maps framework can be extended to tackle the partial setting. We then present a further extension to the multi-part case, in which one tries to establish correspondence among a collection of shapes. Finally, we focus on improving the efficiency of the technique by disposing of its spatial ingredient, thus keeping the computation in the spectral domain. Extensive experimental results are provided along with the theoretical explanations to demonstrate the effectiveness of the described methods in these challenging scenarios.
The various multidimensional scaling models can be broadly classified into metric vs. non-metric, and strain (classical scaling) vs. stress (distance scaling) based MDS models. In metric MDS the goal is to maintain the distances in the embedding space as close as possible to the given dissimilarities, while in nonmetric MDS only the order relations between the dissimilarities are important. Strain-based MDS is an algebraic version of the problem that can be solved by eigenvalue decomposition. Stress-based MDS uses a geometric distortion criterion which results in a non-linear and non-convex optimization problem. Each of these models has its own merits and drawbacks, both numerically and application-wise. On top of these basic models, there exist numerous generalizations, including embedding into non-Euclidean domains, working with different stress models, working in different subspaces, and incorporating machine learning approaches to obtain faster, more accurate and more robust embeddings. This chapter reviews these models, with emphasis on their role in computer vision applications.
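The strain-based (classical) model mentioned above reduces to an eigendecomposition: with the centering matrix J = I - (1/n) 1 1^T and elementwise-squared distances D^(2), the matrix B = -1/2 J D^(2) J is a Gram matrix whose top eigenvectors, scaled by the square roots of their eigenvalues, give the embedding. A minimal NumPy sketch of this textbook procedure, assuming a dense symmetric distance matrix and clipping negative eigenvalues to zero:

```python
import numpy as np


def classical_mds(D: np.ndarray, dim: int) -> np.ndarray:
    """Strain-based (classical) MDS from an n x n matrix of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:dim]           # indices of the `dim` largest
    scale = np.sqrt(np.clip(evals[idx], 0.0, None))
    return evecs[:, idx] * scale                  # n x dim embedding coordinates
```

When the dissimilarities are exact Euclidean distances of points in `dim` dimensions, this recovers the points up to a rigid motion; for non-Euclidean dissimilarities (e.g., geodesic distances) it yields only an approximation, which is where the stress-based models come in.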
∆-encoder: an effective sample synthesis method for few-shot object recognition
Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted ∆-encoder, that learns to synthesize new samples for an unseen category just by seeing a few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns both to extract transferable intra-class deformations, or “deltas”, between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state of the art in one-shot object recognition and compares favorably in the few-shot case.
Functional maps representation on product manifolds
We consider the tasks of representing, analyzing and manipulating maps between shapes. We model maps as densities over the product manifold of the input shapes; these densities can be treated as scalar functions and therefore are manipulable using the language of signal processing on manifolds. Being a manifold itself, the product space endows the set of maps with a geometry of its own, which we exploit to define map operations in the spectral domain; we also derive relationships with other existing representations (soft maps and functional maps). To apply these ideas in practice, we discretize product manifolds and their Laplace-Beltrami operators, and we introduce localized spectral analysis of the product manifold as a novel tool for map processing. Our framework applies to maps defined between and across 2D and 3D shapes without requiring special adjustment, and it can be implemented efficiently with simple operations on sparse matrices.
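One reason product manifolds are convenient is that the Laplacian of a product space is the Kronecker sum of the factor Laplacians, L = L1 ⊗ I + I ⊗ L2, so its eigenvalues are all pairwise sums of the factor eigenvalues and its eigenfunctions are products φ_i(x) ψ_j(y). The SciPy sketch below checks this on two path graphs, which stand in for discretized shapes; it assumes plain graph Laplacians without mass matrices, whereas a mesh discretization of the Laplace-Beltrami operator would typically involve a generalized eigenproblem.

```python
import numpy as np
import scipy.sparse as sp


def path_laplacian(n: int) -> sp.csr_matrix:
    """Combinatorial Laplacian of a path graph with n >= 2 vertices (toy 'shape')."""
    off = -np.ones(n - 1)
    diag = np.r_[1.0, 2.0 * np.ones(n - 2), 1.0]
    return sp.diags([off, diag, off], [-1, 0, 1], format="csr")


def product_laplacian(L1, L2):
    """Laplacian of the product space as the sparse Kronecker sum L1 ⊗ I + I ⊗ L2."""
    n1, n2 = L1.shape[0], L2.shape[0]
    return sp.kron(L1, sp.identity(n2)) + sp.kron(sp.identity(n1), L2)


if __name__ == "__main__":
    L1, L2 = path_laplacian(4), path_laplacian(5)
    L = product_laplacian(L1, L2)  # 20 x 20, still sparse
    pair_sums = np.add.outer(np.linalg.eigvalsh(L1.toarray()),
                             np.linalg.eigvalsh(L2.toarray())).ravel()
    print(np.allclose(np.sort(np.linalg.eigvalsh(L.toarray())), np.sort(pair_sums)))
```

A map density f(x, y) on the product can then be expanded in the product eigenbasis, which is one way to carry out map operations in the spectral domain with sparse-matrix operations only.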
NICE: noise injection and clamping estimation for neural network quantization
Convolutional Neural Networks (CNN) are very popular in many fields including computer vision, speech recognition, and natural language processing, to name a few. Though deep learning leads to groundbreaking performance in these domains, the networks used are very demanding computationally and are far from real-time even on a GPU, which is not power-efficient and therefore does not suit low-power systems such as mobile devices. To overcome this challenge, several solutions have been proposed for quantizing the weights and activations of these networks, which accelerates the runtime significantly. Yet, this acceleration comes at the cost of a larger error. The NICE method proposed in this work trains quantized neural networks with noise injection and a learned clamping, which improve the accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 with weights and activations as low as 3 bits. We implement the proposed solution on an FPGA to demonstrate its applicability for low-power real-time applications.
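The two ingredients named in the abstract can be illustrated separately. The NumPy sketch below shows a generic clamp-then-round uniform quantizer and a training-time surrogate that replaces rounding with additive uniform noise of the same width as the quantization step; it is an assumption-laden illustration, not the paper's exact procedure, since which tensors receive noise, the schedule, and how the clamp value is learned by gradient descent are not specified here.

```python
import numpy as np


def clamp_quantize(x: np.ndarray, clamp: float, bits: int) -> np.ndarray:
    """Clamp to [0, clamp], then round onto 2**bits uniformly spaced levels."""
    step = clamp / (2 ** bits - 1)
    return np.round(np.clip(x, 0.0, clamp) / step) * step


def noisy_quantize(x: np.ndarray, clamp: float, bits: int,
                   rng: np.random.Generator) -> np.ndarray:
    """Training-time surrogate: clamp, then add U(-step/2, step/2) noise instead of rounding.

    The added noise mimics the rounding error while keeping the map smooth in x,
    so gradients can flow (here NumPy only computes the forward pass).
    """
    step = clamp / (2 ** bits - 1)
    noise = rng.uniform(-step / 2, step / 2, size=np.shape(x))
    return np.clip(x, 0.0, clamp) + noise
```

At inference time only `clamp_quantize` would be used; `clamp` plays the role of the per-layer clamping value that the method estimates during training.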
ForestHash: Semantic hashing with shallow random forests and tiny convolutional networks
Hash codes are efficient data representations for coping with ever-growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting '1' for the visited tree leaf and '0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, rendering the random forests approach to hashing challenging. To address this, we propose to first randomly group arriving classes at each tree split node into two groups, obtaining a significantly simplified two-class classification problem, which can be handled using a lightweight CNN weak learner. Such a random class grouping scheme enables code uniqueness by enforcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets, and performs at the level of other state-of-the-art image classification techniques while utilizing a more compact, efficient, and scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of lightweight CNNs, instead of simply going deeper.
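The simple starting scheme, in which each tree contributes a one-hot code of its visited leaf, can be written in a few lines. This plain-Python sketch covers only that baseline: the leaf indices are taken as given (hypothetical inputs standing in for tree traversal), and neither the CNN split functions nor the information-theoretic code aggregation are modeled.

```python
def tree_code(leaf_index: int, num_leaves: int) -> list[int]:
    """One tree's code: 1 for the visited leaf, 0 for every other leaf."""
    return [1 if i == leaf_index else 0 for i in range(num_leaves)]


def forest_hash(leaf_indices: list[int], leaves_per_tree: list[int]) -> list[int]:
    """Concatenate the per-tree one-hot codes into a single binary hash."""
    code: list[int] = []
    for leaf, n in zip(leaf_indices, leaves_per_tree):
        code.extend(tree_code(leaf, n))
    return code


def hamming(a: list[int], b: list[int]) -> int:
    """Number of differing bits; each tree where two samples diverge adds 2."""
    return sum(x != y for x, y in zip(a, b))
```

For example, with two shallow trees of 2 and 4 leaves, a sample reaching leaves 1 and 3 gets the hash `[0, 1, 0, 0, 0, 1]`. The Hamming distance between two such hashes only counts in how many trees the samples diverged, which is why the abstract's class grouping and aggregation steps are needed to make the codes semantically meaningful.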
Class-aware fully-convolutional Gaussian and Poisson denoising
We propose a fully-convolutional neural-network architecture for image denoising which is simple yet powerful. Its structure allows it to exploit the gradual nature of the denoising process, in which shallow layers handle local noise statistics while deeper layers recover edges and enhance textures. Our method advances the state of the art when trained for different noise levels and distributions (both Gaussian and Poisson). In addition, we show that making the denoiser class-aware by exploiting semantic class information boosts performance, enhances textures and reduces artifacts.
NetLSD: Hearing the shape of a graph
Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD): the first, to our knowledge, permutation- and size-invariant, scale-adaptive, and efficiently computable graph representation method that allows for straightforward comparisons of large graphs. NetLSD extracts a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel; thus, it hears the shape of a graph. Our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency.
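The heat-kernel variant of such a spectral signature can be sketched as the heat trace h(t) = trace(exp(-tL)) = Σ_j exp(-t λ_j), sampled at a set of time scales. The NumPy sketch below uses the normalized Laplacian and a full eigendecomposition; the choice of time samples, the size normalization, and the approximations needed for very large graphs are left out, so treat it as an illustration of the idea rather than the NetLSD implementation.

```python
import numpy as np


def normalized_laplacian(A: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2}; isolated vertices are left with L_ii = 1."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def heat_trace_signature(A: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Heat trace h(t) = sum_j exp(-t * lambda_j) at each t in `times`."""
    lam = np.linalg.eigvalsh(normalized_laplacian(A))
    return np.exp(-np.outer(times, lam)).sum(axis=1)
```

Because h(t) depends only on the Laplacian eigenvalues, relabeling the nodes leaves the signature unchanged, and small t emphasizes local structure while large t emphasizes global structure, which is the scale-adaptivity the abstract refers to. Dividing by the number of nodes is one simple (assumed) way to compare graphs of different sizes.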
High frame-rate cardiac ultrasound imaging with deep learning
Cardiac ultrasound imaging requires a high frame rate in order to capture rapid motion. This can be achieved by multi-line acquisition (MLA), where several narrow-focused received lines are obtained from each wide-focused transmitted line. This shortens the acquisition time at the expense of introducing block artifacts. In this paper, we propose a data-driven learning-based approach to improve the MLA image quality. We train an end-to-end convolutional neural network on pairs of real ultrasound cardiac data, acquired through MLA and the corresponding single-line acquisition (SLA). The network achieves a significant improvement in image quality for both 5- and 7-line MLA resulting in a decorrelation measure similar to that of SLA while having the frame rate of MLA.
High quality ultrasonic multi-line transmission through deep learning
Frame rate is a crucial consideration in cardiac ultrasound imaging and 3D sonography. Several methods have been proposed in the medical ultrasound literature aiming at accelerating image acquisition. In this paper, we consider one such method, called multi-line transmission (MLT), in which several evenly separated focused beams are transmitted simultaneously. While MLT reduces the acquisition time, it comes at the expense of a heavy loss of contrast due to the interactions between the beams (cross-talk artifact). We introduce a data-driven method to reduce the artifacts arising in MLT. To this end, we propose to train an end-to-end convolutional neural network consisting of correction layers followed by a constant apodization layer. The network is trained on pairs of raw data obtained through MLT and the corresponding single-line transmission (SLT) data. Experimental evaluation demonstrates significant improvement both in visual image quality and in objective measures such as contrast ratio and contrast-to-noise ratio, while preserving resolution, unlike traditional apodization-based methods. We show that the proposed method is able to generalize well across different patients and anatomies on real and phantom data.
SGR: Self-supervised spectral graph representation learning
Representing a graph as a vector is a challenging task; ideally, the representation should be easily computable, conducive to efficient comparisons among graphs, and tailored to the particular data and analytical task at hand. Unfortunately, a “one-size-fits-all” solution is unattainable, as different analytical tasks may require different attention to global or local graph features. We develop SGR, to our knowledge the first method for learning graph representations in a self-supervised manner. Grounded on spectral graph analysis, SGR seamlessly combines all the aforementioned desirable properties. In extensive experiments, we show how our approach works on large graph collections, facilitates self-supervised representation learning across a variety of application domains, and performs competitively with state-of-the-art methods without re-training.
DeepISP: Towards learning an end-to-end image processing pipeline
We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated dataset containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves state-of-the-art performance in the objective evaluation of PSNR on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality compared to the manufacturer ISP, in both a subjective human assessment and when rated by a deep model trained for assessing image quality.
Depth estimation from a single image using deep learned phase coded mask
Depth estimation from a single image is a well-known challenge in computer vision. With the advent of deep learning, several approaches for monocular depth estimation have been proposed, all of which have inherent limitations due to the scarce depth cues that exist in a single image. Moreover, these methods are very demanding computationally, which makes them inadequate for systems with limited processing power. In this paper, a phase-coded aperture camera for depth estimation is proposed. The camera is equipped with an optical phase mask that provides unambiguous depth-related color characteristics for the captured image. These are used for estimating the scene depth map using a fully convolutional neural network. The phase-coded aperture structure is learned jointly with the network weights using backpropagation. The strong depth cues (encoded in the image by the phase mask, designed together with the network weights) allow a much simpler neural network architecture for faster and more accurate depth estimation. Performance achieved on simulated images as well as on a real optical setup is superior to the state-of-the-art monocular depth estimation methods (both with respect to the depth accuracy and required processing power), and is competitive with more complex and expensive depth estimation methods such as light-field cameras.
Passive electric impedance tomography
We introduce an electric impedance tomography modality without any active current injection. By loading the probe electrodes with a time-varying network of impedances, the proposed technique exploits electrical fields existing in the medium due to biological activity or EM interference from the environment or an implantable device. A phantom validation of the technique is presented.
Printable anisotropic phantom for EEG with distributed current sources
Presented is a phantom mimicking the electromagnetic properties of the human head. The fabrication is based on additive manufacturing (3D printing) technology combined with electrically conductive gel. The novel key features of the phantom are the controllable anisotropic electrical conductivity of the skull and the densely packed, actively multiplexed monopolar current sources, which permit interpolation of the measured gain function to any dipolar current source position and orientation within the head. The phantom was tested in a realistic environment, successfully simulating possible signals from neural activations situated at any depth within the brain, as well as EMI and motion artifacts. The proposed design can be readily replicated in any lab with access to a standard 100-micron-precision 3D printer. The meshes of the phantom are available from the corresponding author.
VibroEEG: Improved EEG source reconstruction by combined acoustic-electric imaging
Electroencephalography (EEG) is a modality for recording electrical neural activity with high temporal but low spatial resolution. Here we propose a novel technique, which we call vibroEEG, that significantly improves the source localization accuracy of EEG. Our method combines electric potential acquisition with acoustic excitation of the vibrational modes of the electrically active cerebral cortex, which periodically displace the sources of low-frequency neural electrical activity. Sources residing at the maxima of the induced modes will be maximally weighted in the corresponding spectral components of the broadband signals measured on the noninvasive electrodes. In vibroEEG, for the first time, the rich internal geometry of the cerebral cortex can be utilized to separate sources of neural activity that lie close to each other in the sense of the Euclidean metric. When the modes are excited locally using phased arrays, neural activity can essentially be probed at any cortical location. When a single transducer is used to induce the excitations, the EEG gain matrix is still enriched with numerous independent gain vectors, increasing its rank. We show theoretically and in numerical simulations that in both cases the source localization accuracy improves substantially.
Streaming architectures for large-scale quantized neural networks on an FPGA-based dataflow platform
Deep neural networks (DNNs) are used by different applications that are executed on a range of computer architectures, from IoT devices to supercomputers. These networks have a huge memory footprint, as well as heavy computational and communication needs. To ease the pressure on resources, research indicates that in many cases a low-precision representation (1-2 bits per parameter) of weights and other parameters can achieve similar accuracy while requiring fewer resources. Using quantized values enables the use of FPGAs to run NNs, since FPGAs are well suited to these primitives; e.g., FPGAs provide efficient support for bitwise operations and can work with arbitrary-precision representations of numbers. This paper presents a new streaming architecture for running QNNs on FPGAs. The proposed architecture scales out better than alternatives, allowing us to take advantage of systems with multiple FPGAs. We also include support for skip connections, which are used in state-of-the-art NNs, and show that our architecture allows adding those connections almost for free. All this allowed us to implement an 18-layer ResNet for 224×224 image classification, achieving 57.5% top-1 accuracy. In addition, we implemented a full-sized quantized AlexNet. In contrast to previous works, we use 2-bit activations instead of 1-bit ones, which improves AlexNet’s top-1 accuracy from 41.8% to 51.03% for ImageNet classification. Both AlexNet and ResNet can handle 1000-class real-time classification on an FPGA. Our implementation of ResNet-18 consumes 5× less power and is 4× slower for ImageNet, when compared to the same NN on the latest Nvidia GPUs. Smaller NNs that fit on a single FPGA run faster than on GPUs on small (32×32) inputs, while consuming up to 20× less energy and power.
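The reason bitwise hardware suits low-precision networks can be seen with the standard XNOR/popcount identity for 1-bit values: if two ±1 vectors of length n disagree in m positions, their dot product is (n - m) - m = n - 2m, and m is just the popcount of the XOR of their bit encodings. The plain-Python sketch below illustrates that identity only; the bit encoding is an assumption, and the 2-bit activations used in the paper would need an extension (e.g., summing over bit planes) that is not shown.

```python
def pack(signs: list[int]) -> int:
    """Encode a ±1 vector as an integer bitmask: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed ±1 vectors of length n via XOR and popcount."""
    mismatches = bin(a_bits ^ b_bits).count("1")  # positions where signs differ
    return n - 2 * mismatches                     # matches minus mismatches


if __name__ == "__main__":
    a, b = [1, -1, 1], [1, 1, -1]
    print(binary_dot(pack(a), pack(b), len(a)))
```

On an FPGA, the XOR and popcount map onto cheap logic, so a multiply-accumulate over many weights collapses into a few bit operations.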
Tradeoffs between convergence speed and reconstruction accuracy in inverse problems
Solving inverse problems with iterative algorithms is popular, especially for large data. Due to time constraints, the number of possible iterations is usually limited, potentially affecting the achievable accuracy. Given an error one is willing to tolerate, an important question is whether it is possible to modify the original iterations to obtain faster convergence to a minimizer achieving the allowed error without considerably increasing the computational cost of each iteration. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set may lead to faster convergence at the cost of an additional reconstruction error related to the accuracy of the set approximation. Our theory ties to recent advances in sparse recovery, compressed sensing, and deep learning. In particular, it may provide an explanation for the successful approximation of the L1-minimization solution by neural networks with layers representing iterations, as practiced in the learned iterative shrinkage-thresholding algorithm.
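For reference, the classical iterative shrinkage-thresholding algorithm (ISTA) for the L1-regularized least-squares problem min_x 1/2 ||Ax - y||^2 + λ||x||_1 repeats x ← soft(x + Aᵀ(y - Ax)/L, λ/L), where L is the largest eigenvalue of AᵀA. Rewriting one step as soft(W_e y + S x, θ) with W_e = Aᵀ/L, S = I - AᵀA/L and θ = λ/L shows what the learned variant (LISTA) does: it unrolls a fixed number of such layers and learns W_e, S and θ instead of fixing them. The NumPy sketch below implements this textbook algorithm with a fixed iteration budget; it is not code from the paper.

```python
import numpy as np


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(A: np.ndarray, y: np.ndarray, x: np.ndarray, lam: float) -> float:
    return 0.5 * float(np.sum((A @ x - y) ** 2)) + lam * float(np.sum(np.abs(x)))


def ista(A: np.ndarray, y: np.ndarray, lam: float, num_iters: int) -> np.ndarray:
    """Run a fixed number of ISTA iterations starting from x = 0."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient of the data term
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x


def lista_layer(x, y, W_e, S, theta):
    """One unrolled layer x <- soft(W_e y + S x, theta); LISTA learns W_e, S, theta."""
    return soft_threshold(W_e @ y + S @ x, theta)
```

With the ISTA choices of `W_e`, `S` and `theta`, one `lista_layer` call reproduces one ISTA step exactly; a trained LISTA network with the same number of layers can reach a given error faster, at the price of the approximation error the abstract discusses.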
Towards CT-quality ultrasound imaging using deep learning
The cost-effectiveness and practical harmlessness of ultrasound imaging have made it one of the most widespread tools for medical diagnosis. Unfortunately, beamforming-based image formation produces granular speckle noise, blurring, shading and other artifacts. To overcome these effects, the ultimate goal would be to reconstruct the tissue acoustic properties by solving a full wave propagation inverse problem. In this work, we make a step towards this goal using Multi-Resolution Convolutional Neural Networks (CNN). As a result, we are able to reconstruct CT-quality images from the reflected ultrasound radio-frequency (RF) data obtained by simulation from real CT scans of a human body. We also show that a CNN is able to imitate existing computationally heavy despeckling methods, thereby saving orders of magnitude in computation and making them amenable to real-time applications.
Deep Functional Maps: Structured prediction for dense shape correspondence
We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning-based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a paradigm shift and design a structured prediction model in the space of functional maps, linear operators that provide a compact representation of the correspondence. We model the learning process via a deep residual network which takes dense descriptor fields defined on two shapes as input, and outputs a soft map between the two given objects. The resulting correspondence is shown to be accurate on several challenging benchmarks comprising multiple categories, synthetic models, real scans with acquisition artifacts, topological noise, and partiality.
Efficient deformable shape correspondence via kernel matching
We present a method to match three-dimensional shapes under non-isometric deformations, topology changes, and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prior on the mapping, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of the highly non-convex nature of the resulting quadratic assignment problem, our method converges to a semantically meaningful and continuous mapping in most of our experiments, and scales well. We provide preliminary theoretical analysis and several interpretations of the method.
White matter fiber representation using continuous dictionary learning
With increasingly sophisticated Diffusion Weighted MRI acquisition methods and modelling techniques, very large sets of streamlines (fibers) are presently generated per imaged brain. These reconstructions of white matter architecture, which are important for human brain research and pre-surgical planning, require a large amount of storage and are often unwieldy and difficult to manipulate and analyze. This work proposes a novel continuous parsimonious framework in which signals are sparsely represented in a dictionary with continuous atoms. The significant innovation in our new methodology is the ability to train such continuous dictionaries, unlike previous approaches that either used pre-fixed continuous transforms or trained dictionaries with finite atoms. This leads to an innovative fiber representation method, which uses Continuous Dictionary Learning to sparsely code each fiber with high accuracy. This method is tested on numerous tractograms produced from the Human Connectome Project data and achieves state-of-the-art performance in compression ratio and reconstruction error.
Product Manifold Filter: Non-rigid shape correspondence via kernel density estimation in the product space
Many algorithms for the computation of correspondences between deformable shapes rely on some variant of nearest neighbor matching in a descriptor space. Such are, for example, various point-wise correspondence recovery algorithms used as a post-processing stage in the functional correspondence framework. Such frequently used techniques implicitly make restrictive assumptions (e.g., near-isometry) on the considered shapes and in practice suffer from a lack of accuracy and result in poor surjectivity. We propose an alternative recovery technique capable of guaranteeing a bijective correspondence and producing significantly higher accuracy and smoothness. Unlike other methods, our approach does not depend on the assumption that the analyzed shapes are isometric. We derive the proposed method from the statistical framework of kernel density estimation and demonstrate its performance on several challenging deformable 3D shape matching datasets.
Fully spectral partial shape matching
We propose an efficient procedure for calculating partial dense intrinsic correspondence between deformable shapes performed entirely in the spectral domain. Our technique relies on the recently introduced partial functional maps formalism and on the joint approximate diagonalization (JAD) of the Laplace-Beltrami operators previously introduced for matching non-isometric shapes. We show that a variant of the JAD problem with an appropriately modified coupling term (surprisingly) allows one to construct quasi-harmonic bases localized on the latent corresponding parts. This circumvents the need to explicitly compute the unknown parts by means of the cumbersome alternating minimization used in the previous approaches, and allows performing all the calculations in the spectral domain with constant complexity independent of the number of shape vertices. We provide an extensive evaluation of the proposed technique on standard non-rigid correspondence benchmarks and show state-of-the-art performance in various settings, including partiality and the presence of topological noise.
Subspace least squares multidimensional scaling
Multidimensional Scaling (MDS) is one of the most popular methods for dimensionality reduction and visualization of high-dimensional data. Apart from these tasks, it has also found applications in the field of geometry processing for the analysis and reconstruction of non-rigid shapes. In this regard, MDS can be thought of as a shape-from-metric algorithm, which finds a configuration of points in Euclidean space that realizes, as isometrically as possible, a given distance structure. In the present work, we cast the least squares variant of MDS (LS-MDS) in the spectral domain. This uncovers a multiresolution property of distance scaling which speeds up the optimization significantly, while producing comparable, and sometimes even better, embeddings.
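As a baseline for what the subspace approach accelerates, the standard way to minimize the LS-MDS stress is the SMACOF iteration (repeated Guttman transforms). The sketch below is plain, unweighted SMACOF in NumPy. It is not the spectral subspace variant described above, and the random point set, random initialization, and iteration count are illustrative assumptions.

```python
import numpy as np

def pairwise_dist(X: np.ndarray) -> np.ndarray:
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def stress(X: np.ndarray, delta: np.ndarray) -> float:
    # Sum over i < j of (d_ij(X) - delta_ij)^2; the full matrix counts each pair twice.
    return 0.5 * float(np.sum((pairwise_dist(X) - delta) ** 2))

def smacof(delta: np.ndarray, dim: int = 2, n_iter: int = 100, seed: int = 0):
    """Unweighted SMACOF: repeated Guttman transforms X <- B(X) X / n."""
    n = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, dim))
    history = [stress(X, delta)]
    for _ in range(n_iter):
        D = pairwise_dist(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, delta / D, 0.0)   # delta_ij / d_ij, 0 where d_ij = 0
        B = -ratio
        np.fill_diagonal(B, ratio.sum(axis=1))        # B_ii = -sum_{j != i} B_ij
        X = B @ X / n                                 # Guttman transform (unit weights)
        history.append(stress(X, delta))
    return X, history

# Illustrative input: Euclidean distances between 20 random points in the unit square.
rng = np.random.default_rng(1)
points = rng.random((20, 2))
delta = pairwise_dist(points)
X_emb, hist = smacof(delta)
```

Each Guttman transform never increases the stress, which is why SMACOF is a common baseline. Its cost per iteration grows with the number of points, and a restriction to a low-dimensional smooth basis targets exactly that cost.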
Deep class-aware image denoising
The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. To this end, we propose a new fully convolutional deep neural network architecture which is simple yet powerful and achieves state-of-the-art performance for additive Gaussian noise removal. Furthermore, we claim that personal photo collections can usually be categorized into a small set of semantic classes. However simple, this observation has not been exploited in image denoising until now. We show that a significant boost in performance of up to 0.4 dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class. Relying on the hugely successful existing image classifiers, this research advocates for using a class-aware approach in all image enhancement tasks.
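To put a gain of up to 0.4 dB PSNR in perspective: PSNR is 10·log10(MAX²/MSE), so at a fixed peak value a gain of g dB multiplies the mean squared error by 10^(−g/10). The helper below assumes 8-bit images (MAX = 255). It is a generic definition, not code from the paper.

```python
import numpy as np

def psnr(clean: np.ndarray, estimate: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((np.asarray(clean, dtype=float) - np.asarray(estimate, dtype=float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A PSNR gain of g dB at a fixed peak means the MSE is multiplied by 10^(-g / 10).
gain_db = 0.4
mse_ratio = 10.0 ** (-gain_db / 10.0)
print(f"{gain_db} dB gain -> MSE x {mse_ratio:.3f}")  # about 0.912, i.e. roughly 8.8% lower MSE
```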
Cloud Dictionary: Sparse coding and modeling for point clouds
With the development of range sensors such as LIDAR and time-of-flight cameras, 3D point cloud scans have become ubiquitous in computer vision applications, the most prominent ones being gesture recognition and autonomous driving. Parsimony-based algorithms have shown great success on images and videos where data points are sampled on a regular Cartesian grid. We propose an adaptation of these techniques to irregularly sampled signals by using continuous dictionaries. We present an example application in the form of point cloud denoising.
Deep class-aware denoising
The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. At the same time, the images captured by these devices can be categorized into a small set of semantic classes. However simple, this observation has not been exploited in image denoising until now. In this paper, we demonstrate how the reconstruction quality improves when a denoiser is aware of the type of content in the image. To this end, we first propose a new fully convolutional deep neural network architecture which is simple yet powerful as it achieves state-of-the-art performance even without being class-aware. We further show that a significant boost in performance of up to 0.4 dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class. Relying on the hugely successful existing image classifiers, this research advocates for using a class-aware approach in all image enhancement tasks.
Deep convolutional denoising of low-light images
The Poisson distribution is used for modeling noise in photon-limited imaging. While canonical examples include relatively exotic types of sensing like spectral imaging or astronomy, the problem is relevant to regular photography now more than ever due to the booming market for mobile cameras. A restricted form factor limits the amount of absorbed light, so computational post-processing is called for. In this paper, we make use of the powerful framework of deep convolutional neural networks for Poisson denoising. We demonstrate that by training the same network with images having a specific peak value, our denoiser outperforms the previous state of the art by a large margin, both visually and quantitatively. Being flexible and data-driven, our solution avoids the heavy ad hoc engineering used in previous methods and is an order of magnitude faster. We further show that by adding a reasonable prior on the class of the image being processed, another significant boost in performance is achieved.
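The "peak value" sets the noise level. A clean image with intensities in [0, 1] is multiplied by the peak, photon counts are drawn from a Poisson distribution with that mean, and the counts are divided by the peak again. The sketch below follows this common simulation protocol. The exact protocol used in the paper may differ, and the flat gray test image is a placeholder.

```python
import numpy as np

def add_poisson_noise(img: np.ndarray, peak: float, rng: np.random.Generator) -> np.ndarray:
    """Simulate photon-limited noise: scale [0, 1] intensities to `peak` expected photons."""
    counts = rng.poisson(img * peak)          # photon counts, mean = img * peak
    return counts / peak                      # back to the [0, 1] intensity scale

rng = np.random.default_rng(0)
clean = np.full((100, 100), 0.5)              # flat gray test image (placeholder)
for peak in (1, 4, 100):
    noisy = add_poisson_noise(clean, peak, rng)
    rmse = np.sqrt(np.mean((noisy - clean) ** 2))
    print(f"peak={peak:>3}: RMSE={rmse:.3f}")  # RMSE is roughly sqrt(0.5 / peak)
```

Because the variance of the rescaled counts is intensity/peak, low peaks produce much stronger noise. This makes a peak-specific training set a natural design choice.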
ASIST: Automatic Semantically Invariant Scene Transformation
We present ASIST, a technique for transforming point clouds by replacing objects with their semantically equivalent counterparts. Transformations of this kind have applications in virtual reality, repair of fused scans, and robotics. ASIST is based on a unified formulation of semantic labeling and object replacement; both result from minimizing a single objective. We present numerical tools for the efficient solution of this optimization problem. The method is experimentally assessed on new datasets of both synthetic and real point clouds, and is additionally compared to two recent works on object replacement on data from the corresponding papers.
Computing and processing correspondences with functional maps
Notions of similarity and correspondence between geometric shapes and images are central to many tasks in geometry processing, computer vision, and computer graphics. The goal of this course is to familiarize the audience with a set of recent techniques that greatly facilitate the computation of mappings or correspondences between geometric datasets, such as 3D shapes or 2D images, by formulating them as mappings between functions rather than points or triangles. Methods based on the functional map framework have recently led to state-of-the-art results in problems as diverse as non-rigid shape matching, image co-segmentation, and even some aspects of tangent vector field design. One challenge in adopting these methods in practice, however, is that their exposition often assumes a significant amount of background in geometry processing, spectral methods, and functional analysis, which can make it difficult to gain an intuition about their performance or about their applicability to real-life problems. In this course, we try to provide all the tools necessary to appreciate and use these techniques, while assuming very little background knowledge. We also give a unifying treatment of these techniques, which may be difficult to extract from the individual publications and, at the same time, hint at the generality of this point of view, which can help tackle many problems in the analysis and creation of visual content. This course is structured as a half-day course. We will assume that the participants have knowledge of basic linear algebra and some knowledge of differential geometry, to the extent of being familiar with the concepts of a manifold and a tangent vector space.
We will discuss in detail the functional approach to finding correspondences between non-rigid shapes, the design and analysis of tangent vector fields on surfaces, consistent map estimation in networks of shapes and applications to shape and image segmentation, shape variability analysis, and other areas.
Hamiltonian operator for spectral shape analysis
Many shape analysis methods treat the geometry of an object as a metric space that can be captured by the Laplace-Beltrami operator. In this paper, we propose to adapt the classical Hamiltonian operator from quantum mechanics to the field of shape analysis. To this end, we study the addition of a potential function to the Laplacian as a generator for dual spaces in which shape processing is performed. We present a general optimization approach for solving variational problems involving the basis defined by the Hamiltonian, using perturbation theory for its eigenvectors. The suggested operator is shown to produce better functional spaces to operate with, as demonstrated on different shape analysis tasks.
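A toy illustration of adding a potential to the Laplacian: on a path graph, the "Hamiltonian" H = L + diag(V) has low eigenvectors that concentrate where the potential V is small, whereas the first Laplacian eigenvector is constant. The path graph, the step-shaped potential, and the dense eigensolver are illustrative assumptions. The paper works with Laplace-Beltrami operators on shapes and optimizes the potential for a given task.

```python
import numpy as np

def path_laplacian(n: int) -> np.ndarray:
    """Combinatorial Laplacian L = Deg - Adj of a path graph with n vertices."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

n = 100
L = path_laplacian(n)
V = np.zeros(n)
V[n // 2:] = 100.0                    # hypothetical potential: high on the second half
H = L + np.diag(V)                    # Hamiltonian = Laplacian + potential

_, phi_L = np.linalg.eigh(L)          # eigenvalues in ascending order
_, phi_H = np.linalg.eigh(H)
mass_left_L = np.sum(phi_L[: n // 2, 0] ** 2)   # energy of first eigenvector on low-V half
mass_left_H = np.sum(phi_H[: n // 2, 0] ** 2)
print(mass_left_L, mass_left_H)       # about 0.5 for L, close to 1 for H
```

The potential acts as a soft localizer. The Laplacian basis spreads uniformly, while the Hamiltonian basis focuses on the region of interest.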
Consistent discretization and minimization of the L1 norm on manifolds
The L1 norm has been tremendously popular in signal and image processing in the past two decades due to its sparsity-promoting properties. More recently, its generalization to non-Euclidean domains has been found useful in shape analysis applications. For example, in conjunction with the minimization of the Dirichlet energy, it was shown to produce a compactly supported quasi-harmonic orthonormal basis, dubbed compressed manifold modes. The continuous L1 norm on the manifold is often replaced by the vector l1 norm applied to sampled functions. We show that such an approach is incorrect in the sense that it does not consistently discretize the continuous norm, and warn against its sensitivity to the specific sampling. We propose two alternative discretizations resulting in an iteratively reweighted l2 norm. We demonstrate the proposed strategy on the compressed modes problem, which reduces to a sequence of simple eigendecomposition problems that do not require non-convex optimization on Stiefel manifolds and produce more stable and accurate results.
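The sampling dependence is easy to see numerically. For a function f sampled at points with local areas a_i (here, 1D interval lengths), the plain vector norm Σ|f_i| grows with the sampling density. The area-weighted sum Σ a_i|f_i| instead approximates the continuous ∫|f|. The 1D domain, the trapezoid-style lumped weights, and the test function are illustrative assumptions. They are not the paper's discretizations on meshes.

```python
import numpy as np

def lumped_weights(x: np.ndarray) -> np.ndarray:
    """Per-sample 'area' on a 1D grid: half the length of each adjacent interval."""
    h = np.diff(x)
    w = np.zeros_like(x)
    w[:-1] += h / 2
    w[1:] += h / 2
    return w

def l1_naive(f_samples: np.ndarray) -> float:
    return float(np.sum(np.abs(f_samples)))          # vector l1 norm, ignores sampling

def l1_weighted(f_samples: np.ndarray, w: np.ndarray) -> float:
    return float(np.sum(w * np.abs(f_samples)))      # consistent with the integral of |f|

f = lambda x: np.sin(2 * np.pi * x)   # continuous L1 norm on [0, 1] equals 2 / pi
for n in (100, 1000):
    x = np.linspace(0.0, 1.0, n)
    print(n, l1_naive(f(x)), l1_weighted(f(x), lumped_weights(x)))
```

The naive value grows roughly tenfold between the two samplings, while the weighted value stays near 2/π ≈ 0.637.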
SpectroMeter: Amortized sublinear spectral approximation of distance on graphs
We present a method to approximate pairwise distance on a graph, having an amortized sub-linear complexity in its size. The proposed method follows the so-called heat method due to Crane et al. The only additional input is the values of the eigenfunctions of the graph Laplacian at a subset of the vertices. Using these values we estimate a random walk from the source points, and normalize the result into a unit gradient function. The eigenfunctions are then used to synthesize distance values abiding by these constraints at desired locations. We show that this method works in practice on different types of inputs ranging from triangular meshes to general graphs. We also demonstrate that the resulting approximate distance is accurate enough to be used as the input to a recent method for intrinsic shape correspondence computation.
FPGA system for real-time computational extended depth of field imaging using phase aperture coding
We present a proof-of-concept end-to-end system for computational extended depth of field (EDOF) imaging. The acquisition is performed through a phase-coded aperture implemented by placing a thin wavelength-dependent optical mask inside the pupil of a conventional camera lens, as a result of which each color channel is focused at a different depth. The reconstruction process receives the raw Bayer image as the input, and performs blind estimation of the output color image in focus at an extended range of depths using a patch-wise sparse prior. We present a fast non-iterative reconstruction algorithm operating with constant latency in fixed-point arithmetic and achieving real-time performance in a prototype FPGA implementation. The output of the system, on simulated and real-life scenes, is qualitatively and quantitatively better than the result of clear-aperture imaging followed by state-of-the-art blind deblurring.
Deep neural networks with random Gaussian weights: A universal classification strategy?
Three important properties of a classification machinery are: (i) the system preserves the important information of the input data; (ii) the training examples convey information for unseen data; and (iii) the system is able to treat differently points from different classes. In this work, we show that these fundamental properties are inherited by the architecture of deep neural networks. We formally prove that these networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data. Similar points at the input of the network are likely to have the same The theoretical analysis of deep networks here presented exploits tools used in the compressed sensing and dictionary learning literature, thereby making a formal connection between these important topics. The derived results allow drawing conclusions on the metric learning properties of the network and their relation to its structure; and provide bounds on the required size of the training set such that the training examples would represent faithfully the unseen data. The results are validated with state-of-the-art trained networks.
Shape correspondence is a fundamental problem in computer graphics and vision, with applications including animation, texture mapping, robotic vision, medical imaging, archaeology, and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigid multi-part shape matching algorithm. We assume that we are given a reference shape and its multiple parts undergoing a non-rigid deformation. Each of these query parts can be additionally contaminated by clutter, may overlap with other parts, and there might be missing parts or redundant ones. Our method simultaneously solves for the segmentation of the reference model, and for a dense correspondence to (subsets of) the parts. Experimental results on synthetic as well as real scans demonstrate the effectiveness of our method in dealing with this challenging matching scenario.
Sparsity and nullity: paradigms for analysis dictionary learning
Sparse models in dictionary learning have been successfully applied in a wide variety of machine learning and computer vision problems, and as a result, have recently attracted increased research interest. Another interesting related problem based on linear equality constraints, namely the sparse null space (SNS) problem, first appeared in 1986 and has since inspired results on sparse basis pursuit. In this paper, we investigate the relation between the SNS problem and the analysis dictionary learning (ADL) problem, and show that the SNS problem plays a central role, and may be utilized to solve dictionary learning problems. Moreover, we propose an efficient algorithm of sparse null space basis pursuit (SNS-BP) and extend it to a solution of ADL. Experimental results on numerical synthetic data and real-world data are further presented to validate the performance of our method.
Shape retrieval of non-rigid 3D human models
3D models of humans are commonly used within computer graphics and vision, and so the ability to distinguish between body shapes is an important shape retrieval problem. We extend our recent paper which provided a benchmark for testing non-rigid 3D shape retrieval algorithms on 3D human models. This benchmark provided a far stricter challenge than previous shape benchmarks. We have added 145 new models for use as a separate training set, in order to standardise the training data used and provide a fairer comparison. We have also included experiments with the FAUST dataset of human scans. All participants of the previous benchmark study have taken part in the new tests reported here, many providing updated results using the new data. In addition, further participants have also taken part, and we provide extra analysis of the retrieval results. A total of 25 different shape retrieval methods are compared.
Multimodal manifold analysis using simultaneous diagonalization of Laplacians
We construct an extension of spectral and diffusion geometry to multiple modalities through simultaneous diagonalization of Laplacian matrices. This naturally extends classical data analysis tools based on spectral geometry, such as diffusion maps and spectral clustering. We provide several synthetic and real examples of manifold learning, retrieval, and clustering demonstrating that the joint spectral geometry frequently better captures the inherent structure of multi-modal data. We also show that many previous approaches to multimodal manifold analysis are related to our framework and can be seen as particular cases of it.
A Picture is Worth a Billion Bits: Real-time image reconstruction from dense binary pixels
The pursuit of smaller pixel sizes at ever-increasing resolution in digital image sensors is mainly driven by the stringent price and form-factor requirements of sensors and optics in the cellular phone market. Recently, Eric Fossum proposed a novel concept of an image sensor with dense sub-diffraction limit one-bit pixels (jots), which can be considered a digital emulation of silver halide photographic film. This idea has been recently embodied as the EPFL Gigavision camera. A major bottleneck in the design of such sensors is the image reconstruction process, producing a continuous high dynamic range image from oversampled binary measurements. The extreme quantization of the Poisson statistics is incompatible with the assumptions of most standard image processing and enhancement frameworks. The recently proposed maximum-likelihood (ML) approach addresses this difficulty, but suffers from image artifacts and has impractically high computational complexity. In this work, we study a variant of a sensor with binary threshold pixels and propose a reconstruction algorithm combining an ML data fitting term with a sparse synthesis prior. We also show an efficient hardware-friendly real-time approximation of this inverse operator. Promising results are shown on synthetic data as well as on HDR data emulated using multiple exposures of a regular CMOS sensor.
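To make the ML data term concrete in its simplest form, consider a single pixel. Suppose each jot receives a Poisson number of photons with mean θ and outputs 1 when it sees at least one photon (threshold 1). Then P(bit = 1) = 1 − e^(−θ), and from k ones among K independent jots the ML estimate is θ̂ = −log(1 − k/K). This single-pixel sketch assumes threshold 1 and a constant intensity over the K jots. The paper's reconstruction additionally uses a sparse synthesis prior and threshold pixels in general.

```python
import numpy as np

def simulate_jots(theta: float, K: int, rng: np.random.Generator) -> np.ndarray:
    """K one-bit jots with threshold 1: a jot fires if it receives >= 1 photon."""
    photons = rng.poisson(theta, size=K)
    return (photons >= 1).astype(np.uint8)

def ml_intensity(bits: np.ndarray) -> float:
    """ML estimate of theta from one-bit, threshold-1 measurements."""
    K = bits.size
    k = int(bits.sum())
    if k == K:
        return np.inf                 # all jots saturated: the likelihood is maximized at infinity
    return float(-np.log1p(-k / K))   # -log(1 - k/K)

rng = np.random.default_rng(0)
bits = simulate_jots(0.5, 100_000, rng)
theta_hat = ml_intensity(bits)        # close to the true value 0.5
```

The estimate blows up as the jots saturate. This illustrates why the extreme quantization breaks the assumptions of standard enhancement pipelines.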
Computational all-in-focus imaging using an optical phase mask
A method for extended depth of field imaging based on image acquisition through a thin binary phase plate followed by fast automatic computational post-processing is presented. By placing a wavelength-dependent optical mask inside the pupil of a conventional camera lens, one acquires a unique response for each of the three main color channels, which adds valuable information that allows blind reconstruction of blurred images without the need for an iterative search process to estimate the blurring kernel. The presented simulation, as well as the capture of a real-life scene, shows how acquiring a one-shot image focused at a single plane enables generating a deblurred scene over an extended range in space.
GMD: Global model detection via inlier rate estimation
This work presents a novel approach for detecting inliers in a given set of correspondences (matches). It does so without explicitly identifying any consensus set, based on a method for inlier rate estimation (IRE). Given such an estimator for the inlier rate, we also present an algorithm that detects a globally optimal transformation. We provide a theoretical analysis of the IRE method using a stochastic generative model on the continuous spaces of matches and transformations. This model allows rigorous investigation of the limits of our IRE method for the case of 2D translation, further giving bounds and insights for the more general case. Our theoretical analysis is validated empirically and is shown to hold in practice for the more general case of 2D affinities. In addition, we show that the combined framework works on challenging cases of 2D homography estimation, with very few and possibly noisy inliers, where RANSAC generally fails.
SHREC'15 Track: Scalability of non-rigid 3D shape retrieval
Due to recent advances in 3D acquisition and modeling, increasingly large amounts of 3D shape data are becoming available in many application domains. This raises the need not only for effective methods for 3D shape retrieval, but also for efficient retrieval and robust implementations. Previous 3D retrieval challenges have mainly considered data sets in the range of a few thousand queries. In the 2015 SHREC track on Scalability of 3D Shape Retrieval, we provide a benchmark with more than 96 thousand shapes. The data set is based on a non-rigid retrieval benchmark enhanced by other existing shape benchmarks. From the baseline models, a large set of partial objects were automatically created by simulating a range-image acquisition process. Four teams have participated in the track, with most methods providing very good to near-perfect retrieval results, and one less complex baseline method providing fair performance. Timing results indicate that three of the methods, including the latter baseline one, provide near-interactive query execution times. Generally, the cost of data pre-processing varies depending on the method.
Sparse null space basis pursuit and analysis dictionary learning for high-dimensional data analysis
Sparse models in dictionary learning have been successfully applied in a wide variety of machine learning and computer vision problems, and have also recently been of increasing research interest. Another interesting related problem based on a linear equality constraint, namely the sparse null space problem (SNS), first appeared in 1986, and has since inspired results on sparse basis pursuit. In this paper, we investigate the relation between the SNS problem and the analysis dictionary learning problem, and show that the SNS problem plays a central role, and may be utilized to solve dictionary learning problems. Moreover, we propose an efficient algorithm of sparse null space basis pursuit, and extend it to a solution of analysis dictionary learning. Experimental results on numerical synthetic data and real-world data are further presented to validate the performance of our method.
On convex relaxation of graph isomorphism
We consider the problem of exact and inexact matching of weighted undirected graphs, in which a bijective correspondence is sought to minimize a quadratic weight disagreement. This computationally challenging problem is often relaxed as a convex quadratic program, in which the space of permutations is replaced by the space of doubly stochastic matrices. However, the applicability of such a relaxation is poorly understood. We define a broad class of friendly graphs characterized by an easily verifiable spectral property. We prove that for friendly graphs, the convex relaxation is guaranteed to find the exact isomorphism or certify its nonexistence. This result is further extended to approximately isomorphic graphs, for which we develop an explicit bound on the amount of weight disagreement under which the relaxation is guaranteed to find the globally optimal approximate isomorphism. We also show that in many cases, the graph matching problem can be further harmlessly relaxed to a convex quadratic program with only n separable linear equality constraints, which is substantially more efficient than the standard relaxation involving 2n equality and n² inequality constraints. Finally, we show that our results are still valid for unfriendly graphs if additional information in the form of seeds or attributes is allowed, with the latter satisfying an easy-to-verify spectral characteristic.
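A minimal sketch of the standard relaxation min ||AX − XB||²_F over doubly stochastic X. It uses Frank-Wolfe as a simple solver, a choice made here for illustration rather than the paper's solver. Frank-Wolfe fits this setting because minimizing a linear function over the doubly stochastic set reduces to an assignment problem, solved here with SciPy's `linear_sum_assignment`. The random graph, iteration count, and final rounding step are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_objective(A: np.ndarray, B: np.ndarray, X: np.ndarray) -> float:
    return float(np.sum((A @ X - X @ B) ** 2))

def relaxed_matching(A: np.ndarray, B: np.ndarray, n_iter: int = 300) -> np.ndarray:
    """Frank-Wolfe on min ||A X - X B||_F^2 over doubly stochastic X."""
    n = A.shape[0]
    X = np.full((n, n), 1.0 / n)                  # start at the barycenter J / n
    for _ in range(n_iter):
        R = A @ X - X @ B
        grad = 2.0 * (A.T @ R - R @ B.T)          # gradient of the quadratic objective
        rows, cols = linear_sum_assignment(grad)  # best permutation for the linearized problem
        S = np.zeros((n, n))
        S[rows, cols] = 1.0
        Delta = S - X
        M = A @ Delta - Delta @ B
        denom = float(np.sum(M * M))
        if denom == 0.0:
            break
        # Exact line search: the objective is a convex quadratic in the step size.
        gamma = float(np.clip(-np.sum(R * M) / denom, 0.0, 1.0))
        X = X + gamma * Delta                     # convex combination stays doubly stochastic
    return X

# Illustrative example: a random weighted graph and a relabeled copy B = P^T A P.
rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))
A = np.triu(W, 1) + np.triu(W, 1).T              # symmetric weights, zero diagonal
P = np.eye(n)[rng.permutation(n)]
B = P.T @ A @ P                                   # then A P = P B
X = relaxed_matching(A, B)
rows, cols = linear_sum_assignment(-X)            # round the relaxed solution to a permutation
```

The rounding step is only a heuristic readout. The result above concerns when the relaxed optimum itself is already the exact permutation.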
Learning efficient sparse and low-rank models
Parsimony, including sparsity and low rank, has been shown to successfully model data in numerous machine learning and signal processing tasks. Traditionally, parsimonious modeling approaches rely on an iterative algorithm that minimizes an objective function with parsimony-promoting terms. The inherently sequential structure and data-dependent complexity and latency of iterative optimization constitute a major limitation in many applications requiring real-time performance or involving large-scale data. Another limitation encountered by these models is the difficulty of their inclusion in supervised learning scenarios, where the higher-level training objective would depend on the solution of the lower-level pursuit problem. The resulting bilevel optimization problems are in general notoriously difficult to solve. In this paper, we propose to move the emphasis from the model to the pursuit algorithm, and develop a process-centric view of parsimonious modeling, in which a deterministic fixed-complexity pursuit process is used in lieu of iterative optimization. We show a principled way to construct learnable pursuit process architectures for structured sparse and robust low-rank models from the iteration of proximal descent algorithms. These architectures approximate the exact parsimonious representation with a fraction of the complexity of the standard optimization methods. We also show that carefully chosen training regimes allow one to naturally extend parsimonious models to discriminative settings. State-of-the-art results are demonstrated on several challenging problems in image and audio processing with several orders of magnitude speedup compared to the exact optimization algorithms.
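A minimal sketch of a fixed-complexity pursuit process in the LISTA style, for plain (unstructured) sparse coding only. The process consists of exactly T "layers", each computing z ← shrink(Wy + Sz, θ). With W = Dᵀ/L, S = I − DᵀD/L and θ = λ/L, the network reproduces exactly T iterations of ISTA. Training (not shown) would then adjust W, S and θ. The parameter names, the ISTA initialization, and the random data are assumptions for illustration. The paper's structured sparse and robust low-rank architectures go beyond this.

```python
import numpy as np

def shrink(v: np.ndarray, t: float) -> np.ndarray:
    """Soft thresholding, the nonlinearity of each layer."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_initialization(D: np.ndarray, lam: float):
    """Layer parameters (W, S, theta) that make the network reproduce ISTA."""
    L = np.linalg.norm(D, 2) ** 2
    W = D.T / L
    S = np.eye(D.shape[1]) - (D.T @ D) / L
    return W, S, lam / L

def unrolled_pursuit(y, W, S, theta, T):
    """Fixed-complexity pursuit: exactly T layers z <- shrink(W y + S z, theta), z0 = 0."""
    z = np.zeros(S.shape[0])
    for _ in range(T):
        z = shrink(W @ y + S @ z, theta)
    return z

def ista(D, y, lam, T):
    """Reference: T iterations of ISTA started from zero."""
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(T):
        z = shrink(z - D.T @ (D @ z - y) / L, lam / L)
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((30, 60)) / np.sqrt(30)
y = rng.standard_normal(30)
W, S, theta = ista_initialization(D, lam=0.1)
z_net = unrolled_pursuit(y, W, S, theta, T=5)
z_ista = ista(D, y, lam=0.1, T=5)
```

The cost and latency are fixed by T, independent of the data. Treating W, S and θ as trainable is what lets the pursuit be embedded in a supervised objective.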
Supervised non-negative matrix factorization for audio source separation
Source separation is a widely studied problem in signal processing. Despite the steady progress reported in the literature, it is still considered a significant challenge. This chapter first reviews the use of non-negative matrix factorization (NMF) algorithms for solving source separation problems, and proposes a new way for the supervised training in NMF. Matrix factorization methods have received a lot of attention in recent years in the audio processing community, producing particularly good results in source separation. Traditionally, NMF algorithms consist of two separate stages: a training stage, in which a generative model is learned; and a testing stage, in which the pre-learned model is used in a high-level task such as enhancement, separation, or classification. As an alternative, we propose a task-supervised NMF method for the adaptation of the basis spectra learned in the first stage to enhance the performance on the specific task used in the second stage. We cast this problem as a bilevel optimization program efficiently solved via stochastic gradient descent. The proposed approach is general enough to handle sparsity priors of the activations, and allows non-Euclidean data terms such as beta-divergences. The framework is evaluated on speech enhancement.
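For context, the conventional first (unsupervised) stage learns basis spectra W and activations H with standard NMF. Below is a sketch of the classical Lee-Seung multiplicative updates for the Kullback-Leibler divergence (the beta = 1 case). The task-supervised bilevel refinement proposed in the chapter is not shown. The random nonnegative matrix stands in for a magnitude spectrogram, and the small `eps` guard is an implementation assumption.

```python
import numpy as np

def kl_div(V: np.ndarray, Vhat: np.ndarray, eps: float = 1e-12) -> float:
    """Generalized Kullback-Leibler divergence D(V || Vhat)."""
    return float(np.sum(V * np.log((V + eps) / (Vhat + eps)) - V + Vhat))

def nmf_kl(V: np.ndarray, rank: int, n_iter: int = 200, seed: int = 0, eps: float = 1e-12):
    """Multiplicative updates for V ~= W H under the KL divergence."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps            # basis spectra (columns)
    H = rng.random((rank, N)) + eps            # activations over time
    ones = np.ones_like(V)
    history = []
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
        history.append(kl_div(V, W @ H))
    return W, H, history

# Placeholder "spectrogram": 64 frequency bins x 40 frames of nonnegative values.
V = np.random.default_rng(1).random((64, 40)) + 0.01
W, H, hist = nmf_kl(V, rank=5)
```

In the two-stage pipeline, W would be learned per source on training data and kept fixed at test time. The task-supervised approach instead adapts W to the downstream objective.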
Random forests can hash
Hash codes are a very efficient data representation, needed to cope with ever-growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forests, a technique that, together with deep learning, has shown spectacular results in classification, can also be extended to large-scale retrieval. A traditional random forest fails to enforce the consistency of hashes generated from each tree for same-class data, i.e., to preserve the underlying similarity, and it also lacks a principled way to aggregate codes across trees. We start with a simple hashing scheme, where independently trained random trees in a forest act as hashing functions. We then propose a subspace model as the splitting function, and show that it enforces hash consistency in a tree for data from the same class. We also introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. Experiments on large-scale public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art hashing methods for retrieval tasks.
Supervised non-Euclidean sparse NMF via bilevel optimization with applications to speech enhancement
Traditionally, NMF algorithms consist of two separate stages: a training stage, in which a generative model is learned; and a testing stage, in which the pre-learned model is used in a high-level task such as enhancement, separation, or classification. As an alternative, we propose a task-supervised NMF method for the adaptation of the basis spectra learned in the first stage to enhance the performance on the specific task used in the second stage. We cast this problem as a bilevel optimization program that can be efficiently solved via stochastic gradient descent. The proposed approach is general enough to handle sparsity priors of the activations, and allows non-Euclidean data terms such as beta-divergences. The framework is evaluated on single-channel speech enhancement tasks.
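The testing stage described above, where the basis W is held fixed and only the activations H are estimated under a beta-divergence data term with an optional sparsity penalty, can be sketched with the standard multiplicative updates from the NMF literature. This is a minimal illustration of the unsupervised baseline only, not the bilevel task-supervised training proposed here; the function names, the l1 weight `lam`, and the random initialization are assumptions made for the example.

```python
import numpy as np


def beta_divergence(V, Y, beta, eps=1e-12):
    """Sum of the elementwise beta-divergence d_beta(V | Y)."""
    V = np.maximum(V, eps)
    Y = np.maximum(Y, eps)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / Y) - V + Y))
    if beta == 0:  # Itakura-Saito divergence
        return float(np.sum(V / Y - np.log(V / Y) - 1))
    return float(np.sum((V**beta + (beta - 1) * Y**beta - beta * V * Y**(beta - 1))
                        / (beta * (beta - 1))))


def nmf_activations(V, W, beta=1.0, lam=0.0, n_iter=200, seed=0, eps=1e-12):
    """Estimate non-negative activations H with V ~= W @ H for a fixed basis W.

    Multiplicative updates for the beta-divergence; `lam` adds an l1 penalty
    on H (hypothetical parameter, for illustration only).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        numer = W.T @ (WH ** (beta - 2) * V)
        denom = W.T @ WH ** (beta - 1) + lam + eps
        H *= numer / denom
    return H
```

With `beta=2`, `1`, or `0` the data term is the squared Euclidean distance, the generalized Kullback-Leibler divergence, or the Itakura-Saito divergence, respectively. Without the sparsity term and for beta between 1 and 2, these updates are known not to increase the objective; for other values of beta they are a common heuristic.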
Probably approximately symmetric: Fast rigid symmetry detection with global guarantees
We present a fast algorithm for global 3D symmetry detection with approximation guarantees. The algorithm is guaranteed to find the best approximate symmetry of a given shape, to within a user-specified threshold, with very high probability. Our method uses a carefully designed sampling of the transformation space, where each transformation is efficiently evaluated using a sub-linear algorithm. We prove that the density of the sampling depends on the total variation of the shape, allowing us to derive formal bounds on the algorithm’s complexity and approximation quality. We further investigate different volumetric shape representations (in the form of truncated distance transforms), thereby controlling the total variation of the shape and hence the sampling density and the runtime of the algorithm. A comprehensive set of experiments assesses the proposed method, including an evaluation on the eight categories of the COSEG dataset. To our knowledge, this is the first large-scale evaluation of any symmetry detection technique.
Supervised learning of bag-of-features shape descriptors using sparse coding
We present a method for supervised learning of shape descriptors for shape retrieval applications. Many content-based shape retrieval approaches follow the bag-of-features (BoF) paradigm commonly used in text and image retrieval by first computing local shape descriptors, and then representing them in a ‘geometric dictionary’ using vector quantization. A major drawback of such approaches is that the dictionary is constructed in an unsupervised manner using clustering, unaware of the last stage of the process (pooling of the local descriptors into a BoF, and comparison of the latter using some metric). In this paper, we replace the clustering with dictionary learning, where every atom acts as a feature, followed by sparse coding and pooling to get the final BoF descriptor. Both the dictionary and the sparse codes can be learned in the supervised regime via bilevel optimization using a task-specific objective that promotes the invariance desired in the specific application. We show significant performance improvement on several standard shape retrieval benchmarks.
Real-time compressed imaging of scattering volumes
We propose a method and a prototype imaging system for real-time reconstruction of volumetric piecewise-smooth scattering media. The volume is illuminated by a sequence of structured binary patterns emitted from a fan-beam projector, and the scattered light is collected by a two-dimensional sensor, thus creating an under-complete set of compressed measurements. We present a reconstruction algorithm with fixed complexity and latency that is capable of estimating the scattering coefficients in real time. We also present a simple greedy algorithm for learning the optimal illumination patterns. Our results demonstrate faithful reconstruction from highly compressed measurements. Furthermore, we present a method for compressed registration of the measured volume to a known template, showing excellent alignment with just a single projection. Although our prototype system operates in visible light, the presented methodology is suitable for fast x-ray scattering imaging, in particular real-time vascular medical imaging.
Quantifying 3D shape similarity using maps: Recent trends, applications and perspectives
Shape similarity is an acute issue in Computer Vision and Computer Graphics that involves many aspects of human perception of the real world, including judged and perceived similarity concepts, deterministic and probabilistic decisions, and their formalization. 3D models carry multiple types of information (e.g., geometry, topology, texture, time evolution, appearance), which can be thought of as the filter that drives the recognition process. Assessing and quantifying the similarity between 3D shapes is necessary to explore large datasets of shapes and to tune the analysis framework to the user’s needs. Much effort has been devoted to this, including several attempts to formalize suitable notions of similarity and distance among 3D objects and their shapes. In recent years, 3D shape analysis has attracted rapidly growing interest in a number of challenging issues, ranging from deformable shape similarity to partial matching and viewpoint selection. In this panorama, we focus on methods that quantify shape similarity (between two objects and sets of models) and compare these shapes in terms of their properties (i.e., global and local, geometric, differential, and topological) conveyed by (sets of) maps. After presenting in detail the theoretical foundations underlying these methods, we review their usage in a number of 3D shape application domains, ranging from matching and retrieval to annotation and segmentation. Particular emphasis is given to analyzing the suitability of the different methods for specific classes of shapes (e.g., rigid or isometric shapes), as well as the flexibility of the various methods at the different stages of the shape comparison process. Finally, the most promising directions for future research are discussed.
Multimodal similarity-preserving hashing
We introduce an efficient computational framework for hashing data belonging to multiple modalities into a single representation space where they become mutually comparable. The proposed approach is based on a novel coupled siamese neural network architecture and allows unified treatment of intra- and inter-modality similarity learning. Unlike existing cross-modality similarity learning approaches, our hashing functions are not limited to binarized linear projections and can assume arbitrarily complex forms. We show experimentally that our method significantly outperforms state-of-the-art hashing approaches on multimedia retrieval tasks.
Sparse similarity-preserving hashing
In recent years, a lot of attention has been devoted to efficient nearest neighbor search by means of similarity-preserving hashing. One of the drawbacks of existing hashing techniques is the intrinsic trade-off between performance and computational complexity: while longer hash codes allow for lower false positive rates, it is very difficult to increase the embedding dimensionality without incurring very high false negative rates or prohibitive computational costs. In this paper, we propose a way to overcome this limitation by enforcing the hash codes to be sparse. Sparse high-dimensional codes enjoy the low false positive rates typical of long hashes, while keeping the false negative rates similar to those of a shorter dense hashing scheme with an equal number of degrees of freedom. We use a tailored feed-forward neural network for the hashing function. Extensive experimental evaluation involving visual and multimodal data shows the benefits of the proposed method.
Equi-affine invariant intrinsic geometries for bendable shapes analysis
Traditional models of bendable surfaces are based on the exact or approximate invariance to deformations that do not tear or stretch the shape, leaving intact an intrinsic geometry associated with it. Intrinsic geometries are typically defined using either the shortest path length (geodesic distance) or properties of heat diffusion (diffusion distance) on the surface. Both are implicitly derived from the metric induced by the ambient Euclidean space. In this paper, we depart from this restrictive assumption by observing that a different choice of the metric results in a richer set of geometric invariants. We extend the classic equi-affine arclength, defined on convex surfaces, to arbitrary shapes with non-vanishing Gaussian curvature. As a result, a family of affine-invariant intrinsic geometries is obtained. The potential of this novel framework is explored in a wide range of applications such as shape matching and retrieval, symmetry detection, and computation of Voronoi tessellations. We show that in some shape analysis tasks, our affine-invariant intrinsic geometries often outperform their Euclidean-based counterparts.
Shape retrieval of non-rigid 3D human models
We have created a new dataset for non-rigid 3D shape retrieval, one that is much more challenging than existing datasets. Our dataset features exclusively human models, in a variety of body shapes and poses. 3D models of humans are commonly used in computer graphics and vision, and therefore the ability to distinguish between body shapes is an important feature for shape retrieval methods. In this track, nine groups submitted results for a total of 22 different methods, which were tested on our new dataset.
Efficient supervised sparse analysis and synthesis operators
In this paper, we propose a new and computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought of as a way of learning analysis+synthesis sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward neural networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing task-specific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications.
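The idea of building feed-forward networks from the iterations of a sparse coding solver can be illustrated by unrolling ISTA for the l1 synthesis model. In the minimal sketch below, each layer computes z <- soft(W x + S z, theta) with W, S and theta initialized from ISTA; in a trainable (LISTA-style) encoder these would be learned from data. It covers only the synthesis case, without shift invariance or analysis priors, and all names are hypothetical.

```python
import numpy as np


def soft_threshold(x, theta):
    """Proximal operator of theta * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


def lasso_objective(D, x, z, lam):
    """0.5 * ||x - D z||^2 + lam * ||z||_1."""
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))


def make_unrolled_ista(D, lam, n_layers):
    """Feed-forward encoder whose layers replicate ISTA iterations.

    Each layer computes z <- soft(W x + S z, theta). Here W, S, theta are
    fixed at their ISTA values; a trainable encoder would learn them.
    """
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    W = D.T / L                              # input weights
    S = np.eye(D.shape[1]) - D.T @ D / L     # recurrent weights
    theta = lam / L                          # layer-wise threshold

    def encode(x):
        z = soft_threshold(W @ x, theta)     # first layer (ISTA step from z = 0)
        for _ in range(n_layers - 1):
            z = soft_threshold(W @ x + S @ z, theta)
        return z

    return encode, (W, S, theta)
```

With this initialization every layer is one exact ISTA step, so adding layers does not increase the Lasso objective; learning W, S and theta is what allows a few layers to approximate many iterations.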
Bilevel sparse models for polyphonic music transcription
In this work, we propose a trainable sparse model for automatic polyphonic music transcription, which incorporates several successful approaches into a unified optimization framework. Our model combines unsupervised synthesis models similar to latent component analysis and nonnegative factorization with metric learning techniques that allow supervised discriminative learning. We develop efficient stochastic gradient training schemes allowing unsupervised, semi-supervised, and fully supervised training of the model, as well as its adaptation to test data. We also present an efficient fixed-complexity, fixed-latency approximation that can replace iterative minimization algorithms in time-critical applications. Experimental evaluation on synthetic and real data shows promising initial results.
Sparse modeling of intrinsic correspondences
We present a novel sparse modeling approach to non-rigid shape matching using only the ability to detect repeatable regions. As the input to our algorithm, we are given only two sets of regions in two shapes; no descriptors are provided, so the correspondence between the regions is not known, nor do we know how many regions correspond in the two shapes. We show that even with such scarce information, it is possible to establish very accurate correspondence between the shapes by using methods from the field of sparse modeling; this is the first non-trivial use of sparse models in shape correspondence. We formulate the problem of permuted sparse coding, in which we solve simultaneously for an unknown permutation ordering the regions on two shapes and for an unknown correspondence in functional representation. We also propose a robust variant capable of handling incomplete matches. Numerically, the problem is solved efficiently by alternating the solution of a linear assignment and a sparse coding problem. The proposed methods are evaluated qualitatively and quantitatively on standard benchmarks containing both synthetic and scanned objects.
Coupled quasi-harmonic bases
State-of-the-art approaches to shape analysis, synthesis, and correspondence rely on the natural harmonic bases given by Laplacian eigenfunctions, which allow using classical tools from harmonic analysis on manifolds. However, many applications involving multiple shapes are hindered by the fact that Laplacian eigenbases computed independently on different shapes are often incompatible with each other. In this paper, we propose the construction of common approximate eigenbases for multiple shapes using approximate joint diagonalization algorithms, taking as input a set of corresponding functions (e.g., indicator functions of stable regions) on the shapes. We illustrate the benefits of the proposed approach on tasks from shape editing, pose transfer, correspondence, and similarity.
Audio restoration from multiple copies
A method for removing impulse noise from audio signals by fusing multiple copies of the same recording is introduced in this paper. The proposed algorithm exploits the fact that multiple copies of a given recording, all sharing the same master, are often available, while most degradations in audio signals are record-dependent. Our method first seeks the optimal non-rigid alignment of the signals that is robust to the presence of sparse outliers with arbitrary magnitude. Unlike previous approaches, we simultaneously find the optimal alignment of the signals and the impulsive degradation. This is obtained via continuous dynamic time warping computed by solving an Eikonal equation. We propose to use our approach in the derivative domain, reconstructing the signal by solving an inverse problem that resembles the Poisson image editing technique. The proposed framework is illustrated and tested here on the restoration of old gramophone recordings, showing promising results; however, it can be used in other applications where different copies of the signal of interest are available and the degradations are copy-dependent.
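The alignment step builds on dynamic time warping. The sketch below is the textbook discrete DTW with an absolute-difference (l1) local cost, chosen here as a simple stand-in: it is not the continuous, Eikonal-based formulation used in this work, and it does not model the impulsive degradation jointly. The function name and the cost choice are assumptions.

```python
def dtw(a, b):
    """Discrete dynamic time warping with an |a_i - b_j| local cost.

    Returns the total alignment cost and the optimal warping path as a list
    of (i, j) index pairs from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    C = [[inf] * (m + 1) for _ in range(n + 1)]  # accumulated cost table
    C[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            C[i][j] = cost + min(C[i - 1][j - 1], C[i - 1][j], C[i][j - 1])
    # Backtrack from the end, preferring the diagonal move on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        diag, up, left = C[i - 1][j - 1], C[i - 1][j], C[i][j - 1]
        best = min(diag, up, left)
        if diag == best:
            i, j = i - 1, j - 1
        elif up == best:
            i -= 1
        else:
            j -= 1
    return C[n][m], path[::-1]
```

The returned path lists matched sample indices (i, j); repeating an index on one side corresponds to locally stretching that signal relative to the other.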
Learnable low rank sparse models for speech denoising
In this paper, we present a framework for real-time enhancement of speech signals. Our method leverages a new process-centric approach for sparse and parsimonious models, where the representation pursuit is obtained by applying a deterministic function or process rather than by solving an optimization problem. We first propose a rank-regularized robust version of non-negative matrix factorization (NMF) for modeling time-frequency representations of speech signals, in which the spectral frames are decomposed as sparse linear combinations of atoms of a low-rank dictionary. Then, a parametric family of pursuit processes is derived from the iteration of the proximal descent method for solving this model. We present several experiments showing successful results and the potential of the proposed framework. Incorporating discriminative learning makes the proposed method significantly outperform exact NMF algorithms, with fixed latency and at a fraction of their computational complexity.
Geometric and photometric data fusion in non-rigid shape analysis
In this paper, we explore the use of the diffusion geometry framework for the fusion of geometric and photometric information in local and global shape descriptors. Our construction is based on the definition of a diffusion process on the shape manifold embedded into a high-dimensional space where the embedding coordinates represent the photometric information. Experimental results show that such data fusion is useful in coping with different challenges of shape analysis where pure geometric and pure photometric methods fail.
Partial shape matching without point-wise correspondence
Partial similarity of shapes is a challenging problem arising in many important applications in computer vision, shape analysis, and graphics, e.g., when one has to deal with partial information and acquisition artifacts. The problem is especially hard when the underlying shapes are non-rigid and are given up to a deformation. Partial matching is usually approached by computing local descriptors on a pair of shapes and then establishing a point-wise non-bijective correspondence between the two, taking into account possibly different parts. In this paper, we introduce an alternative correspondence-less approach to matching fragments to an entire shape undergoing a non-rigid deformation. We use diffusion geometric descriptors and optimize over the integration domains on which the integral descriptors of the two parts match. The problem is regularized using the Mumford-Shah functional. We show an efficient discretization based on the Ambrosio-Tortorelli approximation generalized to triangular meshes and point clouds, and present experiments demonstrating the success of the proposed method.
Learning spectral descriptors for deformable shape correspondence
Informative and discriminative feature descriptors play a fundamental role in deformable shape analysis. For example, they have been successfully employed in correspondence, registration, and retrieval tasks. In recent years, significant attention has been devoted to descriptors obtained from the spectral decomposition of the Laplace-Beltrami operator associated with the shape. Notable examples in this family are the heat kernel signature (HKS) and the recently introduced wave kernel signature (WKS). Laplacian-based descriptors achieve state-of-the-art performance in numerous shape analysis tasks; they are computationally efficient, isometry-invariant by construction, and can gracefully cope with a variety of transformations. In this paper, we formulate a generic family of parametric spectral descriptors. We argue that in order to be optimized for a specific task, the descriptor should take into account the statistics of the corpus of shapes to which it is applied (the “signal”) and those of the class of transformations to which it is made insensitive (the “noise”). While such statistics are hard to model axiomatically, they can be learned from examples. Following the spirit of the Wiener filter in signal processing, we show a learning scheme for the construction of optimized spectral descriptors and relate it to Mahalanobis metric learning. The superiority of the proposed approach in generating correspondences is demonstrated on synthetic and scanned human figures. We also show that the learned descriptors are robust enough to be learned on synthetic data and transferred successfully to scanned shapes.
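Spectral descriptors of this kind can be written as f(x) = sum_k a(lambda_k) phi_k(x)^2, where (lambda_k, phi_k) are Laplacian eigenpairs and a is a filter applied to the spectrum; the heat kernel signature is the case a(lambda) = exp(-t lambda). The sketch below evaluates such descriptors on a cycle-graph Laplacian, used here as a toy stand-in for a discretized Laplace-Beltrami operator (no mass matrix or cotangent weights). The helper names are hypothetical, and learning the filter a from examples is not shown.

```python
import numpy as np


def cycle_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-vertex cycle graph
    (a toy stand-in for a discretized Laplace-Beltrami operator)."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def spectral_descriptor(evals, evecs, filters):
    """f_j(x) = sum_k a_j(lambda_k) * phi_k(x)**2 for each filter a_j.

    Returns an array of shape (n_vertices, n_filters)."""
    A = np.stack([a(evals) for a in filters], axis=1)  # (K, J) filter responses
    return (evecs ** 2) @ A                            # (N, K) @ (K, J)


def heat_kernel_signature(evals, evecs, times):
    """HKS(x, t) = sum_k exp(-t * lambda_k) * phi_k(x)**2."""
    filters = [lambda lam, t=t: np.exp(-t * lam) for t in times]
    return spectral_descriptor(evals, evecs, filters)
```

For example, `evals, evecs = np.linalg.eigh(cycle_laplacian(16))` followed by `heat_kernel_signature(evals, evecs, [0.0, 1.0, 10.0])` gives a 16 x 3 descriptor. Because the cycle is vertex-transitive, every vertex receives the same HKS, which makes it a convenient sanity check rather than a realistic shape.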
Real-time online singing voice separation from monaural recordings using robust low-rank modeling
Separating the leading vocals from the musical accompaniment is a challenging task that appears naturally in several music processing applications. Robust principal component analysis (RPCA) has recently been applied to this problem with very successful results. The method decomposes the signal into a low-rank component corresponding to the accompaniment with its repetitive structure, and a sparse component corresponding to the voice with its quasi-harmonic structure. In this paper, we first introduce a non-negative variant of RPCA, termed robust low-rank non-negative matrix factorization (RNMF). This new framework is better suited to audio applications. We then propose two efficient feed-forward architectures that approximate RPCA and RNMF with low latency and at a fraction of the complexity of the original optimization method. These approximants allow incorporating elements of unsupervised, semi-supervised, and fully supervised learning into the RPCA and RNMF frameworks. Our basic implementation shows a speedup of several orders of magnitude compared to the exact solvers with no performance degradation, and allows online and faster-than-real-time processing. Evaluation on the MIR-1K dataset demonstrates state-of-the-art performance.
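For reference, the exact RPCA decomposition that such feed-forward architectures approximate can be computed with the inexact augmented Lagrange multiplier method of Lin, Chen and Ma, which alternates singular value thresholding (low-rank part) and entrywise soft thresholding (sparse part). The sketch below is that generic solver, not the RNMF model or the proposed approximants; the parameter defaults (lam = 1/sqrt(max(m, n)), rho = 1.5) are common choices assumed for illustration, and the function names are hypothetical. For singing voice separation, M would be a magnitude spectrogram, with the low-rank part modeling the accompaniment and the sparse part the voice.

```python
import numpy as np


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft(X, tau):
    """Entrywise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Split M into low-rank Lr plus sparse S by solving
        min ||Lr||_* + lam * ||S||_1   s.t.   M = Lr + S
    with the inexact augmented Lagrange multiplier method."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(M, 2)
    Y = M / max(norm_two, np.abs(M).max() / lam)  # dual variable initialization
    mu = 1.25 / norm_two                           # penalty parameter
    mu_max = mu * 1e7
    S = np.zeros_like(M)
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        Lr = svt(M - S + Y / mu, 1.0 / mu)        # low-rank update
        S = soft(M - Lr + Y / mu, lam / mu)       # sparse update
        Z = M - Lr - S                            # constraint residual
        Y = Y + mu * Z                            # dual ascent
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / norm_M < tol:
            break
    return Lr, S
```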
Putting the pieces together: regularized multi-shape partial matching
Multi-part shape matching is an important class of problems, arising in many fields such as computational archaeology, biology, geometry processing, computer graphics, and vision. In this paper, we address the problem of simultaneous matching and segmentation of multiple shapes. We assume that we are given a reference shape and multiple parts partially matching the reference. Each of these parts can contain additional clutter or overlap with other parts, and some parts may be missing. We show experimental results of efficient and accurate assembly of fractured synthetic and real objects.
Stable spectral mesh filtering
The rapid development of 3D acquisition technology has brought with it the need to perform standard signal processing operations, such as filtering, on 3D data. It has been shown that the eigenfunctions of the Laplace-Beltrami operator (manifold harmonics) of a surface play the role of the Fourier basis in Euclidean space; it is thus possible to formulate signal analysis and synthesis in the manifold harmonics basis. In particular, geometry filtering can be carried out in the manifold harmonics domain by decomposing the embedding coordinates of the shape in this basis. However, since the basis functions depend on the shape itself, such filtering is valid only for weak (near all-pass) filters, and produces severe artifacts otherwise. In this paper, we analyze this problem and propose the fractional filtering approach, wherein we iteratively apply weak fractional powers of the filter, each followed by an update of the basis functions. Experimental results show that this process produces more plausible and meaningful results.
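The fractional filtering idea can be sketched on a toy example: a closed polygon whose Laplacian weights depend on the vertex positions, so that, as on a mesh, the basis changes as the geometry is filtered. Instead of applying h(lambda) once, the sketch applies h(lambda)**(1/n_steps) n_steps times and recomputes the eigenbasis after each step. The Gaussian edge weights, the orthonormal (unweighted) eigenbasis, and the function names are assumptions for illustration, not the discretization used in the paper.

```python
import numpy as np


def geometric_laplacian(X, sigma=1.0):
    """Laplacian of a closed polygon whose edge weights depend on the vertex
    positions X (n, d): w_ij = exp(-|x_i - x_j|^2 / sigma^2) for neighbours.
    A toy stand-in for a shape-dependent Laplace-Beltrami discretization."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        w = np.exp(-np.sum((X[i] - X[j]) ** 2) / sigma**2)
        W[i, j] = W[j, i] = w
    return np.diag(W.sum(axis=1)) - W


def fractional_filter(X, h, n_steps=10, sigma=1.0):
    """Apply the non-negative spectral filter h(lambda) as n_steps weak filters
    h(lambda)**(1/n_steps), recomputing the eigenbasis after each step."""
    X = np.array(X, dtype=float)
    for _ in range(n_steps):
        evals, Phi = np.linalg.eigh(geometric_laplacian(X, sigma))
        coeffs = Phi.T @ X                               # analysis
        coeffs *= h(evals)[:, None] ** (1.0 / n_steps)   # weak fractional filter
        X = Phi @ coeffs                                 # synthesis
    return X
```

With `n_steps=1` this reduces to the one-shot filtering described above. A low-pass filter such as `lambda lam: np.exp(-0.5 * lam)` leaves the constant (zero-eigenvalue) component, and hence the centroid, unchanged.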
Intrinsic shape context descriptors for deformable shapes
In this work, we present intrinsic shape context (ISC) descriptors for 3D shapes. We generalize to surfaces the polar sampling of the image domain used in shape contexts; for this purpose, we chart the surface by shooting geodesics outwards from the point being analyzed; ‘angle’ is treated as tantamount to geodesic shooting direction, and radius as geodesic distance. To deal with orientation ambiguity, we exploit properties of the Fourier transform. Our charting method is intrinsic, i.e., invariant to isometric shape transformations. The resulting descriptor is a meta-descriptor that can be applied to any photometric or geometric property field defined on the shape; in particular, we can leverage recent developments in intrinsic shape analysis and construct ISC based on state-of-the-art dense shape descriptors such as heat kernel signatures. Our experiments demonstrate a notable improvement in shape matching on standard benchmarks.
A game-theoretic approach to deformable shape matching
We consider the problem of minimum distortion intrinsic correspondence between deformable shapes, many useful formulations of which give rise to the NP-hard quadratic assignment problem (QAP). Previous attempts to use the spectral relaxation have had limited success due to the lack of sparsity of the obtained “fuzzy” solution. In this paper, we adopt the recently introduced alternative L1 relaxation of the QAP based on the principles of game theory. We relate it to the Gromov and Lipschitz metrics between metric spaces and demonstrate on state-of-the-art benchmarks that the proposed approach is capable of finding very accurate sparse correspondences between deformable shapes.
Eurographics Workshop on 3D Object Retrieval
This book contains the research work presented at the fifth Eurographics Workshop on 3D Object Retrieval (3DOR), held in Cagliari, Italy on May 13, 2012. The 3DOR workshop series was started in Crete (2008), and then held in Munich (2009), Norrköping (2010) and Llandudno (2011), always as a co-event of the Annual Conference of the European Association for Computer Graphics (Eurographics). All five workshops are successful examples of international cooperation, and the attendance demonstrates the relevance of the focused topics. Demonstrating the increasing importance of the workshop, a record number of 23 papers were submitted this year. These papers were reviewed by an international Program Committee of 35 external experts in the area. Based on their recommendations, nine long papers were accepted for presentation at the workshop, giving an acceptance rate below 40%. Additionally, six poster presentations describing timely research results of high quality were included in the workshop program. As in previous editions of the 3DOR workshop, this year’s event hosted the seventh Shape Retrieval Contest (SHREC’12). The goal of the contest is to evaluate the effectiveness of 3D shape retrieval algorithms, thus playing an important role in the evolution of 3D Object Retrieval research. SHREC’12 contributes four additional papers to the proceedings, detailing the results of the competition. We are grateful to the Eurographics Association for their support, and to all reviewers for ensuring a high-quality program despite the tight schedule. Special thanks also go to Stefanie Behnke for her constant and timely attention. Finally, we hope that this workshop proves useful to all participants and sets the ground for long-term interaction, collaboration, and identification of future directions and potential problems in the field.
Stable volumetric features in deformable shapes
Region feature detectors and descriptors have become a successful and popular alternative to point descriptors in image analysis due to their high robustness and repeatability, leading to significant interest in the shape analysis community in finding analogous approaches in the 3D world. Recent works have successfully extended the maximally stable extremal region (MSER) detection algorithm to surfaces. In many applications, however, a volumetric shape model is more appropriate, and modeling shape deformations as approximate isometries of the volume of an object, rather than its boundary, better captures the natural behavior of non-rigid deformations. In this paper, we formulate a diffusion-geometric framework for volumetric stable component detection and description in deformable shapes. An evaluation of our method on the SHREC’11 feature detection benchmark and SCAPE human body scans shows its potential as a source of high-quality features. Examples demonstrating the drawbacks of surface stable components and the advantage of their volumetric counterparts are also presented.
Group-valued regularization for analysis of articulated motion
We present a novel method for the estimation of articulated motion in depth scans. The method is based on a framework for regularization of vector- and matrix-valued functions on parametric surfaces. We extend augmented-Lagrangian total variation regularization to smooth rigid motion cues on the scanned 3D surface obtained from a range scanner. We demonstrate that the resulting smoothed motion maps are a powerful tool in articulated scene understanding, providing a basis for rigid part segmentation with few prior assumptions on the scene, despite the noisy depth measurements that often appear in commodity depth scanners.
Learning efficient structured sparse models
We present a comprehensive framework for structured sparse coding and modeling, extending the recent ideas of using learnable fast regressors to approximate exact sparse codes. For this purpose, we propose an efficient feed-forward architecture derived from the iteration of the block-coordinate algorithm. This architecture approximates the exact structured sparse codes at a fraction of the complexity of standard optimization methods. We also show that, by using different training objective functions, the proposed learnable sparse encoders are not restricted to being approximants of the exact sparse code for a given dictionary, but can rather be used as full-featured sparse encoders or even modelers. A simple implementation shows a speedup of several orders of magnitude compared to state-of-the-art exact optimization algorithms at minimal performance degradation, making the proposed framework suitable for real-time and large-scale applications.
Parallelized algorithms for rigid surface alignment on GPU
Alignment and registration of rigid surfaces is a fundamental computational geometry problem with applications in medical imaging, automated target recognition, and robot navigation, to mention just a few. The family of iterative closest point (ICP) algorithms introduced by Chen and Medioni and by Besl and McKay, and improved over the three decades that followed, constitutes a classical solution to the problem. However, with the advent of geometry acquisition technologies and the applications they enable, it has become necessary to align dense surfaces containing millions of points in real time. The classical ICP algorithms, being essentially sequential procedures, are unable to meet this need. In this study, we follow the recent work by Mitra et al., considering ICP from the point of view of point-to-surface Euclidean distance map approximation. We propose a variant of the k-d tree data structure to store the approximation, and show its efficient parallelization on modern graphics processors. The flexibility of our implementation allows using different distance approximation schemes with a controllable trade-off between accuracy and complexity. It also allows almost straightforward adaptation to richer transformation groups. Experimental evaluation of the proposed approaches on a state-of-the-art GPU on very large datasets containing around 10^6 vertices shows real-time performance, up to three orders of magnitude faster than an efficient CPU-based version.
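For orientation, one iteration of classical point-to-point ICP alternates nearest-neighbor matching with a closed-form rigid fit (the SVD-based Kabsch solution). The sequential CPU sketch below uses brute-force nearest neighbors in place of the k-d tree distance-map approximation and GPU parallelization described above; it is meant only to show the basic loop, and the function names are hypothetical.

```python
import numpy as np


def best_rigid_transform(P, Q):
    """Least-squares R, t minimizing sum ||R p_i + t - q_i||^2 (Kabsch / SVD)."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def icp(P, Q, n_iter=30):
    """Point-to-point ICP aligning source P (n, 3) to target Q (m, 3).

    Brute-force nearest neighbours; returns the accumulated R, t such that
    P @ R.T + t approximately lies on Q."""
    R_total, t_total = np.eye(3), np.zeros(3)
    X = P.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)  # (n, m) distances
        matches = Q[d2.argmin(axis=1)]                           # closest points
        R, t = best_rigid_transform(X, matches)
        X = X @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t          # compose updates
    return R_total, t_total
```

The brute-force distance matrix costs O(nm) per iteration, which is exactly the bottleneck that spatial data structures and parallel hardware are used to remove for surfaces with millions of points.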
Articulated motion segmentation of point clouds by group-valued regularization
Motion segmentation for articulated objects is an important topic of research. Yet such a segmentation should be as free as possible from underlying assumptions so as to fit general scenes and objects. In this paper we demonstrate an algorithm for articulated motion segmentation of 3D point clouds, free of any assumptions on the underlying model and yet firmly set in a well-defined variational framework. Results on scanned images show the generality of the proposed technique and its robustness to scanning artifacts and noise.
Affine-invariant photometric heat kernel signatures
In this paper, we explore the use of the diffusion geometry framework for the fusion of geometric and photometric information in local shape descriptors. Our construction is based on the definition of a modified metric, which combines geometric and photometric information, and then the diffusion process on the shape manifold is simulated. Experimental results show that such data fusion is useful in shape retrieval experiments, where pure geometric and pure photometric methods fail. Apart from retrieval tasks, the proposed diffusion process may be employed in other applications.
LDAHash: improved matching with smaller descriptors
SIFT-like local feature descriptors are ubiquitously employed in computer vision applications such as content-based retrieval, video analysis, copy detection, object recognition, photo-tourism, and 3D reconstruction from multiple views. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, the real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Second, descriptors are usually high-dimensional (e.g., SIFT is represented as a 128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor data. We propose mapping the descriptor vectors into the Hamming space, in which the Hamming metric is used to compare the resulting representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
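The storage and comparison side of the mapping to Hamming space can be illustrated with binary codes assumed here to be of the form sign(Px + t), compared by counting differing bits. In the sketch below, P and t are random placeholders; in the approach described above they would be learned from examples, so this shows only the mechanics of packing and comparing codes, not the learned invariance. The function names are hypothetical.

```python
import numpy as np


def binarize(X, P, t):
    """Map descriptors X (n, d) to bits [P x + t > 0], packed into bytes.

    Returns a uint8 array of shape (n, ceil(k / 8)) for k = P.shape[0] bits."""
    bits = (X @ P.T + t) > 0
    return np.packbits(bits, axis=1)


def hamming(a, b):
    """Hamming distance between two packed binary codes (XOR, then bit count)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```

For instance, with a 64 x 128 matrix P, each 128-dimensional floating-point descriptor becomes an 8-byte code, and comparing two codes reduces to an XOR followed by a bit count.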
Scale Space and Variational Methods in Computer Vision
The International Conference on Scale Space and Variational Methods in Computer Vision (SSVM 2011) is the third edition of the conference born in 2007 as the joint edition of the Scale-Space Conferences (since 1997, Utrecht) and the Workshop on Variational, Geometric, and Level Set Methods (VLSM), which first took place in Vancouver in 2001. Previous editions in Ischia, Italy (2007) and Voss, Norway (2009) were very successful, materializing the hope of the first SSVM organizers, Profs. Sgallari, Murli, and Paragios, that the conference would ‘become a reference in the domain’. This year, SSVM was held in Kibbutz Ein-Gedi, Israel, a unique place on the shores of the Dead Sea, the global minimum on Earth. Despite its small size, Israel plays an important role in the worldwide scientific arena, and in particular in the fields of computer vision and image processing. Following the tradition of the previous SSVM conferences, we invited outstanding scientists to give keynote presentations. This year, it was our pleasure to welcome Prof. Haim Brezis (Université Pierre et Marie Curie, France), Dr. Remco Duits (Eindhoven University, The Netherlands), Prof. Stéphane Mallat (École Polytechnique, France), and Prof. Joachim Weickert (Saarland University, Germany). Additionally, we had six review lectures on topics of broad interest, given by experts in the field: Profs. Philip Rosenau (Tel Aviv University, Israel), Jing Yuan (University of Western Ontario, Canada), Patrizio Frosini (University of Bologna, Italy), Radu Horaud (INRIA, France), Gérard Medioni (University of Southern California, USA), and Elisabetta Carlini (La Sapienza, Italy). Out of 78 submitted papers, 24 were selected to be presented orally and 44 as posters. Over 100 people attended the conference, representing countries from all over the world, including Austria, China, France, Germany, Hong Kong, Israel, Italy, Japan, Korea, the Netherlands, Norway, Singapore, Slovakia, Switzerland, Turkey, and the USA.
We would like to thank the authors for their contributions, the members of the Program Committee for their dedication and timely review process, and Yana Katz and Boris Princ for local arrangements and organization, without which this conference would not have been possible. Finally, our special thanks go to the Technion Department of Computer Science, HP Laboratories Israel, Haifa, Rafael Ltd., Israel, BBK Technologies Ltd., Israel, and the European Community’s FP7 ERC/FIRST programs for their generous sponsorship.
Stable semi-local features for non-rigid shapes
Feature-based analysis is becoming a very popular approach for geometric shape analysis. Following the success of this approach in image analysis, there is a growing interest in finding analogous methods in the 3D world. Maximally stable component detection is a feature detection method for images with low computational cost and high repeatability. In this study, a diffusion-geometry-based framework for stable component detection is presented, which can be used for geometric feature detection in deformable shapes. The vast majority of studies of deformable 3D shapes model them as the two-dimensional boundary of the volume of the shape. Recent works have shown that a volumetric shape model is advantageous in numerous ways, as it better captures the natural behavior of non-rigid deformations. We show that our framework easily adapts to this volumetric approach, and even demonstrates superior performance. A quantitative evaluation of our methods on the SHREC’10 and SHREC’11 feature detection benchmarks, as well as qualitative tests on the SCAPE dataset, shows their potential as a source of high-quality features. Examples demonstrating the drawbacks of surface stable components and the advantage of their volumetric counterparts are also presented.
Group-valued regularization for motion segmentation of articulated shapes
Motion-based segmentation is an important tool for the analysis of articulated shapes. As such, it plays an important role in mechanical engineering, computer graphics, and computer vision. In this chapter, we study motion-based segmentation of 3D articulated shapes. We formulate motion-based surface segmentation as a piecewise-smooth regularization problem for the transformations between several poses. Using a Lie-group representation for the transformation at each surface point, we obtain a simple regularized fitting problem. An Ambrosio-Tortorelli scheme for a generalized Mumford-Shah model gives us the segmentation functional without assuming prior knowledge of the number of parts or even of the articulated nature of the object. Experiments on several standard datasets compare the results of the proposed method to state-of-the-art algorithms.
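The group-valued smoothness idea can be illustrated by a term that compares the rigid transforms of neighboring points through their relative transform g_i^{-1} g_j, which is the identity inside a rigid part. The sketch below is an assumption-laden simplification: it uses the squared rotation angle plus the squared relative translation instead of the full SE(3) logarithm, omits the Ambrosio-Tortorelli edge indicator, and all names are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def relative_motion(Ri, ti, Rj, tj):
    """Relative transform g_i^{-1} g_j for g = (R, t) acting as x -> R x + t."""
    return Ri.T @ Rj, Ri.T @ (tj - ti)

def discrepancy(Ri, ti, Rj, tj, w_rot=1.0, w_trans=1.0):
    """Simplified squared distance between neighboring transforms: squared angle of the
    relative rotation plus squared norm of the relative translation (a stand-in for
    the norm of the SE(3) logarithm)."""
    R, t = relative_motion(Ri, ti, Rj, tj)
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return w_rot * angle**2 + w_trans * float(t @ t)

def regularizer(Rs, ts, edges):
    """Sum of discrepancies over mesh edges (i, j); it vanishes inside rigid parts."""
    return sum(discrepancy(Rs[i], ts[i], Rs[j], ts[j]) for i, j in edges)

# Toy chain of six points: the first three stay fixed, the last three rotate by 0.5 rad.
Rs = [np.eye(3)] * 3 + [rot_z(0.5)] * 3
ts = [np.zeros(3)] * 6
edges = [(i, i + 1) for i in range(5)]
E = regularizer(Rs, ts, edges)   # only the edge (2, 3) crossing the "joint" contributes
```

In a segmentation functional, such a term would be down-weighted across detected part boundaries, so the cost concentrates at joints rather than being spread over the shape.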
3D features, surface descriptors, and object descriptors
The computer vision and pattern recognition communities have recently witnessed a surge of feature-based methods in numerous applications including object recognition and image retrieval. Similar concepts and analogous approaches are penetrating the world of 3D shape analysis, in a variety of areas including non-rigid shape retrieval and matching. In this chapter, we present the state-of-the-art of feature-based approaches in 3D shape analysis.
Are MSER features really interesting?
Detection and description of affine-invariant features is a cornerstone component in numerous computer vision applications. In this note, we analyze the notion of maximally stable extremal regions (MSER) through the prism of the curvature scale space, and conclude that in its original definition, MSER prefers regular (round) regions. Arguing that interesting features in natural images usually have irregular shapes, we propose alternative definitions of MSER which are free of this bias, yet maintain their invariance properties.
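For reference, the classical MSER stability criterion measures the relative area change of a nested extremal region as the threshold varies, q(t) = (|Q(t+Δ)| - |Q(t-Δ)|) / |Q(t)|, and selects local minima of q. The sketch below implements only this standard criterion on a precomputed area sequence (an assumption: a real detector obtains the areas from a component tree); the alternative definitions proposed in the note are not reproduced, and the function names are illustrative.

```python
import numpy as np

def stability(areas, delta):
    """areas[t]: area of the extremal region containing a fixed seed at threshold t
    (non-decreasing in t). Returns q[t] = (areas[t+delta] - areas[t-delta]) / areas[t],
    with NaN where the window falls outside the threshold range."""
    areas = np.asarray(areas, dtype=float)
    q = np.full(len(areas), np.nan)
    for t in range(delta, len(areas) - delta):
        q[t] = (areas[t + delta] - areas[t - delta]) / areas[t]
    return q

def maximally_stable(q):
    """Thresholds at which q has a strict local minimum (NaN borders are skipped)."""
    return [t for t in range(1, len(q) - 1)
            if np.isfinite(q[t - 1:t + 2]).all() and q[t] < q[t - 1] and q[t] < q[t + 1]]

areas = [1, 2, 4, 10, 11, 12, 13, 30, 60]   # the region barely grows between t = 3 and t = 6
selected = maximally_stable(stability(areas, 1))
```

Because q normalizes by area only, it says nothing directly about boundary shape; how this criterion interacts with region geometry is the subject of the analysis above.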
Spectral descriptors for deformable shapes
Informative and discriminative feature descriptors play a fundamental role in deformable shape analysis. For example, they have been successfully employed in correspondence, registration, and retrieval tasks. In recent years, significant attention has been devoted to descriptors obtained from the spectral decomposition of the Laplace-Beltrami operator associated with the shape. Notable examples in this family are the heat kernel signature (HKS) and the wave kernel signature (WKS). Laplacian-based descriptors achieve state-of-the-art performance in numerous shape analysis tasks; they are computationally efficient, isometry-invariant by construction, and can gracefully cope with a variety of transformations. In this paper, we formulate a generic family of parametric spectral descriptors. We argue that in order to be optimal for a specific task, the descriptor should take into account the statistics of the corpus of shapes to which it is applied (the “signal”) and those of the class of transformations to which it is made insensitive (the “noise”). While such statistics are hard to model axiomatically, they can be learned from examples. Following the spirit of the Wiener filter in signal processing, we show a learning scheme for the construction of optimal spectral descriptors and relate it to Mahalanobis metric learning. The superiority of the proposed approach is demonstrated on the SHREC’10 benchmark.
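HKS and WKS share the form f(x) = Σ_k τ(λ_k) φ_k(x)², a filter τ applied to the Laplacian eigenvalues and weighted by squared eigenfunctions; the HKS uses τ(λ) = exp(-tλ). A minimal sketch of this generic form follows. Assumptions: a combinatorial graph Laplacian stands in for a proper Laplace-Beltrami discretization (e.g., cotangent weights), the filters are fixed rather than learned, and the function names are hypothetical.

```python
import numpy as np

def graph_laplacian(n_vertices, edges):
    """Combinatorial Laplacian L = D - W of an unweighted graph (a crude stand-in for a
    discrete Laplace-Beltrami operator)."""
    W = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

def spectral_descriptor(L, filters, k=None):
    """Generic spectral descriptor: column j holds sum_k tau_j(lambda_k) * phi_k(x)^2."""
    lam, phi = np.linalg.eigh(L)            # ascending eigenvalues, orthonormal eigenvectors
    if k is not None:                       # optionally truncate to the first k eigenpairs
        lam, phi = lam[:k], phi[:, :k]
    return np.stack([(phi**2) @ tau(lam) for tau in filters], axis=1)

def heat_kernel_signature(L, times, k=None):
    """HKS(x, t) = sum_k exp(-lambda_k t) phi_k(x)^2, i.e. the generic form with heat filters."""
    return spectral_descriptor(L, [lambda lam, t=t: np.exp(-lam * t) for t in times], k)

# On a cycle graph every vertex looks the same, so the HKS is constant across vertices.
n = 8
L = graph_laplacian(n, [(i, (i + 1) % n) for i in range(n)])
hks = heat_kernel_signature(L, [0.0, 1.0])
```

A learning scheme in the spirit of the paragraph above would replace the fixed heat filters by a parametric bank whose coefficients are fitted to data.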
Affine-invariant diffusion geometry for the analysis of deformable 3D shapes
We introduce an (equi-)affine invariant diffusion geometry by which surfaces that go through squeeze and shear transformations can still be properly analyzed. The definition of an affine invariant metric enables us to construct an invariant Laplacian from which local and global geometric structures are extracted. Applications of the proposed framework demonstrate its power in generalizing and enriching the existing set of tools for shape analysis.
Shape recognition with spectral distances
Recent works have shown the use of diffusion geometry for various pattern recognition applications, including non-rigid shape analysis. In this paper, we introduce spectral shape distance as a general framework for distribution-based shape similarity and show that two recent methods for shape similarity due to Rustamov and Mahmoudi & Sapiro are particular cases thereof.
A correspondence-less approach to matching of deformable shapes
Finding a match between partially available deformable shapes is a challenging problem with numerous applications. The problem is usually approached by computing local descriptors on a pair of shapes and then establishing a point-wise correspondence between the two. In this paper, we introduce an alternative correspondence-less approach to matching fragments to an entire shape undergoing a non-rigid deformation. We use diffusion geometric descriptors and optimize over the integration domains on which the integral descriptors of the two parts match. The problem is regularized using the Mumford-Shah functional. We show an efficient discretization based on the Ambrosio-Tortorelli approximation generalized to triangular meshes. Experiments demonstrating the success of the proposed method are presented.
Photometric heat kernel signatures
In this paper, we explore the use of the diffusion geometry framework for the fusion of geometric and photometric information in local heat kernel signature shape descriptors. Our construction is based on the definition of a diffusion process on the shape manifold embedded into a high-dimensional space where the embedding coordinates represent the photometric information. Experimental results show that such data fusion is useful in coping with different challenges of shape analysis where pure geometric and pure photometric methods fail.
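One way to picture diffusion on a manifold embedded with photometric coordinates is a graph Laplacian whose edge weights depend on the joint geometric-photometric distance. The sketch below rests on our own simplifying assumptions: Gaussian edge weights on a mesh edge list, a color weight `alpha`, and a graph Laplacian instead of the Laplace-Beltrami operator of the embedded surface; names are illustrative only.

```python
import numpy as np

def photometric_laplacian(xyz, color, edges, alpha=1.0, sigma=1.0):
    """Weighted graph Laplacian on the embedding (xyz, alpha * color); edge weights are
    Gaussian in the joint geometric-photometric distance (hypothetical weighting)."""
    emb = np.hstack([xyz, alpha * color])
    n = len(xyz)
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = np.exp(-np.sum((emb[i] - emb[j]) ** 2) / sigma**2)
    return np.diag(W.sum(axis=1)) - W

def hks(L, times):
    """Heat kernel signature sum_k exp(-lambda_k t) phi_k(x)^2 for each time t."""
    lam, phi = np.linalg.eigh(L)
    return np.stack([(phi**2) @ np.exp(-lam * t) for t in times], axis=1)

# A straight path of six vertices: three dark vertices followed by three bright ones.
xyz = np.c_[np.arange(6.0), np.zeros(6), np.zeros(6)]
color = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
edges = [(i, i + 1) for i in range(5)]
h_geom = hks(photometric_laplacian(xyz, color, edges, alpha=0.0), [0.0, 1.0])
h_photo = hks(photometric_laplacian(xyz, color, edges, alpha=5.0), [0.0, 1.0])
```

With `alpha=0` the descriptor is purely geometric; with a large `alpha`, heat hardly crosses the color edge, so the descriptor also reflects the texture.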
Deformable shape retrieval by learning diffusion kernels
In classical signal processing, it is common to analyze and process signals in the frequency domain by representing the signal in the Fourier basis and filtering it by applying a transfer function to the Fourier coefficients. In some applications, it is possible to design an optimal filter. A classical example is the Wiener filter, which achieves the minimum mean squared error estimate for signal denoising. Here, we adopt similar concepts to construct optimal diffusion-geometric shape descriptors. The analogue of the Fourier basis is the set of eigenfunctions of the Laplace-Beltrami operator, in which many geometric constructions, such as diffusion metrics, can be represented. By designing a filter of the Laplace-Beltrami eigenvalues, it is theoretically possible to achieve invariance to different shape transformations, such as scaling. Given a set of shape classes with different transformations, we learn the optimal filter by minimizing the ratio between the diffusion distances it induces on known-similar pairs and those it induces on known-dissimilar pairs. The output of the proposed framework is a filter that is optimally tuned to handle transformations that characterize the training set.
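The filtered distance and the ratio criterion can be sketched as follows: d²(x, y) = Σ_k τ(λ_k)² (φ_k(x) - φ_k(y))², where τ(λ) = exp(-tλ) recovers the diffusion distance at time t. This is an illustration only. The path graph, the "similar" and "dissimilar" vertex pairs, and the grid search over t (instead of optimizing a general filter) are all hypothetical choices.

```python
import numpy as np

def filtered_distances(phi, lam, tau):
    """Squared spectral distances d^2(x, y) = sum_k tau(lambda_k)^2 (phi_k(x) - phi_k(y))^2
    between all pairs of vertices (rows of phi)."""
    F = phi * tau(lam)                       # scale eigenfunction k by the filter value
    sq = np.sum(F**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * F @ F.T, 0.0)

def ratio_objective(D2, positives, negatives):
    """Mean squared distance over known-similar pairs divided by that over known-dissimilar
    pairs; a good filter makes this small."""
    pos = np.mean([D2[i, j] for i, j in positives])
    neg = np.mean([D2[i, j] for i, j in negatives])
    return pos / neg

n = 10
W = np.diag(np.ones(n - 1), 1)
W = W + W.T                                              # path graph adjacency
L = np.diag(W.sum(axis=1)) - W
lam, phi = np.linalg.eigh(L)
positives = [(i, i + 1) for i in range(n - 1)]           # hypothetical "similar" pairs
negatives = [(i, n - 1 - i) for i in range(n // 2 - 1)]  # hypothetical "dissimilar" pairs

ts = [0.1, 1.0, 10.0]
best_t = min(ts, key=lambda t: ratio_objective(
    filtered_distances(phi, lam, lambda l: np.exp(-t * l)), positives, negatives))
```

With an all-pass filter and the full eigenbasis, every pair of distinct vertices is at squared distance 2, so the filter is what injects geometry into the metric.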
Group-valued regularization framework for motion segmentation of dynamic non-rigid shapes
Understanding articulated shape motion plays an important role in many applications in mechanical engineering, the movie industry, and the graphics and vision communities. In this paper, we study motion-based segmentation of articulated 3D shapes into rigid parts. We pose the problem as finding a group-valued map between the shapes describing the motion, forcing it to favor piecewise rigid motions. Our computation follows the spirit of the Ambrosio-Tortorelli scheme for Mumford-Shah segmentation, with a diffusion component suited for the group nature of the motion model. Experimental results demonstrate the effectiveness of the proposed method in non-rigid motion segmentation.
Discrete minimum distortion correspondence problems for non-rigid shape matching
Similarity and correspondence are two fundamental archetype problems in shape analysis, encountered in numerous applications in computer vision and pattern recognition. Many methods for shape similarity and correspondence boil down to the minimum-distortion correspondence problem, in which two shapes are endowed with certain structure, and one attempts to find the matching with the smallest structure distortion between them. Defining structures invariant to some class of shape transformations results in an invariant minimum-distortion correspondence or similarity. In this paper, we model shapes using local and global structures, formulate the invariant correspondence problem as binary graph labeling, and show how different choices of structure result in invariance under various classes of deformations.
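To make "smallest structure distortion" concrete, the sketch below uses pairwise distances as the structure and scores a bijection by its worst-case distortion, searching all bijections exhaustively. This brute force is a didactic stand-in that only works for a handful of points; it is not the binary graph labeling formulation described above, and the function names are hypothetical.

```python
import itertools
import numpy as np

def distortion(DX, DY, perm):
    """Max |d_X(x_i, x_j) - d_Y(y_perm[i], y_perm[j])| over all pairs (i, j)."""
    p = np.asarray(perm)
    return float(np.max(np.abs(DX - DY[np.ix_(p, p)])))

def min_distortion_correspondence(DX, DY):
    """Exhaustive search over bijections; feasible only for very small point sets."""
    n = len(DX)
    return min(itertools.permutations(range(n)), key=lambda p: distortion(DX, DY, p))

x = np.array([0.0, 1.0, 3.0, 7.0])
y = x[[2, 0, 3, 1]]                          # the same points, listed in another order
DX = np.abs(x[:, None] - x[None, :])
DY = np.abs(y[:, None] - y[None, :])
perm = min_distortion_correspondence(DX, DY)  # perm[i] is the index in y matched to x[i]
```

Swapping the distance matrices for geodesic or diffusion distances is what makes the same criterion invariant to bending rather than only to rigid motion.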
Shape palindromes: analysis of intrinsic symmetries in 2D articulated shapes
Analysis of intrinsic symmetries of non-rigid and articulated shapes is an important problem in pattern recognition with numerous applications ranging from medicine to computational aesthetics. Considering articulated planar shapes as closed curves, we show how to represent their extrinsic and intrinsic symmetries as self-similarities of local descriptor sequences, which in turn have a simple interpretation in the frequency domain. The problem of symmetry detection and analysis thus boils down to the analysis of descriptor sequence patterns. For that purpose, we show two efficient computational methods: one based on Fourier analysis, and another based on dynamic programming. Metaphorically, the latter can be compared to finding palindromes in text sequences.
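The palindrome metaphor can be made concrete. For a closed curve sampled into n local descriptors, a reflection maps sample i to sample (m - i) mod n for some axis index m, so a perfect reflective symmetry is a cyclic sequence that reads the same backwards about m. The sketch assumes reflection-invariant descriptors (e.g., unsigned curvature) and checks every axis by brute force in O(n²). It illustrates the criterion, not the Fourier or dynamic-programming methods above.

```python
import numpy as np

def reflection_costs(d):
    """cost[m] = mean |d[i] - d[(m - i) mod n]|; zero iff the cyclic descriptor sequence
    is a 'palindrome' about the reflection axis indexed by m."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    i = np.arange(n)
    return np.array([np.mean(np.abs(d - d[(m - i) % n])) for m in range(n)])

# Hypothetical descriptor values sampled along a closed curve.
costs = reflection_costs([1, 2, 3, 2, 1, 0])
axis = int(np.argmin(costs))   # the axis passes through samples 2 and 5
```

Using squared differences instead, all n costs could be obtained at once from a circular cross-correlation of the sequence with its reverse via the FFT, which connects this criterion to the frequency-domain view.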
Boosted metric learning for 3D multi-modal deformable registration
Defining a suitable metric is one of the biggest challenges in deformable image fusion from different modalities. In this paper, we propose a novel approach for multi-modal metric learning in the deformable registration framework that consists of embedding data from both modalities into a common metric space whose metric is used to parametrize the similarity. Specifically, we use image representation in the Fourier/Gabor space which introduces invariance to the local pose parameters, and the Hamming metric as the target embedding space, which allows constructing the embedding using boosted learning algorithms. The resulting metric is incorporated into a discrete optimization framework. Very promising results demonstrate the potential of the proposed method.
Affine-invariant geodesic geometry of deformable 3D shapes
Natural objects can be subject to various transformations yet still preserve properties that we refer to as invariants. Here, we use definitions of affine invariant arclength for surfaces in R3 in order to extend the set of existing non-rigid shape analysis tools. We show that by re-defining the surface metric as its equi-affine version, the surface with its modified metric tensor can be treated as a canonical Euclidean object on which most classical Euclidean processing and analysis tools can be applied. The new definition of a metric is used to extend the fast marching method for computing geodesic distances on surfaces, so that distances are now defined with respect to an affine invariant arclength. Applications of the proposed framework demonstrate its invariance, efficiency, and accuracy in shape analysis.
Diffusion-geometric maximally stable component detection in deformable shapes
Maximally stable component detection is a very popular method for feature analysis in images, mainly due to its low computation cost and high repeatability. With the recent advance of feature-based methods in geometric shape analysis, there is significant interest in finding analogous approaches in the 3D world. In this paper, we formulate a diffusion-geometric framework for stable component detection in non-rigid 3D shapes, which can be used for geometric feature detection and description. A quantitative evaluation of our method on the SHREC’10 feature detection benchmark shows its potential as a source of high-quality features.
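Stable components on a shape come from the connected components of sublevel sets of a scalar function on the mesh vertices. A common way to track them is a union-find pass over vertices sorted by value. The sketch below assumes a vertex-weighted graph (the function `f` could be a diffusion-based quantity, which is our assumption) and reports the area of the component containing a seed vertex at each threshold; areas of this kind are what a stability criterion is then evaluated on. Names are hypothetical.

```python
import numpy as np

class DisjointSet:
    """Union-find with path halving and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def seed_component_areas(f, edges, seed, thresholds):
    """Vertex count of the component of {v : f[v] <= t} containing `seed`, for each
    threshold t in increasing order (0 while the seed itself is above the threshold)."""
    n = len(f)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    order = np.argsort(f, kind="stable")
    ds, added, k, areas = DisjointSet(n), np.zeros(n, bool), 0, []
    for t in sorted(thresholds):
        while k < n and f[order[k]] <= t:      # insert vertices in increasing f
            v = order[k]
            added[v] = True
            k += 1
            for u in nbrs[v]:
                if added[u]:
                    ds.union(u, v)
        areas.append(ds.size[ds.find(seed)] if added[seed] else 0)
    return areas

f = np.array([5, 1, 2, 9, 3, 4, 6])            # hypothetical vertex values on a path graph
areas = seed_component_areas(f, [(i, i + 1) for i in range(6)], seed=1, thresholds=range(1, 10))
```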
Shape Google: geometric words and expressions for invariant shape retrieval
The computer vision and pattern recognition communities have recently witnessed a surge of feature-based methods in object recognition and image retrieval applications. These methods allow representing images as collections of “visual words” and treat them using text search approaches following the “bag of features” paradigm. In this paper, we explore analogous approaches in the 3D world applied to the problem of non-rigid shape retrieval in large databases. Using multiscale diffusion heat kernels as “geometric words”, we construct compact and informative shape descriptors by means of the “bag of features” approach. We also show that considering pairs of geometric words (“geometric expressions”) allows creating spatially-sensitive bags of features with better discriminativity. Finally, adopting metric learning approaches, we show that shapes can be efficiently represented as binary codes. Our approach achieves state-of-the-art results on the SHREC 2010 large-scale shape retrieval benchmark.
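A minimal sketch of the two representations follows: a bag of features obtained by soft vector quantization of point descriptors against a vocabulary of "geometric words", and a spatially-sensitive variant that counts co-occurring word pairs, F^T K F, where K measures proximity between points (e.g., a heat kernel). The Gaussian soft assignment, `sigma`, and the random data are assumptions made for illustration; the names are hypothetical.

```python
import numpy as np

def soft_assign(desc, vocab, sigma=1.0):
    """Soft vector quantization: row i holds the weights of descriptor i over the
    vocabulary (Gaussian in Euclidean distance, shifted for stability, rows sum to 1)."""
    d2 = ((desc[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma**2))
    return w / w.sum(axis=1, keepdims=True)

def bag_of_features(desc, vocab, sigma=1.0):
    """Shape-level histogram of geometric words: sum of per-point soft assignments."""
    return soft_assign(desc, vocab, sigma).sum(axis=0)

def spatially_sensitive_bof(desc, vocab, K, sigma=1.0):
    """Word-pair ('geometric expression') matrix F^T K F, with K[i, j] the proximity
    of points i and j on the shape."""
    F = soft_assign(desc, vocab, sigma)
    return F.T @ K @ F

rng = np.random.default_rng(0)
desc = rng.standard_normal((50, 16))    # hypothetical per-point descriptors
vocab = rng.standard_normal((8, 16))    # hypothetical vocabulary, e.g. from k-means
h = bag_of_features(desc, vocab)
S_all = spatially_sensitive_bof(desc, vocab, np.ones((50, 50)))
```

If every pair of points counts as "close" (K filled with ones), the pair matrix collapses to the outer product of the ordinary histogram with itself; a localized K is what retains spatial information.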
Metric approaches to invariant shape similarity
Non-rigid shapes are ubiquitous in Nature and are encountered at all levels of life, from macro to nano. The need to model such shapes and understand their behavior arises in many applications in imaging sciences, pattern recognition, computer vision, and computer graphics. Of particular importance is understanding which properties of the shape are attributed to deformations and which are invariant, i.e., remain unchanged. This chapter presents an approach to non-rigid shapes from the point of view of metric geometry. Modeling shapes as metric spaces, one can pose the problem of shape similarity as the similarity of metric spaces and harness tools from theoretical metric geometry for the computation of such a similarity.
Affine-invariant geodesic geometry of deformable 3D shapes
Natural objects can be subject to various transformations yet still preserve properties that we refer to as invariants. Here, we use definitions of affine invariant arclength for surfaces in R3 in order to extend the set of existing non-rigid shape analysis tools. In fact, we show that by re-defining the surface metric as its equi-affine version, the surface with its modified metric tensor can be treated as a canonical Euclidean object on which most classical Euclidean processing and analysis tools can be applied. The new definition of a metric is used to extend the fast marching method for computing geodesic distances on surfaces, so that distances are now defined with respect to an affine invariant arclength. Applications of the proposed framework demonstrate its invariance, efficiency, and accuracy in shape analysis.
Affine-invariant diffusion geometry for the analysis of deformable 3D shapes
We introduce an (equi-)affine invariant diffusion geometry by which surfaces that go through squeeze and shear transformations can still be properly analyzed. The definition of an affine invariant metric enables us to construct an invariant Laplacian from which local and global geometric structures are extracted. Applications of the proposed framework demonstrate its power in generalizing and enriching the existing set of tools for shape analysis.
Full and partial symmetries of non-rigid shapes
Symmetry and self-similarity are the cornerstone of Nature, exhibiting themselves through the shapes of natural creations and ubiquitous laws of physics. Since many natural objects are symmetric, the absence of symmetry can often be an indication of some anomaly or abnormal behavior. Therefore, detection of asymmetries is important in numerous practical applications, including crystallography, medical imaging, and face recognition, to mention a few. Conversely, the assumption of underlying shape symmetry can facilitate solutions to many problems in shape reconstruction and analysis. Traditionally, symmetries are described as extrinsic geometric properties of the shape. While adequate for rigid shapes, such a description is inappropriate for non-rigid ones: extrinsic symmetry can be broken as a result of shape deformations, while intrinsic symmetry is preserved. In this paper, we present a generalization of symmetries for non-rigid shapes and a numerical framework for their analysis, addressing the problems of full and partial exact and approximate symmetry detection and classification.
A Gromov-Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching
In this paper, the problem of non-rigid shape recognition is viewed from the perspective of metric geometry, and the applicability of diffusion distances within the Gromov-Hausdorff framework is explored. While the commonly used geodesic distance exploits the shortest path between points on the surface, the diffusion distance averages all paths connecting the points. The diffusion distance provides an intrinsic distance measure that is robust, in particular, to topological changes. Such changes may be a result of natural non-rigid deformations, as well as acquisition noise, in the form of holes or missing data, and representation noise due to inaccurate mesh construction. The presentation of the proposed framework is complemented with numerous examples demonstrating that, in addition to the relatively low complexity of computing diffusion distances between surface points, the framework's recognition and matching performance compares favorably to that of classical geodesic distances in the presence of topological changes between the non-rigid shapes.
Intrinsic regularity detection in 3D geometry
Automatic detection of symmetries, regularity, and repetitive structures in 3D geometry is a fundamental problem in shape analysis and pattern recognition, with applications in computer vision and graphics. Especially challenging is the detection of intrinsic regularity, where the repetitions lie on an intrinsic grid without any apparent Euclidean pattern describing the shape, but arise from (near-)isometric deformation of the underlying surface. In this paper, we employ multidimensional scaling to reduce the problem of intrinsic structure detection to the simpler problem of 2D grid detection. Potential 2D grids are then identified using autocorrelation analysis, refined using local fitting, validated, and finally projected back to the spatial domain. We test the detection algorithm on a variety of scanned plaster models in the presence of imperfections like missing data, noise, and outliers. We also present a range of applications including scan completion, shape editing, super-resolution, and structural correspondence.
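The reduction step relies on multidimensional scaling, which turns a matrix of distances into point coordinates. A minimal sketch of classical (Torgerson) MDS is shown below; whether the method above uses this particular MDS variant is an assumption, and in practice the input would be geodesic distances on the surface rather than the Euclidean toy distances used here.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: coordinates in R^dim whose Euclidean distances approximate D."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D**2) @ J                    # double-centered Gram matrix
    lam, V = np.linalg.eigh(B)
    idx = np.argsort(lam)[::-1][:dim]            # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(lam[idx], 0.0))

# A 4 x 3 grid of points: its distance matrix is embedded back into the plane.
xs, ys = np.meshgrid(np.arange(4.0), np.arange(3.0))
P = np.c_[xs.ravel(), ys.ravel()]
D = np.sqrt(((P[:, None] - P[None]) ** 2).sum(axis=-1))
E = classical_mds(D, 2)
DE = np.sqrt(((E[:, None] - E[None]) ** 2).sum(axis=-1))
```

The embedding is recovered only up to a rigid motion, which is harmless for grid detection since the subsequent autocorrelation analysis looks for translational repetitions.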
Spatially-sensitive affine-invariant image descriptors
Invariant image descriptors play an important role in many computer vision and pattern recognition problems such as image search and retrieval. A dominant paradigm today is that of “bags of features”, a representation of images as distributions of primitive visual elements. The main disadvantage of this approach is the loss of spatial relations between features, which often carry important information about the image. In this paper, we show how to construct spatially-sensitive image descriptors in which both the features and their relations are affine-invariant. Our construction is based on a vocabulary of pairs of features coupled with a vocabulary of invariant spatial relations between the features. Experimental results show the advantage of our approach in image retrieval applications.
Data fusion through cross-modality metric learning using similarity-sensitive hashing
Visual understanding is often based on measuring similarity between observations. Learning similarities specific to a certain perception task from a set of examples has been shown to be advantageous in various computer vision and pattern recognition problems. In many important applications, the data that one needs to compare come from different representations or modalities, and the similarity between such data operates on objects that may have different and often incommensurable structure and dimensionality. In this paper, we propose a framework for supervised similarity learning based on embedding the input data from two arbitrary spaces into the Hamming space. The mapping is expressed as a binary classification problem with positive and negative examples, and can be efficiently learned using boosting algorithms. The utility and efficiency of such a generic approach are demonstrated on several challenging applications including cross-representation shape retrieval and alignment of multi-modal medical images.
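The boosting formulation can be sketched as follows. Each hash bit is a pair of thresholded projections, one per modality, and a bit acts as a weak classifier that outputs +1 when the two bits of a training pair agree. Bits are then chosen greedily with AdaBoost-style reweighting of the pairs. This is a simplified sketch: candidate projections are drawn at random and thresholded at the median instead of being optimized, the toy "second modality" is a rotated copy of the first, and all names are hypothetical.

```python
import numpy as np

def learn_cross_modal_hash(X, Y, labels, n_bits=8, n_candidates=200, seed=0):
    """Greedy AdaBoost-style selection of hash bits for two modalities.
    X (m, dx) and Y (m, dy) hold the two sides of m training pairs; labels (m,) is +1 for
    similar and -1 for dissimilar pairs. A bit is ((p, a), (q, b)): x -> [p.x > a], y -> [q.y > b]."""
    rng = np.random.default_rng(seed)
    w = np.full(len(labels), 1.0 / len(labels))         # pair weights
    bits = []
    for _ in range(n_bits):
        best = None
        for _ in range(n_candidates):
            p, q = rng.standard_normal(X.shape[1]), rng.standard_normal(Y.shape[1])
            a, b = np.median(X @ p), np.median(Y @ q)
            # weak classifier: +1 if the pair's two bits agree, -1 otherwise
            h = np.where(X @ p > a, 1, -1) * np.where(Y @ q > b, 1, -1)
            err = w[h != labels].sum()
            if best is None or err < best[0]:
                best = (err, p, a, q, b, h)
        err, p, a, q, b, h = best
        err = np.clip(err, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * labels * h)             # up-weight misclassified pairs
        w /= w.sum()
        bits.append((p, a, q, b))
    return bits

def encode(Z, bits, modality):
    """Binary codes for modality 0 (uses p, a) or modality 1 (uses q, b)."""
    return np.stack([Z @ (p if modality == 0 else q) > (a if modality == 0 else b)
                     for p, a, q, b in bits], axis=1)

rng = np.random.default_rng(1)
m = 300
X = rng.standard_normal((m, 2))
R = np.array([[np.cos(0.7), -np.sin(0.7)], [np.sin(0.7), np.cos(0.7)]])
Y = X @ R.T                                   # hypothetical second modality: a rotated view
perm = rng.permutation(m)
labels = np.r_[np.ones(m), -np.ones(m)]
bits = learn_cross_modal_hash(np.vstack([X, X]), np.vstack([Y, Y[perm]]), labels)
ham_pos = (encode(X, bits, 0) != encode(Y, bits, 1)).sum(axis=1).mean()
ham_neg = (encode(X, bits, 0) != encode(Y[perm], bits, 1)).sum(axis=1).mean()
```

After training, corresponding cross-modal pairs should land at a smaller Hamming distance than random pairings, even though the two modalities are never compared directly.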
Volumetric heat kernel signatures
Invariant shape descriptors are instrumental in numerous shape analysis tasks including deformable shape comparison, registration, classification, and retrieval. Most existing constructions model a 3D shape as a two-dimensional surface describing the shape boundary, typically represented as a triangular mesh or a point cloud. Using intrinsic properties of the surface, invariant descriptors can be designed. One such example is the recently introduced heat kernel signature, based on the Laplace-Beltrami operator of the surface. In many applications, however, a volumetric shape model is more natural and convenient. Moreover, modeling shape deformations as approximate isometries of the volume of an object, rather than its boundary, better captures natural behavior of non-rigid deformations in many cases. Here, we extend the idea of heat kernel signature to robust isometry-invariant volumetric descriptors, and show their utility in shape retrieval. The proposed approach achieves state-of-the-art results on the SHREC 2010 large-scale shape retrieval benchmark.
Diffusion symmetries of non-rigid shapes
Detection and modeling of self-similarity and symmetry is important in shape recognition, matching, synthesis, and reconstruction. While the detection of rigid shape symmetries is well-established, the study of symmetries in non-rigid shapes is a much less researched problem. A particularly challenging setting is the detection of symmetries in non-rigid shapes affected by topological noise and asymmetric connectivity. In this paper, we treat shapes as metric spaces, with the metric induced by heat diffusion properties, and define non-rigid symmetries as self-isometries with respect to the diffusion metric. Experimental results show the advantage of the diffusion metric over the previously proposed geodesic metric for exploring intrinsic symmetries of bendable shapes with possible topological irregularities.
Nonlinear dimensionality reduction by topologically constrained isometric embedding
Many manifold learning procedures try to embed given feature data into a flat space of low dimensionality while preserving as much as possible the metric in the natural feature space. The embedding process usually relies on distances between neighboring features, mainly because distances between features that are far apart often provide an unreliable estimate of the true distance on the feature manifold due to its non-convexity. Distortions resulting from the indiscriminate use of long geodesics lead to a known limitation of the Isomap algorithm when used to map non-convex manifolds. We present a framework for nonlinear dimensionality reduction that uses both local and global distances in order to learn the intrinsic geometry of flat manifolds with boundaries. The resulting algorithm filters out potentially problematic distances between distant feature points based on the properties of the geodesics connecting those points and their relative distance to the boundary of the feature manifold, thus avoiding an inherent limitation of the Isomap algorithm. Since the proposed algorithm matches non-local structures, it is robust to strong noise. We show experimental results demonstrating the advantages of the proposed approach over conventional dimensionality reduction techniques, both global and local in nature.
SHREC 2010: robust large-scale shape retrieval benchmark
SHREC’10 robust large-scale shape retrieval benchmark simulates a retrieval scenario, in which the queries include multiple modifications and transformations of the same shape. The benchmark allows evaluating how algorithms cope with certain classes of transformations and what is the strength of the transformations that can be dealt with. The present paper is a report of the SHREC’10 robust large-scale shape retrieval benchmark results.
SHREC 2010: robust feature detection and description benchmark
Feature-based approaches have recently become very popular in computer vision and image analysis applications, and are becoming a promising direction in shape retrieval applications. The SHREC’10 robust feature detection and description benchmark simulates the feature detection and description stage of feature-based shape retrieval algorithms. The benchmark tests the performance of shape feature detectors and descriptors under a wide variety of different transformations. The benchmark allows evaluating how algorithms cope with certain classes of transformations and what is the strength of the transformations that can be dealt with. The present paper is a report of the SHREC’10 robust feature detection and description benchmark results.
SHREC 2010: robust correspondence benchmark
SHREC’10 robust correspondence benchmark simulates a one-to-one shape matching scenario, in which one of the shapes undergoes multiple modifications and transformations. The benchmark allows evaluating how correspondence algorithms cope with certain classes of transformations and what is the strength of the transformations that can be dealt with. The present paper is a report of the SHREC’10 robust correspondence benchmark results.
3D color video camera
We introduce a design of a coded light-based 3D color video camera optimized for build cost as well as for depth reconstruction accuracy and acquisition speed. The components of the system include a monochromatic camera and an off-the-shelf LED projector synchronized by a miniature circuit. The projected patterns are captured and processed at a rate of 200 fps and allow for real-time reconstruction of both depth and color at video rates. The reconstruction and display are performed at a rate of around 30 depth profiles and color textures per second using a graphics processing unit (GPU).
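Coded-light systems typically project a sequence of stripe patterns so that every camera pixel observes a binary code identifying the projector column that lit it. A binary Gray code is one common choice. The abstract does not state which patterns this camera uses, so the sketch below is a generic illustration of Gray-code pattern generation and decoding under that assumption; the names are hypothetical.

```python
import numpy as np

def gray_code_patterns(n_columns, n_bits):
    """Stack of n_bits stripe patterns (n_bits, n_columns); pattern k holds bit k
    (most significant first) of the Gray code of each projector column index."""
    cols = np.arange(n_columns)
    gray = cols ^ (cols >> 1)
    shifts = np.arange(n_bits - 1, -1, -1)
    return ((gray[None, :] >> shifts[:, None]) & 1).astype(np.uint8)

def decode_columns(bits):
    """Recover column indices from observed bit stacks (n_bits, ...) via Gray-to-binary."""
    gray = np.zeros(bits.shape[1:], dtype=np.int64)
    for b in bits:                      # assemble the Gray code, most significant bit first
        gray = (gray << 1) | b
    binary = gray.copy()
    shift = gray >> 1
    while np.any(shift):                # binary = g ^ (g >> 1) ^ (g >> 2) ^ ...
        binary ^= shift
        shift >>= 1
    return binary

patterns = gray_code_patterns(1024, 10)   # ten patterns address 1024 projector columns
columns = decode_columns(patterns)        # ideal, noise-free observation of every column
```

Adjacent columns differ in exactly one pattern, so a bit misread at a stripe boundary displaces the decoded column by at most one. Depth then follows by triangulating each camera pixel with its decoded projector column.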
ShapeGoogle: a computer vision approach for invariant shape retrieval
Feature-based methods have recently gained popularity in computer vision and pattern recognition communities, in applications such as object recognition and image retrieval. In this paper, we explore analogous approaches in the 3D world applied to the problem of non-rigid shape search and retrieval in large databases.
On reconstruction of non-rigid shapes with intrinsic regularization
Shape-from-X is a generic type of inverse problem in computer vision, in which a shape is reconstructed from some measurements. An especially challenging setting of this problem is the case in which the reconstructed shapes are non-rigid. In this paper, we propose a framework for intrinsic regularization of such problems. The assumption is that we have the geometric structure of a shape which is intrinsically (up to bending) similar to the one we would like to reconstruct. To that end, we formulate a variation with respect to the vertex coordinates of a triangulated mesh approximating the continuous shape. The numerical core of the proposed method is based on differentiating the fast marching update step for geodesic distance computation.
Topology-invariant similarity of nonrigid shapes
This paper explores the problem of similarity criteria between nonrigid shapes. Broadly speaking, such criteria are divided into intrinsic and extrinsic, the former referring to the metric structure of the object and the latter to how it is laid out in Euclidean space. Both criteria have their advantages and disadvantages: extrinsic similarity is sensitive to nonrigid deformations, while intrinsic similarity is sensitive to topological noise. In this paper, we approach the problem from the perspective of metric geometry. We show that by unifying the extrinsic and intrinsic similarity criteria, it is possible to obtain a stronger topology-invariant similarity, suitable for comparing deformed shapes with different topology. We construct this new joint criterion as a tradeoff between the extrinsic and intrinsic similarity and use it as a set-valued distance. Numerical results demonstrate the efficiency of our approach in cases where using either extrinsic or intrinsic criteria alone would fail.
Partial similarity of objects, or how to compare a centaur to a horse
Similarity is one of the most important abstract concepts in human perception of the world. In computer vision, numerous applications deal with comparing objects observed in a scene with some a priori known patterns. It often happens that two objects which are not similar as a whole nevertheless have large similar parts, that is, they are partially similar. Here, we present a novel approach to quantify partial similarity using the notion of Pareto optimality. We exemplify our approach on the problems of recognizing non-rigid geometric objects and images, and of analyzing text sequences.
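The set-valued view of partial similarity can be illustrated on a finite set of candidates. The sketch below is an illustration under assumptions and is not the authors' implementation. Each candidate pair of parts is assumed to be already scored by two criteria to be minimized: a dissimilarity and an "insignificance". Both are hypothetical numbers here. The hypothetical function `pareto_frontier` keeps exactly the Pareto-optimal candidates by sorting on the first criterion and sweeping.

```python
from typing import List, Tuple

def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated points when both coordinates are minimized.

    Each point is a hypothetical (dissimilarity, insignificance) score of a
    candidate pair of parts. A point is Pareto optimal if no other point is
    at least as good in both criteria and strictly better in one.
    """
    frontier = []
    best_second = float("inf")
    # Sort by the first criterion (ties broken by the second), then keep a point
    # only if it strictly improves the best second criterion seen so far.
    for a, b in sorted(points):
        if b < best_second:
            frontier.append((a, b))
            best_second = b
    return frontier
```

The output is a curve of trade-offs rather than a single number. This is what makes a distance built on this idea set-valued.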
Partial similarity of shapes using a statistical significance measure
Partial matching of geometric structures is important in computer vision, pattern recognition and shape analysis applications. The problem consists of matching similar parts of shapes that may be dissimilar as a whole. Recently, it was proposed to consider partial similarity as a multi-criterion optimization problem that tries to simultaneously maximize the similarity and the significance of the matching parts. A major challenge in that framework is providing a quantitative measure of the significance of a part of an object. Here, we define the significance of a part of a shape by its discriminative power with respect to a given shape database, that is, the uniqueness of the part. We define a point-wise significance density using a statistical weighting approach similar to the term frequency-inverse document frequency (tf-idf) weighting employed in search engines. The significance measure of a given part is obtained by integrating over this density. Numerical experiments show that the proposed approach produces intuitive significant parts, and demonstrate an improvement in the performance of partial matching between shapes.
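Below is a minimal sketch of a tf-idf-style significance density. It is not the paper's exact weighting. It assumes, hypothetically, that every surface point has already been quantized to a descriptor "word" id, and that each database shape is a list of such ids. The point-wise density is the smoothed inverse document frequency of the point's word. Integrating it over a part (a weighted sum with optional, hypothetical per-point area weights) then gives each word's count in the part times its idf, which is a tf-idf score.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

def significance_density(shape_words: Sequence[int],
                         database: List[Sequence[int]]) -> List[float]:
    """Point-wise significance: smoothed idf of each point's descriptor word."""
    n_shapes = len(database)
    doc_freq = Counter()
    for words in database:
        doc_freq.update(set(words))  # count each word once per database shape
    # Words present in every shape get weight 0; rare words weigh more.
    return [math.log((1 + n_shapes) / (1 + doc_freq[w])) for w in shape_words]

def part_significance(density: Sequence[float], part: Sequence[int],
                      areas: Optional[Sequence[float]] = None) -> float:
    """Discrete integral of the density over the point indices in `part`."""
    if areas is None:
        areas = [1.0] * len(density)  # hypothetical uniform area weights
    return sum(density[i] * areas[i] for i in part)
```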
Parallel algorithms for approximation of distance maps on parametric surfaces
We present an efficient O(n) numerical algorithm for first-order approximation of geodesic distances on geometry images, where n is the number of points on the surface. The structure of our algorithm allows efficient implementation on parallel architectures. Two implementations, one on a SIMD processor and one on a GPU, are discussed. Numerical results demonstrate up to four orders of magnitude improvement in execution time compared to state-of-the-art algorithms.
Regularized partial matching of rigid shapes
Matching of rigid shapes is an important problem in numerous applications across the boundary of computer vision, pattern recognition and computer graphics communities. A particularly challenging setting of this problem is partial matching, where the two shapes are dissimilar in general but have significant similar parts. In this paper, we present a rigorous approach for finding matching parts of rigid shapes with controllable size and regularity. The regularity term we use is similar in spirit to the Mumford-Shah functional, extended to non-Euclidean spaces. Numerical experiments show that regularized partial matching produces better results than non-regularized matching.
Analysis of two-dimensional non-rigid shapes
Analysis of deformable two-dimensional shapes is an important problem, encountered in numerous pattern recognition, computer vision, and computer graphics applications. In this paper, we address three major problems in the analysis of non-rigid shapes: similarity, partial similarity, and correspondence. We present an axiomatic construction of similarity criteria for deformation-invariant shape comparison, based on intrinsic geometric properties of the shapes, and show that such criteria are related to the Gromov-Hausdorff distance. Next, we extend the problem of similarity computation to shapes which have similar parts but are dissimilar when considered as a whole and present a construction of set-valued distances, based on the notion of Pareto optimality. Finally, we show that the correspondence between non-rigid shapes can be obtained as a byproduct of the non-rigid similarity problem. As a numerical framework, we use the generalized multidimensional scaling (GMDS) method, which is the numerical core of the three problems addressed in this paper.
Not only size matters: regularized partial matching of nonrigid shapes
Partial matching is probably one of the most challenging problems in nonrigid shape analysis. The problem consists of matching similar parts of shapes that are dissimilar as a whole and can assume different forms by undergoing nonrigid deformations. Conceptually, two shapes can be considered partially matching if they have significant similar parts, with the simplest definition of significance being the size of the parts. Thus, partial matching can be defined as a multicriterion optimization problem that tries to simultaneously maximize the similarity and the size of these parts. In this paper, we propose a different definition of significance, taking into account the regularity of parts besides their size. The regularity term proposed here is similar in spirit to the Mumford-Shah functional. Numerical experiments show that regularized partial matching produces semantically better results than non-regularized matching.
Numerical geometry of non-rigid shapes
Deformable objects are ubiquitous in the world surrounding us, on all levels from micro to macro. The need to study such shapes and model their behavior arises in a wide spectrum of applications, ranging from medicine to security. In recent years, non-rigid shapes have attracted growing interest, which has led to rapid development of the field, where state-of-the-art results from very different sciences – theoretical and numerical geometry, optimization, linear algebra, graph theory, machine learning and computer graphics, to mention several – are applied to find solutions.
This book gives an overview of the current state of science in analysis and synthesis of non-rigid shapes. Everyday examples are used to explain concepts and to illustrate different techniques. The presentation unfolds systematically and numerous figures enrich the engaging exposition. Practice problems follow at the end of each chapter, with detailed solutions to selected problems in the appendix. A gallery of colored images enhances the text.
This book will be of interest to graduate students, researchers and professionals in different fields of mathematics, computer science and engineering. It may be used for courses in computer vision, numerical geometry and geometric modeling and computer graphics or for self-study.
Topologically constrained isometric embedding
We present a new algorithm for nonlinear dimensionality reduction that consistently uses global information, which enables understanding the intrinsic geometry of non-convex manifolds. Compared to methods that consider only local information, our method appears to be more robust to noise. We demonstrate the performance of our algorithm and compare it to state-of-the-art methods on synthetic as well as real data.
Embedded system for 3D shape reconstruction
Many applications that use three-dimensional scanning require a low-cost, accurate and fast solution. This paper presents a fixed-point implementation of a real-time active stereo three-dimensional acquisition system on a Texas Instruments DM6446 EVM board which meets these requirements. A time-multiplexed structured light reconstruction technique is described and a fixed-point algorithm for its implementation is proposed. This technique uses a standard camera and a standard projector. The fixed-point reconstruction algorithm runs on the DSP core, while the ARM controls the DSP and is responsible for communication with the camera and projector. The ARM uses the projector to project coded light and the camera to capture a series of images. The captured data is sent to the DSP. The DSP, in turn, performs the 3D reconstruction and returns the results to the ARM for storage. The inter-core communication is performed using the xDM interface and VISA API. Performance evaluation of a fully working prototype proves the feasibility of a fixed-point embedded implementation of a real-time three-dimensional scanner, and the suitability of the DM6446 chip for such a system.
Calculus of non-rigid surfaces for geometry and texture manipulation
We present a geometric framework for automatically finding intrinsic correspondence between three-dimensional nonrigid objects. We model object deformation as near isometries and find the correspondence as the minimum-distortion mapping. A generalization of multidimensional scaling is used as the numerical core of our approach. As a result, we are able to manipulate the extrinsic geometry and the texture of the objects as vectors in a linear space. We demonstrate our method on the problems of expression-invariant texture mapping onto an animated three-dimensional face, expression exaggeration, morphing between faces, and virtual body painting.
Rock, Paper, and Scissors: extrinsic vs. intrinsic similarity of non-rigid shapes
This paper explores similarity criteria between non-rigid shapes. Broadly speaking, such criteria are divided into intrinsic and extrinsic, the former referring to the metric structure of the objects and the latter to the geometry of the shapes in Euclidean space. Both criteria have their advantages and disadvantages; extrinsic similarity is sensitive to non-rigid deformations of the shapes, while intrinsic similarity is sensitive to topological noise. Here, we present an approach unifying both criteria in a single distance. Numerical results demonstrate the robustness of our approach in cases where using only extrinsic or only intrinsic criteria fails.
Weighted distance maps computation on parametric three-dimensional manifolds
We propose an efficient computational solver for the eikonal equation on parametric three-dimensional manifolds. Our approach is based on the fast marching method, which solves the eikonal equation in O(n log n) steps by numerically simulating wavefront propagation. The obtuse angle splitting problem is reformulated as a set of small integer linear programs, which can be solved in O(n). Numerical simulations demonstrate the accuracy of the proposed algorithm.
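The sketch below shows the fast marching method this solver builds on. It makes a simplifying assumption: a regular Cartesian 2D grid with spacing `h`, rather than a parametric manifold. It does not reproduce the obtuse-angle splitting. The function names are hypothetical. Accepted points are finalized in increasing order of arrival time using a binary heap, which is where the O(n log n) cost comes from. Each tentative value is a first-order upwind solution of |∇T| = 1/speed computed from already-accepted neighbours only.

```python
import heapq
import math
import numpy as np

def _eikonal_update(T, accepted, i, j, f):
    """First-order upwind update at grid point (i, j); f = h / speed."""
    ny, nx = T.shape

    def smallest(p, q):
        # Minimum accepted value among the two neighbours along one axis.
        vals = [T[r, c] for r, c in (p, q)
                if 0 <= r < ny and 0 <= c < nx and accepted[r, c]]
        return min(vals) if vals else math.inf

    a = smallest((i - 1, j), (i + 1, j))  # vertical neighbours
    b = smallest((i, j - 1), (i, j + 1))  # horizontal neighbours
    a, b = min(a, b), max(a, b)
    if b - a >= f:  # also covers b == inf
        return a + f  # one-sided update
    # Two-sided update: larger root of (T - a)^2 + (T - b)^2 = f^2.
    return 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))

def fast_marching(speed, source, h=1.0):
    """Arrival times T with |grad T| = 1 / speed from a single source cell."""
    ny, nx = speed.shape
    T = np.full((ny, nx), np.inf)
    accepted = np.zeros((ny, nx), dtype=bool)
    T[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        _, (i, j) = heapq.heappop(heap)
        if accepted[i, j]:
            continue  # stale entry superseded by a smaller value
        accepted[i, j] = True
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < ny and 0 <= nj < nx and not accepted[ni, nj]:
                t_new = _eikonal_update(T, accepted, ni, nj, h / speed[ni, nj])
                if t_new < T[ni, nj]:
                    T[ni, nj] = t_new
                    heapq.heappush(heap, (t_new, (ni, nj)))
    return T
```

For example, `fast_marching(np.ones((21, 21)), (10, 10))` gives 1 + 1/√2 at a diagonal neighbour of the source. The exact Euclidean value there is √2. This overestimate is typical of the first-order scheme.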
Paretian similarity for partial comparison of non-rigid objects
In this paper, we address the problem of partial comparison of non-rigid objects. We introduce a new class of set-valued distances, related to the concept of Pareto optimality in economics. Such distances make it possible to capture intrinsic geometric similarity between parts of non-rigid objects, obtaining semantically meaningful comparison results. The numerical implementation of our method is computationally efficient and is similar to GMDS, a multidimensional scaling-like continuous optimization problem.
Partial similarity of objects and text sequences
Similarity is one of the most important abstract concepts in the human perception of the world. In computer vision, numerous applications deal with comparing objects observed in a scene with some a priori known patterns. It often happens that two objects which are not similar as a whole nevertheless have large similar parts, that is, they are partially similar. Here, we present a novel approach to quantify this semantic definition of partial similarity using the notion of Pareto optimality. We exemplify our approach on the problems of recognizing non-rigid objects and analyzing text sequences.
Expression-invariant representation of faces
We present an efficient computational framework for isometry-invariant comparison of smooth surfaces. We formulate the Gromov-Hausdorff distance as a multidimensional scaling (MDS)-like continuous optimization problem. In order to construct an efficient optimization scheme, we develop a numerical tool for interpolating geodesic distances on a sampled surface from precomputed geodesic distances between the samples. For isometry-invariant comparison of surfaces in the case of partially missing data, we present the partial embedding distance, which is computed using a similar scheme. The main idea is finding a minimum-distortion mapping from one surface to another while considering only relevant geodesic distances. We discuss numerical implementation issues and present experimental results that demonstrate the accuracy and efficiency of the approach.
Story of Cinderella: biometrics and isometry-invariant distances
In this chapter, we address the question of which facial measures one could use in order to distinguish between people. Our starting point is the fact that the expressions of our face can, in most cases, be modeled as isometries, which we validate empirically. Then, based on this observation, we introduce a technique that enables us to distinguish between people based on the intrinsic geometry of their faces. We provide empirical evidence that the proposed geometric measures are invariant to facial expressions and relate our findings to the broad context of biometric methods, ranging from modern face recognition technologies to fairy tales and biblical stories.
Symmetries of non-rigid shapes
Symmetry and self-similarity are the cornerstone of Nature, exhibiting themselves through the shapes of natural creations and ubiquitous laws of physics. Since many natural objects are symmetric, the absence of symmetry can often be an indication of some anomaly or abnormal behavior. Therefore, detection of asymmetries is important in numerous practical applications, including crystallography, medical imaging, and face recognition, to mention a few. Conversely, the assumption of underlying shape symmetry can facilitate solutions to many problems in shape reconstruction and analysis. Traditionally, symmetries are described as extrinsic geometric properties of the shape. While adequate for rigid shapes, such a description is inappropriate for non-rigid ones. Extrinsic symmetry can be broken as a result of shape deformations, while intrinsic symmetry is preserved. In this paper, we pose the problem of finding intrinsic symmetries of non-rigid shapes and propose an efficient method for their computation.
Robust expression-invariant face recognition from partially missing data
Recent studies on three-dimensional face recognition proposed to model facial expressions as isometries of the facial surface. Based on this model, expression-invariant signatures of the face were constructed by means of approximate isometric embedding into flat spaces. Here, we apply a new method for measuring isometry-invariant similarity between faces by embedding one facial surface into another. We demonstrate that our approach has several significant advantages, one of which is the ability to handle partially missing data. Promising face recognition results are obtained in numerical experiments even when the facial surfaces are severely occluded.
Matching two-dimensional articulated shapes using generalized multidimensional scaling
We present a theoretical and computational framework for matching of two-dimensional articulated shapes. Assuming that articulations can be modeled as near-isometries, we show an axiomatic construction of an articulation-invariant distance between shapes, formulated as a generalized multidimensional scaling (GMDS) problem and solved efficiently. Some numerical results demonstrating the accuracy of our method are presented.
Face2Face: an isometric model for facial animation
A geometric framework for finding intrinsic correspondence between animated 3D faces is presented. We model facial expressions as isometries of the facial surface and find the correspondence between two faces as the minimum-distortion mapping. Generalized multidimensional scaling is used for this purpose. We apply our approach to texture mapping onto 3D video, expression exaggeration and morphing between faces.
On separation of semitransparent dynamic images from static background
We consider the problem of recovering a dynamic image superimposed on a static background. This problem is ill-posed and may arise, e.g., in imaging through semireflective media, in the separation of an illumination image from a reflectance image, and in imaging with diffraction phenomena. In this work, we study regularization of this problem in the spirit of total variation and of general sparsifying transformations.
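As one concrete instance of a total-variation regularizer, the sketch below evaluates the discrete TV of an image. It uses forward differences and a replicated (Neumann) boundary. This discretization is an assumption, and the separation model and its solver are not shown. In 1D, the TV of a monotone signal equals its total rise whether the transition is sharp or gradual. For this reason TV does not penalize sharp edges more than smooth ramps.

```python
import numpy as np

def total_variation(u: np.ndarray, isotropic: bool = True) -> float:
    """Discrete TV of a 2D image using forward differences.

    Appending the last column/row makes the final difference zero, which
    corresponds to a replicated (Neumann) boundary condition.
    """
    dx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])  # vertical differences
    if isotropic:
        return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))
    return float(np.sum(np.abs(dx) + np.abs(dy)))  # anisotropic variant
```

For a binary image with a vertical unit step across four rows, both variants return 4.0, the length of the edge.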
Multigrid multidimensional scaling
Multidimensional scaling (MDS) is a generic name for a family of algorithms that construct a configuration of points in a target metric space from information about inter-point distances measured in some other metric space. Large-scale MDS problems often occur in data analysis, representation, and visualization. Solving such problems efficiently is of key importance in many applications. In this paper, we present a multigrid framework for MDS problems. We demonstrate the performance of our algorithm on dimensionality reduction and isometric embedding problems, two classical problems requiring efficient large-scale MDS. Simulation results show that the proposed approach significantly outperforms conventional MDS algorithms.
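A multigrid framework accelerates an iterative single-grid MDS solver. The sketch below shows one such baseline solver, unweighted SMACOF (stress majorization via the Guttman transform). The choice of SMACOF is an assumption, and the multigrid hierarchy itself is not shown. Each iteration replaces X by B(X)X/n. Here b_ij = -d_ij/‖x_i - x_j‖ off the diagonal, and the diagonal makes each row sum to zero. Each such step cannot increase the raw stress.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def stress(X, D):
    """Raw stress: sum over i < j of (||x_i - x_j|| - d_ij)^2."""
    return float(np.sum(np.triu(pairwise_dist(X) - D, k=1) ** 2))

def smacof(D, X0, n_iter=100):
    """Unweighted SMACOF: repeat the Guttman transform X <- B(X) X / n."""
    n = D.shape[0]
    X = np.array(X0, dtype=float)
    for _ in range(n_iter):
        dist = pairwise_dist(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = -np.where(dist > 0, D / dist, 0.0)  # b_ij = -d_ij / ||x_i - x_j||
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))  # b_ii = -sum_{j != i} b_ij
        X = B @ X / n
    return X
```

A centered configuration whose distances already equal D is a fixed point of this iteration.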
Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching
An efficient algorithm for isometry-invariant matching of surfaces is presented. The key idea is computing the minimum-distortion mapping between two surfaces. For this purpose, we introduce the generalized multidimensional scaling, a computationally efficient continuous optimization algorithm for finding the least distortion embedding of one surface into another. The generalized multidimensional scaling algorithm allows for both full and partial surface matching. As an example, it is applied to the problem of expression-invariant three-dimensional face recognition.
Expression invariant face recognition: faces as isometric surfaces
One of the hardest problems in face recognition is dealing with facial expressions. Finding an expression-invariant representation of the face could be a remedy for this problem. We suggest treating faces as deformable surfaces in the context of Riemannian geometry, and propose to approximate facial expressions as isometries of the facial surface. This way, we can define geometric invariants of a given face under different expressions. One such invariant is constructed by isometrically embedding the facial surface structure into a low-dimensional flat space. Based on this approach, we built an accurate three-dimensional face recognition system that is able to distinguish between identical twins under various facial expressions. In this chapter we show how under the near-isometric model assumption, the difficult problem of face recognition in the presence of facial expressions can be solved in a relatively simple way.
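The canonical-form idea can be sketched with classical (Torgerson) MDS. The sketch assumes the matrix D of pairwise geodesic distances between surface samples has already been computed, for example by fast marching. Classical MDS is used here only as a simple, closed-form stand-in; it is not necessarily the embedding solver used in the chapter. It double-centers the squared distances and keeps the leading eigenvectors, scaled by the square roots of their eigenvalues.

```python
import numpy as np

def classical_mds(D: np.ndarray, dim: int = 3) -> np.ndarray:
    """Embed points with pairwise distances D into R^dim (Torgerson MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:dim]  # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(evals[idx], 0.0, None))  # clip tiny negatives
    return evecs[:, idx] * scale
```

When D is exactly Euclidean in `dim` dimensions, the embedding reproduces it up to a rigid motion. Geodesic distances on a face generally are not Euclidean, so the flat embedding introduces some metric distortion.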
Three-dimensional face recognition
An expression-invariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modeled as isometries of the facial surface. This allows us to construct expression-invariant representations of faces using the canonical forms approach. The result is an efficient and accurate face recognition algorithm that is robust to facial expressions and able to distinguish between identical twins (the first two authors). We demonstrate a prototype system based on the proposed algorithm and compare its performance to classical face recognition methods. The numerical methods employed by our approach do not require the facial surface explicitly. The surface gradient field, or the surface metric, is sufficient for constructing the expression-invariant representation of any given face. This allows us to perform the 3D face recognition task while avoiding the surface reconstruction stage.
Quasi maximum likelihood blind deconvolution: super- and sub-Gaussianity versus consistency
In this note, we consider the problem of MIMO quasi maximum likelihood (QML) blind deconvolution. We examine two classes of estimators, which are commonly believed to be suitable for super- and sub-Gaussian sources. We state the consistency conditions and demonstrate a distribution for which the studied estimators are unsuitable, in the sense that they are asymptotically unstable.
Relative optimization for blind deconvolution
We propose a relative optimization framework for quasi-maximum likelihood (QML) blind deconvolution and the relative Newton method as its particular instance. The special Hessian structure allows fast construction and solution of the Newton system, resulting in a fast-convergent algorithm with iteration complexity comparable to that of gradient methods. We also propose the use of rational IIR restoration kernels, which constitute a richer family of filters than the traditionally used FIR kernels. We discuss different choices of non-linear functions suitable for deconvolution of super- and sub-Gaussian sources and formulate the conditions under which the QML estimation is stable. Simulation results demonstrate the efficiency of the proposed methods.
Blind deconvolution of images using optimal sparse representations
The relative Newton algorithm, previously proposed for quasi-maximum likelihood blind source separation and blind deconvolution of one-dimensional signals, is generalized for blind deconvolution of images. A smooth approximation of the absolute value is used in modeling the log probability density function, which is suitable for sparse sources. In addition, we propose a method of sparsification, which allows blind deconvolution of sources with arbitrary distribution, and show how to find optimal sparsifying transformations by training.
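One common smooth approximation of the absolute value in this line of work is φ_λ(t) = |t| - λ log(1 + |t|/λ). This particular form is an assumption here, and the paper's exact choice may differ. As λ → 0 it tends to |t|. Its derivative t/(λ + |t|) is bounded and continuous at zero, which is what Newton-type methods need. For sparse sources it plays the role of the negative log-density, up to constants.

```python
import numpy as np

def smooth_abs(t, lam=1e-2):
    """phi_lam(t) = |t| - lam * log(1 + |t| / lam); tends to |t| as lam -> 0."""
    a = np.abs(t)
    return a - lam * np.log1p(a / lam)

def smooth_abs_grad(t, lam=1e-2):
    """Derivative of phi_lam: t / (lam + |t|), continuous at t = 0."""
    return t / (lam + np.abs(t))
```

Smaller λ gives a closer fit to |t| but a sharper curvature near zero. This is the usual trade-off between sparsity modeling and conditioning of the Hessian.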
Expression-invariant face recognition via spherical embedding
Recently, it was shown empirically that facial expressions can be modeled as isometries; that is, geodesic distances on the facial surface are significantly less sensitive to facial expressions than Euclidean ones. Based on this assumption, the 3DFACE face recognition system was built. The system efficiently computes expression-invariant signatures based on an isometry-invariant representation of the facial surface. One of the crucial steps in the recognition system was the embedding of the geometric structure of the face into a Euclidean (flat) space. Here, we propose to replace the flat embedding by a spherical one to construct isometry-invariant representations of the facial image. We refer to these new invariants as spherical canonical images. Compared to its Euclidean counterpart, spherical embedding leads to notably smaller metric distortion. We demonstrate experimentally that representations with lower embedding error lead to better recognition. In order to efficiently compute the invariants, we introduce a dissimilarity measure between the spherical canonical images based on the spherical harmonic transform.
Unmixing tissues: sparse component analysis in multi-contrast MRI
We pose the problem of tissue classification in MRI as a blind source separation (BSS) problem and solve it by means of sparse component analysis (SCA). Assuming that most MR images can be sparsely represented, we consider their optimal sparse representation. Sparse components define a physically-meaningful feature space for classification. We demonstrate our approach on simulated and real multi-contrast MRI data. The proposed framework is general in that it is applicable to other modalities of medical imaging as well, whenever the linear mixing model is applicable.
Isometric embedding of facial surfaces into S^3
The problem of isometry-invariant representation and comparison of surfaces is of cardinal importance in pattern recognition applications dealing with deformable objects. In particular, in three-dimensional face recognition, treating facial expressions as isometries of the facial surface makes it possible to perform robust recognition insensitive to expressions. An isometry-invariant representation of surfaces can be constructed by isometrically embedding them into some convenient space and carrying out the comparison in that space. Presented here is a discussion of isometric embedding into S^3, which appears to be superior to the previously used Euclidean space in the sense of representation accuracy.
A multigrid approach for multi-dimensional scaling
A multigrid approach for the efficient solution of large-scale multidimensional scaling (MDS) problems is presented. The main motivation is a recent application of MDS to isometry-invariant representation of surfaces, in particular, for expression-invariant recognition of human faces. Simulation results show that the proposed approach significantly outperforms conventional MDS algorithms.
Sparse ICA for blind separation of transmitted and reflected images
We address the problem of recovering a scene recorded through a semi-reflecting medium (i.e., a planar lens), with a virtual reflected image being superimposed on the image of the scene transmitted through the semi-reflective lens. Recent studies propose imaging through a linear polarizer at several orientations to estimate the reflected and the transmitted components in the scene. In this study, we extend the sparse ICA (SPICA) technique and apply it to the problem of separating the image of the scene without any a priori knowledge about its structure or statistics. Recent advances in the SPICA approach are discussed. Simulation and experimental results demonstrate the efficacy of the proposed methods.
Fusion of 2D and 3D data in three-dimensional face recognition
We discuss the synthesis of 3D and 2D data in three-dimensional face recognition. We show how to compensate for illumination and facial expressions using the 3D facial geometry, and present the approach of canonical images, which makes it possible to incorporate geometric information into standard face recognition approaches.
Optimal sparse representations for blind source separation and blind deconvolution: a learning approach
We present a generic approach which makes it possible to adapt sparse blind deconvolution and blind source separation algorithms to arbitrary sources. The key idea is to bring the problem to the case in which the underlying sources are sparse by applying a sparsifying transformation to the mixtures. We present simulation results and show that such a transformation can be found by training. Properties of the optimal sparsifying transformation are highlighted by an example with aerial images.
Fast relative Newton algorithm for blind deconvolution of images
We present an efficient Newton-like algorithm for quasi-maximum likelihood (QML) blind deconvolution of images. This algorithm exploits the sparse structure of the Hessian. An optimal distribution-shaping approach by means of sparsification allows one to use a simple and convenient sparsity prior for processing a wide range of natural images. Simulation results demonstrate the efficiency of the proposed method.
Face recognition from facial surface metric
Recently, a 3D face recognition approach based on geometric invariant signatures has been proposed. The key idea is a representation of the facial surface that is invariant to isometric deformations, such as those resulting from facial expressions. One important stage in the construction of the geometric invariants involves measuring geodesic distances on triangulated surfaces, which is carried out by the fast marching on triangulated domains algorithm. Proposed here is a method that uses only the metric tensor of the surface for geodesic distance computation. That is, explicit integration of the surface in 3D from its gradients is not needed for the recognition task. This enables the use of simple and cost-efficient 3D acquisition techniques such as photometric stereo. Avoiding the explicit surface reconstruction stage saves computational time and reduces numerical errors.
Blind source separation using block-coordinate relative Newton method
Presented here is a generalization of the relative Newton method, recently proposed for quasi maximum likelihood blind source separation. The special structure of the Hessian matrix allows performing block-coordinate Newton descent, which significantly reduces the computational complexity of the algorithm and boosts its performance. Simulations based on artificial and real data showed that the separation quality of the proposed algorithm is superior to that of other accepted blind source separation methods.
Blind source separation using the block-coordinate relative Newton method
Presented here is a generalization of the modified relative Newton method, recently proposed by Zibulevsky for quasi-maximum likelihood blind source separation. The special structure of the Hessian matrix allows one to perform block-coordinate Newton descent, which significantly reduces the computational complexity of the algorithm and boosts its performance. Simulations based on artificial and real data show that the separation quality of the proposed algorithm exceeds that of other accepted blind source separation methods.
QML blind deconvolution: asymptotic analysis
Blind deconvolution is considered as a problem of quasi-maximum likelihood (QML) estimation of the restoration kernel. Simple closed-form expressions for the asymptotic estimation error are derived. The asymptotic performance bounds coincide with the Cramér-Rao bounds, when the true ML estimator is used. Conditions for asymptotic stability of the QML estimator are derived. Special cases when the estimator is super-efficient are discussed.
Optimal sparse representations for blind deconvolution of images
The relative Newton algorithm, previously proposed for quasi-maximum likelihood blind source separation and blind deconvolution of one-dimensional signals, is generalized for blind deconvolution of images. A smooth approximation of the absolute value is used in modeling the log probability density function, which is suitable for sparse sources. We propose a method of sparsification, which allows blind deconvolution of sources with arbitrary distribution, and show how to find optimal sparsifying transformations by training.
Quasi maximum likelihood blind deconvolution of images acquired through scattering media
We address the problem of restoring images obtained through a scattering medium. We present an efficient quasi-maximum likelihood blind deconvolution approach based on the fast relative Newton algorithm and an optimal distribution-shaping approach (sparsification), which allows one to use a simple and convenient sparsity prior for a wide class of images. Simulation results demonstrate the efficiency of the proposed method.
Optimal nonlinear line-of-flight estimation in positron emission tomography
We consider detection of high-energy photons in PET using thick scintillation crystals. The parallax effect and multiple Compton interactions in such crystals significantly reduce the accuracy of conventional detection methods. In order to estimate the photon line of flight based on photomultiplier responses, we use asymptotically optimal nonlinear techniques, implemented by feedforward and radial basis function (RBF) neural networks. Incorporating information about the angles of incidence of photons significantly improves the accuracy of estimation. The proposed estimators are fast enough to perform detection on conventional computers. Monte-Carlo simulation results show that our approach significantly outperforms the conventional Anger algorithm.
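For reference, the conventional Anger baseline estimates the interaction position as the signal-weighted centroid of the photomultiplier positions. The sketch below shows only this baseline, with a hypothetical PMT layout and function name. The neural-network estimators of the paper are not shown. The centroid carries no depth-of-interaction information, and this is one way to see why parallax degrades it in thick crystals.

```python
import numpy as np

def anger_position(pmt_xy, signals):
    """Signal-weighted centroid of photomultiplier positions (Anger logic)."""
    pmt_xy = np.asarray(pmt_xy, dtype=float)  # shape (n_pmt, 2), hypothetical layout
    w = np.asarray(signals, dtype=float)  # shape (n_pmt,), PMT responses
    return (w[:, None] * pmt_xy).sum(axis=0) / w.sum()
```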
Separation of semireflective layers using Sparse ICA
We address the problem of Blind Source Separation (BSS) of superimposed images and, in particular, consider the recovery of a scene recorded through a semi-reflective medium (e.g., a glass windshield) from its mixture with a virtual reflected image. We extend the Sparse ICA (SPICA) approach to BSS and apply it to the separation of the desired image from the superimposed images, without any a priori knowledge about its structure and/or statistics. Advances in the SPICA approach are discussed. Simulations and experimental results illustrate the efficiency of the proposed approach and of its specific implementation in a simple algorithm of low computational cost. The approach and the algorithm are generic in that they can be adapted and applied to a wide range of BSS problems involving one-dimensional signals or images.
Expression-invariant 3D face recognition
We present a novel 3D face recognition approach based on geometric invariants introduced by Elad and Kimmel. The key idea of the proposed algorithm is a representation of the facial surface, invariant to isometric deformations, such as those resulting from different expressions and postures of the face. The obtained geometric invariants allow mapping 2D facial texture images into special images that incorporate the 3D geometry of the face. These signature images are then decomposed into their principal components. The result is an efficient and accurate face recognition algorithm that is robust to facial expressions. We demonstrate the results of our method and compare it to existing 2D and 3D face recognition algorithms.
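The Elad and Kimmel invariants referred to here are, as we understand them, "canonical forms." Geodesic distances measured on the surface are embedded in a Euclidean space with multidimensional scaling (MDS). Because isometric deformations preserve geodesic distances, the embedding does not change under them, up to a rigid motion. The sketch below implements classical MDS (classical scaling) only. It checks the routine on Euclidean distances, where the reconstruction should be exact. In the face-recognition setting, `d` would instead hold geodesic distances on the facial surface, typically computed with fast marching, which is not shown. The function names are hypothetical.

```python
import numpy as np

def classical_mds(d, dim=3):
    """Embed points in R^dim so that Euclidean distances approximate the
    symmetric distance matrix d (classical scaling)."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(b)
    idx = np.argsort(eigval)[::-1][:dim]     # largest eigenvalues first
    lam = np.clip(eigval[idx], 0.0, None)    # guard against tiny negative values
    return eigvec[:, idx] * np.sqrt(lam)

def pairwise_distances(p):
    return np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)

# Sanity check on Euclidean input: distances are reproduced exactly
# (the embedding itself is only defined up to rotation and reflection).
rng = np.random.default_rng(0)
points = rng.standard_normal((40, 3))
d = pairwise_distances(points)
embedding = classical_mds(d, dim=3)
print(np.abs(pairwise_distances(embedding) - d).max())
```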
Reconstruction in ultrasound diffraction tomography using non-uniform FFT
We show an iterative reconstruction framework for diffraction ultrasound tomography. The use of broadband illumination allows a significant reduction in the number of projections compared to straight ray tomography. The proposed algorithm uses the forward nonuniform fast Fourier transform (NUFFT) for iterative Fourier inversion. Incorporating total variation regularization reduces noise and Gibbs phenomena while preserving edges. The complexity of the NUFFT-based reconstruction is comparable to that of the frequency-domain interpolation (gridding) algorithm, whereas its reconstruction accuracy (in the sense of the L2 and L∞ norms) is better.
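The structure of iterative Fourier inversion with total variation regularization can be shown on a small 1D problem. The sketch minimizes 0.5*||A x - b||^2 + lam*TV_eps(x) by gradient descent. Here A samples the spectrum of an n-pixel object at non-uniform frequencies, and TV_eps is a smoothed total variation. As an assumption made for simplicity, A is a dense non-uniform DFT matrix. A real NUFFT would compute the same products approximately in O(n log n), and diffraction tomography would use 2D frequency samples on arcs rather than the random 1D frequencies here. The step 1/L uses an upper bound L on the Lipschitz constant of the gradient, so each iteration does not increase the objective. The phantom, the sampling, and the parameters are all hypothetical.

```python
import numpy as np

def nudft_matrix(freqs, n):
    """Dense, normalized non-uniform DFT (a NUFFT approximates A @ x quickly)."""
    grid = np.arange(n) - n // 2
    return np.exp(-2j * np.pi * np.outer(freqs, grid) / n) / np.sqrt(n)

def tv_smooth(x, eps):
    """Smoothed total variation: sum_i sqrt((x[i+1] - x[i])^2 + eps^2)."""
    return np.sum(np.sqrt(np.diff(x) ** 2 + eps ** 2))

def tv_grad(x, eps):
    g = np.diff(x) / np.sqrt(np.diff(x) ** 2 + eps ** 2)
    out = np.zeros_like(x)
    out[:-1] -= g  # each difference term pulls on its left pixel...
    out[1:] += g   # ...and pushes on its right pixel
    return out

def objective(x, a, b, lam, eps):
    return 0.5 * np.sum(np.abs(a @ x - b) ** 2) + lam * tv_smooth(x, eps)

def reconstruct(a, b, lam=0.01, eps=0.05, iters=500):
    """Gradient descent with step 1/L, L = ||A||^2 + 4*lam/eps (Lipschitz bound)."""
    step = 1.0 / (np.linalg.norm(a, 2) ** 2 + 4 * lam / eps)
    x = np.zeros(a.shape[1])
    for _ in range(iters):
        grad = np.real(a.conj().T @ (a @ x - b)) + lam * tv_grad(x, eps)
        x = x - step * grad
    return x

# Hypothetical demo: piecewise-constant object, 48 frequency samples, 64 pixels.
rng = np.random.default_rng(0)
n = 64
x_true = np.zeros(n)
x_true[20:40] = 1.0
x_true[45:50] = 0.5
freqs = rng.uniform(-n / 2, n / 2, size=48)
a = nudft_matrix(freqs, n)
b = a @ x_true
x_rec = reconstruct(a, b)
print(np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))
```

Smoothing the TV term with eps keeps the objective differentiable, so plain gradient descent applies. Faster first-order schemes would fit the same structure.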
Iterative reconstruction in diffraction tomography using non-uniform fast Fourier transform
We show an iterative reconstruction framework for diffraction ultrasound tomography. The use of broadband illumination allows the number of projections to be reduced significantly compared to straight ray tomography. The proposed algorithm makes use of the forward non-uniform fast Fourier transform (NUFFT) for iterative Fourier inversion. Incorporation of total variation regularization allows noise and Gibbs phenomena to be reduced whilst preserving the edges.
Optimal nonlinear estimation of photon coordinates in PET
We consider the detection of high-energy photons in PET using thick scintillation crystals. The parallax effect and multiple Compton interactions in crystals of this type significantly reduce the accuracy of conventional detection methods. In order to estimate the scintillation point coordinates based on photomultiplier responses, we use asymptotically optimal nonlinear techniques, implemented by feed-forward neural networks, radial basis function (RBF) networks, and neuro-fuzzy systems. Incorporating information about the photons' angles of incidence significantly improves the accuracy of estimation. The proposed estimators are fast enough to perform detection using conventional computers.
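For the feed-forward network variant, the following is a minimal sketch of regressing 2D scintillation coordinates from photomultiplier responses. It assumes a hypothetical 3x3 photomultiplier grid with Gaussian light spread and no depth of interaction or Compton scattering. It uses a one-hidden-layer tanh network trained by full-batch gradient descent on the mean squared coordinate error. The architecture, learning rate, and data model are illustrative assumptions rather than the estimators of the paper, and the neuro-fuzzy variant is not shown.

```python
import numpy as np

# Hypothetical 3x3 photomultiplier grid (unit spacing).
PMT_XY = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)

def pmt_responses(points, sigma=0.8):
    """Assumed Gaussian light spread from each 2D scintillation point."""
    d2 = ((points[:, None, :] - PMT_XY[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_mlp(r, xy, hidden=16, lr=0.02, iters=4000, seed=0):
    """One-hidden-layer tanh network, full-batch gradient descent on MSE."""
    g = np.random.default_rng(seed)
    w1 = g.normal(0.0, 0.5, (r.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = g.normal(0.0, 0.5, (hidden, xy.shape[1]))
    b2 = np.zeros(xy.shape[1])
    losses = []
    for _ in range(iters):
        h = np.tanh(r @ w1 + b1)                 # hidden activations
        err = h @ w2 + b2 - xy                   # coordinate prediction error
        losses.append(np.mean(err ** 2))
        d_out = 2.0 * err / err.size             # dLoss / dPrediction
        d_h = (d_out @ w2.T) * (1.0 - h ** 2)    # backprop through tanh (before updating w2)
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (r.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return (w1, b1, w2, b2), losses

def predict(params, r):
    w1, b1, w2, b2 = params
    return np.tanh(r @ w1 + b1) @ w2 + b2

# Hypothetical training set: events uniformly distributed over the grid area.
rng = np.random.default_rng(1)
xy_train = rng.uniform(0.0, 2.0, (500, 2))
params, losses = train_mlp(pmt_responses(xy_train), xy_train)
print(losses[0], losses[-1])
```

Once trained, the estimator costs only two small matrix products per event, which is consistent with the point that such estimators are fast enough for conventional computers.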