Publications

    A. Boyarski, S. Vedula, A. M. Bronstein, Deep Matrix Factorization with Spectral Geometric Regularization, arXiv: 1911.07255, 2019

    Deep Matrix Factorization with Spectral Geometric Regularization

    A. Boyarski, S. Vedula, A. M. Bronstein
    arXiv: 1911.07255, 2019

    We address the problem of reconstructing a matrix from a subset of its entries. Current methods, branded as geometric matrix completion, augment classical rank regularization techniques by incorporating geometric information into the solution. This information is usually provided as graphs encoding relations between rows/columns. In this work, we propose a simple spectral approach for solving the matrix completion problem, via the framework of functional maps. We introduce the zoomout loss, a multiresolution spectral geometric loss inspired by recent advances in shape correspondence, whose minimization leads to state-of-the-art results on various recommender systems datasets. Surprisingly, for some datasets, we were able to achieve comparable results even without incorporating geometric information. This puts into question both the quality of such information and current methods’ ability to use it in a meaningful and efficient way.

     

    Code is available either as a Google Colab notebook or at https://github.com/amitboy/SGMC
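
    As a rough illustration of the underlying problem (reconstructing a matrix from a subset of its entries), the following minimal sketch fits a low-rank product to the observed entries by plain gradient descent. It is not the zoomout loss or the authors' SGMC code: the function name complete_matrix, the rank, the step size, and the synthetic data are all assumptions made for this example, and no geometric (graph) information is used.

```python
import numpy as np

def complete_matrix(observed, mask, rank, steps=3000, lr=0.005, seed=0):
    """Fit the observed entries with a rank-limited product A @ B.T (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    rows, cols = observed.shape
    a = 0.1 * rng.normal(size=(rows, rank))
    b = 0.1 * rng.normal(size=(cols, rank))
    losses = []
    for _ in range(steps):
        # Residual is evaluated on known entries only; unknown entries contribute nothing.
        residual = mask * (a @ b.T - observed)
        losses.append(0.5 * np.sum(residual ** 2))
        grad_a = residual @ b
        grad_b = residual.T @ a
        a -= lr * grad_a
        b -= lr * grad_b
    return a @ b.T, losses

# Synthetic rank-2 matrix with roughly 60% of the entries observed (assumed setup).
rng = np.random.default_rng(1)
truth = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
mask = rng.random(truth.shape) < 0.6
estimate, losses = complete_matrix(truth * mask, mask, rank=2)
```

    The factorization itself acts as the rank constraint here; geometric matrix completion methods add terms that tie rows and columns to their graphs.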

    Y. Nahshan, B. Chmiel, C. Baskin, E. Zheltonozhskii, R. Banner, A. M. Bronstein, A. Mendelson, Loss aware post-training quantization, arXiv: 1911.07190, 2019

    Loss aware post-training quantization

    Y. Nahshan, B. Chmiel, C. Baskin, E. Zheltonozhskii, R. Banner, A. M. Bronstein, A. Mendelson
    arXiv: 1911.07190, 2019

    Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. On the other hand, we show that with more aggressive quantization, the loss landscape becomes highly non-separable with sharp minima points, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation accompanies the paper.
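
    To make the INT8 versus INT4 gap concrete, here is a minimal sketch of a symmetric uniform quantizer applied to a random tensor. It only measures per-tensor mean squared error, which is a simplification: the paper studies the network loss and tunes layer parameters jointly. The function name quantize_uniform, the Gaussian "weights", and the max-abs clipping value are assumptions for illustration.

```python
import numpy as np

def quantize_uniform(x, num_bits, clip):
    """Symmetric uniform quantizer: clamp to [-clip, clip] and round to a signed integer grid."""
    levels = 2 ** (num_bits - 1) - 1      # 127 for INT8, 7 for INT4
    scale = clip / levels                 # step size of the grid
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale                      # de-quantized values

rng = np.random.default_rng(0)
weights = rng.normal(size=10_000)         # stand-in for a layer's weights
clip = np.abs(weights).max()              # naive choice of the clipping parameter

mse_int8 = np.mean((weights - quantize_uniform(weights, 8, clip)) ** 2)
mse_int4 = np.mean((weights - quantize_uniform(weights, 4, clip)) ** 2)
```

    With only 15 representable values at 4 bits, the choice of clip matters far more than at 8 bits, which is one way to read the abstract's point about parameter selection becoming harder under aggressive quantization.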

    Y. Nemcovsky, E. Zheltonozhskii, C. Baskin, B. Chmiel, A. M. Bronstein, A. Mendelson, Smoothed inference for adversarially-trained models, arXiv: 1911.07198, 2019

    Smoothed inference for adversarially-trained models

    Y. Nemcovsky, E. Zheltonozhskii, C. Baskin, B. Chmiel, A. M. Bronstein, A. Mendelson
    arXiv: 1911.07198, 2019

    Deep neural networks are known to be vulnerable to inputs with maliciously constructed adversarial perturbations aimed at forcing misclassification. We study randomized smoothing as a way to both improve performance on unperturbed data and increase robustness to adversarial attacks. Moreover, we extend the method proposed by arXiv:1811.09310 by adding low-rank multivariate noise, which we then use as a base model for smoothing. The proposed method achieves 58.5% top-1 accuracy on CIFAR-10 under PGD attack and outperforms previous works by 4%. In addition, we consider a family of attacks, which were previously used for training purposes in the certified robustness scheme. We demonstrate that the proposed attacks are more effective than PGD against both smoothed and non-smoothed models. Since our method is based on sampling, it lends itself well to trading off model inference complexity against performance. A reference implementation of the proposed techniques is provided.
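
    The sampling-based inference mentioned above can be sketched as a majority vote over noisy copies of the input. This is a generic randomized-smoothing sketch with isotropic Gaussian noise, not the low-rank multivariate noise of the paper; smoothed_predict, toy_classifier, and all parameter values are hypothetical names and settings chosen for this example.

```python
import numpy as np

def smoothed_predict(classify, x, sigma, num_samples, rng):
    """Return the class predicted most often for Gaussian-perturbed copies of x."""
    noise = rng.normal(scale=sigma, size=(num_samples,) + x.shape)
    votes = np.array([classify(x + n) for n in noise])
    return np.bincount(votes).argmax()

def toy_classifier(x):
    """Hypothetical base model: class 1 if the coordinates sum to a positive value, else 0."""
    return int(x.sum() > 0)

rng = np.random.default_rng(0)
x = np.array([0.5, 0.4])
label = smoothed_predict(toy_classifier, x, sigma=0.25, num_samples=200, rng=rng)
```

    The num_samples argument is the knob behind the complexity and performance trade-off: more samples cost more forward passes but give a more stable vote.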

    S. Doveh, E. Schwartz, C. Xue, R. Feris, A. M. Bronstein, R. Giryes, L. Karlinsky, MetAdapt: Meta-learned task-adaptive architecture for few-shot classification, arXiv: 1912.00412, 2019

    MetAdapt: Meta-learned task-adaptive architecture for few-shot classification

    S. Doveh, E. Schwartz, C. Xue, R. Feris, A. M. Bronstein, R. Giryes, L. Karlinsky
    arXiv: 1912.00412, 2019

    Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter during test time. This is also referred to as meta-learning. So far, meta-learning FSL methods have focused on optimizing parameters of pre-defined network architectures, in order to make them easily adaptable to novel tasks. Moreover, it was observed that, in general, larger architectures perform better than smaller ones up to a certain saturation point (and even degrade due to over-fitting). However, little attention has been given to explicitly optimizing the architectures for FSL, nor to an adaptation of the architecture at test time to particular novel tasks. In this work, we propose to employ tools borrowed from the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting. Additionally, to make the architecture task adaptive, we propose the concept of 'MetAdapt Controller' modules. These modules are added to the model and are meta-trained to predict the optimal network connections for a given novel task. Using the proposed approach we observe state-of-the-art results on two popular few-shot benchmarks: miniImageNet and FC100.

    T. Weiss, O. Senouf, S. Vedula, O. Michailovich, M. Zibulevsky, A. M. Bronstein, PILOT: Physics-Informed Learned Optimal Trajectories for accelerated MRI, arXiv:1909.05773, 2019

    PILOT: Physics-Informed Learned Optimal Trajectories for accelerated MRI

    T. Weiss, O. Senouf, S. Vedula, O. Michailovich, M. Zibulevsky, A. M. Bronstein
    arXiv:1909.05773, 2019

    Magnetic Resonance Imaging (MRI) has long been considered to be among “the gold standards” of diagnostic medical imaging. The long acquisition times, however, render MRI prone to motion artifacts, let alone their adverse contribution to the relatively high costs of MRI examination. Over the last few decades, multiple studies have focused on the development of both physical and post-processing methods for accelerated acquisition of MRI scans. These two approaches, however, have so far been addressed separately. On the other hand, recent works in optical computational imaging have demonstrated growing success of the concurrent learning-based design of data acquisition and image reconstruction schemes. Such schemes have already demonstrated substantial effectiveness, leading to considerably shorter acquisition times and improved quality of image reconstruction. Inspired by this initial success, in this work, we propose a novel approach to the learning of optimal schemes for conjoint acquisition and reconstruction of MRI scans, with the optimization carried out simultaneously with respect to the time-efficiency of data acquisition and the quality of resulting reconstructions. To be of practical value, the schemes are encoded in the form of general k-space trajectories, whose associated magnetic gradients are constrained to obey a set of predefined hardware requirements (as defined in terms of, e.g., peak currents and maximum slew rates of magnetic gradients). With this proviso in mind, we propose a novel algorithm for the end-to-end training of a combined acquisition-reconstruction pipeline using a deep neural network with differentiable forward- and backpropagation operators. We also demonstrate the effectiveness of the proposed solution in application to both image reconstruction and image segmentation, reporting substantial improvements in terms of acceleration factors as well as the quality of these end tasks.

    E. Rozenberg, D. Freedman, A. M. Bronstein, Localization with limited annotation for chest X-rays, ML4H, NeurIPS 2019

    Localization with limited annotation for chest X-rays

    E. Rozenberg, D. Freedman, A. M. Bronstein
    ML4H, NeurIPS 2019

    Localization of an object within an image is a common task in medical imaging. Learning to localize or detect objects typically requires the collection of data which has been labelled with bounding boxes or similar annotations, which can be very time consuming and expensive. A technique which could perform such learning with much less annotation would, therefore, be quite valuable. We present such a technique for localization with limited annotation, in which the number of images with bounding boxes can be a small fraction of the total dataset (e.g. less than 1%); all other images only possess a whole image label and no bounding box. We propose a novel loss function for tackling this problem; the loss is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning and is numerically well-posed. Furthermore, we propose a new architecture which accounts for both patch dependence and shift-invariance, through the inclusion of CRF layers and anti-aliasing filters, respectively. We apply our technique to the localization of thoracic diseases in chest X-ray images and demonstrate state-of-the-art localization performance on the ChestX-ray14 dataset.

    C. Baskin, B. Chmiel, E. Zheltonozhskii, R. Banner, A. M. Bronstein, A. Mendelson, CAT: Compression-aware training for bandwidth reduction, arXiv:1909.11481, 2019

    CAT: Compression-aware training for bandwidth reduction

    C. Baskin, B. Chmiel, E. Zheltonozhskii, R. Banner, A. M. Bronstein, A. Mendelson
    arXiv:1909.11481, 2019

    Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value.
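
    The "bits per value" figure quoted above is an entropy-style measure. As a minimal sketch of the idea (not the CAT training procedure itself), the empirical entropy of a quantized feature map gives a lower bound on the average code length an ideal entropy coder could reach. The function name entropy_bits_per_value and the two synthetic "feature maps" below are assumptions for illustration.

```python
import numpy as np

def entropy_bits_per_value(values):
    """Empirical Shannon entropy (in bits) of the discrete values in an array."""
    _, counts = np.unique(values, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

rng = np.random.default_rng(0)
# Stand-in for a 4-bit feature map whose values are spread evenly over all 16 levels.
dense = rng.integers(0, 16, size=10_000)
# Stand-in for a low-entropy feature map: about 90% of the activations are zero.
sparse = np.where(rng.random(10_000) < 0.9, 0, dense)

dense_bits = entropy_bits_per_value(dense)
sparse_bits = entropy_bits_per_value(sparse)
```

    A training objective that pushes activations toward the second kind of distribution is what makes cheap entropy coding at inference time effective.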

    S. Vedula, O. Senouf, G. Zurakov, A. M. Bronstein, O. Michailovich, M. Zibulevsky, Learning beamforming in ultrasound imaging, Proc. Medical Imaging with Deep Learning (MIDL), 2019

    Learning beamforming in ultrasound imaging

    S. Vedula, O. Senouf, G. Zurakov, A. M. Bronstein, O. Michailovich, M. Zibulevsky
    Proc. Medical Imaging with Deep Learning (MIDL), 2019

    Medical ultrasound (US) is a widespread imaging modality owing its popularity to cost-efficiency, portability, speed, and lack of harmful ionizing radiation. In this paper, we demonstrate that replacing the traditional ultrasound processing pipeline with a data-driven, learnable counterpart leads to significant improvement in image quality. Moreover, we demonstrate that greater improvement can be achieved through a learning-based design of the transmitted beam patterns simultaneously with learning an image reconstruction pipeline. We evaluate our method on an in-vivo first-harmonic cardiac ultrasound dataset acquired from volunteers and demonstrate the significance of the learned pipeline and transmit beam patterns on the image quality when compared to standard transmit and receive beamformers used in high frame-rate US imaging. We believe that the presented methodology provides a fundamentally different perspective on the classical problem of ultrasound beam pattern design.

    E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein, RepMet: Representative-based metric learning for classification and one-shot object detection, Proc. Computer Vision and Pattern Recognition (CVPR), 2019

    RepMet: Representative-based metric learning for classification and one-shot object detection

    E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein
    Proc. Computer Vision and Pattern Recognition (CVPR), 2019

    Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML, featuring joint learning of the embedding space and of the data distribution of the training categories, in a single training process. Our method improves upon leading algorithms for DML-based object classification. Furthermore, it opens the door to a new task in computer vision, few-shot object detection, since the proposed DML architecture can be naturally embedded as the classification head of any standard object detector. In numerous experiments, we achieve state-of-the-art classification results on a variety of fine-grained datasets, and offer the community a benchmark on the few-shot detection task, performed on the Imagenet-LOC dataset.

    O. Halimi, O. Litany, E. Rodolà, A. M. Bronstein, R. Kimmel, Self-supervised learning of dense shape correspondence, Proc. Computer Vision and Pattern Recognition (CVPR), 2019

    Self-supervised learning of dense shape correspondence

    O. Halimi, O. Litany, E. Rodolà, A. M. Bronstein, R. Kimmel
    Proc. Computer Vision and Pattern Recognition (CVPR), 2019

    We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in the pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it with a purely geometric criterion. The resulting learning model is class-agnostic and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize in the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.

    A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, A. M. Bronstein, LaSO: Label-Set Operations networks for multi-label few-shot learning, Proc. Computer Vision and Pattern Recognition (CVPR), 2019

    LaSO: Label-Set Operations networks for multi-label few-shot learning

    A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, A. M. Bronstein
    Proc. Computer Vision and Pattern Recognition (CVPR), 2019

    Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classification scenario. We propose to combine pairs of given examples in feature space, so that the resulting synthesized feature vectors will correspond to examples whose label sets are obtained through certain set operations on the label sets of the corresponding input pairs. Thus, our method is capable of producing a sample containing the intersection, union or set-difference of labels present in two input samples. As we show, these set operations generalize to labels unseen during training. This enables performing augmentation on examples of novel categories, thus facilitating multi-label few-shot classifier learning. We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines.
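
    The three set operations can be illustrated on multi-hot label vectors. This sketch only computes the target label sets that a synthesized example should carry; the paper performs the combination in feature space with learned networks, which is not reproduced here. The function name label_set_targets and the four-class vocabulary are hypothetical.

```python
import numpy as np

def label_set_targets(labels_a, labels_b):
    """Target multi-hot label vectors for the intersection, union, and difference of two label sets."""
    a = np.asarray(labels_a, dtype=bool)
    b = np.asarray(labels_b, dtype=bool)
    return {
        "intersection": (a & b).astype(int),   # labels present in both images
        "union": (a | b).astype(int),          # labels present in either image
        "difference": (a & ~b).astype(int),    # labels in the first image but not the second
    }

# Hypothetical vocabulary order: [person, dog, bicycle, car]
image_a = [1, 1, 0, 0]   # person, dog
image_b = [0, 1, 1, 0]   # dog, bicycle
targets = label_set_targets(image_a, image_b)
```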

    A. Zabatani, V. Surazhsky, E. Sperling, S. Ben Moshe, O. Menashe, D. H. Silver, Z. Karni, A. M. Bronstein, M. M. Bronstein, R. Kimmel, Intel RealSense SR300 Coded light depth Camera, IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 2019

    Intel RealSense SR300 Coded light depth Camera

    A. Zabatani, V. Surazhsky, E. Sperling, S. Ben Moshe, O. Menashe, D. H. Silver, Z. Karni, A. M. Bronstein, M. M. Bronstein, R. Kimmel
    IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 2019

    Intel RealSense SR300 is a depth camera capable of providing a VGA-size depth map at 60 fps and 0.125 mm depth resolution. In addition, it outputs an infrared VGA-resolution image and a 1080p color texture image at 30 fps. The SR300 form factor enables it to be integrated into small consumer products and as a front-facing camera in laptops and Ultrabooks. The SR300 depth camera is based on a coded-light technology where triangulation between projected patterns and images captured by a dedicated sensor is used to produce the depth map. Each projected line is coded by a special temporal optical code, which enables a dense depth map reconstruction from its reflection. The solid mechanical assembly of the camera allows it to stay calibrated throughout temperature and pressure changes, drops, and hits. In addition, active dynamic control maintains a calibrated depth output. An extended API LibRS released with the camera allows developers to integrate the camera in various applications. Algorithms for 3D scanning, facial analysis, hand gesture recognition, and tracking are within reach for applications using the SR300. In this paper, we describe the underlying technology, hardware, and algorithms of the SR300, as well as its calibration procedure, and outline some use cases. We believe that this paper will provide a full case study of a mass-produced depth sensing product and technology.

    Y. Zur, C. Baskin, E. Zheltonozhskii, B. Chmiel, I. Evron, A. M. Bronstein, A. Mendelson, Towards learning of filter-level heterogeneous compression of convolutional neural networks, Proc. AutoML Workshop, Int'l Conf. on Machine Learning (ICML), 2019

    Towards learning of filter-level heterogeneous compression of convolutional neural networks

    Y. Zur, C. Baskin, E. Zheltonozhskii, B. Chmiel, I. Evron, A. M. Bronstein, A. Mendelson
    Proc. AutoML Workshop, Int'l Conf. on Machine Learning (ICML), 2019

    Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization relying on the use of low-precision arithmetic representation for the weights and the activations. Another popular method is the pruning of the number of filters in each layer. While mainstream deep learning methods train the neural network weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter also amenable to training. In this paper, we formulate optimal arithmetic bit length allocation and neural network pruning as a NAS problem, searching for the configurations satisfying a computational complexity budget while maximizing the accuracy. We use a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al. (2019a). We show, by grid search, that heterogeneous quantized networks suffer from a high variance which renders the benefit of the search questionable. For pruning, improvement over homogeneous cases is possible, but it is still challenging to find those configurations with the proposed method. The code is publicly available at https://github.com/yochaiz/Slimmable and https://github.com/yochaiz/darts-UNIQ.

    T. Weiss, S. Vedula, O. Senouf, A. M. Bronstein, O. Michailovich, M. Zibulevsky, Joint learning of Cartesian undersampling and reconstruction for accelerated MRI, Proc. Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2020

    Joint learning of Cartesian undersampling and reconstruction for accelerated MRI

    T. Weiss, S. Vedula, O. Senouf, A. M. Bronstein, O. Michailovich, M. Zibulevsky
    Proc. Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2020

    Magnetic Resonance Imaging (MRI) is considered today the gold-standard modality for soft tissues. The long acquisition times, however, make it more prone to motion artifacts as well as contribute to the relatively high costs of this examination. Over the years, multiple studies have concentrated on designing reduced measurement schemes and image reconstruction schemes for MRI; however, these problems have so far been addressed separately. On the other hand, recent works in optical computational imaging have demonstrated growing success of the simultaneous learning-based design of the acquisition and reconstruction schemes, manifesting significant improvement in the reconstruction quality with a constrained time budget. Inspired by these successes, in this work, we propose to learn accelerated MR acquisition schemes (in the form of Cartesian trajectories) jointly with the image reconstruction operator. To this end, we propose an algorithm for training the combined acquisition-reconstruction pipeline end-to-end in a differentiable way. We demonstrate the significance of using the learned Cartesian trajectories at different speed-up rates.
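
    For readers unfamiliar with Cartesian undersampling, the sketch below keeps a subset of k-space rows and reconstructs by zero-filled inverse FFT. The mask is fixed and hand-picked, whereas the paper learns it jointly with a reconstruction network; the function name cartesian_undersample, the random test image, and the choice of rows are assumptions for this example.

```python
import numpy as np

def cartesian_undersample(image, keep_rows):
    """Keep only the selected k-space rows and reconstruct by zero-filled inverse FFT."""
    kspace = np.fft.fft2(image)
    mask = np.zeros(image.shape[0], dtype=bool)
    mask[keep_rows] = True
    undersampled = kspace * mask[:, None]      # zero out the rows that were not acquired
    return np.fft.ifft2(undersampled), mask

rng = np.random.default_rng(0)
image = rng.random((32, 32))                   # stand-in for an MR image

# Keeping every row reproduces the image up to floating-point error.
full, _ = cartesian_undersample(image, np.arange(32))

# 2x acceleration: keep the 16 lowest-frequency rows. In the unshifted FFT layout
# the low frequencies sit at both ends of the row axis.
low_freq_rows = np.r_[0:8, 24:32]
zero_filled, mask = cartesian_undersample(image, low_freq_rows)
error = np.linalg.norm(zero_filled.real - image)
```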

    B. Chmiel, C. Baskin, R. Banner, E. Zheltonozhskii, Y. Yermolin, A. Karbachevsky, A. M. Bronstein, A. Mendelson, Feature map transform coding for energy-efficient CNN inference, Proc. Intl. Joint Conf. on Neural Networks (IJCNN), 2020

    Feature map transform coding for energy-efficient CNN inference

    B. Chmiel, C. Baskin, R. Banner, E. Zheltonozhskii, Y. Yermolin, A. Karbachevsky, A. M. Bronstein, A. Mendelson
    Proc. Intl. Joint Conf. on Neural Networks (IJCNN), 2020

    Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their relatively high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory bandwidth due to the storage of intermediate activation calculation results. Our method exploits the high correlations between feature maps and adjacent pixels and makes it possible to halve the data transfer volumes to the main memory without re-training. We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet18 with our approach results in a reduction of around 40% in the memory energy footprint compared to a quantized network, with negligible impact on accuracy. A reference implementation accompanies the paper.

    E. Amrani, R. Ben-Ari, T. Hakim, A. M. Bronstein, Toward self-supervised object detection in unlabelled videos, arXiv:1905.11137, 2019

    Toward self-supervised object detection in unlabelled videos

    E. Amrani, R. Ben-Ari, T. Hakim, A. M. Bronstein
    arXiv:1905.11137, 2019

    Unlabeled video in the wild presents a valuable, yet so far unharnessed, source of information for learning vision tasks. We present the first attempt at fully self-supervised learning of object detection from subtitled videos without any manual object annotation. To this end, we use the How2 multi-modal collection of instructional videos with English subtitles. We pose the problem as learning with weakly- and noisily-labeled data, and propose a novel training model that can confront high noise levels, and yet train a classifier to localize the object of interest in the video frames, without any manual labeling involved. We evaluate our approach on a set of 11 manually annotated objects in over 5000 frames and compare it to an existing weakly-supervised approach as baseline. Benchmark data and code will be released upon acceptance of the paper.

    E. Schwartz, L. Karlinsky, R. Feris, R. Giryes, A. M. Bronstein, Baby steps towards few-shot learning with multiple semantics, arXiv:1906.01905, 2019

    Baby steps towards few-shot learning with multiple semantics

    E. Schwartz, L. Karlinsky, R. Feris, R. Giryes, A. M. Bronstein
    arXiv:1906.01905, 2019

    Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. In this paper, we focus on the role of additional semantics that can significantly facilitate few-shot visual learning. Building upon recent advances in few-shot learning with additional semantic information, we demonstrate that further improvements are possible using richer semantics and multiple semantic sources. Using these ideas, we offer the community a new result on the one-shot test of the popular miniImageNet benchmark, comparing favorably to the previous state-of-the-art results for both visual only and visual plus semantics-based approaches. We also performed an ablation study investigating the components and design choices of our approach.

    A. Rampini, I. Tallini, M. Ovsjanikov, A. M. Bronstein, E. Rodola, Correspondence-free region localization for partial shape similarity via Hamiltonian spectrum alignment, Proc. 3D Vision (3DV), 2019 (Best paper award)

    Correspondence-free region localization for partial shape similarity via Hamiltonian spectrum alignment

    A. Rampini, I. Tallini, M. Ovsjanikov, A. M. Bronstein, E. Rodola
    Proc. 3D Vision (3DV), 2019 (Best paper award)

    We consider the problem of localizing relevant subsets of non-rigid geometric shapes given only a partial 3D query as the input. Such problems arise in several challenging tasks in 3D vision and graphics, including partial shape similarity, retrieval, and non-rigid correspondence. We phrase the problem as one of alignment between short sequences of eigenvalues of basic differential operators, which are constructed upon a scalar function defined on the 3D surfaces. Our method therefore seeks a scalar function that entails this alignment. Unlike existing approaches, we do not require solving for a correspondence between the query and the target, thereby greatly simplifying the optimization process; our core technique is also descriptor-free, as it is driven by the geometry of the two objects as encoded in their operator spectra. We further show that our spectral alignment algorithm provides a remarkably simple alternative to the recent shape-from-spectrum reconstruction approaches. For both applications, we demonstrate improvement over the state-of-the-art either in terms of accuracy or computational cost.

    O. Senouf, S. Vedula, T. Weiss, A. M. Bronstein, O. Michailovich, M. Zibulevsky, Self-supervised learning of inverse problem solvers in medical imaging, Proc. Medical Image Learning with Less Labels and Imperfect Data, MICCAI 2019

    Self-supervised learning of inverse problem solvers in medical imaging

    O. Senouf, S. Vedula, T. Weiss, A. M. Bronstein, O. Michailovich, M. Zibulevsky
    Proc. Medical Image Learning with Less Labels and Imperfect Data, MICCAI 2019

    In the past few years, deep learning-based methods have demonstrated enormous success for solving inverse problems in medical imaging. In this work, we address the following question: Given a set of measurements obtained from real imaging experiments, what is the best way to use a learnable model and the physics of the modality to solve the inverse problem and reconstruct the latent image? Standard supervised learning-based methods approach this problem by collecting data sets of known latent images and their corresponding measurements. However, these methods are often impractical due to the lack of availability of appropriately sized training sets, and, more generally, due to the inherent difficulty in measuring the “groundtruth” latent image. In light of this, we propose a self-supervised approach to training inverse models in medical imaging in the absence of aligned data. Our method only requires access to the measurements and the forward model during training. We showcase its effectiveness on inverse problems arising in accelerated magnetic resonance imaging (MRI).

    N. Diamant, D. Zadok, C. Baskin, E. Schwartz, A. M. Bronstein, Beholder-GAN: Generation and beautification of facial images with conditioning on their beauty level, Proc. Int'l Conf. on Image Processing (ICIP), 2019

    Beholder-GAN: Generation and beautification of facial images with conditioning on their beauty level

    N. Diamant, D. Zadok, C. Baskin, E. Schwartz, A. M. Bronstein
    Proc. Int'l Conf. on Image Processing (ICIP), 2019

    Beauty is in the eye of the beholder. This maxim, emphasizing the subjectivity of the perception of beauty, has enjoyed a wide consensus since ancient times. In the digital era, data-driven methods have been shown to be able to predict human-assigned beauty scores for facial images. In this work, we augment this ability and train a generative model that generates faces conditioned on a requested beauty score. In addition, we show how this trained generator can be used to beautify an input face image. By doing so, we achieve an unsupervised beautification model, in the sense that it relies on no ground truth target images.

    G. Pai, R. Talmon, A. M. Bronstein, R. Kimmel, DIMAL: Deep isometric manifold learning using sparse geodesic sampling, Proc. IEEE Winter Conf. on Applications of Computer Vision (WACV), 2019

    DIMAL: Deep isometric manifold learning using sparse geodesic sampling

    G. Pai, R. Talmon, A. M. Bronstein, R. Kimmel
    Proc. IEEE Winter Conf. on Applications of Computer Vision (WACV), 2019

    This paper explores a fully unsupervised deep learning approach for computing distance-preserving maps that generate low-dimensional embeddings for a certain class of manifolds. We use the Siamese configuration to train a neural network to solve the problem of least squares multidimensional scaling for generating maps that approximately preserve geodesic distances. By training with only a few landmarks, we show a significantly improved local and nonlocal generalization of the isometric mapping as compared to analogous non-parametric counterparts. Importantly, the combination of a deep-learning framework with a multidimensional scaling objective enables a numerical analysis of network architectures to aid in understanding their representation power. This provides a geometric perspective to the generalizability of deep learning.

    O. Litany, E. Rodolà, A. M. Bronstein, M. M. Bronstein, D. Cremers, Partial single- and multi-shape dense correspondence using functional maps, Chapter in The Handbook of Numerical Analysis - Processing, Analyzing and Learning of Images, Shapes, and Forms, Elsevier, 2019

    Partial single- and multi-shape dense correspondence using functional maps

    O. Litany, E. Rodolà, A. M. Bronstein, M. M. Bronstein, D. Cremers
    Chapter in The Handbook of Numerical Analysis - Processing, Analyzing and Learning of Images, Shapes, and Forms, Elsevier, 2019

    Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture mapping, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. In this chapter we describe recent techniques designed to tackle such problems. Specifically, we explain how the renowned functional maps framework can be extended to tackle the partial setting. We then present a further extension to the multi-part case in which one tries to establish correspondence between a collection of shapes. Finally, we focus on improving the technique's efficiency, by disposing of its spatial ingredient and thus keeping the computation in the spectral domain. Extensive experimental results are provided along with the theoretical explanations, to demonstrate the effectiveness of the described methods in these challenging scenarios.

    A. Boyarski, A. M. Bronstein, Multidimensional scaling, Computer Vision: A Reference Guide, (Katsushi Ikeuchi, Ed.)

    Multidimensional scaling

    A. Boyarski, A. M. Bronstein
    Computer Vision: A Reference Guide, (Katsushi Ikeuchi, Ed.)

    The various multidimensional scaling models can be broadly classified into metric vs. non-metric, and strain (classical scaling) vs. stress (distance scaling) based MDS models. In metric MDS the goal is to maintain the distances in the embedding space as close as possible to the given dissimilarities, while in nonmetric MDS only the order relations between the dissimilarities are important. Strain-based MDS is an algebraic version of the problem that can be solved by eigenvalue decomposition. Stress-based MDS uses a geometric distortion criterion which results in a non-linear and non-convex optimization problem. Each of these models has its own merits and drawbacks, both numerically and application-wise. On top of these basic models, there exist numerous generalizations, including embedding into non-Euclidean domains, working with different stress models, working in different subspaces, and incorporating machine learning approaches to obtain faster, more accurate and more robust embeddings. This chapter reviews these models, with emphasis on their role in computer vision applications.
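
    The statement that strain-based MDS can be solved by eigenvalue decomposition can be sketched in a few lines. This is the standard classical scaling recipe (double-center the squared dissimilarities, then keep the leading eigenpairs); the function name classical_mds and the random test points are assumptions for this example, and stress-based MDS would instead need an iterative non-convex solver.

```python
import numpy as np

def classical_mds(dissimilarities, dim=2):
    """Strain-based (classical) MDS via eigendecomposition of the double-centered squared distances."""
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering   # Gram matrix of the centered points
    eigvals, eigvecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]            # indices of the largest eigenvalues
    scales = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scales

# Points in the plane: their Euclidean distance matrix is embedded back into 2D.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dists, dim=2)
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
```

    The embedding matches the original points only up to rotation, reflection, and translation, so the check that makes sense is on pairwise distances rather than on coordinates.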