Relevant publications

Deep Learning Algorithms and Hardware

C. Baskin, B. Chmiel, E. Zheltonozhskii, R. Banner, A. M. Bronstein, A. Mendelson, CAT: Compression-aware training for bandwidth reduction, JMLR, 2021

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements: memory traffic can be a major energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves on the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value.
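
Why low entropy matters: if quantized values are coded independently, a lossless entropy coder needs on average at least H bits per value, where H is the empirical Shannon entropy of the quantized feature map. The Python sketch below illustrates only this bound (it is not the CAT training code and ignores the transform step); the uniform quantizer and the example values are assumptions. It shows how a peaked, mostly-zero 4-bit feature map can be coded in far fewer than 4 bits per value.

```python
import math
from collections import Counter


def quantize(values, num_bits, max_val):
    """Uniformly quantize non-negative activations to integer levels 0 .. 2**num_bits - 1."""
    levels = 2 ** num_bits - 1
    return [round(min(max(v, 0.0), max_val) / max_val * levels) for v in values]


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol (lower bound for symbol-wise lossless coding)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# A "flat" 4-bit feature map that uses every level equally often vs. a sparse, peaked one.
flat = list(range(16))
sparse = [0] * 12 + [1, 2, 3, 4]
print(entropy_bits(flat))    # 4.0 bits per value
print(entropy_bits(sparse))  # about 1.31 bits per value
```

Both maps have the same nominal bit width; only the distribution of values differs, which is what a training objective favoring low-entropy activations would change.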

E. Amrani, A. M. Bronstein, Self-supervised classification network, arXiv:2103.10994, 2021

We present Self-Classifier, a novel self-supervised end-to-end classification neural network. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., to rule out solutions where all samples are assigned to the same class), a uniform prior is asserted on the labels. We show mathematically that unlike the regular cross-entropy loss, our approach avoids such solutions. Self-Classifier is simple to implement and is scalable to practically unlimited amounts of data. Unlike other unsupervised classification approaches, it does not require any form of pre-training or the use of expectation maximization algorithms, pseudo-labelling or external clustering. Unlike other contrastive representation learning approaches, it does not require a memory bank or a second network. Despite its relative simplicity, our approach achieves results comparable to the state of the art on ImageNet, CIFAR10 and CIFAR100 for its two objectives: unsupervised classification and unsupervised representation learning. Furthermore, it is the first unsupervised end-to-end classification network to perform well on the large-scale ImageNet dataset. Code will be made available.
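
The degenerate-solution argument can be illustrated numerically. A same-class objective between two views is trivially satisfied if every sample lands in one class, and a uniform prior over labels penalizes exactly that. The Python sketch below uses a generic stand-in: a symmetric cross-view cross-entropy plus a KL penalty pulling the batch-average prediction towards uniform. It is not the loss derived in the paper; the function names and the penalty form are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_view_loss(logits_a, logits_b):
    """Symmetric cross-entropy: each view's class distribution is the target for the other view."""
    total = 0.0
    for la, lb in zip(logits_a, logits_b):
        pa, pb = softmax(la), softmax(lb)
        total += -sum(q * math.log(p) for p, q in zip(pa, pb))  # target pb, prediction pa
        total += -sum(q * math.log(p) for p, q in zip(pb, pa))  # target pa, prediction pb
    return total / (2 * len(logits_a))


def uniform_prior_penalty(logits):
    """KL(mean prediction || uniform): zero iff classes are used equally on average over the batch."""
    probs = [softmax(l) for l in logits]
    k = len(probs[0])
    mean = [sum(p[j] for p in probs) / len(probs) for j in range(k)]
    return sum(m * math.log(m * k) for m in mean if m > 0)


degenerate = [[10.0, 0.0, 0.0]] * 3                                  # every sample in class 0
balanced = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
print(cross_view_loss(degenerate, degenerate))  # small: the two views agree
print(uniform_prior_penalty(degenerate))         # about log(3)
print(uniform_prior_penalty(balanced))           # about 0
```

The cross-view term alone cannot distinguish the two batches; only the prior term does.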

E. Rozenberg, D. Freedman, A. M. Bronstein, Learning to Localize Objects Using Limited Annotation, With Applications to Thoracic Diseases, IEEE Access Vol. 9

Motivation: The localization of objects in images is a longstanding objective within the field of image processing. Most current techniques are based on machine learning approaches, which typically require careful annotation of training samples in the form of expensive bounding box labels. The need for such large-scale annotation has only been exacerbated by the widespread adoption of deep learning techniques within the image processing community: deep learning is notoriously data-hungry. Method: In this work, we attack this problem directly by providing a new method for learning to localize objects with limited annotation: most training images can simply be annotated with their whole image labels (and no bounding box), with only a small fraction marked with bounding boxes. The training is driven by a novel loss function, which is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning. Care is taken to ensure that the loss is numerically well-posed. Additionally, we propose a neural network architecture which accounts for both patch dependence, through the use of Conditional Random Field layers, and shift-invariance, through the inclusion of anti-aliasing filters. Results: We demonstrate our method on the task of localizing thoracic diseases in chest X-ray images, achieving state-of-the-art performance on the ChestX-ray14 dataset. We further show that with a modicum of additional effort our technique can be extended from object localization to object detection, attaining high quality results on the Kaggle RSNA Pneumonia Detection Challenge. Conclusion: The technique presented in this paper has the potential to enable high accuracy localization in regimes in which annotated data is either scarce or expensive to acquire. Future work will focus on applying the ideas presented in this paper to the realm of semantic segmentation.
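
A common way to turn the discrete weakly supervised rule "an image is positive if and only if at least one patch is positive" into a continuous loss is the noisy-OR relaxation, P(image positive) = 1 - Π_i (1 - p_i) over patch probabilities p_i. The Python sketch below assumes this relaxation (the paper's exact loss may differ) and evaluates it in log space with log1p and expm1. This is one way to keep such a loss numerically well-posed, because the product over many patches underflows in floating point.

```python
import math

EPS = 1e-7  # assumed clamp keeping patch probabilities away from exactly 0 and 1


def log_prob_negative(patch_probs):
    """log P(no patch positive) = sum_i log(1 - p_i), computed without forming the product."""
    return sum(math.log1p(-min(max(p, EPS), 1 - EPS)) for p in patch_probs)


def log_prob_positive(patch_probs):
    """log P(at least one patch positive) = log(1 - prod_i (1 - p_i)), via expm1 for accuracy."""
    return math.log(-math.expm1(log_prob_negative(patch_probs)))


def image_label_loss(patch_probs, is_positive):
    """Negative log-likelihood of an image-level label under the noisy-OR relaxation."""
    return -(log_prob_positive(patch_probs) if is_positive else log_prob_negative(patch_probs))


probs = [0.5] * 2000
naive = 1.0
for p in probs:
    naive *= 1 - p
print(naive)                     # 0.0: the product underflows, so log(naive) would fail
print(log_prob_negative(probs))  # about -1386.29, still finite
```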

T. Weiss, O. Senouf, S. Vedula, O. Michailovich, M. Zibulevsky, A. M. Bronstein, PILOT: Physics-Informed Learned Optimal Trajectories for accelerated MRI, Journal of Machine Learning for Biomedical Imaging (MELBA), 2021

Magnetic Resonance Imaging (MRI) has long been considered to be among “the gold standards” of diagnostic medical imaging. Its long acquisition times, however, render MRI prone to motion artifacts, not to mention their adverse contribution to the relatively high costs of MRI examination. Over the last few decades, multiple studies have focused on the development of both physical and post-processing methods for accelerated acquisition of MRI scans. These two approaches, however, have so far been addressed separately. Meanwhile, recent works in optical computational imaging have demonstrated the growing success of concurrent learning-based design of data acquisition and image reconstruction schemes. Such schemes have already demonstrated substantial effectiveness, leading to considerably shorter acquisition times and improved quality of image reconstruction. Inspired by this initial success, in this work we propose a novel approach to learning optimal schemes for conjoint acquisition and reconstruction of MRI scans, with the optimization carried out simultaneously with respect to the time-efficiency of data acquisition and the quality of the resulting reconstructions. To be of practical value, the schemes are encoded in the form of general k-space trajectories, whose associated magnetic gradients are constrained to obey a set of predefined hardware requirements (defined in terms of, e.g., peak currents and maximum slew rates of magnetic gradients). With this proviso in mind, we propose a novel algorithm for the end-to-end training of a combined acquisition-reconstruction pipeline using a deep neural network with differentiable forward- and backpropagation operators. We also demonstrate the effectiveness of the proposed solution in application to both image reconstruction and image segmentation, reporting substantial improvements in terms of acceleration factors as well as the quality of these end tasks.
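
The hardware constraints above follow from the standard relation between a k-space trajectory and its gradient waveform: k(t) = γ̄ ∫ G(τ) dτ, so G = (1/γ̄) dk/dt, and the slew rate is dG/dt. The Python sketch below checks a sampled trajectory against peak-gradient and slew-rate limits using finite differences. The per-axis form of the limits (some formulations bound the vector norm instead), the default limit values, the sampling interval and the function names are illustrative assumptions, not values from the paper.

```python
GAMMA_BAR = 42.577e6  # proton gyromagnetic ratio divided by 2*pi, in Hz/T


def gradient_and_slew(kspace, dt):
    """Finite-difference gradients (T/m) and slew rates (T/m/s) for k-space samples in 1/m."""
    grads = [
        [(b - a) / (GAMMA_BAR * dt) for a, b in zip(k0, k1)]
        for k0, k1 in zip(kspace, kspace[1:])
    ]
    slews = [[(b - a) / dt for a, b in zip(g0, g1)] for g0, g1 in zip(grads, grads[1:])]
    return grads, slews


def is_feasible(kspace, dt, g_max=40e-3, s_max=200.0):
    """True if every per-axis gradient is within g_max and every per-axis slew within s_max."""
    grads, slews = gradient_and_slew(kspace, dt)
    peak_ok = all(abs(x) <= g_max for g in grads for x in g)
    slew_ok = all(abs(x) <= s_max for s in slews for x in s)
    return peak_ok and slew_ok


dt = 10e-6                                       # assumed 10 microsecond sampling interval
line = [[8.0 * i, 0.0] for i in range(50)]       # constant-velocity readout, about 19 mT/m
jump = [[0.0, 0.0], [100.0, 0.0], [200.0, 0.0]]  # too fast: about 235 mT/m
print(is_feasible(line, dt), is_feasible(jump, dt))  # True False
```

In a learned-trajectory setting, the same finite differences can be turned into penalty terms so that the constraint enters the training loss.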

A. Arbelle, S. Doveh, A. Alfassy, J. Shtok, G. Lev, E. Schwartz, H. Kuehne, H. Barak Levi, P. Sattigeri, R. Panda, C.-F. Chen, A. M. Bronstein, K. Saenko, S. Ullman, R. Giryes, R. Feris, L. Karlinsky, Detector-free weakly supervised grounding by separation, arXiv:2104.09829, 2021

Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing ‘text to image-regions’ associations by random alpha-blending of arbitrary image pairs and using the corresponding texts of the pair as conditions to recover the alpha map from the blended image via a segmentation network. At test time, this allows using the query phrase as a condition for a non-blended query image, thus interpreting the test image as a composition of a region corresponding to the phrase and the complement region. Using this approach, we demonstrate a significant accuracy improvement of up to 8.5% over the previous DF-WSG SotA for a range of benchmarks including Flickr30K, Visual Genome, and ReferIt, as well as a significant complementary improvement (above 7%) over the detector-based approaches for WSG.
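
The training-data synthesis step can be sketched in a few lines: blend two images with a random alpha map and record, for each caption, which map a text-conditioned segmentation network should recover. The Python sketch below uses grayscale images stored as nested lists. The coarse 2x2 alpha generator and all names are hypothetical simplifications, and the segmentation network itself is omitted.

```python
import random


def random_alpha_map(height, width, rng):
    """Hypothetical alpha generator: random weights on a 2x2 grid, upsampled by nearest neighbour."""
    coarse = [[rng.random() for _ in range(2)] for _ in range(2)]
    return [[coarse[2 * i // height][2 * j // width] for j in range(width)] for i in range(height)]


def alpha_blend(img_a, img_b, alpha):
    """Per-pixel convex combination alpha * a + (1 - alpha) * b."""
    return [
        [w * pa + (1 - w) * pb for pa, pb, w in zip(row_a, row_b, row_w)]
        for row_a, row_b, row_w in zip(img_a, img_b, alpha)
    ]


def make_training_example(img_a, text_a, img_b, text_b, rng):
    """Return the blended image and the target map for each caption used as a condition."""
    alpha = random_alpha_map(len(img_a), len(img_a[0]), rng)
    blended = alpha_blend(img_a, img_b, alpha)
    complement = [[1 - w for w in row] for row in alpha]
    # Conditioned on text_a the network should output alpha; conditioned on text_b, 1 - alpha.
    return blended, {text_a: alpha, text_b: complement}


rng = random.Random(0)
ones = [[1.0] * 4 for _ in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
blended, targets = make_training_example(ones, "a dog", zeros, "a ball", rng)
```

Because every pixel's ground-truth mixing weight is known by construction, the alpha map supplies a free dense target, which is what lets the method avoid a pre-trained detector.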

Y. Elul, A. Rosenberg, A. Schuster, A. M. Bronstein, Y. Yaniv, Meeting the unmet needs of clinicians from AI systems showcased for cardiology with deep-learning-based ECG analysis, PNAS 2021

Despite their great promise, artificial intelligence (AI) systems have yet to become ubiquitous in the daily practice of medicine largely due to several crucial unmet needs of healthcare practitioners. These include lack of explanations in clinically meaningful terms, handling the presence of unknown medical conditions, and transparency regarding the system’s limitations, both in terms of statistical performance as well as recognizing situations for which the system’s predictions are irrelevant. We articulate these unmet clinical needs as machine-learning (ML) problems and systematically address them with cutting-edge ML techniques. We focus on electrocardiogram (ECG) analysis as an example domain in which AI has great potential and tackle two challenging tasks: the detection of a heterogeneous mix of known and unknown arrhythmias from ECG and the identification of underlying cardio-pathology from segments annotated as normal sinus rhythm recorded in patients with an intermittent arrhythmia. We validate our methods by simulating a screening for arrhythmias in a large-scale population while adhering to statistical significance requirements. Specifically, our system 1) visualizes the relative importance of each part of an ECG segment for the final model decision; 2) upholds specified statistical constraints on its out-of-sample performance and provides uncertainty estimation for its predictions; 3) handles inputs containing unknown rhythm types; and 4) handles data from unseen patients while also flagging cases in which the model’s outputs are not usable for a specific patient. This work represents a significant step toward overcoming the limitations currently impeding the integration of AI into clinical practice in cardiology and medicine in general.

E. Amrani, R. Ben-Ari, D. Rotman, A. M. Bronstein, Noise estimation using density estimation for self-supervised multimodal learning, AAAI, 2021

One of the key factors in enabling machine learning models to comprehend and solve real-world tasks is leveraging multimodal data. Unfortunately, the annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods choose to ignore the presence of high levels of noise and thus yield sub-optimal results. In this work, we show that the problem of noise estimation for multimodal data can be reduced to a multimodal density estimation task. Using multimodal density estimation, we propose a noise estimation building block for multimodal representation learning that is based strictly on the inherent correlation between different modalities. We demonstrate that our noise estimation can be broadly integrated and achieves results comparable to the state of the art on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval.
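
To make the reduction from noise estimation to density estimation concrete, the Python sketch below scores each (video, text) pair by a k-nearest-neighbour density estimate in a concatenated embedding space and down-weights low-density pairs. The intuition is that pairs whose two modalities are mismatched fall in sparse regions of the joint space. The estimator, the concatenation and the weighting rule are assumptions for illustration; the paper's estimator may differ.

```python
import math


def knn_density(points, k):
    """k-NN density estimate up to a constant: k / (n * r_k**d), r_k = distance to k-th neighbour."""
    n, d = len(points), len(points[0])
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = max(dists[k - 1], 1e-12)  # guard against duplicate points
        scores.append(k / (n * r_k ** d))
    return scores


def pair_weights(video_emb, text_emb, k=2):
    """Weight each pair by its joint density, normalised so the densest pair gets weight 1."""
    joint = [tuple(v) + tuple(t) for v, t in zip(video_emb, text_emb)]
    dens = knn_density(joint, k)
    top = max(dens)
    return [x / top for x in dens]


video = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
text = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]  # last caption does not match its clip
print(pair_weights(video, text))  # the mismatched fourth pair gets by far the smallest weight
```

Such weights could then scale each pair's contribution to a contrastive or retrieval loss.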

O. Dahary, M. Jacoby, A. M. Bronstein, Digital Gimbal: End-to-end deep image stabilization with learnable exposure times, Proc. CVPR, 2021

Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off between motion blur at long exposures and low SNR at short exposures, we train a CNN that estimates a sharp high-SNR image by aggregating a burst of noisy short-exposure frames, related by unknown motion. We further suggest learning the burst’s exposure times in an end-to-end manner, thus balancing the noise and blur across the frames. We demonstrate this method’s advantage over the traditional approach of deblurring a single image or denoising a fixed-exposure burst.
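
The blur-versus-noise trade-off can be made concrete with a toy model. This is an assumption for illustration, not the paper's imaging model: photon shot noise plus a fixed read noise per frame, and blur proportional to exposure time for constant image-plane motion. Splitting a fixed total exposure into more, shorter frames reduces the worst per-frame blur but adds read noise, which is why the choice of exposure times matters.

```python
import math


def burst_snr(exposures_s, photon_rate, read_noise_e):
    """SNR after ideally aligning and summing frames: signal adds; shot and read noise add in variance."""
    signal = photon_rate * sum(exposures_s)                   # expected photo-electrons
    noise_var = signal + len(exposures_s) * read_noise_e ** 2
    return signal / math.sqrt(noise_var)


def worst_blur_px(exposures_s, speed_px_per_s):
    """Blur extent of the longest frame, assuming constant motion during each exposure."""
    return speed_px_per_s * max(exposures_s)


one_long = [0.1]
ten_short = [0.01] * 10
for burst in (one_long, ten_short):
    print(len(burst), burst_snr(burst, 1000.0, 3.0), worst_blur_px(burst, 500.0))
# One 0.1 s frame: SNR about 9.6, blur 50 px; ten 0.01 s frames: SNR about 7.3, blur 5 px.
```

Non-uniform exposure lists (for example, one longer frame among short ones) fall between these extremes, which is the space a learned exposure schedule can explore.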

A. Boyarski, S. Vedula, A. M. Bronstein, Spectral geometric matrix completion, Proc. Mathematical and Scientific Machine Learning, 2021

Deep Matrix Factorization (DMF) is an emerging approach to the problem of reconstructing a matrix from a subset of its entries. Recent works have established that gradient descent applied to a DMF model induces an implicit regularization on the rank of the recovered matrix. Despite these promising theoretical results, empirical evaluation of vanilla DMF on real benchmarks exhibits poor reconstructions which we attribute to the extremely low number of samples available. We propose an explicit spectral regularization scheme that is able to make DMF models competitive on real benchmarks, while still maintaining the implicit regularization induced by gradient descent, thus enjoying the best of both worlds.
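
The Python sketch below fits a product of square factors W_N ... W_1 to the observed entries of a matrix by plain gradient descent on the masked squared error. It shows only the vanilla DMF baseline the abstract starts from; the paper's spectral regularizer is not implemented. The depth, initialization scale, learning rate, step count and example matrix are arbitrary assumptions.

```python
import random


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def chain(factors, n):
    """Product of factors listed as [W_1, ..., W_m], i.e. W_m ... W_1 (identity if empty)."""
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for w in factors:
        out = matmul(w, out)
    return out


def masked_loss_and_grads(factors, target, mask):
    """0.5 * squared error on observed entries (mask == 1) and its gradient for every factor."""
    n = len(target)
    x = chain(factors, n)
    resid = [[(x[i][j] - target[i][j]) * mask[i][j] for j in range(n)] for i in range(n)]
    loss = 0.5 * sum(r * r for row in resid for r in row)
    grads = []
    for k in range(len(factors)):
        left = chain(factors[k + 1:], n)  # factors applied after factor k
        right = chain(factors[:k], n)     # factors applied before factor k
        grads.append(matmul(matmul(transpose(left), resid), transpose(right)))
    return loss, grads


def fit_dmf(target, mask, depth=3, lr=0.02, steps=400, init_std=0.3, seed=0):
    """Gradient descent on all factors simultaneously; returns the estimate and the loss history."""
    rng = random.Random(seed)
    n = len(target)
    factors = [[[rng.gauss(0.0, init_std) for _ in range(n)] for _ in range(n)]
               for _ in range(depth)]
    history = []
    for _ in range(steps):
        loss, grads = masked_loss_and_grads(factors, target, mask)
        history.append(loss)
        for w, g in zip(factors, grads):
            for i in range(n):
                for j in range(n):
                    w[i][j] -= lr * g[i][j]
    return chain(factors, n), history


u, v = [1.0, 2.0, 1.0, 0.5], [1.0, -1.0, 0.5, 2.0]
target = [[a * b for b in v] for a in u]                                      # rank-1 ground truth
mask = [[0 if (i + j) % 3 == 0 else 1 for j in range(4)] for i in range(4)]  # hide 6 of 16 entries
estimate, history = fit_dmf(target, mask)
print(history[0], history[-1])  # loss on the observed entries decreases during training
```

The hidden entries are filled in only through the structure that gradient descent imposes on the product, which is the implicit low-rank bias discussed above.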

E. Rozenberg, A. Karnieli, O. Yesharim, S. Trajtenberg-Mills, D. Freedman, A. M. Bronstein, A. Arie, Inverse Design of Quantum Holograms in Three-Dimensional Nonlinear Photonic Crystals, CLEO, 2021

We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs.

A. Karbachevsky, C. Baskin, E. Zheltonozhskii, Y. Yermolin, F. Gabbay, A. M. Bronstein, A. Mendelson, Early-stage neural network hardware performance analysis, Sustainability 13(2):717, 2021

The demand for running neural networks (NNs) in embedded environments has increased significantly in recent years due to the success of convolutional neural network (CNN) approaches in various tasks, including image recognition and generation. Achieving high accuracy on resource-restricted devices, however, is still considered challenging, mainly due to the vast number of design parameters that need to be balanced. While the quantization of CNN parameters leads to a reduction of power and area, it can also generate unexpected changes in the balance between communication and computation. This change is hard to evaluate, and the lack of balance may lead to lower utilization of either memory bandwidth or computational resources, thereby reducing performance. This paper introduces a hardware performance analysis framework for identifying bottlenecks in the early stages of CNN hardware design. We demonstrate how the proposed method can help in evaluating different architecture alternatives of resource-restricted CNN accelerators (e.g., as part of real-time embedded systems) early in the design stages and thus prevent design mistakes.
Keywords: neural networks; accelerators; quantization; CNN architecture
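
The communication-versus-computation balance described in this abstract is commonly reasoned about with the roofline model, used here as a stand-in (the paper's framework is more detailed). A layer is memory-bound when its arithmetic intensity (operations per byte moved) falls below the accelerator's ratio of peak compute to memory bandwidth. The Python sketch below assumes each weight and activation moves to or from memory exactly once; the layer shape, bit widths and hardware numbers are hypothetical.

```python
def conv_layer_intensity(h, w, c_in, c_out, k, w_bits, a_bits):
    """MACs per byte moved for a stride-1, 'same'-padded conv layer, assuming each tensor moves once."""
    macs = h * w * c_in * c_out * k * k
    weight_bytes = c_in * c_out * k * k * w_bits / 8
    act_bytes = (h * w * c_in + h * w * c_out) * a_bits / 8  # input + output feature maps
    return macs / (weight_bytes + act_bytes)


def bound_type(intensity, peak_macs_per_s, bandwidth_bytes_per_s):
    """Roofline classification: compare intensity with the machine balance point."""
    balance = peak_macs_per_s / bandwidth_bytes_per_s
    return "compute-bound" if intensity >= balance else "memory-bound"


# Hypothetical accelerator: 1 TMAC/s peak, 4 GB/s DRAM bandwidth (balance = 250 MACs/byte).
for bits in (16, 8, 4):
    ai = conv_layer_intensity(28, 28, 128, 128, 3, bits, bits)
    print(bits, round(ai, 1), bound_type(ai, 1e12, 4e9))
# With these numbers the 16-bit layer is memory-bound; the 8- and 4-bit versions are compute-bound.
```

The sketch keeps peak throughput fixed. In real accelerators, lower precision often also raises peak MAC throughput, which moves the balance point and can make a quantized layer memory-bound again. This is the kind of unexpected shift the abstract describes.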

C. Baskin, E. Schwartz, E. Zheltonozhskii, N. Liss, R. Giryes, A. M. Bronstein, A. Mendelson, UNIQ: Uniform noise injection for non-uniform quantization of neural networks, ACM Transactions on Computer Systems (TOCS), 2020

We present a novel method for training a neural network amenable to inference in low-precision arithmetic with quantized weights and activations. The training is performed in full precision with random noise injection emulating quantization noise. In order to circumvent the need to simulate realistic quantization noise distributions, the weight distributions are uniformized by a non-linear transformation, and uniform noise is injected. This procedure emulates a non-uniform k-quantile quantizer at inference time, which adapts to the specific distribution of the quantized parameters. As a by-product of injecting noise into the weights, we find that activations can also be quantized to as low as 8 bits with only minor accuracy degradation. The method achieves state-of-the-art results for training low-precision networks on ImageNet. In particular, we observe no degradation in accuracy for MobileNet and ResNet-18/34/50 on ImageNet with weights quantized to as low as 4 bits. Compared to similar models, our solution achieves state-of-the-art accuracy in the low computational budget regime.
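
The core mechanism can be sketched with an empirical CDF. During training, weights are uniformized, perturbed with uniform noise of one bin width, and mapped back. At inference, k equal-width bins in the uniform domain are used, which are k equal-probability (k-quantile) bins in the weight domain. In the Python sketch below, the empirical CDF with mid-rank ties and the use of an in-bin sample as the representative value are simplifying assumptions. The paper's transformation and reconstruction values may differ, and a real implementation would use a differentiable transform on tensors.

```python
import bisect
import random


def uniformize(sorted_w, x):
    """Empirical CDF with mid-rank ties: maps weights into (0, 1), roughly uniformly distributed."""
    lo, hi = bisect.bisect_left(sorted_w, x), bisect.bisect_right(sorted_w, x)
    return (lo + hi) / (2 * len(sorted_w))


def deuniformize(sorted_w, u):
    """Approximate inverse CDF: the sorted weight at quantile u."""
    idx = min(max(int(u * len(sorted_w)), 0), len(sorted_w) - 1)
    return sorted_w[idx]


def kquantile_quantize(weights, k):
    """Inference-time quantizer: k equal-width bins in the uniform domain = k equal-mass bins."""
    s = sorted(weights)
    out = []
    for w in weights:
        b = min(int(uniformize(s, w) * k), k - 1)
        out.append(deuniformize(s, (b + 0.5) / k))  # a value near the middle of bin b
    return out


def noisy_weights(weights, k, rng):
    """Training-time emulation: add U(-1/(2k), 1/(2k)) noise in the uniform domain, then map back."""
    s = sorted(weights)
    out = []
    for w in weights:
        u = uniformize(s, w) + rng.uniform(-0.5 / k, 0.5 / k)
        out.append(deuniformize(s, min(max(u, 0.0), 1.0)))
    return out


ws = [0.1, -2.0, 0.3, 5.0, -0.5, 0.0, 1.0, 2.0]
print(kquantile_quantize(ws, 4))  # [0.1, -0.5, 1.0, 5.0, -0.5, 0.1, 1.0, 5.0]
```

Each of the four bins holds a quarter of the weights regardless of how skewed the weight distribution is, which is what makes the quantizer non-uniform in the original weight domain.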

B. Finkelshtein, C. Baskin, E. Zheltonozhskii, U. Alon, Single-node attack for fooling graph neural networks, arXiv:2011.03574, 2020

Graph neural networks (GNNs) have shown broad applicability in a variety of domains. Some of these domains, such as social networks and product recommendations, are fertile ground for malicious users and behavior. In this paper, we show that GNNs are vulnerable to the extremely limited scenario of a single-node adversarial example, where the node cannot be picked by the attacker. That is, an attacker can force the GNN to classify any target node to a chosen label by only slightly perturbing another single arbitrary node in the graph, even when the attacker cannot choose which node that is. When the adversary is allowed to pick a specific attacker node, the attack is even more effective. We show that this attack is effective across various GNN types, such as GraphSAGE, GCN, GAT, and GIN, across a variety of real-world datasets, and both as a targeted and as a non-targeted attack.
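
To illustrate why a single, non-adjacent node can suffice, the Python sketch below uses a toy two-layer linear mean-aggregation GNN. This is a simplification, not GraphSAGE, GCN, GAT or GIN as evaluated in the paper. Because the toy model is linear, the gradient of the target's class margin with respect to the attacker's features is a non-negative multiple of the difference of the two class weight vectors, nonzero whenever the attacker is within two hops. The attacker therefore simply steps in that direction. The graph, weights and step size are hypothetical.

```python
def propagate(adj, feats):
    """One round of mean aggregation over each node and its neighbours."""
    out = []
    for i in range(len(feats)):
        nbrs = [i] + adj[i]
        out.append([sum(feats[j][f] for j in nbrs) / len(nbrs) for f in range(len(feats[0]))])
    return out


def predict(adj, feats, weights, node):
    """Two propagation layers followed by a linear classifier; returns the arg-max class."""
    h = propagate(adj, propagate(adj, feats))[node]
    scores = [sum(w * x for w, x in zip(row, h)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def single_node_attack(adj, feats, weights, target, attacker, wanted, step=0.5, max_iters=100):
    """Perturb only the attacker's features until the target node is classified as `wanted`."""
    feats = [row[:] for row in feats]  # leave the caller's features untouched
    for _ in range(max_iters):
        current = predict(adj, feats, weights, target)
        if current == wanted:
            return feats, True
        # Margin gradient w.r.t. the attacker's features is proportional to w_wanted - w_current.
        direction = [a - b for a, b in zip(weights[wanted], weights[current])]
        feats[attacker] = [x + step * d for x, d in zip(feats[attacker], direction)]
    return feats, predict(adj, feats, weights, target) == wanted


adj = [[1], [0, 2], [1]]                     # path graph 0 - 1 - 2
feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]           # class 0 reads feature 0, class 1 reads feature 1
new_feats, ok = single_node_attack(adj, feats, weights, target=0, attacker=2, wanted=1)
print(ok, new_feats[2])  # node 0 is flipped by editing only node 2, two hops away
```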

J. Alush-Aben, L. Ackerman-Schraier, T. Weiss, S. Vedula, O. Senouf, A. M. Bronstein, 3D FLAT: Feasible Learned Acquisition Trajectories for Accelerated MRI, Proc. Machine Learning for Medical Image Reconstruction, MICCAI 2020

Magnetic Resonance Imaging (MRI) has long been considered to be among the gold standards of today’s diagnostic imaging. The most significant drawback of MRI is its long acquisition time, which prohibits its use in standard practice for some applications. Compressed sensing (CS) proposes to subsample the k-space (the Fourier domain dual to the physical space of spatial coordinates), leading to significantly accelerated acquisition. However, the benefit of compressed sensing has not been fully exploited; most of the sampling densities obtained through CS do not produce a trajectory that obeys the stringent constraints of the MRI machine imposed in practice. Inspired by the recent success of deep learning-based approaches for image reconstruction and by ideas from computational imaging on learning-based design of imaging systems, we introduce 3D FLAT, a novel protocol for data-driven design of 3D non-Cartesian accelerated trajectories in MRI. Our proposal leverages the entire 3D k-space to simultaneously learn a physically feasible acquisition trajectory and a reconstruction method. Experimental results, performed as a proof-of-concept, suggest that 3D FLAT achieves higher image quality for a given readout time compared to standard trajectories such as radial, stack-of-stars, or 2D learned trajectories (trajectories that evolve only in the 2D plane while fully sampling along the third dimension). Furthermore, we demonstrate evidence supporting the significant benefit of performing MRI acquisitions using non-Cartesian 3D trajectories over 2D non-Cartesian trajectories acquired slice-wise.
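
To make the 2D-versus-3D comparison concrete, the Python sketch below generates the stack-of-stars baseline named above: radial spokes in the kx-ky plane, repeated at Cartesian kz positions. The trajectory therefore varies only in-plane while the third dimension is sampled on a grid. The parameter names and normalization (k in units of the maximum radius, kz in [-1, 1)) are assumptions. 3D FLAT instead learns sample positions that move in all three dimensions subject to hardware constraints.

```python
import math


def stack_of_stars(n_kz, n_spokes, n_readout):
    """(kx, ky, kz) samples: n_spokes radial lines per kz partition, n_readout >= 2 points each."""
    points = []
    for p in range(n_kz):
        kz = -1.0 + 2.0 * p / n_kz                     # Cartesian partitions along kz
        for s in range(n_spokes):
            theta = math.pi * s / n_spokes              # spokes span 180 degrees (full diameters)
            for r in range(n_readout):
                rho = -1.0 + 2.0 * r / (n_readout - 1)  # readout from -k_max to +k_max
                points.append((rho * math.cos(theta), rho * math.sin(theta), kz))
    return points


pts = stack_of_stars(4, 8, 33)
print(len(pts))  # 4 partitions * 8 spokes * 33 samples = 1056
```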

T. Weiss, S. Vedula, O. Senouf, O. Michailovich, A. M. Bronstein, Towards learned optimal q-space sampling in diffusion MRI, Proc. Computational Diffusion MRI, MICCAI 2020

Fiber tractography is an important tool of computational neuroscience that enables reconstructing the spatial connectivity and organization of white matter of the brain. Fiber tractography takes advantage of diffusion Magnetic Resonance Imaging (dMRI), which allows measuring the apparent diffusivity of cerebral water along different spatial directions. Unfortunately, collecting such data comes at the price of reduced spatial resolution and substantially elevated acquisition times, which limits the clinical applicability of dMRI. This problem has thus far been addressed using two principal strategies. Most efforts have been directed towards improving the quality of signal estimation for a given, fixed sampling scheme (defined through the choice of diffusion encoding gradients). On the other hand, optimization over the sampling scheme has also proven to be effective. Inspired by these results, the present work consolidates the above strategies into a unified estimation framework, in which the optimization is carried out with respect to both the estimation model and the sampling design concurrently. The proposed solution offers substantial improvements in the quality of signal estimation as well as the accuracy of ensuing analysis by means of fiber tractography. While proving the optimality of the learned estimation models would probably need more extensive evaluation, we nevertheless claim that the learned sampling schemes can be of immediate use, offering a way to improve the dMRI analysis without the necessity of deploying the neural network used for their estimation. We present a comprehensive comparative analysis based on the Human Connectome Project data.
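
A sampling scheme in diffusion MRI is a set of diffusion-encoding gradient directions (and b-values), and each choice determines which signals are measured. The Python sketch below assumes, purely for illustration, the classical single diffusion tensor model S(g) = S0 exp(-b gᵀDg); the paper's estimation models are learned and more general. It computes the measurements a given scheme would produce. The tensor values are hypothetical.

```python
import math


def dti_signal(s0, b, tensor, direction):
    """S = S0 * exp(-b * g^T D g) for a gradient direction g (normalised here) and a 3x3 tensor D."""
    norm = math.sqrt(sum(c * c for c in direction))
    g = [c / norm for c in direction]
    adc = sum(g[i] * tensor[i][j] * g[j] for i in range(3) for j in range(3))
    return s0 * math.exp(-b * adc)


def simulate_scheme(s0, b, tensor, directions):
    """Signals that a sampling scheme (a list of gradient directions) would measure."""
    return [dti_signal(s0, b, tensor, d) for d in directions]


# Hypothetical fibre along x: fast diffusion along x, slow across it (mm^2/s, with b in s/mm^2).
fibre = [[1.7e-3, 0.0, 0.0], [0.0, 0.3e-3, 0.0], [0.0, 0.0, 0.3e-3]]
print(simulate_scheme(1.0, 1000.0, fibre, [(1, 0, 0), (0, 1, 0), (1, 1, 0)]))
# The signal is lowest along the fibre, where diffusion is fastest.
```

Directions that happen to miss the fibre orientation carry less information about it, which is one intuition for why the choice of scheme matters.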

E. Zheltonozhskii, C. Baskin, A. M. Bronstein, A. Mendelson, Self-supervised learning for large-scale unsupervised image clustering, NeurIPS 2020 Workshop: Self-Supervised Learning - Theory and Practice, 2020

Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument for representation learning in computer vision. However, those methods have not been evaluated in a fully unsupervised setting.
In this paper, we propose a simple scheme for unsupervised classification based on self-supervised representations. We evaluate the proposed approach with several recent self-supervised methods showing that it achieves competitive results for ImageNet classification (39% accuracy on ImageNet with 1000 clusters and 46% with overclustering). We suggest adding the unsupervised evaluation to a set of standard benchmarks for self-supervised learning.
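
Evaluating a clustering against ground-truth classes requires mapping clusters to labels. The Python sketch below uses majority-vote mapping, which also works when there are more clusters than classes (overclustering). This is an assumed protocol for illustration; the paper's exact evaluation (for example, one-to-one Hungarian matching when the number of clusters equals the number of classes) may differ.

```python
from collections import Counter, defaultdict


def majority_vote_accuracy(cluster_ids, labels):
    """Assign each cluster its most frequent true label and return the resulting accuracy."""
    members = defaultdict(list)
    for c, y in zip(cluster_ids, labels):
        members[c].append(y)
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return correct / len(labels)


print(majority_vote_accuracy([0, 0, 1, 1, 2, 2], [5, 5, 6, 7, 7, 7]))  # 5/6, about 0.833
```

Under this mapping, splitting clusters further never lowers accuracy. Overclustering scores are therefore optimistic and not directly comparable with one-cluster-per-class scores.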

G. Mariani, L. Cosmo, A. M. Bronstein, E. Rodolà, Generating adversarial surfaces via band-limited perturbations, Computer Graphics Forum, 2020

Adversarial attacks have demonstrated remarkable efficacy in altering the output of a learning model by applying a minimal perturbation to the input data. While increasing attention has been placed on the image domain, the study of adversarial perturbations for geometric data has been notably lagging behind. In this paper, we show that effective adversarial attacks can be concocted for surfaces embedded in 3D, under weak smoothness assumptions on the perceptibility of the attack. We address the case of deformable 3D shapes in particular, and introduce a general model that is not tailored to any specific surface representation, nor does it assume access to a parametric description of the 3D object. In this context, we consider targeted and untargeted variants of the attack, demonstrating compelling results in either case. We further show how discovering adversarial examples, and then using them for adversarial training, leads to an increase in both robustness and accuracy. Our findings are confirmed empirically over multiple datasets spanning different semantic classes and deformations.
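
"Band-limited" here means that the perturbation is expanded in the lowest-frequency eigenfunctions of the Laplacian on the shape, so it stays smooth and hard to perceive. On a closed polygonal curve (a cycle graph), those eigenvectors are the discrete Fourier modes, which allows a compact illustration. The Python sketch below builds such a perturbation along the normals of a circle. The curve setting, coefficient values and names are assumptions; this is not the paper's surface model or attack optimization, which would choose the coefficients to change a classifier's output.

```python
import math


def band_limited_offsets(n, coeffs):
    """Per-vertex normal offsets sum_k a_k cos(2 pi k v / n) + b_k sin(2 pi k v / n), k = 1..K."""
    return [
        sum(a * math.cos(2 * math.pi * k * v / n) + b * math.sin(2 * math.pi * k * v / n)
            for k, (a, b) in enumerate(coeffs, start=1))
        for v in range(n)
    ]


def perturb_circle(n, coeffs):
    """Displace the vertices of a unit circle along their outward normals by band-limited offsets."""
    offsets = band_limited_offsets(n, coeffs)
    return [((1 + d) * math.cos(2 * math.pi * v / n), (1 + d) * math.sin(2 * math.pi * v / n))
            for v, d in enumerate(offsets)]


# Only the three lowest frequencies are used; every higher Fourier mode of the offsets is zero.
shape = perturb_circle(64, [(0.02, 0.0), (0.0, 0.01), (0.005, 0.005)])
```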

E. Amrani, R. Ben-Ari, T. Hakim, A. M. Bronstein, Self-Supervised Object Detection and Retrieval Using Unlabeled Videos, CVPR workshop, 2020

Unlabeled video in the wild presents a valuable, yet so far unharnessed, source of information for learning vision tasks. We present the first attempt at fully self-supervised learning of object detection from subtitled videos without any manual object annotation. To this end, we use the How2 multi-modal collection of instructional videos with English subtitles. We pose the problem as learning with weakly and noisily labeled data, and propose a novel training model that can confront high noise levels and yet train a classifier to localize the object of interest in the video frames, without any manual labeling involved. We evaluate our approach on a set of 11 manually annotated objects in over 5000 frames and compare it to an existing weakly-supervised approach as a baseline. Benchmark data and code will be released upon acceptance of the paper.

D. H. Silver, M. Feder, Y. Gold-Zamir, A. L. Polsky, S. Rosentraub, E. Shachor, A. Weinberger, P. Mazur, V. D. Zukin, A. M. Bronstein, Data-driven prediction of embryo implantation probability using IVF time-lapse imaging, Proc. MIDL, 2020

The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), the average success rate of IVF is a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the patient, a process typically conducted manually and without any universally accepted and standardized criteria. In this paper, we describe a novel data-driven system trained to directly predict embryo implantation probability from embryogenesis time-lapse imaging videos. Using retrospectively collected videos from 272 embryos, we demonstrate that, when compared to an external panel of embryologists, our algorithm results in a 12% increase in positive predictive value and a 29% increase in negative predictive value.
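
For reference, both reported metrics come from a confusion matrix of predictions against implantation outcomes: positive predictive value PPV = TP/(TP+FP) and negative predictive value NPV = TN/(TN+FN). The counts in the Python sketch below are made up for illustration. An "increase" between two raters can be expressed either as an absolute difference or as a relative change, so the sketch prints both.

```python
def ppv(tp, fp):
    """Positive predictive value: fraction of predicted-positive embryos that implanted."""
    return tp / (tp + fp)


def npv(tn, fn):
    """Negative predictive value: fraction of predicted-negative embryos that did not implant."""
    return tn / (tn + fn)


# Hypothetical counts for two raters assessing the same embryos.
a = ppv(tp=30, fp=70)      # 0.3
b = ppv(tp=42, fp=78)      # 0.35
print(b - a, (b - a) / a)  # absolute difference vs. relative change
```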

S. Sommer, A. M. Bronstein, Horizontal flows and manifold stochastics in geometric deep learning, IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 2020

We introduce two constructions in geometric deep learning for 1) transporting orientation-dependent convolutional filters over a manifold in a continuous way and thereby defining a convolution operator that naturally incorporates the rotational effect of holonomy; and 2) allowing efficient evaluation of manifold convolution layers by sampling manifold valued random variables that center around a weighted Brownian motion maximum likelihood mean. Both methods are inspired by stochastics on manifolds and geometric statistics, and provide examples of how stochastic methods (here, horizontal frame bundle flows and non-linear bridge sampling schemes) can be used in geometric deep learning. We outline the theoretical foundation of the two methods, discuss their relation to Euclidean deep networks and existing methodology in geometric deep learning, and establish important properties of the proposed constructions.

A. Tsitsulin, M. Munkhoeva, D. Mottin, P. Karras, A. M. Bronstein, I. Oseledets, E. Müller, Intrinsic multi-scale evaluation of generative models, Proc. ICLR, 2020

Intrinsic multi-scale evaluation of generative models

A. Tsitsulin, M. Munkhoeva, D. Mottin, P. Karras, A. M. Bronstein, I. Oseledets, E. Müller
Proc. ICLR, 2020

Generative models are often used to sample high-dimensional data points from a manifold with small intrinsic dimension. Existing techniques for comparing generative models focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop what is, to our knowledge, the first intrinsic and multi-scale method for characterizing and comparing underlying data manifolds, based on comparing all data moments by lower-bounding the spectral notion of the Gromov-Wasserstein distance between manifolds. In a thorough experimental study, we demonstrate that our method effectively evaluates the quality of generative models; further, we showcase its efficacy in discerning the disentanglement process in neural networks.

A. Karbachevsky, C. Baskin, E. Zheltonozhskii, Y. Yermolin, F. Gabbay, A. M. Bronstein, A. Mendelson, HCM: Hardware-aware complexity metric for neural network architectures, arXiv:2004.08906, 2020

HCM: Hardware-aware complexity metric for neural network architectures

A. Karbachevsky, C. Baskin, E. Zheltonozhskii, Y. Yermolin, F. Gabbay, A. M. Bronstein, A. Mendelson
arXiv:2004.08906, 2020

Convolutional Neural Networks (CNNs) have become common in many fields, including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that need to be balanced to achieve an efficient solution. Quantization techniques, when applied to the network parameters, reduce power and area and may also change the ratio between communication and computation. As a result, some algorithmic solutions may suffer from a lack of memory bandwidth or computational resources and fail to achieve the expected performance due to hardware constraints. Thus, the system designer and the micro-architect need to understand, at early development stages, the impact of their high-level decisions (e.g., the architecture of the CNN and the number of bits used to represent its parameters) on the final product (e.g., the expected power saving, area, and accuracy). Unfortunately, existing tools fall short of supporting such decisions. This paper introduces a hardware-aware complexity metric that aims to assist designers of neural network architectures throughout the entire project lifetime (especially at its early stages) by predicting the impact of architectural and micro-architectural decisions on the final product. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices such as real-time embedded systems, and help avoid design mistakes at early stages.

L. Karlinsky, J. Shtok, A. Alfassy, M. Lichtenstein, S. Harary, E. Schwartz, S. Doveh, P. Sattigeri, R. Feris, A. M. Bronstein, R. Giryes, StarNet: towards weakly supervised few-shot detection and explainable few-shot classification, AAAI, 2021

StarNet: towards weakly supervised few-shot detection and explainable few-shot classification

L. Karlinsky, J. Shtok, A. Alfassy, M. Lichtenstein, S. Harary, E. Schwartz, S. Doveh, P. Sattigeri, R. Feris, A. M. Bronstein, R. Giryes
AAAI, 2021

In this paper, we propose a new few-shot learning method called StarNet, which is an end-to-end trainable non-parametric star-model few-shot classifier. While being meta-trained using only image-level class labels, StarNet learns not only to predict the class labels for each query image of a few-shot task, but also to localize (via a heatmap) what it believes to be the key image regions supporting its prediction, thus effectively detecting the instances of the novel categories. The localization is enabled by StarNet’s ability to find large, arbitrarily shaped, semantically matching regions between all pairs of support and query images of a few-shot task. We evaluate StarNet on multiple few-shot classification benchmarks, attaining significant improvements over the state of the art on CUB and ImageNetLOC-FS and smaller improvements on other benchmarks. At the same time, in many cases, StarNet provides plausible explanations for its class label predictions by highlighting the correctly paired novel category instances on the query and on its best matching support (for the predicted class). In addition, we test the proposed approach on the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), obtaining significant improvements over the baselines.

E. Zheltonozhskii, C. Baskin, Y. Nemcovsky, B. Chmiel, A. Mendelson, A. M. Bronstein, Colored noise injection for training adversarially robust neural networks, arXiv:2003.02188, 2020

Colored noise injection for training adversarially robust neural networks

E. Zheltonozhskii, C. Baskin, Y. Nemcovsky, B. Chmiel, A. Mendelson, A. M. Bronstein
arXiv:2003.02188, 2020

Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work, we extend the idea of adding independent Gaussian noise to weights and activations during adversarial training (PNI) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on the CIFAR-10 dataset. In addition, we provide an extensive ablation study of the proposed method justifying the chosen configurations.

A. Livne, A. M. Bronstein, R. Kimmel, Z. Aviv, S. Grofit, Do we need depth in state-of-the-art face authentication?, arXiv:2003.10895, 2020

Do we need depth in state-of-the-art face authentication?

A. Livne, A. M. Bronstein, R. Kimmel, Z. Aviv, S. Grofit
arXiv:2003.10895, 2020

Some face recognition methods are designed to use geometric features extracted from depth sensors to handle the challenges of single-image-based recognition technologies. However, computing the geometric data is an expensive and challenging process. Here, we introduce a novel method that learns distinctive geometric features from stereo camera systems without the need to explicitly compute the facial surface or depth map. The raw face stereo images, along with coordinate maps, allow a CNN to learn geometric features. This way, we keep the simplicity and cost-efficiency of recognition from a single image while enjoying the benefits of geometric data without explicitly reconstructing it. We demonstrate that the suggested method outperforms both existing single-image and explicit depth-based methods on large-scale benchmarks. We also provide an ablation study showing that the suggested method uses the coordinate maps to encode more informative features.

M. Shkolnik, B. Chmiel, R. Banner, G. Shomron, Y. Nahshan, A. M. Bronstein, U. Weiser, Robust Quantization: One Model to Rule Them All, NeurIPS, 2020

Robust Quantization: One Model to Rule Them All

M. Shkolnik, B. Chmiel, R. Banner, G. Shomron, Y. Nahshan, A. M. Bronstein, U. Weiser
NeurIPS, 2020

Neural network quantization methods often involve simulating the quantization process during training. This makes the trained model highly dependent on the precise way quantization is performed. Since low-precision accelerators differ in their quantization policies and their supported mix of data types, a model trained for one accelerator may not be suitable for another. To address this issue, we propose KURE, a method that provides intrinsic robustness to the model against a broad range of quantization implementations. We show that KURE yields a generic model that may be deployed on numerous inference accelerators without a significant loss in accuracy.

A. Boyarski, S. Vedula, A. M. Bronstein, Deep matrix factorization with spectral geometric regularization, arXiv:1911.07255, 2019

Deep matrix factorization with spectral geometric regularization

A. Boyarski, S. Vedula, A. M. Bronstein
arXiv:1911.07255, 2019

We address the problem of reconstructing a matrix from a subset of its entries. Current methods, branded as geometric matrix completion, augment classical rank regularization techniques by incorporating geometric information into the solution. This information is usually provided as graphs encoding relations between rows/columns. In this work, we propose a simple spectral approach for solving the matrix completion problem, via the framework of functional maps. We introduce the zoomout loss, a multiresolution spectral geometric loss inspired by recent advances in shape correspondence, whose minimization leads to state-of-the-art results on various recommender systems datasets. Surprisingly, for some datasets, we were able to achieve comparable results even without incorporating geometric information. This puts into question both the quality of such information and current methods’ ability to use it in a meaningful and efficient way.

Code is available either as a Google Colab notebook or via https://github.com/amitboy/SGMC.
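Geometric matrix completion methods of the kind described above typically add a graph-smoothness term, built from the row or column graph, to the reconstruction loss on observed entries. The sketch below shows one common choice of such a term, the Dirichlet energy trace(XᵀLX) with L = D − W, combined with a masked squared error. It is not the zoomout loss proposed in the paper; the function names, the edge-list format, and the weight `mu` are illustrative assumptions.

```python
def dirichlet_energy(X, edges):
    """Graph smoothness of the rows of X.

    X: list of rows (each a list of floats).
    edges: list of (i, j, w) undirected edges between rows i and j with weight w.
    Equals trace(X^T L X) for the graph Laplacian L = D - W.
    """
    energy = 0.0
    for i, j, w in edges:
        energy += w * sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
    return energy


def masked_loss(X, M, mask, edges, mu=0.1):
    """Hypothetical objective: squared error on observed entries of M plus mu * Dirichlet energy."""
    fit = sum(
        (X[r][c] - M[r][c]) ** 2
        for r in range(len(M))
        for c in range(len(M[0]))
        if mask[r][c]
    )
    return fit + mu * dirichlet_energy(X, edges)


# Rows 0 and 1 are identical, so an edge between them costs nothing;
# an edge between dissimilar rows 0 and 2 is penalized.
X = [[1.0, 2.0], [1.0, 2.0], [5.0, 0.0]]
print(dirichlet_energy(X, [(0, 1, 1.0)]), dirichlet_energy(X, [(0, 2, 1.0)]))  # 0.0 20.0
```

In an actual completion method, X would be a free (or factorized) estimate optimized against this objective; only the objective itself is shown here.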

Y. Nahshan, B. Chmiel, C. Baskin, E. Zheltonozhskii, R. Banner, A. M. Bronstein, A. Mendelson, Loss aware post-training quantization, arXiv:1911.07190, 2019

Loss aware post-training quantization

Y. Nahshan, B. Chmiel, C. Baskin, E. Zheltonozhskii, R. Banner, A. M. Bronstein, A. Mendelson
arXiv:1911.07190, 2019

Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. On the other hand, we show that with more aggressive quantization, the loss landscape becomes highly non-separable with sharp minima, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. A reference implementation accompanies the paper.
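The "straightforward post-training quantization methods" mentioned above typically choose a clipping range for each tensor independently. The sketch below shows that per-tensor baseline, not the joint, loss-aware method of the paper: a symmetric uniform quantizer and a grid search over the clipping value that minimizes the mean squared quantization error. Function names and the search grid are assumptions.

```python
def quantize(values, clip, bits):
    """Symmetric uniform quantization to `bits` bits (bits >= 2) over [-clip, clip]."""
    if clip <= 0:
        return [0.0 for _ in values]
    levels = 2 ** (bits - 1) - 1            # e.g. 7 positive levels for INT4
    step = clip / levels                    # distance between grid points
    out = []
    for v in values:
        v = max(-clip, min(clip, v))        # clip outliers to the range
        out.append(round(v / step) * step)  # round to the nearest grid point
    return out


def best_clip(values, bits, num_candidates=100):
    """Grid-search the clipping value minimizing the MSE between values and their quantized copy."""
    max_abs = max(abs(v) for v in values)
    best, best_err = max_abs, float("inf")
    for k in range(1, num_candidates + 1):
        clip = max_abs * k / num_candidates
        q = quantize(values, clip, bits)
        err = sum((a - b) ** 2 for a, b in zip(values, q)) / len(values)
        if err < best_err:
            best, best_err = clip, err
    return best, best_err


print(quantize([0.6, -2.0, 0.2], clip=1.0, bits=2))  # [1.0, -1.0, 0.0]
```

Because each tensor is handled in isolation here, this baseline ignores interactions between layers, which is the separability assumption the abstract says breaks down under aggressive quantization.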

Y. Nemcovsky, E. Zheltonozhskii, C. Baskin, B. Chmiel, A. M. Bronstein, A. Mendelson, Smoothed inference for adversarially-trained models, arXiv:1911.07198, 2019

Smoothed inference for adversarially-trained models

Y. Nemcovsky, E. Zheltonozhskii, C. Baskin, B. Chmiel, A. M. Bronstein, A. Mendelson
arXiv:1911.07198, 2019

Deep neural networks are known to be vulnerable to inputs with maliciously constructed adversarial perturbations aimed at forcing misclassification. We study randomized smoothing as a way both to improve performance on unperturbed data and to increase robustness to adversarial attacks. Moreover, we extend the method proposed by arXiv:1811.09310 by adding low-rank multivariate noise, which we then use as a base model for smoothing. The proposed method achieves 58.5% top-1 accuracy on CIFAR-10 under PGD attack and outperforms previous works by 4%. In addition, we consider a family of attacks, which were previously used for training purposes in the certified robustness scheme. We demonstrate that the proposed attacks are more effective than PGD against both smoothed and non-smoothed models. Since our method is based on sampling, it lends itself well to trading off model inference complexity against performance. A reference implementation of the proposed techniques is provided.
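In its basic form, randomized smoothing classifies an input by the majority vote of a base classifier over noisy copies of that input. The sketch below uses isotropic Gaussian noise for simplicity; the paper's low-rank multivariate noise and its attacks are not modeled, and `classify`, `sigma`, and `num_samples` are illustrative assumptions. The number of samples is the knob that trades inference cost for the quality of the vote.

```python
import random
from collections import Counter


def smoothed_predict(classify, x, sigma=0.25, num_samples=100, seed=0):
    """Majority vote of a base classifier over Gaussian-perturbed copies of x.

    classify: function mapping a list of floats to a class label.
    sigma: standard deviation of the isotropic Gaussian noise.
    Returns (winning label, fraction of samples that voted for it).
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classify(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / num_samples


# Toy base classifier: label 1 if the coordinates sum to a positive value.
label, frac = smoothed_predict(lambda v: int(sum(v) > 0), [5.0, 5.0])
```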

S. Doveh, E. Schwartz, C. Xue, R. Feris, A. M. Bronstein, R. Giryes, L. Karlinsky, MetAdapt: Meta-learned task-adaptive architecture for few-shot classification, arXiv:1912.00412, 2019

MetAdapt: Meta-learned task-adaptive architecture for few-shot classification

S. Doveh, E. Schwartz, C. Xue, R. Feris, A. M. Bronstein, R. Giryes, L. Karlinsky
arXiv:1912.00412, 2019

Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter during test time. This is also referred to as meta-learning. So far, meta-learning FSL methods have focused on optimizing parameters of pre-defined network architectures, in order to make them easily adaptable to novel tasks. Moreover, it was observed that, in general, larger architectures perform better than smaller ones up to a certain saturation point (and may even degrade due to over-fitting). However, little attention has been given to explicitly optimizing the architectures for FSL, or to adapting the architecture at test time to particular novel tasks. In this work, we propose to employ tools borrowed from the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting. Additionally, to make the architecture task adaptive, we propose the concept of ‘MetAdapt Controller’ modules. These modules are added to the model and are meta-trained to predict the optimal network connections for a given novel task. Using the proposed approach, we observe state-of-the-art results on two popular few-shot benchmarks: miniImageNet and FC100.

E. Rozenberg, D. Freedman, A. M. Bronstein, Localization with limited annotation for chest X-rays, ML4H, NeurIPS 2019

Localization with limited annotation for chest X-rays

E. Rozenberg, D. Freedman, A. M. Bronstein
ML4H, NeurIPS 2019

Localization of an object within an image is a common task in medical imaging. Learning to localize or detect objects typically requires the collection of data which has been labelled with bounding boxes or similar annotations, which can be very time-consuming and expensive. A technique which could perform such learning with much less annotation would, therefore, be quite valuable. We present such a technique for localization with limited annotation, in which the number of images with bounding boxes can be a small fraction of the total dataset (e.g., less than 1%); all other images only possess a whole image label and no bounding box. We propose a novel loss function for tackling this problem; the loss is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning and is numerically well-posed. Furthermore, we propose a new architecture which accounts for both patch dependence and shift-invariance, through the inclusion of CRF layers and anti-aliasing filters, respectively. We apply our technique to the localization of thoracic diseases in chest X-ray images and demonstrate state-of-the-art localization performance on the ChestX-ray14 dataset.

S. Vedula, O. Senouf, G. Zurakov, A. M. Bronstein, O. Michailovich, M. Zibulevsky, Learning beamforming in ultrasound imaging, Proc. Medical Imaging with Deep Learning (MIDL), 2019

Learning beamforming in ultrasound imaging

S. Vedula, O. Senouf, G. Zurakov, A. M. Bronstein, O. Michailovich, M. Zibulevsky
Proc. Medical Imaging with Deep Learning (MIDL), 2019

Medical ultrasound (US) is a widespread imaging modality owing its popularity to cost-efficiency, portability, speed, and lack of harmful ionizing radiation. In this paper, we demonstrate that replacing the traditional ultrasound processing pipeline with a data-driven, learnable counterpart leads to significant improvement in image quality. Moreover, we demonstrate that greater improvement can be achieved through a learning-based design of the transmitted beam patterns simultaneously with learning an image reconstruction pipeline. We evaluate our method on an in-vivo first-harmonic cardiac ultrasound dataset acquired from volunteers and demonstrate the significance of the learned pipeline and transmit beam patterns on the image quality when compared to standard transmit and receive beamformers used in high frame-rate US imaging. We believe that the presented methodology provides a fundamentally different perspective on the classical problem of ultrasound beam pattern design.

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein, RepMet: Representative-based metric learning for classification and one-shot object detection, Proc. Computer Vision and Pattern Recognition (CVPR), 2019

RepMet: Representative-based metric learning for classification and one-shot object detection

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein
Proc. Computer Vision and Pattern Recognition (CVPR), 2019

Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML, featuring joint learning of the embedding space and of the data distribution of the training categories in a single training process. Our method improves upon leading algorithms for DML-based object classification. Furthermore, it opens the door for a new task in computer vision, few-shot object detection, since the proposed DML architecture can be naturally embedded as the classification head of any standard object detector. In numerous experiments, we achieve state-of-the-art classification results on a variety of fine-grained datasets, and offer the community a benchmark on the few-shot detection task, performed on the ImageNet-LOC dataset.

O. Halimi, O. Litany, E. Rodolà, A. M. Bronstein, R. Kimmel, Self-supervised learning of dense shape correspondence, Proc. Computer Vision and Pattern Recognition (CVPR), 2019

Self-supervised learning of dense shape correspondence

O. Halimi, O. Litany, E. Rodolà, A. M. Bronstein, R. Kimmel
Proc. Computer Vision and Pattern Recognition (CVPR), 2019

We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it with a purely geometric criterion. The resulting learning model is class-agnostic and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches, which specialize in the classes seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
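The "purely geometric criterion" described above can be made concrete as a metric-distortion score: a good correspondence maps pairs of points to pairs at similar geodesic distance. The sketch below computes this score for a hard (permutation) correspondence given precomputed distance matrices; the function name and data layout are assumptions, and the network that predicts the correspondence is not shown.

```python
def metric_distortion(DX, DY, perm):
    """Sum of squared differences between pairwise distances on shape X and
    the distances between the matched points on shape Y.

    DX, DY: square distance matrices (lists of lists).
    perm: perm[i] is the index of the point on Y matched to point i on X.
    For a soft correspondence matrix P, the analogous quantity is ||DX - P DY P^T||_F^2.
    """
    n = len(DX)
    return sum(
        (DX[i][j] - DY[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


# Three points on a path; the shape is matched to itself.
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(metric_distortion(D, D, [0, 1, 2]))  # 0: identity preserves all distances
print(metric_distortion(D, D, [1, 0, 2]))  # 4: swapping the first two points distorts distances
```

Note that the reflected matching `[2, 1, 0]` also scores 0, since it is an isometry of the path; a distortion criterion alone cannot distinguish intrinsic symmetries.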

A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, A. M. Bronstein, LaSO: Label-Set Operations networks for multi-label few-shot learning, Proc. Computer Vision and Pattern Recognition (CVPR), 2019

LaSO: Label-Set Operations networks for multi-label few-shot learning

A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, A. M. Bronstein
Proc. Computer Vision and Pattern Recognition (CVPR), 2019

Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the previously unaddressed multi-label few-shot classification scenario. We propose to combine pairs of given examples in feature space, so that the resulting synthesized feature vectors will correspond to examples whose label sets are obtained through certain set operations on the label sets of the corresponding input pairs. Thus, our method is capable of producing a sample containing the intersection, union or set-difference of labels present in two input samples. As we show, these set operations generalize to labels unseen during training. This enables performing augmentation on examples of novel categories, thus facilitating multi-label few-shot classifier learning. We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines.
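The networks described above operate on feature vectors, but the set operations themselves prescribe which label set each synthesized feature should carry. The sketch below computes only those target label vectors from two multi-hot labels; the feature-space networks are not modeled, and the function name and category order are illustrative.

```python
def label_set_targets(labels_a, labels_b):
    """Target multi-hot label vectors for intersection, union, and set difference.

    labels_a, labels_b: equal-length lists of 0/1 entries (multi-hot labels).
    Returns (a AND b, a OR b, a AND NOT b).
    """
    inter = [a & b for a, b in zip(labels_a, labels_b)]
    union = [a | b for a, b in zip(labels_a, labels_b)]
    diff = [a & (1 - b) for a, b in zip(labels_a, labels_b)]
    return inter, union, diff


# Hypothetical categories: [person, dog, car, tree]
inter, union, diff = label_set_targets([1, 1, 0, 0], [1, 0, 1, 0])
print(inter, union, diff)  # [1, 0, 0, 0] [1, 1, 1, 0] [0, 1, 0, 0]
```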

Y. Zur, C. Baskin, E. Zheltonozhskii, B. Chmiel, I. Evron, A. M. Bronstein, A. Mendelson, Towards learning of filter-level heterogeneous compression of convolutional neural networks, Proc. AutoML Workshop, Int'l Conf. on Machine Learning (ICML), 2019

Towards learning of filter-level heterogeneous compression of convolutional neural networks

Y. Zur, C. Baskin, E. Zheltonozhskii, B. Chmiel, I. Evron, A. M. Bronstein, A. Mendelson
Proc. AutoML Workshop, Int'l Conf. on Machine Learning (ICML), 2019

Recently, deep learning has become a de facto standard in machine learning, with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization, which relies on low-precision arithmetic representations for the weights and the activations. Another popular method is pruning the number of filters in each layer. While mainstream deep learning methods train the neural network weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter also amenable to training. In this paper, we formulate optimal arithmetic bit length allocation and neural network pruning as a NAS problem, searching for the configurations satisfying a computational complexity budget while maximizing the accuracy. We use a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al. (2019a). We show, by grid search, that heterogeneous quantized networks suffer from a high variance, which renders the benefit of the search questionable. For pruning, improvement over homogeneous cases is possible, but it is still challenging to find those configurations with the proposed method. The code is publicly available at https://github.com/yochaiz/Slimmable and https://github.com/yochaiz/darts-UNIQ.
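The continuous relaxation of Liu et al. replaces a discrete choice among candidate operations (here, for instance, different bit widths or filter counts) with a softmax-weighted mixture, so that the choice parameters can be trained by gradient descent alongside the weights. The scalar sketch below shows only the mixing step; the candidate operations and names are hypothetical, and the paper's actual search space is not reproduced.

```python
import math


def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, candidate_ops, alphas):
    """Relaxed choice among candidate operations.

    Output = sum_k softmax(alphas)[k] * op_k(x). After the search, the
    candidate with the largest alpha is typically kept as the discrete choice.
    """
    weights = softmax(alphas)
    outputs = [op(x) for op in candidate_ops]
    return sum(w * y for w, y in zip(weights, outputs))


# Hypothetical candidates: fixed-point rounding of a scalar to 2 and 8 fractional bits.
candidates = [lambda x: round(x * 4) / 4, lambda x: round(x * 256) / 256]
y = mixed_op(0.3, candidates, alphas=[0.0, 1.0])
```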

T. Weiss, S. Vedula, O. Senouf, A. M. Bronstein, O. Michailovich, M. Zibulevsky, Joint learning of Cartesian undersampling and reconstruction for accelerated MRI, Proc. Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2020

Joint learning of Cartesian undersampling and reconstruction for accelerated MRI

T. Weiss, S. Vedula, O. Senouf, A. M. Bronstein, O. Michailovich, M. Zibulevsky
Proc. Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2020

Magnetic Resonance Imaging (MRI) is considered today the gold-standard modality for soft tissues. The long acquisition times, however, make it more prone to motion artifacts and contribute to the relatively high cost of this examination. Over the years, multiple studies have concentrated on designing reduced measurement schemes and image reconstruction schemes for MRI; however, these problems have so far been addressed separately. On the other hand, recent works in optical computational imaging have demonstrated growing success of the simultaneous learning-based design of the acquisition and reconstruction schemes, manifesting significant improvement in the reconstruction quality under a constrained time budget. Inspired by these successes, in this work, we propose to learn accelerated MR acquisition schemes (in the form of Cartesian trajectories) jointly with the image reconstruction operator. To this end, we propose an algorithm for training the combined acquisition-reconstruction pipeline end-to-end in a differentiable way. We demonstrate the significance of using the learned Cartesian trajectories at different speed-up rates.
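In Cartesian undersampling, acquisition time is reduced by measuring only a subset of k-space rows (phase-encoding lines). The sketch below builds a fixed, hand-designed mask of the kind often used as a baseline, with a fully sampled low-frequency band and randomly chosen outer lines. The paper instead learns the trajectory jointly with the reconstruction, which is not shown; the parameter names and the default center fraction are assumptions.

```python
import random


def cartesian_mask(num_lines, acceleration, center_fraction=0.08, seed=0):
    """Select phase-encoding lines (k-space rows) for Cartesian undersampling.

    Keeps a fully sampled band around the k-space center and fills the
    remaining budget (num_lines // acceleration) with random outer lines.
    Returns a sorted list of sampled row indices.
    """
    budget = num_lines // acceleration
    num_center = max(1, int(round(num_lines * center_fraction)))
    start = (num_lines - num_center) // 2
    center = set(range(start, start + num_center))
    rng = random.Random(seed)
    others = [i for i in range(num_lines) if i not in center]
    extra = max(0, budget - len(center))
    chosen = center | set(rng.sample(others, extra))
    return sorted(chosen)


# 4x acceleration on a 256-line acquisition keeps 64 lines.
mask = cartesian_mask(256, acceleration=4)
```

The zero-filled k-space implied by such a mask would then be passed to a reconstruction operator; in a learned design, the line positions themselves become trainable.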

B. Chmiel, C. Baskin, R. Banner, E. Zheltonozhskii, Y. Yermolin, A. Karbachevsky, A. M. Bronstein, A. Mendelson, Feature map transform coding for energy-efficient CNN inference, Proc. Intl. Joint Conf. on Neural Networks (IJCNN), 2020

Feature map transform coding for energy-efficient CNN inference

B. Chmiel, C. Baskin, R. Banner, E. Zheltonozhskii, Y. Yermolin, A. Karbachevsky, A. M. Bronstein, A. Mendelson
Proc. Intl. Joint Conf. on Neural Networks (IJCNN), 2020

Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their relatively high computational complexity and memory bandwidth requirements. The latter often dominate the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory bandwidth due to the storage of intermediate activation calculation results. Our method exploits the high correlations between feature maps and adjacent pixels and allows halving the data transfer volumes to the main memory without re-training. We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet18 with our approach reduces the memory energy footprint by around 40% compared to a quantized network, with negligible impact on accuracy. A reference implementation accompanies the paper.
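The idea of exploiting inter-channel correlation can be illustrated with a toy transform coder: transform the channel vector at each pixel with an orthonormal transform, quantize the coefficients, and invert. When the channels are strongly correlated, most quantized coefficients are zero and compress well. This sketch swaps in a fixed DCT-II across channels as the transform; the transform, block shape, and entropy coder used in the paper may differ, and all names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    mat = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        mat.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return mat


def code_channels(vec, step):
    """Transform a per-pixel channel vector, quantize the coefficients, and reconstruct.

    Returns (integer symbols to store, reconstructed vector).
    """
    n = len(vec)
    T = dct_matrix(n)
    coeffs = [sum(T[k][i] * vec[i] for i in range(n)) for k in range(n)]
    q = [round(c / step) for c in coeffs]  # lossy step: integer symbols
    deq = [qi * step for qi in q]
    # The inverse of an orthonormal transform is its transpose.
    recon = [sum(T[k][i] * deq[k] for k in range(n)) for i in range(n)]
    return q, recon


# Perfectly correlated channels collapse into a single nonzero coefficient.
q, recon = code_channels([1.0, 1.0, 1.0, 1.0], step=0.5)
print(q)  # [4, 0, 0, 0]
```

A fixed DCT assumes that neighboring channels are similar, which channel order does not guarantee; a data-dependent transform such as PCA avoids that assumption.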

E. Schwartz, L. Karlinsky, R. Feris, R. Giryes, A. M. Bronstein, Baby steps towards few-shot learning with multiple semantics, arXiv:1906.01905, 2019

Baby steps towards few-shot learning with multiple semantics

E. Schwartz, L. Karlinsky, R. Feris, R. Giryes, A. M. Bronstein
arXiv:1906.01905, 2019

Learning from one or a few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. In this paper, we focus on the role of additional semantics that can significantly facilitate few-shot visual learning. Building upon recent advances in few-shot learning with additional semantic information, we demonstrate that further improvements are possible using richer semantics and multiple semantic sources. Using these ideas, we offer the community a new result on the one-shot test of the popular miniImageNet benchmark, comparing favorably to the previous state-of-the-art results for both visual-only and visual plus semantics-based approaches. We also perform an ablation study investigating the components and design choices of our approach.

O. Senouf, S. Vedula, T. Weiss, A. M. Bronstein, O. Michailovich, M. Zibulevsky, Self-supervised learning of inverse problem solvers in medical imaging, Proc. Medical Image Learning with Less Labels and Imperfect Data, MICCAI 2019

Self-supervised learning of inverse problem solvers in medical imaging

O. Senouf, S. Vedula, T. Weiss, A. M. Bronstein, O. Michailovich, M. Zibulevsky
Proc. Medical Image Learning with Less Labels and Imperfect Data, MICCAI 2019

In the past few years, deep learning-based methods have demonstrated enormous success for solving inverse problems in medical imaging. In this work, we address the following question: Given a set of measurements obtained from real imaging experiments, what is the best way to use a learnable model and the physics of the modality to solve the inverse problem and reconstruct the latent image? Standard supervised learning-based methods approach this problem by collecting data sets of known latent images and their corresponding measurements. However, these methods are often impractical due to the lack of availability of appropriately sized training sets, and, more generally, due to the inherent difficulty in measuring the “ground truth” latent image. In light of this, we propose a self-supervised approach to training inverse models in medical imaging in the absence of aligned data. Our method requires access only to the measurements and the forward model at training time. We showcase its effectiveness on inverse problems arising in accelerated magnetic resonance imaging (MRI).

N. Diamant, D. Zadok, C. Baskin, E. Schwartz, A. M. Bronstein, Beholder-GAN: Generation and beautification of facial images with conditioning on their beauty level, Proc. Int'l Conf. on Image Processing (ICIP), 2019

Beholder-GAN: Generation and beautification of facial images with conditioning on their beauty level

N. Diamant, D. Zadok, C. Baskin, E. Schwartz, A. M. Bronstein
Proc. Int'l Conf. on Image Processing (ICIP), 2019

Beauty is in the eye of the beholder. This maxim, emphasizing the subjectivity of the perception of beauty, has enjoyed a wide consensus since ancient times. In the digital era, data-driven methods have been shown to be able to predict human-assigned beauty scores for facial images. In this work, we augment this ability and train a generative model that generates faces conditioned on a requested beauty score. In addition, we show how this trained generator can be used to beautify an input face image. By doing so, we achieve an unsupervised beautification model, in the sense that it relies on no ground truth target images.

G. Pai, R. Talmon, A. M. Bronstein, R. Kimmel, DIMAL: Deep isometric manifold learning using sparse geodesic sampling, Proc. IEEE Winter Conf. on Applications of Computer Vision (WACV), 2019

DIMAL: Deep isometric manifold learning using sparse geodesic sampling

G. Pai, R. Talmon, A. M. Bronstein, R. Kimmel
Proc. IEEE Winter Conf. on Applications of Computer Vision (WACV), 2019

This paper explores a fully unsupervised deep learning approach for computing distance-preserving maps that generate low-dimensional embeddings for a certain class of manifolds. We use the Siamese configuration to train a neural network to solve the problem of least squares multidimensional scaling for generating maps that approximately preserve geodesic distances. By training with only a few landmarks, we show a significantly improved local and nonlocal generalization of the isometric mapping as compared to analogous non-parametric counterparts. Importantly, the combination of a deep-learning framework with a multidimensional scaling objective enables a numerical analysis of network architectures to aid in understanding their representation power. This provides a geometric perspective to the generalizability of deep learning.
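
The least-squares multidimensional scaling objective can be written as a stress function over landmark pairs. The sketch below is a minimal illustration under assumptions: `embedding` stands in for the network outputs (in a Siamese setup both points of a pair pass through the same weights), and the geodesic distances are assumed to be precomputed, for example by a shortest-path algorithm on a sampling graph. The function name is hypothetical.

```python
import numpy as np


def stress_loss(embedding, landmark_pairs, geodesic_distances):
    """Least-squares MDS (stress) objective over a sparse set of landmark pairs.

    embedding: (n, d) array of network outputs
    landmark_pairs: (m, 2) integer array of index pairs (i, j)
    geodesic_distances: (m,) array of target geodesic distances d_ij
    """
    i, j = landmark_pairs[:, 0], landmark_pairs[:, 1]
    euclidean = np.linalg.norm(embedding[i] - embedding[j], axis=1)
    return float(np.sum((euclidean - geodesic_distances) ** 2))


# Toy manifold: points on a segment, where geodesic distance is |t_i - t_j|.
t = np.linspace(0.0, 1.0, 5)
pairs = np.array([[0, 1], [0, 4], [2, 3]])
geo = np.abs(t[pairs[:, 0]] - t[pairs[:, 1]])

print(stress_loss(t[:, None], pairs, geo))        # isometric embedding
print(stress_loss(2 * t[:, None], pairs, geo))    # stretched embedding is penalized
```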

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein, ∆-encoder: an effective sample synthesis method for few-shot object recognition, Proc. Neural Information Processing Systems (NIPS), 2018

∆-encoder: an effective sample synthesis method for few-shot object recognition

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein
Proc. Neural Information Processing Systems (NIPS), 2018

Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted ∆-encoder, that learns to synthesize new samples for an unseen category from just a few examples of it. The synthesized samples are then used to train a classifier. The proposed approach learns both to extract transferable intra-class deformations, or “deltas”, between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state of the art in one-shot object recognition and compares favorably in the few-shot case.

C. Baskin, N. Liss, Y. Chai, E. Zheltonozhskii, E. Schwartz, R. Giryes, A. Mendelson, A. M. Bronstein, NICE: noise injection and clamping estimation for neural network quantization, arXiv:1810.00162, 2018

NICE: noise injection and clamping estimation for neural network quantization

C. Baskin, N. Liss, Y. Chai, E. Zheltonozhskii, E. Schwartz, R. Giryes, A. Mendelson, A. M. Bronstein
arXiv:1810.00162, 2018

Convolutional neural networks (CNNs) are very popular in many fields, including computer vision, speech recognition and natural language processing, to name a few. Though deep learning leads to groundbreaking performance in these domains, the networks used are very computationally demanding and are far from real-time even on a GPU, which is not power efficient and therefore does not suit low-power systems such as mobile devices. To overcome this challenge, several solutions have been proposed for quantizing the weights and activations of these networks, which accelerates the runtime significantly. Yet, this acceleration comes at the cost of a larger error. The NICE method proposed in this work trains quantized neural networks by noise injection and a learned clamping, which improve the accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 with weights and activations as low as 3 bits. We implement the proposed solution on an FPGA to demonstrate its applicability for low-power real-time applications.
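
The two ingredients named above can be sketched as follows. This is a simplified illustration, not the NICE implementation: it assumes a uniform quantizer on `[0, clamp]` (as for post-ReLU activations), models quantization error during training as additive uniform noise of up to half a quantization step, and treats `clamp` as a fixed number, whereas in the method it would be a learned parameter updated by backpropagation. All function names are hypothetical.

```python
import numpy as np


def quantization_step(clamp, bits):
    """Width of one quantization bin for a uniform quantizer on [0, clamp]."""
    return clamp / (2 ** bits - 1)


def quantize(x, clamp, bits):
    """Inference-time path: clamp to [0, clamp], then round to the nearest level."""
    step = quantization_step(clamp, bits)
    return np.round(np.clip(x, 0.0, clamp) / step) * step


def noisy_quantize(x, clamp, bits, rng):
    """Training-time surrogate: clamp, then add uniform noise in
    [-step/2, step/2) to mimic rounding error while keeping a useful gradient."""
    step = quantization_step(clamp, bits)
    clipped = np.clip(x, 0.0, clamp)
    return clipped + rng.uniform(-step / 2, step / 2, size=np.shape(x))


rng = np.random.default_rng(0)
x = np.array([-1.0, 0.1, 0.5, 2.0])
print(quantize(x, clamp=1.0, bits=3))
print(noisy_quantize(x, clamp=1.0, bits=3, rng=rng))
```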

T. Remez, O. Litany, R. Giryes, A. M. Bronstein, Class-aware fully-convolutional Gaussian and Poisson denoising, IEEE Trans. Image Processing, Vol. 27(11), 2018

Class-aware fully-convolutional Gaussian and Poisson denoising

T. Remez, O. Litany, R. Giryes, A. M. Bronstein
IEEE Trans. Image Processing, Vol. 27(11), 2018

We propose a fully-convolutional neural-network architecture for image denoising which is simple yet powerful. Its structure allows it to exploit the gradual nature of the denoising process, in which shallow layers handle local noise statistics, while deeper layers recover edges and enhance textures. Our method advances the state of the art when trained for different noise levels and distributions (both Gaussian and Poisson). In addition, we show that making the denoiser class-aware by exploiting semantic class information boosts performance, enhances textures and reduces artifacts.

O. Senouf, S. Vedula, G. Zurakhov, A. M. Bronstein, M. Zibulevsky, O. Michailovich, D. Adam, D. Blondheim, High frame-rate cardiac ultrasound imaging with deep learning, Proc. Int'l Conf. Medical Image Computing & Computer Assisted Intervention (MICCAI), 2018

High frame-rate cardiac ultrasound imaging with deep learning

O. Senouf, S. Vedula, G. Zurakhov, A. M. Bronstein, M. Zibulevsky, O. Michailovich, D. Adam, D. Blondheim
Proc. Int'l Conf. Medical Image Computing & Computer Assisted Intervention (MICCAI), 2018

Cardiac ultrasound imaging requires a high frame rate in order to capture rapid motion. This can be achieved by multi-line acquisition (MLA), where several narrow-focused received lines are obtained from each wide-focused transmitted line. This shortens the acquisition time at the expense of introducing block artifacts. In this paper, we propose a data-driven learning-based approach to improve the MLA image quality. We train an end-to-end convolutional neural network on pairs of real cardiac ultrasound data, acquired through MLA and the corresponding single-line acquisition (SLA). The network achieves a significant improvement in image quality for both 5- and 7-line MLA, resulting in a decorrelation measure similar to that of SLA while retaining the frame rate of MLA.

S. Vedula, O. Senouf, G. Zurakhov, A. M. Bronstein, M. Zibulevsky, O. Michailovich, D. Adam, D. Gaitini, High quality ultrasonic multi-line transmission through deep learning, Proc. Machine Learning for Medical Image Reconstruction (MLMIR), 2018

High quality ultrasonic multi-line transmission through deep learning

S. Vedula, O. Senouf, G. Zurakhov, A. M. Bronstein, M. Zibulevsky, O. Michailovich, D. Adam, D. Gaitini
Proc. Machine Learning for Medical Image Reconstruction (MLMIR), 2018

Frame rate is a crucial consideration in cardiac ultrasound imaging and 3D sonography. Several methods have been proposed in the medical ultrasound literature aiming at accelerating the image acquisition. In this paper, we consider one such method, called multi-line transmission (MLT), in which several evenly separated focused beams are transmitted simultaneously. While MLT reduces the acquisition time, it comes at the expense of a heavy loss of contrast due to the interactions between the beams (cross-talk artifact). We introduce a data-driven method to reduce the artifacts arising in MLT. To this end, we propose to train an end-to-end convolutional neural network consisting of correction layers followed by a constant apodization layer. The network is trained on pairs of raw data obtained through MLT and the corresponding single-line transmission (SLT) data. Experimental evaluation demonstrates significant improvement both in visual image quality and in objective measures such as contrast ratio and contrast-to-noise ratio, while preserving resolution, unlike traditional apodization-based methods. We show that the proposed method is able to generalize well across different patients and anatomies on real and phantom data.
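
For reference, the sketch below computes contrast ratio and contrast-to-noise ratio between an inside region (e.g., a cyst) and a background region using one common convention: CR = 20 log10(μ_in / μ_out) in dB and CNR = |μ_in − μ_out| / sqrt(σ_in² + σ_out²), computed on envelope-detected amplitudes. The paper's exact definitions and region selection may differ; the function names and toy values are illustrative.

```python
import numpy as np


def contrast_ratio_db(inside, outside):
    """Contrast ratio in dB between the mean amplitudes of two regions."""
    return 20.0 * np.log10(np.mean(inside) / np.mean(outside))


def contrast_to_noise_ratio(inside, outside):
    """Absolute mean difference divided by the combined standard deviation."""
    inside, outside = np.asarray(inside, float), np.asarray(outside, float)
    return abs(inside.mean() - outside.mean()) / np.sqrt(inside.var() + outside.var())


cyst = np.array([1.0, 3.0])          # toy envelope amplitudes inside the target
background = np.array([9.0, 11.0])   # toy envelope amplitudes in the background
print(contrast_ratio_db(cyst, background))        # about -14 dB
print(contrast_to_noise_ratio(cyst, background))  # 8 / sqrt(2), about 5.66
```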

A. Tsitsulin, D. Mottin, P. Karras, A. M. Bronstein, E. Mueller, SGR: Self-supervised spectral graph representation learning, Proc. KDD Deep Learning Day, 2018

SGR: Self-supervised spectral graph representation learning

A. Tsitsulin, D. Mottin, P. Karras, A. M. Bronstein, E. Mueller
Proc. KDD Deep Learning Day, 2018

Representing a graph as a vector is a challenging task; ideally, the representation should be easily computable, conducive to efficient comparisons among graphs, and tailored to the particular data and the analytical task at hand. Unfortunately, a “one-size-fits-all” solution is unattainable, as different analytical tasks may require different attention to global or local graph features. We develop SGR, to our knowledge the first method for learning graph representations in a self-supervised manner. Grounded in spectral graph analysis, SGR seamlessly combines all the aforementioned desirable properties. In extensive experiments, we show how our approach works on large graph collections, facilitates self-supervised representation learning across a variety of application domains, and performs competitively with state-of-the-art methods without re-training.
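
As a hypothetical illustration of what a spectral, permutation-invariant graph descriptor can look like, the sketch below computes the heat-kernel trace h(t) = Σ_j exp(−t λ_j) of the normalized Laplacian at several time scales; small t emphasizes local structure and large t global structure. This is an assumed stand-in chosen for exposition, not SGR's learned representation, and the function names are illustrative.

```python
import numpy as np


def normalized_laplacian(adjacency):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}
    (isolated nodes are given D^{-1/2} = 0)."""
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1)
    inv_sqrt = np.zeros_like(degrees)
    nonzero = degrees > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degrees[nonzero])
    return np.eye(len(adjacency)) - inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]


def heat_trace_signature(adjacency, times):
    """Heat-kernel trace h(t) = sum_j exp(-t * lambda_j) at each time scale t."""
    eigenvalues = np.linalg.eigvalsh(normalized_laplacian(adjacency))
    return np.array([np.exp(-t * eigenvalues).sum() for t in times])


path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
perm = np.eye(4)[[2, 0, 3, 1]]            # relabel the nodes
times = np.logspace(-2, 2, 10)
print(heat_trace_signature(path, times))
print(heat_trace_signature(perm @ path @ perm.T, times))  # same up to rounding
```

Because the descriptor depends only on the Laplacian spectrum, relabeling nodes leaves it unchanged, which is what makes vectors of this kind directly comparable across graphs.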

H. Haim, S. Elmalem, R. Giryes, A. M. Bronstein, E. Marom, Depth estimation from a single image using deep learned phase coded mask, IEEE Trans. Computational Imaging, Vol. 2(3), 2018 (Winner of the OSA Student Grand Challenge The Optical System of the Future)

Depth estimation from a single image using deep learned phase coded mask

H. Haim, S. Elmalem, R. Giryes, A. M. Bronstein, E. Marom
IEEE Trans. Computational Imaging, Vol. 2(3), 2018 (Winner of the OSA Student Grand Challenge The Optical System of the Future)

Depth estimation from a single image is a well-known challenge in computer vision. With the advent of deep learning, several approaches for monocular depth estimation have been proposed, all of which have inherent limitations due to the scarce depth cues that exist in a single image. Moreover, these methods are very demanding computationally, which makes them inadequate for systems with limited processing power. In this paper, a phase-coded aperture camera for depth estimation is proposed. The camera is equipped with an optical phase mask that provides unambiguous depth-related color characteristics for the captured image. These are used for estimating the scene depth map using a fully convolutional neural network. The phase-coded aperture structure is learned jointly with the network weights using backpropagation. The strong depth cues (encoded in the image by the phase mask, designed together with the network weights) allow a much simpler neural network architecture for faster and more accurate depth estimation. Performance achieved on simulated images as well as on a real optical setup is superior to the state-of-the-art monocular depth estimation methods (both with respect to the depth accuracy and required processing power), and is competitive with more complex and expensive depth estimation methods such as light-field cameras.

C. Baskin, N. Liss, E. Zheltonozhskii, A. M. Bronstein, A. Mendelson, Streaming architectures for large-scale quantized neural networks on an FPGA-based dataflow platform, IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018

Streaming architectures for large-scale quantized neural networks on an FPGA-based dataflow platform

C. Baskin, N. Liss, E. Zheltonozhskii, A. M. Bronstein, A. Mendelson
IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018

Deep neural networks (DNNs) are used by different applications that are executed on a range of computer architectures, from IoT devices to supercomputers. These networks have a huge footprint, as well as large computational and communication needs. To ease the pressure on resources, research indicates that in many cases a low-precision representation (1-2 bits per parameter) of weights and other parameters can achieve similar accuracy while requiring fewer resources. Using quantized values enables the use of FPGAs to run NNs, since FPGAs are well suited to these primitives; e.g., FPGAs provide efficient support for bitwise operations and can work with arbitrary-precision representations of numbers. This paper presents a new streaming architecture for running quantized neural networks (QNNs) on FPGAs. The proposed architecture scales out better than alternatives, allowing us to take advantage of systems with multiple FPGAs. We also include support for skip connections, which are used in state-of-the-art NNs, and show that our architecture allows adding those connections almost for free. All this allowed us to implement an 18-layer ResNet for the classification of 224×224 images, achieving 57.5% top-1 accuracy. In addition, we implemented a full-sized quantized AlexNet. In contrast to previous works, we use 2-bit activations instead of 1-bit ones, which improves AlexNet’s top-1 accuracy from 41.8% to 51.03% for ImageNet classification. Both AlexNet and ResNet can handle 1000-class real-time classification on an FPGA. Our implementation of ResNet-18 consumes 5× less power and is 4× slower for ImageNet, when compared to the same NN on the latest Nvidia GPUs. Smaller NNs that fit on a single FPGA run faster than on GPUs on small (32×32) inputs, while consuming up to 20× less energy and power.
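
The bitwise primitives that make FPGAs attractive for QNNs can be illustrated with the 1-bit case: if ±1 values are packed into bit masks, a dot product reduces to an XOR (equivalently, XNOR) followed by a population count. The sketch below is a plain-Python illustration under that assumption, with hypothetical function names; it does not model the 2-bit activations or the streaming architecture of the paper.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int (bit i set when values[i] is +1)."""
    mask = 0
    for i, v in enumerate(values):
        if v > 0:
            mask |= 1 << i
    return mask


def binary_dot(mask_a, mask_b, n):
    """Dot product of two length-n {-1, +1} vectors from their packed bits:
    each agreeing position contributes +1 and each disagreeing one -1,
    so the result is n - 2 * popcount(a XOR b)."""
    disagreements = bin(mask_a ^ mask_b).count("1")
    return n - 2 * disagreements


a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # equals sum(x * y for x, y in zip(a, b))
```

In hardware, the XOR and popcount over a whole word replace many multiply-accumulate operations, which is the source of the efficiency gain.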

R. Giryes, Y. C. Eldar, A. M. Bronstein, G. Sapiro, Tradeoffs between convergence speed and reconstruction accuracy in inverse problems, IEEE Trans. on Signal Processing, Vol. 66(7), 2018

Tradeoffs between convergence speed and reconstruction accuracy in inverse problems

R. Giryes, Y. C. Eldar, A. M. Bronstein, G. Sapiro
IEEE Trans. on Signal Processing, Vol. 66(7), 2018

Solving inverse problems with iterative algorithms is popular, especially for large-scale data. Due to time constraints, the number of possible iterations is usually limited, potentially affecting the achievable accuracy. Given an error one is willing to tolerate, an important question is whether it is possible to modify the original iterations to obtain faster convergence to a minimizer achieving the allowed error without increasing the computational cost of each iteration considerably. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set may lead to faster convergence at the cost of an additional reconstruction error related to the accuracy of the set approximation. Our theory ties to recent advances in sparse recovery, compressed sensing, and deep learning. In particular, it may provide a possible explanation for the successful approximation of the L1-minimization solution by neural networks with layers representing iterations, as practiced in the learned iterative shrinkage-thresholding algorithm.
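
The link to learned iterative shrinkage-thresholding can be made concrete. One ISTA step for min_x ½‖Ax − y‖² + λ‖x‖₁ can be rewritten as x ← soft(Wy + Sx, θ) with W = Aᵀ/L, S = I − AᵀA/L and θ = λ/L, where L is the largest eigenvalue of AᵀA; LISTA keeps this layer structure but learns W, S and θ. The sketch below assumes this standard formulation and checks numerically that a fixed-depth network with these particular matrices reproduces ISTA; it does not implement training, and the function names are illustrative.

```python
import numpy as np


def soft_threshold(v, threshold):
    """Proximal operator of threshold * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)


def ista(A, y, lam, num_iterations):
    """Fixed number of ISTA iterations with step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2          # largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(num_iterations):
        x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
    return x


def unrolled_network(A, y, lam, num_layers):
    """The same computation written as a feed-forward network with shared
    layers; LISTA would treat W, S and theta as trainable parameters."""
    L = np.linalg.norm(A, 2) ** 2
    W = A.T / L
    S = np.eye(A.shape[1]) - A.T @ A / L
    theta = lam / L
    x = np.zeros(A.shape[1])
    for _ in range(num_layers):
        x = soft_threshold(W @ y + S @ x, theta)
    return x


def objective(A, y, lam, x):
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.7]
y = A @ x_true
lam = 0.1

for k in (5, 50, 500):                     # accuracy improves with the iteration budget
    print(k, objective(A, y, lam, ista(A, y, lam, k)))
```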

O. Litany, T. Remez, E. Rodolà, A. M. Bronstein, M. M. Bronstein, Deep Functional Maps: Structured prediction for dense shape correspondence, Proc. Int'l Conf. on Computer Vision (ICCV), 2017

Deep Functional Maps: Structured prediction for dense shape correspondence

O. Litany, T. Remez, E. Rodolà, A. M. Bronstein, M. M. Bronstein
Proc. Int'l Conf. on Computer Vision (ICCV), 2017

We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning-based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a paradigm shift and design a structured prediction model in the space of functional maps, linear operators that provide a compact representation of the correspondence. We model the learning process via a deep residual network which takes dense descriptor fields defined on two shapes as input, and outputs a soft map between the two given objects. The resulting correspondence is shown to be accurate on several challenging benchmarks comprising multiple categories, synthetic models, real scans with acquisition artifacts, topological noise, and partiality.
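
In the functional-map representation, a correspondence is encoded by a small matrix C that maps the coefficients of functions in a truncated Laplacian eigenbasis of one shape to those of the other. Given descriptor fields projected onto the two bases (matrices F and G of size k × q, one column per descriptor), a common way to estimate C is the least-squares fit min_C ‖CF − G‖_F, a differentiable step that can follow learned descriptors in a network. The sketch below assumes the projections are already computed and only illustrates this step; the function name and toy sizes are hypothetical.

```python
import numpy as np


def functional_map(F, G):
    """Least-squares functional map C minimizing ||C F - G||_F.

    F, G: (k, q) spectral coefficients of q descriptor functions on the
    source and target shapes (k basis functions each).
    Solved as F^T C^T = G^T.
    """
    C_transposed, *_ = np.linalg.lstsq(F.T, G.T, rcond=None)
    return C_transposed.T


rng = np.random.default_rng(0)
k, q = 10, 40                      # basis size and number of descriptors (toy values)
C_true = rng.standard_normal((k, k))
F = rng.standard_normal((k, q))
G = C_true @ F                     # noiseless toy target coefficients
print(np.abs(functional_map(F, G) - C_true).max())   # recovery error, near machine precision
```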

T. Remez, O. Litany, R. Giryes, A. M. Bronstein, Deep class-aware image denoising, Proc. Int'l Conf. on Image Processing (ICIP), 2017

Deep class-aware image denoising

T. Remez, O. Litany, R. Giryes, A. M. Bronstein
Proc. Int'l Conf. on Image Processing (ICIP), 2017

The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. To this end, we propose a new fully convolutional deep neural network architecture which is simple yet powerful and achieves state-of-the-art performance for additive Gaussian noise removal. Furthermore, we observe that personal photo collections can usually be categorized into a small set of semantic classes. Simple as it is, this observation has not been exploited in image denoising until now. We show that a significant boost in performance of up to 0.4 dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class. Relying on the hugely successful existing image classifiers, this research advocates for using a class-aware approach in all image enhancement tasks.
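
To put the reported gain in perspective: PSNR = 10 log10(peak² / MSE), so an improvement of 0.4 dB corresponds to multiplying the MSE by 10^(−0.04) ≈ 0.912, i.e., roughly a 9% reduction in mean squared error. The helpers below assume 8-bit images (peak 255) by default; the function names are illustrative.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain (same peak)."""
    return 10.0 ** (-gain_db / 10.0)


print(psnr(np.zeros(4), np.ones(4)))  # MSE = 1, so 20*log10(255), about 48.13 dB
print(mse_ratio_for_gain(0.4))        # about 0.912
```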

T. Remez, O. Litany, R. Giryes, A. M. Bronstein, Deep class-aware denoising, arXiv:1701.01698

Deep class-aware denoising

T. Remez, O. Litany, R. Giryes, A. M. Bronstein
arXiv:1701.01698

The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. At the same time, the images captured by these devices can be categorized into a small set of semantic classes. Simple as it is, this observation has not been exploited in image denoising until now. In this paper, we demonstrate how the reconstruction quality improves when a denoiser is aware of the type of content in the image. To this end, we first propose a new fully convolutional deep neural network architecture which is simple yet powerful, as it achieves state-of-the-art performance even without being class-aware. We further show that a significant boost in performance of up to 0.4 dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class. Relying on the hugely successful existing image classifiers, this research advocates for using a class-aware approach in all image enhancement tasks.

T. Remez, O. Litany, R. Giryes, A. M. Bronstein, Deep convolutional denoising of low-light images, arXiv:1701.01687

Deep convolutional denoising of low-light images

T. Remez, O. Litany, R. Giryes, A. M. Bronstein
arXiv:1701.01687

The Poisson distribution is used for modeling noise in photon-limited imaging. While canonical examples include relatively exotic types of sensing like spectral imaging or astronomy, the problem is relevant to regular photography now more than ever due to the booming market for mobile cameras. The restricted form factor limits the amount of absorbed light, so computational post-processing is called for. In this paper, we make use of the powerful framework of deep convolutional neural networks for Poisson denoising. We demonstrate how, by training the same network with images having a specific peak value, our denoiser outperforms the previous state of the art by a large margin both visually and quantitatively. Being flexible and data-driven, our solution avoids the heavy ad hoc engineering used in previous methods and is an order of magnitude faster. We further show that by adding a reasonable prior on the class of the image being processed, another significant boost in performance is achieved.
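
The peak value controls the noise level: if a clean image with intensities in [0, 1] is scaled so that intensity 1 corresponds to `peak` expected photons, the observed counts are Poisson with mean peak · x, and the relative noise at intensity x is about 1/sqrt(peak · x). The sketch below assumes this common simulation protocol for Poisson denoising experiments; the function name is illustrative.

```python
import numpy as np


def add_poisson_noise(clean, peak, rng):
    """Simulate photon-limited observation of `clean` (values in [0, 1]).

    Intensity x becomes a Poisson count with mean peak * x; the counts are
    divided by `peak` so the result is on the same scale as `clean`."""
    counts = rng.poisson(peak * np.asarray(clean, dtype=float))
    return counts / peak


rng = np.random.default_rng(0)
clean = np.full(100_000, 0.5)
for peak in (1, 4, 30):
    noisy = add_poisson_noise(clean, peak, rng)
    print(peak, noisy.mean(), noisy.std())   # std shrinks roughly like sqrt(0.5 / peak)
```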

T. Remez, O. Litany, S. Yoseff, H. Haim, A. M. Bronstein, FPGA system for real-time computational extended depth of field imaging using phase aperture coding, arXiv:1608.01074

FPGA system for real-time computational extended depth of field imaging using phase aperture coding

T. Remez, O. Litany, S. Yoseff, H. Haim, A. M. Bronstein
arXiv:1608.01074

We present a proof-of-concept end-to-end system for computational extended depth of field (EDOF) imaging. The acquisition is performed through a phase-coded aperture implemented by placing a thin wavelength-dependent optical mask inside the pupil of a conventional camera lens, so that each color channel is focused at a different depth. The reconstruction process receives the raw Bayer image as input and performs blind estimation of the output color image, in focus over an extended range of depths, using a patch-wise sparse prior. We present a fast non-iterative reconstruction algorithm operating with constant latency in fixed-point arithmetic and achieving real-time performance in a prototype FPGA implementation. The output of the system, on simulated and real-life scenes, is qualitatively and quantitatively better than the result of clear-aperture imaging followed by state-of-the-art blind deblurring.

R. Giryes, G. Sapiro, A. M. Bronstein, Deep neural networks with random Gaussian weights: A universal classification strategy?, IEEE Trans. Signal Processing, Vol. 64(13), 2016

Deep neural networks with random Gaussian weights: A universal classification strategy?

R. Giryes, G. Sapiro, A. M. Bronstein
IEEE Trans. Signal Processing, Vol. 64(13), 2016

Three important properties of a classification machinery are: (i) the system preserves the important information of the input data; (ii) the training examples convey information for unseen data; and (iii) the system is able to treat points from different classes differently. In this work, we show that these fundamental properties are inherited by the architecture of deep neural networks. We formally prove that these networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data. Similar points at the input of the network are likely to have a similar output. The theoretical analysis of deep networks presented here exploits tools used in the compressed sensing and dictionary learning literature, thereby making a formal connection between these important topics. The derived results allow drawing conclusions on the metric learning properties of the network and their relation to its structure, and provide bounds on the required size of the training set such that the training examples would faithfully represent the unseen data. The results are validated with state-of-the-art trained networks.
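
A quick numerical check illustrates the angle-dependent behavior for a single random layer. Assuming i.i.d. N(0, 1/m) weights, a ReLU nonlinearity and unit-norm inputs at angle θ, the expected squared output distance is 1 − (sin θ + (π − θ) cos θ)/π (the standard arc-cosine-kernel expectation, used here as an assumption rather than a formula quoted from the paper), compared with 2 − 2 cos θ at the input. The ratio goes from about 1/2 for nearly aligned inputs to 1/4 for opposite ones, so distances are preserved only up to an angle-dependent factor. The layer sizes and function names are illustrative.

```python
import numpy as np


def expected_sq_distance(theta):
    """Expected ||relu(Wx) - relu(Wy)||^2 for unit x, y at angle theta,
    with W having i.i.d. N(0, 1/m) entries (arc-cosine kernel)."""
    return 1.0 - (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi


rng = np.random.default_rng(0)
d, m = 64, 4096
W = rng.standard_normal((m, d)) / np.sqrt(m)   # random Gaussian layer


def layer(v):
    return np.maximum(W @ v, 0.0)


e1, e2 = np.eye(d)[0], np.eye(d)[1]
for theta in (0.3, np.pi / 2, 2.5):
    y = np.cos(theta) * e1 + np.sin(theta) * e2   # unit vector at angle theta from e1
    empirical = np.sum((layer(e1) - layer(y)) ** 2)
    print(f"theta={theta:.2f}  input={2 - 2 * np.cos(theta):.3f}  "
          f"output={empirical:.3f}  predicted={expected_sq_distance(theta):.3f}")
```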

T. Remez, O. Litany, A. M. Bronstein, A Picture is Worth a Billion Bits: Real-time image reconstruction from dense binary pixels, arXiv:1510.04601

A Picture is Worth a Billion Bits: Real-time image reconstruction from dense binary pixels

T. Remez, O. Litany, A. M. Bronstein
arXiv:1510.04601

The pursuit of smaller pixel sizes at ever-increasing resolution in digital image sensors is mainly driven by the stringent price and form-factor requirements of sensors and optics in the cellular phone market. Recently, Eric Fossum proposed a novel concept of an image sensor with dense sub-diffraction limit one-bit pixels (jots), which can be considered a digital emulation of silver halide photographic film. This idea has been recently embodied as the EPFL Gigavision camera. A major bottleneck in the design of such sensors is the image reconstruction process, producing a continuous high dynamic range image from oversampled binary measurements. The extreme quantization of the Poisson statistics is incompatible with the assumptions of most standard image processing and enhancement frameworks. The recently proposed maximum-likelihood (ML) approach addresses this difficulty, but suffers from image artifacts and has impractically high computational complexity. In this work, we study a variant of a sensor with binary threshold pixels and propose a reconstruction algorithm combining an ML data fitting term with a sparse synthesis prior. We also show an efficient hardware-friendly real-time approximation of this inverse operator. Promising results are shown on synthetic data as well as on HDR data emulated using multiple exposures of a regular CMOS sensor.
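
The core of the maximum-likelihood data term can be seen in the simplest case. Assuming one-bit jots with a threshold of a single photon and N jots that share the same exposure λ (mean photons per jot), each jot reads 0 with probability e^(−λ), so with K ones the ML estimate is λ̂ = −ln(1 − K/N), which saturates when K = N. The sketch below covers only this assumed setting with a hypothetical function name; the paper's threshold pixels, sparse prior and hardware approximation are not modeled.

```python
import math

import numpy as np


def ml_exposure(bits):
    """ML estimate of the mean photon count per jot from 1-photon-threshold
    binary readings that share one exposure value; infinite when all jots fire."""
    n = len(bits)
    k = int(np.sum(bits))
    if k == n:
        return math.inf
    return -math.log(1.0 - k / n)


rng = np.random.default_rng(0)
true_exposure = 0.7
photons = rng.poisson(true_exposure, size=200_000)
bits = (photons >= 1).astype(int)      # a jot fires if it caught at least one photon
print(ml_exposure(bits))               # close to the true exposure of 0.7
```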

P. Sprechmann, A. M. Bronstein, G. Sapiro, Learning efficient sparse and low-rank models, IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), Vol. 37(9), 2015

Learning efficient sparse and low-rank models

P. Sprechmann, A. M. Bronstein, G. Sapiro
IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), Vol. 37(9), 2015

Parsimony, including sparsity and low rank, has been shown to successfully model data in numerous machine learning and signal processing tasks. Traditionally, parsimonious modeling approaches rely on an iterative algorithm that minimizes an objective function with parsimony-promoting terms. The inherently sequential structure and data-dependent complexity and latency of iterative optimization constitute a major limitation in many applications requiring real-time performance or involving large-scale data. Another limitation encountered by these models is the difficulty of their inclusion in supervised learning scenarios, where the higher-level training objective would depend on the solution of the lower-level pursuit problem. The resulting bilevel optimization problems are in general notoriously difficult to solve. In this paper, we propose to move the emphasis from the model to the pursuit algorithm, and develop a process-centric view of parsimonious modeling, in which a deterministic fixed-complexity pursuit process is used in lieu of iterative optimization. We show a principled way to construct learnable pursuit process architectures for structured sparse and robust low-rank models from the iteration of proximal descent algorithms. These architectures approximate the exact parsimonious representation with a fraction of the complexity of the standard optimization methods. We also show that carefully chosen training regimes allow parsimonious models to be naturally extended to discriminative settings. State-of-the-art results are demonstrated on several challenging problems in image and audio processing with several orders of magnitude speedup compared to the exact optimization algorithms.
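
For the low-rank part, a standard proximal step in nuclear-norm-regularized models is singular value thresholding: the proximal map of τ‖X‖_* shrinks every singular value of X by τ and zeroes those that would become negative. The helper below is a minimal sketch of this single operator with an illustrative name; it does not reproduce the learnable pursuit architectures of the paper, which are built from iterating proximal descent.

```python
import numpy as np


def singular_value_threshold(X, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


X = np.diag([5.0, 2.0, 0.5])
print(singular_value_threshold(X, tau=1.0))   # diag(4, 1, 0): rank drops from 3 to 2
```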

O. Menashe, A. M. Bronstein, Real-time compressed imaging of scattering volumes, Proc. Int'l Conf. on Image Processing (ICIP), 2014

Real-time compressed imaging of scattering volumes

O. Menashe, A. M. Bronstein
Proc. Int'l Conf. on Image Processing (ICIP), 2014

We propose a method and a prototype imaging system for real-time reconstruction of volumetric piecewise-smooth scattering media. The volume is illuminated by a sequence of structured binary patterns emitted from a fan beam projector, and the scattered light is collected by a two-dimensional sensor, thus creating an under-complete set of compressed measurements. We show a reconstruction algorithm with fixed complexity and latency that is capable of estimating the scattering coefficients in real time. We also show a simple greedy algorithm for learning the optimal illumination patterns. Our results demonstrate faithful reconstruction from highly compressed measurements. Furthermore, a method for compressed registration of the measured volume to a known template is presented, showing excellent alignment with just a single projection. Though our prototype system operates in visible light, the presented methodology is suitable for fast x-ray scattering imaging, in particular in real-time vascular medical imaging.

P. Sprechmann, R. Litman, T. Ben Yakar, A. M. Bronstein, G. Sapiro, Efficient supervised sparse analysis and synthesis operators, Proc. Neural Information Proc. Systems (NIPS), 2013

Efficient supervised sparse analysis and synthesis operators

P. Sprechmann, R. Litman, T. Ben Yakar, A. M. Bronstein, G. Sapiro
Proc. Neural Information Proc. Systems (NIPS), 2013

In this paper, we propose a new and computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis- and analysis-type priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought of as a way of learning analysis+synthesis sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward neural networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing task-specific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications.

P. Sprechmann, A. M. Bronstein, M. M. Bronstein, G. Sapiro, Learnable low rank sparse models for speech denoising, Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013

Learnable low rank sparse models for speech denoising

P. Sprechmann, A. M. Bronstein, M. M. Bronstein, G. Sapiro
Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013

In this paper, we present a framework for real-time enhancement of speech signals. Our method leverages a new process-centric approach for sparse and parsimonious models, where the representation pursuit is obtained by applying a deterministic function or process rather than by solving an optimization problem. We first propose a rank-regularized robust version of non-negative matrix factorization (NMF) for modeling time-frequency representations of speech signals, in which the spectral frames are decomposed as sparse linear combinations of atoms of a low-rank dictionary. Then, a parametric family of pursuit processes is derived from the iteration of the proximal descent method for solving this model. We present several experiments showing successful results and the potential of the proposed framework. Incorporating discriminative learning makes the proposed method significantly outperform exact NMF algorithms, with fixed latency and at a fraction of their computational complexity.

P. Sprechmann, A. M. Bronstein, G. Sapiro, Real-time online singing voice separation from monaural recordings using robust low-rank modeling, Proc. Annual Conference of the Int'l Society for Music Information Retrieval (ISMIR), 2012 (Best poster presentation award)

Real-time online singing voice separation from monaural recordings using robust low-rank modeling

P. Sprechmann, A. M. Bronstein, G. Sapiro
Proc. Annual Conference of the Int'l Society for Music Information Retrieval (ISMIR), 2012 (Best poster presentation award)

Separating the leading vocals from the musical accompaniment is a challenging task that appears naturally in several music processing applications. Robust principal component analysis (RPCA) has recently been applied to this problem with very successful results. The method decomposes the signal into a low-rank component corresponding to the accompaniment, with its repetitive structure, and a sparse component corresponding to the voice, with its quasi-harmonic structure. In this paper, we first introduce a non-negative variant of RPCA, termed robust low-rank non-negative matrix factorization (RNMF), which better suits audio applications. We then propose two efficient feed-forward architectures that approximate RPCA and RNMF with low latency and at a fraction of the complexity of the original optimization methods. These approximants allow incorporating elements of unsupervised, semi-supervised, and fully supervised learning into the RPCA and RNMF frameworks. Our basic implementation achieves a speedup of several orders of magnitude over the exact solvers with no performance degradation, and allows online, faster-than-real-time processing. Evaluation on the MIR-1K dataset demonstrates state-of-the-art performance.
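For reference, the exact decomposition that such feed-forward architectures approximate can be sketched with the inexact augmented Lagrange multiplier (ALM) method commonly used for RPCA (principal component pursuit): min ‖L‖_* + λ‖S‖₁ subject to M = L + S. This is a standard iterative solver, not the paper's feed-forward approximant or its RNMF variant; the default λ = 1/√max(m, n), the penalty schedule, and the function names are assumptions. For singing-voice separation, M would be a magnitude spectrogram with time frames as columns, L the accompaniment, and S the voice.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def svt(A, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def rpca_ialm(M, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Inexact ALM sketch for  min ||L||_* + lam*||S||_1  s.t.  M = L + S."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M, "fro")
    norm_two = np.linalg.norm(M, 2)           # largest singular value
    mu = 1.25 / norm_two                      # initial penalty parameter
    mu_max = mu * 1e7                         # cap so that mu cannot overflow
    Y = M / max(norm_two, np.abs(M).max() / lam)  # scaled dual initialization
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        S = soft_threshold(M - L + Y / mu, lam / mu)  # sparse update
        L = svt(M - S + Y / mu, 1.0 / mu)             # low-rank update
        R = M - L - S                                 # constraint violation
        Y = Y + mu * R                                # dual ascent step
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R, "fro") / norm_M < tol:
            break
    return L, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L_true = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 100))
    signs = rng.choice([-1.0, 1.0], size=(100, 100))
    S_true = np.where(rng.random((100, 100)) < 0.05, 10.0 * signs, 0.0)
    L, S = rpca_ialm(L_true + S_true)
    print(np.linalg.norm(L - L_true) / np.linalg.norm(L_true))
```

Every iteration needs a full SVD, which is the main cost that a fixed-depth feed-forward approximation avoids.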

P. Sprechmann, A. M. Bronstein, G. Sapiro, Learning efficient structured sparse models, Proc. Int'l Conf. on Machine Learning (ICML), 2012

Learning efficient structured sparse models

P. Sprechmann, A. M. Bronstein, G. Sapiro
Proc. Int'l Conf. on Machine Learning (ICML), 2012

We present a comprehensive framework for structured sparse coding and modeling, extending the recent ideas of using learnable fast regressors to approximate exact sparse codes. For this purpose, we propose an efficient feed-forward architecture derived from the iterations of the block-coordinate algorithm. This architecture approximates the exact structured sparse codes at a fraction of the complexity of the standard optimization methods. We also show that, by using different training objective functions, the proposed learnable sparse encoders are not restricted to being approximants of the exact sparse code for a given dictionary, but can rather be used as full-featured sparse encoders or even modelers. A simple implementation achieves a speedup of several orders of magnitude over state-of-the-art exact optimization algorithms with minimal performance degradation, making the proposed framework suitable for real-time and large-scale applications.
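The block-coordinate iteration underlying such an encoder can be sketched for a simple group-sparse (group lasso) model, min over z of 0.5‖x − Dz‖² + λ Σ_g ‖z_g‖₂, with non-overlapping groups. The sketch assumes a block proximal-gradient update (one group at a time, with step 1/‖D_g‖²) rather than reproducing the paper's exact algorithm, and the function names are hypothetical. Running a fixed number of sweeps gives the kind of fixed-depth computation that a learnable encoder can unroll and train.

```python
import numpy as np


def group_shrink(v, tau):
    """Proximal operator of tau * ||.||_2: shrinks the whole block toward zero."""
    norm = np.linalg.norm(v)
    if norm <= tau:
        return np.zeros_like(v)
    return (1.0 - tau / norm) * v


def bcd_group_sparse(x, D, groups, lam, n_sweeps):
    """Hypothetical block-coordinate proximal-gradient sketch for
        min_z 0.5*||x - D z||^2 + lam * sum_g ||z[g]||_2.
    groups: list of index arrays partitioning range(D.shape[1])."""
    z = np.zeros(D.shape[1])
    r = np.array(x, dtype=float)               # residual x - D z (z starts at 0)
    Lg = [np.linalg.norm(D[:, g], 2) ** 2 for g in groups]  # per-block Lipschitz constants
    for _ in range(n_sweeps):
        for g, L in zip(groups, Lg):
            old = z[g].copy()
            new = group_shrink(old + D[:, g].T @ r / L, lam / L)
            r -= D[:, g] @ (new - old)         # keep the residual up to date
            z[g] = new
    return z


def group_objective(x, D, z, groups, lam):
    return (0.5 * np.sum((x - D @ z) ** 2)
            + lam * sum(np.linalg.norm(z[g]) for g in groups))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    D = rng.standard_normal((30, 40))
    groups = [np.arange(i, i + 4) for i in range(0, 40, 4)]
    z_true = np.zeros(40)
    z_true[4:8] = [1.0, -1.0, 2.0, 0.5]
    z_true[20:24] = [-1.5, 1.0, 1.0, -0.5]
    x = D @ z_true
    z = bcd_group_sparse(x, D, groups, lam=1.0, n_sweeps=50)
    print([int(np.any(z[g] != 0)) for g in groups])  # which groups are active
```

Group shrinkage zeroes entire blocks at once, which is what distinguishes structured sparse codes from plain elementwise sparsity.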

A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, Y. Y. Zeevi, Optimal nonlinear estimation of photon coordinates in PET, Proc. Int'l Symposium on Biomedical Imaging (ISBI), 2002

Optimal nonlinear estimation of photon coordinates in PET

A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, Y. Y. Zeevi
Proc. Int'l Symposium on Biomedical Imaging (ISBI), 2002

We consider the detection of high-energy photons in PET using thick scintillation crystals. The parallax effect and multiple Compton interactions in this type of crystal significantly reduce the accuracy of conventional detection methods. To estimate the scintillation point coordinates from photomultiplier responses, we use asymptotically optimal nonlinear techniques implemented by feed-forward neural networks, radial basis function (RBF) networks, and neuro-fuzzy systems. Incorporating information about the angles of incidence of the photons significantly improves the estimation accuracy. The proposed estimators are fast enough to perform detection on conventional computers.
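One of the estimator families mentioned above, a radial basis function (RBF) network, can be sketched as Gaussian features centered at a subset of training samples followed by a ridge-regularized linear output layer. Everything here is a hypothetical toy: the four-photomultiplier geometry, the 1/(d² + h²) light-response model, the kernel width, and the number of centers are illustrative assumptions, not the detector or estimators studied in the paper, and the sketch ignores angle-of-incidence information.

```python
import numpy as np


def gaussian_features(X, centers, width):
    """Gaussian RBF activations plus a constant bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([Phi, np.ones((X.shape[0], 1))])


def fit_rbf(X, Y, n_centers=50, width=0.2, ridge=1e-6, seed=0):
    """Pick centers among the training inputs, then solve a ridge
    least-squares problem for the output weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    Phi = gaussian_features(X, centers, width)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    W = np.linalg.solve(A, Phi.T @ Y)
    return centers, W


def predict_rbf(X, centers, W, width=0.2):
    return gaussian_features(X, centers, width) @ W


def toy_pmt_responses(points, h=0.5, noise=0.01, seed=0):
    """Hypothetical detector: four photomultipliers at the corners of [-1, 1]^2,
    each seeing light proportional to 1 / (d^2 + h^2), with 1% multiplicative
    noise; responses are normalized to sum to one."""
    rng = np.random.default_rng(seed)
    pmts = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
    d2 = ((points[:, None, :] - pmts[None, :, :]) ** 2).sum(axis=-1)
    resp = 1.0 / (d2 + h ** 2)
    resp *= 1.0 + noise * rng.standard_normal(resp.shape)
    return resp / resp.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    pts = rng.uniform(-1, 1, size=(500, 2))        # toy scintillation points
    feats = toy_pmt_responses(pts)
    centers, W = fit_rbf(feats, pts)
    pred = predict_rbf(feats, centers, W)
    print(np.sqrt(((pred - pts) ** 2).mean(axis=0)))  # per-coordinate RMSE
```

Once trained, evaluating the estimator costs one small matrix product per event, which is consistent with the abstract's point that such estimators run fast on conventional computers.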