Blind source separation is a fundamental problem in signal processing. Classically, it has been tackled with techniques such as independent component analysis (ICA), sparse signal representations, and non-negative matrix factorization; more recently, deep learning has entered the picture. In this project, the student(s) will implement state-of-the-art source separation techniques and work towards novel techniques that push performance beyond the state of the art. We will focus specifically on applications to music and speech signals and to medical images.
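As a minimal illustration of the classical route, the sketch below separates a synthetic two-channel mixture by whitening it and then searching for the rotation that maximizes non-Gaussianity (measured by excess kurtosis), which is the core idea behind ICA. The signals, mixing matrix, and grid search are all illustrative choices, not a prescribed method.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 4000)
s1 = np.sin(2 * np.pi * t)                    # smooth source
s2 = np.sign(np.sin(6 * np.pi * t))           # square-wave source
S = np.vstack([s1, s2])                       # true sources, shape (2, n)

A = np.array([[1.0, 0.5], [0.5, 1.0]])        # "unknown" mixing matrix
X = A @ S                                     # observed mixtures

# Whiten: rotate/scale the mixtures so their covariance is the identity.
X = X - X.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(X))
Z = np.diag(eigval ** -0.5) @ eigvec.T @ X    # whitened mixtures

# After whitening, the sources differ from Z only by an orthogonal map;
# scan rotation angles and keep the one maximizing non-Gaussianity.
def kurtosis(y):
    return np.mean(y ** 4) - 3.0              # components have unit variance

best_angle, best_score = 0.0, -np.inf
for theta in np.linspace(0, np.pi / 2, 180):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Y = R @ Z
    score = abs(kurtosis(Y[0])) + abs(kurtosis(Y[1]))
    if score > best_score:
        best_angle, best_score = theta, score

R = np.array([[np.cos(best_angle), -np.sin(best_angle)],
              [np.sin(best_angle),  np.cos(best_angle)]])
Y = R @ Z                                     # recovered sources (up to sign/order)

# Each recovered component should correlate strongly with one true source.
corr = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
recovery = corr.max(axis=1)
```

Fixed-point algorithms such as FastICA replace the brute-force angle scan and scale to more than two sources, but the objective — maximizing non-Gaussianity of the unmixed components — is the same.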
Autonomous multi-rotor aerial vehicles (MAVs) are an emerging technology with a large number of current and potential applications across a wide range of industries. These airborne instruments are becoming increasingly autonomous thanks to modern artificial intelligence, with their navigation and interaction capabilities relying predominantly on visual sensing. While vision has attracted considerable attention, it performs poorly in low light and direct sunlight and is vulnerable to occlusions. This project aims to change that by endowing drones with “ears”. The proposed research targets the development of novel algorithms, combining machine learning with physical modeling, and of real-time systems for acoustic-based autonomous mapping, localization, and interaction of MAVs.
Images reconstructed naively from burst-imaging sequences (acquired at more than 100 fps) tend to suffer from blur due to hand tremor and motion, and from the low photon count in each frame. In this project, we aim to jointly perform blind deblurring (by estimating the transformation matrices between the shots), low-light image denoising, and super-resolution, in order to accurately register a sequence of burst images. The students will work with a reasonably accurate forward model of the imaging process; the goal is to learn the optimal burst parameters simultaneously with the reconstruction operator (a CNN) on perfectly registered simulated images.
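A drastically simplified burst forward model might look as follows: each frame is shifted (hand tremor), blurred, downsampled, and corrupted by shot noise. The shift/blur/downsample/Poisson pipeline and every parameter in it are illustrative assumptions, not the actual forward model used in the project.

```python
import numpy as np

def box_blur(img, k=3):
    """Separable box blur as a crude stand-in for the optical blur kernel."""
    kernel = np.ones(k) / k
    img = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, img)
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)

def burst_forward(scene, shifts, factor=2, photons=50.0, rng=None):
    """Simulate a burst: per-frame shift, blur, downsample, shot noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    frames = []
    for dy, dx in shifts:
        frame = np.roll(scene, (dy, dx), axis=(0, 1))   # integer-pixel motion
        frame = box_blur(frame)
        frame = frame[::factor, ::factor]               # sensor downsampling
        frame = rng.poisson(photons * frame) / photons  # low-photon-count noise
        frames.append(frame)
    return np.stack(frames)

scene = np.clip(np.random.default_rng(1).normal(0.5, 0.1, (64, 64)), 0, 1)
shifts = [(0, 0), (1, -2), (-1, 1), (2, 0)]
burst = burst_forward(scene, shifts)
```

A realistic model would use sub-pixel homographies rather than integer rolls and a measured point-spread function, but this sketch is enough to generate training pairs of a clean scene and its simulated burst.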
This project aims to generate a video avatar playing the piano from the recording of a pianist. A related but nearly opposite application, namely generating piano music from video, is also proposed. The project will rely on recent work in the lab involving a digital piano rigged with depth cameras to acquire the instrument audio and MIDI synchronously with video of the pianist's hands, as well as an algorithm that extracts the 3D locations of the finger joints.
Modern UAVs rely extensively on visual sensing. In fact, one of the basic capabilities allowing a drone to navigate in and interact with unknown environments is simultaneous localization and mapping (SLAM), a process of constructing or updating a map of the environment while simultaneously keeping track of the aircraft's location within it. Today's most effective SLAM algorithms are vision-based (vSLAM). One of the key steps in building such UAVs is object/obstacle detection.
In this project, we will first design an object detection framework for data already acquired from drones. We will then analyze its vulnerability to adversarial attacks, and thereafter design algorithms that are robust to such attacks.
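To give a flavor of the adversarial-attack analysis, here is a sketch of the fast gradient sign method (FGSM) applied to a toy logistic-regression "detector". The model, data, and perturbation budget are illustrative stand-ins, not the detection framework the project will actually build.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a detector: logistic regression on two Gaussian blobs.
n = 500
X = np.vstack([rng.normal(-1.5, 1.0, (n, 2)), rng.normal(1.5, 1.0, (n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

w, b = np.zeros(2), 0.0
for _ in range(300):                      # plain gradient descent on logistic loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

def accuracy(inputs):
    return float(np.mean(((inputs @ w + b) > 0) == y))

# FGSM: perturb each input by epsilon along the sign of the loss gradient.
# For logistic regression, d(loss)/d(input) = (p - y) * w.
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
grad_x = (p - y)[:, None] * w[None, :]
X_adv = X + 1.0 * np.sign(grad_x)         # epsilon = 1.0, an illustrative budget

clean_acc, adv_acc = accuracy(X), accuracy(X_adv)
```

Even this one-step, white-box attack degrades the toy model noticeably; the project will confront the same phenomenon in far higher-dimensional detection models, where the same gradients are obtained by backpropagation.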
Modern UAVs rely extensively on visual sensing. In fact, one of the basic capabilities allowing a drone to navigate in and interact with unknown environments is simultaneous localization and mapping (SLAM), a process of constructing or updating a map of the environment while simultaneously keeping track of the aircraft's location within it. Today's most effective SLAM algorithms are vision-based (vSLAM).
In this project, we will build algorithms for visual SLAM by fusing streams of data acquired from RGB cameras, IMUs, rotor encoders, and IR markers set along the flight trajectory. We will then analyze the susceptibility of such systems to adversarial attacks.
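As a toy sketch of the kind of fusion involved, the following 1-D Kalman filter blends a noisy encoder-derived velocity stream with noisy marker-based position fixes. The scalar state, constant-velocity trajectory, and noise levels are illustrative simplifications of the real multi-sensor, 6-DoF problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth 1-D trajectory with constant velocity.
dt, n = 0.1, 200
true_v = 1.0
true_x = true_v * dt * np.arange(n)

# Two noisy streams (noise levels are assumptions for the sketch):
odo_v = true_v + rng.normal(0, 0.3, n)      # rotor-encoder odometry velocity
marker_x = true_x + rng.normal(0, 0.5, n)   # IR-marker position fix

# 1-D Kalman filter: predict with odometry, correct with marker fixes.
x_est, P = 0.0, 1.0
Q, R = (0.3 * dt) ** 2, 0.5 ** 2            # process / measurement variances
estimates = []
for k in range(n):
    # Predict: integrate the noisy odometry velocity.
    x_est += odo_v[k] * dt
    P += Q
    # Update: blend in the marker measurement by its relative confidence.
    K = P / (P + R)
    x_est += K * (marker_x[k] - x_est)
    P = (1 - K) * P
    estimates.append(x_est)

estimates = np.asarray(estimates)
fused_rmse = np.sqrt(np.mean((estimates - true_x) ** 2))
marker_rmse = np.sqrt(np.mean((marker_x - true_x) ** 2))
```

The fused estimate is substantially more accurate than either stream alone; the project's actual SLAM pipeline generalizes this recursion to full pose states and visual landmarks.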
Magnetic resonance imaging (MRI) is a leading modality in medical imaging, since it is non-invasive and produces excellent contrast. However, its long acquisition time currently prohibits its use in many applications, such as cardiac imaging and emergency-room settings. Over the past few years, compressed sensing and deep learning have been at the forefront of MR image reconstruction, leading to great improvements in image quality at reduced scan times. In this project, we will work towards building novel techniques that push the current benchmarks in deep-learning-based MRI.
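To illustrate the compressed-sensing side at toy scale, the sketch below reconstructs a sparse 1-D signal from randomly undersampled Fourier ("k-space") data with ISTA and compares it to a plain zero-filled reconstruction. The signal, sampling mask, and regularization weight are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_spikes = 256, 8

# Sparse 1-D "image" and a randomly undersampled Fourier measurement.
x_true = np.zeros(n)
x_true[rng.choice(n, n_spikes, replace=False)] = rng.normal(0, 1, n_spikes)

mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, n // 3, replace=False)] = True   # keep a third of k-space
y = mask * np.fft.fft(x_true, norm="ortho")

# Zero-filled reconstruction: inverse FFT of the masked data.
x_zf = np.fft.ifft(y, norm="ortho").real

# ISTA: gradient step on the data term, then soft-thresholding (sparsity prior).
def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(n)
lam = 0.02
for _ in range(500):
    resid = y - mask * np.fft.fft(x, norm="ortho")
    x = soft(x + np.fft.ifft(mask * resid, norm="ortho").real, lam)

err_ista = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
err_zf = np.linalg.norm(x_zf - x_true) / np.linalg.norm(x_true)
```

Deep-learning reconstruction methods can be viewed as replacing the hand-crafted soft-thresholding prior with a learned operator while keeping the same data-consistency step, which is one of the directions this project will explore.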
Very little work exists in the literature on the quantization of sequence models or attention mechanisms. Our experience with the quantization of vision models suggests that non-linear operations such as batch normalization are highly sensitive to integer quantization. In batch normalization, the heart of the problem lies in the variance calculation, which "stretches" the input tensor X into a much larger range of values. We expect this accuracy degradation to be aggravated by the "stretching" of the exponent calculation in the softmax layer. Note that, unlike classification tasks, which can replace the softmax with an argmax during inference, this is clearly not attainable for sequence models. More generally, good integer quantization is usually maintained for tensors with Gaussian-like distributions, and even improves for distributions that are light-tailed relative to the Gaussian. Conversely, quantization of heavy-tailed distributions has a detrimental impact on validation accuracy. Thus, an important step in quantizing these models is to collect the statistical distributions of the tensors that are to be quantized to lower bit-width representations.
In this project, we will investigate efficient quantization approaches in the context of sequence models.
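The tail-sensitivity argument can be checked numerically: the sketch below measures the SNR of symmetric int8 quantization on two tensors of equal variance, one Gaussian and one heavy-tailed (Laplace). The distributions and the max-abs scaling rule are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def int8_quantize_snr(x):
    """SNR (dB) of symmetric per-tensor int8 quantization with a max-abs scale."""
    scale = np.abs(x).max() / 127.0
    xq = np.clip(np.round(x / scale), -127, 127) * scale
    return 10 * np.log10(np.mean(x ** 2) / np.mean((x - xq) ** 2))

# Same variance, different tails: Gaussian vs heavy-tailed Laplace.
gauss = rng.normal(0.0, 1.0, n)
laplace = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n)   # also unit variance

snr_gauss = int8_quantize_snr(gauss)
snr_laplace = int8_quantize_snr(laplace)
```

The heavier tail inflates the max-abs scale, coarsening the step size over the bulk of the distribution and costing several dB of quantization SNR — precisely the effect expected when the exponent in softmax stretches the tensor's range.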
When quantizing a neural network, it is often desirable to set a different bit-width for each layer. To that end, we need a method to measure the effect of quantization errors in individual layers on the overall model prediction accuracy. Then, by combining the effects of all layers, the optimal bit-width can be chosen per layer. Without such a measure, an exhaustive search over bit-widths for each layer is required, which makes the quantization process far less efficient.
Cosine similarity, mean squared error (MSE), and signal-to-noise ratio (SNR) have all been proposed as metrics for measuring the sensitivity of DNN layers to quantization. We have shown that the cosine-similarity measure has significant benefits over the MSE measure. Yet, there is no theoretical analysis showing how these measures relate to the accuracy of the DNN model.
In this project, we would like to conduct a theoretical and empirical investigation of how quantization in the layer domain affects noise in the feature domain. Considering classification tasks first, there should be a minimal noise level that causes misclassification at the last (softmax) layer. This error level can then be propagated backwards to set the noise tolerance at earlier layers. We may be able to borrow insights and models from communication systems, where noise accumulation has been extensively studied.
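A minimal empirical version of such a per-layer sensitivity measurement might look as follows; the random two-layer network, the max-abs quantizer, and the bit-widths are illustrative stand-ins for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer MLP with random weights as a stand-in for a trained model.
W1 = rng.normal(0, 0.3, (32, 16))
W2 = rng.normal(0, 0.3, (16, 10))
X = rng.normal(0, 1.0, (200, 32))

def forward(W1q, W2q):
    h = np.maximum(X @ W1q, 0.0)          # ReLU hidden layer
    return h @ W2q                        # output logits

def quantize(w, bits):
    """Symmetric uniform quantizer with a max-abs scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

ref = forward(W1, W2)

# Per-layer sensitivity: quantize one layer at a time, compare output logits.
report = {}
for name, bits in [("W1", 8), ("W1", 2), ("W2", 8), ("W2", 2)]:
    out = forward(quantize(W1, bits), W2) if name == "W1" \
        else forward(W1, quantize(W2, bits))
    report[(name, bits)] = (cosine(ref, out), np.mean((ref - out) ** 2))
```

Ranking layers by how fast their output similarity degrades with shrinking bit-width is exactly the kind of measurement whose relationship to end-to-end accuracy the project would analyze theoretically.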
Rapid deployment of state-of-the-art deep neural networks (DNNs) to energy-efficient accelerators, without time-consuming fine-tuning or access to the full data sets, is highly appealing. Several prior works have noted that neural network tensor distributions are near-Gaussian in practice, sometimes further controlled by procedures such as batch normalization. Under a Gaussian distribution, values are more likely to lie close to the mean. Therefore, to improve precision, we may want to allocate a smaller quantization step size near the mean and a larger one at the tails (where values are less likely to occur). Such non-uniform quantization effectively achieves a smaller average step size without increasing the number of quantization levels. Unfortunately, non-linear quantization is difficult to use in practice and requires excessive use of look-up tables.
Instead, in this project we will study the use of a non-linear transformation of the data so that a standard uniform quantization scheme can be applied, allowing more aggressive quantization with a smaller accuracy drop on standard fixed-point hardware.
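A small numerical sketch of this idea: compand a Gaussian tensor with a mu-law transform, quantize uniformly, and expand back, then compare the distortion against direct uniform quantization. The mu-law form and its parameter are illustrative choices, not the transformation the project will ultimately study.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50_000)
xmax = np.abs(x).max()
levels = 2 ** 4                               # 4-bit quantization

def uniform_quantize(v, vmax, levels):
    """Mid-rise uniform quantizer on [-vmax, vmax]."""
    step = 2.0 * vmax / levels
    q = (np.floor(v / step) + 0.5) * step
    return np.clip(q, -vmax + step / 2, vmax - step / 2)

# Direct uniform quantization of the Gaussian tensor.
x_uniform = uniform_quantize(x, xmax, levels)

# Mu-law companding: compress, quantize uniformly, expand.
mu = 10.0
y = np.sign(x) * np.log1p(mu * np.abs(x) / xmax) / np.log1p(mu)
yq = uniform_quantize(y, 1.0, levels)
x_compand = np.sign(yq) * (xmax / mu) * np.expm1(np.abs(yq) * np.log1p(mu))

mse_uniform = np.mean((x - x_uniform) ** 2)
mse_compand = np.mean((x - x_compand) ** 2)
```

The compressor spends more of the 16 levels near the mean, where Gaussian samples concentrate, so the expanded reconstruction has lower MSE than direct uniform quantization — while the quantizer in the middle remains a plain uniform one that maps cleanly to fixed-point hardware.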
It has been shown that both the activations and the weights of neural networks can be significantly quantized during propagation while preserving near-state-of-the-art performance on standard benchmarks. Many efforts are being made to leverage these observations in low-precision hardware (Intel, NVIDIA, etc.). Parallel efforts are devoted to designing efficient models that can run on a CPU, or even on a mobile phone. The idea is to use extremely computation-efficient architectures (i.e., architectures with far fewer parameters than traditional ones) that maintain comparable accuracy while achieving significant speedups.
In this project, we would like to study the trade-offs between quantization and over-parameterization of neural networks from a theoretical perspective. At a higher level, we would like to understand how these efforts to optimize the number of operations interact with the parallel efforts on network quantization. Will future models be harder to quantize? Can hardware support for non-uniform quantization help here?
Home automation technologies are becoming ubiquitous with the advent of affordable low-power sensors and actuators. In a typical smart home system, traditional devices like light switches and water taps are replaced by connected, electronically controlled actuators, and many other home appliances can report their status (e.g., energy consumption) and be remotely controlled. Combined with such an infrastructure, AI techniques promise to bring the efficiency and convenience of contemporary dwellings to a new level. In this project, we will develop a deep learning-based controller for an actual smart home system.
This project is reserved for dedicated, excellent students with exceptional hands-on programming and system engineering skills, and can serve as a segue to graduate research.