Relevant publications

Bioinformatics & Computational Chemistry

H. Abraham, B. Gahtan, A. Kobovich, O. Leitersdorf, A. M. Bronstein, E. Yaakobi, Beyond the alphabet: deep signal embedding for enhanced DNA clustering, arXiv:2410.06188, 2024

The emerging field of DNA storage employs strands of DNA bases (A/T/C/G) as a storage medium for digital information, enabling massive density and durability. The DNA storage pipeline includes: (1) encoding the raw data into sequences of DNA bases; (2) synthesizing the sequences as DNA strands that are stored over time as an unordered set; (3) sequencing the DNA strands to generate DNA reads; and (4) deducing the original data. The DNA synthesis and sequencing stages each generate several independent, error-prone duplicates of each strand, which are then used in the final stage to reconstruct the best estimate of the original strand. Specifically, the reads are first clustered into groups likely originating from the same strand (based on their similarity to each other), and each group is then used to approximate the strand from which its reads originated. This work improves the DNA clustering stage by embedding it as part of DNA sequencing. Traditional DNA storage solutions begin after the DNA sequencing process generates discrete DNA reads (A/T/C/G). We identify untapped potential in the raw signals generated by the Nanopore DNA sequencing machine before they are discretized into bases, a process known as basecalling that is performed by a deep neural network. We propose a deep neural network that clusters these signals directly, demonstrating superior accuracy and reduced computation times compared to current approaches that cluster after basecalling.
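
For context, the conventional baseline clusters basecalled reads by string similarity. The sketch below is a minimal greedy clustering by Levenshtein (edit) distance with a hypothetical threshold `max_dist`: each read joins the first cluster whose first read lies within `max_dist` edits. It illustrates only the post-basecalling baseline that this work compares against, not the proposed signal-embedding network, and the toy reads are made up.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when the bases match)
            ))
        prev = curr
    return prev[-1]


def greedy_cluster(reads: List[str], max_dist: int) -> List[List[str]]:
    """Assign each read to the first cluster whose representative (first read) is within max_dist."""
    clusters: List[List[str]] = []
    for read in reads:
        for cluster in clusters:
            if edit_distance(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])  # no sufficiently close cluster: start a new one
    return clusters


reads = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TTGCATGC", "TTGCATGG"]
print(greedy_cluster(reads, max_dist=2))
# → [['ACGTACGT', 'ACGTACGA', 'ACGAACGT'], ['TTGCATGC', 'TTGCATGG']]
```

Each comparison costs time proportional to the product of the read lengths, which is one reason clustering cost matters at DNA-storage scale.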

A. A. Rosenberg, S. Vedula, A. M. Bronstein, A. Marx, Seeing Double: Molecular dynamics simulations reveal the stability of certain alternate protein conformations in crystal structures, bioRxiv 2024.08.31.610605, 2024

Proteins jiggle around, adopting ensembles of interchanging conformations. Here, through a large-scale analysis of the Protein Data Bank and molecular dynamics simulations, we show that segments of protein chains also commonly adopt dual, transiently stable conformations, a phenomenon not explained by direct interactions. Our analysis highlights how alternate conformations can be maintained as separate, non-interchanging states intrinsic to the protein chain, namely through steric barriers or the adoption of transient secondary structure elements. We further demonstrate that, despite the prevalence of the phenomenon, current structural ensemble prediction methods fail to capture these bimodal conformational distributions.

A. A. Rosenberg, A. Marx, A. M. Bronstein, A dataset of alternately located segments in protein crystal structures, Scientific Data, 11 (783), 2024

Protein Data Bank (PDB) files list the relative spatial locations of atoms in a protein structure as the final output of fitting and refining a model against experimentally determined electron density measurements. Where experimental evidence exists for multiple conformations, atoms are modelled in alternate locations. Programs reading PDB files commonly ignore these alternate conformations by default, leaving users oblivious to their presence in the structures they analyze. This has led to an underappreciation of their prevalence and an under-characterisation of their features, and has limited access to this high-resolution data representing structural ensembles. We have trawled PDB files to extract structural features of residues with alternately located atoms. The output includes the distance between alternate conformations and identifies the location of these segments within the protein chain and their proximity to all other atoms within a defined radius. This dataset should be of use in efforts to predict multiple structures from a single sequence and should support studies investigating protein flexibility and its association with protein function.
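
The extraction step can be illustrated with a minimal sketch that reads the alternate location indicator (column 17 of fixed-width ATOM/HETATM records in the legacy PDB format) and reports, per residue, the largest distance between alternate positions of the same atom. The helper names and toy coordinates are hypothetical. This is not the pipeline used to build the dataset, its distance measure may differ from the one reported there, and it ignores insertion codes, multiple models and the mmCIF format.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]
AtomKey = Tuple[str, int, str, str]  # (chain, residue number, residue name, atom name)


def pdb_atom_line(serial: int, name: str, alt: str, resn: str, chain: str,
                  resseq: int, x: float, y: float, z: float) -> str:
    """Hypothetical helper: a minimal fixed-column ATOM record (columns 1-54 only)."""
    return (f"ATOM  {serial:>5} {name:<4}{alt:1}{resn:>3} {chain:1}{resseq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}")


def parse_altlocs(pdb_lines: Iterable[str]) -> Dict[AtomKey, Dict[str, Coord]]:
    """Collect coordinates of every atom that carries a non-blank alternate location indicator."""
    alts: Dict[AtomKey, Dict[str, Coord]] = defaultdict(dict)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        altloc = line[16]  # column 17: alternate location indicator
        if altloc == " ":
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
        alts[key][altloc] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return alts


def max_alt_distance(alts: Dict[AtomKey, Dict[str, Coord]]) -> Dict[Tuple[str, int, str], float]:
    """Per residue, the largest distance between two alternate positions of the same atom."""
    result: Dict[Tuple[str, int, str], float] = {}
    for (chain, resseq, resname, _atom), positions in alts.items():
        coords = list(positions.values())
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                residue = (chain, resseq, resname)
                result[residue] = max(result.get(residue, 0.0), math.dist(coords[i], coords[j]))
    return result


lines = [
    pdb_atom_line(1, "N", "", "ALA", "A", 1, 10.0, 6.0, -6.0),
    pdb_atom_line(2, "CA", "A", "ALA", "A", 1, 11.0, 6.0, -6.0),
    pdb_atom_line(3, "CA", "B", "ALA", "A", 1, 11.5, 6.0, -6.0),
    pdb_atom_line(4, "CB", "A", "ALA", "A", 1, 12.0, 7.0, -6.0),
    pdb_atom_line(5, "CB", "B", "ALA", "A", 1, 12.0, 7.0, -4.5),
    pdb_atom_line(6, "N", "", "GLY", "A", 2, 13.0, 5.0, -6.0),
]
print(max_alt_distance(parse_altlocs(lines)))  # {('A', 1, 'ALA'): 1.5}
```

A parser that silently keeps only one alternate per atom would report nothing for residue 1 here, which is the blind spot the dataset is meant to address.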

D. Freedman, E. Rozenberg, A. M. Bronstein, A theoretical framework for an efficient normalizing flow-based solution to the Schrödinger equation, arXiv preprint arXiv:2406.00047, 2024

A central problem in quantum mechanics involves solving the Electronic Schrödinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently, neural networks have been used as the ansatz with notable success. However, sampling from such wavefunctions has required a Markov Chain Monte Carlo approach, which is inherently inefficient. In this work, we propose a solution to this problem via an ansatz which is cheap to sample from, yet satisfies the requisite quantum mechanical properties. We prove that a normalizing flow using the following two essential ingredients satisfies our requirements: (a) a base distribution which is constructed from Determinantal Point Processes; (b) flow layers which are equivariant to a particular subgroup of the permutation group. We then show how to construct both continuous and discrete normalizing flows which satisfy the requisite equivariance. We further show how the non-smooth features (“cusps”) of the wavefunction may be captured, and how the framework may be generalized to provide induction across multiple molecules. The resulting theoretical framework yields an efficient approach to solving the Electronic Schrödinger Equation.
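
To make the Variational Monte Carlo objective concrete, the sketch below estimates the energy of a one-dimensional harmonic oscillator (in units where ℏ = m = ω = 1) using the toy trial wavefunction ψ_α(x) = exp(-αx²). Because |ψ_α|² is itself a Gaussian with variance 1/(4α), samples are drawn exactly rather than through a Markov chain, which is the kind of cheap, direct sampling the proposed flow-based ansatz aims to provide for many-electron wavefunctions. The toy system and function names are illustrative assumptions, not part of the paper.

```python
import numpy as np


def local_energy(x: np.ndarray, alpha: float) -> np.ndarray:
    """E_L = (H psi) / psi for H = -1/2 d^2/dx^2 + 1/2 x^2 and psi(x) = exp(-alpha x^2)."""
    return alpha + x**2 * (0.5 - 2.0 * alpha**2)


def vmc_energy(alpha: float, n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of <E_L> under |psi|^2, using exact (non-Markov-chain) samples."""
    rng = np.random.default_rng(seed)
    # |psi|^2 ∝ exp(-2 alpha x^2) is a normal density with standard deviation 1 / (2 sqrt(alpha))
    x = rng.normal(0.0, 1.0 / (2.0 * np.sqrt(alpha)), size=n_samples)
    return float(local_energy(x, alpha).mean())


for alpha in (0.3, 0.5, 0.8):
    exact = alpha / 2 + 1 / (8 * alpha)  # analytic variational energy for this ansatz
    print(f"alpha={alpha}: VMC {vmc_energy(alpha):.4f}, exact {exact:.4f}")
```

Minimizing over α recovers the ground state at α = 1/2, where the local energy is constant (0.5) and the estimator has zero variance; there is no burn-in or autocorrelation to manage because every sample is independent.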

A. A. Rosenberg, N. Yehishalom, A. Marx, A. M. Bronstein, An amino-domino model described by a cross-peptide-bond Ramachandran plot defines amino acid pairs as local structural units, Proc. US National Academy of Sciences (PNAS), 2023

Protein structure, at both the global and local level, dictates function. Proteins fold from chains of amino acids, forming secondary structures (α-helices and β-strands) that, at least for globular proteins, subsequently fold into a three-dimensional structure. Here, we show that a Ramachandran-type plot of the two dihedral angles separated by the peptide bond, both entirely contained within an amino acid pair, defines a local structural unit. We further demonstrate the usefulness of this cross-peptide-bond Ramachandran plot by showing that it captures β-turn conformations in coil regions, that traditional Ramachandran plot outliers fall into occupied regions of our plot, and that thermophilic proteins prefer specific amino acid pair conformations. Finally, we demonstrate experimentally that the effect of a point mutation on backbone conformation and protein stability depends on the amino acid pair context, i.e., the identity of the adjacent amino acid, in a manner predictable by our method.
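
Reading the definition above in standard backbone notation, the two angles flanking the peptide bond within a residue pair (i, i+1) are ψ of residue i and φ of residue i+1. The sketch below computes them from backbone N, Cα and C coordinates with the common arctan2 dihedral formula (IUPAC sign convention). The function names and the (L, 3) array layout are illustrative assumptions, and the code assumes a continuous chain with no missing residues.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def cross_peptide_angles(n: np.ndarray, ca: np.ndarray, c: np.ndarray) -> np.ndarray:
    """(psi_i, phi_{i+1}) for every consecutive residue pair; n, ca, c are (L, 3) coordinate arrays."""
    pairs = []
    for i in range(len(n) - 1):
        psi_i = dihedral(n[i], ca[i], c[i], n[i + 1])            # N_i, CA_i, C_i, N_{i+1}
        phi_next = dihedral(c[i], n[i + 1], ca[i + 1], c[i + 1])  # C_i, N_{i+1}, CA_{i+1}, C_{i+1}
        pairs.append((psi_i, phi_next))
    return np.array(pairs)


# Usage with random (non-physical) coordinates for four residues: three pairs, two angles each
rng = np.random.default_rng(0)
n, ca, c = rng.normal(size=(3, 4, 3))
print(cross_peptide_angles(n, ca, c).shape)  # (3, 2)
```

The peptide-bond torsion ω (CA_i, C_i, N_{i+1}, CA_{i+1}) sits between these two angles, which is why both lie entirely inside the residue pair.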

T. Weiss, L. Cosmo, E. Mayo Yanes, S. Chakraborty, A. M. Bronstein, R. Gershoni-Poranne, Guided diffusion for inverse molecular design, Nature Computational Science 3(10), 873–882, 2023

The holy grail of materials science is de novo molecular design, i.e., the ability to engineer molecules with desired characteristics. Recently, this goal has become increasingly achievable thanks to developments such as equivariant graph neural networks that better predict molecular properties, and to the improved performance of generation tasks, in particular conditional generation, in text-to-image generators and large language models. Herein, we introduce GaUDI, a guided diffusion model for inverse molecular design, which combines these advances and can generate novel molecules with desired properties. GaUDI decouples the generator from the property-predicting models and can be guided using both point-wise targets and open-ended targets (e.g., minimum/maximum). We demonstrate GaUDI’s effectiveness on single- and multiple-objective tasks applied to newly generated data sets of polycyclic aromatic systems, achieving nearly 100% validity of generated molecules. Further, for some tasks, GaUDI discovers better molecules than those present in our data set of 475k molecules.
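
The principle of steering a generator with a separately trained property predictor can be shown, in deliberately simplified form, by adding the gradient of a property function to the score (∇ log p) of an unconditional distribution. The toy below runs unadjusted Langevin dynamics on a one-dimensional standard normal "prior" with a hypothetical linear property f(x) = x and guidance scale `s`. It is not GaUDI's diffusion model or its equivariant property predictor; it only demonstrates the shared idea of combining ∇ log p with a property gradient. Under these assumptions the sampler targets p(x)·exp(s·f(x)), which here is a normal distribution with mean s.

```python
import numpy as np


def prior_score(x: np.ndarray) -> np.ndarray:
    """grad log p(x) for a standard normal 'prior'; stands in for a learned unconditional score."""
    return -x


def property_grad(x: np.ndarray) -> np.ndarray:
    """Gradient of a hypothetical property predictor f(x) = x (open-ended target: maximize f)."""
    return np.ones_like(x)


def guided_langevin(s: float, n: int = 20_000, steps: int = 1_000,
                    eps: float = 0.01, seed: int = 0) -> np.ndarray:
    """Unadjusted Langevin dynamics targeting p(x) * exp(s * f(x)); s is the guidance scale."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    for _ in range(steps):
        drift = prior_score(x) + s * property_grad(x)
        x = x + eps * drift + np.sqrt(2.0 * eps) * rng.normal(size=n)
    return x


for s in (0.0, 1.0, 2.0):
    # For this toy the guided target is N(s, 1), so the sample mean should be close to s
    print(s, round(float(guided_langevin(s).mean()), 3))
```

Because the prior score and the property gradient are separate functions, either can be swapped without retraining the other. For instance, replacing `property_grad` with the gradient of -(x - t)², i.e. -2 * (x - t), would pull samples toward a point-wise target t instead.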

T. Weiss, A. Wahab, A. M. Bronstein, R. Gershoni-Poranne, Interpretable deep learning unveils structure-property relationships in polybenzenoid hydrocarbons, Journal of Organic Chemistry, 2023

In this work, interpretable deep learning was used to identify structure-property relationships governing the HOMO-LUMO gap and relative stability of polybenzenoid hydrocarbons (PBHs). To this end, a ring-based graph representation was used. In addition to affording reduced training times and excellent predictive ability, this representation could be combined with a subunit-based perception of PBHs, allowing chemical insights to be presented in terms of intuitive and simple structural motifs. The resulting insights agree with conventional organic chemistry knowledge and electronic structure-based analyses, and also reveal new behaviors and identify influential structural motifs. In particular, we evaluated and compared the effects of linear, angular, and branching motifs on these two molecular properties, and explored the role of dispersion in mitigating the torsional strain inherent in non-planar PBHs. Hence, the observed regularities and the proposed analysis contribute to a deeper understanding of the behavior of PBHs and form the foundation for design strategies for new functional PBHs.
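
A ring-based representation treats each six-membered ring as a node and each pair of fused rings (sharing a C-C bond) as an edge. The sketch below is a hypothetical encoding, not the paper's: it places ring centres on an idealized hexagonal grid in axial coordinates, derives fusion edges from grid adjacency, and labels each ring as terminal, linear, angular or branching from its fused neighbours (plus "peri" for three rings sharing an atom). It ignores non-planar geometry and any atom-level features.

```python
from typing import Dict, List, Tuple

Hex = Tuple[int, int]
# Axial-coordinate offsets to the six edge-sharing (i.e. fused) neighbouring hexagons
DIRECTIONS: List[Hex] = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def fusion_graph(rings: List[Hex]) -> Dict[Hex, List[Hex]]:
    """Ring graph: nodes are ring centres, edges join rings that share a bond."""
    ring_set = set(rings)
    return {r: [(r[0] + dq, r[1] + dr) for dq, dr in DIRECTIONS if (r[0] + dq, r[1] + dr) in ring_set]
            for r in rings}


def annulation_type(ring: Hex, graph: Dict[Hex, List[Hex]]) -> str:
    """Classify a ring by the number and relative directions of its fused neighbours."""
    nbrs = graph[ring]
    if len(nbrs) <= 1:
        return "terminal"
    if len(nbrs) >= 3:
        return "branching"
    (q1, r1), (q2, r2) = [(q - ring[0], r - ring[1]) for q, r in nbrs]
    total = (q1 + q2, r1 + r2)
    if total == (0, 0):
        return "linear"   # neighbours on opposite edges (180 degrees apart)
    if total in DIRECTIONS:
        return "angular"  # neighbours 120 degrees apart
    return "peri"         # neighbours 60 degrees apart: three rings share one atom


anthracene = [(0, 0), (1, 0), (2, 0)]
phenanthrene = [(0, 0), (1, 0), (1, 1)]
triphenylene = [(0, 0), (1, 0), (-1, 1), (0, -1)]
for name, rings in [("anthracene", anthracene), ("phenanthrene", phenanthrene),
                    ("triphenylene", triphenylene)]:
    graph = fusion_graph(rings)
    print(name, {ring: annulation_type(ring, graph) for ring in rings})
```

Working on rings rather than atoms shrinks each molecule to a handful of nodes, which is one plausible reason such a representation trains quickly and maps naturally onto motif-level explanations.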

A. M. Bronstein, A. Marx, Water stabilizes an alternate turn conformation in horse heart myoglobin, Nature Scientific Reports, 2023

Comparison of myoglobin structures reveals that the protein isolated from horse heart consistently adopts an alternate turn conformation compared to its homologues. Analysis of hundreds of high-resolution structures rules out crystallization conditions or the surrounding amino acid environment as explanations for this difference, which is also not captured by the AlphaFold prediction. Rather, a water molecule is identified as stabilizing the conformation in the horse heart structure; in molecular dynamics simulations that exclude this structural water, the turn immediately reverts to the whale conformation.

L. Ackerman-Schraier, A. A. Rosenberg, A. Marx, A. M. Bronstein, Machine learning approaches demonstrate that protein structures carry information about their genetic coding, Nature Scientific Reports, 2022

Synonymous codons translate into the same amino acid. Although the identity of synonymous codons is often considered inconsequential to the final protein structure, there is mounting evidence for an association between the two. Our study examined this association using regression and classification models, finding that codon sequences predict protein backbone dihedral angles with lower error than amino acid sequences, and that models trained with true dihedral angles classify synonymous codons, given structural information, better than models trained with random dihedral angles. Using this classification approach, we investigated local codon-codon dependencies and tested whether synonymous codon identity can be predicted more accurately from codon context than from amino acid context alone and, specifically, which codon context position carries the most predictive power.

A. A. Rosenberg, N. Yehishalom, A. Marx, A. M. Bronstein, Defining amino acid pairs as structural units suggests mutation sensitivity to adjacent residues, bioRxiv/2022/513383, 2022

Proteins fold from chains of amino acids, forming secondary structures (α-helices and β-strands) that, at least for globular proteins, subsequently fold into a three-dimensional structure. A large-scale analysis of high-resolution protein structures suggests that amino acid pairs constitute another layer of ordered structure, more local than these conventionally defined secondary structures. We develop a cross-peptide-bond Ramachandran plot that captures the 15 conformational preferences of the amino acid pairs, and show that the effect of a particular mutation on the stability of a protein depends, in a predictable manner, on the adjacent amino acid context.

A. Rosenberg, A. Marx, A. M. Bronstein, Codon-specific Ramachandran plots show amino acid backbone conformation depends on identity of the translated codon, Nature Communications, 2022

Synonymous codons translate into chemically identical amino acids. Although codon identity was once considered inconsequential to the formation of the protein product, there is now significant evidence that codon usage affects co-translational protein folding and the final structure of the expressed protein. Here we develop a method for computing and comparing codon-specific Ramachandran plots and demonstrate that the backbone dihedral angle distributions of some synonymous codons are distinguishable with statistical significance for some secondary structures. This shows that there exists a dependence between codon identity and the backbone torsion of the translated amino acid. Although these findings cannot pinpoint the causal direction of this dependence, we discuss the vast biological implications should coding be shown to directly shape protein conformation, and we demonstrate the usefulness of this method as a tool for probing associations between codon usage and protein structure. Finally, we urge the inclusion of exact genetic information in structural databases.
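
Testing whether two angle distributions are distinguishable can be illustrated with a generic two-sample permutation test. The sketch below embeds each (φ, ψ) pair as cosines and sines, so that -179° and 179° are treated as neighbours, and uses the squared distance between the two groups' mean embeddings as the test statistic. This statistic, the function names and the synthetic "codon" samples are assumptions for illustration only; the paper's own procedure for comparing codon-specific Ramachandran plots may differ.

```python
import numpy as np


def trig_features(angles_deg: np.ndarray) -> np.ndarray:
    """Map (phi, psi) rows in degrees to (cos phi, cos psi, sin phi, sin psi) to respect periodicity."""
    rad = np.radians(angles_deg)
    return np.concatenate([np.cos(rad), np.sin(rad)], axis=1)


def mean_embedding_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Squared Euclidean distance between the mean trigonometric embeddings of two samples."""
    return float(np.sum((trig_features(a).mean(axis=0) - trig_features(b).mean(axis=0)) ** 2))


def permutation_test(a: np.ndarray, b: np.ndarray, n_perm: int = 2000, seed: int = 0) -> float:
    """One-sided permutation p-value: how often a random relabelling is at least as extreme."""
    rng = np.random.default_rng(seed)
    observed = mean_embedding_distance(a, b)
    pooled = np.vstack([a, b])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mean_embedding_distance(pooled[idx[:len(a)]], pooled[idx[len(a):]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive


# Synthetic (hypothetical) samples for two codons: one helix-like, one strand-like
rng = np.random.default_rng(1)
helix = rng.normal([-63.0, -43.0], 8.0, size=(50, 2))
strand = rng.normal([-120.0, 130.0], 8.0, size=(50, 2))
print(permutation_test(helix, strand))
```

A permutation test needs no distributional assumptions about Ramachandran densities, but with real data one would also stratify by secondary structure, as the abstract does, so that codon differences are not confounded with helix/strand composition.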