# ProteinQure Announces A Breakthrough Therapeutic with Remarkable Efficacy

CB Insights (a leading tech and startup publication) has included ProteinQure as a top AI company in the Healthcare space for 2020. We thought we could celebrate by highlighting some of the methods that ProteinQure has used to help generate novel proteins for therapeutic purposes.

The field of Machine Learning (ML) is generating a continuous avalanche of papers, even when restricted to the domain of Computational Biology. At ProteinQure, we stay up to date with the newest academic developments with an eye for pragmatically applying them to our projects. Here are some papers we consider to be “hidden gems” with unique insights you may have missed.

In most cases, Deep Learning requires huge amounts of training data. Acquiring that data for drug discovery is usually expensive because each data point comes from a wet-lab experiment. Even if a huge training dataset is assembled, the resulting models are still limited by neural networks’ poor ability to extrapolate beyond the data they were trained on. This is a real problem in drug discovery: the number of candidate sequences grows exponentially with sequence length, so even a minor increase in the sequence space to explore makes the data space vastly larger. It’s infeasible to curate a dataset that is representative of the complete potential input distribution.
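To make the exponential growth concrete, here is a minimal sketch that counts the possible sequences of a given length. It assumes an alphabet of the 20 canonical amino acids; the function name `sequence_space_size` is ours and purely illustrative.

```python
AMINO_ACIDS = 20  # assumption: only the 20 canonical amino acids are allowed


def sequence_space_size(length, alphabet_size=AMINO_ACIDS):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


# Each extra residue multiplies the number of candidates by 20.
for length in (5, 10, 15):
    print(length, sequence_space_size(length))
```

Going from 10 to 11 residues alone makes the space 20 times larger, which is why exhaustively covering it with wet-lab data is out of reach.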

Biophysics-based approaches like Molecular Dynamics (MD) simulations circumvent this issue by directly modeling protein dynamics and properties. Unfortunately, trajectories from MD simulations are often computationally expensive to obtain, especially over longer timescales.

To accelerate an MD simulation, only a subset of possible trajectories is sampled, as a way to “cheat” around simulating every time-step. Sampling continues until an acceptable trajectory is found, which can still take a long time.

In “Neural Networks based Variationally Enhanced Sampling” (NN-VES), a group from ETH Zurich proposes biasing this trajectory sampling using ML. The method assumes that the system can be described in terms of a simpler set of variables, called Collective Variables. The desired state is modeled as a probability distribution over this set of Collective Variables.
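For readers who want the underlying formula: NN-VES builds on Variationally Enhanced Sampling (VES), which, as we understand the original formulation, defines a functional of a bias potential $V(s)$ acting on the Collective Variables $s$. The symbols are notation we introduce for this sketch: $F(s)$ is the free energy surface, $p(s)$ is the target distribution, and $\beta = 1/k_B T$.

$$
\Omega[V] = \frac{1}{\beta} \log \frac{\int ds\, e^{-\beta [F(s) + V(s)]}}{\int ds\, e^{-\beta F(s)}} + \int ds\, p(s)\, V(s)
$$

The functional is convex, and its minimum is reached at

$$
V(s) = -F(s) - \frac{1}{\beta} \log p(s)
$$

up to an additive constant. At that point the biased simulation samples the Collective Variables according to $p(s)$. In NN-VES, $V(s)$ is represented by a neural network.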

By optimizing a neural-network bias so that the sampled Collective Variables follow the target distribution, NN-VES is able to find an acceptable trajectory much faster than a vanilla MD simulation.
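The idea of optimizing a bias toward a target distribution can be illustrated with a toy sketch. This is not the paper's implementation: we assume a known one-dimensional double-well free energy on a grid, store the bias as a plain table of values instead of a neural network, and compute the biased distribution exactly instead of estimating it from an MD run. The update rule is gradient descent on a discretized VES-style functional, whose gradient at each grid point is the target probability minus the current biased probability. All names (`biased_distribution`, `optimize_bias`) are hypothetical.

```python
import math


def biased_distribution(free_energy, bias, beta):
    """Boltzmann distribution over grid points under F(s) + V(s)."""
    energies = [f + v for f, v in zip(free_energy, bias)]
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def optimize_bias(free_energy, target, beta=1.0, lr=0.5, steps=5000):
    """Gradient descent on the bias values, one value per grid point.

    The gradient with respect to V(s_i) is target[i] - p_V[i], so the bias
    is raised where the system is over-sampled and lowered elsewhere.
    """
    bias = [0.0] * len(free_energy)
    for _ in range(steps):
        p_v = biased_distribution(free_energy, bias, beta)
        bias = [v - lr * (t - p) for v, t, p in zip(bias, target, p_v)]
    return bias


# Toy collective variable s on a grid in [-1.5, 1.5], with wells at s = -1 and
# s = +1 separated by a barrier at s = 0 (an assumed, illustrative landscape).
grid = [-1.5 + 3.0 * i / 30 for i in range(31)]
free_energy = [5.0 * (s * s - 1.0) ** 2 for s in grid]
target = [1.0 / len(grid)] * len(grid)  # uniform target distribution

unbiased = biased_distribution(free_energy, [0.0] * len(grid), beta=1.0)
bias = optimize_bias(free_energy, target)
biased = biased_distribution(free_energy, bias, beta=1.0)

barrier = 15  # grid index of s = 0
print("P(barrier) without bias:", unbiased[barrier])
print("P(barrier) with optimized bias:", biased[barrier])
```

Without the bias, the barrier region is visited very rarely, which is why unbiased sampling can take so long to cross between states. Once the bias has converged, the barrier is visited as often as any other grid point, and the learned bias mirrors the negative of the free energy up to a constant.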
