Deep learning in biology and medicine
In this post, we examine applications of deep learning to three key biomedical problems: patient classification, fundamental biological processes, and treatment of patients. The objective is to predict whether deep learning will transform these tasks.
The post is based on the very comprehensive paper “Opportunities and obstacles for deep learning in biology and medicine”.
The paper sets a high bar for "transform": along the lines of Andy Grove's strategic inflection point, a change in technologies or environment that requires a business to be fundamentally reshaped.
The three classes of applications are described as follows:
Disease and patient categorization: the accurate classification of diseases and disease subtypes. In oncology, current “gold standard” approaches include histology, which requires interpretation by experts, or assessment of molecular markers such as cell surface receptors or gene expression.
Fundamental biological study: application of deep learning to fundamental biological questions, using methods that leverage large amounts of data.
Treatment of patients: new methods to recommend patient treatments, predict treatment outcomes, and guide the development of new therapies.
Within these classes, the specific areas where deep learning plays a part in biology and medicine are:
Deep learning and patient categorization
- Imaging applications in healthcare
- Electronic health records
- Challenges and opportunities in patient categorization
Deep learning to study the fundamental biological processes underlying human disease
- Gene expression
- Transcription factors and RNA-binding proteins
- Promoters, enhancers, and related epigenomic tasks
- Micro-RNA binding
- Protein secondary and tertiary structure
- Morphological phenotypes
- Single-cell data
- Sequencing and variant calling
The impact of deep learning in treating disease and developing new treatments
- Clinical decision making
- Drug repositioning
- Drug development
The paper also discusses a number of cross-cutting issues that affect deep learning across biology and medicine:
- Evaluation metrics for imbalanced classification
- Formulation of classification labels
- Formulation of a performance upper bound
- Interpretation and explainable results
- Hardware limitations and scaling
- Data, code, and model sharing
- Multimodal, multi-task, and transfer learning
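To make the first of these issues concrete, here is a minimal sketch (with made-up numbers, not from the paper) of why plain accuracy misleads on imbalanced clinical labels, and why metrics focused on the positive class matter:

```python
# Hypothetical cohort: 1000 patients, only 10 with the disease (positives).
y_true = [1] * 10 + [0] * 990

# A degenerate "classifier" that always predicts the majority (healthy) class:
y_pred = [0] * 1000

# Accuracy counts all correct predictions, so it rewards ignoring the minority class.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the positive (disease) class tells a very different story.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- yet every sick patient is missed
```

This is why the paper highlights evaluation metrics suited to imbalanced classification (e.g. area under the precision-recall curve) rather than raw accuracy.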
I found two particularly interesting aspects: interpretability and data limitations. As per the paper:
- Deep learning lags behind most Bayesian models in terms of interpretability, but the interpretability of deep learning is comparable to other widely-used machine learning methods such as random forests or SVMs.
- A lack of large-scale, high-quality, correctly labeled training data has impacted deep learning in nearly all applications discussed, from healthcare to genomics to drug discovery.
- The challenges of training complex, high-parameter neural networks from few examples are obvious, but uncertainty in the labels of those examples can be just as problematic.
- For some types of data, especially images, it is straightforward to augment training datasets by splitting a single labeled example into multiple training examples.
- Simulated or semi-synthetic training data has been employed in multiple biomedical domains, though many of these ideas are not specific to deep learning.
- Data can be simulated to create negative examples when only positive training instances are available.
- Multimodal, multi-task, and transfer learning can also combat data limitations to some extent.
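The augmentation idea mentioned above, splitting one labeled image into several examples, can be sketched as follows (a toy illustration, not the paper's method; the image is a plain 2-D grid and all names are illustrative):

```python
# Augmentation sketch: split a single labeled image into overlapping crops,
# each of which inherits the original label.
def crops(image, size, stride):
    """Yield size x size sub-images from a 2-D grid of pixel values."""
    h, w = len(image), len(image[0])
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            yield [row[left:left + size] for row in image[top:top + size]]

# One 4x4 "image" labeled positive (1) becomes four labeled 3x3 examples:
image = [[r * 4 + c for c in range(4)] for r in range(4)]
augmented = [(crop, 1) for crop in crops(image, size=3, stride=1)]
print(len(augmented))  # 4
```

In practice libraries handle this (random crops, flips, rotations), but the principle is the same: one expensive expert-labeled example yields many training examples.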
The authors conclude that deep learning has yet to revolutionize or definitively resolve any of these problems, but that even when improvement over a previous baseline has been modest, there are signs that deep learning methods may speed or aid human investigation.
This is a comprehensive paper, which I recommend.
The paper also has a GitHub repository.
Image source: Pixabay