Machine Learning on Biological Sequences
Advisor: Hajk-Georg Drost
PhD Program: International Max Planck Research School (IMPRS) 'From Molecules to Organisms'
Location: Max Planck Institute for Biology
The ambition of our research group is to organise the protein universe of the Tree of Life (ToL) so that it can support subsequent machine learning, causal inference, and predictive tasks.
This approach allows us to explore how well species-specific genomic patterns generalise when extended to all species across the entire ToL. To achieve this, we have to overcome computational bottlenecks by developing our own comparative methodology and software solutions that scale to the ToL.
For example, we introduced the protein aligner DIAMOND2 (Buchfink et al., 2021) to replace the gold-standard BLAST for tree-of-life scale sequence search applications. We are currently extending the DIAMOND2 biosphere genomics framework to perform established phylogenomic and functional genomics tasks, but now at tree-of-life scale.
For this line of ToL research, we seek a curious doctoral researcher who is interested in learning more about applying machine learning to biological data. The doctoral researcher will work on biological sequence data for various predictive tasks, in both comparative (gene evolution) and functional (e.g. AlphaFold2, ESMFold) genomics applications.
More information about the research of Hajk-Georg Drost and a selection of recent publications can be found on his faculty page.