The Poetry Fan Who Taught an LLM to Read and Write DNA

TLC (Teaching and Learning College)

February 07, 2025 at 02:15AM

Computer scientist Brian Hie knows that it's hard for humans to understand the sequences of nucleotide bases A, T, C and G that make up DNA. So he created Evo, a genomic large language model (LLM), to help scientists better understand "the code for life." In this interview with Quanta Magazine, Hie talks with Ingrid Wickelgren about how his love of poetry preceded the project, and how helping humans better understand DNA sequences can fuel discoveries that will help people and the planet.

I have very broad interests, and I explored a lot of career paths. At one point in my life, I wanted to pursue a Ph.D. in English literature. In high school and college, I learned to appreciate poetry. The type of poetry I really liked was lyric poetry that has lots of structure and grand concepts and uses language in very new and interesting ways.

The affinity for scanning a sonnet or identifying structure in a well-composed English lyric is similar to wanting to develop models that make genomic or protein sequences more interpretable and reveal their hidden structure. It’s almost like literary criticism on biology sequences. In that way, I’m still doing literary criticism.

I also expect the models to aid biological discovery. When you sequence a new organism from nature, you just get DNA. It’s very hard to identify what parts of the genome correspond to different functions. If the models can learn the concept of, say, a phage defense system or a biosynthetic pathway, they will help us annotate and discover new biological systems in sequencing data. The algorithm is fluent in the language, whereas humans are very much not.
from Longreads https://longreads.com/2025/02/06/the-poetry-fan-who-taught-an-llm-to-read-and-write-dna/
