Modern frontier LLMs are descendants of a single model architecture: the Transformer, first published in 2017. Transformers are data-hungry: a state-of-the-art LLM today is trained on text that would take an average reader on the order of 100,000 years to finish. Internet data, digitized books, and, more recently, expert annotation marketplaces have enabled training at this scale. But in many fields, including the life sciences, data is scarce. Scarcer still is interventional data, from which models learn causation more readily. In a world racing to scale the same architecture with more compute and data, what worked when scale was not an option? And what role did model architecture play in the answer?
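For readers who want to sanity-check the reading-time figure, here is a rough back-of-envelope sketch in Python. The training-set size (~15 trillion tokens), words-per-token ratio, and reading speed are illustrative assumptions, not figures from this page.

```python
# Back-of-envelope estimate (all constants are assumptions for illustration):
# a frontier LLM trained on ~15 trillion tokens, ~0.75 words per token,
# and an average reader at ~250 words per minute, reading nonstop.
TRAINING_TOKENS = 15e12   # assumed training-set size in tokens
WORDS_PER_TOKEN = 0.75    # common rule-of-thumb token-to-word conversion
WORDS_PER_MINUTE = 250    # typical adult reading speed

total_words = TRAINING_TOKENS * WORDS_PER_TOKEN
minutes = total_words / WORDS_PER_MINUTE
years = minutes / (60 * 24 * 365)
print(f"~{years:,.0f} years of continuous reading")  # roughly 86,000 years
```

Under these assumptions the estimate comes out near 100,000 years of continuous reading, consistent with the order-of-magnitude claim above.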
Save the date!
Please join us on May 14-15, 2026.