Andrew White | FutureHouse
Health Research Alliance
March 2025
Science is changing independent of AI
Sources: arXiv.org; doi:10.6084/m9.figshare.17064419.v3
Intellectual bottlenecks are growing
📝 Increasing paper count ($\approx$5M per year)
🧬 Larger data sets from cheaper experiments (genome at \$200 per person, \$1 / GB of sequencing)
🔍 Increasingly less disruptive papers (96% decline in biology)
Accelerate Scientific Discovery with Language Agents
Can LLMs do science already?
LAB-Bench: Measuring Capabilities of Language Models for Biology Research
Jon M. Laurent, Joseph D. Janizek, Michael Ruzo, Michaela M. Hinks, Michael J. Hammerling, Siddharth Narayanan, Manvitha Ponnapati, Andrew D. White, Samuel G. Rodriques. arXiv:2407.10362, 2024
MMLU-Pro
Not textbook knowledge
Human baselines exceed LLMs
LitQA2 Question
Which three residues with evolutionary divergence in the G domains of RAS isoforms also impose selectivity constraints of pan-KRAS non-covalent inhibition?
GPT-4 Response (prompted to cite)
The three residues with evolutionary divergence in the G domains of KRAS, HRAS, and NRAS isoforms that impose selectivity constraints for pan-KRAS non-covalent inhibition are Thr35, Asp38, and Glu37. It has been observed that these three amino acids significantly influence the binding affinity and selectivity of potential non-covalent inhibitors, thereby determining the efficacy of different therapeutic strategies targeting the RAS isoforms (Hunter et al., 2015). The understanding of these residues' evolutionary conservation and differences across isoforms can aid the development of more effective pan-RAS inhibitors (Lu et al., 2016).
Wrong, hallucinated references
PaperQA Agent Response
Three key residues in the G domains of RAS isoforms that affect KRAS inhibitor selectivity have been identified:
What is an agent?
Agent: trained, makes decisions
Environment: untrained, has tools, state
System | Environment | Key Tools |
---|---|---|
PaperQA | Literature Research | Search, Citation Traversal |
ProteinCrow | Designing novel proteins | AlphaFold2, Molecular Dynamics |
ChemCrow | Designing new molecules | Retrosynthesis, self-driving robotic lab |
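The agent/environment split described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the PaperQA, ProteinCrow, or ChemCrow implementation: the class names, the `toy_search` tool, and the fixed decision rule are all hypothetical, and a real agent would be a trained language model rather than an `if` statement.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Environment:
    """Untrained side: owns the tools and the mutable state."""
    tools: dict[str, Callable[[str], str]]
    state: list[str] = field(default_factory=list)

    def step(self, tool_name: str, argument: str) -> str:
        # Run the requested tool and record its output as new state.
        observation = self.tools[tool_name](argument)
        self.state.append(observation)
        return observation


class Agent:
    """Decision-making side. Hypothetical stand-in for a trained model:
    it follows a fixed rule so the example runs offline."""

    def act(self, question: str, state: list[str]) -> tuple[str, str]:
        if not state:
            return "search", question   # nothing gathered yet: call a tool
        return "answer", state[-1]      # otherwise answer from the last observation


def run_episode(agent: Agent, env: Environment, question: str, max_steps: int = 5) -> str:
    """Alternate agent decisions and environment steps until an answer is given."""
    for _ in range(max_steps):
        tool_name, argument = agent.act(question, env.state)
        if tool_name == "answer":
            return argument
        env.step(tool_name, argument)
    return "no answer"


def toy_search(query: str) -> str:
    # Hypothetical tool; a literature environment would query a paper index here.
    return f"toy result for: {query}"


env = Environment(tools={"search": toy_search})
result = run_episode(Agent(), env, "example question")
```

Only the agent chooses what to do next; the environment just executes tools and keeps state, which is why the same agent code can be pointed at different environments.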
Language agents achieve superhuman synthesis of scientific knowledge
Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, Andrew D. White arXiv:2409.13740, 2024
Better at answering questions than PhD biology experts
Accuracy vs Precision
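The slide does not define the two metrics. A common convention for benchmarks where a model may decline to answer, assumed here, is accuracy = correct / all questions and precision = correct / questions actually answered. The sketch below uses that assumed definition; the numbers are illustrative only and are not results from the paper.

```python
def accuracy_and_precision(correct: int, answered: int, total: int) -> tuple[float, float]:
    """Assumed definitions: accuracy over all questions, precision over answered ones."""
    if total <= 0 or not 0 <= correct <= answered <= total:
        raise ValueError("expected 0 <= correct <= answered <= total and total > 0")
    accuracy = correct / total
    # If the model declined every question, precision is reported as 0.0 here.
    precision = correct / answered if answered else 0.0
    return accuracy, precision


# Illustrative numbers: 100 questions, 75 answered, 60 of those correct.
acc, prec = accuracy_and_precision(correct=60, answered=75, total=100)
# acc is 0.6 and prec is 0.8
```

Under this definition, a system that declines uncertain questions can raise precision without raising accuracy, which is why the two are reported separately.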
Better than human-written Wikipedia articles
Difference between PaperQA and Wikipedia
Category | WikiCrow | Wikipedia |
---|---|---|
Unsupported Breakout (total) | 23 | 42 |
Reasoning Issues | 12 | 26 |
Attribution Issues | 10 | 16 |
Trivial Statements | 1 | 0 |
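As a quick consistency check on the table above, the three sub-categories should add up to the reported totals. The snippet below copies the counts from the table and verifies this. The final ratio compares raw counts only; the slide does not give the number of statements evaluated for each source, so it is not a rate.

```python
# Counts copied from the table above.
breakdown = {
    "WikiCrow": {"Reasoning Issues": 12, "Attribution Issues": 10, "Trivial Statements": 1},
    "Wikipedia": {"Reasoning Issues": 26, "Attribution Issues": 16, "Trivial Statements": 0},
}
reported_totals = {"WikiCrow": 23, "Wikipedia": 42}

# Sum the sub-categories and compare with the "Unsupported Breakout (total)" row.
computed_totals = {source: sum(rows.values()) for source, rows in breakdown.items()}
for source, total in computed_totals.items():
    assert total == reported_totals[source], f"{source}: {total} != {reported_totals[source]}"

# Ratio of raw unsupported-statement counts (about 0.55).
count_ratio = computed_totals["WikiCrow"] / computed_totals["Wikipedia"]
```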
Can detect if a claim is contradicted anywhere in the literature
Progress on Accuracy