OpenAI Releases LifeSciBench: Measuring AI Systems' Capabilities in Real-World Scientific Research Scenarios
CoinFeed, June 20 - OpenAI has officially released LifeSciBench, a new evaluation benchmark designed to measure AI systems' capabilities in real-world scientific research scenarios. LifeSciBench consists of 750 expert-crafted tasks covering 7 categories of research workflows and 7 biology domains. The tasks were sourced from 173 researchers who hold PhDs and have experience in the biotech or pharmaceutical industry. Rather than testing simple factual recall, the benchmark assesses complex research capabilities, including evidence integration, experimental design, data analysis, scientific reasoning, and scientific communication. Over 79% of the tasks involve multi-step reasoning, with an average of about 4 reasoning steps per question. The benchmark also includes 1,062 real research data attachments, such as papers, figures, sequence data, and structure files.