Insights from the CZI-hosted Benchmarking and Evaluation Workshop

Learn about insights from a recent CZI workshop on building robust benchmarking for AI in biology

By Liz Fahsbender | December 9, 2024

Photo: Participants of the Benchmarking and Evaluation Workshop in front of the CZI headquarters.

Overview

CZI recently hosted a workshop on benchmarking and evaluating AI models in biology, and the insights gained reinforced our commitment to supporting the development of a robust benchmarking infrastructure. This post shares key takeaways from the workshop and highlights CZI's vision for the future of benchmarking in this domain.

The Importance of Benchmarks and Evaluation

At CZI, we recognize the transformative potential of modeling in driving biomedical breakthroughs. However, ensuring the reliability and efficacy of these models is paramount. This is where benchmarks play a crucial role. By providing standardized evaluation frameworks, benchmarks enable us to:

  • Assess model performance objectively: Benchmarks offer a level playing field for comparing different models and tracking progress over time.
  • Foster trust and transparency: Robust benchmarks build confidence in model predictions and promote transparency in model development.
  • Accelerate scientific discovery: By facilitating model selection and improvement, benchmarks pave the way for more rapid and reliable scientific discoveries.

A Workshop Built on Discussion

The workshop brought together a diverse group of experts from various fields, including imaging, genomics, and proteomics. We prioritized interactive discussions and breakout sessions, creating an environment where participants could actively engage with the challenges and opportunities in benchmarking. This format allowed for a dynamic exchange of ideas and perspectives, leading to rich insights and a strong sense of community ownership over the outcomes.

To facilitate these discussions, we organized the workshop around key themes, including:

  • The current state of benchmarking: We explored the existing landscape of benchmarking in different biological domains, identifying areas of strength and areas where further development is needed.
  • Challenges and opportunities: In small group discussions, we dove into the specific challenges associated with benchmarking increasingly complex AI models in biology, as well as the opportunities for improvement and innovation.
  • Data for benchmarking: We discussed the critical aspects of creating and sharing high-quality benchmarking datasets, including data validation, metadata standards, and accessibility.
  • Community-driven benchmarking: We emphasized the importance of community involvement in proposing, developing, and implementing benchmarks, exploring successful examples of community-driven efforts.

Key Insights from the CZI Workshop

Through engaging discussions and collaborative sessions, several key themes emerged:

  • Prioritizing reproducibility: Reproducibility is the cornerstone of scientific progress. The workshop emphasized the need for clear documentation, well-structured code, and standardized data preprocessing so that findings can be reproduced.
  • Ensuring data quality: High-quality, well-annotated datasets are essential for constructing meaningful benchmarks. Standardized data formats, comprehensive metadata, and rigorous data curation are crucial for facilitating data sharing and reuse (a minimal sketch of a metadata check follows this list).
  • Moving beyond single metrics: Evaluating model performance requires a multifaceted approach, and relying on any single metric can be misleading. Instead, we must consider multiple aspects of model behavior and tailor evaluation strategies to specific research questions (see the second sketch after this list).
  • Embracing community collaboration: Developing a thriving benchmarking ecosystem necessitates active community engagement. Sharing data, tools, and best practices, as well as collaboratively developing new benchmarks, will be vital for addressing the diverse needs of the research community.
  • Aggregating benchmarking assets: Bringing together datasets, tools, and best practices into a centralized hub will streamline the evaluation process and promote collaboration.
  • Benchmark maintenance and evolution: Rapid advancements in biological technologies and AI models require continuous re-evaluation and adaptation of evaluation metrics to ensure their ongoing relevance and rigor.
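
To make the data-quality point concrete, here is a minimal sketch of a metadata completeness check for a benchmarking dataset. The required fields and the example record are hypothetical illustrations, not a CZI or community standard.

```python
# A minimal sketch of a metadata completeness check for a benchmarking
# dataset. The required fields below are hypothetical, not a standard.
REQUIRED_FIELDS = {
    "assay": str,      # e.g. "scRNA-seq"
    "organism": str,   # e.g. "Homo sapiens"
    "tissue": str,
    "license": str,    # data reuse terms
    "version": str,    # dataset release version
}

def validate_metadata(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")
    return problems

# Hypothetical record with one deliberate omission (no license).
example = {"assay": "scRNA-seq", "organism": "Homo sapiens",
           "tissue": "lung", "version": "1.0"}
for problem in validate_metadata(example):
    print(problem)  # -> missing required field: license
```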
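And to illustrate what moving beyond a single metric can look like in practice, the sketch below scores a hypothetical binary classifier against several complementary metrics: accuracy, F1, ranking quality (AUROC), and probability calibration (Brier score). The data is synthetic and the metric choices are illustrative rather than a recommended suite; the fixed random seed also nods to the reproducibility point above.

```python
# A minimal sketch of multi-metric evaluation for a hypothetical binary
# classifier; the synthetic data and metric choices are illustrative only.
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    roc_auc_score,
    brier_score_loss,
)

rng = np.random.default_rng(seed=0)  # fixed seed, per the reproducibility point
y_true = rng.integers(0, 2, size=200)  # hypothetical ground-truth labels
# Hypothetical predicted probabilities, loosely correlated with the labels.
y_prob = np.clip(0.4 * y_true + 0.6 * rng.random(200), 0.0, 1.0)
y_pred = (y_prob >= 0.5).astype(int)

# Report complementary views of performance rather than one number.
report = {
    "accuracy": accuracy_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "auroc": roc_auc_score(y_true, y_prob),
    "brier": brier_score_loss(y_true, y_prob),
}
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

A model can look strong on accuracy while being poorly calibrated or weak on a minority class, which is exactly why a single number can mislead.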

CZI's Vision for Benchmarking

CZI is committed to being a driving force in fostering a robust benchmarking infrastructure for biological modeling. To this end, we are dedicated to:

  • Working closely with the community: We will actively seek input from researchers to ensure that our benchmarking efforts are transparent, objective, and aligned with the needs of the broader scientific community.
  • Investing in data and infrastructure: We plan to build out a robust benchmarking infrastructure while continuing to support the generation of high-quality datasets and benchmarks, and to make them accessible for model evaluation and comparison.
  • Promoting collaboration and knowledge sharing: We will foster collaboration among researchers by bringing people together in formats similar to this workshop, and we will promote the dissemination of best practices in benchmarking and model evaluation through publications and other channels.

Looking Ahead

We are excited about the future of benchmarking in biological modeling and its potential to accelerate biomedical research. We believe that by working together and prioritizing reproducibility, data quality, and community engagement, we can build a robust and transparent benchmarking infrastructure that empowers researchers to develop and deploy impactful models.

At the outset, this means being open about how benchmarks are developed, validated, and interpreted, with clear documentation that inspires confidence among researchers and the public alike. We will prioritize making benchmarks easy to run and accessible to a wide array of stakeholders, from experimental biologists to machine learning researchers. This requires not only technical foundations that offer multiple access methods and support interoperability, but also benchmarks that are scientifically rigorous and biologically relevant, designed in collaboration with experts who deeply understand both AI and biology.

As we look forward, we aim to be future-focused, anticipating the evolving needs of biological research by hosting both present-day leaderboards and forward-looking competitions. Ultimately, we envision a collaborative ecosystem where the community actively contributes to defining standards and metrics, with CZI serving as a facilitator of infrastructure to support collective knowledge and innovation in biological benchmarking.

Stay tuned for our upcoming publication with in-depth learnings and actionable recommendations!