Benchmarks

CZI’s benchmarking suite consists of tools to enable robust and broad task-based benchmarking to drive virtual cell model development. It has three connected components:

  • cz-benchmarks - an open-source Python package for embedding evaluations directly into training or inference code. It contains standardized, community-informed benchmark tasks and metric definitions (see the illustrative sketch after this list).
  • VCP CLI - a programmatic interface to interact with the core resources on the Platform - datasets, models, and benchmarks - in a standardized way.
  • The Platform (see below) - an interactive, no-code, web-based interface to explore and compare the benchmarking results.
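
To make the task/metric structure concrete, here is a minimal sketch of the pattern that task-based benchmarking follows: a benchmark task bundles held-out data with metric definitions, and a model's inference function is scored against it. All names below (`BenchmarkTask`, `evaluate`, the toy data) are illustrative assumptions for this sketch, not the actual cz-benchmarks API; see the package documentation for the real interfaces.

```python
# Hypothetical sketch of task-based benchmarking; not the cz-benchmarks API.
from dataclasses import dataclass
from typing import Callable, Dict
import numpy as np

@dataclass
class BenchmarkTask:
    """A named task pairing held-out data with one or more metrics."""
    name: str
    inputs: np.ndarray   # e.g. expression profiles for held-out cells
    labels: np.ndarray   # e.g. ground-truth cell-type labels
    metrics: Dict[str, Callable[[np.ndarray, np.ndarray], float]]

    def evaluate(self, predict: Callable[[np.ndarray], np.ndarray]) -> Dict[str, float]:
        """Run the model's inference function on the task inputs and score each metric."""
        predictions = predict(self.inputs)
        return {name: fn(self.labels, predictions) for name, fn in self.metrics.items()}

def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(y_true == y_pred))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    task = BenchmarkTask(
        name="toy-cell-type-classification",
        inputs=rng.normal(size=(100, 8)),
        labels=rng.integers(0, 3, size=100),
        metrics={"accuracy": accuracy},
    )
    # Stand-in for a trained model's inference function.
    random_model = lambda x: rng.integers(0, 3, size=len(x))
    print(task.evaluate(random_model))  # e.g. {'accuracy': 0.34}
```

Because the task owns both the data and the metric definitions, any model exposing a compatible inference function can be scored the same way, which is what makes results comparable across models.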

Together, these pieces make it easy to run benchmarks, evaluate model performance, and compare how models perform across different tasks. Check out our Benchmarking Principles.

Updates & Micropublications

At CZI, we are interested in providing the infrastructure and methods support needed to advance biology-centric benchmarking, thereby bridging the gap between model developers and model users and advancing AI in biology. Read about a few of our initial collaborations and projects in this domain!