SubCell Evaluation Data From OpenCell

Version v1.0, processed, released 18 Nov 2024

Developed By
  • The Lundberg Lab (Stanford University)
  • Chan Zuckerberg Initiative

This dataset was created to evaluate SubCell models using images from OpenCell. The maximum-intensity z-projection images were cropped around each nucleus, and the pixel size and image size were then adjusted to match the SubCell training data. Both the cropped images without further resizing and the final processed data are available as 16-bit PNG files.
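Since the files are described as 16-bit PNGs, it can be worth confirming the bit depth of a downloaded file before feeding it to an image pipeline that might silently convert to 8 bits. The following is a minimal sketch using only the Python standard library; it reads the fixed-position IHDR chunk defined by the PNG format. The function name `read_png_header` is hypothetical and not part of any SubCell or OpenCell tooling.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes):
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk.

    Only the first 26 bytes of the file are needed.
    """
    if len(data) < 26:
        raise ValueError("need at least the first 26 bytes of the file")
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8-11 hold the chunk length and bytes 12-15 the chunk type,
    # which must be IHDR for the first chunk of a valid PNG.
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    # Width and height are big-endian 32-bit integers, followed by
    # one byte each for bit depth and color type.
    return struct.unpack(">IIBB", data[16:26])


# Self-contained demonstration on a synthetic header (not a real dataset file):
# a 640 x 640, 16-bit grayscale (color type 0) image.
_demo_header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">IIBBBBB", 640, 640, 16, 0, 0, 0, 0)
)
width, height, bit_depth, color_type = read_png_header(_demo_header)
print(width, height, bit_depth, color_type)
```

For a real file, pass the first 26 bytes read in binary mode, for example `read_png_header(open(path, "rb").read(26))`, where `path` points to a locally downloaded crop.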

Dataset Overview

Data Type

Fluorescence microscopy images

Citation

Available Winter 2024

Dataset Card Authors

Chan Zuckerberg Initiative

Dataset Card Contact

virtualcellmodels@chanzuckerberg.com

Uses

Primary Use Cases

  • Evaluate performance and extensibility of SubCell models
  • Train or evaluate other machine learning models

Out-of-Scope or Unauthorized Use Cases

Do not use the dataset for the following purposes:

Dataset Structure

Under s3://czi-subcell-public/opencell-processed/ there are three items:

  • The intermediate folder (18.6 GB) contains single-cell crops from the z-projected images from OpenCell.
  • The resized folder (101.5 GB) contains single-cell crops resized to match the pixel size of SubCell training data from the HPA.
  • Opencell.metadata.formatted.csv (21.5 MB) contains metadata.
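The column layout of the metadata file is not described in this card, so a reasonable first step after downloading it is to inspect the header and a few rows rather than assume any column names. The sketch below does that with the standard `csv` module. The helper name `summarize_metadata` and the stand-in columns in the demonstration are hypothetical placeholders, not the real schema.

```python
import csv
import io
from itertools import islice


def summarize_metadata(text_stream, preview_rows=3):
    """Return (column_names, first_rows) for a CSV text stream."""
    reader = csv.DictReader(text_stream)
    columns = list(reader.fieldnames or [])
    preview = list(islice(reader, preview_rows))
    return columns, preview


# Demonstration on an in-memory stand-in; these column names are
# placeholders and do NOT reflect the real metadata file.
_stand_in = io.StringIO("col_a,col_b\nx1,y1\nx2,y2\n")
columns, preview = summarize_metadata(_stand_in, preview_rows=1)
print(columns)
print(preview)
```

With a local copy of the file, the same helper can be called as `summarize_metadata(open("Opencell.metadata.formatted.csv", newline=""))`, assuming the file has been downloaded to the working directory.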

Personal and Sensitive Information

No personal or sensitive information is included.

Dataset Creation

Curation Rationale

This dataset was generated to evaluate the performance and extensibility of SubCell models, which expect a specific input format and pixel size.

Source data

  • OpenCell Dataset

Who are the source data producers?

The Leonetti Lab (Chan Zuckerberg Biohub) and the Mann Lab (Max Planck Institute)

Data Collection and Processing

The maximum-intensity z-projection images from OpenCell were used. Nuclei were identified with StarDist, and crops of 256 x 256 pixels were generated, each centered on an identified nucleus. The cropped images were then rescaled from the original OpenCell pixel size of 0.206349 μm/pixel to 0.0800885 μm/pixel to match the SubCell training data from the Human Protein Atlas (see the HPA for SubCell Dataset), and the images were resized accordingly to 640 x 640 pixels.
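The cropping and rescaling geometry described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, nucleus centers are assumed to be already available as (row, column) coordinates from a segmentation step, and dropping nuclei whose window would extend past the image border is an assumption, since the card does not say how such cases were handled. The ratio of the two pixel sizes works out to roughly 2.58; the sketch does not attempt to reproduce how the final 640 x 640 size was obtained.

```python
OPENCELL_UM_PER_PX = 0.206349   # original OpenCell pixel size, from the card
SUBCELL_UM_PER_PX = 0.0800885   # target pixel size, from the card
CROP_SIZE = 256                 # crop edge length in pixels, from the card


def crop_bounds(center_row, center_col, image_height, image_width, size=CROP_SIZE):
    """Return (top, left, bottom, right) for a size x size window centered
    on a nucleus, or None if the window does not fit inside the image.

    Skipping windows that cross the border is an assumption of this sketch.
    """
    half = size // 2
    top = int(round(center_row)) - half
    left = int(round(center_col)) - half
    bottom, right = top + size, left + size
    if top < 0 or left < 0 or bottom > image_height or right > image_width:
        return None
    return top, left, bottom, right


def pixel_scale_factor(src_um_per_px=OPENCELL_UM_PER_PX, dst_um_per_px=SUBCELL_UM_PER_PX):
    """Upsampling factor needed to go from the source to the target pixel size."""
    return src_um_per_px / dst_um_per_px


# Hypothetical 1024 x 1024 field of view with a nucleus centered at (300, 400).
bounds = crop_bounds(300, 400, image_height=1024, image_width=1024)
print(bounds)
print(round(pixel_scale_factor(), 4))
```

The returned bounds can be used directly as array slices, for example `image[top:bottom, left:right]` on a 2D array holding the z-projected field of view.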

Annotation process

The localization of each protein was obtained from the original OpenCell annotations, available at https://opencell.czbiohub.org/download.

Who are the annotators?

The team that generated and processed the source OpenCell data.

Bias, Risks, and Limitations

  • The cropped images may include overlapping regions if they originated from the same field-of-view image.
  • The resized images contain interpolated pixel values and may not be appropriate for applications that require raw microscopy data.
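The overlap limitation matters most when crops are split into training and evaluation sets, because two crops from the same field of view can share pixels. As a rough sketch, overlap between two 256 x 256 crops can be estimated from their nucleus centers alone. The function names are hypothetical, and the sketch assumes that center coordinates in the original field of view are available, which this card does not confirm for the metadata file.

```python
def crops_overlap(center_a, center_b, size=256):
    """True if two size x size crops centered at (row, col) share any pixels."""
    d_row = abs(center_a[0] - center_b[0])
    d_col = abs(center_a[1] - center_b[1])
    return d_row < size and d_col < size


def overlap_fraction(center_a, center_b, size=256):
    """Fraction of one crop's area that is shared with the other crop."""
    d_row = abs(center_a[0] - center_b[0])
    d_col = abs(center_a[1] - center_b[1])
    shared_rows = max(0, size - d_row)
    shared_cols = max(0, size - d_col)
    return (shared_rows * shared_cols) / (size * size)


# Two hypothetical nuclei 128 pixels apart along one axis share half a crop.
print(crops_overlap((500, 500), (628, 500)))
print(overlap_fraction((500, 500), (628, 500)))
```

A simpler and more conservative alternative is to keep all crops from the same field of view on the same side of any train/evaluation split.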