SubCell Evaluation Data From OpenCell
Version v1.0, processed
released 18 Nov 2024
- The Lundberg Lab (Stanford University)
- Chan Zuckerberg Initiative
This dataset was created to evaluate SubCell models using images from OpenCell. The maximum-intensity z-projection images were cropped around each nucleus, with the nucleus at the center of the crop. The pixel size and image size were then adjusted to match the SubCell training data. Both the cropped images without further resizing and the final processed images are available as 16-bit PNG files.
Dataset Overview
Data Type
Fluorescence microscopy images
Citation
Available Winter 2024
Dataset Card Authors
Chan Zuckerberg Initiative
Dataset Card Contact
virtualcellmodels@chanzuckerberg.com
Uses
Primary Use Cases
- Evaluate performance and extensibility of SubCell models
- Train or evaluate other machine learning models
Out-of-Scope or Unauthorized Use Cases
Do not use the dataset for the following purposes:
- Any use that is not covered by the license.
- Any use that is not in accordance with the Acceptable Use Policy.
Dataset Structure
Under s3://czi-subcell-public/opencell-processed/ there are three items:
- The intermediate folder (18.6 GB) contains single-cell crops from the z-projected OpenCell images.
- The resized folder (101.5 GB) contains single-cell crops resized to match the pixel size of the SubCell training data from the HPA.
- Opencell.metadata.formatted.csv (21.5 MB) contains metadata.
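As a rough illustration of the layout above, the following Python sketch builds the S3 URI for each of the three items and an AWS CLI command to list it. The bucket prefix and item names come from this card. The trailing slashes on the folder names and the use of anonymous access (`--no-sign-request`) are assumptions, and the helper names `item_uri` and `aws_ls_command` are hypothetical.

```python
# Sketch only: the prefix and item names are copied from the dataset card.
# Anonymous (unsigned) access to the bucket is an assumption.
BASE_URI = "s3://czi-subcell-public/opencell-processed/"

ITEMS = {
    "intermediate": "intermediate/",                 # single-cell crops (18.6 GB)
    "resized": "resized/",                           # crops resized for SubCell (101.5 GB)
    "metadata": "Opencell.metadata.formatted.csv",   # metadata table (21.5 MB)
}


def item_uri(name):
    """Return the full S3 URI for one of the three top-level items."""
    return BASE_URI + ITEMS[name]


def aws_ls_command(name):
    """Build an AWS CLI argument list that lists one item without credentials."""
    return ["aws", "s3", "ls", "--no-sign-request", item_uri(name)]


# Print the command for each item so it can be copied into a shell.
for item in ITEMS:
    print(" ".join(aws_ls_command(item)))
```

If the AWS CLI is installed, each argument list can be passed to `subprocess.run`. Given the folder sizes, it may be worth inspecting the metadata file before downloading any images.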
Personal and Sensitive Information
No personal or sensitive information is included.
Dataset Creation
Curation Rationale
This dataset was generated to evaluate the performance and extensibility of SubCell models, which expect a specific input format and pixel size.
Source data
- OpenCell Dataset
Who are the source data producers?
The Leonetti Lab (Chan Zuckerberg Biohub) and the Mann Lab (Max Planck Institute)
Data Collection and Processing
The maximum-intensity z-projection images from OpenCell were used. Nuclei were identified with StarDist, and crops of 256 x 256 pixels were generated, each centered on an identified nucleus. The crops were then rescaled from the original OpenCell pixel size of 0.206349 μm/pixel to 0.0800885 μm/pixel to match the SubCell training data from the Human Protein Atlas (see the HPA for SubCell Dataset), giving final images of 640 x 640 pixels.
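The geometry of the cropping and rescaling steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the 1024 x 1024 field of view in the usage lines is made up, and skipping nuclei whose crop would cross the image border is an assumption, since the card does not say how border nuclei were handled.

```python
# Values quoted in the dataset card.
CROP_SIZE = 256                 # pixels, edge length of each crop
OPENCELL_PIXEL_UM = 0.206349    # um/pixel, original OpenCell images
SUBCELL_PIXEL_UM = 0.0800885    # um/pixel, SubCell (HPA) training data
FINAL_SIZE = 640                # pixels, edge length of the final images


def crop_window(center_row, center_col, height, width, size=CROP_SIZE):
    """Return (top, left, bottom, right) for a size x size crop centered on a nucleus.

    Returns None if the window would extend past the image border
    (an assumption; the card does not describe border handling).
    """
    half = size // 2
    top = int(center_row) - half
    left = int(center_col) - half
    bottom, right = top + size, left + size
    if top < 0 or left < 0 or bottom > height or right > width:
        return None
    return top, left, bottom, right


def pixel_size_ratio(source_um=OPENCELL_PIXEL_UM, target_um=SUBCELL_PIXEL_UM):
    """Upscaling factor implied by moving from the source to the target pixel size."""
    return source_um / target_um


# Hypothetical nucleus centers in a hypothetical 1024 x 1024 field of view.
print(crop_window(500, 500, 1024, 1024))   # fully inside the image
print(crop_window(100, 500, 1024, 1024))   # too close to the top edge
print(round(pixel_size_ratio(), 4), FINAL_SIZE / CROP_SIZE)
```

The ratio of the two pixel sizes is about 2.58, while 640 / 256 is exactly 2.5. The card does not describe how the final 640 x 640 size was reached, so the sketch reports both numbers rather than assuming one.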
Annotation process
The localization of each protein was taken from the original OpenCell annotations, available at https://opencell.czbiohub.org/download.
Who are the annotators?
The team that generated and processed the source OpenCell data.
Bias, Risks, and Limitations
- The cropped images may include overlapping regions if they originated from the same field-of-view image.
- The resized images have altered pixel values and may not be appropriate for applications that require raw microscopy data.
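The overlap noted in the first limitation follows from the crop geometry: two 256 x 256 crops from the same field of view share pixels whenever their centers are less than 256 pixels apart along both axes. The sketch below checks that condition. It assumes nucleus center coordinates (in pixels of the original field of view) are available to the user; whether the metadata file provides them is not stated here, and the function name is hypothetical.

```python
CROP_SIZE = 256  # pixels, crop edge length quoted in the dataset card


def crops_overlap(center_a, center_b, size=CROP_SIZE):
    """Return True if two size x size crops centered at (row, col) share any pixel.

    Both centers must come from the same field-of-view image.
    """
    d_row = abs(center_a[0] - center_b[0])
    d_col = abs(center_a[1] - center_b[1])
    # The windows intersect only if they overlap along both axes.
    return d_row < size and d_col < size


# Hypothetical nucleus centers from one field of view.
print(crops_overlap((500, 500), (600, 700)))   # True: centers are close on both axes
print(crops_overlap((500, 500), (500, 756)))   # False: windows are exactly adjacent
```

A check like this may be useful when splitting crops into training and evaluation sets, for example by grouping crops by field of view so that overlapping pixels do not appear on both sides of the split.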