Allen Cell WTC-11 hiPSC Single-Cell Image Dataset
Version v1.0, released 04 Jan 2023, Allen Institute for Cell Science
This dataset was generated to understand intracellular organization and cell-to-cell variation in a large number of single cells. It contains 3D images of live cells across 25 cell lines derived from human induced pluripotent stem cells (WTC-11). In each cell line, a protein found in a particular organelle or cellular structure was endogenously tagged with mEGFP and imaged alongside the cell membrane and DNA. The field-of-view images (16-bit), single-cell crops (16-bit) and their corresponding segmentation files (8-bit) are provided in OME-TIFF format.
Dataset Overview
Data Type
Fluorescence microscopy images
Citation
Publication: Integrated intracellular organization and its variations in human iPS cells. Nature (2023) https://doi.org/10.1038/s41586-022-05563-7
Dataset: Allen Institute for Cell Science (2021). hiPSC Single-cell Image Dataset [DATASET]. Available from allencell.org/data-downloading
Dataset Card Authors
Chan Zuckerberg Initiative
Dataset Card Contact
virtualcellmodels@chanzuckerberg.com
Uses
Primary Use Cases
- Investigate intracellular organization in live, single cells
- Study cell-to-cell variations
Out-of-Scope or Unauthorized Use Cases
Do not use the dataset for the following purposes:
- Any use not covered by the license
- Any use not in accordance with the Acceptable Use Policy
Dataset Structure
The images are organized by processing step. Folder fov_path contains the raw field-of-view images. Their corresponding segmentations are in folder fov_seg_path for the cell and nucleus, and in folder struc_seg_path for the cellular structure visualized by mEGFP. Cropped single-cell images are in folder crop_raw, and their corresponding segmentations are in folder crop_seg. All images are in OME-TIFF format.
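The folder layout can be summarized as a lookup table from folder name to image type. The sketch below is illustrative only. It assumes that each image's relative path contains one of the folder names listed above as a path component. The function name describe_image and the example path are hypothetical.

```python
from pathlib import PurePosixPath

# Folder names as given in this dataset card; the descriptions paraphrase the card.
FOLDER_CONTENTS = {
    "fov_path": "raw field-of-view image",
    "fov_seg_path": "cell and nucleus segmentation of a field of view",
    "struc_seg_path": "segmentation of the mEGFP-tagged structure in a field of view",
    "crop_raw": "cropped single-cell image",
    "crop_seg": "segmentation of a cropped single cell",
}


def describe_image(path: str) -> str:
    """Return what an image is, based on the first known dataset folder in its path.

    Hypothetical helper: assumes the folder name appears as a path component.
    """
    for part in PurePosixPath(path).parts:
        if part in FOLDER_CONTENTS:
            return FOLDER_CONTENTS[part]
    raise ValueError(f"no known dataset folder in {path!r}")


# Hypothetical relative path; real filenames are random strings.
print(describe_image("crop_seg/abc123.ome.tiff"))
```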
The image filenames are random strings. The information for each image can be looked up in metadata.csv (1.7 GB). This comprehensive metadata table also includes annotations for individual cells, such as cell cycle stage.
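Because filenames carry no information and metadata.csv is large, one reasonable approach is to stream the table row by row rather than loading it all into memory. The sketch below is a minimal example of that approach. It assumes metadata.csv has a column holding each image's relative path. The default column name crop_raw, the other column names, and the sample values are assumptions made for illustration.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Dict, Optional, TextIO


def find_image_row(meta: TextIO, filename: str,
                   path_column: str = "crop_raw") -> Optional[Dict[str, str]]:
    """Stream a metadata CSV and return the first row whose path ends in `filename`.

    `path_column` is an assumed column name holding each image's relative path.
    Returns None if no row matches.
    """
    for row in csv.DictReader(meta):
        # Missing fields come back as None; treat them as an empty path.
        if PurePosixPath(row.get(path_column) or "").name == filename:
            return row
    return None


# Tiny in-memory stand-in for metadata.csv (hypothetical columns and placeholder values).
sample = io.StringIO("crop_raw,cell_stage\ncrop_raw/abc.ome.tiff,stage_1\n")
print(find_image_row(sample, "abc.ome.tiff"))
```

For the real file, open it with `open("metadata.csv", newline="")` and pass the file object in. A linear scan touches each row once. If many lookups are needed, building a dictionary keyed on filename in a single pass would be faster.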
For how images are organized in folders, see https://open.quiltdata.com/b/allencell/packages/aics/hipsc_single_cell_image_dataset.
Personal and Sensitive Information
No personal and sensitive information is included.
Dataset Creation
Curation Rationale
These images were taken to understand the 3D localization of cellular structures and cell-to-cell variation in live cells.
Data Collection and Processing
A protein found in a particular organelle or cellular structure was endogenously tagged with mEGFP in WTC-11 cells using the CRISPR-Cas9 system, generating a stable cell line. In total, 25 such cell lines were generated. The cells were incubated with cell membrane and DNA dyes before imaging, so each image contains 3 channels: the tagged protein, the cell membrane, and DNA. Images were acquired on a confocal microscope as z-stacks of 50–150 slices. For more details, see the reference: https://www.nature.com/articles/s41586-022-05563-7.
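A common first look at such 3-channel z-stacks is a maximum-intensity projection: each channel is collapsed along z by keeping the brightest value at every (y, x) position. The sketch below uses plain nested lists indexed [c][z][y][x] so that it runs without extra libraries. Both the axis order and the channel order in CHANNEL_NAMES are assumptions. The order for real files should be confirmed from their OME-TIFF channel metadata, and real files may not have exactly these three channels.

```python
from typing import Dict, List

Plane = List[List[int]]  # indexed [y][x]
Stack = List[Plane]      # indexed [z][y][x]

# Assumed channel order, based on the three channels described above.
CHANNEL_NAMES = ("structure", "membrane", "dna")


def max_projection(stack: Stack) -> Plane:
    """Collapse a z-stack to 2D by keeping the brightest value along z at each (y, x)."""
    if not stack:
        raise ValueError("empty z-stack")
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [max(plane[y][x] for plane in stack) for x in range(width)]
        for y in range(height)
    ]


def project_channels(image: List[Stack]) -> Dict[str, Plane]:
    """Max-project every channel of an image indexed [c][z][y][x]."""
    if len(image) != len(CHANNEL_NAMES):
        raise ValueError(f"expected {len(CHANNEL_NAMES)} channels, got {len(image)}")
    return {name: max_projection(stack) for name, stack in zip(CHANNEL_NAMES, image)}


# Toy image: 3 channels, 2 z-slices, 1x2 pixels each.
toy = [[[[1, 2]], [[3, 0]]], [[[5, 5]], [[4, 6]]], [[[0, 0]], [[0, 1]]]]
print(project_channels(toy))
```

With NumPy arrays, the projection of one channel is simply `stack.max(axis=0)`. The list version above only spells out that operation.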
Annotation Process
This dataset comes with comprehensive annotations in the 1,200 columns of the metadata file. These include features extracted from single cells, such as cell, nuclear, and intracellular volumes and the morphology of the tagged cellular structure. These features were computed from segmentations produced with the tools in https://github.com/AllenCell/segmenter_model_zoo and https://github.com/AllenCell/aics-segmentation. The annotations also include field-of-view or colony-level features, such as whether a cell is on the edge of a colony. Lastly, each cell has a cell cycle stage label, generated by a combination of rule-based criteria and a deep-learning-based classifier (https://open.quiltdata.com/b/allencell/packages/aics/mitotic_annotation). For more details, see https://www.nature.com/articles/s41586-022-05563-7.
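The per-cell features and labels lend themselves to simple group-by summaries. The sketch below computes the mean of one numeric feature per cell cycle stage, skipping cells on a colony edge. It streams the table rather than loading all 1,200 columns into memory. The column names (cell_stage, edge_flag, nuclear_volume) and the edge encoding "1" are hypothetical placeholders; check the real headers in metadata.csv before use.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def mean_by_stage(meta: TextIO,
                  value_col: str = "nuclear_volume",  # hypothetical column name
                  stage_col: str = "cell_stage",      # hypothetical column name
                  edge_col: str = "edge_flag",        # hypothetical column name
                  edge_value: str = "1") -> Dict[str, float]:
    """Average `value_col` per cell cycle stage, excluding colony-edge cells.

    `edge_value` is the assumed string marking a cell on the colony edge.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(meta):
        if row[edge_col] == edge_value:
            continue  # skip cells on the edge of a colony
        stage = row[stage_col]
        totals[stage] += float(row[value_col])
        counts[stage] += 1
    return {stage: totals[stage] / counts[stage] for stage in totals}


# In-memory stand-in for metadata.csv with placeholder values.
sample = io.StringIO(
    "cell_stage,edge_flag,nuclear_volume\n"
    "stage_1,0,100\nstage_1,0,200\nstage_1,1,999\nstage_2,0,50\n"
)
print(mean_by_stage(sample))
```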
Who are the annotators?
The annotation was done by the team who generated and processed the data.
Bias, Risks, and Limitations
- The cropped images have varying pixel sizes and numbers of z-stack slices.
- The mEGFP tag can sometimes affect protein localization, function, and protein-protein interactions.
- The ACTN1-tagged cell line used in this dataset is not publicly available because of technical difficulties. A different cell line, with tagged ACTN2, is available.
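Because the cropped images differ in pixel size and number of z-slices, comparing them directly usually means resampling each crop to a common voxel size first. The sketch below shows only the arithmetic. It computes per-axis scale factors and the resulting shape, so that the physical extent of the crop is preserved. The voxel sizes and shapes in the example are made-up numbers. Real values would come from each file's OME-XML PhysicalSizeZ/Y/X attributes.

```python
from typing import Tuple

Size3 = Tuple[float, float, float]  # physical voxel size along (z, y, x), e.g. micrometres
Shape3 = Tuple[int, int, int]       # number of voxels along (z, y, x)


def zoom_factors(source: Size3, target: Size3) -> Size3:
    """Per-axis scale factors that map an image from `source` to `target` voxel size."""
    return (source[0] / target[0], source[1] / target[1], source[2] / target[2])


def resampled_shape(shape: Shape3, source: Size3, target: Size3) -> Shape3:
    """Voxel counts after resampling, keeping the physical extent of the crop."""
    fz, fy, fx = zoom_factors(source, target)
    return (round(shape[0] * fz), round(shape[1] * fy), round(shape[2] * fx))


# Made-up example: bring a crop to isotropic 0.5-micrometre voxels.
print(resampled_shape((60, 300, 300), (0.5, 0.1, 0.1), (0.5, 0.5, 0.5)))
```

The factors are in array axis order, so they can be passed as the per-axis `zoom` argument of a resampling routine such as `scipy.ndimage.zoom`, applied to a (z, y, x) array. Segmentation crops should be resampled with nearest-neighbour interpolation (`order=0`) so that their labels stay discrete.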
More Information
The extracted features can be explored at: https://cfe.allencell.org/
Acknowledgements
See source reference: https://www.nature.com/articles/s41586-022-05563-7