Adamson et al.
Version v1.0, source
released 15 Dec 2017
Britt Adamson, Thomas M Norman, Marco Jost, Min Y Cho, James K Nuñez, Yuwen Chen, Jacqueline E Villalta, Luke A Gilbert, Max A Horlbeck, Marco Y Hein, Ryan A Pak, Andrew N Gray, Carol A Gross, Atray Dixit, Oren Parnas, Aviv Regev, Jonathan S Weissman
This dataset comprises single-cell RNA sequencing (scRNA-seq) data generated with a multiplexed CRISPR screening platform. It captures the transcriptional profiles that result from targeted genetic perturbations, enabling systematic study of the unfolded protein response (UPR) at single-cell resolution.
Dataset Overview
Data Type
Single-cell RNA sequencing data
Citation
Publication: A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response. Cell (2016). https://doi.org/10.1016/j.cell.2016.11.048
Dataset Card Authors
Ana-Maria Istrate, CZI
Dataset Card Contact
virtualcellmodels@chanzuckerberg.com
Uses
Primary Use Cases
- Analyzing genetic interactions affecting the unfolded protein response
- Studying transcriptional responses to specific genetic perturbations
- Modeling cellular state changes under stress conditions
Out-of-Scope or Unauthorized Use Cases
- Discriminatory or biased analyses
- Any use that is not in accordance with the Acceptable Use Policy
- Any use prohibited by the dataset's license
Intended Users
- Researchers and scientists in genomics and cellular biology
- Bioinformaticians analyzing single-cell data
Dataset Structure
The dataset includes scRNA-seq data from a multiplexed CRISPR screening platform, detailing transcriptional profiles under various genetic perturbations related to the unfolded protein response.
Personal and Sensitive Information
The dataset should not contain personal or sensitive information.
Dataset Creation
Curation Rationale
To systematically dissect the unfolded protein response by analyzing the effects of targeted genetic perturbations at a single-cell level.
Source Data
Single-cell RNA sequencing data from a multiplexed CRISPR screening platform targeting genes involved in the unfolded protein response.
Who are the source data producers?
The data was produced by the authors of the study.
Data Collection and Processing
The original data was collected using a multiplexed CRISPR screening platform combined with single-cell RNA sequencing to profile transcriptional responses to genetic perturbations, as detailed in [1]. We use a processed version of the dataset from GEARS [2], version 0.0.2.
- Data Processing: Cell observations in the dataset are log-normalized and filtered to the top 5000 highly variable genes. The dataset is divided into train/val/test splits. The dataset split procedure is detailed in the GEARS [2] Supplementary Material. The processed version of the dataset is distributed as follows:
  - 44340 observations spanning 86 single-gene perturbations, with the perturbations split 57/7/22 across train/val/test
  - 24263 control samples
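- Reproducibility:
    from gears import PertData  # the GEARS package is imported as lowercase `gears`

    dataset_name = 'adamson'
    # Assumption: the original snippet leaves `split` undefined; 'simulation'
    # is the split name used in the GEARS tutorials and may need adjusting.
    split = 'simulation'

    pert_data = PertData("data/")            # directory where data is downloaded and cached
    pert_data.load(data_name=dataset_name)   # fetch the processed Adamson dataset
    pert_data.prepare_split(split=split, seed=1)
    pert_data.get_dataloader(batch_size=64, test_batch_size=64)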
- Reference: GEARS [2] Supplementary Material (Supplementary Note 3: Data preprocessing; Supplementary Note 10: Generating a data split for model evaluation).
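To make the processing steps above concrete, here is a minimal, standard-library-only sketch of log-normalization, variable-gene filtering, and a perturbation-level split. It is not the GEARS implementation: the function names, the `target_sum` of 10,000, the use of plain variance as a stand-in for highly-variable-gene selection, and the toy counts are all assumptions made for illustration. Only the 57/7/22 split sizes over 86 perturbations come from this card.

```python
import math
import random
from statistics import pvariance


def log_normalize(counts, target_sum=10_000):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


def top_variable_genes(matrix, k):
    """Return the column indices of the k genes with the highest variance."""
    n_genes = len(matrix[0])
    variances = [pvariance([cell[g] for cell in matrix]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:k])


def split_perturbations(perturbations, n_train, n_val, n_test, seed=1):
    """Shuffle perturbations and assign each one to exactly one split."""
    if n_train + n_val + n_test != len(perturbations):
        raise ValueError("split sizes must add up to the number of perturbations")
    shuffled = list(perturbations)
    random.Random(seed).shuffle(shuffled)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


# Toy raw counts: 4 cells x 3 genes (made-up numbers, not from the dataset).
toy_counts = [
    [10, 0, 5],
    [20, 1, 5],
    [0, 3, 5],
    [5, 0, 5],
]
toy_log = log_normalize(toy_counts)
kept_genes = top_variable_genes(toy_log, k=2)

# Placeholder names standing in for the 86 perturbed genes.
placeholder_perts = [f"pert_{i}" for i in range(86)]
splits = split_perturbations(placeholder_perts, n_train=57, n_val=7, n_test=22)
print({name: len(members) for name, members in splits.items()})
```

Splitting by perturbation rather than by cell means that every cell carrying a held-out perturbation stays out of training, which is the property a perturbation-prediction benchmark needs. The exact procedure used for this dataset is the one described in the GEARS Supplementary Material.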
Bias, Risks, and Limitations
- Potential Biases:
  - The dataset may not represent all possible genetic interactions affecting the unfolded protein response.
- Risks:
  - Misinterpretation of genetic interaction effects.
- Limitations:
  - The data may not generalize to all cell types or organisms.
Caveats and Recommendations
- Users should consider the specific experimental conditions and cell types when interpreting the data.
- We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when using our services.
Glossary
- CRISPR Screening: A technique that uses CRISPR-Cas9 technology to introduce targeted genetic perturbations for studying gene function.
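- Unfolded Protein Response (UPR): A cellular stress response triggered by the accumulation of unfolded or misfolded proteins in the endoplasmic reticulum (ER). It aims to restore normal function by attenuating protein translation and activating signaling pathways that increase the cell's protein-folding capacity.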
- scRNA-seq: Single-cell RNA sequencing, a method to examine the gene expression of individual cells.
More Information
- For detailed methodologies and analyses, refer to the original publication: A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response.
Acknowledgements
The authors acknowledge the contributions of their respective institutions and funding bodies.
Entity Count
Note that the values below are for the processed version of the dataset:
- Cells: 68603 observations
- Genes: 5060 genes
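As a quick sanity check, the cell count above can be reconciled with the figures given under Data Collection and Processing. The short sketch below uses only numbers copied from this card; the variable names are illustrative.

```python
# Figures copied from this card (processed GEARS version of the dataset).
perturbed_observations = 44340
control_observations = 24263
reported_cells = 68603

perturbations_per_split = {"train": 57, "val": 7, "test": 22}
reported_perturbations = 86

total_cells = perturbed_observations + control_observations
total_perturbations = sum(perturbations_per_split.values())

# Both totals should match the values reported in the card.
assert total_cells == reported_cells
assert total_perturbations == reported_perturbations

control_fraction = control_observations / total_cells
print(f"{total_cells} cells, {control_fraction:.1%} of them controls")
```

Running the same comparison against a freshly loaded copy of the data is a cheap way to confirm that the expected processed version was downloaded.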
References
- [1] Adamson, Britt, et al. "A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response." Cell 167.7 (2016): 1867-1882.
- [2] Roohani, Yusuf, Kexin Huang, and Jure Leskovec. "Predicting transcriptional outcomes of novel multigene perturbations with GEARS." Nature Biotechnology 42.6 (2024): 927-935.