DynaCLR

Version: v0.1.0, released 01 Jul 2025
License: BSD-3-Clause
Repository: https://github.com/mehta-lab/viscy

DynaCLR is a self-supervised method for embedding cell and organelle dynamics via Contrastive Learning of Representations (CLR) of time-lapse images. It supports diverse downstream biological tasks, such as cell state classification (with minimal human annotation), knowledge transfer between fluorescent and label-free imaging channels, and alignment of cell state dynamics.
Developed By
Model Details
Demo
Check out our Hugging Face demo showing the embeddings and learned representations of dynamic cell states.
Model Architecture
The DynaCLR model architecture consists of three main components designed to map 3D multi-channel patches of single cells to a temporally regularized embedding space:
- Spatial Projection Stem: A convolution layer whose kernel size depends on whether the dataset is 3D or 2D, followed by a reshaping operation that maps the down-sampled axial dimension to channels. This efficiently projects anisotropic 3D input into a 2D feature map.
- Encoder Backbone: Adapted from the ConvNeXt Tiny architecture. The original stem and head modules are removed, and the backbone outputs a 768-dimensional embedding vector.
- MLP Head: A 2-layer Multi-Layer Perceptron (MLP) head projects the 768-dimensional vector onto a lower 32-dimensional vector to speed up training.
The model accepts 3D multi-channel patches of single cells.
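The key architectural idea in the stem, folding the down-sampled axial (Z) dimension into the channel dimension to obtain a 2D feature map, can be illustrated with a minimal NumPy sketch. This is a toy stand-in, not the VisCy implementation: average pooling stands in for the learned convolution weights, and the patch shape and kernel sizes are illustrative.

```python
import numpy as np

def projection_stem(x, kz=5, kxy=4):
    """Toy spatial projection stem: strided average pooling (a stand-in
    for a learned strided convolution), then fold the down-sampled axial
    dimension into channels to yield a 2D feature map."""
    c, z, y, w = x.shape  # (channels, Z, Y, X) single-cell patch
    zo, yo, xo = z // kz, y // kxy, w // kxy
    x = x[:, : zo * kz, : yo * kxy, : xo * kxy]
    # block-average over (kz, kxy, kxy) windows
    x = x.reshape(c, zo, kz, yo, kxy, xo, kxy).mean(axis=(2, 4, 6))
    # fold the remaining axial slices into the channel dimension
    return x.reshape(c * zo, yo, xo)

patch = np.random.rand(2, 15, 64, 64)  # hypothetical 2-channel 3D patch
feat = projection_stem(patch)
print(feat.shape)  # (6, 16, 16): 2 channels x 3 axial slices, 2D map
```

The resulting 2D feature map can then be consumed by a standard 2D backbone such as ConvNeXt Tiny.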
Parameters
The exact parameter count is defined by the modified ConvNeXt Tiny backbone. Computational complexity is estimated as follows:
- Forward Pass Cost: Approximately 754 GFLOPs per forward pass on a single input patch.
- Training Step Cost: Approximately 2.26 TFLOPs per step.
- Total Training Cost: Approximately 0.5-1 PFLOPs for 100K iterations.
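The per-step estimate follows from the forward-pass cost under the common heuristic that the backward pass costs roughly twice the forward pass, so a training step is about three times a forward pass:

```python
forward_gflops = 754  # stated forward-pass cost (GFLOPs)
# Assumption: backward pass ~= 2x forward, so one training step ~= 3x forward.
step_tflops = 3 * forward_gflops / 1000
print(round(step_tflops, 2))  # ~2.26 TFLOPs, matching the stated step cost
```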
Model Card Authors
Eduardo Hirata-Miyasaki (Biohub)
Citation
Eduardo Hirata-Miyasaki, Soorya Pradeep, Ziwen Liu, Alishba Imran, Taylla Milena Theodoro, Ivan E. Ivanov, Sudip Khadka, See-Chi Lee, Michelle Grunberg, Hunter Woosley, Madhura Bhave, Carolina Arias, Shalin B. Mehta. "DynaCLR: Contrastive Learning of Cellular Dynamics with Temporal Regularization." arXiv:2410.11281v2 [cs.CV], 2025.
Primary Contact Email
shalin.mehta@czbiohub.org

To submit feature requests or report issues with the model, please open an issue on the GitHub repository.
System Requirements
- GPU-accelerated workstation or cloud GPU instance.
Model Variants
| Model Variant | Description | URL |
|---|---|---|
| DynaCLR-ALFI | Trained on U2OS cells from ALFI dataset | https://github.com/mehta-lab/viscy |
| DynaCLR-microglia | Trained on Microglia (IL-17, IF-beta) | https://github.com/mehta-lab/viscy |
| DynaCLR-DENV-VS+Ph | Trained on Phase + Viral Sensor | https://github.com/mehta-lab/viscy |
| DynaCLR-DENV-ER+Ph | Trained on Phase + SEC61 | https://github.com/mehta-lab/viscy |
| DynaCLR-DENV-ER | Trained on SEC61 only | https://github.com/mehta-lab/viscy |
| DynaCLR-Teacher-VS | Teacher model trained on Viral Sensor | https://github.com/mehta-lab/viscy |
| DynaCLR-Student-Ph | Student model trained on Phase | https://github.com/mehta-lab/viscy |
Intended Use
Primary Use Cases
The primary use cases for DynaCLR embeddings include:
- Cell State Classification: Robust classification of dynamic states such as cell division and viral infection using sparse annotations.
- Organelle Remodeling Analysis: Discovery of organelle responses (e.g., ER condensation) due to perturbations like infection.
- Trajectory Alignment: Alignment of asynchronous cellular responses and broken cell tracks using Dynamic Time Warping (DTW) on embeddings.
- Cross-Modal Knowledge Distillation: Distilling cell states from fluorescence channels to label-free channels to enable label-free prediction.
- Clustering: Clustering heterogeneous cell migration patterns and morphotypes.
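The trajectory-alignment use case above relies on Dynamic Time Warping over per-cell embedding trajectories. A minimal NumPy sketch of the DTW alignment cost (a textbook implementation on toy trajectories, not the pipeline's actual alignment code) looks like this:

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic Time Warping alignment cost between two embedding
    trajectories a (n, d) and b (m, d), using Euclidean frame distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy 2D embedding trajectories: identical dynamics, one time-shifted
t = np.linspace(0, 1, 20)
traj_a = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
traj_b = np.roll(traj_a, 2, axis=0)
print(dtw_cost(traj_a, traj_a), dtw_cost(traj_a, traj_b))
```

A low alignment cost between two tracks indicates that they traverse similar cell-state sequences, even when the responses are asynchronous.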
Out-of-Scope or Unauthorized Use Cases
Do not use the model for the following purposes:
- Clinical Diagnosis: This model is a research tool for basic biology and therapeutic discovery and is not intended for direct clinical diagnostic use without further extensive validation.
- Unethical Surveillance: Any use of the model for bio-surveillance or purposes that violate ethical guidelines regarding biological data.
- Use that violates applicable laws, regulations (including trade compliance laws), or third-party rights such as privacy or intellectual property rights
- Any use that is prohibited by the BSD 3-Clause license or Acceptable Use Policy.
Training Data
The models were trained on five distinct time-lapse datasets:
- ALFI: 2D label-free DIC movies of 3 cell types (HeLa, RPE1, U2OS) with 7-minute resolution.
- Microglia: 3D label-free movies of pharmacologically perturbed human microglia (IL-17, IF-β).
- Dengue Infection (A549): 5D datasets (3D volumes + time + channels) of A549 cells infected with Dengue virus (MOI 0 and 5).
- Infection/Cycle (High Res): 5D datasets representing infection and cell cycle dynamics.
- SEC61 Organelle: Two 5D datasets encoding infection and SEC61 (ER marker) at 10-minute and 30-minute resolutions.
Training Procedure
- Pre-processing: Data was converted to OME-Zarr format. Phase images were reconstructed and normalized per field-of-view; fluorescence images were scaled to specific percentiles.
- Tracking: Single-cell tracks were generated using Ultrack.
- Sampling Strategy: The model uses time-aware contrastive sampling, where positive pairs are images of the same cell at times t and t + τ, and negative pairs are sampled from different cells. This imposes a temporal regularization prior.
- Augmentations: Extensive augmentations were applied, including random spatial scaling, rotation, shearing, contrast adjustment, intensity scaling, and Gaussian noise/smoothing.
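The time-aware sampling strategy described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the VisCy data loader: track identifiers, the `tracks` structure, and the triplet interface are hypothetical.

```python
import random

def sample_triplet(tracks, tau=1, rng=random):
    """Time-aware contrastive sampling sketch: the positive is the same
    cell tau frames later; the negative is a patch from a different cell.
    `tracks` maps a track id to its time-ordered list of patch ids."""
    anchor_id = rng.choice(sorted(tracks))
    track = tracks[anchor_id]
    t = rng.randrange(len(track) - tau)  # leave room for the offset
    anchor, positive = track[t], track[t + tau]
    neg_id = rng.choice(sorted(set(tracks) - {anchor_id}))
    negative = rng.choice(tracks[neg_id])
    return anchor, positive, negative

# toy tracks: three cells, ten timepoints each; patch id = (cell, time)
tracks = {c: [(c, t) for t in range(10)] for c in ("cell_a", "cell_b", "cell_c")}
a, p, n = sample_triplet(tracks, tau=2)
print(a[0] == p[0], p[1] - a[1], n[0] != a[0])  # same cell, offset tau, other cell
```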
Training Code
The model architecture, training, and prediction code is available at: https://github.com/mehta-lab/viscy.
Speeds, Sizes, Times
- Hardware: Training was performed on an HPC cluster using 2-4 GPUs with distributed data parallel (DDP) strategy.
- Training Time: Varied from ~1 hour (ALFI dataset) to ~48 hours (infection and organelle remodeling models).
- Throughput: Each training step is estimated at 2.26 TFLOPs.
Training Hyperparameters
- Optimizer: AdamW
- Learning Rate:
- Batch Size: 128
- Loss Functions: NT-Xent loss (temperature 0.3 for ALFI, 0.5 for others) or Triplet loss (margin 0.5)
- Time Offset (τ): A hyperparameter empirically chosen based on timescales (e.g., adjacent frames or specific intervals like 7-91 mins).
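The NT-Xent loss listed above can be sketched in NumPy. This is a simplified stand-in for illustration, not the training implementation: each `z1[i]`/`z2[i]` pair is positive, and all other samples in the concatenated batch serve as negatives.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss sketch."""
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    n = len(z1)
    # row i's positive is its counterpart in the other view
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logp[np.arange(2 * n), targets].mean()

rng = np.random.default_rng(0)
z1 = rng.standard_normal((8, 32))
loss_pos = nt_xent(z1, z1 + 0.01 * rng.standard_normal((8, 32)))  # aligned pairs
loss_rand = nt_xent(z1, rng.standard_normal((8, 32)))             # unrelated pairs
print(loss_pos < loss_rand)  # aligned positive pairs yield a lower loss
```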
Data Sources
The datasets described in the Training Data section above were used for training and evaluation.
Performance Metrics
Metrics
The model was evaluated using task-specific and task-agnostic metrics:
- Classification F1 Score: Used to measure the accuracy of linear classifiers trained on embeddings for tasks like infection state and cell division.
- Smoothness: The ratio of the mean distance between adjacent timepoints to the mean distance between random timepoints; lower values indicate temporally regularized embeddings.
- Dynamic Range (DR): The difference between the peaks of embedding distance distributions for random vs. adjacent frame pairs.
- Alignment Cost: Evaluated via Dynamic Time Warping (DTW) to align asynchronous trajectories.
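The smoothness metric defined above is straightforward to compute from a single embedding track. The NumPy sketch below is illustrative (toy tracks, a fixed number of random pairs), not the evaluation code itself:

```python
import numpy as np

def smoothness(track):
    """Ratio of mean adjacent-timepoint distance to mean random-pair
    distance within one embedding track of shape (n_timepoints, dim).
    Lower values indicate a temporally smoother embedding."""
    adjacent = np.linalg.norm(np.diff(track, axis=0), axis=1).mean()
    rng = np.random.default_rng(0)
    i, j = rng.integers(0, len(track), size=(2, 1000))
    keep = i != j  # drop self-pairs
    random_pairs = np.linalg.norm(track[i[keep]] - track[j[keep]], axis=1).mean()
    return adjacent / random_pairs

t = np.linspace(0, 2 * np.pi, 50)
smooth_track = np.stack([np.sin(t), np.cos(t)], axis=1)           # gradual drift
rough_track = np.random.default_rng(1).standard_normal((50, 2))   # jumpy track
print(smoothness(smooth_track) < smoothness(rough_track))
```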
Evaluation Datasets
- ALFI Test Set: Trained on U2OS cells, tested on unperturbed HeLa and RPE1 cells to evaluate generalization.
- Microglia Test Set: Models trained on IL-17/IF-β conditions were tested on hold-out glioblastoma-treated conditions.
- Dengue Test Set: Models were tested on independent test data acquired with different microscopes and different time resolutions (10 min vs 30 min).
Evaluation Results
- Classification: DynaCLR models achieved F1 scores >98% for cell cycle and infection state classification, comparable to or better than baselines.
- Temporal Regularization: Time-aware sampling significantly improved smoothness (0.12 vs. 0.35 for ImageNet) and dynamic range (1.29 vs. 0.8 for ImageNet) compared to baselines on the ALFI dataset.
- Generalization: DynaCLR embeddings generalized effectively to out-of-distribution data acquired with diverse imaging systems and cell types.
- Annotation Efficiency: Knowledge distillation enabled the generation of 133,214 pseudo-labels from a teacher model trained on only 10,510 annotations, allowing a student model to achieve high accuracy from label-free images.
Biases, Risks, and Limitations
Potential Biases
- Data Bias: The models are trained on specific cell lines (HeLa, RPE1, U2OS, A549, Microglia) and may not generalize perfectly to other cell types without fine-tuning.
- Annotation Bias: Human annotation of 3D movies is expensive and prone to bias. The model may learn these biases during the supervised validation or distillation phases.
Risks
- Misinterpretation of Pseudo-time: Asynchronous cell state dynamics are aligned to a pseudo-time axis; incorrect alignment could lead to false conclusions about the sequence of morphological changes.
- Feature Attribution: While interpretable features correlate with embeddings, relying solely on attribution maps without experimental validation carries a risk of biological misinterpretation.
Limitations
- Tracking Dependencies: The method relies on single-cell tracking (Ultrack) to generate training patches. Tracking errors can affect input quality, although the method showed robustness to some errors.
- Late Infection Stages: The semantic segmentation model used for annotations struggled to capture late infection stages and cell death due to loss of fluorescence signal.
- Computational Cost: Training on large 5D datasets is computationally expensive, taking up to 48 hours on HPC clusters.
Caveats and Recommendations
- Hyperparameter Tuning: The time offset (τ) for contrastive sampling should be empirically chosen based on the time scales of the dynamic process being studied.
- Validation: Users should validate embeddings on their specific biological systems using known controls before drawing conclusions about novel phenotypes.
- We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when using the model.
- Should you have any security or privacy issues or questions related to the model, please reach out to our team at security@chanzuckerberg.com or privacy@chanzuckerberg.com, respectively.
Acknowledgements
We thank Talon Chandler and Sandra Schmid for critical feedback on the manuscript. The Chan Zuckerberg Initiative funded this research through the Biohub San Francisco.