DynaCLR

Version: v0.1.0, released 01 Jul 2025
License: BSD-3-Clause
Repository: https://github.com/mehta-lab/viscy

DynaCLR is a self-supervised method for embedding cell and organelle dynamics via Contrastive Learning of Representations (CLR) of time-lapse images. It supports diverse downstream biological tasks, such as cell state classification (with minimal human annotation), knowledge transfer between fluorescent and label-free imaging channels, and alignment of cell state dynamics.
Developed By
Model Details
Demo
Check out our Hugging Face demo showing the embeddings and learned representations of dynamic cell states.
Model Architecture
The DynaCLR model architecture consists of three main components designed to map 3D multi-channel patches of single cells to a temporally regularized embedding space:
- Spatial Projection Stem: A convolution layer whose kernel size depends on whether the dataset is 3D or 2D, followed by a reshaping operation that maps the down-sampled axial dimension to channels. This efficiently projects anisotropic 3D input into a 2D feature map.
- Encoder Backbone: Adapted from the ConvNeXt Tiny architecture. The original stem and head modules are removed, and the backbone outputs a 768-dimensional embedding vector.
- MLP Head: A 2-layer Multi-Layer Perceptron (MLP) head projects the 768-dimensional vector onto a lower 32-dimensional vector to speed up training.
The model accepts 3D multi-channel patches of single cells.
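The key architectural idea in the stem, folding the down-sampled axial (Z) dimension into the channel dimension to obtain a 2D feature map, can be illustrated with a minimal NumPy sketch. This is a toy stand-in, not the VisCy implementation: average pooling stands in for the learned convolution weights, and the patch shape and kernel sizes are illustrative.

```python
import numpy as np

def projection_stem(x, kz=5, kxy=4):
    """Toy spatial projection stem: strided average pooling (a stand-in
    for a learned strided convolution), then fold the down-sampled axial
    dimension into channels to yield a 2D feature map."""
    c, z, y, w = x.shape  # (channels, Z, Y, X) single-cell patch
    zo, yo, xo = z // kz, y // kxy, w // kxy
    x = x[:, : zo * kz, : yo * kxy, : xo * kxy]
    # block-average over (kz, kxy, kxy) windows
    x = x.reshape(c, zo, kz, yo, kxy, xo, kxy).mean(axis=(2, 4, 6))
    # fold the remaining axial slices into the channel dimension
    return x.reshape(c * zo, yo, xo)

patch = np.random.rand(2, 15, 64, 64)  # hypothetical 2-channel 3D patch
feat = projection_stem(patch)
print(feat.shape)  # (6, 16, 16): 2 channels x 3 axial slices, 2D map
```

The resulting 2D feature map can then be consumed by a standard 2D backbone such as ConvNeXt Tiny.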
Parameters
The exact parameter count is defined by the modified ConvNeXt Tiny backbone. Computational complexity is estimated as follows:
- Forward Pass Cost: Approximately 754 GFLOPs per forward pass on a single input patch.
- Training Step Cost: Approximately 2.26 TFLOPs per step.
- Total Training Cost: Approximately 0.5-1 PFLOPs for 100K iterations.
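The per-step estimate follows from the forward-pass cost under the common heuristic that the backward pass costs roughly twice the forward pass, so a training step is about three times a forward pass:

```python
forward_gflops = 754  # stated forward-pass cost (GFLOPs)
# Assumption: backward pass ~= 2x forward, so one training step ~= 3x forward.
step_tflops = 3 * forward_gflops / 1000
print(round(step_tflops, 2))  # ~2.26 TFLOPs, matching the stated step cost
```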
Model Card Authors
Eduardo Hirata-Miyasaki (Biohub)
Citation
Eduardo Hirata-Miyasaki, Soorya Pradeep, Ziwen Liu, Alishba Imran, Taylla Milena Theodoro, Ivan E. Ivanov, Sudip Khadka, See-Chi Lee, Michelle Grunberg, Hunter Woosley, Madhura Bhave, Carolina Arias, Shalin B. Mehta. "DynaCLR: Contrastive Learning of Cellular Dynamics with Temporal Regularization." arXiv:2410.11281v2 [cs.CV], 2025.
Primary Contact Email
shalin.mehta@czbiohub.org

To submit feature requests or report issues with the model, please open an issue on the GitHub repository.
System Requirements
- GPU-accelerated workstation or cloud GPU instance.
Model Variants
| Model Variant | Description | URL |
|---|---|---|
| DynaCLR-ALFI | Trained on U2OS cells from ALFI dataset | https://github.com/mehta-lab/viscy |
| DynaCLR-microglia | Trained on Microglia (IL-17, IF-beta) | https://github.com/mehta-lab/viscy |
| DynaCLR-DENV-VS+Ph | Trained on Phase + Viral Sensor | https://github.com/mehta-lab/viscy |
| DynaCLR-DENV-ER+Ph | Trained on Phase + SEC61 | https://github.com/mehta-lab/viscy |
| DynaCLR-DENV-ER | Trained on SEC61 only | https://github.com/mehta-lab/viscy |
| DynaCLR-Teacher-VS | Teacher model trained on Viral Sensor | https://github.com/mehta-lab/viscy |
| DynaCLR-Student-Ph | Student model trained on Phase | https://github.com/mehta-lab/viscy |
Intended Use
Primary Use Cases
The primary use cases for DynaCLR embeddings include:
- Cell State Classification: Robust classification of dynamic states such as cell division and viral infection using sparse annotations.
- Organelle Remodeling Analysis: Discovery of organelle responses (e.g., ER condensation) due to perturbations like infection.
- Trajectory Alignment: Alignment of asynchronous cellular responses and broken cell tracks using Dynamic Time Warping (DTW) on embeddings.
- Cross-Modal Knowledge Distillation: Distilling cell states from fluorescence channels to label-free channels to enable label-free prediction.
- Clustering: Clustering heterogeneous cell migration patterns and morphotypes.
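The trajectory-alignment use case above relies on Dynamic Time Warping over per-cell embedding trajectories. A minimal NumPy sketch of the DTW alignment cost (a textbook implementation on toy trajectories, not the pipeline's actual alignment code) looks like this:

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic Time Warping alignment cost between two embedding
    trajectories a (n, d) and b (m, d), using Euclidean frame distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy 2D embedding trajectories: identical dynamics, one time-shifted
t = np.linspace(0, 1, 20)
traj_a = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
traj_b = np.roll(traj_a, 2, axis=0)
print(dtw_cost(traj_a, traj_a), dtw_cost(traj_a, traj_b))
```

A low alignment cost between two tracks indicates that they traverse similar cell-state sequences, even when the responses are asynchronous.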
Out-of-Scope or Unauthorized Use Cases
Do not use the model for the following purposes:
- Clinical Diagnosis: This model is a research tool for basic biology and therapeutic discovery and is not intended for direct clinical diagnostic use without further extensive validation.
- Unethical Surveillance: Any use of the model for bio-surveillance or purposes that violate ethical guidelines regarding biological data.
- Use that violates applicable laws, regulations (including trade compliance laws), or third-party rights such as privacy or intellectual property rights
- Any use that is prohibited by the BSD 3-Clause license or Acceptable Use Policy.
Training Data
The models were trained on five distinct time-lapse datasets:
- ALFI: 2D label-free DIC movies of 3 cell types (HeLa, RPE1, U2OS) with 7-minute resolution.
- Microglia: 3D label-free movies of pharmacologically perturbed human microglia (IL-17, IF-β).
- Dengue Infection (A549): 5D datasets (3D volumes + time + channels) of A549 cells infected with Dengue virus (MOI 0 and 5).
- Infection/Cycle (High Res): 5D datasets representing infection and cell cycle dynamics.
- SEC61 Organelle: Two 5D datasets encoding infection and SEC61 (ER marker) at 10-minute and 30-minute resolutions.
Training Procedure
- Pre-processing: Data was converted to OME-Zarr format. Phase images were reconstructed and normalized per field-of-view; fluorescence images were scaled to specific percentiles.
- Tracking: Single-cell tracks were generated using Ultrack.
- Sampling Strategy: The model uses time-aware contrastive sampling, where positive pairs are images of the same cell at times t and t + τ, and negative pairs are sampled from different cells. This imposes a temporal regularization prior.
- Augmentations: Extensive augmentations were applied, including random spatial scaling, rotation, shearing, contrast adjustment, intensity scaling, and Gaussian noise/smoothing.
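The time-aware sampling strategy described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the VisCy data loader: track identifiers, the `tracks` structure, and the triplet interface are hypothetical.

```python
import random

def sample_triplet(tracks, tau=1, rng=random):
    """Time-aware contrastive sampling sketch: the positive is the same
    cell tau frames later; the negative is a patch from a different cell.
    `tracks` maps a track id to its time-ordered list of patch ids."""
    anchor_id = rng.choice(sorted(tracks))
    track = tracks[anchor_id]
    t = rng.randrange(len(track) - tau)  # leave room for the offset
    anchor, positive = track[t], track[t + tau]
    neg_id = rng.choice(sorted(set(tracks) - {anchor_id}))
    negative = rng.choice(tracks[neg_id])
    return anchor, positive, negative

# toy tracks: three cells, ten timepoints each; patch id = (cell, time)
tracks = {c: [(c, t) for t in range(10)] for c in ("cell_a", "cell_b", "cell_c")}
a, p, n = sample_triplet(tracks, tau=2)
print(a[0] == p[0], p[1] - a[1], n[0] != a[0])  # same cell, offset tau, other cell
```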
Training Code
The model architecture, training, and prediction code is available at: https://github.com/mehta-lab/viscy.
Speeds, Sizes, Times
- Hardware: Training was performed on an HPC cluster using 2-4 GPUs with distributed data parallel (DDP) strategy.
- Training Time: Varied from ~1 hour (ALFI dataset) to ~48 hours (infection and organelle remodeling models).
- Throughput: Each training step is estimated at 2.26 TFLOPs.
Training Hyperparameters
- Optimizer: AdamW
- Learning Rate:
- Batch Size: 128
- Loss Functions: NT-Xent loss (temperature 0.3 for ALFI, 0.5 for others) or Triplet loss (margin 0.5)
- Time Offset (τ): A hyperparameter empirically chosen based on timescales (e.g., adjacent frames or specific intervals like 7-91 mins).
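The NT-Xent loss listed above can be sketched in NumPy. This is a simplified stand-in for illustration, not the training implementation: each `z1[i]`/`z2[i]` pair is positive, and all other samples in the concatenated batch serve as negatives.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss sketch."""
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    n = len(z1)
    # row i's positive is its counterpart in the other view
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logp[np.arange(2 * n), targets].mean()

rng = np.random.default_rng(0)
z1 = rng.standard_normal((8, 32))
loss_pos = nt_xent(z1, z1 + 0.01 * rng.standard_normal((8, 32)))  # aligned pairs
loss_rand = nt_xent(z1, rng.standard_normal((8, 32)))             # unrelated pairs
print(loss_pos < loss_rand)  # aligned positive pairs yield a lower loss
```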
Data Sources
The datasets described in the Training Data section above were used for training and evaluation.
Performance Metrics
Metrics
The model was evaluated using task-specific and task-agnostic metrics:
- Classification F1 Score: Used to measure the accuracy of linear classifiers trained on embeddings for tasks like infection state and cell division.
- Smoothness: The ratio of the mean distance between adjacent timepoints to the mean distance between random timepoints; lower values indicate temporally regularized embeddings.
- Dynamic Range (DR): The difference between the peaks of embedding distance distributions for random vs. adjacent frame pairs.
- Alignment Cost: Evaluated via Dynamic Time Warping (DTW) to align asynchronous trajectories.
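The smoothness metric defined above is straightforward to compute from a single embedding track. The NumPy sketch below is illustrative (toy tracks, a fixed number of random pairs), not the evaluation code itself:

```python
import numpy as np

def smoothness(track):
    """Ratio of mean adjacent-timepoint distance to mean random-pair
    distance within one embedding track of shape (n_timepoints, dim).
    Lower values indicate a temporally smoother embedding."""
    adjacent = np.linalg.norm(np.diff(track, axis=0), axis=1).mean()
    rng = np.random.default_rng(0)
    i, j = rng.integers(0, len(track), size=(2, 1000))
    keep = i != j  # drop self-pairs
    random_pairs = np.linalg.norm(track[i[keep]] - track[j[keep]], axis=1).mean()
    return adjacent / random_pairs

t = np.linspace(0, 2 * np.pi, 50)
smooth_track = np.stack([np.sin(t), np.cos(t)], axis=1)           # gradual drift
rough_track = np.random.default_rng(1).standard_normal((50, 2))   # jumpy track
print(smoothness(smooth_track) < smoothness(rough_track))
```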
Evaluation Datasets
- ALFI Test Set: Trained on U2OS cells, tested on unperturbed HeLa and RPE1 cells to evaluate generalization.
- Microglia Test Set: Models trained on IL-17/IF-β conditions were tested on hold-out glioblastoma-treated conditions.
- Dengue Test Set: Models were tested on independent test data acquired with different microscopes and different time resolutions (10 min vs 30 min).
Evaluation Results
- Classification: DynaCLR models achieved F1 scores >98% for cell cycle and infection state classification, comparable to or better than baselines.
- Temporal Regularization: Time-aware sampling significantly improved smoothness (0.12 vs. 0.35 for ImageNet) and dynamic range (1.29 vs. 0.8 for ImageNet) compared to baselines on the ALFI dataset.
- Generalization: DynaCLR embeddings generalized effectively to out-of-distribution data acquired with diverse imaging systems and cell types.
- Annotation Efficiency: Knowledge distillation enabled the generation of 133,214 pseudo-labels from a teacher model trained on only 10,510 annotations, allowing a student model to achieve high accuracy from label-free images.
Biases, Risks, and Limitations
Potential Biases
- Data Bias: The models are trained on specific cell lines (HeLa, RPE1, U2OS, A549, Microglia) and may not generalize perfectly to other cell types without fine-tuning.
- Annotation Bias: Human annotation of 3D movies is expensive and prone to bias. The model may learn these biases during the supervised validation or distillation phases.
Risks
- Misinterpretation of Pseudo-time: Asynchronous cell state dynamics are aligned to a pseudo-time axis; incorrect alignment could lead to false conclusions about the sequence of morphological changes.
- Feature Attribution: While interpretable features correlate with embeddings, relying solely on attribution maps without experimental validation carries a risk of biological misinterpretation.
Limitations
- Tracking Dependencies: The method relies on single-cell tracking (Ultrack) to generate training patches. Tracking errors can affect input quality, although the method showed robustness to some errors.
- Late Infection Stages: The semantic segmentation model used for annotations struggled to capture late infection stages and cell death due to loss of fluorescence signal.
- Computational Cost: Training on large 5D datasets is computationally expensive, taking up to 48 hours on HPC clusters.
Caveats and Recommendations
- Hyperparameter Tuning: The time offset (τ) for contrastive sampling should be empirically chosen based on the time scales of the dynamic process being studied.
- Validation: Users should validate embeddings on their specific biological systems using known controls before drawing conclusions about novel phenotypes.
- We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when using the model.
- Should you have any security or privacy issues or questions related to the model, please reach out to our team at security@chanzuckerberg.com or privacy@chanzuckerberg.com, respectively.
Acknowledgements
We thank Talon Chandler and Sandra Schmid for critical feedback on the manuscript. The Chan Zuckerberg Initiative funded this research through the Biohub San Francisco.