SABER
Version v1.0.0 released 14 Nov 2025
License
MIT
Repository
https://github.com/chanzuckerberg/saber

The Segment Anything Based Expert Recognition (SABER) model is trained to identify vesicles in cryo-electron tomography (cryoET) and electron microscopy (EM) datasets. The model translates SAM2's video-based segmentation into effective 3D tomogram analysis, allowing users to run zero-shot inference or improve prediction accuracy through data-driven training.
Developed By
Chan Zuckerberg Imaging Institute (CZII)
Model Details
Model Architecture
This model is based on SABER (Segment Anything Based Expert Recognition), a framework that adapts Meta's Segment Anything Model 2 (SAM2) for cryo-electron tomography segmentation tasks. The architecture leverages SAM2's transformer-based image encoder with position encodings, a prompt encoder, and a lightweight mask decoder, fine-tuned specifically for vesicle segmentation in cryoET data.
Parameters
SAM2-Large base model (~300M parameters) with 1,027,529 trainable parameters fine-tuned for vesicle segmentation
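The scale of the fine-tune relative to the backbone can be checked with quick arithmetic (the ~300M figure for SAM2-Large is the approximate count quoted above, not an exact value):

```python
# Parameter budget from the model card: a ~300M-parameter SAM2-Large backbone
# with ~1.03M trainable parameters fine-tuned for vesicle segmentation.
base_params = 300_000_000       # approximate SAM2-Large size
trainable_params = 1_027_529    # trainable parameters reported above

frozen_params = base_params - trainable_params
trainable_fraction = trainable_params / base_params

print(f"trainable fraction: {trainable_fraction:.4%}")  # roughly 0.34% of the backbone
```

Fine-tuning only a small fraction of the weights keeps the checkpoint small (see "Speeds, Sizes, Times" below) while reusing SAM2's pretrained representations.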
Model Card Authors
Jonathan Schwartz (CZII)
Primary Contact Email
Jonathan Schwartz jonathan.schwartz@czii.org
To submit feature requests or report issues with the model, please open an issue on the GitHub repository.
System Requirements
Requires CUDA-capable GPU such as T4 or better.
Intended Use
Primary Use Cases
This model is designed for the following cryoET applications:
- Vesicle segmentation in cryoET tomograms and 2D micrographs.
- 2D slice annotation for rapid screening and quality control.
- 3D volume segmentation for quantitative analysis of vesicle populations.
- High-throughput organelle detection in cellular cryoET data.
The model is optimized for cryoET data from cellular samples and can be integrated into analysis pipelines using the copick data format.
Out-of-Scope or Unauthorized Use Cases
Do not use the model for the following purposes:
- Use that violates applicable laws, regulations (including trade compliance laws), or third party rights such as privacy or intellectual property rights.
- Any use that is prohibited by the MIT license.
- Any use that is prohibited by the Acceptable Use Policy.
Training Data
This model was trained on the Affinity-Captured Endo-/Lysosomes dataset (ID: DS-10444) from the CryoET Data Portal. The full dataset contains 327 annotations and 362 tomograms of affinity-captured LAMP1-GFP positive endosome and lysosome organelles from HEK293T cells.
Out of the full dataset, 40 tomograms were used for training and 10 were held out for validation to train a classifier that recognizes vesicles.
Training Procedure
The model is trained on volumetric patches extracted from tomograms and centered on annotated particles. Data augmentation includes random 3D rotations, intensity scaling, and additive Gaussian noise to improve generalization. Validation is performed using sliding window inference to handle full-resolution volumes. Training employs Exponential Moving Average (EMA) with a cosine annealing learning rate scheduler. The model is optimized to maximize class-averaged F-beta scores tracked throughout training. Early stopping monitors both training loss stability and validation metrics, with the best-performing model checkpoint saved based on the target metric.
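The EMA and cosine-annealing components described above can be sketched in plain Python. This is an illustrative sketch, not SABER's implementation: scalar lists stand in for weight tensors, and the decay value is a common default rather than a value stated in this card.

```python
import math

def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate: decays from lr_max at step 0 to lr_min at total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

def ema_update(ema_weights, model_weights, decay: float = 0.999):
    """One Exponential Moving Average step over model weights (lists of floats here)."""
    return [decay * e + (1 - decay) * w for e, w in zip(ema_weights, model_weights)]

print(cosine_annealing_lr(0, 100, 1e-3))    # starts at lr_max (0.001)
print(cosine_annealing_lr(100, 100, 1e-3))  # decays to lr_min (0.0)
```

At evaluation time the EMA weights, rather than the raw model weights, are typically what gets validated and checkpointed, which smooths out step-to-step noise in the tracked F-beta scores.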
Training Code
Training scripts available at: https://github.com/chanzuckerberg/saber
Speeds, Sizes, Times
- Model Checkpoint Size: 4.1 MB
- Inference Speed: 500 tomograms per hour.
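The quoted throughput converts to per-tomogram latency as follows:

```python
# 500 tomograms per hour, as stated above, expressed per tomogram.
tomograms_per_hour = 500
seconds_per_tomogram = 3600 / tomograms_per_hour
print(seconds_per_tomogram)  # 7.2 seconds per tomogram
```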
Training Hyperparameters
Optimization:
- Optimizer: AdamW
- Learning rate scheduler: CosineAnnealingLR
- Loss function: FocalLoss
- Batch size: 32
- Number of epochs: 100
- Model selection metric: F1 score (best validation F1)
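The FocalLoss listed above can be written out for a single binary prediction. The alpha and gamma values here are common defaults and an assumption on our part; the card does not state the values used.

```python
import math

def focal_loss(p: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one prediction.

    p is the predicted probability of the positive class; target is 0 or 1.
    The (1 - p_t)^gamma factor down-weights easy, well-classified examples,
    which helps when vesicle pixels are rare relative to background.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less loss than an uncertain one.
print(focal_loss(0.9, 1))  # small loss
print(focal_loss(0.6, 1))  # larger loss
```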
Data Sources
The following datasets were used for training and evaluation:
- Affinity-Captured Endo-/Lysosomes dataset (ID: DS-10444), CryoET Data Portal: 40 tomograms for training, 10 held out for validation.
Performance Metrics
Metrics
The model was evaluated using standard segmentation metrics appropriate for 3D biological image analysis, including the F1 score. The best model checkpoint was selected based on the highest F1 score achieved during validation across 100 training epochs.
- F1 Score (Primary metric): Harmonic mean of precision and recall, used for model selection during training.
Evaluation Datasets
Validation set: 10 tomograms held out from the Affinity-Captured Endo-/Lysosomes dataset (ID: DS-10444).
Evaluation Results
| Metric | Training set | Validation set |
|---|---|---|
| F1 Score | 0.83 | 0.87 |
| Precision | 0.84 | 0.86 |
| Recall | 0.82 | 0.88 |
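The F1 rows in the table above are consistent with the precision and recall rows, since F1 is their harmonic mean:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.84, 0.82), 2))  # 0.83 (training set)
print(round(f1(0.86, 0.88), 2))  # 0.87 (validation set)
```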
Biases, Risks, and Limitations
Potential Biases
The model may reflect biases present in the training data, including:
- Vesicle appearance and size distributions specific to the cell types in the Affinity-Captured Endo-/Lysosomes dataset.
- Imaging parameters (defocus, dose, tilt angles) from the training dataset.
- Certain vesicle morphologies (e.g., highly elongated or irregular) that may be underrepresented in the training data.
Risks
Areas of risk may include but are not limited to:
- Inaccurate segmentations: Model may produce false positives or miss vesicles, particularly in challenging imaging conditions (high noise, crowding, atypical morphology).
- Misinterpretation of biological structures: Non-vesicular structures with similar appearance may be incorrectly segmented as vesicles.
- Domain shift: Performance degradation on data from different microscopes, sample types, or imaging protocols.
Limitations
- The model is trained for binary vesicle segmentation only and has not been validated for other organelles or multi-class segmentation without retraining.
- Performance depends on tomogram quality (resolution, contrast, noise level).
Caveats and Recommendations
- Review and validate outputs generated by the model.
- We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when using the model.
- Should you have any security or privacy issues or questions related to the services, please reach out to our team at security@chanzuckerberg.com or privacy@chanzuckerberg.com respectively.
Acknowledgements
This work was conducted at the Chan Zuckerberg Initiative. We thank the CryoET Data Portal team for providing high-quality training data.