BPD (Biological Particle Detector)
Version v1.0 released 01 Apr 2025
License
Apache 2.0
BPD is a computer vision model designed to localize proteins in 3D cellular images obtained through cryo-electron tomography (cryoET). It is based on a U-Net architecture and was trained on experimental cryoET runs provided by the CZII CryoET Object Identification Kaggle competition. BPD is the fifth-place solution in that competition.
Developed By
Model Details
Model Architecture
BPD uses a 3D U-Net architecture with 2 downsampling and 2 upsampling levels. The three resulting resolution levels use 28, 32, and 36 feature channels, respectively. Before each resolution change, a block of two 3D convolutions (each followed by BatchNorm and ReLU) extracts features; trilinear interpolation handles both upsampling and downsampling.
Note: The final model consists of an ensemble of 4 identical 3D U-Nets (as described above) trained with different random seeds.
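The model card does not state how the four networks' outputs are combined. A minimal sketch of one common choice, averaging per-voxel class probabilities, is shown below. The array layout (classes, depth, height, width), the class count of 7 (background plus six particle types), and the function names are assumptions made for illustration only.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def ensemble_predict(logits_per_model):
    """Average class probabilities over ensemble members (assumed scheme).

    logits_per_model: list of arrays shaped (classes, depth, height, width),
    one per trained network. Returns (mean_probs, labels).
    """
    probs = np.stack([softmax(l, axis=0) for l in logits_per_model])
    mean_probs = probs.mean(axis=0)
    return mean_probs, mean_probs.argmax(axis=0)

rng = np.random.default_rng(0)
# Random stand-ins for the outputs of 4 networks on one small patch.
fake_logits = [rng.normal(size=(7, 8, 8, 8)) for _ in range(4)]
mean_probs, labels = ensemble_predict(fake_logits)
```

Averaging probabilities rather than hard labels keeps each network's confidence in the vote, which tends to smooth out seed-dependent disagreements.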
Parameters
350K x 4
Citation
Peck, A., et al., (2025) A Realistic Phantom Dataset for Benchmarking Cryo-ET Data Annotation. Nature Methods. DOI: 10.1101/2024.11.04.621686
Primary Contact Email
Youssef Ouertani ouertaniyoussef@yahoo.fr
To submit feature requests or report issues with the model, please open an issue on the GitHub repository.
System Requirements
The algorithm needs an Nvidia GPU with CUDA to run at reasonable speed, particularly for training. The model was trained on an Nvidia P100 GPU. On other GPUs, some parameter values (e.g., patch and batch sizes) may need to be adjusted to fit the available memory.
Intended Use
Primary Use Cases
Localization of protein complexes within tomograms
Out-of-Scope or Unauthorized Use Cases
Do not use the model for the following purposes:
- Use that violates applicable laws or regulations (including trade compliance laws), or third party rights such as privacy or intellectual property rights.
- Any use that is prohibited by the Apache-2.0 license.
- Any use that is prohibited by the Acceptable Use Policy.
Training Details
Training Data
The training data was provided by the Chan Zuckerberg Imaging Institute (CZII) and included seven experimental runs with ground truth annotations for six protein complexes (apo-ferritin, beta-amylase, beta-galactosidase, cytosolic ribosomes, thyroglobulin and virus like particle).
Training Procedure
The model was trained on 3D tomogram volumes with spherical labels (radius = log2(given_radius)*0.8), normalized using min-max scaling based on averaged (5, 99) percentiles across all 7 tomograms. Each epoch consisted of 1024 randomly sampled 128×128×128 patches (batches of 4) with data augmentation including flipping, z-axis rotations (90°/180°/270°), and ±3% intensity shifts. Training ran for 35 epochs (4 hours total) using Adam (lr=0.0001, β₁=0.9, β₂=0.999) with fp16 mixed precision, gradient clipping, and label-smoothed cross-entropy (smoothing=0.01).
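The preprocessing and augmentation steps described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the training code: the function names are hypothetical, arrays are assumed to be ordered (z, y, x), values are clipped to [0, 1] after scaling, and the ±3% intensity shift is treated as an additive offset on the normalized scale.

```python
import math
import numpy as np

def label_radius(given_radius):
    """Sphere radius for labels, per the stated formula log2(r) * 0.8."""
    return math.log2(given_radius) * 0.8

def averaged_percentiles(volumes, lo=5, hi=99):
    """Average the (lo, hi) percentiles over all tomograms."""
    lows = [np.percentile(v, lo) for v in volumes]
    highs = [np.percentile(v, hi) for v in volumes]
    return float(np.mean(lows)), float(np.mean(highs))

def normalize(volume, p_lo, p_hi):
    """Min-max scale with shared bounds; clipping is an assumption."""
    scaled = (volume - p_lo) / (p_hi - p_lo)
    return np.clip(scaled, 0.0, 1.0)

def augment(patch, label, rng):
    """Random flips, z-axis rotations in 90-degree steps, intensity shift."""
    for axis in range(3):
        if rng.random() < 0.5:
            patch, label = np.flip(patch, axis), np.flip(label, axis)
    k = int(rng.integers(0, 4))  # 0, 90, 180, or 270 degrees
    # Rotating about z means rotating within the (y, x) plane.
    patch = np.rot90(patch, k, axes=(1, 2))
    label = np.rot90(label, k, axes=(1, 2))
    shift = rng.uniform(-0.03, 0.03)  # assumed additive, on the [0, 1] scale
    return patch + shift, label

rng = np.random.default_rng(0)
# Random stand-ins for the 7 training tomograms.
volumes = [rng.normal(size=(16, 32, 32)) for _ in range(7)]
p_lo, p_hi = averaged_percentiles(volumes)
norm = normalize(volumes[0], p_lo, p_hi)
empty_label = np.zeros((16, 16, 16), dtype=np.int64)
patch, label = augment(norm[:, :16, :16], empty_label, rng)
```

Applying the same flips and rotations to the patch and its label keeps the two aligned; only the intensity shift is applied to the image alone.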
Training Code
Kaggle Competition Notebook
Data Sources
Training data is available through the CZII - CryoET Object Identification Challenge deposition site.
Performance Metrics
Metrics
The model was evaluated with the F-beta metric at beta = 4, which prioritizes recall over precision: missed particles are penalized heavily, while false positives are treated more leniently. A predicted particle counts as a true positive if it lies within 0.5 times the radius of the particle of interest from a ground-truth location. Five particle types are scored: three "easy" particles (ribosome, virus-like particles, and apo-ferritin) with a weight of 1 and two "hard" particles (thyroglobulin and β-galactosidase) with a weight of 2. Results are micro-averaged across tomograms, so precision and recall are computed over the entire dataset before the F-beta formula is applied. Together, the high beta value and the particle weights make recall, particularly on the "hard" particles, the dominant factor in the score.
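A minimal sketch of this scoring scheme is shown below. It assumes the per-particle F-beta scores are combined as a weighted average using the weights above, and that true-positive, false-positive, and false-negative counts have already been summed over all tomograms. The function names and dictionary keys are hypothetical, and this is not the competition's official scoring code.

```python
import math

def is_match(pred_xyz, true_xyz, radius, factor=0.5):
    """True if a prediction lies within factor * radius of a ground-truth point."""
    return math.dist(pred_xyz, true_xyz) <= factor * radius

def f_beta(tp, fp, fn, beta=4.0):
    """F-beta from pooled counts; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    return (1 + beta**2) * precision * recall / denom if denom else 0.0

# Weights from the metric description: "easy" = 1, "hard" = 2.
WEIGHTS = {
    "apo-ferritin": 1,
    "ribosome": 1,
    "virus-like-particle": 1,
    "thyroglobulin": 2,
    "beta-galactosidase": 2,
}

def weighted_score(counts, weights=WEIGHTS, beta=4.0):
    """counts maps particle name -> (tp, fp, fn) pooled over all tomograms."""
    total = sum(weights.values())
    return sum(
        w * f_beta(*counts[name], beta=beta) for name, w in weights.items()
    ) / total

# Made-up counts, for illustration only.
counts = {name: (8, 2, 2) for name in WEIGHTS}
score = weighted_score(counts)
```

With beta = 4, recall is weighted 16 times as heavily as precision in the denominator, so a prediction set with recall 0.9 and precision 0.5 scores much higher than one with recall 0.5 and precision 0.9.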
Evaluation Datasets
The evaluation datasets included public and private test datasets found in the CryoET Data Portal deposition site for the CZII CryoET Object Identification Kaggle competition. The public and private test datasets contain 121 and 364 experimental runs, respectively.
Evaluation Results
| Public Score | Private Score |
|---|---|
| 0.77982 | 0.78252 |
Biases, Risks, and Limitations
Potential Biases
- The model was trained on five particle types and won’t work with particles not present in the training data.
Risks
Areas of risk may include but are not limited to:
- Inaccurate outputs or hallucinations
- Incorrect predictions
Limitations
- The model's performance may be limited by the size of the training set.
Caveats and Recommendations
- Review and validate outputs generated by the model.
- We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when using the model.
Acknowledgements
This research is supported by the Chan Zuckerberg Imaging Institute.
If you have recommendations for this model card, please contact virtualcellmodels@chanzuckerberg.com.