Quickstart: TopCUP
Run TopCUP Model with Pretrained Weights
Estimated time to complete: 5 minutes
Learning Goals
By the end of this quickstart you will be able to:
- Create a copick configuration file for loading the cryoET dataset.
- Run model inference to extract particle locations via the TopCUP CLI.
Prerequisites
- python>=3.10 (at the time of publication, Colab defaults to Python 3.12)
- Standard Google Colab GPU runtime (T4 or better recommended)
Introduction
The Top CryoET U-Net Picker (TopCUP) is a 3D U-Net–based ensemble model designed for particle picking in cryo-electron tomography (cryoET) volumes. It uses a segmentation heatmap approach to identify particle locations. TopCUP is fully integrated with copick — a flexible cryoET dataset API developed at the Chan Zuckerberg Imaging Institute (CZII). This integration makes it easy to apply the model directly to any cryoET dataset in copick format.
For this tutorial, we will use 3 tomograms from the Private Test Dataset (Dataset ID: 10446). This dataset is publicly available on the CZ CryoET Data Portal and will be streamed directly using the copick and CryoET Data Portal APIs.
Users running TopCUP with their own data will need to create their own copick configuration file that defines the pickable objects and the metadata parameters described below.
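As a rough illustration only (the exact field names and schema may vary between copick versions, so consult the official copick documentation and templates), a hand-written configuration for local data could look something like:

```json
{
  "config_type": "filesystem",
  "name": "my_dataset",
  "description": "Local cryoET dataset for TopCUP picking",
  "pickable_objects": [
    {
      "name": "cytosolic-ribosome",
      "is_particle": true,
      "label": 1,
      "radius": 150,
      "metadata": {
        "score_weight": 1,
        "score_threshold": 0.19,
        "class_loss_weight": 256
      }
    }
  ],
  "overlay_root": "local:///path/to/overlay/",
  "overlay_fs_args": {"auto_mkdir": true}
}
```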
- Inputs: copick configuration file (in this quickstart, we will stream this in)
- Outputs: The model will automatically save the particle picks (locations in Angstrom) as a CSV file inside the specified output directory
Setup
The copick configuration file must define pickable objects (i.e., the protein complexes you want to detect) and three key metadata parameters for each object:
- class_loss_weight: weight for each class in the DenseCrossEntropy loss
- score_threshold: threshold used to filter final picks per class, reducing false positives
- score_weight: weight for each class in the F-beta score evaluation
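To make this concrete, here is a minimal sketch of what a single object's metadata entry pairs together (the values shown match the cytosolic-ribosome entry used later in this quickstart):

```python
# Metadata entry for one pickable object; the keys match the
# three parameters described above.
ribosome_metadata = {
    "score_weight": 1,         # per-class weight in the F-beta score evaluation
    "score_threshold": 0.19,   # filters final picks to reduce false positives
    "class_loss_weight": 256,  # per-class weight in the DenseCrossEntropy loss
}

# The thresholds in this quickstart all fall in (0, 1].
assert 0.0 < ribosome_metadata["score_threshold"] <= 1.0
print("metadata entry looks valid")
```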
You can find additional instructions and template configurations for accessing datasets across different platforms from the official copick page.
An example copick file is linked in the model's GitHub repository.
Installation
First, install the package from the Git repository, which includes the necessary dependencies.
!pip install git+https://github.com/czimaginginstitute/czii_cryoet_mlchallenge_winning_models.git
Copick Configuration File
Since this quickstart uses datasets from the CryoET Data Portal, we can automatically generate a copick configuration file from the copick API, and add metadata for each particle.
The code below adds metadata for the particles and streams in our copick file.
import os, copick

# metadata for pickable objects/particles
metadata = {
    "ferritin-complex": {
        "score_weight": 1,
        "score_threshold": 0.16,
        "class_loss_weight": 256
    },
    "thyroglobulin": {
        "score_weight": 2,
        "score_threshold": 0.18,
        "class_loss_weight": 256
    },
    "beta-galactosidase": {
        "score_weight": 2,
        "score_threshold": 0.13,
        "class_loss_weight": 256
    },
    "beta-amylase": {
        "score_weight": 0,
        "score_threshold": 0.25,
        "class_loss_weight": 256
    },
    "cytosolic-ribosome": {
        "score_weight": 1,
        "score_threshold": 0.19,
        "class_loss_weight": 256
    },
    "virus-like-capsid": {
        "score_weight": 1,
        "score_threshold": 0.5,
        "class_loss_weight": 256
    }
}

# generate the copick file for our selected protein complexes
copick_config_path = os.path.abspath('./copick_config_portal.json')
overlay_path = os.path.abspath('./tmp_overlay')
copick_root = copick.from_czcdp_datasets(
    [10446],  # ML Challenge private test dataset
    overlay_path,
    {'auto_mkdir': True},  # filesystem args for the self-defined overlay root
    output_path=copick_config_path,
)

# only consider the 6 particles
config_pickable_objects = []
for p in copick_root.config.pickable_objects:
    if p.name in metadata:
        p.metadata = metadata[p.name]
        config_pickable_objects.append(p)
copick_root.config.pickable_objects = config_pickable_objects

# save the copick config for later use
copick_root.save_config(copick_config_path)
Additional Copick Command Options
You can explore dataset-specific options such as run_names, pixelsize, and tomo_type using the copick API.
import copick

# Check available run names, show first 5 tomograms
for run in copick_root.runs[:5]:
    pss = [str(vs.voxel_size) for vs in run.voxel_spacings]
    ps = ','.join(set(pss))
    print(f"run name: {run.name}, available voxelsize/pixelsize: {ps} A")
Output:
run name: 17803, available voxelsize/pixelsize: 10.012,4.99 A
run name: 17804, available voxelsize/pixelsize: 10.012,4.99 A
run name: 17805, available voxelsize/pixelsize: 10.012,4.99 A
run name: 17806, available voxelsize/pixelsize: 10.012,4.99 A
run name: 17807, available voxelsize/pixelsize: 10.012,4.99 A
# Get a single run
run = copick_root.get_run('17803')
voxel_spacing_obj = run.get_voxel_spacing(10.012)
# Check available reconstruction_type
tts = [t.tomo_type for t in voxel_spacing_obj.tomograms]
tt = ','.join(tts)
print(f'run {run.name} has tomogram_type: {tt}')
Output:
run 17803 has tomogram_type: wbp-denoised-denoiset-ctfdeconv,wbp-filtered-ctfdeconv
Run Model Inference
To explore the available options for running the TopCUP CLI, use the --help flag: in your terminal, run topcup inference --help. This displays all command-line options and arguments for TopCUP inference, as shown below:
Usage: topcup inference [OPTIONS]
Options:
-c, --copick_config FILE copick config file path [required]
-ts, --run_names TEXT Tomogram dataset run names
-bs, --batch_size INTEGER batch size for data loader
-p, --pretrained_weights TEXT Pretrained weights file paths (use comma for
multiple paths). Default is None.
-pa, --pattern TEXT The key for pattern matching checkpoints.
Default is *.ckpt
--pixelsize FLOAT Pixelsize in angstrom. Default is 10.0A.
-tt, --tomo_type TEXT
Tomogram type. Default is denoised.
-u, --user_id TEXT Needed for training, the user_id used for
the ground truth picks.
-o, --output_dir TEXT output directory for saving prediction results
(csv).
-g, --gpus INTEGER Number of GPUs for inference. Default is 1.
-gt, --has_ground_truth BOOLEAN
Inference with ground truth annotations
-h, --help Show this message and exit.
Download checkpoints
To run inference, we need to download the checkpoints to a local directory.
import urllib.request
from pathlib import Path

TOPCUP_CHECKPOINTS_URL = [
    "https://huggingface.co/kevinzhao/TopCUP/resolve/main/topcup_weights/topcup_phantom_6_tomograms.ckpt",
    "https://huggingface.co/kevinzhao/TopCUP/resolve/main/topcup_weights/topcup_phantom_12_tomograms.ckpt",
    "https://huggingface.co/kevinzhao/TopCUP/resolve/main/topcup_weights/topcup_phantom_24_tomograms.ckpt",
]

# local directory to save the checkpoints
cache = Path("./checkpoints")
cache.mkdir(parents=True, exist_ok=True)

for url in TOPCUP_CHECKPOINTS_URL:
    filename = url.split("/")[-1]
    dest = cache / filename
    if not dest.exists():
        print(f"Downloading {filename} ...")
        try:
            urllib.request.urlretrieve(url, dest)
            print(f"→ Saved to {dest}")
        except Exception as e:
            print(f"Failed to download {url}: {e}")
    else:
        print(f"Already exists: {dest}")
Extract Protein Locations
The following code runs inference for the first 3 tomograms from the Private Testing Dataset using the TopCUP CLI.
# code for running model inference in Jupyter with live printouts.
# You can also run the commands directly in a terminal.
from topcup.cli.cli import cli

# Let's do inference for the first 3 tomograms
cli.main(
    args=[
        "inference",
        "-c", f"{copick_config_path}",
        "-ts", "17803,17804,17805",
        "-p", f"{cache}",
        "--pixelsize", "10.012",
        "-o", "output/inference",
        "-tt", "wbp-denoised-denoiset-ctfdeconv",
        "-pa", "*.ckpt",
    ],
    standalone_mode=False,  # so click doesn't exit on exceptions
)
Model Outputs
The model will automatically save the particle picks (locations in Angstrom) as a CSV file inside the specified output directory (using the -o flag).
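The exact CSV column layout depends on the TopCUP version. As an illustrative sketch, assuming columns such as run_name, particle, and x/y/z in Angstrom (the sample rows below are invented, not real model output), you could convert picks back to voxel coordinates by dividing by the voxel size:

```python
import csv
import io

# Invented sample rows standing in for a real output/inference CSV;
# the actual column names may differ between TopCUP versions.
sample = """run_name,particle,x,y,z
17803,cytosolic-ribosome,1234.5,987.6,543.2
17803,ferritin-complex,222.0,333.0,444.0
"""

voxel_size = 10.012  # Angstrom per voxel for this dataset

for row in csv.DictReader(io.StringIO(sample)):
    # convert Angstrom coordinates to voxel indices at this spacing
    vx = float(row["x"]) / voxel_size
    vy = float(row["y"]) / voxel_size
    vz = float(row["z"]) / voxel_size
    print(f"{row['run_name']} {row['particle']}: voxel ({vx:.1f}, {vy:.1f}, {vz:.1f})")
```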
Contact and Acknowledgments
For issues with this quickstart, please contact kevin.zhao@czii.org.
Special thanks to Christof Henkel for developing the segmentation models and to Utz Ermel for developing copick.
References
- Peck, A., et al. (2025). A Realistic Phantom Dataset for Benchmarking Cryo-ET Data Annotation. Nature Methods. DOI: 10.1101/2024.11.04.621686
Responsible Use
We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when engaging with our services. Should you have any security or privacy issues or questions related to the services, please reach out to our team at security@chanzuckerberg.com or privacy@chanzuckerberg.com respectively.