Try Models

Quickstart: TopCUP

Run TopCUP Model with Pretrained Weights

Estimated time to complete: 5 minutes

Learning Goals

By the end of this quickstart you will be able to:

  • Create a copick configuration file for loading the cryoET dataset.
  • Run model inference to extract particle locations via the TopCUP CLI.

Prerequisites

  • python>=3.10 (At the time of publication, Colab defaults to Python 3.12)
  • Standard Google Colab GPU runtime (T4 or better recommended)

Introduction

The Top CryoET U-Net Picker (TopCUP) is a 3D U-Net–based ensemble model designed for particle picking in cryo-electron tomography (cryoET) volumes. It uses a segmentation heatmap approach to identify particle locations. TopCUP is fully integrated with copick — a flexible cryoET dataset API developed at the Chan Zuckerberg Imaging Institute (CZII). This integration makes it easy to apply the model directly to any cryoET dataset in copick format.

For this tutorial, we will use 3 tomograms from the Private Test Dataset (Dataset ID: 10446). This dataset is publicly available on the CZ CryoET Data Portal and will be streamed directly using the copick and CryoET Data Portal APIs.

Users running TopCUP with their own data will need to create their own copick configuration file with defined pickable objects and metadata parameters described below.

  • Inputs: copick configuration file (in this quickstart, we will stream this in)
  • Outputs: The model will automatically save the particle picks (locations in Angstrom) as a CSV file inside the specified output directory

Setup

The copick configuration file must define pickable objects (i.e., the protein complexes you want to detect) and three key metadata parameters for each object:

  • score_weight: weight for each class in the F-beta score evaluation
  • score_threshold: per-class threshold used to filter final picks, reducing false positives
  • class_loss_weight: weight for each class in the DenseCrossEntropy loss
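
As an illustration, a single pickable-object entry carrying these metadata fields might look like the sketch below. The name and metadata values are taken from this tutorial's dataset; the other fields shown (such as is_particle and label) are illustrative only, so consult the copick documentation for the authoritative schema.

```json
{
  "name": "cytosolic-ribosome",
  "is_particle": true,
  "label": 1,
  "metadata": {
    "score_weight": 1,
    "score_threshold": 0.19,
    "class_loss_weight": 256
  }
}
```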

You can find additional instructions and template configurations for accessing datasets across different platforms from the official copick page.

An example copick configuration file is available in the model's GitHub repository.

Installation

First, install the TopCUP package and its dependencies directly from the GitHub repository:

!pip install git+https://github.com/czimaginginstitute/czii_cryoet_mlchallenge_winning_models.git

Copick Configuration File

Since this quickstart uses datasets from the CryoET Data Portal, we can automatically generate a copick configuration file with the copick API and add metadata for each particle.

The code below adds metadata for the particles and streams in our copick file.

import os

import copick

#metadata for pickable objects/particles
metadata = {
    "ferritin-complex": {
        "score_weight": 1,
        "score_threshold": 0.16,
        "class_loss_weight": 256
    },
    "thyroglobulin": {
        "score_weight": 2,
        "score_threshold": 0.18,
        "class_loss_weight": 256
    },
    "beta-galactosidase": {
        "score_weight": 2,
        "score_threshold": 0.13,
        "class_loss_weight": 256
    },
    "beta-amylase": {
        "score_weight": 0,
        "score_threshold": 0.25,
        "class_loss_weight": 256
    },
    "cytosolic-ribosome": {
        "score_weight": 1,
        "score_threshold": 0.19,
        "class_loss_weight": 256
    },
    "virus-like-capsid": {
        "score_weight": 1,
        "score_threshold": 0.5,
        "class_loss_weight": 256
    }
}

# generate the copick config file for our selected protein complexes
copick_config_path = os.path.abspath('./copick_config_portal.json')
overlay_path = os.path.abspath('./tmp_overlay')  # overlay root, self-defined
copick_root = copick.from_czcdp_datasets(
    [10446],  # ML Challenge private test dataset
    overlay_path,
    {'auto_mkdir': True},  # filesystem arguments for the overlay
    output_path=copick_config_path,
)

# only consider the 6 particles
config_pickable_objects = []
for p in copick_root.config.pickable_objects:
    if p.name in metadata:
        p.metadata = metadata[p.name]
        config_pickable_objects.append(p)

copick_root.config.pickable_objects = config_pickable_objects
#save the copick config for later use
copick_root.save_config(copick_config_path)
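
To sanity-check the saved configuration, you can read the JSON back with the standard library. This is a minimal sketch that assumes the standard copick config layout with a top-level "pickable_objects" list; the helper name is ours, not part of copick.

```python
import json

def list_pickable_objects(config_path):
    """Return (name, score_threshold) for each pickable object in a copick config."""
    with open(config_path) as f:
        cfg = json.load(f)
    return [
        (obj["name"], obj.get("metadata", {}).get("score_threshold"))
        for obj in cfg.get("pickable_objects", [])
    ]

# e.g. list_pickable_objects(copick_config_path)
```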

Additional Copick Command Options

You can explore dataset-specific options such as run_names, pixelsize, and tomo_type using the copick API.

import copick

# Check available run names and voxel spacings for the first 5 runs
for run in copick_root.runs[:5]:
    pss = [str(vs.voxel_size) for vs in run.voxel_spacings]
    ps = ','.join(set(pss))
    print(f"run name: {run.name}, available voxelsize/pixelsize: {ps} A")
Output:

    run name: 17803, available voxelsize/pixelsize: 10.012,4.99 A
    run name: 17804, available voxelsize/pixelsize: 10.012,4.99 A
    run name: 17805, available voxelsize/pixelsize: 10.012,4.99 A
    run name: 17806, available voxelsize/pixelsize: 10.012,4.99 A
    run name: 17807, available voxelsize/pixelsize: 10.012,4.99 A

# Get a single run
run = copick_root.get_run('17803')
voxel_spacing_obj = run.get_voxel_spacing(10.012)

# Check available reconstruction_type
tts = [t.tomo_type for t in voxel_spacing_obj.tomograms]
tt = ','.join(tts)
print(f'run {run.name} has tomogram_type: {tt}')
Output:
    run 17803 has tomogram_type: wbp-denoised-denoiset-ctfdeconv,wbp-filtered-ctfdeconv

Run Model Inference

To explore the available options for the TopCUP CLI, use the --help flag. In your terminal, run topcup inference --help. This will display all command-line options and arguments for running TopCUP inference, as shown below:

Usage: topcup inference [OPTIONS]

Options:
  -c, --copick_config FILE        copick config file path  [required]
  -ts, --run_names TEXT           Tomogram dataset run names
  -bs, --batch_size INTEGER       batch size for data loader
  -p, --pretrained_weights TEXT   Pretrained weights file paths (use comma for
                                  multiple paths). Default is None.
  -pa, --pattern TEXT             The key for pattern matching checkpoints.
                                  Default is *.ckpt
  --pixelsize FLOAT               Pixelsize in angstrom. Default is 10.0A.
  -tt, --tomo_type TEXT
                                  Tomogram type. Default is denoised.
  -u, --user_id TEXT              Needed for training, the user_id used for
                                  the ground truth picks.
  -o, --output_dir TEXT           output directory for saving prediction results
                                  (csv).
  -g, --gpus INTEGER              Number of GPUs for inference. Default is 1.
  -gt, --has_ground_truth BOOLEAN
                                  Inference with ground truth annotations
  -h, --help                      Show this message and exit.

Download checkpoints

To run inference, download the pretrained checkpoints to a local directory.

import urllib.request
import os
from pathlib import Path

TOPCUP_CHECKPOINTS_URL = [
    "https://huggingface.co/kevinzhao/TopCUP/resolve/main/topcup_weights/topcup_phantom_6_tomograms.ckpt",
    "https://huggingface.co/kevinzhao/TopCUP/resolve/main/topcup_weights/topcup_phantom_12_tomograms.ckpt",
    "https://huggingface.co/kevinzhao/TopCUP/resolve/main/topcup_weights/topcup_phantom_24_tomograms.ckpt",
]

# local directory to save the checkpoints
cache = Path("./checkpoints")
cache.mkdir(parents=True, exist_ok=True)

for url in TOPCUP_CHECKPOINTS_URL:
    filename = url.split("/")[-1]
    dest = cache / filename
    if not dest.exists():
        print(f"Downloading {filename} ...")
        try:
            urllib.request.urlretrieve(url, dest)
            print(f"→ Saved to {dest}")
        except Exception as e:
            print(f"Failed to download {url}: {e}")
    else:
        print(f"Already exists: {dest}")
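
Before launching inference, it can be worth confirming that the downloaded files match the checkpoint pattern passed via the -pa flag. The small helper below is our own sketch, not part of the TopCUP API.

```python
from pathlib import Path

def find_checkpoints(cache_dir="./checkpoints", pattern="*.ckpt"):
    """List checkpoint files matching the pattern TopCUP will glob for."""
    return sorted(Path(cache_dir).glob(pattern))

# e.g. print([p.name for p in find_checkpoints()])
```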

Extract Protein Locations

The following code runs inference on the first 3 tomograms from the Private Test Dataset using the TopCUP CLI.

# Run inference from Jupyter with live printouts. You can also run the equivalent command directly in a terminal.

from topcup.cli.cli import cli

# Let's do inference for the first 3 tomograms
cli.main(
    args=[
        "inference",
        "-c", f"{copick_config_path}",
        "-ts", "17803,17804,17805",
        "-p", f"{cache}",
        "--pixelsize", "10.012",
        "-o", "output/inference",
        "-tt", "wbp-denoised-denoiset-ctfdeconv",
        "-pa", "*.ckpt",
    ],
    standalone_mode=False,  # so click doesn’t exit on exceptions
)

Model Outputs

The model will automatically save the particle picks (locations in Angstrom) as a CSV file inside the specified output directory (using the -o flag).
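
To inspect the results programmatically, you can gather all pick CSVs from the output directory with the standard library. This is a hypothetical helper; the exact CSV column names are determined by TopCUP, so inspect the headers of your actual output files.

```python
import csv
import glob
import os

def load_picks(output_dir="output/inference"):
    """Collect rows from every pick CSV TopCUP wrote to output_dir."""
    rows = []
    for path in sorted(glob.glob(os.path.join(output_dir, "*.csv"))):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                row["source_file"] = path  # remember which file each pick came from
                rows.append(row)
    return rows
```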

Contact and Acknowledgments

For issues with this quickstart please contact kevin.zhao@czii.org.

Special thanks to Christof Henkel for developing the segmentation models and to Utz Ermel for developing copick.

References

  • Peck, A., et al. (2025). A Realistic Phantom Dataset for Benchmarking Cryo-ET Data Annotation. Nature Methods. DOI: 10.1101/2024.11.04.621686

Responsible Use

We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when engaging with our services. Should you have any security or privacy issues or questions related to the services, please reach out to our team at security@chanzuckerberg.com or privacy@chanzuckerberg.com respectively.