
DecoderTCR Quickstart

Estimated time to complete: 15 minutes

Learning Goals

This notebook demonstrates how to use DecoderTCR for:

  1. TCR-pMHC interaction prediction
  2. pMHC binding prediction
  3. Sequence-level pMHC embeddings

Prerequisites

  • Python >= 3.8
  • A CUDA-capable GPU (the examples below run on device cuda:0)
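Before running the notebook, a quick check can confirm that the environment meets these prerequisites. The helper below is a hypothetical convenience function and is not part of DecoderTCR. It uses only the standard library plus `torch.cuda.is_available()` if PyTorch is already installed. This is a sketch, not an official installation check.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 8)):
    """Hypothetical helper: report whether this environment meets the quickstart prerequisites."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so the check itself never crashes
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_installed"] = True
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_prerequisites())
```

If `cuda_available` is False, the `device='cuda:0'` calls later in this notebook will not work as written.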

Setup

Installation

First, clone the repository, navigate to its directory, and install the package in editable mode:

```bash
cd <your_path>/DecoderTCR
pip install -e .
```

Optionally, set an environment variable for the model cache directory:

```python
import os
os.environ['TORCH_HUB_DIR'] = '<cache path>'  # Set your cache directory
```

1. TCR-pMHC Interaction Prediction

This step predicts TCR-pMHC binding with an interaction score, which compares the model's output for TCR+pMHC against its output for pMHC alone.
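The quickstart does not give the exact formula behind this comparison. Purely as an illustration, suppose the score is a log-likelihood ratio: the log-probability the model assigns to the pMHC tokens when the TCR is present, minus the log-probability when it is absent. The sketch below works through that arithmetic on made-up per-token probabilities. Both `interaction_score` and the probabilities are hypothetical and do not come from DecoderTCR.

```python
import math
from typing import Sequence


def log_likelihood(token_probs: Sequence[float]) -> float:
    """Sum of the log-probabilities assigned to the observed tokens."""
    return sum(math.log(p) for p in token_probs)


def interaction_score(probs_with_tcr: Sequence[float], probs_without_tcr: Sequence[float]) -> float:
    """Hypothetical score: log p(pMHC | TCR) - log p(pMHC).

    Positive values mean the TCR context makes the observed pMHC tokens more likely.
    """
    return log_likelihood(probs_with_tcr) - log_likelihood(probs_without_tcr)


# Made-up per-token probabilities for the same pMHC tokens, scored with and without the TCR
with_tcr = [0.5, 0.4, 0.8]
without_tcr = [0.25, 0.4, 0.4]
print(f"{interaction_score(with_tcr, without_tcr):.4f}")  # prints 1.3863
```

Under this assumption, a score of 0 means the TCR does not change the model's view of the pMHC.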

```python
import torch
from DecoderTCR.utils.predict_TpM import load_model, predict_single

# Load the model from a checkpoint onto the first GPU
checkpoint_path = '<path to checkpoint>'  # Set your checkpoint path
model = load_model(checkpoint_path=checkpoint_path, device='cuda:0')

# Example sample with HLA, epitope, and TCR sequences
sample = {
    'HLA_seq': 'GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGQEQRYTCHVQHEGLPKPLTLRWEPSSQPTIPIVGIIAGLVLFGAVITGAVVAAVMWRRKSSDRKGGSYSQAASSDSAQGSDVSLTACKVMIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDEYACRVNHVTLSQPKIVKWDRDM',
    'epitope': 'YLQPRTFLL',
    'TCR_seq': 'MISLRVLLVILWLQLSWVWSQRKEVEQDPGPFNVPEGATVAFNCTYSNSASQSFFWYRQDCRKEPKLLMSVYSSGNEDGRFTAQLNRASQYISLLIRDSKLSDSATYLCVVNIDTDKLIFGTGTRLQVFPNIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKTVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSPESSCDVKLVEKSFETDTNLNFQNLSVIGFRILLLKVAGFNLLMTLRLWSSMISLRVLLVILWLQLSWVWSQRKEVEQDPGPFNVPEGATVAFNCTYSNSASQSFFWYRQDCRKEPKLLMSVYSSGNEDGRFTAQLNRASQYISLLIRDSKLSDSATYLCVVNIDTDKLIFGTGTRLQVFPNIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKTVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSPESSCDVKLVEKSFETDTNLNFQNLSVIGFRILLLKVAGFNLLMTLRLWSSMDTWLVCWAIFSLLKAGLTEPEVTQTPSHQVTQMGQEVILRCVPISNHLYFYWYRQILGQKVEFLVSFYNNEISEKSEIFDDQFSVERPDGSNFTLKIRSTKLEDSAMYFCATGGDHNTGELFFGEGSRLTVLEDLNKVFPPEVAVFEPSEAEISHTQKATLVCLATGFFPDHVELSWWVNGKEVHSGVSTDPQPLKEQPALNDSRYCLSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQIVSAEAWGRADCGFTSVSYQQGVLSATILYEILLGKATLYAVLVSALVLMAMVKRKDF'
}

# Predict the interaction score without tracking gradients
with torch.no_grad():
    score = predict_single(model, sample, device='cuda:0')

print(f"TCR-pMHC Interaction Score: {score:.4f}")
```
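`predict_single` takes a dictionary with the keys `HLA_seq`, `epitope`, and `TCR_seq`. When scoring your own data, it can help to check each sample before calling the model. The helper below is hypothetical and not part of DecoderTCR. It assumes sequences should use only the 20 standard one-letter amino-acid codes. That assumption is unverified: DecoderTCR's tokenizer may accept other symbols, so adjust `STANDARD_AA` if needed.

```python
from typing import Dict, List, Sequence

# Assumption: the 20 standard one-letter amino-acid codes are the expected alphabet
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sample(
    sample: Dict[str, str],
    required_keys: Sequence[str] = ("HLA_seq", "epitope", "TCR_seq"),
) -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means the sample looks usable."""
    problems = []
    for key in required_keys:
        seq = sample.get(key)
        if not seq:
            problems.append(f"missing or empty '{key}'")
            continue
        # Characters outside the expected alphabet, reported in sorted order
        unexpected = sorted(set(seq) - STANDARD_AA)
        if unexpected:
            problems.append(f"'{key}' contains non-standard characters: {''.join(unexpected)}")
    return problems


print(validate_sample({"HLA_seq": "GSHSM", "epitope": "YLQPRTFLL"}))  # prints ["missing or empty 'TCR_seq'"]
```

For pMHC-only samples, pass `required_keys=("HLA_seq", "epitope")`.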

2. pMHC Binding Prediction

Score epitope-HLA binding using span pseudo-likelihood (no TCR required).
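The quickstart names the technique but does not describe how DecoderTCR implements it. As a hedged illustration, assume "span pseudo-likelihood" means the following: mask every epitope position at once, then sum the log-probabilities the model assigns to the true residues at those positions. The sketch below shows that arithmetic. The `<mask>` token, `masked_log_probs`, and `toy_model` are all hypothetical stand-ins for a real model call. Placing the epitope after the HLA context is also an assumption, borrowed from the embedding example later in this notebook.

```python
import math
from typing import Callable, Dict, List

MASK = "<mask>"  # hypothetical mask token for this toy example
LogProbFn = Callable[[List[str]], List[Dict[str, float]]]


def span_pseudo_log_likelihood(context: str, span: str, masked_log_probs: LogProbFn) -> float:
    """Mask every residue of `span` at once (placed after `context`) and sum the
    log-probabilities of the true residues at the masked positions."""
    tokens = list(context) + [MASK] * len(span)
    per_position = masked_log_probs(tokens)  # one {residue: log-prob} dict per position
    start = len(context)
    return sum(per_position[start + i][aa] for i, aa in enumerate(span))


def toy_model(tokens: List[str]) -> List[Dict[str, float]]:
    """Hypothetical stand-in for a model: every position gives 'L' probability 0.5
    and splits the remaining 0.5 evenly over the other 19 amino acids."""
    others = "ACDEFGHIKMNPQRSTVWY"
    dist = {aa: math.log(0.5 / len(others)) for aa in others}
    dist["L"] = math.log(0.5)
    return [dict(dist) for _ in tokens]


print(span_pseudo_log_likelihood("GSHSM", "GILGFVFTL", toy_model))
```

Standard per-token pseudo-likelihood masks one position at a time and needs one forward pass per residue. Masking the whole span, as in this sketch, needs only a single pass, and the model must predict the epitope from the HLA context alone.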

```python
import torch
from DecoderTCR.utils.predict_pMHC import load_model as load_model_pMHC, predict_single as predict_single_pMHC

# Load the model (the same checkpoint can be reused)
model_pMHC = load_model_pMHC(checkpoint_path=checkpoint_path, device='cuda:0')

# Example sample (no TCR needed)
sample_pMHC = {
    'HLA_seq': 'GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGQEQRYTCHVQHEGLPKPLTLRWE',
    'epitope': 'GILGFVFTL'
}

# Predict the binding score without tracking gradients
with torch.no_grad():
    score_pMHC = predict_single_pMHC(model_pMHC, sample_pMHC, device='cuda:0')

print(f"pMHC Binding Score (Pseudo-likelihood): {score_pMHC:.4f}")
```
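A common next step is to score several candidate epitopes against one HLA. The hypothetical helper below takes any scoring callable, so you can test it without a GPU and plug in the real model later. It assumes a higher score means stronger predicted binding. This quickstart does not confirm that, so set `reverse=False` in `sorted` if lower is better for your checkpoint.

```python
from typing import Callable, Dict, List, Tuple


def rank_epitopes(
    hla_seq: str,
    epitopes: List[str],
    score_fn: Callable[[Dict[str, str]], float],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score each epitope against one HLA and return
    (epitope, score) pairs, highest score first (ties keep input order)."""
    scored = [(ep, score_fn({"HLA_seq": hla_seq, "epitope": ep})) for ep in epitopes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this notebook, `score_fn` could be `lambda s: predict_single_pMHC(model_pMHC, s, device='cuda:0')`, called inside `torch.no_grad()` as above. Each call is a separate forward pass, so this loop is simple rather than fast.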

3. pMHC Embeddings

To obtain sequence-level embeddings from DecoderTCR:

```python
import torch
from DecoderTCR.model.DecoderTCR import DecoderTCRModel
from DecoderTCR.utils.tokenizer import tokenize

checkpoint_path = '<path to checkpoint>'  # Set your checkpoint path
model = DecoderTCRModel.load_from_checkpoint(checkpoint_path=checkpoint_path, base_model='ESM2_3B')
model = model.to('cuda:0').eval()  # Keep the model on the same device as the input tokens

sample_pMHC = {
    'HLA_seq': 'GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGQEQRYTCHVQHEGLPKPLTLRWE',
    'epitope': 'GILGFVFTL'
}

# Tokenize the HLA sequence followed by the epitope, then move the tokens to the GPU
input_token = tokenize(sample_pMHC['HLA_seq'] + sample_pMHC['epitope']).cuda(0)
with torch.no_grad():
    embed = model.get_embeddings(input_token)
print(embed.size())
```
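The printed size shows whether `get_embeddings` returns one vector per sequence or one vector per token; this quickstart does not say which. If it is per token (for example, shape (1, length, dim)), a common choice is to mean-pool over the token axis. Cosine similarity can then compare two pMHCs. The sketch below implements both steps in plain Python. The helper names are hypothetical, and mean pooling is one option among several, not DecoderTCR's documented method.

```python
import math
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors (indexed [token][dim]) into a single [dim] vector."""
    length = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / length for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # prints [2.0, 3.0]
```

For a per-token tensor with shape (1, length, dim), `mean_pool(embed[0].tolist())` gives the pooled vector; `embed.mean(dim=1)` does the same on the GPU without leaving PyTorch.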

Contact and Acknowledgments

For issues with this quickstart, please contact Ben Lai (benlai@chanzuckerberg.com).

Responsible Use

We are committed to advancing the responsible development and use of artificial intelligence. Please follow our Acceptable Use Policy when engaging with our services.

Should you have any security or privacy issues or questions related to the services, please reach out to our team at security@chanzuckerberg.com or privacy@chanzuckerberg.com respectively.