Bulk Data Downloads

Human Predictome dataset (Schmid et al., bioRxiv 2025)

Includes:
All human protein-protein interactions (~1.6 million pairs)
If you use this data in your research, please cite: Proteome-wide in silico screening for human protein-protein interactions

Download instructions:

Smaller dataset (top 16K by SPOC score):

The smaller dataset contains the top 16,000 protein-protein interactions ranked by SPOC score.

Download top 16K dataset

Full dataset (all 1.6 million pairs):

The full dataset is compressed and split across 100 files. We recommend downloading it with wget or curl. Each manifest file below lists the URLs of all the files: manifest.txt is for wget and manifest.curl.txt is for curl.

Using wget (resumable):
wget -c -i manifest.txt

Download manifest.txt

You can also use the same manifest with your tool of choice to download the dataset.

Using curl (resumable):
curl -fL -C - -K manifest.curl.txt

Download manifest.curl.txt
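
Checking for missing files (optional):

With 100 files, it can be worth confirming that nothing was skipped once a download finishes. The following is a minimal sketch, not part of the official instructions. It assumes that manifest.txt contains one URL per line and that each file was saved under the last path component of its URL. The function name missing_from_manifest is hypothetical.

```shell
# Print every manifest URL whose file is missing or empty in the download directory.
# Assumptions: one URL per line; local file name = last path component of the URL.
missing_from_manifest() {
    manifest="$1"
    dir="${2:-.}"
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines
        [ -z "$url" ] && continue
        # Derive the local file name from the URL, dropping any query string
        name="${url##*/}"
        name="${name%%\?*}"
        # -s is true only if the file exists and is non-empty
        [ -s "$dir/$name" ] || printf '%s\n' "$url"
    done < "$manifest"
}
```

For example, running missing_from_manifest manifest.txt . > missing.txt writes the outstanding URLs to missing.txt, which can then be passed to wget -c -i missing.txt. This check only detects absent or empty files. It does not detect partially downloaded ones, so re-running the resumable command above is still the safer way to complete interrupted transfers.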

Predictomes paper associated data (Schmid and Walter, Mol Cell 2025)

Includes:
All genome maintenance pairs (~40,000 pairs)
3 proteome-wide screens (DONSON, STK19, USP37) (~60,000 pairs)
All SPOC training/testing pairs (~50,000 pairs)
All 30 ranking experiment datasets (~30,000 pairs)
If you use this data in your research, please cite: Predictomes, a classifier-curated database of AlphaFold-modeled protein-protein interactions