Bulk Data Downloads

Human Predictome dataset (Schmid et al., bioRxiv 2025)

Includes:
All human protein-protein interactions (~1.6 million pairs)
If you use this data for your research, please cite: Proteome-wide in silico screening for human protein-protein interactions

Download instructions:

Smaller dataset (top 16K by SPOC score):

The smaller dataset contains the top 16,000 protein-protein interactions ranked by SPOC score.

Download top 16K dataset (53.21 GB)


Pair scores file:

The pair scores file contains the SPOC scores for all protein-protein interactions in the dataset.

Download pair scores (28.4 MB compressed, 116 MB uncompressed)


Full dataset (all 1.6 million pairs, ~4.32 TB):

The full dataset is compressed and split across 100 files. We recommend downloading it with wget or curl. The manifest files below list the URL of each file: manifest.txt for wget and manifest.curl.txt for curl.
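Before starting a download of this size, it can help to confirm that the target disk has room for it. The following is a minimal sketch, not part of the official instructions: has_space_gb is a hypothetical helper name, and it assumes the sizes quoted on this page are decimal gigabytes (so ~4.32 TB is roughly 4320 GB).

```shell
# has_space_gb NEEDED_GB [DIR]
# Hypothetical helper: succeeds if DIR (default: current directory)
# has at least NEEDED_GB decimal gigabytes available.
has_space_gb() {
  needed_gb=$1
  dir=${2:-.}
  # df -P prints one POSIX-format line per filesystem; -k reports
  # 1024-byte blocks. Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  # Convert the requirement from decimal GB to 1024-byte blocks.
  needed_kb=$(awk -v gb="$needed_gb" 'BEGIN { printf "%.0f", gb * 1000000000 / 1024 }')
  [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example, has_space_gb 4320 /path/to/target || echo "not enough free space" checks the target before the full download begins. Unpacking the archives afterwards will likely need additional space on top of this figure.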

Using wget (resumable):
wget -c -i manifest.txt

Download manifest.txt

You can also use the same manifest with your tool of choice to download the dataset.
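If only part of the dataset is needed, the manifest can be filtered before it is handed to wget. This is a sketch under two assumptions: that manifest.txt holds one URL per line, and that each URL contains the archive name shown in the file table (data_split_NN.tar). The function name select_splits is hypothetical.

```shell
# select_splits FIRST LAST [MANIFEST]
# Hypothetical helper: prints the manifest lines whose split number lies
# in the range FIRST..LAST (inclusive, written without leading zeros).
select_splits() {
  awk -v first="$1" -v last="$2" '
    match($0, /data_split_[0-9][0-9]\.tar/) {
      # "data_split_" is 11 characters long, so the two-digit split
      # number starts 11 characters after the beginning of the match.
      n = substr($0, RSTART + 11, 2) + 0
      if (n >= first && n <= last) print
    }' "${3:-manifest.txt}"
}
```

A possible use is select_splits 0 9 manifest.txt > subset.txt followed by wget -c -i subset.txt, which would fetch only the first ten archives.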

Using curl (resumable):
curl -fL -C - -K manifest.curl.txt

Download manifest.curl.txt
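After a long or interrupted download, it is useful to see which of the 100 archives have not arrived yet. The sketch below assumes the archives are saved under their original names (data_split_00.tar through data_split_99.tar) in a single directory; missing_splits is a hypothetical helper name. It only checks that each file exists, so a partially downloaded archive still counts as present.

```shell
# missing_splits [DIR]
# Hypothetical helper: prints the name of every expected archive
# (data_split_00.tar .. data_split_99.tar) that is absent from DIR.
missing_splits() {
  dir=${1:-.}
  i=0
  while [ "$i" -le 99 ]; do
    name=$(printf 'data_split_%02d.tar' "$i")
    # Report the archive only if no regular file with that name exists.
    [ -f "$dir/$name" ] || printf '%s\n' "$name"
    i=$((i + 1))
  done
}
```

Because both commands above are resumable, rerunning them after an interruption should continue any partial files rather than restart them.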


Individual dataset file information:
File               Size     Index (list of protein pairs in the file)
data_split_00.tar  38.0 GB  index_data_split_00.csv
data_split_01.tar  41.4 GB  index_data_split_01.csv
data_split_02.tar  37.9 GB  index_data_split_02.csv
data_split_03.tar  39.9 GB  index_data_split_03.csv
data_split_04.tar  37.7 GB  index_data_split_04.csv
data_split_05.tar  36.3 GB  index_data_split_05.csv
data_split_06.tar  40.7 GB  index_data_split_06.csv
data_split_07.tar  41.1 GB  index_data_split_07.csv
data_split_08.tar  39.0 GB  index_data_split_08.csv
data_split_09.tar  38.3 GB  index_data_split_09.csv
data_split_10.tar  47.2 GB  index_data_split_10.csv
data_split_11.tar  51.0 GB  index_data_split_11.csv
data_split_12.tar  47.5 GB  index_data_split_12.csv
data_split_13.tar  49.6 GB  index_data_split_13.csv
data_split_14.tar  45.9 GB  index_data_split_14.csv
data_split_15.tar  44.7 GB  index_data_split_15.csv
data_split_16.tar  50.7 GB  index_data_split_16.csv
data_split_17.tar  51.0 GB  index_data_split_17.csv
data_split_18.tar  48.6 GB  index_data_split_18.csv
data_split_19.tar  47.6 GB  index_data_split_19.csv
data_split_20.tar  40.2 GB  index_data_split_20.csv
data_split_21.tar  43.6 GB  index_data_split_21.csv
data_split_22.tar  40.6 GB  index_data_split_22.csv
data_split_23.tar  42.9 GB  index_data_split_23.csv
data_split_24.tar  40.1 GB  index_data_split_24.csv
data_split_25.tar  38.3 GB  index_data_split_25.csv
data_split_26.tar  43.9 GB  index_data_split_26.csv
data_split_27.tar  43.5 GB  index_data_split_27.csv
data_split_28.tar  41.7 GB  index_data_split_28.csv
data_split_29.tar  40.9 GB  index_data_split_29.csv
data_split_30.tar  43.0 GB  index_data_split_30.csv
data_split_31.tar  46.9 GB  index_data_split_31.csv
data_split_32.tar  43.1 GB  index_data_split_32.csv
data_split_33.tar  45.7 GB  index_data_split_33.csv
data_split_34.tar  43.0 GB  index_data_split_34.csv
data_split_35.tar  41.4 GB  index_data_split_35.csv
data_split_36.tar  46.4 GB  index_data_split_36.csv
data_split_37.tar  46.7 GB  index_data_split_37.csv
data_split_38.tar  45.0 GB  index_data_split_38.csv
data_split_39.tar  42.7 GB  index_data_split_39.csv
data_split_40.tar  38.7 GB  index_data_split_40.csv
data_split_41.tar  42.2 GB  index_data_split_41.csv
data_split_42.tar  39.1 GB  index_data_split_42.csv
data_split_43.tar  41.6 GB  index_data_split_43.csv
data_split_44.tar  38.6 GB  index_data_split_44.csv
data_split_45.tar  37.8 GB  index_data_split_45.csv
data_split_46.tar  41.9 GB  index_data_split_46.csv
data_split_47.tar  42.6 GB  index_data_split_47.csv
data_split_48.tar  40.5 GB  index_data_split_48.csv
data_split_49.tar  39.2 GB  index_data_split_49.csv
data_split_50.tar  37.0 GB  index_data_split_50.csv
data_split_51.tar  40.2 GB  index_data_split_51.csv
data_split_52.tar  37.1 GB  index_data_split_52.csv
data_split_53.tar  39.2 GB  index_data_split_53.csv
data_split_54.tar  37.2 GB  index_data_split_54.csv
data_split_55.tar  35.9 GB  index_data_split_55.csv
data_split_56.tar  39.8 GB  index_data_split_56.csv
data_split_57.tar  40.5 GB  index_data_split_57.csv
data_split_58.tar  38.8 GB  index_data_split_58.csv
data_split_59.tar  37.5 GB  index_data_split_59.csv
data_split_60.tar  44.4 GB  index_data_split_60.csv
data_split_61.tar  48.1 GB  index_data_split_61.csv
data_split_62.tar  44.4 GB  index_data_split_62.csv
data_split_63.tar  47.2 GB  index_data_split_63.csv
data_split_64.tar  43.3 GB  index_data_split_64.csv
data_split_65.tar  41.9 GB  index_data_split_65.csv
data_split_66.tar  47.0 GB  index_data_split_66.csv
data_split_67.tar  48.2 GB  index_data_split_67.csv
data_split_68.tar  46.0 GB  index_data_split_68.csv
data_split_69.tar  45.3 GB  index_data_split_69.csv
data_split_70.tar  44.0 GB  index_data_split_70.csv
data_split_71.tar  48.1 GB  index_data_split_71.csv
data_split_72.tar  44.9 GB  index_data_split_72.csv
data_split_73.tar  46.8 GB  index_data_split_73.csv
data_split_74.tar  42.7 GB  index_data_split_74.csv
data_split_75.tar  42.7 GB  index_data_split_75.csv
data_split_76.tar  47.5 GB  index_data_split_76.csv
data_split_77.tar  47.3 GB  index_data_split_77.csv
data_split_78.tar  46.7 GB  index_data_split_78.csv
data_split_79.tar  44.7 GB  index_data_split_79.csv
data_split_80.tar  45.9 GB  index_data_split_80.csv
data_split_81.tar  48.6 GB  index_data_split_81.csv
data_split_82.tar  45.1 GB  index_data_split_82.csv
data_split_83.tar  46.8 GB  index_data_split_83.csv
data_split_84.tar  43.9 GB  index_data_split_84.csv
data_split_85.tar  42.8 GB  index_data_split_85.csv
data_split_86.tar  47.8 GB  index_data_split_86.csv
data_split_87.tar  48.4 GB  index_data_split_87.csv
data_split_88.tar  47.8 GB  index_data_split_88.csv
data_split_89.tar  45.2 GB  index_data_split_89.csv
data_split_90.tar  41.8 GB  index_data_split_90.csv
data_split_91.tar  45.3 GB  index_data_split_91.csv
data_split_92.tar  42.4 GB  index_data_split_92.csv
data_split_93.tar  44.1 GB  index_data_split_93.csv
data_split_94.tar  40.9 GB  index_data_split_94.csv
data_split_95.tar  40.0 GB  index_data_split_95.csv
data_split_96.tar  44.4 GB  index_data_split_96.csv
data_split_97.tar  45.0 GB  index_data_split_97.csv
data_split_98.tar  43.1 GB  index_data_split_98.csv
data_split_99.tar  41.6 GB  index_data_split_99.csv
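Since each archive has an index listing the protein pairs it contains, the indexes can be searched to find which archive to download for a given pair. The sketch below is illustrative only: find_split is a hypothetical helper, and it assumes the index CSV files have already been downloaded into one directory and that the pair appears in them as a literal text string. The column layout of the index files is not described here, so check a real index file to see how pairs are written before relying on this.

```shell
# find_split PATTERN [DIR]
# Hypothetical helper: prints the archive name (data_split_NN.tar) for
# every index file in DIR that contains PATTERN as a literal string.
find_split() {
  pattern=$1
  dir=${2:-.}
  # grep -l lists the matching index files; -F treats PATTERN literally.
  # "|| true" keeps a no-match result from being reported as a failure.
  { grep -l -F -e "$pattern" "$dir"/index_data_split_*.csv || true; } |
    # Map index_data_split_NN.csv to the corresponding archive name.
    sed 's|.*/index_\(data_split_[0-9]*\)\.csv$|\1.tar|'
}
```

A call such as find_split 'PROTEIN_A' ./indexes (with a placeholder identifier) would print the archives whose indexes mention that string. The result could then be matched against manifest.txt to download just those files.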

Predictomes paper associated data (Schmid and Walter, Mol Cell 2025)

Includes:
All genome maintenance pairs (~40,000 pairs)
3 proteome-wide screens (DONSON, STK19, USP37) (~60,000 pairs)
All SPOC training/testing pairs (~50,000 pairs)
All 30 ranking experiment datasets (~30,000 pairs)
If you use this data for your research, please cite: Predictomes, a classifier-curated database of AlphaFold-modeled protein-protein interactions