The starting point for this data set was 200,809 CIF files extracted from the ICSD on 7 September 2021 using this script: https://github.com/lrcfmd/ICSDClient
All structures are from the EXPERIMENTAL_INORGANIC subset of the ICSD.

94,783 structures contained an atom with occupancy less than one and were identified as disordered.
247 .cif files could not be read by ASE and threw an exception.
105,779 .cif files were processed to calculate pair histograms.

This produced 728,436 files with pair histograms for 3,388 unique atom pairs.

The number of structures that contain a given atom pair is given in ICSD_co-occurence_matrix.xlsx
For example, helium was found in 12 of the 105,779 structures, and hydrogen was also present in only one of those 12.

The zipped folder SPP.zip contains 3,388 subfolders, one for each identified atom pair.
For example, the Li-O folder contains six files:

Li-O.ref - the list of 2,384 reference structures that were used to build the Li-O SPP (no filtering).
Li-O.av - the simple average of the histogram data over all reference structures, before any smoothing is applied. These are the files needed to recompute the potentials using fit_potential.py.
Li-O.rdf - the average RDF obtained from the histogram data in Li-O.av by applying Gaussian smoothing with sigma = 0.1 A and normalising the data by (4 pi r^2 dr).
Li-O.pot - the final SPP produced from the RDF in Li-O.rdf.
plot.gnu - the gnuplot file that plots the raw ICSD data as an RDF, the Gaussian-smoothed RDF data and the SPP converted back to an RDF.
Li-O.png - the graph produced from plot.gnu.
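
The Li-O.rdf step described above (Gaussian smoothing with sigma = 0.1 A, then normalisation by 4 pi r^2 dr) can be illustrated with the minimal sketch below. It is not the code in fit_potential.py and it does not read the .av files, whose layout is not described here. The function name histogram_to_rdf, the bin width of 0.01 A, the uniform grid of bin centres and the truncation of the kernel at 4 sigma are all assumptions made for illustration, and the input is synthetic rather than ICSD data.

```python
import numpy as np


def histogram_to_rdf(r, counts, sigma=0.1):
    """Smooth a pair-distance histogram with a Gaussian, then divide by 4*pi*r^2*dr.

    r      : bin centres in A (assumed uniformly spaced and strictly positive)
    counts : averaged histogram counts per bin
    sigma  : Gaussian width in A (0.1 A is the value quoted in this README)
    """
    r = np.asarray(r, dtype=float)
    counts = np.asarray(counts, dtype=float)
    dr = r[1] - r[0]

    # Gaussian kernel sampled on the same grid, truncated at 4 sigma (an assumption).
    half_width = int(np.ceil(4.0 * sigma / dr))
    offsets = np.arange(-half_width, half_width + 1) * dr
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # unit sum: total count is preserved away from the grid edges

    # mode="same" keeps the output on the input grid (requires len(counts) >= len(kernel)).
    smoothed = np.convolve(counts, kernel, mode="same")

    shell_volume = 4.0 * np.pi * r**2 * dr
    return smoothed / shell_volume


# Synthetic example (not ICSD data): 100 pair distances in a single bin near 2 A.
dr = 0.01                             # hypothetical bin width in A
r = (np.arange(1000) + 0.5) * dr      # bin centres, 0.005 A to 9.995 A
counts = np.zeros_like(r)
counts[200] = 100.0
rdf = histogram_to_rdf(r, counts, sigma=0.1)
```

Multiplying rdf by 4 pi r^2 dr and summing recovers the original number of pair distances, which is a quick sanity check when comparing a recomputed RDF against the one shipped in the data set.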