Available data
The Simulacrum is a synthetic dataset based on datasets held on the National Disease Registration Service (NDRS) Cancer Analysis System (CAS) at NHS England. The Simulacrum has a similar data structure and similar statistical properties to the real cancer datasets, but it is made up entirely of synthetic patients and does not contain private patient information.
The Simulacrum can be used to explore data, scope research hypotheses, and develop code that can then be run on CAS data to produce analysis outputs. To learn more about the Simulacrum, please refer to our Background section.
There are two versions of Simulacrum available for download from the website:
Simulacrum v1.2.0
- This includes patient and tumour information for diagnosis years 2013 to 2017, and systemic anti-cancer therapy (e.g. chemotherapy) treatment data.
Simulacrum v2.1.0
- This is the latest release and includes newer diagnosis years, 2016 to 2019, and additional radiotherapy and genomic testing data.
Typical data items found in the Simulacrum:
- Synthetic patient characteristics, such as age, sex and ethnicity;
- Synthetic tumour characteristics, such as cancer site, staging and pathology information;
- The vital status of each synthetic patient, i.e., whether the patient has died or been censored, which can be used for survival analysis;
- Details about the systemic anti-cancer therapy (SACT), e.g., chemotherapy, that synthetic patients receive, such as which drugs were prescribed or the patient's weight at the start of a regimen;
- Details about radiotherapy treatments, recorded in the Radiotherapy Data Set (RTDS), such as which part of the body was treated and at what dose (only in Simulacrum v2);
- Details about somatic genomic tests, such as which tumour genes were tested and test outcomes, i.e., normal vs abnormal result (only in Simulacrum v2).
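As a rough illustration of how the vital status item can feed a survival analysis, the sketch below turns each record into a follow-up time and an event indicator (death vs censored), then computes a basic Kaplan-Meier estimate. The field names (`diagnosis_date`, `vital_status`, `vital_status_date`), the status codes `"D"` and `"A"`, and the four records are all hypothetical and invented for this example; please check the data dictionary for the real field names and codes.

```python
from datetime import date, timedelta

# Hypothetical records: field names, status codes and values are
# illustrative only and are NOT taken from the Simulacrum data dictionary.
diagnosis = date(2016, 1, 1)
patients = [
    {"patient_id": 1, "diagnosis_date": diagnosis, "vital_status": "D",
     "vital_status_date": diagnosis + timedelta(days=100)},
    {"patient_id": 2, "diagnosis_date": diagnosis, "vital_status": "A",
     "vital_status_date": diagnosis + timedelta(days=200)},
    {"patient_id": 3, "diagnosis_date": diagnosis, "vital_status": "D",
     "vital_status_date": diagnosis + timedelta(days=300)},
    {"patient_id": 4, "diagnosis_date": diagnosis, "vital_status": "A",
     "vital_status_date": diagnosis + timedelta(days=400)},
]


def survival_record(row):
    """Convert one patient row into follow-up days and an event flag."""
    days = (row["vital_status_date"] - row["diagnosis_date"]).days
    event = 1 if row["vital_status"] == "D" else 0  # 1 = died, 0 = censored
    return {"patient_id": row["patient_id"], "days": days, "event": event}


def kaplan_meier(records):
    """Return [(time, survival probability)] at each distinct death time."""
    survival = 1.0
    curve = []
    death_times = sorted({r["days"] for r in records if r["event"] == 1})
    for t in death_times:
        at_risk = sum(1 for r in records if r["days"] >= t)
        deaths = sum(1 for r in records if r["days"] == t and r["event"] == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


records = [survival_record(p) for p in patients]
curve = kaplan_meier(records)
for t, s in curve:
    print(f"day {t}: estimated survival {s:.3f}")
```

Censored patients stay in the at-risk count until their last follow-up date, which is why the vital status date is needed for patients who have not died as well as for those who have. For real analyses, an established survival analysis library is likely a better choice than this hand-written estimator.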
For a detailed description of all the data fields within each version, please see our Simulacrum v1.2.0 data dictionary and Simulacrum v2.1.0 data dictionary, both downloadable from our Library page.
A breakdown of characteristics of the available Simulacrum v1.2.0 and v2.1.0 releases can be found in the comparison chart below.
Characteristic | Simulacrum v1.2.0 | Simulacrum v2.1.0 |
Diagnosis years | 2013-2017 | 2016-2019 |
Synthetic patients | 2,200,626 | 1,871,605 |
Synthetic tumours | 2,371,281 | 1,995,570 |
Synthetic patients with SACT | 366,266 | 352,372 |
Synthetic patients with RTDS | -- | 413,169 |
Synthetic patients with genomic testing data | -- | 94,908 |
Years of SACT data | 2012 onwards | 2012-2022 |
Years of RTDS data | -- | 2012-2022 |
Years of genomics data | -- | 2016-2019 |
SACT regimens | 730,472 | 781,389 |
SACT cycles | 2,442,037 | 2,741,674 |
SACT drug administrations | 6,385,828 | 7,662,030 |
RTDS episodes | -- | 656,560 |
RTDS prescriptions | -- | 657,648 |
RTDS exposures | -- | 13,201,531 |
Genomic tests | -- | 255,728 |
Non-melanoma skin cancer diagnoses (C44) | 607,619 | 514,517 |
Breast cancer diagnoses (C50) | 226,406 | 187,204 |
Prostate cancer diagnoses (C61) | 201,785 | 179,478 |
Lung cancer diagnoses (C34) | 169,118 | 156,927 |
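To make the two releases easier to compare, a few ratios can be derived from the counts in the chart above. The sketch below is only an illustration: the counts are copied from the chart, while the dictionary keys and the `summarise` helper are names made up for this example.

```python
# Counts copied from the comparison chart above.
# The key names are illustrative, not official field names.
counts = {
    "v1.2.0": {
        "patients": 2_200_626,
        "tumours": 2_371_281,
        "patients_with_sact": 366_266,
        "regimens": 730_472,
        "cycles": 2_442_037,
    },
    "v2.1.0": {
        "patients": 1_871_605,
        "tumours": 1_995_570,
        "patients_with_sact": 352_372,
        "regimens": 781_389,
        "cycles": 2_741_674,
    },
}


def summarise(c):
    """Derive simple ratios from one release's counts."""
    return {
        "tumours_per_patient": c["tumours"] / c["patients"],
        "share_with_sact": c["patients_with_sact"] / c["patients"],
        "cycles_per_regimen": c["cycles"] / c["regimens"],
    }


summary = {version: summarise(c) for version, c in counts.items()}
for version, s in summary.items():
    print(
        f"{version}: {s['tumours_per_patient']:.3f} tumours per patient, "
        f"{s['share_with_sact']:.1%} of patients with SACT, "
        f"{s['cycles_per_regimen']:.2f} SACT cycles per regimen"
    )
```

Both releases come out at slightly more than one tumour per synthetic patient, and a somewhat larger share of synthetic patients have SACT data in v2.1.0 than in v1.2.0.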