About the project
What is the Simulacrum?
Because we have kept the data model the same as the real one in PHE, the Simulacrum can also be used to write and test queries that (with the right permissions and ethical approval) could be run on the real data.
The Simulacrum was developed by Health Data Insight, with support from AstraZeneca and IQVIA.
Why was the Simulacrum developed?
The Simulacrum was created to protect patient confidentiality while still making it possible for anyone who needs to ask questions of cancer data to do so.
Why do researchers want to access the data held by NCRAS?
How should I use the Simulacrum?
Download the Simulacrum and query the data to conduct your research. The more complex the queries, the more approximate the results. The Simulacrum data is synthetic and therefore not completely accurate, so it is not suitable for clinical decisions. You can request to have your queries run on the real NCRAS data. Get in touch with simulacrumenquiries@phe.gov.uk to find out more.
Are there plans to expand the Simulacrum?
Protecting patient confidentiality
The Simulacrum was built to facilitate research based on data held by the National Cancer Registration and Analysis Service in PHE while protecting patient confidentiality. By using synthetic data in place of the real data, researchers can work with data that has the look and feel of the real data and maintains the same data model, without any risk to patient confidentiality.
How can you be sure that you cannot identify a real patient in the Simulacrum?
If I do not want my data to be included in the Simulacrum, what can I do about this?
The team building the Simulacrum only ever used anonymous data that was provided with the approval of PHE’s Office for Data Release and was produced by pooling more than 50 similar cases. Because the original data is completely anonymous, PHE has released examples of the original data used to build the Simulacrum on data.gov.uk. The synthetic data in the Simulacrum therefore has no real patient data in it, and even the synthetic patients we have created will not mimic any one individual.
However, if you are a cancer patient, and you do not wish for your data to be used in PHE’s National Cancer Registration and Analysis Service, you can ask PHE to remove all of your details from the cancer registry at any time. This will not affect your treatment or care. For details of how to opt out, please visit: https://www.ndrs.nhs.uk/national-disease-registration-service/patients/opting-out/
Did AstraZeneca and IQVIA see any individual patient data during the development of the Simulacrum?
Was any individual patient identifiable data shared with HDI, AstraZeneca or IQVIA in the development of the Simulacrum?
Design and use of the simulated data
How is the synthetic data generated?
How often will the Simulacrum be updated to include more recent diagnoses?
Who is able to use the Simulacrum and for what purposes?
The Simulacrum is entirely synthetic data and is available for anyone to use. Because it only approximates the original data, results from the Simulacrum should not be used for clinical decisions.
The data model (not the data, which is synthetic) in the Simulacrum is the same as the original one in PHE. Once a user has refined their query using the Simulacrum, they can make a request to Public Health England to have their queries run on the real data. Get in touch with simulacrumenquiries@phe.gov.uk to find out more. In this way, the Simulacrum can be used to assist research for public health, epidemiology, commissioning and service planning.
How robust is the data for clinical research purposes?
The data contained within the Simulacrum is synthetic and should never be used to make clinical decisions. The more complex the queries, the more approximate the results. Researchers who wish to run their analyses on the real data can make a formal request to Public Health England. Get in touch with simulacrumenquiries@phe.gov.uk to find out more.
Is the Simulacrum relevant for use beyond the UK?
Will this approach be extended for use in other disease areas, for example cardiovascular and metabolic diseases?
Will synthetic data be viable for use with regulators and market access decision-makers?
The data contained within the Simulacrum is synthetic; we therefore do not recommend submitting analyses based purely on the Simulacrum to regulatory agencies or market access decision-makers without appropriate caveats. Researchers who wish to run their analyses on the real data can make a formal request to Public Health England. Get in touch with simulacrumenquiries@phe.gov.uk to find out more.
Can I use my preferred analytical package with the Simulacrum?
Can I publish results generated directly from the Simulacrum?
Yes, you can publish your results from the Simulacrum. By doing so, you accept that your published results are based on synthetic data. If you wish to run your queries on the real data, you can make a request to Public Health England. Get in touch with simulacrumenquiries@phe.gov.uk to find out more. We ask that anyone publishing Simulacrum results acknowledges the Simulacrum project. Please visit our acknowledgments page for more information.
Information about project sponsors
The Simulacrum is a joint project between Health Data Insight CIC, AstraZeneca (AZ) and IQVIA. The project started in January 2016, and staff from all organisations worked together to develop the statistical and technical elements of the build. The parties involved in this initiative firmly believe that improving access to this data will directly benefit patients by improving health outcomes.
What are the roles of AZ and IQVIA on the project?
AZ and IQVIA co-funded the development of the Simulacrum pilot.
At no time were AZ or IQVIA given access to patient identifiable data.
What is the role of Public Health England (PHE) in the project?
What is the role of Health Data Insight CIC on the project?
HDI owns the intellectual property and was responsible for managing the testing and development of the Simulacrum.