Background

The Simulacrum is a synthetic cancer dataset that imitates some of the data held securely by the National Cancer Registration and Analysis Service (NCRAS), part of the National Disease Registration Service within Public Health England (PHE).

Researchers and others need access to data to answer important questions about diseases like cancer, but if real data is used, patient confidentiality must be protected.  The Simulacrum has been created so that anyone who wants to ask questions of cancer data can do so on a dataset that looks and feels a lot like the real thing, but never compromises patient confidentiality.  This removes the need for many of the essential controls that are required to access the real patient data in PHE.

The Simulacrum looks and feels like the real cancer data within NCRAS, but does not contain any real patient information.  Anyone can use it to learn more about cancer in England without compromising patient privacy.  Also, because we have kept the data model the same as the real one in PHE, the Simulacrum can be used to write and test queries that (with the right permissions and ethical approval) could be run on the real data.

NCRAS collects data on all cancers diagnosed in England.  This is then linked with other data from the NHS to create a large and complex database that is held in the Cancer Analysis System (CAS).  The Simulacrum has been designed to mimic some of the data held in the CAS.

Although the data is synthetic, the Simulacrum maintains most of the properties of the original data with a high degree of accuracy.  There are limitations, however: the more complex the query, the more approximate the results.  Even so, because the data model (but not the data) is the same as the real model in the CAS, researchers can use the Simulacrum to plan and test their hypotheses before making a formal request to PHE to analyse the real data.