Simulating multiple types of linked data while protecting patient privacy is a complex mathematical challenge. Although the Simulacrum data is very accurate for simple queries, results from more complex queries are less reliable.
For example, if you count one field at a time, such as ‘the number of breast cancers at stage four’, the data is highly accurate and the results are strongly indicative. By contrast, a more complex query, such as ‘the number of patients diagnosed with stage four breast cancer who received drug X and survived for more than 90 days’, will give a more approximate result.
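A minimal Python sketch can make the difference between the two example queries concrete. The table layout, the field names (`tumour_id`, `site`, `stage`, `survival_days`, `drug`) and the toy records are all hypothetical and do not reflect the actual Simulacrum or CAS schema. The first count filters a single table. The second also links to treatment records and applies a survival threshold, so its result depends on combinations of conditions across fields and tables, which is the kind of query described above as more approximate.

```python
# Hypothetical, simplified stand-ins for Simulacrum-style tables.
# Field names and values are illustrative assumptions, not the real CAS schema.
TUMOURS = [
    {"tumour_id": 1, "site": "breast", "stage": "4", "survival_days": 200},
    {"tumour_id": 2, "site": "breast", "stage": "4", "survival_days": 30},
    {"tumour_id": 3, "site": "breast", "stage": "2", "survival_days": 400},
    {"tumour_id": 4, "site": "breast", "stage": "4", "survival_days": 365},
    {"tumour_id": 5, "site": "lung", "stage": "4", "survival_days": 120},
]
TREATMENTS = [
    {"tumour_id": 1, "drug": "X"},
    {"tumour_id": 2, "drug": "X"},
    {"tumour_id": 3, "drug": "X"},
    {"tumour_id": 4, "drug": "Y"},
]


def count_simple(tumours):
    """Single-table query: breast tumours diagnosed at stage four."""
    return sum(1 for t in tumours if t["site"] == "breast" and t["stage"] == "4")


def count_complex(tumours, treatments, drug="X", min_days=90):
    """Multi-condition query: links tumours to treatment records and survival."""
    # Collect the tumours that have at least one record of the given drug.
    treated = {r["tumour_id"] for r in treatments if r["drug"] == drug}
    return sum(
        1
        for t in tumours
        if t["site"] == "breast"
        and t["stage"] == "4"
        and t["tumour_id"] in treated
        and t["survival_days"] > min_days
    )


print(count_simple(TUMOURS))  # 3
print(count_complex(TUMOURS, TREATMENTS))  # 1
```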
The Simulacrum data is synthetic and must not be used to make clinical decisions or causal inferences. However, because the Simulacrum data has the same structure as the real data, it can be used to plan and refine analyses before making a formal request for a data release from NDRS.
The Simulacrum can be used to:
- Learn about the format and structure of the datasets in the Cancer Analysis System (CAS);
- Develop and test code to analyse complex cohorts of cancer patients;
- Calculate direct results, which are highly reliable for simple queries and indicative for more complex queries, to get a picture of what real results might look like;
- Plan analyses and research before requesting data from NDRS.
Before using the Simulacrum, we recommend that you use the CAS explorer to check whether the patient and tumour tables in the Simulacrum contain the data variables you need for your research.