Getting started

Here’s a step-by-step guide to using the Simulacrum. Join our mailing list to keep up to date with new versions of the Simulacrum. All useful documents relating to the Simulacrum and its usage can be found in our Library.

1. Downloading Simulacrum

Learn about the Simulacrum data in our ‘What’s in the Simulacrum’ section, then choose which of the available versions you would like to download.

Download your chosen version of the Simulacrum from our download page.

Now you can begin to explore the data!

2. Analysing Simulacrum data

The Simulacrum datasets are too large to open in Microsoft Excel, so we recommend using RStudio, Stata, a Python IDE or a SQL database to import and analyse them.
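Because the files are too large for a spreadsheet, one option is to stream each CSV into a local SQLite database and query it there. The sketch below uses only the Python standard library. It assumes the download is a comma-separated file with a header row. The function name `load_csv_to_sqlite`, the file names and the table name in the usage line are hypothetical placeholders, not the actual Simulacrum file names.

```python
import csv
import sqlite3
from itertools import islice


def load_csv_to_sqlite(csv_path, db_path, table, batch_size=50_000):
    """Stream a large CSV into a SQLite table without holding it all in memory.

    Assumes the first row is a header. Every column is stored as TEXT,
    so cast values in your queries where needed. Returns the number of
    data rows loaded.
    """
    conn = sqlite3.connect(db_path)
    try:
        # utf-8-sig reads plain UTF-8 and also strips a byte-order mark if present.
        with open(csv_path, newline="", encoding="utf-8-sig") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join(f'"{c}" TEXT' for c in header)
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            conn.execute(f'CREATE TABLE "{table}" ({cols})')
            placeholders = ", ".join("?" for _ in header)
            insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
            total = 0
            # Insert in fixed-size batches so memory use stays bounded.
            while True:
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(insert_sql, batch)
                total += len(batch)
        conn.commit()
    finally:
        conn.close()
    return total
```

For example, `load_csv_to_sqlite("sim_av_tumour.csv", "simulacrum.db", "SIM_AV_TUMOUR")` would load one (hypothetically named) file. Repeat the call for each table you need, pointing at the same `.db` file so the tables can be joined later.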

Before analysis, we advise reading the Simulacrum User Guide to understand more about the datasets that the Simulacrum is based on and how to link and query them successfully.

You can find some SQL query examples in our User Guidance or in the NCRAS guide for writing SQL queries on CAS data. 
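As a minimal illustration of linking two tables, the query below joins a patient table to a tumour table and counts patients and tumours per diagnosis site and sex. The table names `SIM_AV_PATIENT` and `SIM_AV_TUMOUR` and the columns `PATIENTID`, `SEX` and `SITE_ICD10_O2_3CHAR` are assumptions for this sketch. Check the User Guide for the names and codes used in your version. The rows inserted in the demo are made-up toy values so the example runs. They are not Simulacrum data.

```python
import sqlite3

# Hypothetical schema: one row per patient, one row per tumour,
# linked on PATIENTID. Check the User Guide for the real column names.
LINKED_COUNT_SQL = """
SELECT t.SITE_ICD10_O2_3CHAR         AS site,
       p.SEX                         AS sex,
       COUNT(DISTINCT p.PATIENTID)   AS patients,
       COUNT(*)                      AS tumours
FROM SIM_AV_PATIENT AS p
JOIN SIM_AV_TUMOUR  AS t ON t.PATIENTID = p.PATIENTID
GROUP BY t.SITE_ICD10_O2_3CHAR, p.SEX
ORDER BY site, sex
"""


def count_by_site_and_sex(conn):
    """Return (site, sex, patients, tumours) rows from the linked tables."""
    return conn.execute(LINKED_COUNT_SQL).fetchall()


if __name__ == "__main__":
    # Toy rows so the example runs; these are not Simulacrum values.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE SIM_AV_PATIENT (PATIENTID INTEGER, SEX TEXT)")
    demo.execute(
        "CREATE TABLE SIM_AV_TUMOUR "
        "(TUMOURID INTEGER, PATIENTID INTEGER, SITE_ICD10_O2_3CHAR TEXT)"
    )
    demo.executemany("INSERT INTO SIM_AV_PATIENT VALUES (?, ?)",
                     [(1, "F"), (2, "M"), (3, "F")])
    demo.executemany("INSERT INTO SIM_AV_TUMOUR VALUES (?, ?, ?)",
                     [(10, 1, "C50"), (11, 1, "C50"), (12, 2, "C61"), (13, 3, "C50")])
    for row in count_by_site_and_sex(demo):
        print(row)
```

Keeping the SQL in a single string, separate from the Python that runs it, makes it easier to reuse the same query text elsewhere once it has been developed against the Simulacrum.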

3. Interpreting your results

It’s important to remember that the Simulacrum is synthetic, so results from analyses will not always accurately reflect the real data. The more complex the query, the more approximate the Simulacrum results are. The Simulacrum should therefore not be used on its own to make inferences or clinical decisions.

Please refer to the Limitations page and User Guide to understand this in more detail. 

Instead, it can be used to:

  • Learn about the format and structure of the datasets in the Cancer Analysis System (CAS)
  • Develop and test code to analyse complex cohorts of cancer patients
  • Calculate direct results, which are highly reliable for simple queries and indicative for more complex ones, to get a picture of what real results might look like
  • Plan analysis and research before requesting data from NDRS

Once you have refined your queries using the Simulacrum, and with the right approvals, you can request that your queries be run on the real NDRS data.

For guidance on how to request that Simulacrum code be run on the real data, please see the NCRAS guide to using the Simulacrum. You can also contact us at simulacrumdata@healthdatainsight.org.uk or via our Contact page for further help.

4. Publishing results

You are welcome to publish research based on the synthetic data available in the Simulacrum. By doing so, you accept that your published results are based on synthetic data, which have limitations and should not be used to make clinical decisions. We recommend that you check whether your results are representative of the real data by requesting that your queries be run on the real data.

Anyone publishing results obtained using the Simulacrum should acknowledge its use. Please visit our Citations page for more information.