Core Diabetes Dataset

This documentation defines the core diabetes dataset, including codelists and implementation rules for deriving its components, alongside code to extract these in a reproducible way. It also contains information on related projects.

 

Getting started

Use the navigation on the left to jump directly to the parts of the dataset you need. Start with the core dataset overview to understand variables and derivations, explore the Exeter 10,000 project for cohort-specific information, and review the codelists section for the validated clinical code groups used throughout the framework.

 

Core dataset

The core diabetes dataset defines a standardised set of variables describing diabetes diagnosis, classification, treatment, and key outcomes, derived from routinely collected health data. It is designed to support consistent, reproducible analyses across studies by providing clear variable definitions, validated codelists, and explicit implementation rules for data extraction and derivation. This dataset was agreed by expert consensus with input from patients and the public. It was developed as part of a 2025-2026 NHS-funded driver project.

The core dataset can be found here: Core diabetes dataset

 

Exeter 10,000

The Exeter 10,000 dataset can be found here: Exeter 10,000

 

Codelists

This section provides codelists, algorithms, and implementation rules for defining variables in routine primary and secondary care data (SNOMED, dm+d, ICD-10, and OPCS-4). They cover the components of the core diabetes dataset as well as other variables.
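
As a rough illustration of how a codelist might be applied to coded records, the sketch below finds each patient's earliest record with a matching code. The record layout, the function name `earliest_match_per_patient`, and the three-code example codelist are assumptions made for illustration only; they are not the project's validated codelists or its extraction code.

```python
# Illustrative codelist only: a small set of SNOMED CT concept IDs,
# NOT the validated codelist published with the core dataset.
EXAMPLE_DIABETES_CODELIST = {
    "73211009",  # Diabetes mellitus
    "46635009",  # Type 1 diabetes mellitus
    "44054006",  # Type 2 diabetes mellitus
}


def earliest_match_per_patient(records, codelist):
    """Return {patient_id: earliest date} over records whose code is in codelist.

    `records` is assumed to be an iterable of (patient_id, code, date) tuples,
    with dates as ISO 8601 strings (YYYY-MM-DD) so that string comparison
    matches chronological order.
    """
    earliest = {}
    for patient_id, code, date in records:
        if code not in codelist:
            continue
        if patient_id not in earliest or date < earliest[patient_id]:
            earliest[patient_id] = date
    return earliest


# Hypothetical records for three patients.
records = [
    ("p1", "44054006", "2019-03-02"),
    ("p1", "73211009", "2018-11-20"),
    ("p2", "195967001", "2020-01-15"),  # a code outside the example codelist
    ("p3", "46635009", "2005-06-30"),
]

result = earliest_match_per_patient(records, EXAMPLE_DIABETES_CODELIST)
print(result)
```

Real derivations typically involve more than a simple code match, which is why the implementation rules are documented alongside each codelist.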

 
