Introduction

This documentation defines the core diabetes dataset, including codelists and implementation rules for deriving its components, alongside code to extract these in a reproducible way. It also contains information on related projects.

Core diabetes dataset

The core diabetes dataset defines a standardised set of variables describing diabetes diagnosis, classification, treatment, and key outcomes, derived from routinely collected health data. It is designed to support consistent, reproducible analyses across studies by providing clear variable definitions, validated codelists, and explicit implementation rules for data extraction and derivation. This dataset was agreed by expert consensus with input from patients and the public. It was developed as part of a 2025-2026 NHS-funded driver project.

The core diabetes dataset is available here: Core diabetes dataset

Exeter 10,000

The Exeter 10,000 dataset is available here: Exeter 10,000

Codelists

Codelists and algorithms for defining variables in routine primary and secondary care data (SNOMED, dm+d, ICD-10 and OPCS-4) are provided, together with implementation rules, for the components of the core diabetes dataset and for other variables.
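
As a rough illustration of how a codelist and an implementation rule combine to define a variable, the sketch below flags each patient's earliest record whose code appears in a codelist. Everything in it is an assumption made for illustration: the codes are placeholder strings rather than real SNOMED, dm+d, ICD-10 or OPCS-4 codes, the record layout is invented, and the "earliest matching code" rule is a generic example, not one of this project's implementation rules.

```python
from datetime import date

# Hypothetical codelist: placeholder strings, NOT real clinical codes.
example_codelist = {"EXAMPLE_CODE_1", "EXAMPLE_CODE_2"}

# Hypothetical coded events: (patient_id, code, event_date).
events = [
    (1, "EXAMPLE_CODE_1", date(2019, 5, 2)),
    (1, "UNRELATED_CODE", date(2018, 1, 10)),
    (2, "UNRELATED_CODE", date(2020, 3, 15)),
    (3, "EXAMPLE_CODE_2", date(2021, 7, 1)),
    (3, "EXAMPLE_CODE_1", date(2017, 11, 30)),
]


def earliest_matching_event(events, codelist):
    """Return {patient_id: earliest date of an event whose code is in codelist}.

    Patients with no matching code are absent from the result.
    """
    earliest = {}
    for patient_id, code, event_date in events:
        if code not in codelist:
            continue  # the codelist decides which records count
        if patient_id not in earliest or event_date < earliest[patient_id]:
            earliest[patient_id] = event_date  # the rule: keep the earliest date
    return earliest


first_matching_date = earliest_matching_event(events, example_codelist)
```

In this sketch the codelist decides which records count and the rule decides how those records are reduced to a single value per patient. Keeping the two separate means either one can be reviewed or updated without touching the other.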