Core diabetes dataset

This page details the steps to produce the tables which comprise the core diabetes dataset in GP data. These tables can then be combined to create different diabetes cohorts.

 

Prerequisites:

  • Date of birth should be available for all patients (at minimum, year of birth; month and year preferred). Patients without this information should be excluded. Where the exact date of birth is unavailable, we have developed an algorithm for defining date of birth; however, its use may be restricted by data source–specific governance or regulatory requirements.
  • Sex/gender should be available for all patients (male/female/indeterminate or other)
  • Additional data cleaning steps depend on the data source. For CPRD, we additionally exclude patients without a valid registration date, patients aged >115 years, and patients from practices with anomalous mortality rates.
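The date-of-birth and sex/gender prerequisites can be sketched as a simple eligibility filter. This is a minimal illustration only: the `Patient` record and its field names are hypothetical, and the source-specific cleaning (for example the CPRD exclusions) is not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical field names; real column names depend on the data source.
    patient_id: str
    year_of_birth: Optional[int]   # minimum birth information required
    month_of_birth: Optional[int]  # preferred, but may be missing
    sex: Optional[str]             # e.g. "male", "female", "indeterminate", "other"


def meets_prerequisites(p: Patient) -> bool:
    """Keep patients with at least a year of birth and a recorded sex/gender."""
    return p.year_of_birth is not None and p.sex is not None


# Invented example rows, not real data.
patients = [
    Patient("p1", 1960, 5, "female"),
    Patient("p2", None, None, "male"),   # no year of birth: excluded
    Patient("p3", 1975, None, None),     # no sex/gender: excluded
    Patient("p4", 1982, None, "other"),  # year of birth only is acceptable
]

eligible = [p for p in patients if meets_prerequisites(p)]
```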

 

Additional considerations:

  • The earliest codes for conditions such as diabetes may be stored in either the 'problem' or the 'observation' table, depending on the data source, so both should be checked when looking for a patient's earliest code.
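One way to handle this is to pool the candidate tables before taking the earliest date per patient. The sketch below assumes each table has already been reduced to (patient identifier, code date) pairs; the variable names and rows are invented for illustration.

```python
from datetime import date

# Hypothetical extracts of clean diabetes codes from two source tables.
observation_rows = [("p1", date(2012, 3, 1)), ("p2", date(2015, 7, 9))]
problem_rows = [("p1", date(2010, 11, 20)), ("p3", date(2018, 1, 5))]


def earliest_per_patient(*tables):
    """Return {patient_id: earliest code date} across all supplied tables."""
    earliest = {}
    for table in tables:
        for patient_id, code_date in table:
            if patient_id not in earliest or code_date < earliest[patient_id]:
                earliest[patient_id] = code_date
    return earliest


earliest_code = earliest_per_patient(observation_rows, problem_rows)
```

Here patient `p1` takes the earlier date from the 'problem' rows, which would be missed if only the 'observation' rows were searched.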

 

Steps. Each step below gives, in order: description, output, and summary statistics for quality checking.
Step 1
  • Description: Create a table of all clean instances of all diabetes SNOMED codes.
  • Output: 1 table with multiple rows per patient, one row per clean instance of a diabetes SNOMED code, including patient identifier, date of diabetes code observation, and type of diabetes code (type 1, type 2, unspecified diabetes type).
  • Summary statistics for quality checking: Clean diabetes code dates: min, P25, P50, P75, max, mode. Number of unique patients with a clean diabetes SNOMED code and % of patients in dataset.
Step 2
  • Description: Find the earliest clean diabetes code per patient.
  • Output: 1 table with 1 row per patient and date of earliest clean code observation.
  • Summary statistics for quality checking: Distribution of earliest code dates: min, P25, P50, P75, max, mode.
Step 3
  • Description: Create a table of all clean HbA1c measurements in mmol/mol.
  • Output: 1 table with multiple rows per patient, one row per cleaned HbA1c measurement, including patient identifier, date of result and value of result. Cleaning includes combining multiple values recorded on the same day, so there should not be multiple rows with the same date and patient identifier.
  • Summary statistics for quality checking: Clean HbA1c values: min, P25, P50, P75, max, mode. Clean HbA1c dates: min, P25, P50, P75, max, mode.
Step 4
  • Description: Create tables of all clean insulin and non-insulin glucose-lowering medication prescriptions.
  • Output: 2 tables (1 for insulin and 1 for non-insulin glucose-lowering medications) with multiple rows per patient, one row per clean instance of a dm+d code, including patient identifier, date of prescription, drug class and drug substance, and any available information relating to drug substance, quantity and dose. There may be multiple rows with the same date and patient identifier if the patient has multiple prescriptions for insulin or non-insulin glucose-lowering medications on the same day.
  • Summary statistics for quality checking: Clean insulin dates: min, P25, P50, P75, max, mode. Clean non-insulin glucose-lowering medication dates: min, P25, P50, P75, max, mode. Number of unique patients with a clean insulin or non-insulin glucose-lowering medication prescription and % of patients in dataset.
Step 5
  • Description: Find the earliest HbA1c>=48 mmol/mol per patient.
  • Output: 1 table with 1 row per patient and date of earliest clean HbA1c>=48 mmol/mol.
  • Summary statistics for quality checking: Earliest code dates: min, P25, P50, P75, max, mode. Number of unique patients with HbA1c>=48 mmol/mol and % of patients in dataset.
Step 6
  • Description: Find the earliest insulin and non-insulin glucose-lowering medication prescription per patient.
  • Output: Table with 1 row per patient and date of earliest prescription.
  • Summary statistics for quality checking: Earliest insulin code dates: min, P25, P50, P75, max, mode. Earliest non-insulin glucose-lowering medication dates: min, P25, P50, P75, max, mode.
Step 7
  • Description: Find diabetes diagnosis dates: the earliest of diabetes code (step 2), HbA1c>=48 mmol/mol (step 5) and glucose-lowering medication prescription (step 6).
  • Output: Table with 1 row per patient and date of earliest diabetes code, HbA1c>=48 mmol/mol and glucose-lowering medication prescription, and the overall earliest of these dates.
  • Summary statistics for quality checking: Of patients with a clean diabetes code: % with any HbA1c>=48 mmol/mol; % with earliest HbA1c>=48 mmol/mol earlier than earliest diabetes code; % with any glucose-lowering medication prescription; % with earliest glucose-lowering medication prescription earlier than earliest diabetes code; % with any HbA1c>=48 mmol/mol or glucose-lowering medication prescription; % with earliest HbA1c>=48 mmol/mol or earliest glucose-lowering medication prescription earlier than earliest diabetes code.
Step 8
  • Description: Find counts of type 1-specific and type 2-specific diabetes codes per patient using the clean table from step 1.
  • Output: Table with 1 row per patient and counts of type 1-specific diabetes SNOMED codes and counts of type 2-specific diabetes SNOMED codes.
  • Summary statistics for quality checking: N/A
Step 9
  • Description: Determine patient diabetes type using insulin use (step 4) and type 1 and type 2 code counts (step 8) as per our algorithm.
  • Output: Table with 1 row per patient and assigned diabetes type: 'type 1', 'type 2' or 'unclassified'.
  • Summary statistics for quality checking: Of those with a diabetes code: % type 1, % type 2, % unclassified.
Step 10
  • Description: For each of the following variables, create a table of all clean instances of the relevant SNOMED codes:
      • Ethnicity
      • BMI
      • Weight
      • Height
      • Systolic blood pressure (SBP)
      • Diastolic blood pressure (DBP)
      • Total cholesterol
      • HDL cholesterol
      • Triglycerides
      • Alanine aminotransferase (ALT)
      • Creatinine
      • Urine albumin/creatinine ratio (ACR)
      • Urine albumin
      • Urine creatinine
      • Retinopathy
      • Foot complication
      • Amputation
      • Myocardial infarction
      • Heart failure
      • Stroke
      • Angina
      • Peripheral arterial disease
      • Peripheral arterial revascularisation
      • Transient ischaemic attack
      • Hypertension
      • Atrial fibrillation
      • Ischaemic heart disease
      • Coronary revascularisation
      • Chronic kidney disease stage 5
      • Diabetic ketoacidosis/hyperosmolar hyperglycaemic state
      • Alcohol
      • Smoking
      • CGM prescription
      • Lipid lowering medication prescription
      • Blood-pressure lowering medication prescription
      • Anti-platelet therapy prescription
  • Output: For each variable, a table with multiple rows per patient including patient identifier, date, severity of code (retinopathy, foot complication, amputation only) and value of result (BMI, weight, height, SBP, DBP, total cholesterol, HDL, triglycerides, ALT, creatinine, urine albumin/creatinine/ACR only).
  • Summary statistics for quality checking: For all variables, clean code dates: min, P25, P50, P75, max, mode. For measured values (BMI, weight, height, SBP, DBP, total cholesterol, HDL, triglycerides, ALT, creatinine, urine albumin/creatinine/ACR), clean values: min, P25, P50, P75, max, mode.
Step 11
  • Description: Define ethnicity using our algorithm.
  • Output: Table with 1 row per patient and ethnicity.
  • Summary statistics for quality checking: % of all patients in each of the 5-category and 11-category ethnicities, and % with missing ethnicity.
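As an illustration of how steps 2, 5, 6 and 7 fit together, the sketch below derives the three earliest dates, the overall diagnosis date, and two of the quality-check statistics. It is a minimal sketch under several assumptions: the clean tables are already available as in-memory tuples, insulin and non-insulin prescriptions are pooled into one list for brevity, all rows are invented, and the percentile method (`statistics.quantiles` with `method="inclusive"`) is one possible choice rather than a prescribed one.

```python
from collections import Counter
from datetime import date
from statistics import quantiles

HBA1C_THRESHOLD = 48.0  # mmol/mol, the cut-off used in step 5


def earliest(rows):
    """Return {patient_id: earliest date} from (patient_id, date) pairs."""
    out = {}
    for patient_id, d in rows:
        out[patient_id] = min(d, out.get(patient_id, d))
    return out


def summarise_dates(dates):
    """min, P25, P50, P75, max and mode of a collection of dates (needs >= 2 dates)."""
    ordinals = sorted(d.toordinal() for d in dates)
    p25, p50, p75 = (date.fromordinal(round(q))
                     for q in quantiles(ordinals, n=4, method="inclusive"))
    mode = date.fromordinal(Counter(ordinals).most_common(1)[0][0])
    return {"min": date.fromordinal(ordinals[0]), "P25": p25, "P50": p50,
            "P75": p75, "max": date.fromordinal(ordinals[-1]), "mode": mode}


# Invented clean tables (steps 1, 3 and 4), not real data.
diabetes_codes = [("p1", date(2015, 6, 1)), ("p1", date(2016, 1, 10)),
                  ("p2", date(2018, 3, 15)), ("p3", date(2020, 9, 1))]
hba1c = [("p1", date(2014, 12, 1), 52.0), ("p1", date(2013, 5, 5), 41.0),
         ("p2", date(2019, 1, 1), 60.0)]
prescriptions = [("p2", date(2018, 4, 1)), ("p3", date(2020, 8, 20))]

earliest_code = earliest(diabetes_codes)                       # step 2
earliest_hba1c = earliest((p, d) for p, d, value in hba1c
                          if value >= HBA1C_THRESHOLD)         # step 5
earliest_rx = earliest(prescriptions)                          # step 6

# Step 7: overall earliest of whichever of the three dates exist per patient.
sources = (earliest_code, earliest_hba1c, earliest_rx)
diagnosis_date = {
    pid: min(s[pid] for s in sources if pid in s)
    for pid in set().union(*sources)
}

# Quality check: of patients with a clean diabetes code, the % whose earliest
# HbA1c>=48 mmol/mol precedes their earliest diabetes code.
with_code = list(earliest_code)
pct_hba1c_first = 100 * sum(
    1 for p in with_code
    if p in earliest_hba1c and earliest_hba1c[p] < earliest_code[p]
) / len(with_code)

code_date_summary = summarise_dates(earliest_code.values())
```

Dates are converted to ordinals so that percentiles can be interpolated numerically. When every date is unique the mode is arbitrary, so it is mainly informative for spotting spikes such as default or placeholder dates.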