Abstract
Background: Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous group of lung conditions that are challenging to diagnose and treat. Identifying phenotypes of patients with declining lung function may allow earlier intervention and improve disease management. We characterized patients with the “fast decliner” phenotype, determined its reproducibility, and predicted lung function decline after COPD diagnosis.
Methods: A prospective, 4-year observational study applying machine learning tools to identify COPD phenotypes among 13,260 patients from the UK Royal College of General Practitioners Research and Surveillance Centre (RCGP RSC) database. The phenotypes were identified before diagnosis (training dataset), and their reproducibility was assessed after COPD diagnosis (validation dataset).
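The abstract does not name the clustering algorithm or say how reproducibility was scored (the keywords mention cluster analysis and ensemble models). As a rough illustration of the idea of deriving phenotypes on one dataset and checking them on another, the sketch below assumes plain k-means with k = 3 on standardised numeric features, deterministic farthest-first seeding, and an agreement score taken under the best one-to-one matching of cluster labels. The function names and the synthetic data are hypothetical; this is not the authors' pipeline, and its score is not the 80% accuracy reported in the Results.

```python
import math
from itertools import permutations


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def farthest_first_init(points, k):
    """Assumed seeding: start at the first point, then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def best_agreement(labels_a, labels_b, k):
    """Share of points whose labels agree under the best one-to-one
    relabelling of `labels_b` (cluster numbering is arbitrary)."""
    best = 0.0
    for perm in permutations(range(k)):
        hits = sum(a == perm[b] for a, b in zip(labels_a, labels_b))
        best = max(best, hits / len(labels_a))
    return best


def reproducibility(train, valid, k=3):
    """Hypothetical check: cluster each dataset independently, then compare
    the phenotypes transferred from the training centroids with the
    validation dataset's own clustering."""
    train_centroids, _ = kmeans(train, k)
    _, valid_labels = kmeans(valid, k)
    transferred = [nearest(p, train_centroids) for p in valid]
    return best_agreement(transferred, valid_labels, k)


# Synthetic, well-separated groups of two features (illustrative only).
offsets = [(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
train = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]
valid = [(cx + dx + 0.1, cy + dy - 0.1) for cx, cy in centres for dx, dy in offsets]
print(reproducibility(train, valid))  # 1.0 for these cleanly separated groups
```

Trying every label permutation is only practical for a handful of clusters; for larger k, an assignment solver such as the Hungarian algorithm performs the same matching more efficiently.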
Results: Three COPD phenotypes were identified. The most common was the “fast decliner”, characterized by younger patients with the fewest COPD exacerbations and better lung function, yet a fast decline in lung function as the number of exacerbations increased. The other two phenotypes were characterized by (a) patients with the highest prevalence of severe COPD and (b) older patients, mostly male, with the highest prevalence of diabetes, cardiovascular comorbidities, and hypertension. These phenotypes were reproduced in the validation dataset with 80% accuracy. Gender, COPD severity, and exacerbations were the most important risk factors for lung function decline in the most common phenotype.
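The abstract defines the “fast decliner” phenotype by its rate of lung function loss but does not say how that rate was estimated. One simple way to express such a rate, sketched here under the assumption that lung function is recorded as FEV1 in mL at times measured in years, is the per-patient least-squares slope. The function names and the 50 mL/year cut-off are hypothetical and for illustration only.

```python
def annual_decline(times_years, fev1_ml):
    """Ordinary least-squares slope of FEV1 (mL) against time (years),
    returned as a positive number when lung function is falling."""
    n = len(times_years)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = sum(times_years) / n
    mean_y = sum(fev1_ml) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_years, fev1_ml))
    return -sxy / sxx  # mL lost per year


def is_fast_decliner(times_years, fev1_ml, threshold_ml_per_year):
    """Hypothetical rule: flag a patient whose estimated annual decline
    exceeds the chosen threshold."""
    return annual_decline(times_years, fev1_ml) > threshold_ml_per_year


# Illustrative yearly FEV1 readings for one hypothetical patient.
years = [0, 1, 2, 3]
fev1 = [2500, 2440, 2390, 2320]
print(annual_decline(years, fev1))        # 59.0
print(is_fast_decliner(years, fev1, 50))  # True (50 mL/year is an arbitrary cut-off)
```

A per-patient slope ignores measurement noise and uneven follow-up; longitudinal analyses often use mixed-effects models instead, so this is a starting point rather than the study's method.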
Conclusions: In this study, three COPD phenotypes were identified before patients were diagnosed with COPD. The reproducibility of these phenotypes in a blind dataset after COPD diagnosis suggests their generalizability across different populations.
Field | Value |
---|---|
Original language | English |
Article number | e000980 |
Number of pages | 11 |
Journal | BMJ Open Respiratory Research |
Volume | 8 |
Issue number | 1 |
Early online date | 29 Oct 2021 |
DOIs | |
Publication status | Published - 31 Jan 2022 |
Bibliographical note
Acknowledgements: We acknowledge the patients who allowed their data to be used for surveillance and research; the practices that have agreed to be part of the RCGP RSC and allow us to extract and use health data for surveillance and research; Ms. Filipa Ferreira from the RCGP and Mr. Julian Sherlock from the University of Surrey; Apollo Medical Systems for data extraction; the CMR suppliers EMIS, TPP, In-Practice and Micro-test for their collaboration in facilitating data extraction; and colleagues at Public Health England.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Keywords
- fast decliner phenotype
- machine learning
- cluster analysis
- ensemble models