Biomarker Discovery in Parkinson’s disease using Machine Learning on Public Multi-omic Datasets: A Pilot Study

M. Makarious, H. Iwaki, C. Blauwendraat, H. Leonard, S. Hashemi, J. Kim, K. Van Keuren-Jensen, D. Craig, E. Appelmans, L. Smolensky, M. Bookman, A. Singleton, F. Faghri, M. Nalls (Bethesda, MD, USA)

Meeting: MDS Virtual Congress 2020

Abstract Number: 491

Keywords: Scales

Category: Parkinson's Disease: Genetics

Objective: In this study, we use machine learning (ML) as a framework for biomarker studies to assess whether combining multiple data modalities in a single model outperforms models built on any one modality.

Background: Parkinson’s disease (PD) is a complex, progressive disorder in which rare and common genetic variants contribute to the risk, onset, and progression of disease. Given the long latency between the damage to dopaminergic cells and the onset of clinical symptoms, there is an increasing need to identify reliable biomarkers that can predict 1) disease onset, 2) disease progression, or 3) response to therapeutic interventions. Currently, most biomarker studies focus on only a handful of features from a single assay. Preliminary results show that multimodal data improve discrimination between cases and controls.

Method: Using public datasets such as the Parkinson’s Progression Markers Initiative (PPMI) via the Accelerating Medicines Partnership – Parkinson’s Disease (AMP-PD), we developed an automated ML tool (GenoML) that applies different ML algorithms to genetic, clinical, and transcriptomic data, separately and combined, to assess the accuracy, sensitivity, and specificity of predictive models. The dataset included 872 samples that had sequenced genomes, clinical data, and ~50K normalized transcripts from RNA sequencing.
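The "separately and combined" comparison implies a step in which per-sample features from each modality are joined into one feature table. The sketch below is a minimal illustration of that idea, not GenoML's implementation: the function `combine_modalities`, the sample IDs, and the feature names (`gen_score`, `clin_age`, `rna_geneA`) are all hypothetical, and it assumes that only samples present in every modality are kept.

```python
def combine_modalities(*modalities):
    """Join per-sample feature dicts, keeping only samples found in every modality.

    Each modality is a dict: sample ID -> {feature name: value}.
    Feature names are assumed to be unique across modalities.
    """
    shared_ids = set(modalities[0])
    for modality in modalities[1:]:
        shared_ids &= set(modality)

    combined = {}
    for sample_id in sorted(shared_ids):
        row = {}
        for modality in modalities:
            row.update(modality[sample_id])
        combined[sample_id] = row
    return combined


# Toy values for illustration only; none of these come from PPMI or AMP-PD.
genetic = {"s1": {"gen_score": 0.4}, "s2": {"gen_score": -1.1}, "s3": {"gen_score": 0.9}}
clinical = {"s1": {"clin_age": 61}, "s2": {"clin_age": 70}}
transcriptomic = {"s1": {"rna_geneA": 5.2}, "s2": {"rna_geneA": 3.3}, "s4": {"rna_geneA": 1.0}}

combined = combine_modalities(genetic, clinical, transcriptomic)
```

Here `s3` and `s4` are dropped because they lack at least one modality, so the combined model and the single-modality models can be compared on the same samples.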

Results: When algorithms were ranked by area under the curve (AUC) for predicting peri-diagnostic PD, genetic and transcriptomic data each performed best with XGBoost, while clinical data performed best with logistic regression. Each data type performed well on its own (clinical: AUC=85.5%; genetic: AUC=79.5%; transcriptomic: AUC=79.6%). Clinical data had the lowest sensitivity (clinical: 0.71; genetic: 0.73; transcriptomic: 0.80) but the highest specificity (clinical: 0.88; genetic: 0.69; transcriptomic: 0.83). With all three data types combined, XGBoost performed best, with AUC=89.88%, sensitivity=0.78, and specificity=0.83 in withheld testing samples.
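For readers less familiar with the three metrics reported above, the following sketch shows how they are conventionally computed from true labels and model scores. It is an illustration under stated assumptions, not the study's code: the labels and scores are made-up toy values, the 0.5 decision threshold is assumed, and AUC is computed by its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = case and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    case_scores = [s for t, s in zip(y_true, scores) if t == 1]
    control_scores = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        (c > k) + 0.5 * (c == k)
        for c in case_scores
        for k in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


# Toy labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a model can have a high AUC yet a comparatively low sensitivity.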

Conclusion: When performance is assessed in the 30% of samples held out for testing after training on the other 70%, multiple modalities implemented in the same predictive model perform best. By incorporating different modalities, we can develop more comprehensive predictive models to better understand this complex disease and identify better biomarkers.
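A 70/30 train/test split of the kind described can be sketched as follows. This is a hedged illustration only: the abstract does not say whether the split was stratified by case/control status, so stratification is an assumption here, and `stratified_split`, the fixed seed, and the toy labels are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample IDs into train and test lists, preserving class proportions.

    labels: dict mapping sample ID -> class label (e.g. 1 = case, 0 = control).
    Stratifying by class is an assumption; the abstract does not specify it.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_ids, test_ids = [], []
    for cls in sorted(set(labels.values())):
        ids = sorted(sid for sid, label in labels.items() if label == cls)
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test_ids.extend(ids[:n_test])
        train_ids.extend(ids[n_test:])
    return train_ids, test_ids


# Toy cohort: 10 cases and 10 controls (not the study's actual counts).
labels = {f"s{i}": (1 if i < 10 else 0) for i in range(20)}
train_ids, test_ids = stratified_split(labels)
```

The test samples are set aside before any model fitting, so metrics reported on them reflect performance on data the model has not seen.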

To cite this abstract in AMA style:

M. Makarious, H. Iwaki, C. Blauwendraat, H. Leonard, S. Hashemi, J. Kim, K. Van Keuren-Jensen, D. Craig, E. Appelmans, L. Smolensky, M. Bookman, A. Singleton, F. Faghri, M. Nalls. Biomarker Discovery in Parkinson’s disease using Machine Learning on Public Multi-omic Datasets: A Pilot Study [abstract]. Mov Disord. 2020; 35 (suppl 1). https://www.mdsabstracts.org/abstract/biomarker-discovery-in-parkinsons-disease-using-machine-learning-on-public-multi-omic-datasets-a-pilot-study/. Accessed April 3, 2025.

