
Table 1 Main characteristics of the reviewed articles

From: Predicting neurodevelopmental disorders using machine learning models and electronic health records – status of the field

Author

Dataset

Variables

Sample Size

Method

Results

ASD Prediction Models

 Engelhard et al. [15]

EHR from Duke University Health System

Demographics, diagnosis and procedure codes, laboratory measurements, medications, vital signs, and encounter details

N: 45,080

ASD: 924

L2-regularized Cox proportional hazards (CoxPH), gradient-boosting survival analysis and random survival forest

Autism detection (360 days):

Sensitivity: 59.8%

PPV: 17.6%

Specificity: 81.5%

 Betts et al. [16]

Health Administrative Datasets from New South Wales, Australia

Clinical, demographic, and lifestyle information, plus ICD-10-AM (Australian Modification of ICD-10) and Australian Classification of Health Interventions (ACHI) codes

N: 261,447 mother-baby dyads

ASD: 981

Logistic regression with elastic net regularization and gradient boosting trees

AUC: 0.73

 Allesøe et al. [13]

Danish nationwide registers, family and patient diagnostic histories, birth-related measurements and genetics

Psychiatric diagnosis codes, age, parent diagnosis history, parent and patient infections, autoimmune diseases, diabetes, migraine, epilepsy

N: 63,535

ASD: 12,878

ADHD: 15,969

Controls: 20,681

Feed-forward DL network

Multidiagnostic prediction model: AUC: 0.8

MCC: 0.28

 Onishchenko et al. [22]

Truven (claims), UCM (diagnostic records)

Comorbid disease categories

Truven ASD: 15,164

No ASD: 4,488,420

UCM ASD: 377

No ASD: 37,634

SLD, Long Short-Term Memory, Random Forest, Gradient Boosting

AUC: > 0.8

 Bishop et al. [23]

EHR from the Marshfield Clinic

Age at death, sex, EHR length, 30 comorbidities, co-occurring ID and Down syndrome

ASD: 91

No ASD: 6,186

Random Forest

Accuracy: 93%

Sensitivity: 75%

Specificity: 94%

AUC: 0.88

 Rahman et al. [24]

EMR from an Israeli Health Maintenance Organization

Sociodemographics, parental medical histories, prescribed medications

ASD: 1,397

No ASD: 94,741

Logistic Regression, Artificial Neural Network, Random Forest

Accuracy: 95.62%

Sensitivity: 29.93%

Specificity: 98.18%

PPV: 43.35%

 Hassan et al. [25]

National Database for Autism Research

Family and subject medical history

ASD: 2,577

Non-ASD: 410

Decision Tree

Accuracy: 89.2%

 Maenner et al. [26]

Georgia site of the ADDM Network

Words and phrases contained in children’s developmental evaluations

ASD: 1,355

Non-ASD: 1,257

Random Forest

Sensitivity: 84%

PPV: 89.4%

AUC: 0.932

 Ejlskov et al. [27]

Danish Civil Registration System & Danish Medical Birth Register

Mental, cardiometabolic, neurologic, autoimmune, asthma, and allergy conditions; congenital defects; birth weight, year, gestational age, parental age, and education

ASD: 26,840

Non-ASD: 1,670,391

Random Forest, XGBoost, Generalized Linear Model, Elastic Net, Neural Networks, SVM, KNN, Ensemble Learning

AUC: > 0.6

 Alexeeff et al. [28]

EMR and administrative claims for children in Northern California, Georgia, and the Northwest

79 medical conditions from 19 domains

ASD: 3,911

Non-ASD: 38,609

Clustering using Conditional Inference Tree

-

 Lingren et al. [29]

EHRs from Boston Children’s Hospital, Cincinnati Children’s Hospital Medical Center, and Children’s Hospital of Philadelphia

ICD-9 codes and concepts from clinical notes

ASD: 20,658

Rules, SVM, Clustering

PPV: 0.786

Sensitivity: 0.769

F1: 0.761

AUC: 0.770

 Leroy et al. [30]

EHRs from the Arizona Developmental Disabilities Surveillance Program

Extraction of entities from text

N: 6,636 sentences

Rule based and Pruned Decision Tree

ML Precision: 60%

Recall: 30%

 Chen et al. [31]

MarketScan Health Claims Database, 2005–2016

Disease CCS (Clinical Classifications Software) codes, sex, emergency department visit encounters

ASD: 12,743

Non-ASD: 25,833

Logistic Regression, Random Forest

Sensitivity: 40%

PPV: 20.5%

Specificity: 96.4%

AUC: 0.834

 Yuan et al. [32]

Handwritten semi-structured and unstructured medical forms of children

Lexical, LDA, and doc2vec features; parent and teacher questionnaires; preschool and early intervention questionnaires; phone intakes by social workers

ASD: 56

Non-ASD: 143

SVM

Accuracy: 83.4%

Precision: 64.6%

Recall: 91.1%

F2: 84.2%

 Lerthattasilp et al. [33]

Medical records from the Thammasat University Hospital

Gender, age, chief complaint, communication, birthweight, maternal & paternal age, family history of ASD or DD, caregiver education, history of the child’s ASD, and clinical observation symptoms

ASD: 104

Non-ASD: 35

Logistic Regression

AUC: 0.91

ADHD Prediction Models

 Garcia-Argibay et al. [17]

Population-based Swedish Registers

Psychiatric and somatic disorder ICD codes, sex, head circumference and weight at birth, small size for gestational age, Apgar score, number of failed subjects at school at age 16, and presence of criminal convictions

N: 238,696

ADHD: 12,893

Logistic regression, random forest, gradient boosting, XGBoost, DNN, and ensemble models

AUC: 0.75

 Shi et al. [34]

Birth cohorts of children in Olmsted County of Minnesota

ICD-9 codes

ADHD: 237

LD: 162

No ADHD: 1,194

LASSO, Elastic Net Logistic Regression, Classification And Regression Trees, Stochastic Gradient Boosting

ADHD Accuracy: 0.96

Sensitivity: 0.69

Specificity: 0.99

PPV: 0.93

 Mikolas et al. [35]

Medical records from Technical University Dresden

Age, gender, symptom ratings, neuropsychological measures

ADHD: 153

No ADHD: 139

Linear SVM

Accuracy: 66.1%

Sensitivity: 66.9%

Specificity: 65.4%

AUC: 0.66

 Chen et al. [36]

Patient records from the South West Yorkshire Partnership NHS Foundation Trust

Demographics, screening questionnaires, and clinical interviews

N: 69

SVM, Logistic Regression, Naive Bayes, Random Forest, Decision Tree, KNN

Accuracy: 85.51%

AUC: 0.871

 Caye et al. [37]

Birth cohorts from ALSPAC, E-Risk, and Pelotas; the Multimodal Treatment Study of Children with ADHD (MTA)

Female sex, socioeconomic status, mother’s depression, IQ, maltreatment, ADHD and depressive symptoms, oppositional defiant behaviour and conduct disorders, and single parenthood

ALSPAC ADHD: 5,113

E-Risk ADHD: 2,040

Pelotas ADHD: 4,039

MTA ADHD: 476

MTA No ADHD: 241

Logistic Regression, Random Forest, Stochastic Gradient Boosting, ANN

AUC: 0.82

 Elujide et al. [38]

Medical records from Yaba Psychiatry Hospital, Yaba, Lagos State, Nigeria

Age, occupation, religion, spiritual consult, age code, family, loss of parent, genetic, sex, status, injury, and divorce

N: 500

DL, Multilayer Perceptron, SVM, Random Forest, Decision Tree

Accuracy: 65%

AUC: 0.73

 Morris et al. [39]

Neurofibromatosis type 1 (NF1) clinical registry and EHR information from Washington University

Race, sex, family history of NF1, clinical features associated with NF1, and diagnosis codes

ADHD: 194

No ADHD: 384

Gradient Boosting

AUC: 0.74

 Tran et al. [40]

The CEGS N-GRID 2016 Shared Task in Clinical NLP Records

Short text descriptions of the patient’s history of present illness

ADHD: 404

No ADHD: 582

SVM, CNN and ReHAN

micro-F: 63.144%

Other NDDs Prediction Models

 Movaghar et al. [41] (Fragile X syndrome)

EHR from Marshfield Clinic health-care system

ICD-9 codes

FXS: 55

No FXS: 5,500

Random Forest

AUC: 0.772

 Koivu et al. [42] (Down syndrome)

Clinical records from three datasets: two from Canada and one from the UK

Biological measurements, ethnicity, smoking, maternal age, patient weight, gestational age, nuchal translucency (NT), PAPP-A, and fhCG measurements

Trisomy 21: 239

Controls: 1,611

SVM, Logistic Regression, Naive Bayes, Random Forest, Decision Tree, Deep Forward NN, KNN

AUC: 0.96

 Gui et al. [43] (Neurodevelopment)

Medical records & MRI scans from University Hospitals of Geneva

Gestational age, birth weight, persistent ductus arteriosus, birth asphyxia, presence of sepsis, bronchopulmonary dysplasia, parental socioeconomic status

N: 84

Linear Discriminant Analysis

AUC: 0.77

 Randolph et al. [44] (Neurodevelopment)

The NICHD NRN Generic Database (GDB)

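Cord pH, base excess (BE), gestational age (GA), birth weight (BW), 5-min Apgar score, small for gestational age (SGA), multiple gestation, race/ethnicity, maternal insurance, hypertension, hemorrhage, antibiotics, antenatal steroids (ANS)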

N: 3,979

NDI/death: 2,124

Logistic Regression

Accuracy: 70%

Sensitivity: 0.71

Specificity: 0.64

 Hill et al. [45] (Communication impairment)

Pediatric Health Information System from Children’s Hospital Association (Overland Park, KS)

Patient demographics, diagnoses, procedures, detailed pharmacy information

CI: 50

No CI: 86

Logistic Regression

Sensitivity: 82.6%

Specificity: 86.3%

AUC: 0.92

 Pruett et al. [46] (Developmental stuttering)

Vanderbilt University Medical Center (VUMC) EHR

Child- and adult-onset fluency disorder, hearing loss, sleep disorders, atopy, infection codes, neurological deficits, body weight

Stutter: 574

No Stutter: 2754

Decision Tree

PPV: >83%

 Shaw et al. [47] (Developmental stuttering)

Vanderbilt University’s EHR

Comorbid ICD-9 codes mapped to phecodes

Stutter: 141

No Stutter: 709

Classification And Regression Tree

PPV: 83%

Sensitivity: 68.8%

 Shrot et al. [48] (Intellectual development)

Edmond and Lily Safra Children’s Hospital, Sheba Medical Center EHR

Clinical and imaging data. Clinical data included genetic, demographic, and seizure characteristics

ID: 39

No ID: 38

Random Forest

Sensitivity: 0.69

Precision: 0.81

AUC: 0.68

 van Dokkum et al. [49] (Developmental delay)

Community-based cohort: Longitudinal Preterm Outcome Project (LOLLIPOP)

Perinatal and parental factors, and child growth milestones during the first two years

N: 1,983

Logistic Regression

Sensitivity: 73%

Specificity: 80%

AUC: 0.837
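The Results column mixes several threshold-based metrics (accuracy, sensitivity/recall, specificity, PPV/precision, F1, F2, MCC) alongside AUC, which is threshold-free and is not covered here. As a minimal sketch using hypothetical counts (not data from any reviewed study), the threshold-based metrics can all be derived from one confusion matrix. The `ppv_from_prevalence` helper, a hypothetical name, applies Bayes' rule to show why PPV depends on outcome prevalence, which varies widely across the reviewed cohorts (for example, 924 ASD cases among 45,080 children in Engelhard et al. versus 1,355 ASD and 1,257 non-ASD children in Maenner et al.).

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical confusion-matrix counts for a binary classifier."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def metrics(c: ConfusionCounts) -> dict:
    """Compute threshold-based metrics of the kind reported in Table 1."""
    sensitivity = c.tp / (c.tp + c.fn)  # also called recall
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)  # also called precision
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)

    def f_beta(beta: float) -> float:
        # F-beta weights recall beta times as much as precision
        b2 = beta ** 2
        return (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)

    # Matthews correlation coefficient uses all four cells
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f_beta(1.0),
        "f2": f_beta(2.0),
        "mcc": mcc,
    }


def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Bayes' rule: expected PPV of a fixed classifier at a given prevalence."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Hypothetical example: 100 cases among 1,000 children (10% prevalence)
counts = ConfusionCounts(tp=30, fp=20, tn=880, fn=70)
m = metrics(counts)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# The same sensitivity and specificity applied at 1% prevalence
ppv_rare = ppv_from_prevalence(m["sensitivity"], m["specificity"], 0.01)
print(f"PPV at 1% prevalence: {ppv_rare:.3f}")
```

In this made-up example, accuracy stays high while sensitivity is low, because most children are true negatives; this mirrors the pattern of high accuracy and low sensitivity seen in some imbalanced datasets in the table. The PPV of the same classifier also falls sharply when prevalence drops from 10% to 1%, so PPVs reported on cohorts with different case proportions are not directly comparable.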