Author | Dataset | Variables | Sample Size | Method | Results |
---|---|---|---|---|---|
ASD Prediction Models | |||||
Engelhard et al. [15] | EHR from Duke University Health System | Demographics, diagnosis and procedure codes, laboratory measurements, medications, vital signs, and encounter details | N: 45,080 ASD: 924 | L2-regularized Cox proportional hazards (CoxPH), gradient-boosting survival analysis and random survival forest | Autism detection (360 days): Sensitivity: 59.8% PPV: 17.6% Specificity: 81.5% |
Betts et al. [16] | Health Administrative Datasets from New South Wales, Australia | Clinical, demographic & lifestyle information and ICD-10-AM (Australian version of the ICD-10) and Australian Classification of Health Interventions (ACHI) codes | N: 261,447 mother-baby dyads ASD: 981 | Logistic regression with elastic net regularization and gradient boosting trees | AUC: 0.73 |
Allesøe et al. [13] | Danish nationwide registers, family and patient diagnostic histories, birth-related measurements and genetics | Psychiatric diagnosis codes, age, parent diagnosis history, parent and patient infections, autoimmune diseases, diabetes, migraine, epilepsy | N: 63,535 ASD: 12,878 ADHD: 15,969 Controls: 20,681 | Feed-forward DL network | Multidiagnostic prediction model: AUC: 0.8 MCC: 0.28 |
Onishchenko et al. [22] | Truven (claims), UCM (diagnostic records) | Comorbid disease categories | Truven ASD: 15,164 No ASD: 4,488,420 UCM ASD: 377 No ASD: 37,634 | SLD, Long Short-Term Memory, Random Forest, Gradient Boosting | AUC: > 0.8 |
Bishop et al. [23] | EHR from the Marshfield Clinic | Age at death, sex, EHR length, 30 comorbidities, co-occurring ID and Down syndrome | ASD: 91 No ASD: 6,186 | Random Forest | Accuracy: 93% Sensitivity: 75% Specificity: 94% AUC: 0.88 |
Rahman et al. [24] | EMR from an Israeli Health Maintenance Organization | Sociodemographics, parental medical histories, prescribed medications | ASD: 1,397 No ASD: 94,741 | Logistic Regression, Artificial Neural Network, Random Forest | Accuracy: 95.62% Sensitivity: 29.93% Specificity: 98.18% PPV: 43.35% |
Hassan et al. [25] | National Database for Autism Research | Family and subject medical history | ASD: 2,577 Non-ASD: 410 | Decision Tree | Accuracy: 89.2% |
Maenner et al. [26] | Georgia site of the ADDM Network | Words and phrases contained in children’s developmental evaluations | ASD: 1,355 Non-ASD: 1,257 | Random Forest | Sensitivity: 84%, PPV: 89.4% AUC: 0.932 |
Ejlskov et al. [27] | Danish Civil Registration System & Danish Medical Birth Register | Mental, cardiometabolic, neurologic, congenital defects, autoimmune, asthma, allergy conditions, birth weight, year, gestational age, parental age, education | ASD: 26,840 Non-ASD: 1,670,391 | Random Forest, XGB, Generalized Linear Model, Elastic Net, Neural Networks, SVM, KNN, Ensemble Learning | AUC: > 0.6 |
Alexeeff et al. [28] | EMR and administrative claims for children in Northern California, Georgia, and the Northwest | 79 medical conditions from 19 domains | ASD: 3,911 Non-ASD: 38,609 | Clustering using Conditional Inference Tree | - |
Lingren et al. [29] | EHR from Boston Children’s Hospital, Cincinnati Children’s Hospital and Medical Center & Children’s Hospital of Philadelphia | ICD-9 codes and concepts from clinical notes | ASD: 20,658 | Rules, SVM, Clustering | PPV: 0.786 Sensitivity: 0.769 F1: 0.761 AUC: 0.770 |
Leroy et al. [30] | EHRs from the Arizona Developmental Disabilities Surveillance Program | Extraction of entities from text | N: 6,636 sentences | Rule based and Pruned Decision Tree | ML Precision: 60% Recall: 30% |
Chen et al. [31] | MarketScan Health Claims Database 2005–2016 | Disease CCS codes, sex, encounters of emergency department visits | ASD: 12,743 Non-ASD: 25,833 | Logistic Regression, Random Forest | Sensitivity: 40% PPV: 20.5% Specificity: 96.4% AUC: 0.834 |
Yuan et al. [32] | Handwritten semi-structured and unstructured medical forms of children | Lexical, LDA & doc2vec features, parent and teacher, preschool and early intervention questionnaires, phone intake by social workers | ASD: 56 Non-ASD: 143 | SVM | Accuracy: 83.4% Precision: 64.6% Recall: 91.1% F2: 84.2% |
Lerthattasilp et al. [33] | Medical records from the Thammasat University Hospital | Gender, age, chief complaint, communication, birthweight, maternal & paternal age, family history of ASD or DD, caregiver education, history of the child’s ASD, and clinical observation symptoms | ASD: 104 Non-ASD: 35 | Logistic Regression | AUC: 91% |
ADHD Prediction Models | |||||
Garcia-Argibay et al. [17] | Population-based Swedish Registers | Psychiatric and somatic disorder ICD codes, sex, head circumference and weight at birth, small size for gestational age, Apgar score, number of failed subjects at school at age 16, and presence of criminal convictions | N: 238,696 ADHD: 12,893 | Logistic regression, random forest, gradient boosting, XGBoost, DNN and ensemble models | AUC: 0.75 |
Shi et al. [34] | Birth cohorts of children in Olmsted County of Minnesota | ICD-9 codes | ADHD: 237 LD: 162 No ADHD: 1,194 | LASSO, Elastic Net Logistic Regression, Classification And Regression Trees, Stochastic Gradient Boosting | ADHD Accuracy: 0.96 Sensitivity: 0.69 Specificity: 0.99 PPV: 0.93 |
Mikolas et al. [35] | Medical records from Technical University Dresden | Age, gender, symptom ratings, neuropsychological measures | ADHD: 153 No ADHD: 139 | Linear SVM | Accuracy: 66.1% Sensitivity: 66.9% Specificity: 65.4% AUC: 0.66 |
Chen et al. [36] | Patient records from the South West Yorkshire Partnership NHS Foundation Trust | Demographics, screening questionnaires and clinical interviews | N: 69 | SVM, Logistic Regression, Naive Bayes, Random Forest, Decision Tree, KNN | Accuracy: 85.51% AUC: 0.871 |
Caye et al. [37] | Birth cohorts from ALSPAC, E-Risk, and Pelotas; the Multimodal Treatment Study of Children with ADHD (MTA) | Female sex, socioeconomic status, mother’s depression, IQ, maltreatment, ADHD, depressive symptoms, oppositional defiant behaviour & conduct disorders, and single parent | ALSPAC ADHD: 5,113 E-Risk ADHD: 2,040 Pelotas ADHD: 4,039 MTA ADHD: 476 No ADHD: 241 | Logistic Regression, Random Forest, Stochastic Gradient Boosting, ANN | AUC: 0.82 |
Elujide et al. [38] | Medical records from Yaba Psychiatry Hospital, Yaba, Lagos State, Nigeria | Age, occupation, religion, spiritual consult, age code, family, loss of parent, genetic, sex, status, injury & divorce | N: 500 | DL, Multi Layer Perceptron, SVM, Random Forest, Decision Tree | Accuracy: 65% AUC: 0.73 |
Morris et al. [39] | Neurofibromatosis type 1 (NF1) clinical registry and EHR information from Washington University | Race, sex, family history of NF1, clinical features associated with NF1, and diagnosis codes | ADHD: 194 No ADHD: 384 | Gradient Boosting | AUC: 0.74 |
Tran et al. [40] | The CEGS N-GRID 2016 Shared Task in Clinical NLP Records | Short text description of patient’s history of present illness | ADHD: 404 No ADHD: 582 | SVM, CNN and ReHAN | micro-F: 63.144% |
Other NDDs Prediction Models | |||||
Movaghar et al. [41] (Fragile X syndrome) | EHR from Marshfield Clinic health-care system | ICD-9 codes | FXS: 55 No FXS: 5,500 | Random Forest | AUC: 0.772 |
Koivu et al. [42] (Down syndrome) | Clinical records of two datasets from Canada and one from the UK | Biological measurements, ethnicity, smoking, maternal age, patient weight, gestational age, NT, PAPP-A and fhCG measurements | Trisomy 21: 239 Controls: 1,611 | SVM, Logistic Regression, Naive Bayes, Random Forest, Decision Tree, Deep Forward NN, KNN | AUC: 0.96 |
Gui et al. [43] (Neurodevelopment) | Medical records & MRI scans from University Hospitals of Geneva | Gestational Age, Birth Weight, Persistent ductus arteriosus, Birth asphyxia, presence of sepsis, Bronchopulmonary dysplasia, parental socioeconomic status | N: 84 | Linear Discriminant Analysis | AUC: 0.77 |
Randolph et al. [44] (Neurodevelopment) | The NICHD NRN Generic Database (GDB) | Cord pH, BE, GA, BW, 5-min Apgar, SGA, multiple gestation, race/ethnicity, maternal insurance, hypertension, hemorrhage, antibiotics, ANS | N: 3,979 NDI/death: 2,124 | Logistic Regression | Accuracy: 70% Sensitivity: 0.71 Specificity: 0.64 |
Hill et al. [45] (Communication impairment) | Pediatric Health Information System from Children’s Hospital Association (Overland Park, KS) | Patient demographics, diagnoses, procedures, detailed pharmacy information | CI: 50, No CI: 86 | Logistic Regression | Sensitivity: 82.6% Specificity: 86.3% AUC: 0.92 |
Pruett et al. [46] (Developmental stuttering) | Vanderbilt University Medical Center (VUMC) EHR | Child, adult onset fluency disorder, hearing loss, sleep disorders, atopy, codes for infections, neurological deficits, body weight | Stutter: 574 No Stutter: 2,754 | Decision Tree | PPV: > 83% |
Shaw et al. [47] (Developmental stuttering) | Vanderbilt University’s EHR | Comorbid ICD-9 codes mapped to phecodes | Stutter: 141 No Stutter: 709 | Classification And Regression Tree | PPV: 83% Sensitivity: 68.8% |
Shrot et al. [48] (Intellectual development) | Edmond and Lily Safra Children’s Hospital, Sheba Medical Center EHR | Clinical and imaging data. Clinical data included genetic, demographic, and seizure characteristics | ID: 39 No ID: 38 | Random Forest | Sensitivity: 0.69 Precision: 0.81 AUC: 0.68 |
van Dokkum et al. [49] (Developmental delay) | Community-based cohort Longitudinal Preterm Outcome Project (LOLLIPOP) | Perinatal, parental factors and child growth milestones during the first two years | N: 1,983 | Logistic Regression | Sensitivity: 73% Specificity: 80% AUC: 0.837 |