Date of Award
Fall 12-4-2025
Document Type
Thesis
Publication Status
Version of Record
Submission Date
December 2025
Department
Physics
College Granting Degree
Charles E. Schmidt College of Science
Degree Name
Master of Science (MS)
Thesis/Dissertation Advisor [Chair]
Wazir Muhammad
Abstract
Objective: The primary goal of this research is to develop and test a predictive model for risk stratification of Head and Neck Cancer (HNC) based on Artificial Neural Networks (ANNs) trained on routinely available clinical, lifestyle, and demographic data. To assess generalizability, temporal stability, and potential clinical applicability, the study evaluates the model’s performance across several datasets and validation strategies.
Methods: A feedforward multilayer Artificial Neural Network (ANN) with 32 input features, two hidden layers of 12 neurons each, and one output neuron (a 32-12-12-1 architecture) was designed and evaluated using large-scale public health data. Input variables covered demographic characteristics (e.g., gender, ethnicity, occupation), lifestyle exposures (e.g., smoking status, alcohol consumption), and clinical variables (e.g., family history of cancer and medical comorbidities). The training data combined the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) and the National Health Interview Survey (NHIS, 1997-2023), encompassing 821,545 individuals and 1,292 HNC cases. The model was validated with a multi-tier strategy comprising k-fold cross-validation, domain transfer validation, and pure temporal validation over 26 years, complemented by interpretability analysis using SHAP (SHapley Additive exPlanations).
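The 32-12-12-1 architecture described above can be sketched as a plain NumPy forward pass. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not give the activation functions, weight initialization, or framework, so the ReLU hidden layers, sigmoid output, He-style random weights, and the helper names `init_params`, `count_params`, and `forward` are all hypothetical.

```python
import numpy as np

# Layer widths reported in the abstract: 32 inputs, two hidden layers of 12, one output.
LAYER_SIZES = [32, 12, 12, 1]


def init_params(sizes, seed=0):
    """Create (W, b) pairs per layer; He-style normal weights and zero biases are assumptions."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)


def forward(params, X):
    """Feedforward pass: ReLU hidden layers (assumed), sigmoid output as a risk score in (0, 1)."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W_out, b_out = params[-1]
    logits = h @ W_out + b_out            # shape (n_samples, 1)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs.ravel()                  # shape (n_samples,)


params = init_params(LAYER_SIZES)
print(count_params(params))               # 565
X = np.random.default_rng(1).normal(size=(5, 32))  # 5 synthetic individuals, 32 features
print(forward(params, X).shape)           # (5,)
```

With these layer widths the network has 565 trainable parameters (396 + 156 + 13), a compact model relative to the 821,545 records it is trained on.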
Results: The ANN performed consistently across all validation methods. In conventional cross-validation, the model achieved AUCs of 0.824 (PLCO), 0.721 (NHIS), and 0.836 (combined dataset). Temporal validation showed long-term stability in performance, with an AUC of 0.921 (95% CI: 0.903, 0.939) on the redesigned NHIS data from 2019 to 2023, a 36-percentage-point improvement in stability relative to earlier years. Because the model architecture was unchanged, this gain is attributed to improved data quality, underscoring the crucial role of data infrastructure in determining model performance. The three-tier risk stratification placed 33.3% of all individuals in the high-risk tier, which captured 80.8% of HNC cases (PPV = 0.35%), and assigned 33.4% of individuals to a low-risk tier that could be ruled out with an NPV of 99.97%. SHAP analysis indicated that the predictions were driven mainly by biologically plausible factors, the most influential being family cancer age, cardiovascular comorbidities (e.g., angina), and alcohol consumption.
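The three-tier metrics reported above can be computed from any vector of predicted risk scores and binary outcomes. The sketch below assumes the tiers are formed at score tertiles, which is consistent with the roughly one-third population shares reported but is not stated in the abstract; the function name `three_tier_summary` and the toy data are hypothetical.

```python
import numpy as np


def three_tier_summary(scores, labels):
    """Assign low / intermediate / high tiers at score tertiles (assumed cut-points)
    and compute population shares, case capture, PPV, and NPV."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    low_cut, high_cut = np.quantile(scores, [1 / 3, 2 / 3])
    low = scores <= low_cut               # low-risk tier
    high = scores > high_cut              # high-risk tier
    n_cases = labels.sum()
    return {
        "frac_low": low.mean(),                         # share of individuals in the low tier
        "frac_high": high.mean(),                       # share of individuals in the high tier
        "cases_in_high": labels[high].sum() / n_cases,  # share of all cases captured by the high tier
        "ppv_high": labels[high].mean(),                # P(HNC | high tier)
        "npv_low": 1.0 - labels[low].mean(),            # P(no HNC | low tier)
    }


# Toy example: 9 synthetic individuals, 3 of whom are cases.
scores = np.arange(9) / 9
labels = np.zeros(9, dtype=bool)
labels[[2, 7, 8]] = True
print(three_tier_summary(scores, labels))
```

Heavily tied scores can make tertile tiers unequal in size. The high-tier PPV is also bounded by disease prevalence: with 1,292 cases among 821,545 individuals (about 0.16%), a PPV of 0.35% is roughly a twofold enrichment over the base rate, while the near-100% NPV is what makes the low tier useful for ruling out.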
Recommended Citation
Hidayat, Abdullah, "AI-ASSISTED RISK PREDICTION AND POPULATION STRATIFICATION OF HEAD AND NECK CANCER (HNC) USING BASIC HEALTH DATA" (2025). Electronic Theses and Dissertations. 192.
https://digitalcommons.fau.edu/etd_general/192