Date of Award
Fall 11-5-2025
Document Type
Dissertation
Publication Status
Version of Record
Submission Date
November 2025
Department
Computer and Electrical Engineering and Computer Science
College Granting Degree
College of Engineering and Computer Science
Degree Name
Doctor of Philosophy (PhD)
Thesis/Dissertation Advisor [Chair]
Michael DeGiorgio
Abstract
Natural selection leaves characteristic footprints in genomic variation, and detecting these patterns is fundamental to understanding evolutionary history and adaptation in humans. In this thesis, I present a progression of machine learning frameworks, beginning with highly specialized feature engineering and complex modeling, advancing to approaches that require minimal training, and culminating in methods that remain robust to model misspecification. First, I developed SISSSCO, a spectral feature extraction framework that applies wavelet transforms, multitaper analysis, and the S-transform to genomic summary statistics, converting one-dimensional signals into two-dimensional spectral images that are analyzed by convolutional neural networks. SISSSCO achieved high accuracy across varied evolutionary scenarios, remained resilient to missing data, and uncovered both established and novel sweep candidates in European genomes. Building on this foundation, I introduced TrIdent, a transfer learning method that leverages pre-trained deep CNNs to efficiently extract features from multilocus genomic images. TrIdent reduced simulation requirements while improving detection of adaptive regions, provided interpretability through class activation maps, and revealed novel disease-associated candidate genes in European and African populations. Finally, I developed PULSe, a positive-unlabeled learning framework that bypasses the need for explicit negative training data, enabling robust sweep detection under domain shift and demographic misspecification. Applied to European and Bengali genomes, PULSe recovered well-supported sweep candidates and demonstrated strong generalizability across complex genomic landscapes. Together, these projects trace a trajectory from specialized yet powerful frameworks toward more flexible, generalized methodologies.
This body of work advances machine learning strategies for detecting natural selection, extending their applicability from well-characterized to understudied human populations and enhancing our capacity to uncover the genetic basis of adaptation.
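As a rough illustration of the idea of turning a one-dimensional summary statistic into a two-dimensional spectral image, the sketch below computes a Morlet wavelet scalogram with NumPy. It is not the SISSSCO implementation, which also uses multitaper analysis and the S-transform. The function name `morlet_scalogram`, the parameter `w0`, the chosen scales, and the sinusoidal stand-in for a summary statistic are all assumptions made for this example.

```python
import numpy as np

def morlet_scalogram(signal, scales, w0=6.0):
    """Return the magnitude of a Morlet wavelet transform.

    Output shape is (len(scales), len(signal)): one row per scale,
    one column per position along the signal.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    image = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the wavelet over +/- 4 scale widths, capped so that
        # the kernel is never longer than the signal itself.
        half = min(int(4 * s), (n - 1) // 2)
        t = np.arange(-half, half + 1) / s
        wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t**2) / np.sqrt(s)
        # mode="same" keeps each row aligned with the input positions.
        image[i] = np.abs(np.convolve(signal, wavelet, mode="same"))
    return image

# Hypothetical stand-in for a summary statistic computed along a genomic
# window: a sinusoid whose angular frequency matches scale 8 (w0 / 8).
positions = np.arange(256)
stat = np.sin(0.75 * positions)

# Rows correspond to scales 4, 8, and 16; columns to positions.
image = morlet_scalogram(stat, scales=[4, 8, 16])
```

The resulting array can be treated like a single-channel image, which is the kind of input a convolutional neural network expects. In practice, a maintained library routine for wavelet transforms would likely be preferable to this hand-rolled loop.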
Recommended Citation
Arnab, Sandipan Paul, "FROM SPECIALIZED TO GENERALIZED FRAMEWORKS: BROADENING MACHINE LEARNING APPROACHES TO DETECT NATURAL SELECTION" (2025). Electronic Theses and Dissertations. 178.
https://digitalcommons.fau.edu/etd_general/178