Author Type

Graduate Student

Date of Award

Fall 11-5-2025

Document Type

Dissertation

Publication Status

Version of Record

Submission Date

November 2025

Department

Computer and Electrical Engineering and Computer Science

College Granting Degree

College of Engineering and Computer Science

Degree Name

Doctor of Philosophy (PhD)

Thesis/Dissertation Advisor [Chair]

Michael DeGiorgio

Abstract

Natural selection leaves characteristic footprints in genomic variation, and detecting these patterns is fundamental to understanding evolutionary history and adaptation in humans. In this dissertation, I present a progression of machine learning frameworks, beginning with highly specialized feature engineering and complex modeling, advancing to approaches that require minimal training, and culminating in methods that remain robust to model misspecification. First, I developed SISSSCO, a spectral feature extraction framework that applies wavelet transforms, multitaper analysis, and the S-transform to genomic summary statistics, converting one-dimensional signals into two-dimensional spectral images that are analyzed by convolutional neural networks (CNNs). SISSSCO achieved high accuracy across varied evolutionary scenarios, remained resilient to missing data, and uncovered both established and novel sweep candidates in European genomes. Building on this foundation, I introduced TrIdent, a transfer learning method that leverages pre-trained deep CNNs to efficiently extract features from multilocus genomic images. TrIdent reduced simulation requirements while improving detection of adaptive regions, provided interpretability through class activation maps, and revealed novel disease-associated candidate genes in European and African populations. Finally, I developed PULSe, a positive-unlabeled learning framework that bypasses the need for explicit negative training data, enabling robust sweep detection under domain shift and demographic misspecification. Applied to European and Bengali genomes, PULSe recovered well-supported sweep candidates and demonstrated strong generalizability across complex genomic landscapes. Together, these projects trace a trajectory from specialized yet powerful frameworks toward more flexible, generalized methodologies.
This body of work advances machine learning strategies for detecting natural selection, extending their applicability from well-characterized to understudied human populations and enhancing our capacity to uncover the genetic basis of adaptation.
