Date of Award
Fall 12-9-2025
Document Type
Thesis
Publication Status
Version of Record
Submission Date
December 2025
Department
Computer and Electrical Engineering and Computer Science
College Granting Degree
College of Engineering and Computer Science
Department Granting Degree
Electrical Engineering and Computer Science
Degree Name
Doctor of Philosophy (PhD)
Thesis/Dissertation Advisor [Chair]
Taghi M. Khoshgoftaar
Abstract
In today’s data-driven landscape, large volumes of data are generated continuously, often containing imperfections such as noise, missing data, or unreliable labeling. These real-world datasets are typically high-dimensional, sparsely labeled, and imbalanced, creating substantial challenges for both supervised and unsupervised learning. These challenges are especially prevalent in anomaly detection, where instances belonging to the class of interest are rare and underrepresented compared to normal instances. This dissertation proposes and evaluates robust frameworks for anomaly detection that address these data challenges and improve model performance and robustness on real-world datasets, including credit card transactions and cognitive assessments. Supervised learning requires labeled data, which can be costly, hard to produce, and prone to mislabeling. We propose a reconstruction error–based method to identify and correct mislabeled samples, thereby improving the quality of labeled data. To address imbalance and high dimensionality, we combine deep feature extraction using convolutional autoencoders, an unsupervised learning technique, with class rebalancing strategies to improve classification performance. We then examine how the order of preprocessing steps affects downstream ensemble learners. For unlabeled data, we propose a novel hybrid unsupervised framework, CAE-IF, that integrates convolutional autoencoders for representation learning with Isolation Forest for anomaly detection. CAE-IF demonstrates robust performance on unlabeled, high-dimensional, and imbalanced data across cognitive and fraud detection domains, relative to common baselines such as Isolation Forest and Local Outlier Factor. In addition, we apply an instance-based iterative cleaning method that uses reconstruction error to remove likely outliers and improve representation quality for downstream detection without requiring manual annotation.
The results demonstrate that our proposed approaches improve model robustness in various imperfect data conditions. Collectively, these contributions provide a practical and generalizable toolkit for anomaly detection, addressing the core challenges of class imbalance, label noise, and label scarcity across both supervised and unsupervised settings.
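The iterative cleaning idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: a rank-1 linear reconstruction (projection onto the first principal direction, fitted by power iteration) stands in for the convolutional autoencoder, and the function names, the number of rounds, and the fixed drop fraction are all hypothetical choices made for illustration.

```python
import random


def fit_rank1(rows, iters=50, seed=0):
    """Fit a mean and leading principal direction by power iteration.

    Assumption: this linear model is a stand-in for an autoencoder.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # One power-iteration step: w = X^T (X v), then normalize.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def reconstruction_error(row, mean, v):
    """Squared distance between a row and its rank-1 reconstruction."""
    c = [x - m for x, m in zip(row, mean)]
    t = sum(a * b for a, b in zip(c, v))
    return sum((a - t * b) ** 2 for a, b in zip(c, v))


def iterative_clean(rows, rounds=3, drop_fraction=0.05):
    """Refit, then drop the worst-reconstructed rows, for several rounds.

    Returns the indices of the rows that survive cleaning.
    """
    kept = list(range(len(rows)))
    for _ in range(rounds):
        mean, v = fit_rank1([rows[i] for i in kept])
        errors = {i: reconstruction_error(rows[i], mean, v) for i in kept}
        n_drop = int(len(kept) * drop_fraction)
        if n_drop == 0:
            break
        worst = set(sorted(kept, key=errors.get, reverse=True)[:n_drop])
        kept = [i for i in kept if i not in worst]
    return kept


# Synthetic demo data (hypothetical): 100 points near the line y = 2x,
# followed by 5 points displaced off that line.
rng = random.Random(0)
rows = [[-5 + 0.1 * i, 2 * (-5 + 0.1 * i) + rng.gauss(0.0, 0.1)]
        for i in range(100)]
outlier_x = [-4.0, -2.0, 0.0, 2.0, 4.0]
outlier_shift = [8.0, -8.0, 8.0, -8.0, 8.0]
rows += [[x, 2 * x + s] for x, s in zip(outlier_x, outlier_shift)]

kept = iterative_clean(rows)
```

Because this sketch drops a fixed fraction in every round, it keeps removing points even after the obvious outliers are gone; a threshold on the reconstruction error would be one alternative stopping rule.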
Recommended Citation
Salekshahrezaee, Zahra, "ROBUST ANOMALY DETECTION UNDER DATA IMBALANCE, NOISE, AND LABEL SCARCITY" (2025). Electronic Theses and Dissertations. 223.
https://digitalcommons.fau.edu/etd_general/223