Author Type

Graduate Student

Date of Award

Spring 2-26-2026

Document Type

Dissertation

Publication Status

Version of Record

Submission Date

March 2026

Department

Computer and Electrical Engineering and Computer Science

College Granting Degree

College of Engineering and Computer Science

Department Granting Degree

Electrical Engineering and Computer Science

Degree Name

Doctor of Philosophy (PhD)

Thesis/Dissertation Advisor [Chair]

Mohammad Ilyas

Abstract

Advancements in technology have significantly contributed to the development of innovative tools aimed at improving communication and accessibility for individuals with hearing impairments. This dissertation explores various machine learning and deep learning techniques for recognizing American Sign Language (ASL) gestures, focusing on enhancing accessibility and bridging the communication gap between hearing-impaired and hearing individuals. Traditional machine learning models, such as Random Forest, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN), alongside deep learning architectures like AlexNet, ResNet-50, EfficientNet, ConvNeXt, and Vision Transformer, were investigated for their effectiveness. Experiments conducted on an extensive dataset of 87,000 ASL gesture images revealed exceptional recognition accuracy, with ResNet-50 achieving 99.98% and Random Forest reaching 99.55%, while the other models scored between 97% and 98%.

Building on these findings, an innovative real-time recognition system was developed, integrating computer vision and deep learning techniques. The project initially utilized MediaPipe for precise hand movement tracking and YOLOv8, a state-of-the-art object detection model, to translate ASL gestures into text in real time. A comprehensive dataset of 29,820 annotated images was created to ensure strong generalization across diverse hand positions and lighting conditions. MediaPipe’s hand landmark annotations significantly enhanced input quality, improving the YOLOv8 model’s training accuracy.
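As an illustration of how hand landmarks can feed an object-detection training set, the sketch below converts a list of normalized (x, y) landmarks, such as the 21 points MediaPipe reports per hand, into a single label line in the normalized `class x_center y_center width height` format used for YOLO detection training. The function name `landmarks_to_yolo_label` and the `padding` parameter are hypothetical; the abstract does not describe the dissertation's actual annotation procedure, so this is only one plausible approach.

```python
from typing import Sequence, Tuple


def landmarks_to_yolo_label(
    landmarks: Sequence[Tuple[float, float]],
    class_id: int,
    padding: float = 0.05,  # hypothetical margin, in normalized image units
) -> str:
    """Build one YOLO detection label line from normalized hand landmarks.

    Each landmark is an (x, y) pair in [0, 1] relative to image width and
    height. This is an assumed annotation scheme, not the author's method.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")

    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]

    # Tight box around the landmarks, grown by `padding` and clamped so it
    # never extends past the image borders.
    x_min = max(0.0, min(xs) - padding)
    x_max = min(1.0, max(xs) + padding)
    y_min = max(0.0, min(ys) - padding)
    y_max = min(1.0, max(ys) + padding)

    # YOLO labels store the box center and size rather than its corners.
    x_center = (x_min + x_max) / 2
    y_center = (y_min + y_max) / 2
    width = x_max - x_min
    height = y_max - y_min

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

In a pipeline of this kind, each image's landmarks would come from a hand-tracking pass, and the resulting line would be written to the label file that accompanies the image, with `class_id` indexing the signed letter.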

A more advanced framework was subsequently designed, integrating YOLOv11 with MediaPipe for robust real-time ASL alphabet recognition. This system was trained on a large-scale dataset of 130,000 annotated images with custom keypoint-based annotations, enabling the model to capture subtle variations in hand and finger positions. Experimental evaluation demonstrated outstanding performance, achieving a mean Average Precision (mAP@0.5) of 98.2% with minimal latency, confirming its suitability for real-time applications in education, healthcare, and professional environments.
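For readers unfamiliar with the mAP@0.5 metric, the following minimal sketch shows how average precision at an IoU threshold of 0.5 can be computed for a single class; mAP@0.5 is then the mean of this value over all classes. The names `iou` and `average_precision` and the data layout are assumptions made for illustration. The sketch integrates the full precision-recall curve, whereas evaluation toolkits often sample it at fixed recall points, so reported numbers can differ slightly from this simplified version.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[int, float, Box]],  # (image_id, confidence, box)
    ground_truth: Dict[int, List[Box]],        # image_id -> boxes of this class
    iou_threshold: float = 0.5,
) -> float:
    """Average precision for one class at the given IoU threshold."""
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0

    # Each ground-truth box may be claimed by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    hits: List[bool] = []

    # Visit detections from most to least confident.
    for image_id, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(image_id, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_hit = (
            best_idx >= 0
            and best_iou >= iou_threshold
            and not matched[image_id][best_idx]
        )
        if is_hit:
            matched[image_id][best_idx] = True
        hits.append(is_hit)

    # Running precision and recall after each detection.
    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / total_gt)

    # Make precision non-increasing in recall (the usual envelope step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

A detection counts as correct only if it overlaps an unclaimed ground-truth box of the same class by at least half of their combined area, which is why the threshold appears in the metric's name.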

Overall, the findings of this dissertation underscore the transformative potential of AI-driven solutions for ASL recognition. By bridging communication gaps through both traditional classification models and real-time deep learning frameworks, this work contributes to fostering inclusivity, accessibility, and independence for individuals with hearing impairments.
