Doctoral thesis defense: Advancing Face Analysis in Images and Videos: Age Estimation and Drowsiness Detection
First published: 9 July 2025
Author: Salah Eddine Bekhouche
Thesis: Advancing Face Analysis in Images and Videos: Age Estimation and Drowsiness Detection
Supervisor: Fadi Dornaika
Date: 17 July 2025
Time: 10:30
Venue: Ada Lovelace room (Facultad de Informática)
Abstract:
"Deep learning, particularly through sophisticated architectures like CNN, has significantly advanced automated facial analysis. Tasks such as recognition and attribute analysis have seen performance boosts. However, achieving truly robust and versatile systems, especially for complex regression tasks like age estimation or dynamic state assessments like driver drowsiness detection, faces persistent challenges. A key hurdle remains developing models that reliably handle extreme variations in real-world conditions including pose, illumination, expression, occlusions, and intrinsic image quality issues. While techniques like attention mechanisms and specialized network designs exist, accurately interpreting subtle age-related facial changes across a lifetime or detecting fine-grained behavioral cues indicative of drowsiness under these variations remains difficult. Furthermore, ensuring robust generalization across diverse demographics, unseen environments, and varying data acquisition setups often requires more than standard data augmentation or transfer learning, demanding tailored methodological innovations.
Addressing these specific challenges in facial age estimation and driver drowsiness detection is crucial. Current age estimation methods, often relying on direct regression or simple classification with standard CNNs, can struggle with the non-linear nature of aging and with sensitivity to variations unrelated to age, and may not adequately capture distinct features relevant to different life stages. Similarly, vision-based drowsiness detection often relies on indicators like PERCLOS (percentage of eyelid closure over time), yawn frequency, or head pose, typically extracted using conventional computer vision techniques or basic deep learning models. These approaches can be sensitive to illumination changes, fail to integrate multiple cues effectively, lack robustness to individual differences in fatigue expression, and may not fully leverage the rich spatiotemporal information present in video sequences. Existing methods often lack mechanisms to specifically enhance feature discriminability for these challenging tasks or to systematically compare foundational approaches (handcrafted features) against modern deep learning within these specific contexts.
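As an illustration of the PERCLOS indicator mentioned above, the following is a minimal sketch and not the method used in the thesis. It assumes a per-frame eye-openness score in [0, 1] is already available from some upstream detector, and it assumes the common P80 variant, in which a frame counts as closed when the eye is at least 80% closed. The function names `perclos` and `sliding_perclos` and the default threshold are hypothetical choices made for this example.

```python
from typing import List, Sequence


def perclos(openness: Sequence[float], closed_threshold: float = 0.2) -> float:
    """Return the fraction of frames in which the eye counts as closed.

    `openness` holds one score per frame in [0, 1], where 1 means fully open.
    With the default threshold (an assumption: the P80 variant), a frame is
    "closed" when openness <= 0.2, i.e. the eye is at least 80% closed.
    """
    if len(openness) == 0:
        raise ValueError("openness must contain at least one frame")
    closed_frames = sum(1 for value in openness if value <= closed_threshold)
    return closed_frames / len(openness)


def sliding_perclos(
    openness: Sequence[float], window: int, closed_threshold: float = 0.2
) -> List[float]:
    """Compute PERCLOS over every full window of `window` consecutive frames.

    Returns an empty list when the sequence is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of frames")
    return [
        perclos(openness[start:start + window], closed_threshold)
        for start in range(len(openness) - window + 1)
    ]
```

A drowsiness rule built on this would typically compare the windowed value against a task-specific cutoff; the abstract does not state one, so none is assumed here.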
This thesis delves into these specific problems within automated facial analysis, proposing and evaluating advanced computational methods focused explicitly on enhancing facial age estimation from static images and driver drowsiness detection from video sequences. Motivated by the limitations of existing approaches and the need for systems robust to real-world variability, our work both explores the comparative efficacy of traditional handcrafted features versus contemporary deep learning techniques and introduces novel deep learning architectures and strategies tailored to these tasks. The primary objective is to develop methods that push the boundaries of accuracy, robustness, and practical applicability in these domains. The core contributions, validated through extensive experiments detailed herein using relevant benchmark datasets and evaluation metrics (e.g., Mean Absolute Error for age; Accuracy and F1-score for drowsiness), are:
A comprehensive comparative study systematically evaluating the performance of established handcrafted features against various deep learning-based features for human facial age estimation. This establishes critical baselines and contextualizes the performance gains achievable with deep learning, while also highlighting scenarios where traditional methods remain competitive or complementary.
The development and validation of a novel multi-stage deep neural network architecture specifically designed for facial age estimation. This approach aims to improve accuracy and robustness by decomposing the complex regression task into distinct stages, potentially better modeling age-related transformations across the lifespan.
The design and implementation of a specialized Spatiotemporal Convolutional Neural Network (ST-CNN) incorporating Pyramid Bottleneck Blocks. This architecture is shown to be effective for eye-blink detection, targeting the efficient capture of multi-scale spatiotemporal features crucial for recognizing micro-expressions relevant to drowsiness analysis in video data.
A new hybrid approach for end-to-end driver drowsiness detection in video sequences. This method utilizes a strategy for selecting and integrating deep features from different network levels or temporal windows, aiming to enhance the discriminative power of the feature representation and improve classification performance by focusing on the most salient fatigue indicators over time.
The methodologies employed span comparative analysis, feature engineering, the design of novel deep network architectures (multi-stage CNNs, ST-CNNs with specialized blocks), and hybrid feature selection strategies within deep learning frameworks. Collectively, this research advances the state of the art in robust facial age estimation and video-based drowsiness detection. It contributes valuable comparative insights and introduces tailored deep learning solutions specifically designed to address the limitations of prior methods and overcome persistent challenges encountered in real-world facial analysis applications.
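The evaluation metrics named in the abstract (Mean Absolute Error for age; Accuracy and F1-score for drowsiness) can be sketched in a few lines. This is a generic illustration using the standard definitions, not the thesis's evaluation code. The function names and the convention that label 1 means "drowsy" are assumptions made for this example.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float], predicted_ages: Sequence[float]) -> float:
    """Average absolute difference between true and predicted ages, in years."""
    if len(true_ages) == 0 or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def accuracy(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(true_labels) == 0 or len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def f1_score(true_labels: Sequence[int], predicted_labels: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed here: 1 = drowsy).

    Uses F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic mean of
    precision and recall. Returns 0.0 when there are no true positives.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)
```

For example, predicting ages 27, 38, and 65 for subjects aged 25, 40, and 60 gives a Mean Absolute Error of 3 years.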
Keywords:
Face Analysis, Deep Learning, Age Estimation, Drowsiness Detection, Computer Vision"