

Doctoral Thesis Defence: Towards scalable semi-supervised learning on graphs

Author: Zoulfikar Ibrahim

Thesis: Towards scalable semi-supervised learning on graphs

Directors: Fadi Dornaika and Alireza Bosaghzadeh

Day: 29 April 2025
Time: 10:30h
Place: Ada Lovelace room (Faculty of Computer Science)

Abstract:

"Graph-based semi-supervised learning (GSSL) has attracted considerable attention due to its ability to leverage both labeled and unlabeled data to improve classification performance. This thesis addresses the limitations of traditional GSSL methods, such as reliance on predefined graphs, computational inefficiency with large datasets, treating data equally, and insufficient handling of unlabeled data, by proposing a unified and scalable framework.

Recent advancements in GSSL have primarily focused on predefined graph structures, which often do not accurately represent the data topology, and struggle with scalability issues. To overcome these challenges, this research introduces several innovative approaches, including anchor-based graph construction, adaptive sample weighting, and dynamic self-training, specifically designed for large-scale datasets.

The Joint Graph and Reduced Flexible Manifold Embedding (SGRFME) algorithm integrates anchor graph computation into the learning model. This approach not only scales efficiently to large databases but also improves the accuracy of label predictions for test samples through linear transformations. Experimental results on datasets such as NORB, RCV1, and Covtype demonstrate the method's effectiveness and scalability.

Addressing graph topology imbalance, the Weighted Simultaneous Graph Construction and Reduced Flexible Manifold Embedding (W-SGRFME) algorithm extends the concept of graph topology imbalance to large datasets and incorporates computed weights of labeled samples into the model. The fusion of labels and features of anchors allows for adaptive graph construction, resulting in superior performance on large datasets.

A scalable and inductive semi-supervised classifier with sample weighting based on graph topology, called the Weighted Joint Graph Construction and Reduced Flexible Manifold Embedding (W-JGRFME), employs calculated weights of labeled samples for label-matching, leading to a unified and scalable model that simultaneously labels unlabeled data and constructs an adaptive anchor graph. Experimental results on extensive datasets, including MNIST, validate the method's robustness and superiority.

A unified framework for inductive and scalable GSSL using adaptive sample weighting, termed Adaptive Weighted Simultaneous Graph Construction and Reduced Flexible Manifold Embedding (AW-SGRFME), assigns adaptive weights to labeled samples based on their estimated labels and constructs an anchor-to-anchor affinity graph by incorporating both feature and label information. The efficacy of this method is illustrated through experiments on large-scale datasets.

Lastly, the Self-training Simultaneous Graph Construction and Reduced Flexible Manifold Embedding (SSGRFME) framework, tailored for very large datasets, utilizes pseudo-labeling to enhance the model's accuracy by incorporating confidently predicted labels of random batches of unlabeled samples into the training set. The anchor-to-anchor affinity graphs facilitate robust learning, as demonstrated by comprehensive experimentation across diverse large datasets.

Overall, this thesis makes significant contributions to the field of GSSL by introducing automatic graph construction, weighted labeled samples, and self-training methods that effectively address the challenges of large-scale semi-supervised learning. These innovative approaches not only enhance the performance of GSSL models but also extend their applicability across various domains, providing a robust foundation for future research. The main contributions of the thesis are outlined below.

(1) Proposes the SGRFME model, which handles large-scale datasets using anchor points. SGRFME jointly predicts labels, constructs the anchor affinity matrix, and estimates a projection matrix for test samples.
(2) SGRFME emphasizes the integration of graph construction and label estimation into a unified objective function.
(3) W-SGRFME focuses on extending graph topology imbalance to large datasets and integrating this concept into a scalable semi-supervised model with $\ell_2$ regularization.
(4) Proposes the W-JGRFME model, addressing topological imbalance through weighted labeled nodes with nuclear norm regularization.
(5) Introduces the AW-SGRFME method, which uses adaptive weights for labeled nodes based on dynamic class prediction.
(6) Proposes a dynamic self-training approach for GSSL, integrating graph construction and label propagation.

Keywords: Graph-based Semi-supervised Learning, Scalable Learning, Inductive Models, Anchor-based Graph Construction, Large-scale Datasets"
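
Illustrative note: the abstract relies on anchor-based graph construction to keep graph learning tractable on large datasets. The following is a minimal Python sketch of a generic, fixed anchor graph, not the adaptive construction proposed in the thesis. The function name `anchor_graph`, the parameters `s` and `sigma`, the random choice of anchors, the Gaussian weighting, and the use of `Z.T @ Z` as an anchor-to-anchor affinity are all assumptions made for illustration.

```python
import numpy as np

def anchor_graph(X, anchors, s=3, sigma=1.0):
    """Return Z (n x m): each row weights the s nearest anchors and sums to 1.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    X = np.asarray(X, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Squared Euclidean distances between samples and anchors, shape (n, m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Column indices of the s closest anchors for each sample.
    nearest = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(X.shape[0])[:, None]
    sel = d2[rows, nearest]
    # Gaussian weights; subtracting the row minimum avoids underflow
    # and leaves the normalized weights unchanged.
    w = np.exp(-(sel - sel.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    Z = np.zeros(d2.shape, dtype=float)
    Z[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    return Z

# Toy data: 200 samples in 5 dimensions, 10 anchors drawn at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
anchors = X[rng.choice(200, size=10, replace=False)]

Z = anchor_graph(X, anchors, s=3)
# One simple fixed anchor-to-anchor affinity (an assumption): m x m, not n x n.
A = Z.T @ Z
```

The point of the sketch is the size reduction: the sample-to-anchor matrix `Z` is n x m and sparse, and the affinity `A` is only m x m, so later computations scale with the number of anchors rather than the number of samples.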

