PhD thesis defense: Improving Fidelity and Table Representation in Table Understanding and Table-to-Text Generation
First published: 24/02/2025
Author: Iñigo Borja Alonso González
Thesis: Improving Fidelity and Table Representation in Table Understanding and Table-to-Text Generation
Supervisor: Eneko Agirre Bengoa
Date: 27 February 2025
Time: 11:00
Venue: Sala Ada Lovelace (Facultad de Informática)
Abstract:
"The field of Natural Language Processing (NLP) has advanced considerably, yet applying its techniques to structured data, like tables, introduces unique challenges. These challenges stem from the structured nature of tables and the need for accurate interpretation of their data. Among these challenges, a critical one in Table Understanding (TU) is the ability to represent all table information in a complete and efficient manner while ensuring, particularly in natural language generation tasks like table-to-text, that the generated texts remain faithful to the source data.
The goal of this thesis is to contribute to the field of TU by developing techniques that enhance fidelity in table-to-text generation and improve table representation to better capture information within tabular data. To this end, this thesis explores the use of structured semantics to guide table-to-text generation models in producing descriptions that faithfully represent table data. We dissect the critical components that play a key role in achieving this, including the grammar used to represent these semantics and the conditioning signals required to build them. We propose the use of automatically generated logical forms and analyze the impact of content selection in enhancing the system's accuracy. We demonstrate that using automatically generated logical forms significantly improves faithfulness and factual accuracy in table-to-text generation, achieving a 67% increase in fidelity over baseline models.
In addition, we propose a new method for effectively encapsulating information across a wider range of table formats. Specifically, we introduce the use of Visual Language Models (VLMs) to capture information from tables represented as images, highlighting their advantages over traditional text-based representations. We also address inherent challenges in this approach by proposing a new image-based structure learning curriculum to capture the structural dynamics of tabular data and reduce structure-related fidelity errors. Our proposed image-based table-to-text generation model, PixT3, achieves state-of-the-art results, outperforming other baseline models in both automatic metrics and human evaluations of faithfulness. PixT3's strong performance on an out-of-domain dataset further demonstrates its adaptability to previously unseen tables.
Finally, we extend our image-based approach to additional TU tasks, such as Table Question Answering, Table Structure Recognition, Table Fact Verification, and Table Numerical Reasoning, by creating a multimodal, instruction-based dataset that includes original table visualizations. We analyze state-of-the-art TU pre-training objectives to construct a dataset designed to instill foundational, generalizable knowledge of table interpretation into vision-based models. To this end, we introduce the largest multimodal, instruction-based TU dataset with original table visualizations from Wikipedia to date, comprising 2.5 million examples and 1.1 million unique table images across 11 different tasks. This dataset addresses a significant limitation of current multimodal TU datasets, which rely on lossy textual table representations, by incorporating original table visualizations instead.
This thesis contributes to the field of Table Understanding by introducing advancements that address the need for more reliable, scalable, and visually aware methods for table-to-text generation. This work also proposes new research lines to further advance this field. Our findings were published in a Journal Citation Reports (JCR) Q1-ranked journal and the main conference of ACL 2024."