Bridging Language Models and Structured Knowledge: Extraction, Representation, and Reasoning
- Wang, Zilong
- Advisor(s): Shang, Jingbo
Abstract
Structured knowledge, embedded in diverse forms such as document images, web pages, and tabular data, presents distinct challenges for language models. Unlike free-form text, structured data encodes meaning through spatial arrangements, hierarchical structures, and relational dependencies, requiring models to extract, interpret, and reason beyond linguistic signals. This dissertation advances the integration of structured knowledge with language models, introducing novel methodologies for document understanding, web mining, and table-based reasoning.
We first introduce VRDU, a benchmark for Visually-Rich Document Understanding, designed to evaluate how models extract structured information from business documents with complex layouts and hierarchical entities. By identifying key challenges in template generalization and few-shot adaptation, VRDU provides a more realistic assessment of multimodal language models.
Next, we present LASER, a label-aware sequence-to-sequence framework for few-shot entity recognition in document images. By embedding label semantics and spatial relationships directly into the decoding process, LASER enables models to recognize entities with minimal supervision, outperforming traditional sequence-labeling approaches in low-resource scenarios.
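To make the label-aware decoding idea concrete, the toy sketch below shows one way a sequence-to-sequence target could interleave entity spans with natural-language label phrases so the decoder can exploit label semantics under few-shot supervision. The bracketed format, the helper name build_target, and the example document are illustrative assumptions, not LASER's actual output scheme.

```python
# Toy illustration (not LASER's actual format): a label-aware target sequence
# in which each entity span is emitted together with a natural-language label
# phrase, so label semantics are visible to the decoder during generation.
def build_target(tokens, spans):
    """spans: list of (start, end, label) over `tokens`; end exclusive, spans non-overlapping."""
    out, i = [], 0
    for start, end, label in sorted(spans):
        out.extend(tokens[i:start])                                            # copy plain context
        out.append("[ " + " ".join(tokens[start:end]) + " | " + label + " ]")  # span plus label words
        i = end
    out.extend(tokens[i:])                                                     # trailing context
    return " ".join(out)


if __name__ == "__main__":
    tokens = "Invoice number 1042 due on 2023-05-01".split()
    spans = [(2, 3, "invoice number"), (5, 6, "due date")]
    print(build_target(tokens, spans))
    # Invoice number [ 1042 | invoice number ] due on [ 2023-05-01 | due date ]
```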
For web mining, we propose ReXMiner, a zero-shot relation extraction framework that captures structural dependencies within semi-structured web pages. By encoding relative XML paths in the Document Object Model (DOM) tree, ReXMiner improves the generalization of relation extraction across diverse and unseen web templates, demonstrating that structural signals enhance information retrieval from the web.
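The relative-path signal can be illustrated with a small, self-contained sketch: given the absolute XPaths of two DOM nodes, express one relative to the other through their deepest common ancestor. The helper name relative_xpath and the example paths are hypothetical; ReXMiner encodes such relative paths as structural features for the model, and this sketch only shows how one such path can be derived.

```python
# Minimal sketch of a relative XML path between two DOM nodes, in the spirit
# of ReXMiner's structural features (helper name and example paths are hypothetical).

def relative_xpath(xpath_a: str, xpath_b: str) -> str:
    """Path from node A to node B through their deepest common ancestor."""
    a_parts = xpath_a.strip("/").split("/")
    b_parts = xpath_b.strip("/").split("/")

    # Length of the shared prefix = depth of the common ancestor.
    common = 0
    for tag_a, tag_b in zip(a_parts, b_parts):
        if tag_a != tag_b:
            break
        common += 1

    # Steps up from A to the ancestor, then down along B's remaining tags.
    up = [".."] * (len(a_parts) - common)
    down = b_parts[common:]
    return "/".join(up + down) or "."


if __name__ == "__main__":
    key = "/html/body/div[1]/table/tr[2]/td[1]"    # e.g. an attribute-name cell
    value = "/html/body/div[1]/table/tr[2]/td[2]"  # the corresponding value cell
    print(relative_xpath(key, value))              # -> ../td[2]
```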
Finally, we introduce Chain-of-Table, a framework for table-based reasoning that evolves tabular data iteratively. Unlike previous approaches that treat tables as static inputs, Chain-of-Table dynamically applies structured transformations, enabling models to reason step-by-step over tabular data. This approach achieves state-of-the-art performance across multiple benchmarks in table-based question answering and fact verification.
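As a minimal sketch of the iterative idea, the pandas chain below transforms a toy table step by step until the answer can be read off the final table. In Chain-of-Table itself a language model plans and parameterizes each operation; here the question, the table, and the operation chain are hard-coded assumptions for clarity.

```python
# Illustrative sketch of iterative table transformations in the spirit of
# Chain-of-Table; the operation chain is hand-written rather than LLM-planned.
import pandas as pd

table = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "team":   ["Red", "Red", "Blue", "Blue"],
    "points": [12, 7, 15, 9],
})

# Hypothetical question: "Which team scored more points in total?"
# Step 1: keep only the columns relevant to the question.
step1 = table[["team", "points"]]

# Step 2: aggregate points per team.
step2 = step1.groupby("team", as_index=False)["points"].sum()

# Step 3: sort so the answer row comes first; the resulting table is what
# the model would read the final answer from.
step3 = step2.sort_values("points", ascending=False).reset_index(drop=True)

print(step3)
print("Answer:", step3.loc[0, "team"])  # -> Blue (15 + 9 = 24 vs 12 + 7 = 19)
```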
Together, these contributions redefine how language models interact with structured knowledge, bridging the gap between unstructured text processing and structured data reasoning. By integrating multimodal signals, relational structures, and iterative reasoning mechanisms, this dissertation lays the foundation for more robust and generalizable models in structured knowledge understanding.