DataVisT5: A Pre-trained Language Model for
Jointly Understanding Text and Data Visualization
Zhuoyue Wan∗, Yuanfeng Song†, Shuaimin Li∗, Chen Jason Zhang∗, Raymond Chi-Wing Wong‡
∗PolyU, Hong Kong, China
†WeBank Co., Ltd, Shenzhen, China
‡HKUST, Hong Kong, China
Abstract—Data visualization (DV) is a fundamental tool for efficiently conveying the insights behind big data, and it has been widely adopted in today's data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in free form (i.e., FeVisQA), and explicating tabular data (i.e., table-to-text), is vital for advancing the field. Despite their potential, the application of pre-trained language models (PLMs) such as T5 and BERT to DV has been limited by high costs and the challenges of handling cross-modal information, leaving few studies on PLMs for DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture through a hybrid-objective pre-training and multi-task fine-tuning strategy, integrating text and DV datasets to effectively interpret cross-modal semantics. Extensive evaluations on public datasets show that DataVisT5 consistently outperforms current state-of-the-art models and larger-parameter Large Language Models (LLMs) on various DV-related tasks. We anticipate that DataVisT5 will not only inspire further research on vertical PLMs but also expand the range of applications for PLMs.
Index Terms—pre-trained language model, data visualization,
text-to-vis, vis-to-text, FeVisQA, table-to-text
I. INTRODUCTION
Data visualizations (DVs) use graphical representations to convey insights and summarize massive raw data, a common practice in today's big data era [1], [2]. Popular data analysis and database applications, such as Google Sheets¹ and Microsoft Power BI², all support DV features. Many institutions recognize the value of DV and have adopted it as a fundamental daily tool. The ability to create suitable DVs has thus become a necessary skill for data analysts, engineers, and data scientists [3]–[5]. However, creating appropriate DVs remains challenging, even for experts, since it requires visual analysis expertise and familiarity with the domain data. Furthermore, users must master the complex grammar of Declarative Visualization Languages (DVLs), such as Vega-Lite [6], ggplot2 [7], and Vega-Zero [8], to accurately define a DV specification for the visualization engine.
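To give a sense of the declarative grammar a user must master, the snippet below builds a minimal Vega-Lite specification; the field names and inline data values are invented for illustration and do not come from the paper's examples:

```python
import json

# A minimal Vega-Lite bar-chart specification.
# Field names and inline data values are illustrative only.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"category": "A", "amount": 28},
                        {"category": "B", "amount": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}
print(json.dumps(spec, indent=2))
```

Even for this simple chart, the user must know which mark type, encoding channels, and data types to declare, which illustrates why DVLs raise the barrier to entry.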
To lower the barriers to creating DVs and further unlock the power of DV for the general public, researchers have proposed a variety of DV-related tasks that have attracted significant attention from both industry and academia. Numerous studies on these topics have been presented in leading conferences and journals such as VLDB [2], [9], [10], ICDE [11], [12], SIGMOD [13]–[15], and TKDE [16], [17].
¹https://www.google.com/sheets/about/
²https://powerbi.microsoft.com/
These tasks include text-to-vis (i.e., automatically generating
DVs from natural language questions) [8], [15], vis-to-text
[18] (i.e., automatically generating interpretations of complex
DVs for educational purposes), FeVisQA [12] (i.e., free-form
question answering over data visualization), and table-to-text
(i.e., describing a given table) [19].
A vivid example is given in Figure 1, which shows four
important tasks central to the domain knowledge of DV: text-to-
vis, vis-to-text, FeVisQA and table-to-text. The figure presents
a natural language (NL) question, “Give me a pie chart about
the proportion of the number of countries in the artist table.”
This example demonstrates the text-to-vis task’s capability
to interpret the NL question and transform it into a Vega-
Lite specification, resulting in a pie chart. The DV query,
introduced by [15], serves as a bridge in the text-to-vis process,
encapsulating visualization details and data operations with a
grammar akin to SQL. Translations between DV queries and
DVLs are seamless, with text-to-vis tasks primarily focusing
on converting NL questions into DV queries. Conversely,
the vis-to-text task aims to generate accessible and user-
friendly explanations of complex visualizations for individuals
without expertise in the field. The FeVisQA task addresses
user inquiries regarding DV by providing detailed answers
to common questions. We present four typical DV-related
questions, including understanding the semantics of a DV query,
resolving numerical issues within a chart, and evaluating the
compatibility of a DV query with a given database. Lastly,
the table-to-text task generates informative NL descriptions of
tabular data, which are essential for visual analytics, thereby
reducing the perceptual effort needed for data interpretation.
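To make the bridging role of the DV query concrete, the sketch below maps a SQL-like DV query of the kind described above onto a Vega-Lite specification. This is our own simplified illustration, not the paper's translator, and the DV-query grammar shown in the docstring only loosely follows the style of [15]:

```python
# Hypothetical, simplified translation from a SQL-like DV query
# to a Vega-Lite specification. Grammar loosely follows [15].
def dv_query_to_vegalite(chart_type: str, field: str, table: str) -> dict:
    """Translate a DV query of the (sketched) form
    'Visualize <chart_type> SELECT <field>, COUNT(<field>)
     FROM <table> GROUP BY <field>' into a Vega-Lite spec."""
    # Vega-Lite renders pie charts with the "arc" mark and a theta channel.
    return {
        "data": {"name": table},
        "mark": "arc" if chart_type == "pie" else chart_type,
        "encoding": {
            "theta": {"aggregate": "count", "type": "quantitative"},
            "color": {"field": field, "type": "nominal"},
        },
    }

# The running example from Figure 1: a pie chart over the artist table.
spec = dv_query_to_vegalite("pie", "country", "artist")
```

Because the mapping between DV queries and DVL specifications is mechanical in this way, the learning problem can concentrate on translating NL questions into DV queries.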
Meanwhile, PLMs such as BERT [20] and T5 [21] have
received considerable attention in the realms of natural lan-
guage processing (NLP) and data mining, becoming widely
recognized for their efficacy. These PLMs have greatly promoted
the development of effective text-driven applications, since
they show dominant performance in understanding the
semantics of natural language. The operational paradigm for
these PLMs typically unfolds in two stages: initially, they
undergo unsupervised pre-training on expansive, open-domain
datasets (such as Wikipedia) to acquire foundational capabilities
in language representation and comprehension; subsequently,
they are fine-tuned on specialized corpora pertinent to targeted
downstream tasks, thereby enhancing task-specific performance.
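In the T5-style text-to-text paradigm, every downstream task is cast as a string-to-string mapping, typically distinguished by a task prefix during multi-task fine-tuning. The sketch below formats training pairs in that style; the prefix names and target strings are illustrative, not the exact format used by DataVisT5:

```python
# Illustrative multi-task formatting in the T5 text-to-text style.
# Prefix names and targets are invented for this sketch.
def format_example(task: str, source: str, target: str) -> dict:
    """Prepend a task prefix so one model can serve several DV tasks."""
    return {"input": f"{task}: {source}", "output": target}

batch = [
    format_example(
        "text-to-vis",
        "Give me a pie chart about the proportion of the number "
        "of countries in the artist table.",
        "Visualize pie SELECT country, COUNT(country) FROM artist "
        "GROUP BY country"),
    format_example(
        "vis-to-text",
        "Visualize bar SELECT name, age FROM singer",
        "A bar chart of singer ages by name."),
]
```

Casting heterogeneous DV tasks into this uniform format is what allows a single encoder-decoder model to be fine-tuned on all of them jointly.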
Despite their success [22]–[24], significant challenges remain
in the DV field: (i) Limited studies have been conducted
to explore the effectiveness of PLMs in
arXiv:2408.07401v2 [cs.CL] 27 Nov 2024