
sentation (high-dimensional vectors) of the input, ncNet enables smart visualization inference (e.g., guessing a missing column, selecting a chart type, etc.).
Besides making smarter inferences, a system can obtain more information (or a "hint") from the user, by either obtaining a one-shot hint from the user or iteratively requesting more information (a.k.a. conversational systems) [6]. The hint can take various formats, such as NL queries, tables, or chart templates, with the main criterion that it be easy to use. We propose to use chart templates as additional hints, where a user can specify the output to be, e.g., a pie chart or a scatter plot with a simple click. In practice, chart templates are widely used in commercial products, including Tableau, Excel, Google Sheets, and so on. Due to the flexibility of the seq2seq model, we simply treat the selected chart template C as another sequence, together with the NL query N and the dataset D, as the input X.
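The idea of serializing the three inputs into one sequence X can be sketched as follows. This is a minimal illustration, not ncNet's actual preprocessing: the separator tokens (`<N>`, `<C>`, `<D>`) and the function name are hypothetical.

```python
# Sketch: flatten the NL query (N), chart template (C), and data schema (D)
# into a single token sequence X for a seq2seq encoder.
# The separator tokens <N>, <C>, <D> are illustrative, not ncNet's vocabulary.

def build_input_sequence(nl_query, chart_template, table_columns):
    """Concatenate the three inputs, each prefixed by a separator token."""
    tokens = ["<N>"] + nl_query.lower().split()
    tokens += ["<C>"] + chart_template.lower().split()
    tokens += ["<D>"] + [c.lower() for c in table_columns]
    return tokens

X = build_input_sequence(
    "Show the total sales per state",
    "mark bar encoding x [X] y aggregate [AGG] [Y]",
    ["state", "sales", "year"],
)
```

Because the model only sees one flat token sequence, adding the chart template requires no architectural change, only a longer input.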
Contributions. In this work, we make several contributions, including:
• proposing ncNet, a Transformer-based [53] seq2seq model for supporting NL2VIS;
• presenting a novel visualization grammar, namely Vega-Zero, whose main purpose is to simplify NL2VIS translation using neural machine translation techniques; moreover, transforming it to other visualization languages is straightforward;
• enhancing ncNet by allowing the user to select a chart template, which is used to improve translation accuracy;
• devising two optimization techniques: attention forcing for incorporating pre-defined domain knowledge, and visualization-aware translation for better final visualization generation; and
• demonstrating that ncNet can well support NL2VIS through several use cases, as well as conducting a quantitative study.
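To make the Vega-Zero contribution concrete, the sketch below converts a Vega-Zero-style keyword sequence into a Vega-Lite-like dictionary. The keyword layout is inferred from the example specifications shown later in Fig. 2 ("mark ... encoding x ... y aggregate ..."); the function and the exact mapping are illustrative assumptions, not the paper's actual transformation rules.

```python
# Sketch: parse a flat Vega-Zero-style keyword sequence (layout assumed
# from the Fig. 2 examples) into a Vega-Lite-like dict. Illustrative only.

def vega_zero_to_vega_lite(spec):
    toks = spec.split()
    mark = toks[toks.index("mark") + 1]          # chart type, e.g. "bar"
    x_field = toks[toks.index("x") + 1]          # x-axis column
    agg = toks[toks.index("aggregate") + 1]      # aggregation, or "none"
    y_field = toks[toks.index("aggregate") + 2]  # y-axis column
    encoding = {"x": {"field": x_field}, "y": {"field": y_field}}
    if agg != "none":
        encoding["y"]["aggregate"] = agg
    return {"mark": mark, "encoding": encoding}

vl = vega_zero_to_vega_lite("mark bar encoding x state y aggregate sum sales")
```

Because the keywords map almost one-to-one onto Vega-Lite fields, a transformation of this shape is what makes targeting other visualization languages straightforward.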
2 RELATED WORK
2.1 Natural Language Interface for Data Visualization
The idea of using NL to create visualizations was explored around two decades ago [6], where the system interacts with the user through dialogs. During each interaction, the system tries to clarify a small part of the user specification. For example, the system asks: "At what organizational level?", the user answers: "At the department level", and so on. At that time, the system could only map simple user inputs to pre-defined commands.
Afterwards, semantic parsers (e.g., NLTK [5], NER [12], and Stanford CoreNLP [37]), which can automatically add additional layers of semantic information (e.g., parts of speech, named entities, coreference, etc.) to NL, were widely adopted in NL2VIS research. Recent studies, such as NL4DV [40] and FlowSense [57], all employ semantic parsers and are considered the state of the art.
2.2 Natural Language Processing with Deep Learning
Closer to this work is ADVISor [27], which uses BERT [10] to generate embeddings of both the NL query and the table headers; these are then used by an "Aggregation" network to select an aggregation type and a "Data" network to decide the attributes and predicates to use. These SQL fragments determine an SQL query. Then, a rule-based "Visualization" module decides which visualization to generate. Compared with ADVISor, ncNet supports more complex data transformation types, such as relational join, GROUP BY, ORDER BY, and OR predicates in SQL WHERE clauses. Another difference is that the neural networks of ADVISor are trained on (NL, SQL) pairs, while ncNet is trained on (NL, VIS) pairs and outputs Vega-Zero queries.
In fact, the main obstacle to using deep learning for NL2VIS is not a shortage of deep learning models or techniques. Instead, it is the lack of benchmark datasets on which these models can be trained, because deep learning models are known to be data hungry [14]. Fortunately, a recent work releases the first public benchmark for NL2VIS, namely nvBench [35], which can be used to apply deep learning to NL2VIS. nvBench consists of 25,750 NL queries and the corresponding visualizations, i.e., 25,750 (NL, VIS) pairs, over ∼780 tables from 105 domains (e.g., sports, customers). We will discuss more details of
Fig. 2: Sample seq2seq tasks. (A) Translation from English to French. (B) Translation from NL queries to visualization specifications.
nvBench in Section 6.2. Another recent work [48] collected 893 NL queries over three datasets; however, this number is not sufficient to train typical deep learning models.
An alternative solution is NL2SQL + automatic data visualization, which is a good choice when the entire pipeline is one-shot. In practice, however, the process is usually iterative: if the target visualization needs to be refined, the user must verify/refine both the NL2SQL step and the result of the automatic data visualization. Note that checking whether a table is good enough is hard, even for a small table with hundreds or thousands of tuples. In this case, end-to-end NL2VIS has the advantage that the user sticks to a single task, which is more user-friendly.
3 DESIGN REQUIREMENTS
There are three main goals when devising solutions for NL2VIS, along the same lines as other NL2VIS tools, e.g., NL4DV [40].
(1) Easy-to-use. We want to allow novices to create visualizations as simply as performing a Google search. That is, even users without a data visualization background can easily generate visualizations.
(2) End-to-end. Traditional semantic-parser-based translation systems typically consist of many small sub-components that are tuned separately. In contrast, we want to deliver a complete NL2VIS solution without the need for any additional steps. Besides the well-known benefits of end-to-end solutions, such as increased efficiency, cost cutting, and ease of learning, one particular benefit for a seq2seq model is that it is easy to maintain and upgrade. For example, upgrading a seq2seq model from long short-term memory (LSTM) [19] to Transformer [53] only requires changing a few lines of code.
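The "few lines of code" claim rests on the encoder and decoder being pluggable components behind a common interface. The sketch below illustrates that design choice with stand-in classes; the class and method names are hypothetical, not ncNet's code.

```python
# Sketch: because the encoder is a pluggable component, upgrading the
# model (e.g., LSTM -> Transformer) is a one-line change at the call site.
# The classes are stand-ins, not real neural network implementations.

class LSTMEncoder:
    def encode(self, tokens):
        return ("lstm-h", len(tokens))          # placeholder hidden state

class TransformerEncoder:
    def encode(self, tokens):
        return ("transformer-h", len(tokens))   # same interface, new model

class Seq2Seq:
    def __init__(self, encoder):
        self.encoder = encoder                  # the swap point
    def run(self, tokens):
        return self.encoder.encode(tokens)

old_model = Seq2Seq(LSTMEncoder())
new_model = Seq2Seq(TransformerEncoder())       # the "few lines" upgrade
```

The rest of the pipeline (tokenization, training loop, decoding) is untouched by the swap, which is exactly the maintainability benefit claimed above.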
(3) Language-agnostic. The main benefit of being language-agnostic is that we only need to train one seq2seq model for NL2VIS, yet can support multiple target visualization languages. The practical need for this is evident, because users might use various visualization languages constrained by different applications, such as Vega-Lite, D3, ggplot2, and so forth.
4 BACKGROUND AND PROBLEM FORMULATION
4.1 Sequence-to-Sequence Models
A sequence-to-sequence (seq2seq) model [51] consists of two parts, an encoder and a decoder, where each part can be implemented by different neural networks. The task of the encoder is to understand the input sequence and generate a smaller representation h (i.e., a high-dimensional vector) of the input. The task of the decoder is to generate an output sequence by taking h as input. The network needs to be trained with a large amount of training data, in the form of (input sequence, output sequence) pairs.
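The encoder/decoder split can be illustrated with a toy sketch: the encoder compresses the input into a fixed vector h, and the decoder reads only h. The hand-made "embeddings" and the rule-based decoder are illustrative assumptions, not a trained model.

```python
# Toy illustration of the seq2seq bottleneck: the encoder compresses the
# input into a fixed-size vector h; the decoder sees only h.
# Embeddings are hand-made for illustration, not learned.

EMB = {"show": [1.0, 0.0], "trend": [0.0, 2.0], "bar": [1.0, 1.0]}

def encode(tokens):
    h = [0.0, 0.0]
    for t in tokens:
        v = EMB.get(t, [0.0, 0.0])
        h = [h[0] + v[0], h[1] + v[1]]
    return h  # the representation h passed to the decoder

def decode(h):
    # stand-in decoder: emit an output sequence by a simple rule on h
    return ["mark", "line"] if h[1] > h[0] else ["mark", "bar"]

print(decode(encode(["show", "trend"])))  # -> ['mark', 'line']
```

In a real seq2seq model, both encode and decode are neural networks and the rule above is replaced by learned parameters fit to (input sequence, output sequence) pairs.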
Due to the flexibility of seq2seq models, which allow the input and output to have different formats, they have a wide spectrum of applications, including language translation [17], image captioning [38], conversational models and text summarization [39], and NL to SQL [23].
Let's first walk through a typical translation task: language translation from English to French (Figure 2(A)). The