
2020a).
In another line of work, pretrained seq2seq models
have recently demonstrated strong potential for this task.
Shaw et al. (2021) show that
directly fine-tuning a T5 model (Raffel et al., 2020)
on this task, without providing any relational struc-
tures, can already achieve competitive results. Moreover,
PICARD (Scholak et al., 2021) presents a way to
prune invalid beam search results during inference
time, thus drastically improving the grammatical
correctness of the SQL queries generated by the
autoregressive decoder that comes with T5.
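To make the idea concrete, the toy sketch below shows the kind of validity filtering such constrained decoding performs at each beam-search step; is_valid_sql_prefix is a hypothetical stand-in for an incremental parser, not PICARD's actual interface.

# Toy sketch of constrained decoding in the spirit of PICARD: at every
# beam-search step, hypotheses whose token prefixes cannot be parsed into
# valid (partial) SQL are dropped before the beam is extended further.
# `is_valid_sql_prefix` is a hypothetical placeholder, not PICARD's API.
def prune_beam(hypotheses, is_valid_sql_prefix):
    """hypotheses: list of (token_sequence, log_probability) pairs."""
    return [(seq, logp) for seq, logp in hypotheses if is_valid_sql_prefix(seq)]

Because pruning only removes hypotheses, the surviving candidates are still ranked by the decoder's own scores; the filter never introduces new ones.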
In this work, different from the more common ap-
proach of fine-tuning the original pretrained model
or using prompt tuning, we propose to augment
the self-attention modules in the encoder and in-
troduce new parameters to the model while still
being able to leverage the pre-trained weights. We
call the proposed model RASAT (Relation-Aware
Self-Attention-augmented T5). Our model can
incorporate almost all existing types of relations in
the literature, including schema encoding, schema
linking, syntactic dependency of the question, etc.,
into a unified relation representation. In addition
to that, we also introduce coreference relations to
our model for multi-turn text-to-SQL tasks. Experi-
mental results show that RASAT can effectively
leverage the strengths of T5, achieving state-of-
the-art performance in execution accuracy
(EX/IEX) on both multi-turn (SParC and CoSQL)
and single-turn (Spider) text-to-SQL benchmarks.
On SParC, RASAT surpasses all previous methods
in interaction execution accuracy (IEX) and im-
proves state-of-the-art performance from 21.6% to
52.6%, a 31% absolute improvement. On CoSQL,
we improve state-of-the-art IEX performance from
8.4% to 37.4%, a 29% absolute improvement.
Moreover, on Spider, we improve state-of-the-art
execution accuracy from 75.1% to 75.5%,
a 0.4% absolute improvement.
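For intuition, relation-aware self-attention of the kind RASAT builds on (in the style of Shaw et al. (2018) and RAT-SQL) can be sketched as follows; the single-head PyTorch snippet is illustrative only, with hypothetical names and shapes rather than the exact RASAT implementation.

# Minimal single-head sketch of relation-aware self-attention: pairwise
# relation embeddings bias both the keys and the values, so the attention
# pattern can reflect schema-linking, dependency, or coreference relations.
# Shapes and names are illustrative, not the exact RASAT implementation.
import torch
import torch.nn.functional as F

def relation_aware_attention(x, rel_k, rel_v, w_q, w_k, w_v):
    # x:             (n, d)    joint question/schema token representations
    # rel_k, rel_v:  (n, n, d) relation embeddings biasing keys/values
    # w_q, w_k, w_v: (d, d)    projection matrices (one head, no batching)
    n, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # score[i, j] = q_i . (k_j + r^K_ij) / sqrt(d)
    scores = torch.einsum('id,ijd->ij', q, k.unsqueeze(0).expand(n, n, d) + rel_k) / d ** 0.5
    alpha = F.softmax(scores, dim=-1)
    # out_i = sum_j alpha_ij * (v_j + r^V_ij)
    return torch.einsum('ij,ijd->id', alpha, v.unsqueeze(0).expand(n, n, d) + rel_v)

In RASAT, relation embeddings of this form are injected into the self-attention modules of the pretrained encoder, so only the relation-specific parameters are newly introduced while the remaining weights are initialized from the pretrained T5 checkpoint.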
2 Related Work
Early works typically adopt a sketch-based slot-
filling approach that decomposes SQL generation
into several independent slots and uses separate
classifiers to predict each part, e.g., SQLNet (Xu
et al., 2017), SQLOVA (Hwang et al., 2019), X-
SQL (He et al., 2019), and RYANSQL (Choi et al.,
2021). However, most of these methods can only
handle simple queries while failing to generate cor-
rect SQL in a complex setting such as on Spider.
Faced with the multi-table and complex SQL
setting, using graph structures to encode various
complex relationships is a major trend in the text-to-
SQL task. For example, Global-GNN (Bogin et al.,
2019a) represents the complex database schema as
a graph, RAT-SQL (Wang et al., 2020a) introduces
schema encoding and linking and assigns every two
input items a relation, LGESQL (Cao et al., 2021)
further distinguishes local and non-local relations
by exploiting a line graph enhanced hidden module,
SADGA (Cai et al., 2021) uses contextual structure
and dependency structure to encode question-graph
while database schema relations are used in schema
graph, S
2
SQL (Hui et al., 2022) adds syntactic de-
pendency information in relational graph attention
network (RGAT) (Wang et al., 2020b).
For the conversational context-dependent text-
to-SQL task that includes multiple turns of interac-
tions, such as SParC and CoSQL, the key challenge
is how to take advantage of historical interaction
context. Edit-SQL (Zhang et al., 2019) edits the
last turn’s predicted SQL to generate the newly pre-
dicted SQL at the token level. IGSQL (Cai and
Wan, 2020) uses cross-turn and intra-turn schema
graph layers to model database schema items in a
conversational scenario. Tree-SQL (Wang et al.,
2021b) uses a tree-structured intermediate repre-
sentation and assigns probabilities for reusing sub-
trees of historical Tree-SQLs. IST-SQL (Wang
et al., 2021a) proposes an interaction state tracking
method to predict the SQL query. RAT-SQL-TC
(Li et al., 2021) adds two auxiliary training tasks
to explicitly model semantic changes at both
turn and conversation granularity. R²SQL (Hui
et al., 2021) and HIE-SQL (Zheng et al., 2022) in-
troduce a dynamic schema-linking graph by adding
the current utterance, interaction history utterances,
database schema, and the last predicted SQL query.
Recently, Shaw et al. (2021) showed that fine-
tuning a pre-trained T5-3B model could yield re-
sults competitive with the then state-of-the-art. Based
on this discovery, Scholak et al. (2021) proposed to
constrain the autoregressive decoder through incre-
mental parsing during inference time, effectively
filtering out grammatically incorrect sequences on
the fly during beam search, which significantly im-
proved the quality of the generated SQL.