
2020a).
In another line of work, pretrained seq2seq models
have recently demonstrated strong potential for this task.
Shaw et al. (2021) show that
directly fine-tuning a T5 model (Raffel et al., 2020)
on this task, without providing any relational struc-
tures, can already achieve competitive results. Moreover,
PICARD (Scholak et al., 2021) presents a way to
prune invalid beam search results during inference
time, thus drastically improving the grammatical
correctness of the SQL queries generated by the
autoregressive decoder that comes with T5.
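To make the idea concrete, the toy sketch below shows the kind of validity filtering such constrained decoding performs at each beam-search step; is_valid_sql_prefix is a hypothetical stand-in for an incremental parser, not PICARD's actual interface.

# Toy sketch of constrained decoding in the spirit of PICARD: at every
# beam-search step, hypotheses whose token prefixes cannot be parsed into
# valid (partial) SQL are dropped before the beam is extended further.
# `is_valid_sql_prefix` is a hypothetical placeholder, not PICARD's API.
def prune_beam(hypotheses, is_valid_sql_prefix):
    """hypotheses: list of (token_sequence, log_probability) pairs."""
    return [(seq, logp) for seq, logp in hypotheses if is_valid_sql_prefix(seq)]

Because pruning only removes hypotheses, the surviving candidates are still ranked by the decoder's own scores; the filter never introduces new ones.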
In this work, different from the more common ap-
proach of fine-tuning the original pretrained model
or using prompt tuning, we propose to augment
the self-attention modules in the encoder and in-
troduce new parameters to the model while still
being able to leverage the pre-trained weights. We
call the proposed model RASAT (Relation-Aware
Self-Attention-augmented T5). Our model can
incorporate almost all existing types of relations in
the literature, including schema encoding, schema
linking, syntactic dependency of the question, etc.,
into a unified relation representation. In addition
to that, we also introduce coreference relations to
our model for multi-turn text-to-SQL tasks. Experi-
mental results show that RASAT can effectively
leverage the strengths of T5, achieving state-of-
the-art performance in execution accuracy
(EX/IEX) on both multi-turn (SParC and CoSQL)
and single-turn (Spider) text-to-SQL benchmarks.
On SParC, RASAT surpasses all previous methods
in interaction execution accuracy (IEX) and im-
proves state-of-the-art performance from 21.6% to
52.6%, a 31% absolute improvement. On CoSQL,
we improve state-of-the-art IEX performance from
8.4% to 37.4%, a 29% absolute improvement.
Moreover, on Spider, we improve state-of-the-art
execution accuracy from 75.1% to 75.5%,
a 0.4% absolute improvement.
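For intuition, relation-aware self-attention of the kind RASAT builds on (in the style of Shaw et al. (2018) and RAT-SQL) can be sketched as follows; the single-head PyTorch snippet is illustrative only, with hypothetical names and shapes rather than the exact RASAT implementation.

# Minimal single-head sketch of relation-aware self-attention: pairwise
# relation embeddings bias both the keys and the values, so the attention
# pattern can reflect schema-linking, dependency, or coreference relations.
# Shapes and names are illustrative, not the exact RASAT implementation.
import torch
import torch.nn.functional as F

def relation_aware_attention(x, rel_k, rel_v, w_q, w_k, w_v):
    # x:             (n, d)    joint question/schema token representations
    # rel_k, rel_v:  (n, n, d) relation embeddings biasing keys/values
    # w_q, w_k, w_v: (d, d)    projection matrices (one head, no batching)
    n, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # score[i, j] = q_i . (k_j + r^K_ij) / sqrt(d)
    scores = torch.einsum('id,ijd->ij', q, k.unsqueeze(0).expand(n, n, d) + rel_k) / d ** 0.5
    alpha = F.softmax(scores, dim=-1)
    # out_i = sum_j alpha_ij * (v_j + r^V_ij)
    return torch.einsum('ij,ijd->id', alpha, v.unsqueeze(0).expand(n, n, d) + rel_v)

In RASAT, relation embeddings of this form are injected into the self-attention modules of the pretrained encoder, so only the relation-specific parameters are newly introduced while the remaining weights are initialized from the pretrained T5 checkpoint.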
2 Related Work
Early works typically adopt a sketch-based slot-
filling approach that decomposes SQL generation
into several independent slots and uses separate
classifiers to predict each part, e.g., SQLNet (Xu
et al., 2017), SQLOVA (Hwang et al., 2019), X-
SQL (He et al., 2019), and RYANSQL (Choi et al.,
2021). However, most of these methods can only
handle simple queries while failing to generate cor-
rect SQL in a complex setting such as on Spider.
Faced with the multi-table and complex SQL
setting, using graph structures to encode various
complex relationships is a major trend in the text-to-
SQL task. For example, Global-GNN (Bogin et al.,
2019a) represents the complex database schema as
a graph, RAT-SQL (Wang et al., 2020a) introduces
schema encoding and linking and assigns every two
input items a relation, LGESQL (Cao et al., 2021)
further distinguishes local and non-local relations
by exploiting a line graph enhanced hidden module,
SADGA (Cai et al., 2021) uses contextual structure
and dependency structure to encode question-graph
while database schema relations are used in schema
graph, S
2
SQL (Hui et al., 2022) adds syntactic de-
pendency information in relational graph attention
network (RGAT) (Wang et al., 2020b).
For the conversational context-dependent text-
to-SQL task that includes multiple turns of interac-
tions, such as SParC and CoSQL, the key challenge
is how to take advantage of historical interaction
context. Edit-SQL (Zhang et al., 2019) edits the
last turn’s predicted SQL to generate the newly pre-
dicted SQL at the token level. IGSQL (Cai and
Wan, 2020) uses cross-turn and intra-turn schema
graph layers to model database schema items in a
conversational scenario. Tree-SQL (Wang et al.,
2021b) uses a tree-structured intermediate repre-
sentation and assigns probabilities for reusing sub-
trees of historical Tree-SQLs. IST-SQL (Wang
et al., 2021a) proposes an interaction state tracking
method to predict the SQL query. RAT-SQL-TC
(Li et al., 2021) adds two auxiliary training tasks
to explicitly model semantic changes at both
turn and conversation granularity. R²SQL (Hui
et al., 2021) and HIE-SQL (Zheng et al., 2022) in-
troduce a dynamic schema-linking graph by adding
the current utterance, interaction history utterances,
database schema, and the last predicted SQL query.
Recently, Shaw et al. (2021) showed that fine-
tuning a pre-trained T5-3B model could yield re-
sults competitive with the then state-of-the-art. Based
on this discovery, Scholak et al. (2021) proposed to
constrain the autoregressive decoder through incre-
mental parsing during inference time, effectively
filtering out grammatically incorrect sequences on
the fly during beam search, which significantly im-
proved the quality of the generated SQL.