
quired schema items are “petid”, “pets”, and “pet age”.
Since Text-to-SQL requires not only schema linking, which aligns the entities mentioned in the question with schema items in the database schema, but also skeleton parsing, which derives the skeleton of the SQL query, the major challenges stem from the large number of required schema items and the complex composition of operators (such as GROUP BY, HAVING, and JOIN ON) involved in a SQL query. The intertwining of schema linking and skeleton parsing complicates learning even more.
To investigate whether the Text-to-SQL task becomes easier when skeleton parsing and schema linking are decoupled, we conduct a preliminary experiment on Spider's dev set. Concretely, we fine-tune a T5-base model to generate pure skeletons from the questions (i.e., the skeleton parsing task). We observe that the exact match accuracy on this task reaches about 80% with the fine-tuned T5-base, whereas even a T5-3B model achieves only about 70% accuracy on the full Text-to-SQL task (Shaw et al. 2021; Scholak, Schucher, and Bahdanau 2021). This preliminary experiment indicates that decoupling the two objectives could be a promising way to reduce the difficulty of Text-to-SQL.
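To make the skeleton parsing target concrete: a skeleton keeps the SQL keywords and masks schema items and values with a placeholder. The following is a minimal regex-based sketch; the function name, keyword list, and masking rules are our own simplifications for illustration, not the paper's implementation:

```python
import re

# Illustrative (non-exhaustive) keyword list; anything else is masked.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "having", "order",
    "limit", "join", "on", "as", "distinct", "and", "or", "not",
    "in", "exists", "union", "intersect", "except", "asc", "desc",
}

def extract_skeleton(sql: str) -> str:
    """Mask every non-keyword token with '_' and collapse consecutive
    placeholders, yielding a query skeleton."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_.]*|\S", sql)
    masked = [t.lower() if t.lower() in SQL_KEYWORDS else "_" for t in tokens]
    skeleton = []
    for t in masked:
        # Collapse runs of '_' (e.g. "pet_age > 1" becomes one "_").
        if t == "_" and skeleton and skeleton[-1] == "_":
            continue
        skeleton.append(t)
    return " ".join(skeleton)
```

Under this simplification, a query such as `SELECT petid FROM pets WHERE pet_age > 1` reduces to `select _ from _ where _`.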
To realize the above decoupling idea, we propose a Ranking-enhanced Encoding plus a Skeleton-aware Decoding framework for Text-to-SQL (RESDSQL). The former injects only the most relevant schema items into the seq2seq model's encoder instead of all schema items. In other words, schema linking is conducted beforehand to filter out most of the irrelevant schema items in the database schema, which alleviates the schema linking burden on the seq2seq model. To this end, we train an additional cross-encoder to classify the tables and columns simultaneously based on the input question, and then rank and filter them according to the classification probabilities to form a ranked schema sequence.
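The rank-then-filter step described above can be sketched as follows. The probability dictionaries, the cut-off values, and the output format are illustrative assumptions, not the paper's exact hyperparameters:

```python
def rank_and_filter(table_probs, column_probs, top_tables=4, top_columns=5):
    """Build a ranked schema sequence from classification probabilities.

    table_probs:  {table_name: probability of being relevant}
    column_probs: {table_name: {column_name: probability}}
    The cut-offs (top_tables, top_columns) are illustrative values.
    """
    # Keep the most probable tables, in descending probability order.
    tables = sorted(table_probs, key=table_probs.get, reverse=True)[:top_tables]
    parts = []
    for t in tables:
        # Within each kept table, keep its most probable columns.
        cols = sorted(column_probs[t], key=column_probs[t].get,
                      reverse=True)[:top_columns]
        parts.append(f"{t}: {', '.join(cols)}")
    return " | ".join(parts)
```

Because the sequence is ordered by probability, the seq2seq model can also exploit position as a relevance signal.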
The latter adds no new modules but simply requires the seq2seq model's decoder to first generate the SQL skeleton and then the actual SQL query. Since skeleton parsing is much easier than full SQL parsing, the first-generated skeleton can implicitly guide the subsequent SQL parsing via the masked self-attention mechanism in the decoder.
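One plausible way to realize this decoding order is to train on a target sequence that concatenates the skeleton and the query; the separator token below is our own assumption, not necessarily the paper's delimiter:

```python
def build_target(skeleton: str, sql: str, sep: str = " | ") -> str:
    """Training target for skeleton-aware decoding: the decoder emits the
    (easier) skeleton first, then the full query. Because decoding is
    autoregressive with masked self-attention, the query tokens can attend
    to the already-generated skeleton tokens."""
    return skeleton + sep + sql.lower()
```

At inference time the final SQL query is recovered by splitting on the separator and taking the last part.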
Contributions. (1) We investigate a potential way of decoupling skeleton parsing and schema linking to reduce the difficulty of Text-to-SQL. Specifically, we propose a ranking-enhanced encoder to alleviate the schema linking effort and a skeleton-aware decoder to implicitly guide SQL parsing with the skeleton. (2) We conduct extensive evaluations and analyses, showing that our framework not only achieves new SOTA performance on Spider but also exhibits strong robustness.
2 Problem Definition
Database Schema. A relational database is denoted as $D$. The database schema $S$ of $D$ includes (1) a set of $N$ tables $T = \{t_1, t_2, \cdots, t_N\}$, (2) a set of columns $C = \{c^1_1, \cdots, c^1_{n_1}, c^2_1, \cdots, c^2_{n_2}, \cdots, c^N_1, \cdots, c^N_{n_N}\}$ associated with the tables, where $n_i$ is the number of columns in the $i$-th table, and (3) a set of foreign key relations $R = \{(c^i_k, c^j_h) \mid c^i_k, c^j_h \in C\}$, where each $(c^i_k, c^j_h)$ denotes a foreign key relation between column $c^i_k$ and column $c^j_h$. We use $M = \sum_{i=1}^{N} n_i$ to denote the total number of columns in $D$.
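The schema definition above maps naturally onto a plain data structure. The sketch below is only a rendering of the notation (names and the "table.column" encoding of foreign keys are our own choices):

```python
from dataclasses import dataclass

@dataclass
class DatabaseSchema:
    """Plain-data rendering of the schema S = (T, C, R)."""
    tables: list[str]                    # t_1, ..., t_N
    columns: dict[str, list[str]]        # n_i columns per table t_i
    foreign_keys: list[tuple[str, str]]  # (c_k^i, c_h^j) pairs, "table.column"

    @property
    def num_columns(self) -> int:
        # M = sum_{i=1}^{N} n_i
        return sum(len(cols) for cols in self.columns.values())
```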
Original Name and Semantic Name. We use "schema items" to refer uniformly to the tables and columns in the database. Each schema item can be represented by an original name and a semantic name, where the semantic name indicates the semantics of the schema item more precisely. As shown in Figure 1, the semantic names "airline id" and "destination airport" are clearly more interpretable than their original names "uid" and "destairport". Sometimes the semantic name is identical to the original name.
Text-to-SQL Task. Formally, given a question $q$ in natural language and a database $D$ with its schema $S$, the Text-to-SQL task aims to translate $q$ into a SQL query $l$ that can be executed on $D$ to answer the question $q$.
3 Methodology
In this section, we first give an overview of the proposed
RESDSQL framework and then delve into its design details.
3.1 Model Overview
Following Shaw et al. (2021) and Scholak, Schucher, and Bahdanau (2021), we treat Text-to-SQL as a translation task that can be solved by an encoder-decoder transformer model. To address the above challenges, we extend existing seq2seq Text-to-SQL methods by injecting the most relevant schema items into the input sequence and the SQL skeleton into the output sequence, which results in a ranking-enhanced encoder and a skeleton-aware decoder. We provide a high-level overview of the proposed RESDSQL model in Figure 2. The encoder of the seq2seq model receives the ranked schema sequence, so that the schema linking effort can be alleviated during SQL parsing. To obtain such a ranked schema sequence, an additional cross-encoder is proposed to classify the schema items according to the given question, and then we rank and filter them based on the classification probabilities. The decoder of the seq2seq model first parses out the SQL skeleton and then the actual SQL query, so that SQL generation can be implicitly constrained by the previously parsed skeleton. In this way, schema linking and skeleton parsing are, to a certain extent, no longer intertwined but decoupled.
3.2 Ranking-enhanced Encoder
Instead of injecting all schema items, we only consider the
most relevant schema items in the input of the encoder. For
this purpose, we devise a cross-encoder to classify the tables
and columns simultaneously and then rank them based on
their probabilities. Based on the ranking order, on one hand,
we filter out the irrelevant schema items. On the other hand,
we use the ranked schema sequence instead of the unordered
schema sequence, so that the seq2seq model could capture
potential position information for schema linking.
As for the input of the cross-encoder, we flatten the schema items into a schema sequence in their default order and concatenate it with the question to form an input sequence: $X = q \mid t_1 : c^1_1, \cdots, c^1_{n_1} \mid \cdots \mid t_N : c^N_1, \cdots, c^N_{n_N}$,