
(a) A text question:
    Which course has the highest score for the student named timothy ward?
(b) Snippets of a database:
    Course(id, course, teacher): ('001', 'math', 'jordy wu'), ...
    Student(id, given_name, last_name, score, course): (1, 'timmy', 'ward', 92, 'math'), ...
(c) The ground-truth SQL query:
    SELECT course FROM Student
    WHERE given_name = 'timmy' AND last_name = 'ward'
    ORDER BY score LIMIT 1;
(d) An SQL query translated by an SLM:
    SELECT course FROM Student
    WHERE given_name = 'timothy ward'
    ORDER BY score LIMIT 1;
(e) An SQL query translated by an LLM:
    SELECT Course.course, Student.score
    FROM Student JOIN Course ON Student.id = Course.id
    WHERE given_name = 'timothy' AND last_name = 'ward'
    ORDER BY score LIMIT 1;
Figure 1: A sample NL2SQL translation. The incorrect portions are highlighted in red.
solutions for zero-shot NL2SQL, and report the error distributions
of both categories in Figure 2.
SLM-based methods, such as RASAT [30], PICARD [37], and RESDSQL [18], have shown promise in generating accurate SQL queries on NL2SQL datasets through fine-tuning on numerous annotated NL2SQL samples. However, SLM-based methods may have limited generalizability in natural language reasoning in our zero-shot settings, which may dramatically degrade their performance [28]. Consider the query in Figure 1(d): given "student named timothy ward" in the question, an SLM-based method selects only one column, given_name, which is similar to the word "named". This is because the annotated NL2SQL samples used to train the method do not differentiate between given_name and last_name. Although further fine-tuning on new databases or linguistic phenomena can alleviate this problem, it requires a significant amount of high-quality training data, such as annotated NL2SQL samples. Acquiring such data is both time-consuming and labor-intensive.
LLMs like PaLM [7] and GPT-4 [25], which are often accessed through API calls, have demonstrated remarkable reasoning abilities across a range of domains and tasks. Compared with SLMs, LLMs are capable of (complicated) language reasoning, including understanding question semantics, resolving ambiguities, and performing logical deductions, which are necessary for generating SQL queries in new environments. However, LLMs may not achieve precise schema alignment: as shown in Figure 2, 42% of the errors are caused by incorrect table/column selection. In particular, LLMs tend to choose more columns and tables to cover the input content, leading to incorrect execution results. Consider the query in Figure 1(e): the LLM selects an incorrect column (score) and table (Course) in the SELECT and FROM clauses.
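To make both failure modes concrete, the three queries from Figure 1 can be replayed on a toy SQLite copy of the database (a sketch; only the rows shown in Figure 1's snippets are loaded):

```python
import sqlite3

# Toy database from Figure 1(b).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Course (id TEXT, course TEXT, teacher TEXT);
    CREATE TABLE Student (id INTEGER, given_name TEXT, last_name TEXT,
                          score INTEGER, course TEXT);
    INSERT INTO Course VALUES ('001', 'math', 'jordy wu');
    INSERT INTO Student VALUES (1, 'timmy', 'ward', 92, 'math');
""")

# (c) Ground-truth query: matches the stored value 'timmy'.
gold = con.execute("""
    SELECT course FROM Student
    WHERE given_name = 'timmy' AND last_name = 'ward'
    ORDER BY score LIMIT 1;
""").fetchall()
print(gold)   # [('math',)]

# (d) SLM query: collapses both name parts into given_name -> no match.
slm = con.execute("""
    SELECT course FROM Student
    WHERE given_name = 'timothy ward'
    ORDER BY score LIMIT 1;
""").fetchall()
print(slm)    # []

# (e) LLM query: spurious join plus the unaligned literal 'timothy' -> no match.
llm = con.execute("""
    SELECT Course.course, Student.score
    FROM Student JOIN Course ON Student.id = Course.id
    WHERE given_name = 'timothy' AND last_name = 'ward'
    ORDER BY score LIMIT 1;
""").fetchall()
print(llm)    # []
```

Only the ground-truth query returns the expected course; the other two execute without error yet produce empty results, which is exactly the silent-failure behavior discussed above.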
Our Proposal. We propose a decomposition-based approach that breaks down the NL2SQL task into smaller sub-tasks, such that each sub-task is more amenable to solving in our zero-shot setting. Naturally, the thought process of writing an SQL query can be broken down into four sub-tasks:
(1) Identifying the query structure that consists of SQL reserved keywords, e.g., SELECT, FROM, WHERE, and ORDER BY;
(2) Aligning relevant schema elements with the question, i.e., columns and tables in the SELECT and FROM clauses;
(3) Completing the SQL query by deducing conditions in the WHERE clause, columns in ORDER BY or GROUP BY clauses, etc.;
(4) Iteratively correcting the SQL query if there are syntax or execution errors.
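The four sub-tasks can be walked through end-to-end on the Figure 1 example. The sketch below is purely illustrative: the function names and their rule-based or hardcoded bodies are stand-ins for the actual models, not the paper's method.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Student (id INTEGER, given_name TEXT, last_name TEXT,
                          score INTEGER, course TEXT);
    INSERT INTO Student VALUES (1, 'timmy', 'ward', 92, 'math');
""")

question = ("Which course has the highest score for the student "
            "named timothy ward?")

def identify_structure(q):
    # (1) Structure identification: which reserved keywords the
    # final query will use.
    kws = ["SELECT", "FROM", "WHERE"]
    if "highest" in q or "lowest" in q:
        kws += ["ORDER BY", "LIMIT"]
    return kws

def align_schema(q):
    # (2) Schema alignment: columns and tables for SELECT and FROM
    # (hardcoded here to the Figure 1 example).
    return {"select": ["course"], "from": ["Student"]}

def complete(sketch):
    # (3) Completion: deduce the WHERE conditions and the ordering
    # (again hardcoded for illustration).
    return ("SELECT course FROM Student "
            "WHERE given_name = 'timmy' AND last_name = 'ward' "
            "ORDER BY score LIMIT 1;")

def correct(sql):
    # (4) Correction: on a syntax/execution error, the message would
    # be fed back for re-generation (a single attempt shown here).
    try:
        return con.execute(sql).fetchall()
    except sqlite3.Error as err:
        return ("needs another round; feedback:", str(err))

sketch = {"keywords": identify_structure(question), **align_schema(question)}
result = correct(complete(sketch))
print(result)   # [('math',)]
```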
By analyzing the behavior of SLMs and LLMs over many different datasets, our key observation is that, contrary to the intuition that LLMs should outperform SLMs, the two complement each other and together perform better than either alone on the above four steps. Specifically, a task-specific fine-tuned SLM can better understand the database schema and SQL syntax, which enables it to excel in structure identification and schema alignment. In contrast, existing research reveals that LLMs often face a "hallucination" issue, generating text unrelated to the given instructions, due to their lack of domain knowledge [21]. However, a general LLM possesses strong language reasoning capabilities [2], making it well-suited for SQL completion, which requires complex condition reasoning. Moreover, LLMs also exhibit excellent interaction capabilities, allowing them to perform error correction efficiently through appropriate feedback.
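This interaction pattern, executing the query and feeding the database's error message back for re-generation, can be sketched as follows; the `regenerate` stub stands in for an LLM call:

```python
import sqlite3

def execute_with_feedback(con, sql, regenerate, max_rounds=3):
    """Run `sql`; on failure, pass the database error message back to
    `regenerate` (standing in for an LLM call) and try again."""
    for _ in range(max_rounds):
        try:
            return con.execute(sql).fetchall()
        except sqlite3.Error as err:
            sql = regenerate(sql, str(err))
    raise RuntimeError("could not repair the query")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (given_name TEXT, score INTEGER)")
con.execute("INSERT INTO Student VALUES ('timmy', 92)")

# Faulty first attempt: the literal is unquoted, so SQLite parses
# `timmy` as a column name and raises an error.
bad_sql = "SELECT score FROM Student WHERE given_name = timmy"

def regenerate(sql, error):
    # Stand-in for the LLM: here we simply return the repaired query.
    return "SELECT score FROM Student WHERE given_name = 'timmy'"

print(execute_with_feedback(con, bad_sql, regenerate))  # [(92,)]
```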
Based on this insight, we propose ZNL2SQL, a framework that uses SLMs and LLMs to solve different steps in our decomposition-based approach, combining the best of both worlds to address the generalization challenge of zero-shot NL2SQL. ZNL2SQL consists of two key steps, as illustrated in Figure 3.
• Step 1: SQL sketch generation utilizes SLMs for SQL structure identification and schema alignment to generate an SQL sketch, with attributes to SELECT, tables included in FROM, and necessary keywords (e.g., ORDER BY) for composing the SQL query.
• Step 2: SQL query completion and correction leverages LLMs to complete the missing information in the SQL sketch and generate complete SQL queries, e.g., aligning with data values from the database. For example, although the question refers to "timothy", the database actually uses the abbreviation "timmy"; thus, "timmy" should be used in the SQL query.
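A minimal sketch of the two steps on the Figure 1 example; the sketch string format is illustrative, and a simple string-similarity lookup (`difflib`) stands in for the LLM's value alignment:

```python
import difflib
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (given_name TEXT, last_name TEXT, "
            "score INTEGER, course TEXT)")
con.execute("INSERT INTO Student VALUES ('timmy', 'ward', 92, 'math')")

# Step 1 output (an illustrative SQL sketch): attributes to SELECT,
# tables in FROM, and keyword slots still to be filled.
sketch = "SELECT course FROM Student WHERE _ ORDER BY _ LIMIT _"

# Step 2 must align question literals with stored values: the question
# says "timothy", but the database abbreviates it to "timmy".
stored = [row[0] for row in con.execute("SELECT given_name FROM Student")]
value = difflib.get_close_matches("timothy", stored, n=1, cutoff=0.5)[0]
print(value)  # 'timmy'

completed = ("SELECT course FROM Student "
             f"WHERE given_name = '{value}' AND last_name = 'ward' "
             "ORDER BY score LIMIT 1;")
print(con.execute(completed).fetchall())  # [('math',)]
```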
Challenges and Solutions. The first challenge is how to develop an SQL sketch generation method that can generalize to new databases or linguistic phenomena. We introduce an encoder-decoder based SLM that generates an SQL sketch. Specifically, to improve the generalizability of the model, we design a novel database serialization strategy to make the encoder more adaptive to new databases. Moreover, we propose a question-aware aligner to obtain the most relevant SQL sketches by inferring the semantics