
Graphix-T5: Mixing Pre-Trained Transformers with
Graph-Aware Layers for Text-to-SQL Parsing
Jinyang Li^1,2*, Binyuan Hui^2, Reynold Cheng^1,5†, Bowen Qin^3, Chenhao Ma^4, Nan Huo^1,
Fei Huang^2, Wenyu Du^1, Luo Si^2, Yongbin Li^2†
^1 The University of Hong Kong
^2 DAMO Academy, Alibaba Group
^3 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
^4 The Chinese University of Hong Kong (Shenzhen)
^5 Guangdong–Hong Kong–Macau Joint Laboratory
{jl0725,huonan,wenyudu}@connect.hku.hk, ckcheng@cs.hku.hk,
bw.qin@siat.ac.cn, machenhao@cuhk.edu.cn,
{binyuan.hby,f.huang,luo.si,shuide.lyb}@alibaba-inc.com
Abstract
The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for a technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model's capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQLs. To this end, we propose a new architecture, GRAPHIX-T5, a mixed model in which the standard pre-trained transformer is augmented with specially designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC, and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large outperforms the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution accuracy (EX), and even outperforms T5-3B by 1.2% on EM and 1.5% on EX.
1 Introduction
Relational databases, which serve as an important resource for decision making in many fields such as health care, sports, and entertainment, have become ubiquitous in the big-data era. Data users can efficiently access the information in databases via a structured query language, e.g., SQL. Despite its effectiveness and efficiency, however, the complex nature of SQL demands extremely expensive learning
* Work done during an internship at Alibaba DAMO Academy.
† Corresponding authors are Reynold Cheng and Yongbin Li.
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Natural Language Question: Find the number of dog pets that are raised by female students
Database: Student (StuID, Sex, Age); Pets (PetID, PetType, Pet_age); Has_Pet (StuID, PetID)
SQL:
SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON
T1.stuid = T2.stuid JOIN pets AS T3 ON T2.petid = T3.petid
WHERE T1.sex = 'F' AND T3.pettype = 'dog'
Figure 1: An illustration of the cross-domain text-to-SQL challenge. The link between the target column Sex and the question token "female" is highly desired but extremely challenging for the model to capture, especially when domain-specific data or effective rules are absent. However, this dilemma can be mitigated by a multi-hop reasoning path (female --MOD--> student --EM--> Student --HAS--> Sex).
efforts for non-technical users. Therefore, text-to-SQL (Cai
et al. 2018; Zelle and Mooney 1996; Xu, Liu, and Song
2017; Yu et al. 2018a; Yaghmazadeh et al. 2017), aiming to
convert natural language instructions or questions into SQL
queries, has attracted remarkable attention.
In this work, we explore the challenging cross-domain
setting where a text-to-SQL parser needs to achieve domain
generalization, i.e., the ability to generalize to domains that
are unseen during training. Achieving this goal would, in
principle, contribute to a universal natural language interface
that allows users to interact with data in arbitrary domains.
The major challenge towards domain generalization (Wang
et al. 2020a; Cao et al. 2021; Wang et al. 2022; Cai et al.
2021; Hui et al. 2022) is that generating structure-rich SQLs
requires (potentially multi-hop) reasoning, i.e., the ability
to properly contextualize a user question against a given
database by considering many explicit relations (e.g., table-
column relations specified by database schema) and implicit
relations (e.g., whether a phrase refers to a column or table).
Figure 1 shows an introductory example of multi-hop reasoning in text-to-SQL parsing, and Figure 6 presents two more detailed cases.
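To make the idea of multi-hop reasoning over explicit and implicit relations concrete, the reasoning path from Figure 1 can be sketched as a search over a typed question–schema graph. This is a minimal illustrative sketch, not the paper's actual implementation: the node names and the MOD (syntactic modifier), EM (exact match), and HAS (table-owns-column) relation labels are taken from Figure 1, and the search here is a plain breadth-first traversal.

```python
from collections import deque

# Typed edges of a tiny question-schema graph, following Figure 1.
# Question tokens are lowercase; schema items are capitalized.
edges = {
    ("female", "student"): "MOD",   # implicit: "female" modifies "student"
    ("student", "Student"): "EM",   # implicit: token exactly matches table name
    ("Student", "Sex"): "HAS",      # explicit: table Student owns column Sex
    ("Student", "StuID"): "HAS",
    ("Student", "Age"): "HAS",
}

def multi_hop_path(start, goal):
    """Breadth-first search over typed edges; returns the labeled path or None."""
    graph = {}
    for (u, v), rel in edges.items():
        graph.setdefault(u, []).append((v, rel))
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, rel in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"--{rel}-->", nxt]))
    return None

print(multi_hop_path("female", "Sex"))
# ['female', '--MOD-->', 'student', '--EM-->', 'Student', '--HAS-->', 'Sex']
```

The point of the sketch is that the desired link female → Sex is unreachable in one hop; it only emerges by chaining implicit relations (MOD, EM) with an explicit schema relation (HAS), which is exactly the kind of structural bias the graph-aware layers are meant to encode.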
arXiv:2301.07507v1 [cs.CL] 18 Jan 2023