
Database Meets Deep Learning: Challenges and Opportunities
Wei Wang†, Meihui Zhang‡, Gang Chen§, H. V. Jagadish#, Beng Chin Ooi†, Kian-Lee Tan†
†National University of Singapore, ‡Beijing Institute of Technology, §Zhejiang University, #University of Michigan
†{wangwei, ooibc, tankl}@comp.nus.edu.sg, ‡meihui_zhang@bit.edu.cn, §cg@zju.edu.cn, #jag@umich.edu
ABSTRACT
Deep learning has recently become very popular on account of its remarkable success in many complex data-driven applications, such as image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a leading role in supporting this new wave. However, databases and deep learning differ in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.
1. INTRODUCTION
In recent years, we have witnessed the success of numerous data-driven applications based on machine learning. This has prompted the database community to investigate opportunities for integrating machine learning techniques into the design of database systems and applications [84]. A branch of machine learning called deep learning [57, 38] has attracted worldwide interest in recent years due to its excellent performance in multiple areas, including speech recognition, image classification and natural language processing (NLP). The foundation of deep learning was established about twenty years ago in the form of neural networks. Its recent resurgence is mainly fueled by three factors: immense computing power, which reduces the time to train and deploy new models (e.g., Graphics Processing Units (GPUs) enable training systems to run much faster than those of the 1990s); massive (labeled) training datasets (e.g., ImageNet), which enable a more comprehensive knowledge of the domain to be acquired; and new deep learning models (e.g., AlexNet [55]), which improve the ability to capture data regularities.
Database researchers have been working on system optimization and large-scale data-driven applications since the 1970s, work that is closely related to the first two factors. It is therefore natural to consider the relationships between databases and deep learning. First, are there any insights that the database community can offer to deep learning? It has been shown that larger training datasets and deeper model structures improve the accuracy of deep learning models. However, the side effect is that training becomes more costly. Approaches have been proposed to accelerate training from both the system perspective [12, 42, 18, 80, 2] and the theory perspective [120, 27]. Since the database community has rich experience with system optimization, it would be opportune to discuss the applicability of database techniques for optimizing deep learning systems. For example, distributed computing and memory management are key database technologies that are also central to deep learning.
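As a rough illustration of why distributed computing carries over, the following is a minimal sketch, under our own assumptions, of synchronous data-parallel training: the training rows are horizontally partitioned across hypothetical workers (much like sharding a table), each worker computes a gradient on its shard, and the size-weighted average of the shard gradients equals the full-batch gradient. The linear least-squares model, the helper names `linear_grad` and `data_parallel_step`, and the synthetic data are illustrative assumptions, not a system described in this paper.

```python
import numpy as np


def linear_grad(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)


def data_parallel_step(w, partitions, lr):
    """One synchronous data-parallel gradient step (illustrative sketch).

    Each (X_k, y_k) pair stands in for one hypothetical worker's shard.
    Local gradients are averaged, weighted by shard size, which is what a
    central aggregator (e.g., a parameter server) would compute.
    """
    total = sum(len(y_k) for _, y_k in partitions)
    grad = sum(len(y_k) * linear_grad(w, X_k, y_k) for X_k, y_k in partitions) / total
    return w - lr * grad


# Synthetic regression data (an assumption made for this example only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w

# Horizontally partition the rows across 4 hypothetical workers.
partitions = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(5)
for _ in range(200):
    w = data_parallel_step(w, partitions, lr=0.1)
```

Because the aggregated gradient matches the single-machine gradient, the learning behavior is unchanged by partitioning; what changes is the cost of moving data and gradients between machines, which is the kind of problem database techniques for partitioning and communication are meant to address.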
Second, are there any deep learning techniques that can be adapted for database problems? Deep learning emerged from the machine learning and computer vision communities, and it has been successfully applied to other domains, such as NLP [28]. However, few studies have applied deep learning techniques to traditional database problems. This is partially because traditional database problems, such as indexing, transaction management and storage management, involve less uncertainty, whereas deep learning is good at making predictions about uncertain events. Nevertheless, some database problems, such as knowledge fusion [21] and crowdsourcing [79], are probabilistic in nature, and deep learning techniques may be applicable to them. In this paper, we discuss specific problems such as query interfaces and knowledge fusion.
A previous version [108] of this paper appeared in SIGMOD Record. This version extends it to cover recent developments in the field, with references to recent work.
arXiv:1906.08986v2 [cs.DB] 19 Jan 2020