
Journal of Software, ISSN 1000-9825, CODEN RUXUEW, E-mail: jos@iscas.ac.cn
[doi: 10.13328/j.cnki.jos.000000] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
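XuanYuan: An AI-Native Database System*
Guoliang Li, Xuanhe Zhou, Jianhua Feng
Department of Computer Science and Technology, Tsinghua University
Corresponding author: Guoliang Li, E-mail: liguoliang@tsinghua.edu.cn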
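Abstract: In the big data era, database systems face three main challenges. First, traditional optimization techniques that rely on expert
experience (e.g., cost estimation, join order selection, knob tuning) can no longer meet the performance demands of heterogeneous data,
massive numbers of applications, and large-scale users. We can design learning-based database optimization techniques to make databases
more intelligent. Second, in the AI era many database applications need AI algorithms, such as image search inside a database. We can
embed AI algorithms into the database, use database techniques to accelerate them, and provide AI-based services inside the database.
Third, traditional databases focus on general-purpose hardware (e.g., CPU) and cannot fully exploit new hardware (e.g., ARM, AI chips).
Moreover, besides the relational model, databases need to support a tensor model to accelerate AI operations. To address these challenges,
we propose a database system that natively supports artificial intelligence (AI). On the one hand, we integrate various AI techniques into
the database to provide self-monitoring, self-configuring, self-optimizing, self-diagnosing, self-healing, self-securing, and
self-assembling capabilities. On the other hand, we let the database provide AI capabilities through declarative languages, lowering the
barrier to using AI. This paper describes five levels of AI-native databases and presents the challenges of designing one. We also use
autonomous database knob tuning, deep-reinforcement-learning-based query optimization, machine-learning-based cardinality estimation, and
autonomous index/view recommendation as examples to demonstrate the advantages of AI-native databases.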
Key words: database; artificial intelligence; computing framework
CLC number: TP311
XuanYuan: An AI-Native Database
Guoliang Li, Xuanhe Zhou
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: In the big data era, database systems face three challenges. First, traditional empirical optimization techniques (e.g., cost
estimation, join order selection, knob tuning) cannot meet the high-performance requirements of large-scale data, diverse applications,
and diversified users. We need to design learning-based techniques to make databases more intelligent. Second, many database applications
require AI algorithms, e.g., image search in a database. We can embed AI algorithms into databases, use database techniques to accelerate
AI algorithms, and provide AI capabilities inside databases. Third, traditional databases focus on general-purpose hardware (e.g., CPU)
and cannot fully utilize new hardware (e.g., ARM, GPU, AI chips), so we need to design new techniques that make full use of it. Moreover,
besides the relational model, we can utilize a tensor model to accelerate AI operations. To address these challenges, we design an
AI-native database. On the one hand, we integrate AI techniques into databases to provide self-configuring, self-optimizing,
self-monitoring, self-diagnosing, self-healing, self-assembling, and self-securing capabilities. On the other hand, we enable databases to
provide AI capabilities through declarative languages in order to lower the barrier to using AI. In this paper, we introduce five levels of
AI-native databases and present several open challenges in designing an AI-native database. We also take autonomous database knob
tuning, a deep-reinforcement-learning-based optimizer, machine-learning-based cardinality estimation, and an autonomous index/view advisor
as examples to showcase the advantages of AI-native databases.
* Foundation items: National Natural Science Foundation of China (61632016, 61521002, 61661166012); National Basic Research Program of China (973 Program) (2015CB358700)
Received 0000-00-00; Revised 0000-00-00; Accepted 0000-00-00; JOS online: 0000-00-00
CNKI online: 0000-00-00