
SiriusBI: A Comprehensive LLM-Powered Solution for Data
Analytics in Business Intelligence
Jie Jiang¹, Haining Xie¹, Siqi Shen², Yu Shen¹, Zihan Zhang¹, Meng Lei¹,
Yifeng Zheng¹, Yang Li¹, Chunyou Li¹, Danqing Huang¹, Yinjun Wu³,
Wentao Zhang², Bin Cui³, Peng Chen¹
¹ Department of Data Platform, TEG, Tencent Inc.
² Center of Machine Learning Research, Peking University
³ School of Computer Science, Peking University
¹ {zeus, hainingxie, willyushen, rylanzhang, garylei, yifengzheng, thomasyngli,
chunyouli, daisyqhuang, felixxfyang, pengchen}@tencent.com
² {shensiqi1009, wentao.zhang}@pku.edu.cn
³ {wuyinjun, bin.cui}@pku.edu.cn
ABSTRACT
With the proliferation of Large Language Models (LLMs) in Business
Intelligence (BI), existing solutions face critical challenges in
industrial deployments: functionality deficiencies from legacy systems
failing to meet evolving LLM-era user demands, interaction limitations
from single-round SQL generation paradigms inadequate for multi-round
clarification, and domain adaptation costs arising from migrating
methods across domains.
We present SiriusBI, a practical LLM-powered BI system addressing the
challenges of industrial deployments through three key innovations:
(a) An end-to-end architecture integrating multi-module coordination
to overcome functionality gaps in legacy systems; (b) A multi-round
dialogue with querying mechanism, consisting of semantic completion,
knowledge-guided clarification, and proactive querying processes, to
resolve interaction constraints in SQL generation; (c) A
data-conditioned SQL generation method selection strategy that
supports both an efficient one-step Fine-Tuning approach and a two-step
method leveraging Semantic Intermediate Representation for low-cost
cross-domain applications. Experiments on both real-world datasets and
public benchmarks demonstrate the effectiveness of SiriusBI. User
studies further confirm that SiriusBI enhances both productivity and
user experience.
As an independent service on Tencent’s data platform, SiriusBI is
deployed across finance, advertising, and cloud sectors, serving dozens
of enterprise clients. It achieves over 93% accuracy in SQL generation
and reduces data analysts’ query time from minutes to seconds in
real-world applications.
PVLDB Reference Format:
Jie Jiang, Haining Xie, Siqi Shen, Yu Shen, Zihan Zhang, Meng Lei, Yifeng
Zheng, Yang Li, Chunyou Li, Danqing Huang, Yinjun Wu, Wentao Zhang,
Bin Cui, Peng Chen. SiriusBI. PVLDB, 18(12): 4860 - 4873, 2025.
doi:10.14778/3750601.3750610
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/Tencent-SiriusAI/SiriusBI.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 18, No. 12 ISSN 2150-8097.
1 INTRODUCTION
Business Intelligence (BI) [54, 83] is a crucial application scenario
in the data field, comprising a comprehensive suite of methodologies,
tools, and infrastructures designed to collect, integrate, analyze,
and present raw data from an organization to generate actionable
insights for informed decision-making. BI systems are extensively used
in various sectors, including finance [55], environment [24], and
social media [11, 64], where they significantly improve the
decision-making process by providing real-time analytics and reporting
capabilities [44, 60].
A typical BI system comprises several key components: a data
management module that stores, processes, and aggregates vast amounts
of data; analytic algorithms that transform the data into actionable
insights; and visualization tools that present the information in
intuitive and user-friendly formats. Among these, data analytics plays
a crucial role in providing decision-making support, directly
determining the correctness and appropriateness of decisions. Recent
advancements in LLMs [34, 46, 89] have sparked significant interest in
ChatBI, a new paradigm supported by natural language interfaces
[1, 41]. Concurrently, the demand for a fully integrated and efficient
ChatBI solution is surging, driven by the need for a more intuitive
and accessible mode of data interaction. This evolution promises to
transform how users engage with data, making insights more available
and actionable.
To meet the growing demand for big data analytics and decision-making
in BI, the data community has proposed numerous effective approaches.
However, when applying existing work in real-world BI scenarios, we
identify the following three challenges:
C1: Functionality Deficiencies. While traditional business
intelligence systems [8] integrate core components spanning data
management, SQL generation, and insight discovery to form complete
analytics pipelines, their reliance on heuristic rules and
conventional AI/ML techniques limits generalization ability in dynamic
scenarios. Although LLM-based methods have advanced task-specific
performance, few offer comprehensive BI capabilities comparable to
their traditional counterparts. For example, MAC-SQL [75] and
CHESS [68] optimize NL2SQL accuracy but treat SQL execution as
terminal outputs, neglecting downstream tasks like attribution
analysis. While Lian et al. [46] extend their pipeline with Apache
Superset for visualization, they fail to introduce knowledge bases to
support dynamic grounding of domain-specific context, a