
AI-Powered Orchestration of Multi-Model Data
Jáchym Bártík
supervised by Irena Holubová
Faculty of Mathematics and Physics, Charles University
Prague, Czech Republic
jachym.bartik@matfyz.cuni.cz
ABSTRACT
Multi-model databases are an increasingly popular solution to to-
day’s data management challenges of Big Data. However, their
inherent complexity and lack of standardization stand in the way
of their widespread adoption. In our research, we focus on reducing
the complexity by automating the management of such databases.
The goal is to provide a robust framework capable of unified modelling,
transformation, querying, and evolution management of
multi-model data and to leverage AI techniques to optimize data
distribution among the database systems.
VLDB Workshop Reference Format:
Jáchym Bártík. AI-Powered Orchestration of Multi-Model Data. VLDB 2024
Workshop: VLDB Ph.D. Workshop.
1 INTRODUCTION
More than 2/3 of the 50 most widely used database management
systems (DBMSs)¹ fall under the category of multi-model.² The
multi-model data is organised in various mutually interlinked formats
and models, often with contradictory features [17]. In addition,
its structure may change over time, and its size can grow to the
extremes of Big Data. These aspects create one of the most complex
challenges of effective data management.
As handling such a complex task manually is infeasible, we
focus on the automatic management of dynamic multi-model Big
Data. We want to create a robust framework capable of accepting
different types of data, queries, changes, and propagation strategies.
Based on such rich input, the system will learn to provide self-adapting
evolution management, ensuring a complete, correct, and
efficient propagation of changes. In particular, it will support the
following features:
• Multi-Model Modeling: We need to model the data in one
unified and formally backed schema. The model can either
(1) be created manually or (2) automatically inferred from
sample data. We can also combine these approaches, i.e.,
infer a reasonable schema and then manually improve it.
• Multi-Model-to-Multi-Model Transformations: Transformation
from one model to another is a simple process. However, we
must be able to migrate the data between different combinations
of models represented by different database systems.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment. ISSN 2150-8097.
¹ https://db-engines.com/en/ranking
² I.e., consisting of multiple data models (relational, document, graph, ...).
• Cross-Model Querying: We need to query over the whole
dataset, not just a single database system. Also, the queries
should be independent of the underlying data models so that
we can use the same query language for the whole system
and thus not force the user to learn different languages.
• Multi-Model Evolution Management: As we have mentioned,
each system evolves over time, whereas in the case of multi-model
data, the evolution must cover all combined models.
Primarily, we want to be able to update the model, the data
itself, and the queries, automatically whenever possible.
Several solutions have already implemented these features, many
of which are widely used. However, they all have one thing in com-
mon: they are either tightly coupled with the underlying database
systems or too limited to fully model multi-model data. For example,
the UML and ER models are industry standards, but they cannot
generally model complex properties, maps, or graphs.
Outline. In Section 2, we discuss the current functionalities of our
framework consisting of a family of tools. In Section 3, we describe
our planned steps. In Section 4, we outline the open problems.
2 INITIAL FRAMEWORK
In our research group, we have proposed several solutions to selected
aspects of unified and efficient multi-model data management.
We have also implemented tools for their experimental verification.
This toolset represents the initial framework we currently intend
to enhance by exploiting AI to automate data management.
First, we needed a sufficiently abstract approach to handle all
the conflicting requirements because we deal with varied data
models and database systems. Therefore, we proposed a system-independent
representation based on category theory [12]. For simplicity, we can
view a category as a directed multigraph. The nodes
(called objects) represent entities, and the edges (called morphisms)
represent relationships between them. For example, suppose object 𝐴
represents a User and object 𝐵 represents a Name. Then, we can have
a morphism 𝑓 : 𝐴 → 𝐵, meaning that a User has a Name. We can
also create structures representing arrays, sets, weak-entity types, etc.
In our framework, we call such a category a schema category.
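The multigraph view of a schema category can be illustrated with a minimal Python sketch. It is not the framework's actual data structure: the class names SchemaObject, Morphism, and SchemaCategory are hypothetical, and identity morphisms and composition, which a full category requires, are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaObject:
    """A node of the schema category, e.g. User or Name (hypothetical class)."""
    name: str


@dataclass(frozen=True)
class Morphism:
    """A directed edge label : dom -> cod between two objects."""
    label: str
    dom: SchemaObject
    cod: SchemaObject


@dataclass
class SchemaCategory:
    """Directed multigraph view: parallel morphisms between objects are allowed."""
    objects: set = field(default_factory=set)
    morphisms: list = field(default_factory=list)

    def add_object(self, name):
        obj = SchemaObject(name)
        self.objects.add(obj)
        return obj

    def add_morphism(self, label, dom, cod):
        # Both endpoints must already be objects of this category.
        if dom not in self.objects or cod not in self.objects:
            raise ValueError("both endpoints must belong to the category")
        morphism = Morphism(label, dom, cod)
        self.morphisms.append(morphism)
        return morphism

    def outgoing(self, obj):
        """All morphisms whose domain is `obj`."""
        return [m for m in self.morphisms if m.dom == obj]


# The example from the text: f : A -> B, "a User has a Name".
schema = SchemaCategory()
user = schema.add_object("User")
name = schema.add_object("Name")
f = schema.add_morphism("f", user, name)
```

Storing morphisms in a list rather than a set keyed by endpoints keeps the multigraph property explicit: two differently labelled morphisms may connect the same pair of objects.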
This unifying representation enables us to “grasp” any combina-
tion of models and to process it in a system-independent manner.
When a particular operation has to be done at this abstract level, it
is propagated to the underlying database system.
Example 2.1. An example of a schema category can be found in Fig. 1.
The schema category is mapped only to the relational database model
(denoted using the violet colour). On the other hand, in Fig. 2, we can see
the same schema category after an evolution of the mapping. It is now
mapped to relational (violet) and document (green) models. □
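The evolution of a mapping described in Example 2.1 can be sketched as reassigning some objects of the schema category to another model while the schema category itself stays unchanged. The sketch below is an assumption-laden illustration: the object names Customer, Order, and Item are hypothetical (the contents of Fig. 1 and Fig. 2 are not reproduced here), and the helper functions remap and models_used are not part of the framework.

```python
RELATIONAL = "relational"
DOCUMENT = "document"


def remap(mapping, objects, target_model):
    """Return a new mapping in which `objects` are assigned to `target_model`.

    The original mapping is left untouched, so both versions can be compared.
    """
    unknown = set(objects) - set(mapping)
    if unknown:
        raise KeyError(f"objects not in the schema category: {sorted(unknown)}")
    evolved = dict(mapping)
    for obj in objects:
        evolved[obj] = target_model
    return evolved


def models_used(mapping):
    """The set of data models the schema category is currently mapped to."""
    return set(mapping.values())


# Before the evolution: every (hypothetical) object is mapped to the relational model.
before = {"Customer": RELATIONAL, "Order": RELATIONAL, "Item": RELATIONAL}

# After the evolution: part of the schema is mapped to the document model instead.
after = remap(before, ["Order", "Item"], DOCUMENT)
```

Only the mapping changes between the two versions. The set of objects is identical, which mirrors the statement that both figures show the same schema category.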