
Optimizing Distributed Tiered Data Storage Systems with DITIS
Sotiris Vasileiadis
Cyprus University of Technology
Limassol, Cyprus
sr.vasileiadis@edu.cut.ac.cy
Matthew Paraskeva
Cyprus University of Technology
Limassol, Cyprus
mp.paraskeva@edu.cut.ac.cy
George Savva
Cyprus University of Technology
Limassol, Cyprus
gec.savva@edu.cut.ac.cy
Andreas Efstathiou
Cyprus University of Technology
Limassol, Cyprus
andreasefstathiouudt@gmail.com
Edson Ramiro Lucas Filho
Cyprus University of Technology
Limassol, Cyprus
edson.lucas@cut.ac.cy
Jianqiang Shen
Huawei Technologies Co., Ltd.
Shenzhen, China
shenjianqiang@huawei.com
Lun Yang
Huawei Technologies Co., Ltd.
Shenzhen, China
yanglun12@huawei.com
Kebo Fu
Huawei Technologies Co., Ltd.
Shenzhen, China
fukebo@huawei.com
Herodotos Herodotou
Cyprus University of Technology
Limassol, Cyprus
herodotos.herodotou@cut.ac.cy
ABSTRACT
Modern data storage systems are characterized by a distributed
architecture as well as the presence of multiple storage tiers and
caches. Both system developers and operators are challenged with
the complexity of such systems, as it is hard to evaluate how a
configuration change will impact the workload or system performance
and to identify the best configuration to satisfy some performance
objective. DITIS is a new simulator that models the end-to-end
execution of file requests on distributed tiered storage systems and
addresses the aforementioned challenges efficiently without any
costly system redeployments. The demonstration will showcase
the key functionalities and benefits offered by DITIS, including
(i) analyzing workload traces to understand their characteristics
and the behavior of the underlying storage system; (ii) running
simulations with different configurations to evaluate their impact
on performance; and (iii) running optimizations over custom search
spaces to find the best configuration that satisfies a given objective.
PVLDB Reference Format:
Sotiris Vasileiadis, Matthew Paraskeva, George Savva, Andreas Efstathiou,
Edson Ramiro Lucas Filho, Jianqiang Shen, Lun Yang, Kebo Fu,
and Herodotos Herodotou. Optimizing Distributed Tiered Data Storage
Systems with DITIS. PVLDB, 17(12): 4393 - 4396, 2024.
doi:10.14778/3685800.3685883
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/cut-dicl/ditis-ui.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.
doi:10.14778/3685800.3685883
1 INTRODUCTION
Modern data storage systems exhibit considerable complexity due
to their distributed nature and the need to balance data and load
across the storage nodes [7]. In addition, these systems incorporate
multiple storage tiers, encompassing numerous HDDs and SSDs,
along with multiple cache levels of DRAM and NVRAM. Conse-
quently, new data management policies are required to optimize
performance and resource utilization. Furthermore, these systems
integrate diverse redundancy mechanisms, including replication
and erasure coding, to ensure data durability and fault tolerance.
The multifaceted architecture of modern data storage systems
necessitates sophisticated management strategies to harness their
full potential, for both system developers and operators. For devel-
opers, evaluating the impact of new policies for caching, tiering,
and other mechanisms is cumbersome and time-consuming as it
requires system redeployments. Hence, it is very difficult to explore
the design space for promoting changes in the system. For operators,
it is challenging to evaluate how their workloads will behave
after a system reconfiguration or upgrade as well as determine the
best system configuration that will satisfy their objectives.
Simulation presents a logical approach to address the aforemen-
tioned challenges. Numerous simulators concentrate on modeling
particular aspects of storage systems, including caching and tiering
policies [6], scheduling [5], network communication [1], and
file system behavior [2]. Some simulators are also available for
simulating either single-node multi-tier storage systems [4, 10] or
distributed single-tier storage systems [8, 9]. However, none of
the current simulators can fully encompass the complexity and
nuances of contemporary storage systems that feature multiple
storage nodes, diverse storage tiers, and various cache levels.
DITIS [3] is a new comprehensive simulator that models the end-to-end
execution of file requests on distributed multi-tier storage
systems. The key novelties of DITIS include (i) an architecture based
on an adaptation of the actor model instead of the typical
event-oriented or process-oriented models; (ii) a machine learning-based
initialization process for placing data in the appropriate tier/cache
before the simulation begins; and (iii) fine-grained but efficient
performance cost models for HDD, SSD, NVRAM, DRAM, and network
communications. Moreover, DITIS is highly configurable,
with 131 configuration parameters touching all aspects of a storage
system (e.g., number of nodes/tiers/caches, device and network