
Homomorphic Compression: Making Text Processing on Compression Unlimited
JIAWEI GUAN, Key laboratory of Data Engineering and Knowledge Engineering (MOE), and School of
Information, Renmin University of China, China
FENG ZHANG∗, Key laboratory of Data Engineering and Knowledge Engineering (MOE), and School of
Information, Renmin University of China, China
SIQI MA, School of Systems and Computing, University of New South Wales, Australia
KUANGYU CHEN, Key laboratory of Data Engineering and Knowledge Engineering (MOE), and School
of Information, Renmin University of China, China
YIHUA HU, Key laboratory of Data Engineering and Knowledge Engineering (MOE), and School of
Information, Renmin University of China, China
YUXING CHEN, Tencent Inc., China
ANQUN PAN, Tencent Inc., China
XIAOYONG DU, Key laboratory of Data Engineering and Knowledge Engineering (MOE), and School of
Information, Renmin University of China, China
Lossless data compression is an effective way to handle the huge transmission and storage overhead of massive
text data. Its utility is even more significant today, as data volumes skyrocket. Operating directly on
compressed data opens a new route to efficient text management: it allows text processing tasks, mainly
access-oriented ones, to be performed on compressed data without decompression. To address the limitations
of existing compressed text processing schemes, such as the limited types of operations supported, low
efficiency, and high space occupation, we propose a homomorphic compression theory. It enables the
generalization and characterization of algorithms capable of processing compressed data. On this basis, we
develop HOCO, an efficient text data management engine that supports a variety of processing tasks on
compressed text. We select three representative compression schemes and implement them in HOCO, each
combined with homomorphism. HOCO supports the extension of homomorphic compression schemes through a
modular, object-oriented design and provides convenient interfaces for text processing tasks. We evaluate
HOCO on six real-world datasets. The three schemes implemented in HOCO show trade-offs in compression
ratio, supported operation types, and efficiency. Experiments also show that HOCO can achieve
∗Feng Zhang is the corresponding author of this paper.
Authors’ addresses: Jiawei Guan, guanjw@ruc.edu.cn, Key laboratory of Data Engineering and Knowledge Engineering
(MOE), and School of Information, Renmin University of China, China; Feng Zhang, Key laboratory of Data Engineering
and Knowledge Engineering (MOE), and School of Information, Renmin University of China, China, fengzhang@ruc.edu.cn;
Siqi Ma, School of Systems and Computing, University of New South Wales, Australia, siqi.ma@adfa.edu.au; Kuangyu Chen,
Key laboratory of Data Engineering and Knowledge Engineering (MOE), and School of Information, Renmin University
of China, China, kuangyuchen@ruc.edu.cn; Yihua Hu, Key laboratory of Data Engineering and Knowledge Engineering
(MOE), and School of Information, Renmin University of China, China, yh3485@columbia.edu; Yuxing Chen, Tencent Inc.,
China, axingguchen@tencent.com; Anqun Pan, Tencent Inc., China, aaronpan@tencent.com; Xiaoyong Du, Key laboratory
of Data Engineering and Knowledge Engineering (MOE), and School of Information, Renmin University of China, China,
duyong@ruc.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2836-6573/2023/12-ART271 $15.00
https://doi.org/10.1145/3626765
Proc. ACM Manag. Data, Vol. 1, No. 4 (SIGMOD), Article 271. Publication date: December 2023.