SecCT: Secure and scalable count query models on
encrypted genomic data
Yanguo Peng, Rongqiao Liu, Xiyue Gao, Luyuan Huang
School of Computer Science and Technology
Xidian University
Xi’an, China
Jingjing Guo
School of Cyber Engineering
Xidian University
Xi’an, China
Yaofeng Tu
ZTE Corporation
Nanjing, China
Abstract—With the continued reduction in DNA sequencing cost,
large-scale genetic samples are being gathered to accelerate the
identification of predispositions to specific diseases, the tailoring
of treatments, the development of efficient drugs and therapies, etc.
Massive genetic samples are encrypted and then delegated to a public
cloud, both to save investment and maintenance costs and to prevent
the potential leakage of sensitive information. However, this approach
compromises the serviceability of a public cloud, since encryption
inevitably destroys the semantic information of genetic samples.
Secure count query of single-nucleotide polymorphisms (SNPs), a kernel
component of genome-wide association studies (GWASs) and related
genomic analysis, is therefore attracting increasing attention.
Existing methods lack provable security and suffer from low efficiency
caused by multiple interactions with the cloud. In this paper, a
secure virtual CT-Tree (secure vCT-Tree) is carefully constructed to
obfuscate the tree structure by introducing a hash function and a
Paillier system. Furthermore, by delegating the secure vCT-Tree to the
cloud, concrete models (i.e., SecCT and SecCT+) are presented to
resolve secure count query problems on-the-fly. Both models advance
the provable security of genetic research and are proven to be secure
under the adaptive chosen keyword (query) attack (IND-CKA2) model.
Finally, extensive experiments on real data demonstrate the
superiority of SecCT.
Index Terms—Secure count query, genotypic and phenotypic
data, IND-CKA security, scalability
I. INTRODUCTION
DNA sequencing costs have decreased dramatically in recent years, and
the acquisition of large-scale genetic samples has become feasible and
ubiquitous. Genetic samples are widely applied in various biomedical
research and industry fields. For example, they reveal predispositions
to specific diseases (such as hypertension [2] and Alzheimer’s
disease [3]), enable tailored treatment with personalized medicine [5],
and support the development of efficient drugs and therapies. To
promote such research, many countries and organizations around the
world have started collecting genetic data, such as UK Biobank [6] and
The 1000 Genomes Project (footnote 1).
Both the volume and scale of genetic samples are enormous, since a
human’s DNA sequence consists of almost 3.16 billion base pairs
(footnote 2) and around 350 million people on earth live with rare
disorders, 80% of which are genetic-related (footnote 3). It is rarely
possible for organizations and companies to leverage genetic samples
locally. Cloud computing is a solution that is gaining traction in
genomics research [7]. Funders of genomics research are willing to
exploit the advantages of cloud computing and move their genomic
resources to the cloud accordingly.
Footnote 1: https://www.internationalgenome.org/
The work is supported by the National Natural Science Foundation of China
(No. 62172314, 62302370, 61602360), the Natural Science Basic Research
Program of Shaanxi Province (No. 2023-JC-QN-0648, 2019CGXNG-023) and
ZTE Industry-University-Institute Cooperation Funds (No. IA20230625001).
J. Guo is the corresponding author.
The human genome is a special information carrier that encodes a
plethora of an individual’s essential and private characteristics.
Once an individual’s genome is leaked, there is no feasible measure to
protect it any longer. Additionally, the genome reflects an
individual’s character [8], identifies predisposition to
diseases [1]–[3], etc. Thus, a human’s genome is prone to privacy
risks and attacks. So far, security and privacy remain the main
obstacles that keep biomedical organizations and companies from fully
embracing cloud computing.
The encrypt-then-outsource model is widely adopted to prevent the
leakage of private information on clouds. In this way, a cloud without
the secret keys can learn nothing meaningful about either the genetic
samples or the queries [9], [10]. Related works provide different
services directly on the encrypted data, among which genome-wide
association studies (GWASs) have revolutionized the field of complex
disease genetics over past decades [11]. Secure count query of
single-nucleotide polymorphisms (SNPs) on large genomic data is a
kernel component of GWASs and related genomic analysis (e.g.,
paternity and disease susceptibility tests). For example, a count
query is used to derive statistical results of rare variants (e.g.,
SNPs), and a GWAS system utilizes the result to build associations
between genes and diseases [7], [12], [13].
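To make the notion concrete, a count query in plaintext simply counts the records that satisfy a conjunction of SNP and phenotype conditions. The following is a minimal sketch on unencrypted toy data; the record layout, the SNP identifiers (rs1, rs2), and the function name count_query are assumptions made for illustration and are not taken from the schemes discussed here.

```python
# Toy, unencrypted records: genotypes at two hypothetical SNPs plus one phenotype.
RECORDS = [
    {"rs1": "AG", "rs2": "CC", "disease": "yes"},
    {"rs1": "AG", "rs2": "CT", "disease": "yes"},
    {"rs1": "AA", "rs2": "CC", "disease": "no"},
    {"rs1": "AG", "rs2": "CC", "disease": "no"},
]


def count_query(records, conditions):
    """Return how many records match every (attribute, value) pair in conditions."""
    return sum(
        all(rec.get(attr) == value for attr, value in conditions.items())
        for rec in records
    )


# Case and control counts for genotype AG at the hypothetical SNP rs1.
cases = count_query(RECORDS, {"rs1": "AG", "disease": "yes"})
controls = count_query(RECORDS, {"rs1": "AG", "disease": "no"})
print(cases, controls)
```

A secure count query aims to return the same numbers while the records and the query conditions stay hidden from the cloud.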
Secure count queries on genomic data have been studied in previous
works [13]. Various solutions [14]–[17] address the problem by
leveraging cryptographic algorithms and hardware. These solutions
suffer from low efficiency, multi-round interactions, investment in
secure hardware, etc., and hence are difficult to deploy in practice.
Also, they are not compatible with phenotypic data, which is as
important as genomic data in GWASs and related genomic analysis. Hasan
et al. [18] proposed the first solution on both genotypic and
phenotypic data by leveraging a tree structure. However, the multiple
rounds of interaction in that solution still restrict the performance
of secure count queries, since the tree structure can only be accessed
recursively. Most importantly, none of the above schemes provides a
formal security proof, which leaves them far from practical security.
Footnote 2: https://en.wikipedia.org/wiki/Base_pair
Footnote 3: https://www.genome.gov/dna-day/15-ways/rare-genetic-diseases
2023 IEEE 28th Pacific Rim International Symposium on Dependable Computing (PRDC)
2473-3105/23/$31.00 ©2023 IEEE
DOI 10.1109/PRDC59308.2023.00011
In this paper, based on a rigorous security model, secure count query
models on encrypted genomic data are presented to support GWASs and
related genomic analysis on-the-fly. The main contributions are as
follows.
• Scalable and efficient storage of genomic data. For a genomic
  dataset consisting of both genotypic and phenotypic data, a counting
  tree (CT-Tree) is designed to store both the genomic data and the
  count values by embedding genotypic and phenotypic data into the
  edges between nodes in the tree.
• Secure organization on clouds. All nodes in the CT-Tree are encrypted
  and make up a secure virtual CT-Tree (secure vCT-Tree) by leveraging
  a hash function and an efficient homomorphic encryption scheme. The
  secure vCT-Tree prevents the leakage of both the tree structure and
  the contents of all nodes and leaves.
• Secure and efficient count query. Based on delegating a secure
  vCT-Tree to a semi-honest cloud, two secure count query models (i.e.,
  SecCT and SecCT+) are presented that require only a single
  interaction with the cloud.
• Provable security and high efficiency. SecCT is theoretically proven
  to be secure under the indistinguishability under adaptive chosen
  keyword (query) attack (IND-CKA2) model. Also, extensive experimental
  evaluations have been conducted on real datasets to support the
  superiority of SecCT and SecCT+.
So far, SecCT and SecCT+ are the first secure count query
models that are proven to be secure under the IND-CKA2
model and that need only a single interaction with a cloud.
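As an illustration of the counting-tree idea in the first contribution, the sketch below builds a plaintext tree whose edges carry attribute values and whose nodes carry counts, then answers a count query by walking it. It is a minimal sketch under assumptions: the fixed attribute order, the wildcard handling, and the names Node, build_tree, and count_matches are hypothetical and do not reproduce the paper's CT-Tree construction, and nothing here is encrypted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                # records passing through this node
    children: dict = field(default_factory=dict)  # edge label (attribute value) -> Node


def build_tree(records, attributes):
    """Insert every record along a path whose edges carry its attribute values."""
    root = Node()
    for rec in records:
        node = root
        node.count += 1
        for attr in attributes:
            node = node.children.setdefault(rec[attr], Node())
            node.count += 1
    return root


def count_matches(node, attributes, query, depth=0):
    """Count records matching query; attributes absent from it act as wildcards."""
    if all(attr not in query for attr in attributes[depth:]):
        return node.count                         # nothing left to constrain
    attr = attributes[depth]
    if attr in query:
        child = node.children.get(query[attr])
        return count_matches(child, attributes, query, depth + 1) if child else 0
    return sum(count_matches(c, attributes, query, depth + 1)
               for c in node.children.values())


# Hypothetical toy data: two SNPs (genotypic) and one phenotype.
ATTRS = ["rs1", "rs2", "disease"]
DATA = [
    {"rs1": "AG", "rs2": "CC", "disease": "yes"},
    {"rs1": "AG", "rs2": "CT", "disease": "yes"},
    {"rs1": "AA", "rs2": "CC", "disease": "no"},
    {"rs1": "AG", "rs2": "CC", "disease": "no"},
]
tree = build_tree(DATA, ATTRS)
print(count_matches(tree, ATTRS, {"rs1": "AG", "disease": "yes"}))
```

Storing a count in every node lets a query stop as soon as no constrained attribute remains, instead of descending to the leaves.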
The rest of this paper is organized as follows. We review the related
works and briefly compare them with SecCT in Section II. The formal
problem definition, security model, and related fundamental tools are
stated in Section III. The genomic data organization on the cloud is
discussed in Section IV, based on which the concrete SecCT is
presented in Section V. In Section VI, SecCT is analyzed theoretically
in terms of security, efficiency, and comparison with related schemes.
We present the experimental evaluation in Section VII and conclude
this work in Section VIII.
II. RELATED WORK
Ensuring the privacy and security of genomic data while preserving
functionality is already a research hotspot in the biomedical and
security communities [12], [13]. There has been some progress on
secure count queries. Kantarcioglu et al. [14] first presented a
concrete secure count query scheme in 2008. Since then, the secure
count query problem has been investigated in various works [15], [17],
[18]. In the following, we briefly review these works from various
aspects and summarize them in Table I.
First, regarding security, data privacy, as the fundamental
requirement, is satisfied, since genomic data is prevented from
leakage in all the above works. Output privacy, which prevents the
leakage of a count query’s result, is achieved in [15], [17], [18].
Query privacy, which requires that no adversary can learn anything
about the query, is only considered in [17], [18]. All of these
analyses are informal and rely on non-standard security models, and
the capabilities of the attacker are not defined by an adversarial
model. So far, there has been little progress on provably secure count
queries. SecCT and SecCT+ are the first attempts toward provable
security for count queries.
From an architectural perspective, the works in [14], [17] both
introduce multiple clouds that collaborate on secure count query
tasks. Such solutions rest on the assumption that the clouds do not
collude with each other and that one of the clouds is trusted, which
is an inherent drawback compared with a single-cloud architecture. The
proposal in [15] works on a secure cryptographic co-processor (SCP).
It suffers from the memory limitation of the SCP itself and is not
compatible with large-scale genomic data. Hasan et al. [18] proposed
the first scheme utilizing a single cloud, but multiple interactions
between a client and the cloud are inevitable to resolve secure count
queries. SecCT and SecCT+ are the first secure count query models that
utilize a single cloud and avoid multiple interactions to accelerate
the query.
Hasan et al. [18] presented the first secure count query scheme that
works on both genotypic and phenotypic data. The other works [14],
[15], [17] support only genotypic data. SecCT and SecCT+ are also
compatible with both genotypic and phenotypic data. Additionally, none
of the above schemes except SecCT and SecCT+ supports updating the
dataset directly on the cloud.
III. PRELIMINARIES
In this section, we present a framework to resolve the secure count
query problem on both genotypic and phenotypic data. Following that, a
security model is carefully designed to lay a solid foundation for
security. Finally, the Paillier system is introduced.
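As background for the Paillier system, the sketch below shows its additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, which is what allows a server to aggregate encrypted counts without decrypting them. It is a textbook variant with generator g = n + 1 and toy primes, so it is not secure and is not the parameterization used by SecCT; the function names are illustrative assumptions, and pow(x, -1, n) requires Python 3.8 or later.

```python
import math
import secrets


def keygen(p, q):
    """Build a Paillier key pair from two primes (toy sizes here, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of L(g^lam mod n^2) when g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m (0 <= m < n) as g^m * r^n mod n^2 with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


n, priv = keygen(1789, 1861)  # small primes chosen only for readability
total = add(n, encrypt(n, 3), encrypt(n, 4))
print(decrypt(n, priv, total))  # 7
```

Because encryption is randomized, two encryptions of the same count differ, yet they decrypt to the same value.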
As shown in Fig. 1, there are three participants in the framework for
secure count query on both genotypic and phenotypic data.
• A data owner owns a genomic dataset. A secure index is constructed
  by embedding the genomic dataset and is then delegated to a public
  cloud, which provides clients with a secure count query service. In
  practice, a data owner is a hospital, health center, biobank, or
  other similar institution that collects genomic data.
• A cloud stores the delegated secure index. On receiving a legitimate
  count query request from a client, it executes the query directly on
  the secure index and derives the encrypted result. In practice, a
  cloud is a cloud service provider, such as AWS or Google Cloud.