
et al. [18] proposed the first solution on both genotypic
and phenotypic data by leveraging a tree structure. However,
multiple rounds of interaction in such a solution still restrict
the performance of secure count queries since the tree structure
can only be accessed recursively. Most importantly, none of
the above schemes provides a formal security proof, which leaves
them far from practically secure.
In this paper, based on a rigorous security model, secure
count query models on encrypted genomic data are presented
to support GWASs and related genomic analyses on the fly.
The main contributions are as follows.
• Scalable and efficient storage of genomic data. For a
genomic dataset consisting of both genotypic and phe-
notypic data, a counting tree (CT-Tree) is designed for
storing both the genomic data and count values by
embedding both genotypic and phenotypic data into edges
between nodes in the tree.
• Secure organization on clouds. Furthermore, all nodes in
the CT-Tree are encrypted to form a secure virtual CT-Tree
(secure vCT-Tree) by leveraging a hash function and an
efficient homomorphic encryption scheme. The secure vCT-Tree
prevents the leakage of both tree structures and contents
in all nodes and leaves.
• Secure and efficient count query. By delegating a
secure vCT-Tree to a semi-honest cloud, two secure count
query models (i.e., SecCT and SecCT+) are presented, each
requiring only a single interaction with the cloud.
• Provable security and high efficiency. SecCT is proven
secure under the indistinguishability under adaptive
chosen-keyword (query) attack (IND-CKA2) model. In
addition, extensive experimental evaluations have been
conducted on real datasets to demonstrate the advantages of
SecCT and SecCT+.
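To make the idea of a counting tree concrete, the following Python sketch builds a plaintext prefix tree in which each edge carries one attribute value (genotypic or phenotypic) and each node stores the number of records that pass through it. It is only an illustration under our own assumptions: the names `CountNode`, `build_count_tree`, and `count_query`, the fixed attribute order, and the sample records are hypothetical, and the sketch is neither the CT-Tree construction itself nor its encrypted variant.

```python
class CountNode:
    """Node of an illustrative (unencrypted) counting tree."""

    def __init__(self):
        self.count = 0       # number of records whose path passes through this node
        self.children = {}   # edge label (attribute value) -> child node


def build_count_tree(records):
    """Insert each record, a tuple of attribute values in a fixed order."""
    root = CountNode()
    for rec in records:
        node = root
        node.count += 1
        for value in rec:
            node = node.children.setdefault(value, CountNode())
            node.count += 1
    return root


def count_query(node, constraints, depth=0):
    """Count records matching `constraints` (attribute position -> value).

    Positions absent from `constraints` act as wildcards.
    """
    # Once no constraint remains at or below this depth, the stored count answers the query.
    if not any(pos >= depth for pos in constraints):
        return node.count
    total = 0
    for value, child in node.children.items():
        if depth in constraints and constraints[depth] != value:
            continue  # this edge contradicts the constraint at this position
        total += count_query(child, constraints, depth + 1)
    return total


# Hypothetical records: (SNP 1 genotype, SNP 2 genotype, phenotype)
records = [
    ("AG", "CC", "case"),
    ("AG", "CT", "case"),
    ("AA", "CC", "control"),
    ("AG", "CC", "control"),
]
tree = build_count_tree(records)
print(count_query(tree, {0: "AG", 2: "case"}))  # 2
print(count_query(tree, {1: "CC"}))             # 3
```

Because the counts are stored in the nodes, a query stops descending as soon as its last constrained attribute has been matched.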
To date, SecCT and SecCT+ are the first secure count query
models that are proven to be secure under the IND-CKA2
model and that need only a single interaction with a cloud.
The rest of this paper is organized as follows. We review
related work and present brief comparisons with SecCT in
Section II. The formal problem definition, security model, and
related fundamental tools are stated in Section III. The genomic
data organization on the cloud is discussed in Section IV, based
on which the concrete SecCT is presented in Section V. In
Section VI, SecCT is analyzed theoretically in terms of security,
efficiency, and comparison with existing schemes. We
present the experimental evaluation in Section VII and conclude
this work in Section VIII.
II. RELATED WORK
Ensuring the privacy and security of genomic data while
preserving functionality is already an active research topic
in the biomedical and security communities [12],
[13]. There has been some progress on secure count queries.
Kantarcioglu et al. [14] first presented a concrete secure count
query scheme in 2008. Since then, the secure count query
problem has been investigated in various works [15], [17],
[18]. In the following, we briefly review these works from
various aspects and summarize them in Table I.
First, regarding security, data privacy, the fundamental
requirement, is satisfied in all the above works since genomic
data is protected from leakage. Output privacy, which prevents
the leakage of a count query’s result, is achieved in [15], [17], [18].
Query privacy, which requires that no adversary can
learn anything about the query, is considered only in [17], [18].
However, all of these analyses are descriptive and rely on
non-standard security models; the capabilities of the attacker are
not defined by an adversarial model. So far, there has been little
progress on provably secure count queries. SecCT and SecCT+ are
the first attempts toward provably secure count queries.
From an architectural perspective, the works in [14],
[17] both introduce multiple clouds that collaborate on secure
count query tasks. Such solutions rest on the assumption
that the clouds do not collude
with each other and that one of the clouds is trusted, which is an
inherent drawback compared with a single-cloud architecture. The
proposal in [15] runs on a secure cryptographic
co-processor (SCP). It suffers from the memory limitation of the SCP
itself and does not scale to large genomic datasets.
Hasan et al. [18] proposed the first scheme utilizing a single
cloud, but multiple interactions between a client and the cloud
are unavoidable when resolving secure count queries. SecCT and
SecCT+ are the first secure count query models that utilize
a single cloud and avoid multiple interactions, which
accelerates the query.
Hasan et al. [18] presented the first secure count query work-
ing on both genotypic and phenotypic data. Other works [14],
[15], [17] are only for genotypic data. SecCT and SecCT+
are also compatible with both genotypic and phenotypic data.
Additionally, none of the above schemes supports updating the
dataset directly on the cloud, whereas SecCT and SecCT+ do.
III. PRELIMINARIES
In this section, we present a framework to resolve the secure
count query problem on both genotypic and phenotypic data.
Following that, a security model is carefully defined to lay a
solid foundation for security. Finally, the Paillier cryptosystem is
introduced.
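As background for the homomorphic encryption used here, the following Python sketch shows the additive property of the textbook Paillier cryptosystem: multiplying two ciphertexts modulo n^2 yields an encryption of the sum of the plaintexts, which is what allows encrypted counts to be aggregated without decryption. The function names, the simplification g = n + 1, and the tiny primes are our own assumptions for illustration; the parameters are far too small to be secure, and the sketch is not the instantiation used in SecCT.

```python
import math
import secrets


def keygen(p, q):
    """Textbook Paillier key generation from two distinct primes (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                  # common simplification, assumed here
    mu = pow(lam, -1, n)       # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # uniform in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add(pub, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum of plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen(1009, 1013)             # toy primes, insecure by design
c_sum = add(pub, encrypt(pub, 3), encrypt(pub, 4))
print(decrypt(pub, priv, c_sum))           # 7
```

The modular inverse via `pow(lam, -1, n)` requires Python 3.8 or later. Sums are recovered correctly only while they stay below n.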
As shown in Fig. 1, there are three participants in the
secure framework for secure count query on both genotypic
and phenotypic data.
• A data owner owns a genomic dataset. It constructs a
secure index by embedding the genomic dataset and
then delegates the secure index to a public cloud to
provide clients with a secure count query service. In
practice, a data owner is a hospital, health center, biobank, or
other similar institution that collects genomic data.
• A cloud stores the delegated secure index. On receiving
a legitimate count query request from a client,
it conducts the query directly on the secure index and
derives the encrypted result. In practice, a cloud is a cloud
service provider, such as AWS or Google Cloud.