EncChain: Enhancing Large Language Model Applications with
Advanced Privacy Preservation Techniques
Zhe Fu
Alibaba Cloud
je.fz@alibaba-inc.com
Mo Sha
Alibaba Cloud
shamo.sm@alibaba-inc.com
Yiran Li
Alibaba Cloud
yiranli.lyr@alibaba-inc.com
Huorong Li
Alibaba Cloud
huorong.lhr@alibaba-inc.com
Yubing Ma
Alibaba Cloud
yubing.myb@alibaba-inc.com
Sheng Wang
Alibaba Cloud
sh.wang@alibaba-inc.com
Feifei Li
Alibaba Cloud
lifeifei@alibaba-inc.com
ABSTRACT
In response to escalating concerns about data privacy in the Large Language Model (LLM) domain, we demonstrate EncChain, a pioneering solution designed to bolster data security in LLM applications. EncChain presents an all-encompassing approach to data protection, encrypting both the knowledge bases and user interactions. It empowers confidential computing and implements stringent access controls, offering a significant leap in securing LLM usage. Designed as an accessible Python package, EncChain ensures straightforward integration into existing systems, bolstered by its operation within secure environments and the utilization of remote attestation technologies to verify its security measures. The effectiveness of EncChain in fortifying data privacy and security in LLM technologies underscores its importance, positioning it as a critical advancement for the secure and private utilization of LLMs.
PVLDB Reference Format:
Zhe Fu, Mo Sha, Yiran Li, Huorong Li, Yubing Ma, Sheng Wang, and Feifei
Li. EncChain: Enhancing Large Language Model Applications with
Advanced Privacy Preservation Techniques. PVLDB, 17(12): 4413 - 4416,
2024.
doi:10.14778/3685800.3685888
1 INTRODUCTION
Since late 2022, interest in Large Language Models (LLMs) [1] has surged dramatically. ChatGPT, for instance, amassed over 100 million active users within just two months of its launch, representing an unprecedented technological uptake. The profound capabilities of LLMs across diverse domains have catalyzed their widespread adoption and integration across various use cases, demonstrating substantial benefits in augmenting productivity and efficiency.
However, the rapid advancement of LLMs has highlighted significant data security and privacy issues. These concerns are not merely theoretical. In March 2023, the Italian Data Protection Authority banned ChatGPT due to privacy concerns. In April, Samsung was accused of leaking sensitive semiconductor data to ChatGPT in three incidents over 20 days. By November, Microsoft prohibited employees from using ChatGPT at work, blocking related AI tools on company devices. These instances indicate a shift from initial enthusiasm to a more measured approach, recognizing the pronounced issues with LLMs in practical applications.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.
doi:10.14778/3685800.3685888
The aggregation of extensive knowledge bases and user queries, often containing sensitive data, introduces substantial security vulnerabilities when processed by LLMs. Typical LLM applications, such as third-party tailored domain-specific APIs, require significant computational resources and specialized hardware, often favoring cloud deployments. This setup introduces various security threats, including data exposure due to negligent or malicious service providers, multi-tenant architecture risks, and the potential misuse of sensitive user data for model refinement. The lack of theoretical tools to mitigate the risk of LLMs inadvertently revealing sensitive content further complicates the issue. As technological applications deepen, data security emerges as a pivotal constraint on the advancement of LLM technologies.
In this paper, we demonstrate the proposed EncChain, a novel privacy preservation solution tailored for LLM applications, underpinned by confidential data handling practices. The strategic application of EncChain significantly enhances data security measures within LLM frameworks, diminishing the likelihood of unauthorized data access and exploitation. More specifically, EncChain exhibits the following key attributes:
Encrypted Knowledge Base and User Interactions: All knowledge base and interaction records are encrypted using distinct keys before leaving the secure perimeter, which ensures that information remains perpetually in ciphered form, thereby precluding access to its unencrypted counterpart, even for application architects.
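As a concrete illustration of this data flow, the following sketch encrypts each record under a per-purpose key before it leaves the client. The keystream construction, function names, and key handling here are placeholder assumptions, not EncChain's actual implementation; a real deployment would use an authenticated cipher such as AES-GCM.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Derive n pseudo-random bytes by hashing key || nonce || counter.
    # Placeholder construction for illustration only.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_record(key: bytes, plaintext: bytes) -> tuple:
    # A fresh nonce per record keeps keystreams from repeating.
    nonce = secrets.token_bytes(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce, bytes(a ^ b for a, b in zip(plaintext, ks))

def decrypt_record(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))

# Distinct keys for the knowledge base and for user interactions,
# generated inside the secure perimeter and never exposed to the app.
kb_key = secrets.token_bytes(32)
chat_key = secrets.token_bytes(32)

nonce, ct = encrypt_record(kb_key, b"internal design document")
assert decrypt_record(kb_key, nonce, ct) == b"internal design document"
```

The point of the sketch is the key separation: the application only ever sees `(nonce, ct)` pairs, so even the application architect cannot recover plaintext without a key that never leaves the secure perimeter.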
Condential Data Computing Capability: EncChain provides
a suite of core functionalities, including condential knowledge
base loading, condential similarity search, condential prompt
generation, and condential large model inference. These capabili-
ties enable developers to handle and process encrypted data without
accessing plaintext, meeting the requirements for constructing busi-
ness logic while protecting data privacy and security.
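A hypothetical facade over these four primitives might look as follows. All class and method names are illustrative assumptions rather than EncChain's actual API, and in a real deployment unsealing and inference would happen only inside the trusted environment.

```python
class ConfidentialChain:
    """Illustrative facade for the four confidential primitives."""

    def __init__(self):
        self._kb = []  # plaintext exists only inside the enclave

    def load_knowledge_base(self, encrypted_docs, key):
        # (1) confidential knowledge base loading
        self._kb = [self._unseal(d, key) for d in encrypted_docs]

    def similarity_search(self, sealed_query, key, top_k=1):
        # (2) confidential similarity search (toy token-overlap score)
        q = set(self._unseal(sealed_query, key).split())
        ranked = sorted(self._kb,
                        key=lambda d: len(q & set(d.split())),
                        reverse=True)
        return ranked[:top_k]

    def generate_prompt(self, sealed_query, context, key):
        # (3) confidential prompt generation
        question = self._unseal(sealed_query, key)
        return f"Context: {'; '.join(context)}\nQuestion: {question}"

    def infer(self, prompt):
        # (4) confidential LLM inference -- stubbed with an echo
        return f"[model answer for] {prompt.splitlines()[-1]}"

    @staticmethod
    def _unseal(blob, key):
        # stand-in for in-enclave decryption
        return blob

chain = ConfidentialChain()
chain.load_knowledge_base(["EncChain encrypts user queries"], key=b"kb")
ctx = chain.similarity_search("how are queries protected", key=b"q")
answer = chain.infer(chain.generate_prompt("how are queries protected", ctx, key=b"q"))
```

The design point is that business logic composes these calls without ever touching plaintext outside the enclave boundary.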
Fine-grained Access Control: Through rigorous access control, EncChain enforces precise user permissions for knowledge bases. By defining roles like "questioner" and "knowledge base owner" and assigning access based on unique identifiers for these roles, it mitigates unauthorized data access and potential exfiltration.
Streamlined Integration and Application: As a Python package, EncChain offers straightforward integration into third-party applications, facilitating adoption by allowing developers to easily incorporate its features. This ease of use, combined with support for both encrypted and plaintext queries, significantly reduces the complexity for developers new to the system.
Execution Safety in Trusted Environments: EncChain and its associated LLMs are deployable within trusted execution environments, leveraging advanced hardware security features to safeguard virtual machine memory privacy and integrity. This setup ensures that sensitive data is shielded from both the host operating system and the virtual machine manager, enhancing operational security.
Remote Attestation for Enhanced Trust: EncChain enables the use of remote attestation technologies to confirm the security and trustworthiness of the execution environments for itself and the deployed LLM, providing users with additional confidence in the security measures of LLM applications.
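The verifier-side trust decision can be sketched as follows. This is a highly simplified stand-in: an HMAC tag plays the role of the hardware-signed quote, and the expected measurement value is invented for illustration; real TDX attestation verifies quotes signed with Intel-provisioned keys.

```python
import hashlib
import hmac

# Known-good measurement of the EncChain + LLM image (illustrative value).
EXPECTED = hashlib.sha256(b"encchain-v1 + llm-image-v1").hexdigest()

def attest(report: dict, mac_key: bytes, expected: str = EXPECTED) -> bool:
    # Accept only if the report is authentic (tag verifies) AND the
    # reported measurement matches the known-good value.
    tag = hmac.new(mac_key, report["measurement"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, report["mac"]) and report["measurement"] == expected

# The trusted environment produces a report over its measurement.
key = b"shared-verification-key"  # placeholder for a real PKI chain
measurement = EXPECTED
report = {"measurement": measurement,
          "mac": hmac.new(key, measurement.encode(), hashlib.sha256).hexdigest()}

assert attest(report, key)        # genuine environment is accepted
assert not attest({"measurement": "tampered", "mac": report["mac"]}, key)
```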
2 PRELIMINARIES
Retrieval Augmented Generation. The RAG [3] architecture represents a significant advancement in addressing the challenge of hallucination in LLMs, and has emerged as a dominant pattern in developing LLM applications, particularly for enhancing logical reasoning and data comprehension over private knowledge bases to augment question-answering (QA) capabilities. It is pivotal in scenarios like knowledge-based questioning and intelligent assistance. The RAG framework involves segmenting private knowledge into embedding vectors stored in a database. Upon receiving a question, the system converts it into a vector, retrieves the most relevant knowledge via vector similarity search, and merges this with the question to form a comprehensive prompt for the LLM.
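The RAG loop described above can be sketched in a few lines, with a toy bag-of-words "embedding" standing in for the neural encoder a real system would use:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding; real systems use a neural encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Segment private knowledge and store its vectors.
knowledge = [
    "EncChain encrypts the knowledge base before upload",
    "TDX isolates virtual machines with hardware protections",
]
index = [(embed(doc), doc) for doc in knowledge]

# Vectorize the question, retrieve by similarity, and augment the prompt.
question = "how is the knowledge base encrypted"
qvec = embed(question)
best = max(index, key=lambda pair: cosine(qvec, pair[0]))[1]
prompt = f"Context: {best}\nQuestion: {question}"
```

The resulting `prompt` is what gets sent to the LLM; the sensitive part is that both `knowledge` and `question` are plaintext here, which is exactly the exposure EncChain targets.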
Trusted Execution Environment. TEEs [4, 5] provide a cornerstone technology by offering secure and isolated execution spaces within processors, enhancing the security of data and code against potential threats from compromised operating systems or hypervisors in the complex landscape of cybersecurity and data privacy. Within this spectrum, Intel's Trust Domain Extensions (TDX) [2] serve as an evolved form of TEEs, tailored to bring their benefits into the realm of virtualization. TDX introduces the concept of trusted domains, in which virtual machines operate in isolation with hardware-level protections. This innovation directly addresses the intricate challenges of maintaining data privacy and security in environments such as cloud computing and data centers.
3 EncChain SOLUTION
3.1 Threat Model
The RAG architecture in QA leads to two primary threats: unauthorized access and data exfiltration. Firstly, its reliance on plaintext storage of knowledge bases and user queries permits developers unfettered access, creating a vector for data leaks in cases of malicious intent or system compromise. Secondly, the architecture lacks rigorous access controls, enabling users to potentially retrieve sensitive information beyond their clearance through intentionally designed queries. These threats collectively jeopardize data integrity and confidentiality, necessitating an immediate implementation of enhanced security protocols to mitigate the risks of unauthorized access and ensure the privacy protection of LLM applications.
3.2 Architecture Overview
The EncChain architecture, delineated in Figure 1 for LLM appli-
cation deployment, emphasizes security and operational integrity.
[Figure omitted: it depicts a client terminal (web browser with a GPT-4-style chatbot) connecting to a 3rd-party application in a legacy VM, while the EncChain service and LLM service run inside a confidential VM; both VMs sit atop the firmware, hypervisor, host OS, and an Intel TDX CPU with other hardware.]
Figure 1: The architecture of the EncChain demonstration.
It treats the client terminal as secure, encrypting data before it exits, protecting it during transmission. Third-party applications are hosted on virtual machines (VMs), establishing a clear operational divide. EncChain and its models operate within secure virtual environments utilizing advanced VM technologies like TDX for enhanced runtime security. These environments are reinforced by hardware security extensions, safeguarding virtual memory from unauthorized access by the host OS and hypervisor. Third-party applications leverage EncChain's APIs for encrypted data interactions and secure business logic development. Remote attestation technology allows users to verify the security of the EncChain and LLM environments, adding a layer of trust. EncChain's security protocol includes data encryption at domain entry and exit, strict access control, and the synergistic use of secure VMs and remote attestation, providing a robust framework for secure LLM application deployment, addressing the critical need for data security.
3.3 Fine-grained Knowledge Control
EncChain enhances the privacy attributes of LLM applications that use RAG-based private knowledge base inference by leveraging fine-grained knowledge control. This mechanism, derived from Operon's privacy-protected data management [6], embodies the concept of the Behavior Control List (BCL). Specifically, EncChain allows "knowledge owners" to establish a binary relationship between the "questioners" and the "knowledge bases." When a questioner poses a question, triggering the LLM's inference, EncChain ensures that the search for relevant knowledge vectors occurs exclusively within an authorized subset of vector databases, generating answers based on this relationship. This solves an issue traditionally addressed either by employing multiple distinct LLM instances to segregate knowledge for privacy protection (sacrificing efficiency and increasing costs) or by utilizing a single system while facing privacy risks. EncChain's innovation lies in its ability to protect privacy while optimizing the retrieval and integration of knowledge, thereby finding an effective equilibrium between privacy security and knowledge utilization.
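The BCL reduces to a binary authorization relation consulted before retrieval. A minimal sketch of that idea follows; the class and function names are illustrative assumptions, not Operon's or EncChain's actual API:

```python
class BehaviorControlList:
    """Owner-maintained relation: (questioner, knowledge base) grants."""

    def __init__(self):
        self._grants = set()

    def grant(self, questioner: str, kb_id: str) -> None:
        self._grants.add((questioner, kb_id))

    def authorized(self, questioner: str, kb_id: str) -> bool:
        return (questioner, kb_id) in self._grants

def search(bcl, questioner, kb_store, match):
    # Retrieval only ever touches knowledge bases the questioner
    # is granted; unauthorized bases are excluded before searching.
    allowed = {k: docs for k, docs in kb_store.items()
               if bcl.authorized(questioner, k)}
    return [doc for docs in allowed.values() for doc in docs if match(doc)]

bcl = BehaviorControlList()
bcl.grant("alice", "hr_kb")
store = {"hr_kb": ["salary policy"], "legal_kb": ["contract terms"]}

hits = search(bcl, "alice", store, lambda d: True)
# "legal_kb" content is never searched for alice, so a single system
# can serve many tenants without cross-tenant leakage.
```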
3.4 System Workow
We present the procedural workow of EncChain through a spe-
cic example, as illustrated in Figure 2. In this scenario, we assume
four distinct roles:
A
knowledge base data owners;
B
question-
ers;
C
third-party software developers providing QA applications;
and
D
TEEs (e.g., cloud infrastructure) for deploying LLMs with
EncChain. We note that, in practical scenarios,
A
and
B
might
represent the same entity, or
B
could be a controlled party of
A
(for