
strong consistency to eventual consistency, allowing the system to
balance between consistency, availability, and partition tolerance
as dictated by the CAP theorem. 3) Distributed Transactions:
Support for transactions across multiple nodes is often provided,
although it may come with trade-offs in terms of performance and
scalability. 4) Query Processing: They can execute queries across
nodes efficiently, often optimizing query execution plans to minimize
network traffic and data movement. Table 1 outlines different
mechanisms of popular distributed databases, and Figure 1 illustrates
three distinct architectures of native distributed databases.
Figure 1: Architectures of Native Distributed Databases: (a) Shared-Nothing, (b) Shared-Nothing Disaggregated, (c) Shared-Storage.
2.2 Data Replication and Synchronization
2.2.1 Data Replication. Replication in distributed databases strikes
a balance between synchronous methods, which ensure immediate
consistency but result in slower writes, and asynchronous methods,
which allow for faster access but may introduce data discrepancies.
The level of replication (row, block, or file) is
chosen based on specific needs. Asynchronous replication resolves
conflicts using timestamps or custom logic to ensure data accuracy.
Techniques like asymmetric-partition replication can reduce
system load [15], whereas solutions like BatchDB [18] improve
OLTP/OLAP workloads by pairing logical replication with a lazy
strategy for enhanced performance.
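
As an illustration of timestamp-based conflict resolution under asynchronous replication, the following is a minimal last-writer-wins sketch. It is not taken from any system cited here; the names `Versioned`, `resolve`, and `merge` are hypothetical, and the sketch assumes writer clocks are close enough that timestamps are meaningful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # writer's clock reading (assumed roughly synchronized)
    node_id: str      # tie-breaker when two timestamps are equal


def resolve(a: Versioned, b: Versioned) -> Versioned:
    """Keep the later write; ties fall back to node_id so that every
    replica deterministically picks the same winner."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


def merge(local: dict, incoming: dict) -> dict:
    """Merge an incoming replica state into a copy of the local one."""
    merged = dict(local)
    for key, candidate in incoming.items():
        merged[key] = resolve(merged[key], candidate) if key in merged else candidate
    return merged
```

Because `resolve` is symmetric in its arguments, two replicas that exchange states in either order converge to the same result. The cost is that the losing concurrent write is silently discarded, which is why some systems prefer custom resolution logic.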
2.2.2 Data Synchronization. Distributed databases balance consistency
and performance using models like eventual consistency [12],
which tolerates short-term discrepancies in exchange for long-term
convergence. Data sync frequency and network latency [26] are crucial
to this consistency and influence system design for performance
optimization. Moreover, to handle simultaneous transactions and data
conflicts, strategies such as version control and timestamps [29] help
keep data synchronization orderly and consistent.
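
One common way to realize version tracking for synchronization is a version vector, which records how many updates each node has contributed. The sketch below is an assumption about what such a scheme can look like, not the specific method of [29]; the function name `compare` and the dictionary representation are hypothetical.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors (node id -> update counter).

    Returns "after" if a has seen everything b has and more, "before"
    for the reverse, "equal" if they match, and "concurrent" if each
    side has updates the other lacks (a true conflict).
    """
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"
```

Only the "concurrent" outcome needs conflict resolution; in the other cases the newer replica state can simply overwrite the older one.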
2.2.3 Challenges. In distributed databases, especially those span-
ning wide areas, data synchronization is essential but challenged
by bandwidth limits and by network latency and fluctuation, which affect
efficiency and performance [24]. Optimizing bandwidth and maintaining
swift recovery post-failure are crucial for data integrity and
loss prevention. Ensuring transactional consistency amid partitions
and securing data against unauthorized changes during replication
are key hurdles. However, as demands for performance rise, evolv-
ing technologies are progressively tackling these issues, enhancing
the robustness and reliability of distributed database systems.
2.3 Consistency Models
The data consistency models in distributed databases crucially im-
pact performance, reliability, and availability, as depicted in Figure 2.
2.3.1 Strong and Eventual Consistency. Strong consistency in dis-
tributed systems ensures that operations are immediately visible
and executed in sequence, providing a seamless experience but
potentially limiting performance due to the need for node synchronization.
Systems such as Calvin [26], PolarDB [28], and GeoGauss [34]
have improved transactional efficiency and replication
to deliver this consistent state without significantly affecting speed
or scalability. In contrast, eventual consistency allows for short-term
data anomalies in exchange for better responsiveness, with
systems like Dynamo [5] managing high-performance demands
through application-level conflict resolution. BlockchainDB [8]
combines the flexibility of databases with the strength
of blockchain technology to offer a range of consistency levels, thus
optimizing data management for a variety of operational contexts.
2.3.2 Other Consistency Models. Causal consistency improves upon
eventual consistency by ensuring causally related operations fol-
low the same order across nodes, while independent operations
are not strictly ordered. Efficiency gains come from minimizing
dependency checks and defining external causal relationships [2].
GentleRain [7] increases throughput with time-based protocols and
uses physical timestamps to save on storage and communication.
Orbe [6] leverages dependency matrices and transitive causality
for effective causal consistency in key-value systems. Achieving
strong consistency in partitioned, replicated systems is challenging.
Google Spanner [1] clusters servers and uses Paxos for log replication
within groups, maintaining a consistent prefix order across
data replicas to uphold its consistency standard.
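
To make the causal-ordering idea concrete, the sketch below shows a replica that buffers a write until every write it depends on has been applied. It uses explicit dependency sets for readability and is not the mechanism of GentleRain or Orbe, which use physical timestamps and dependency matrices respectively; `Write` and `Replica` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Write:
    write_id: str
    key: str
    value: str
    deps: frozenset = frozenset()  # ids of writes this one causally depends on


class Replica:
    def __init__(self):
        self.store = {}       # key -> latest applied value
        self.applied = set()  # ids of writes already applied
        self.pending = []     # writes waiting on missing dependencies

    def receive(self, write: Write) -> None:
        self.pending.append(write)
        self._drain()

    def _drain(self) -> None:
        # Repeatedly apply any pending write whose dependencies are all
        # satisfied; applying one write may unblock others.
        progress = True
        while progress:
            progress = False
            for w in list(self.pending):
                if w.deps <= self.applied:
                    self.store[w.key] = w.value
                    self.applied.add(w.write_id)
                    self.pending.remove(w)
                    progress = True
```

If a reply arrives before the post it answers, the replica holds the reply back until the post is applied, so no reader ever sees the effect without its cause. Writes with no dependency relationship are applied in arrival order, which may differ across replicas.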
2.3.3 Challenges. Choosing the right consistency for distributed
databases is crucial for balancing performance, availability, and
precision. Financial systems often require strong consistency, while
CDNs may opt for eventual consistency, accepting brief data discrepancies.
The CAP theorem implies a trade-off among consistency,
availability, and partition tolerance, and the right balance depends
on application needs.
Databases like OceanBase [32] adapt consistency options for diverse
scenarios. Data replication, key for fault tolerance, faces latency
issues with synchronous methods and potential inconsistency with
asynchronous ones. Developers must navigate these complexities
to ensure optimal system performance and data reliability.
2.4 Distributed Transactions
2.4.1 Distributed Transaction Commit Protocols. Native distributed
databases employ distributed transaction commit protocols to up-
hold the ACID properties across distributed nodes, ensuring data
integrity and consistency. The two-phase commit (2PC) protocol is a
classic example, operating in two distinct stages: a prepare (voting)
stage and a commit stage. ROCOCO [19] optimizes this process
by treating transactions as collections of atomic blocks, tracking
dependencies before execution to allow for serializable ordering
upon commit. Primo [11] avoids concurrency conflicts by ensuring
transactions are conflict-free after the commit phase. We proposed
OceanBase 2PC [30], a Paxos-enhanced 2PC protocol, to strengthen
fault tolerance and reduce transaction latency in distributed
environments, streamlining synchronization for efficiency.
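
The two stages of classic 2PC can be sketched as follows. This is a textbook-style illustration only, not ROCOCO, Primo, or OceanBase 2PC: it omits durable logging, timeouts, and coordinator failure handling, and the names `Participant` and `two_phase_commit` are hypothetical.

```python
class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit  # stands in for local validation/locking
        self.state = "init"

    def prepare(self) -> bool:
        # Stage 1: vote yes only if the local part of the transaction
        # can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> str:
    # Stage 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Stage 2 (commit/abort): commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote aborts the transaction everywhere, which is what gives atomicity. The weakness this sketch does not address is that participants block if the coordinator fails between the two stages, which is the kind of fault-tolerance gap that replicated variants aim to close.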
2.4.2 Distributed Version Control. One of the core principles of
SAP HANA is full support for distributed query capabilities and
horizontal expansion [14]. It employs MVCC to provide distributed