理解分布式系统中一致性和一致性级别

架构学习123 2021-04-29

1155

Alaska, Dec 2019

前两篇文章，我们重点分析了什么是数据库的隔离性（Isolation) 和隔离级别 (Isolation Level). 数据库的另外一个属性一致性 (Consistency) 讲的是什么？他跟CAP理论中的一致性 (Consistency) 有什么区别和联系？本文我们重点理解分布式系统中的一致性和一致性级别，下一篇文章我们将解释一致性 (Consistency) 和隔离性（Isolation) 的区别。

本文还发表在了Medium上。（点击阅读原文）谢谢大家支持！期待共同交流进步！

—

What is Consistency

In the last two posts, we reviewed what is the Isolation guarantee for database transactions. Among ACID properties, there’s another property - Consistency - which many people may be confused with the Isolation guarantee. In this post, we will review what is the Consistency guarantee.

Many of you may also have heard about different Consistency levels such as strict consistency, and eventual consistency. What are they and what are they used for? In this post we will also review them.

—

Consistency in ACID

Consistency, as a general term, has been mentioned as a property of the database transactions, as well as one of the three guarantees in CAP theorems. Are they talking about the same thing? If they are different. What is the difference?

The short answer is that they are totally different.

Firstly, the consistency property in ACID guarantees that the transaction executions will not violate any application defined semantics, such as an employee must be associated with a company (foreign key constraint), or an employee must have a non-null name (application specific constraints).

As we can see from the example, the Consistency guarantee in ACID is not really something that the database system can provide. It is more of the responsibility of application developers. The developers need to ensure the code will not violate any application specific semantics when the code is executed in the context of a transaction.

The consistency levels such as strict consistency, or eventual consistency is not referring to the consistency of a database transaction. In fact, it is used by the CAP theorem. Keep reading.

—

Consistency in CAP

The consistency guarantee in CAP theorem, on the other hand, tells how predictable a read can see the result of a write.

In the most strict form, the consistency guarantee means that every read receives the most recent write or an error, no matter where the write is performed. On the other hand, there are some less perfect consistency levels - where a read may not return the most recent write of the data. In the last two posts, we described different isolation levels and how people trade off the isolation guarantee for performance. In this post, we will also describe the different levels of consistencies and how people trade off them for performance.

In a non-distributed system, such as a single thread single data store system, consistency is not a problem. As you can imagine, it is trivial to guarantee that any read will get the most recent write in such a system.

However, it is not a trivial question any more in a distributed system where reads and writes occur from different places and data are replicated into different places. Traditionally, the consistency levels are described with the context of multi-thread shared memory. In this post, we will describe the problem based on the same context. However, the same idea should be applicable for distributed systems.

—

Consistency Levels in CAP

Strict Consistency

In theory, the most strict consistency level is called “Strict Consistency”. It assumes that every thread knows and agrees on the precise current time without any errors. With such assumption, all writes from all threads are sequentially ordered by the real time these writes are issued, and every read will see the most recent write in real time.

However, it is impossible in practice for threads or nodes in a distributed system to agree on a precise current time, hence the Strict Consistency is mostly limited to theoretical interest.

For example, in the following diagram, the read and write operations are ordered strictly by real time. The read of X from thread T2 happens before T1 writes X = 2, therefore, it receives X=0. The read of X from thread T3 happens after T1 writes X = 2, therefore, it receives X = 2.

Linearizability Atomic Consistency

The highest consistency that is achievable in practice is called Linearizability or Atomic Consistency. Like Strict Consistency, in Linearizability, all reads and writes are sequentially ordered by the real time. However, unlike Strict Consistency, Linearizability admits it takes time for a write or read operation to submit and complete. It doesn’t impose ordering constraints for operations that are overlapped in time with each other. In a word, Linearizability guarantees that reads see the most recent write that is not overlapped with it.

For example, in the following diagram, the first reads from thread T2 and T3 are overlapped with T1 write X=2, so it is possible that they may read X=0 or X=2. However, the second reads from thread T2 and T3 are not overlapped with T1 write X=2 operation, therefore, the second reads should only see X=2.

On the other hand, in the second diagram, the second read from thread T2 received X=0, it violated the Linearizability because the read doesn’t return the most recent non-overlapped write (from T1).

Sequential Consistency

Sequential Consistency is less strict than Linearizability. It doesn’t impose any real time constraints on reads and writes from different threads. However, it requires that all writes appear to take place in some order, and all threads agree on the order - see the writes occurring in that order.

In other words, it doesn’t enforce any particular order (such as real time order) that all threads should follow, but it requires that all writes are ordered in some “way”, where the “way” is agreed by all threads.

For example, in the following diagram, we can see the thread T1 write X = 1, and thread T2 write Y = 2. The Sequential Consistency doesn’t enforce that every thread must see write X = 1 before write Y = 2.

However, if thread T3 first sees X is updated to 1, then sees Y is still 0, then from the thread T3 perspective, the update of X = 1 happens before the update of Y = 2. In that case, all other threads should also see the updates in the same order that T3 sees.

If thread T4 first sees Y is updated to 2 but later sees X is still 0, then from the thread T4 perspective, the update of X = 1 happens after the update of Y = 2. The two threads disagree on the order of two updates, which violates the Sequential Consistency.

Causal Consistency

Causal Consistency is less strict than Sequential Consistency. Sequential Consistency enforces that all threads must agree on a global order, even if those writes are not related. Causal consistency, on the other hand, doesn’t enforce any constraint for unrelated writes, but it only enforces the order of writes that are causally related.

If a thread reads some data item (say X), then write the same data item (i.e. X) or another data item, we say the write is caused by the first read, and Causal Consistency enforce that all threads should observe the write of X before the write of Y.

In other words, if A causes B, all threads that see the result of B must see the result of A as well.

For example, in the following diagram, thread T2 reads X = 1 before writes Y = 2, then all threads that have already seen Y = 2 must see X = 1. In the second diagram, where T3 read Y = 2 first, but then read X = 0, which violates the Causal Consistency.

Eventual Consistency

Eventual Consistency is the most relaxed consistency level. It only guarantees that if there’s no writes for a “long” period of time (the “long” here is system dependent), all the threads converge on the last write of the data.

For example, in the following diagram, T3 reads X = 0 even after reading Y = 2, which violates the Causal Consistency. However, it doesn’t necessarily violate the Eventual Consistency as long as after a “long” period of time, all threads read X = 1, and read Y = 2.

From the second part of the following diagram, however, even after a “long” period of time, thread T2 still reads Y = 0, which violates the Eventual Consistency.

If Eventual Consistency is violated, we consider such a system doesn’t provide any consistency at all.

Summary

We provide a summary of all the consistency levels we discussed.

Strict Consistency

Most strict consistency level in theory. Not practicable. It assumes that there’s a precise current time that every thread agrees on.

Every read must see the most recent write in real time.

Linearizability

Atomic

Consistency

Most strict consistency level in practice. It acknowledges there’s a period of time between when an operation is started and when it is completed.

Every read must see the most recent write, if they are not overlapped, in real time.

Sequential

Consistency

No real time constraints. But all writes must be globally ordered in some way that all threads agree.

Causal

Consistency

No order constraints for unrelated writes. But there must be an order for causal related writes.

If A causes B, all threads that see the result of B must see the result of A as well.

Eventual

Consistency

No order constraints at all. But eventually all threads will converge.

Every thread must see the value of the last write eventually if there’s no write for a “long” period of time.

No Consistency

Not consistent at all.

Threads never agree on the value of the same data item.

—

Reference

http://dbmsmusings.blogspot.com/2019/07/overview-of-consistency-levels-in.html

数据库

文章转载自架构学习123，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

理解分布式系统中一致性和一致性级别

What is Consistency

Consistency in ACID

Consistency in CAP

Consistency Levels in CAP

Strict Consistency

Linearizability Atomic Consistency

Sequential Consistency

Causal Consistency

Eventual Consistency

Summary

Reference

评论