
Timestamp Ordering in Distributed Database Systems: TSO

Original article by eygle, 2019-09-05

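A distributed database system must establish a consistent order for transactions that run on different database nodes. Both MVCC and ACID depend on this ordering being guaranteed.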


Common ordering schemes used in databases include:

Logical Clock

TrueTime

Hybrid Logical Clock

TSO - Timestamp Oracle

Here we discuss how TSO is implemented.

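TSO (Timestamp Oracle) relies on a single central service to hand out time. Because one central authority allocates logical timestamps in increasing order, no two requests ever receive the same timestamp. This guarantees that transaction version numbers increase monotonically, which in turn establishes the order of distributed transactions.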
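To make the idea concrete, here is a minimal in-process sketch in Python, assuming a single oracle object shared by all callers. The class `TimestampOracle` and its `get_timestamp` method are hypothetical names for illustration, not TiDB or Percolator APIs. A lock around a counter is enough to make every returned value unique and strictly increasing:

```python
import threading


class TimestampOracle:
    """Minimal in-process central timestamp oracle (hypothetical sketch)."""

    def __init__(self, start: int = 0) -> None:
        self._last = start
        self._lock = threading.Lock()

    def get_timestamp(self) -> int:
        # The lock serializes all callers, so each value is handed out exactly once
        # and every value is larger than all values handed out before it.
        with self._lock:
            self._last += 1
            return self._last


tso = TimestampOracle()
issued = []


def worker() -> None:
    for _ in range(1000):
        issued.append(tso.get_timestamp())


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(issued), len(set(issued)))  # prints "4000 4000": no duplicates
```

A real TSO runs as a network service and must also avoid reusing values after a restart, which a purely in-memory counter like this cannot do.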


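TiDB, a prominent open-source distributed database developed in China, uses a centralized TSO service to obtain globally consistent version numbers. The TSO module lives in PD (Placement Driver), TiDB's global central control node. By embedding etcd, PD keeps its persisted data strongly consistent and can fail over automatically, which solves the single-point-of-failure problem that a centralized service would otherwise introduce.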
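The paragraph leaves implicit what a TSO "version number" looks like. To our understanding, PD packs a physical clock reading in milliseconds and a logical counter into one 64-bit integer, with the low 18 bits reserved for the logical part. Treat that layout as an assumption to verify against the PD source. The helpers `compose_ts` and `extract_ts` are hypothetical names for this illustration:

```python
import time

# Assumed layout (verify against the PD source): physical milliseconds in the
# high bits, an 18-bit logical counter in the low bits.
LOGICAL_BITS = 18
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def compose_ts(physical_ms: int, logical: int) -> int:
    """Pack a millisecond clock reading and a logical counter into one integer."""
    if not 0 <= logical <= LOGICAL_MASK:
        raise ValueError("logical counter does not fit in 18 bits")
    return (physical_ms << LOGICAL_BITS) | logical


def extract_ts(ts: int) -> tuple[int, int]:
    """Split a composed timestamp back into (physical_ms, logical)."""
    return ts >> LOGICAL_BITS, ts & LOGICAL_MASK


now_ms = int(time.time() * 1000)
earlier = compose_ts(now_ms, 7)
later = compose_ts(now_ms + 1, 0)  # the next millisecond beats any logical value
print(earlier < later, extract_ts(earlier) == (now_ms, 7))  # prints "True True"
```

Comparing composed values as plain integers orders them by physical time first and by the logical counter second. Under this layout, the oracle can issue up to 2^18 distinct timestamps within a single millisecond.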

The TiKV documentation describes TSO as follows:

The timestamp oracle plays a significant role in the Percolator Transaction model, it is a server that hands out timestamps in strictly increasing order, a property required for correct operation of the snapshot isolation protocol.

Since every transaction requires contacting the timestamp oracle twice, this service must scale well. The timestamp oracle periodically allocates a range of timestamps by writing the highest allocated timestamp to stable storage; then with that allocated range of timestamps, it can satisfy future requests strictly from memory. If the timestamp oracle restarts, the timestamps will jump forward to the maximum allocated timestamp. Timestamps never go "backwards".

To save RPC overhead (at the cost of increasing transaction latency) each timestamp requester batches timestamp requests across transactions by maintaining only one pending RPC to the oracle. As the oracle becomes more loaded, the batching naturally increases to compensate. Batching increases the scalability of the oracle but does not affect the timestamp guarantees.

The transaction protocol uses strictly increasing timestamps to guarantee that Get() returns all committed writes before the transaction’s start timestamp. To see how it provides this guarantee, consider a transaction R reading at timestamp TR and a transaction W that committed at timestamp TW < TR; we will show that R sees W’s writes. Since TW < TR, we know that the timestamp oracle gave out TW before or in the same batch as TR; hence, W requested TW before R received TR. We know that R can’t do reads before receiving its start timestamp TR and that W wrote locks before requesting its commit timestamp TW. Therefore, the above property guarantees that W must have at least written all its locks before R did any reads; R’s Get() will see either the fully committed write record or the lock, in which case W will block until the lock is released. Either way, W’s write is visible to R’s Get().

In our system, the timestamp oracle has been embedded into Placement Driver (PD). PD is the management component with a "God view" and is responsible for storing metadata and conducting load balancing.
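The window-based allocation described in the quoted documentation can be sketched in Python: persist only the highest allocated timestamp, serve requests from memory, and jump forward after a restart. `Storage` is a hypothetical stand-in for stable storage such as etcd. `WindowedOracle` and its `window` parameter are illustrative names, not PD's actual code:

```python
import threading


class Storage:
    """Hypothetical stand-in for stable storage (PD uses etcd for this)."""

    def __init__(self) -> None:
        self._saved_max = 0

    def save(self, value: int) -> None:
        self._saved_max = value

    def load(self) -> int:
        return self._saved_max


class WindowedOracle:
    """Serves timestamps from memory inside a pre-allocated window.

    Only the window's upper bound is persisted. After a restart the oracle
    resumes from that bound, skipping unused values, so it never goes backwards.
    """

    def __init__(self, storage: Storage, window: int = 1000) -> None:
        self._storage = storage
        self._window = window
        self._lock = threading.Lock()
        self._last = storage.load()   # highest value that may have been handed out
        self._limit = self._last      # forces a new window on the first request

    def get_timestamp(self) -> int:
        with self._lock:
            if self._last >= self._limit:
                # Persist the new upper bound before serving any value from it.
                self._limit = self._last + self._window
                self._storage.save(self._limit)
            self._last += 1
            return self._last


storage = Storage()
oracle = WindowedOracle(storage, window=1000)
print([oracle.get_timestamp() for _ in range(3)])   # prints "[1, 2, 3]"
restarted = WindowedOracle(storage, window=1000)    # simulate a crash and restart
print(restarted.get_timestamp())                    # prints "1001"
```

The design accepts a gap in the sequence after a restart, because the unused part of the last window is skipped. In exchange, it makes far fewer writes to stable storage: with `window=1000`, only one request in a thousand touches storage.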
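The quoted documentation describes a batching scheme: each client keeps at most one RPC to the oracle in flight and folds every request that arrives in the meantime into the next RPC. In Python, this might look as follows. `BatchOracle.get_timestamps(count)` is a hypothetical RPC that reserves `count` consecutive timestamps, and `BatchingClient` is likewise an illustrative name:

```python
import threading
from concurrent.futures import Future


class BatchOracle:
    """Central oracle whose (hypothetical) RPC reserves a block of timestamps."""

    def __init__(self) -> None:
        self._last = 0
        self._lock = threading.Lock()
        self.rpc_count = 0  # number of RPCs served, to observe batching

    def get_timestamps(self, count: int) -> int:
        """Reserve `count` consecutive timestamps and return the first one."""
        with self._lock:
            self.rpc_count += 1
            first = self._last + 1
            self._last += count
            return first


class BatchingClient:
    """Keeps at most one RPC in flight; requests arriving meanwhile join the next batch."""

    def __init__(self, oracle: BatchOracle) -> None:
        self._oracle = oracle
        self._lock = threading.Lock()
        self._waiting = []        # Futures of callers waiting for a timestamp
        self._in_flight = False   # True while some caller is sending RPCs

    def get_timestamp(self) -> int:
        fut = Future()
        with self._lock:
            self._waiting.append(fut)
            # Become the sender only if no RPC is currently in flight.
            is_sender = not self._in_flight
            self._in_flight = True
        if is_sender:
            # The calling thread doubles as the sender; a production client
            # would use a dedicated background thread instead.
            self._send_batches()
        return fut.result()

    def _send_batches(self) -> None:
        # One RPC per round for everything queued so far, until the queue is empty.
        while True:
            with self._lock:
                batch, self._waiting = self._waiting, []
                if not batch:
                    self._in_flight = False
                    return
            try:
                first = self._oracle.get_timestamps(len(batch))
            except Exception as exc:
                for fut in batch:
                    fut.set_exception(exc)
                continue
            # Hand out the reserved range in arrival order.
            for i, fut in enumerate(batch):
                fut.set_result(first + i)
```

Under light load each request becomes its own RPC. As more threads call `get_timestamp` concurrently, more of them queue behind the in-flight RPC and share the next one, which is the self-adjusting behavior the documentation describes. Timestamps within a batch are still distinct and increasing, so the ordering guarantee is unchanged.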
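The visibility argument in the quoted documentation can be replayed on a toy, single-node model of Percolator's columns. `ToyStore`, `LockedError`, and the column layout are simplifications assumed for illustration. Real Percolator and TiKV keep these columns in a distributed key-value store and also clean up abandoned locks. The timestamps follow the order in the argument: W's start timestamp, then TW, then TR > TW.

```python
class LockedError(Exception):
    """Raised when a read hits a lock; a real reader would back off and retry."""


class ToyStore:
    """Toy single-node model of Percolator's data, lock, and write columns."""

    def __init__(self) -> None:
        self.data = {}   # (key, start_ts) -> value written by that transaction
        self.lock = {}   # key -> start_ts of the transaction holding the lock
        self.write = {}  # key -> list of (commit_ts, start_ts) commit records

    def prewrite(self, key, value, start_ts):
        # Phase 1: write the data and take the lock (before asking for TW).
        self.data[(key, start_ts)] = value
        self.lock[key] = start_ts

    def commit(self, key, start_ts, commit_ts):
        # Phase 2: publish a commit record, then release the lock.
        self.write.setdefault(key, []).append((commit_ts, start_ts))
        del self.lock[key]

    def get(self, key, read_ts):
        # A lock from a transaction that started at or before read_ts may belong
        # to a writer whose commit timestamp will be below read_ts: cannot decide yet.
        lock_ts = self.lock.get(key)
        if lock_ts is not None and lock_ts <= read_ts:
            raise LockedError(f"{key!r} is locked by the transaction started at {lock_ts}")
        # Otherwise return the newest version committed at or before read_ts.
        visible = [rec for rec in self.write.get(key, []) if rec[0] <= read_ts]
        if not visible:
            return None
        _, start_ts = max(visible)
        return self.data[(key, start_ts)]


# Timestamps in the order the oracle would hand them out.
w_start, tw, tr = 1, 2, 3

store = ToyStore()
store.prewrite("k", "v1", w_start)  # W writes its lock before requesting TW
try:
    store.get("k", tr)              # R reads before W commits: it sees the lock
except LockedError:
    pass                            # R would back off here and retry later
store.commit("k", w_start, tw)
print(store.get("k", tr))           # prints "v1": W's write is visible to R
```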


Google's 2010 paper "Large-scale Incremental Processing Using Distributed Transactions and Notifications" describes the implementation of the Percolator system in detail. Percolator also uses a centralized TSO.

The paper includes the following example code:

[Figure: example code from the Percolator paper (GoogleGetTimestamp.jpg)]

Paper:

https://www.modb.pro/doc/779


References:

TiKV: Timestamp Oracle

Last modified: 2019-09-05 11:48:31
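[Copyright notice] This article is original content by a Modb (墨天轮) user. Reprints must credit the source (Modb), the article link, the author, and other basic information; otherwise the author and Modb reserve the right to pursue legal liability. If you find content on Modb suspected of plagiarism or infringement, please report it to contact@modb.pro with supporting evidence; once verified, Modb will remove the content immediately.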
