CommandPrompt_Performance_Analysis_of_PostgreSQL_Data_Checksums_2019-09-02.pdf

小小亮

173

13页

0次

2022-10-11

免费下载

Performance Analysis of PostgreSQL data-checksums

Eric Worden

Command Prompt, Inc

May 2019

Executive Summary

This article explores the performance characteristics of the PostgreSQL data-checksums cluster

option, which provides powerful protection against hardware-level data corruption. I explain the

conceptual basis of the feature and also the less obvious technical details that explain its

complex performance impacts.

I provide the results of several carefully designed benchmarks that illustrate the common

workload types that bear significant additional CPU and disk IO load because of the feature, and

other workload types that do not bear an additional load. For a technical description of these

workloads, see the Technical Summary



at the end of the article.

Because hardware failure and low-level data corruption are an all-too-common occurrence in

database operations, and the data-checksums feature provides strong protection against such

occurrences, Command Prompt recommends that administrators plan for implementation of

clusters utilizing the feature, with foreknowledge of its performance impacts.

Introduction

Recently I have been working on PostgreSQL benchmarks for its data checksums feature. This

incredibly valuable option to initdb -- introduced in version 9.3 in 2013 -- allows quick detection

of corrupted disk data pages. It provides the glorious opportunity to simply failover to a standby

before your data becomes corrupted, rather than endure the horror of discovering the corruption

afterward and attempting to recover.

But people care as much about speed as safety, and the feature comes with a performance cost

since that data is checksummed every time it's read from or written to disk. Clients are

interested in the feature but they want to know the performance cost. Our answer so far has

been "we don't really know," and if an internet search is any indicator, no one really knows.

There is another challenge with this feature: for PostgreSQL versions lower than 12, it can only

be enabled when the cluster is first created. And if you create a cluster with this option enabled,

there is no way to disable it. PostgreSQL version 12 introduces a utility program pg_checksums



which enables or disables data checksums in a PostgreSQL cluster, but the cluster must be

offline while the utility runs. Clients want to understand the performance implications of this

feature before committing a new cluster to it, or before undertaking a migration to a new cluster

with the feature enabled.

Take note: Amazon enables data checksums on all



RDS PostgreSQL clusters, and on their

platform it is not



a configurable option. I intend to leave you with their same clarity and

confidence about this feature.

To find the answer about performance, my first instinct was to run benchmarks and get the

answer empirically. I did run benchmarks and I did get an answer. However, this was a really

tricky benchmark to set up in a meaningful way, and there are several different benchmarks

needed to understand this issue.

I took a step back.

PostgreSQL is complex software, but checksumming data is really simple. It's so simple that

you can checksum a big file and measure the time it takes, and then you can repeat the task

and be assured that it will take nearly the same amount of time. The contents of the file do not

affect the time taken, only the size of the data. This means that ultimately we could know the

CPU cost of the data checksums feature if we could know how much data would be

checksummed; this information is easy to obtain. It's more complicated than that though,

because this cost is directly linked to disk IO.

In order to translate this basic metric "total CPU time" of checksumming some data into

something of real-world value, we need to relate it to some other factors.

Let's explore:

How much data is checksummed?

PostgreSQL only checksums data as it is read or written to disk, so we want to know

specifically: how much is that? Remember this only matters for data



pages -- table data and

indexes -- not WAL files, which are not checksummed. This figure varies dramatically depending

on the system. Commonly, systems with well-sized hardware and correct configuration, 90%+

(even 99%) of pages are read from cache, not disk. Pages read/written to cache are not

checksummed. Even with caching and WAL logging, all data is eventually written to disk. Some

of 13

免费下载

postgresql

关注

评论