[Translation] Migrating PostgreSQL to Kubernetes

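More and more companies are adopting Kubernetes. For some it is about being cutting-edge; for others it is a well-defined strategy and a business transformation. Developers and operations teams all over the world are struggling to move applications that are not cloud-native friendly to containers and Kubernetes.

Migrating databases is always a challenge that brings risk and downtime to the business. Today I will show how easy it is to migrate a PostgreSQL database to Kubernetes with minimal downtime using Percona Distribution for PostgreSQL Operator.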

Goal

To perform the migration, I will use the following setup:

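A PostgreSQL database deployed on-prem or somewhere in the cloud. It will be the Source.
A Google Kubernetes Engine (GKE) cluster where Percona Operator deploys and manages the PostgreSQL cluster (the Target) and a pgBackRest Pod
PostgreSQL backups and write-ahead logs are uploaded to an object storage bucket (GCS in my case)
The pgBackRest Pod reads the data from the bucket
The pgBackRest Pod continuously restores the data to the PostgreSQL cluster in Kubernetes

The data should be continuously synchronized. In the end, I want to shut down the PostgreSQL instance running on-prem and keep only the cluster in GKE.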

Migration

Prerequisites

To replicate the setup, you will need the following:

PostgreSQL (v12 or 13) running somewhere
pgBackRest installed
Google Cloud Storage or any S3 bucket. My examples use GCS.
A Kubernetes cluster
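
The prerequisites above can be checked with a small preflight script. This is a minimal sketch, not part of the original post: the helper name missing_tools and the exact tool list are assumptions chosen for illustration, and the check only confirms that the binaries are on PATH, not that they are configured.

```shell
#!/usr/bin/env bash
# Preflight sketch: report which command-line tools are missing.
# The helper name and the tool list are assumptions for illustration.

# Print every tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tools used later in this walkthrough.
missing="$(missing_tools pgbackrest psql kubectl git)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All tools found."
fi
```

Run it on the machine that will drive the migration; on the Source host only pgbackrest and psql matter.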

Configure the Source

I have Percona Distribution for PostgreSQL version 13 running on a Linux machine.

1. Configure pgBackRest

# cat /etc/pgbackrest.conf
[global]
log-level-console=info
log-level-file=debug
start-fast=y

[db]
pg1-path=/var/lib/postgresql/13/main
repo1-type=gcs
repo1-gcs-bucket=sp-test-1
repo1-gcs-key=/tmp/gcs.key
repo1-path=/on-prem-pg

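pg1-path should point to the PostgreSQL data directory
repo1-type is set to gcs because we want our backups to go there
The key is in the /tmp/gcs.key file. It can be obtained through the Google Cloud UI; the original post links to more details.
The backups will be stored in the on-prem-pg folder in the sp-test-1 bucket

2. Edit postgresql.conf to enable archiving through pgBackRest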

archive_mode = on
archive_command = 'pgbackrest --stanza=db archive-push %p'

A restart is required after changing the configuration.
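
With archiving enabled, the repository still needs a stanza and at least one backup before anything can be restored from it; the "has a stanza-create been performed?" and "no backup set found to restore" messages in the troubleshooting section hint at this. The following is a minimal sketch, not taken from the original post. The run helper and the DRY_RUN switch are assumptions added for safety: by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: seed the repository so the Target has something to restore.
# Prints the commands by default; set DRY_RUN=0 to execute them
# (running as the postgres OS user is an assumption about the setup).

STANZA="db"   # matches the [db] section in /etc/pgbackrest.conf

# Echo a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

seed_repository() {
  run pgbackrest --stanza="$STANZA" stanza-create       # create the stanza in the bucket
  run pgbackrest --stanza="$STANZA" check               # verify that archiving works
  run pgbackrest --stanza="$STANZA" --type=full backup  # take the first full backup
}

seed_repository
```

Review the printed plan first, then rerun with DRY_RUN=0 once the configuration above is in place.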

3. The Operator requires a postgresql.conf file in the data directory. An empty file is enough:

touch /var/lib/postgresql/13/main/postgresql.conf

4. The primaryuser must be created on the Source so that the Operator can set up replication correctly.

# create user primaryuser with encrypted password '<PRIMARYUSER PASSWORD>' replication;

Configure the Target

1. Deploy Percona Distribution for PostgreSQL Operator on Kubernetes. Read more about it in the documentation.

# create the namespace
kubectl create namespace pgo

# clone the git repository
git clone -b v0.2.0 https://github.com/percona/percona-postgresql-operator/
cd percona-postgresql-operator

# deploy the operator
kubectl apply -f deploy/operator.yaml

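2. Edit the main custom resource manifest, deploy/cr.yaml.

I am not going to change the cluster name and will keep it as cluster1
The cluster will run in standby mode, which means it will sync the data from the GCS bucket. Set spec.standby to true.
Configure GCS itself. The spec.backup section looks like this (bucket and repoPath are the same as in the pgBackRest configuration above)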

  backup:
...
    repoPath: "/on-prem-pg"
...
    storages:
      my-s3:
        type: gcs
        endpointUrl: https://storage.googleapis.com
        region: us-central1-a
        uriStyle: path
        verifyTLS: false
        bucket: sp-test-1
    storageTypes: [
      "gcs"
    ]

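I want at least one replica in my PostgreSQL cluster. Set spec.pgReplicas.hotStandby.size to 1.

3. The Operator must be able to authenticate with GCS. To do so, create a Secret object named <CLUSTERNAME>-backrest-repo-config that carries the key in its gcs-key data field. It should be the same key we used on the Source. The original post links to an example of this Secret.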

kubectl apply -f gcs.yaml
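
The gcs.yaml file applied in step 3 could be generated along these lines. This is only a sketch: the Secret name pattern and the gcs-key field come from the text above, while the helper name render_gcs_secret, the pgo namespace in the metadata, and the Opaque type are assumptions. The linked example in the original post remains the authoritative reference.

```shell
#!/usr/bin/env bash
# Sketch: render a Secret manifest that carries the GCS key.
# The helper name, the "pgo" namespace, and the Opaque type are assumptions.

# Usage: render_gcs_secret <cluster-name> <path-to-key-file>
render_gcs_secret() {
  cluster="$1"
  key_file="$2"
  # Secret data must be base64-encoded; strip newlines for a single-line value.
  encoded="$(base64 < "$key_file" | tr -d '\n')"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${cluster}-backrest-repo-config
  namespace: pgo
type: Opaque
data:
  gcs-key: ${encoded}
EOF
}

# Example:
# render_gcs_secret cluster1 /tmp/gcs.key > gcs.yaml
```

Generating the manifest from the same key file that the Source uses avoids copy-and-paste mistakes in the encoded value.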

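4. Create the users by creating Secret objects: postgres and primaryuser (the one we created on the Source). The original post links to an example of the user Secrets. The passwords must be the same as on the Source.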

kubectl apply -f users.yaml

5. Now let's deploy our cluster on Kubernetes by applying cr.yaml:

kubectl apply -f deploy/cr.yaml

Verification and Troubleshooting

If everything went well, you should see the following in the primary Pod's logs:

kubectl -n pgo logs -f --tail=20 cluster1-5dfb96f77d-7m2rs
2021-07-30 10:41:08,286 INFO: Reaped pid=548, exit status=0
2021-07-30 10:41:08,298 INFO: establishing a new patroni connection to the postgres cluster
2021-07-30 10:41:08,359 INFO: initialized a new cluster
Fri Jul 30 10:41:09 UTC 2021 INFO: PGHA_INIT is 'true', waiting to initialize as primary
Fri Jul 30 10:41:09 UTC 2021 INFO: Node cluster1-5dfb96f77d-7m2rs fully initialized for cluster cluster1 and is ready for use
2021-07-30 10:41:18,781 INFO: Lock owner: cluster1-5dfb96f77d-7m2rs; I am cluster1-5dfb96f77d-7m2rs
2021-07-30 10:41:18,810 INFO: no action.  i am the standby leader with the lock
2021-07-30 10:41:28,781 INFO: Lock owner: cluster1-5dfb96f77d-7m2rs; I am cluster1-5dfb96f77d-7m2rs
2021-07-30 10:41:28,832 INFO: no action.  i am the standby leader with the lock

Change some data on the Source and make sure it is properly synchronized to the Target cluster.
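
One way to run that check is to write a marker row on the Source and look for it on the Target. The sketch below only generates the SQL; the table name migration_check, the marker value, and the helper names are hypothetical. Because the Target is fed from archived WAL in the bucket, the Source statement also calls pg_switch_wal() so the segment holding the row is archived without waiting for it to fill up. It assumes the standby accepts read-only queries.

```shell
#!/usr/bin/env bash
# Sketch: SQL for a quick end-to-end sync check.
# Table name, marker values, and helper names are hypothetical.

# SQL for the Source: write a marker row, then switch WAL so the
# segment containing it gets archived right away.
source_sql() {
  marker="$1"
  cat <<EOF
CREATE TABLE IF NOT EXISTS migration_check (marker text, created_at timestamptz DEFAULT now());
INSERT INTO migration_check (marker) VALUES ('${marker}');
SELECT pg_switch_wal();
EOF
}

# SQL for the Target: the standby is read-only, so only look for the row.
target_sql() {
  marker="$1"
  echo "SELECT count(*) FROM migration_check WHERE marker = '${marker}';"
}

# Example:
#   source_sql run-1 | psql -U postgres    # on the Source host
#   target_sql run-1                       # run the output in psql on the Target
```

A count of 1 on the Target after a short delay suggests the archive-and-restore path works end to end; remember to drop the table before the cutover.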

Common Issues

The following error message indicates that you forgot to create the postgresql.conf file in the data directory:

FileNotFoundError: [Errno 2] No such file or directory: '/pgdata/cluster1/postgresql.conf' -> '/pgdata/cluster1/postgresql.base.conf'

It is easy to forget to create the primaryuser, in which case you will see the following in the logs:

psycopg2.OperationalError: FATAL:  password authentication failed for user "primaryuser"

Wrong or missing object storage credentials trigger the following error:

WARN: repo1: [CryptoError] unable to load info file '/on-prem-pg/backup/db/backup.info' or '/on-prem-pg/backup/db/backup.info.copy':
      CryptoError: raised from remote-0 protocol on 'cluster1-backrest-shared-repo': unable to read PEM: [218529960] wrong tag
      HINT: is or was the repo encrypted?
      CryptoError: raised from remote-0 protocol on 'cluster1-backrest-shared-repo': unable to read PEM: [218595386] nested asn1 error
      HINT: is or was the repo encrypted?
      HINT: backup.info cannot be opened and is required to perform a backup.
      HINT: has a stanza-create been performed?
ERROR: [075]: no backup set found to restore
Fri Jul 30 10:54:00 UTC 2021 ERROR: pgBackRest standby Creation: pgBackRest restore failed when creating standby
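
Since the "unable to read PEM" lines point at the key material, a quick look at the key file before building the Secret can save a redeploy. This is a rough sketch, assuming the key is a JSON service-account file of the kind downloaded from the Google Cloud UI; the helper name looks_like_gcs_key is hypothetical, and a passing check does not prove that the key is valid or has access to the bucket.

```shell
#!/usr/bin/env bash
# Sketch: rough sanity check for a GCS service-account key file.
# It only looks for fields such a JSON key normally contains.

# Usage: looks_like_gcs_key <path>; returns 0 when the file looks plausible.
looks_like_gcs_key() {
  key_file="$1"
  [ -s "$key_file" ] || return 1                        # must exist and be non-empty
  grep -q '"private_key"' "$key_file" || return 1       # JSON field holding the PEM key
  grep -q 'BEGIN PRIVATE KEY' "$key_file" || return 1   # PEM header inside that field
  return 0
}

# Example:
# looks_like_gcs_key /tmp/gcs.key && echo "key file looks plausible"
```

If the file passes here but the error persists, compare the value stored in the Secret with the file used on the Source.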

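Cutover

Everything looks good, and it is time to perform the cutover. In this blog post I cover only the database side, but do not forget that your application must be reconfigured to point to the correct PostgreSQL cluster. It might be a good idea to stop the application before the cutover.

1. Stop the source PostgreSQL cluster to make sure no data is being written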

systemctl stop postgresql

2. Promote the Target cluster to primary. To do that, remove spec.backup.repoPath and change spec.standby to false in deploy/cr.yaml, then apply the changes:

kubectl apply -f deploy/cr.yaml

PostgreSQL will restart automatically, and you will see the following in the logs:

2021-07-30 11:16:20,020 INFO: updated leader lock during promote
2021-07-30 11:16:20,025 INFO: Changed archive_mode from on to True (restart might be required)
2021-07-30 11:16:20,025 INFO: Changed max_wal_senders from 10 to 6 (restart might be required)
2021-07-30 11:16:20,027 INFO: Reloading PostgreSQL configuration.
server signaled
2021-07-30 11:16:21,037 INFO: Lock owner: cluster1-5dfb96f77d-n4c79; I am cluster1-5dfb96f77d-n4c79
2021-07-30 11:16:21,132 INFO: no action.  i am the leader with the lock
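
The manifest edit from step 2 of the cutover can be scripted so that it is reviewable before it is applied. This is a minimal sketch under a strong assumption: that "standby: true" and "repoPath: ..." each sit on their own uncommented line in deploy/cr.yaml, as in the fragments shown earlier. The helper name promote_manifest is hypothetical, and a YAML-aware tool would be more robust than sed.

```shell
#!/usr/bin/env bash
# Sketch: produce a promoted copy of the custom resource manifest.
# Assumes "standby: true" and "repoPath: ..." each sit on their own line.

# Usage: promote_manifest <input.yaml>  (writes the edited manifest to stdout)
promote_manifest() {
  sed -e '/^[[:space:]]*repoPath:/d' \
      -e 's/^\([[:space:]]*standby:\)[[:space:]]*true[[:space:]]*$/\1 false/' \
      "$1"
}

# Example:
# promote_manifest deploy/cr.yaml > /tmp/cr-promoted.yaml
# diff deploy/cr.yaml /tmp/cr-promoted.yaml    # review before applying
```

Writing to a separate file keeps the original manifest intact, so the diff shows exactly the two changes that promotion requires.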

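Conclusion

Deploying and managing database clusters is not an easy task. The recently released Percona Distribution for PostgreSQL Operator automates day-1 and day-2 operations and turns running PostgreSQL on Kubernetes into a smooth and pleasant journey.

With Kubernetes becoming the default control plane, the most common task for developers and operations teams is performing the migration, which usually turns into a complex project. This blog post shows that a database migration can be a simple task with minimal downtime.

We encourage you to try out our Operator. See our GitHub repository and check out the documentation.

Found a bug or have a feature idea? Feel free to submit it in JIRA.

For general questions, please raise the topic in the community forum.

Are you a developer who wants to contribute? Please read our CONTRIBUTING.md and send a pull request.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.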


Source: https://www.percona.com/blog/migrating-postgresql-to-kubernetes
