
Migrating MongoDB Collections to CockroachDB With FerretDB

eternity · 2022-09-20

Motivation

After finishing my first article, I realized there are many more possibilities for proxying MongoDB collections through FerretDB backed by CockroachDB. CockroachDB has unique data placement capabilities via multi-region abstractions, inverted and partial indexes, computed columns, and of course strong consistency. Today we're going to restore a MongoDB collection into CockroachDB through FerretDB and build on what we learned previously.

High-Level Steps

  • Start a 9-node multi-region cluster (CockroachDB Dedicated)

  • Start FerretDB (Docker)

  • Restore the MongoDB dataset using mongorestore

  • Considerations

  • Conclusion

Step-by-Step Instructions

Start a 9-Node Multi-Region Cluster (CockroachDB Dedicated)

For this tutorial I'm going to use a CockroachDB Dedicated cluster, as I need access to a multi-region cluster. You can use a local multi-region cluster via cockroach demo --no-example-database --global --insecure --nodes 9, but since I'll be ingesting a large amount of data, that may not be feasible. This is also an opportunity to demonstrate how FerretDB works with CockroachDB certificate authentication. I call my cluster artem-mr, and I access it with our brand-new ccloud CLI.

ccloud cluster sql artem-mr
Retrieving cluster info: succeeded
 Downloading cluster cert to /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt: succeeded
Retrieving SQL user list: succeeded
SQL username: artem
SQL user password: *****
Starting CockroachDB SQL shell...
#
# Welcome to the CockroachDB SQL shell.
# All statements must be terminated by a semicolon.
# To exit, type: \q.
#
artem@artem-mr-7xw.aws-us-east-1.cockroachlabs.cloud:26257/defaultdb>

Let's quickly look at the regions participating in the cluster:

SHOW REGIONS FROM CLUSTER;
     region     |                     zones
----------------+-------------------------------------------------
  aws-us-east-1 | {aws-us-east-1a,aws-us-east-1b,aws-us-east-1c}
  aws-us-east-2 | {aws-us-east-2a,aws-us-east-2b,aws-us-east-2c}
  aws-us-west-2 | {aws-us-west-2a,aws-us-west-2b,aws-us-west-2c}

Let's verify which region we're accessing the cluster from:

select gateway_region();
 gateway_region
------------------
  aws-us-east-1

Let's create a database to host the MongoDB data.

CREATE DATABASE ferretdb;
USE ferretdb;

Start FerretDB (Docker)

I'm going to start a Docker Compose instance of FerretDB locally. For the connection string, we need to navigate to the Cockroach Cloud console and toggle the connection method. Under the connection string, we can find all the information necessary to fill out the compose file.


version: "3"
services:

  ferretdb:
    image: ghcr.io/ferretdb/ferretdb:latest
    hostname: 'ferretdb'
    container_name: 'ferretdb'
    restart: 'on-failure'
    command:
      [
        '-listen-addr=:27017',
        ## Dedicated multiregion cluster
        '-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-east-1.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
      ]
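    # expose the MongoDB wire protocol port to the host so local clients (mongosh, mongorestore) can connect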
    ports:
      - 27017:27017
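    # mount the CA certificate downloaded by the ccloud CLI; the path matches sslrootcert above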
    volumes:
       - /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt

Remember that we need to specify the proper database in the connection string. Since I'm using ferretdb, I include it in my compose file.

We also need to mount the CA certificate required to connect to CockroachDB. It was downloaded automatically the first time I connected via the CLI.

Save the file as docker-compose.yml and issue the docker compose up -d command to start the container.

docker compose up -d
[+] Running 1/1
 ⠿ Container ferretdb  Started  

It may be a good idea to inspect the FerretDB container logs to make sure everything works.

docker logs ferretdb
2022-09-08T13:51:35.456Z        INFO    pgdb    v4@v4.16.1/conn.go:354  Exec    {"sql": ";", "args": [], "time": "17.351386ms", "commandTag": "", "pid": 1849370}
2022-09-08T13:51:35.456Z        DEBUG   // 172.24.0.1:63788 -> 172.24.0.2:27017         clientconn/conn.go:437Response header: length:   190, id:   49, response_to: 5362, opcode: OP_MSG
2022-09-08T13:51:35.457Z        DEBUG   // 172.24.0.1:63788 -> 172.24.0.2:27017         clientconn/conn.go:438Response message:
{
  "Checksum": 0,
  "FlagBits": 0,
  "Sections": [
    {
      "Document": {
        "$k": [
          "ismaster",
          "maxBsonObjectSize",
          "maxMessageSizeBytes",
          "maxWriteBatchSize",
          "localTime",
          "minWireVersion",
          "maxWireVersion",
          "readOnly",
          "ok"
        ],
        "ismaster": true,
        "maxBsonObjectSize": 16777216,
        "maxMessageSizeBytes": 48000000,
        "maxWriteBatchSize": 100000,
        "localTime": {
          "$d": 1662645095456
        },
        "minWireVersion": 13,
        "maxWireVersion": 13,
        "readOnly": false,
        "ok": {
          "$f": 1
        }
      },
      "Kind": 0
    }
  ]
}

Since we don't see any obvious errors, let's move on.

Install mongosh locally, or add another container for mongosh to the Docker Compose file. See my previous article for an example.

I have mongosh installed locally. We can access the container by issuing mongosh in the terminal:

mongosh

Or the following:

mongosh mongodb://localhost/
Current Mongosh Log ID: 631a77e81460be94daef222e
Connecting to:          mongodb://localhost/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.5.4
Using MongoDB:          6.0.42
Using Mongosh:          1.5.4

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

------
   The server generated these startup warnings when booting
   2022-09-08T23:16:58.426Z: Powered by FerretDB v0.5.3 and PostgreSQL 13.0.0.
   2022-09-08T23:16:58.426Z: Please star us on GitHub: https://github.com/FerretDB/FerretDB
------

test> 

Follow the instructions to install the MongoDB tools on your host machine. You can also add another container with the tools installed, but I'll leave that as homework for you.

brew tap mongodb/brew
brew install mongodb-database-tools

Let's load data into the database. We'll use the Mflix dataset, but any dataset will do.

In a new terminal window:

wget https://atlas-education.s3.amazonaws.com/sampledata.archive

Restore the MongoDB Dataset Using mongorestore

Restore the archive into FerretDB using the mongodb-database-tools package we installed earlier.

mongorestore --archive=sampledata.archive
2022-09-08T09:57:55.587-0400    preparing collections to restore from
2022-09-08T09:57:55.595-0400    reading metadata for sample_mflix.comments from archive 'sampledata.archive'
2022-09-08T09:57:55.596-0400    reading metadata for sample_mflix.users from archive 'sampledata.archive'
2022-09-08T09:57:55.596-0400    reading metadata for sample_training.grades from archive 'sampledata.archive'
2022-09-08T09:57:55.596-0400    reading metadata for sample_airbnb.listingsAndReviews from archive 'sampledata.archive'
2022-09-08T09:57:55.596-0400    reading metadata for sample_analytics.customers from archive 'sampledata.archive'
2022-09-08T09:57:55.596-0400    reading metadata for sample_restaurants.restaurants from archive 'sampledata.archive'
2022-09-08T09:57:55.596-0400    reading metadata for sample_restaurants.neighborhoods from archive 'sampledata.archive'
...
2022-09-08T09:59:40.537-0400    sample_mflix.comments  278KB
2022-09-08T09:59:43.536-0400    sample_mflix.comments  278KB
2022-09-08T09:59:46.536-0400    sample_mflix.comments  278KB
2022-09-08T09:59:49.536-0400    sample_mflix.comments  278KB
2022-09-08T09:59:52.537-0400    sample_mflix.comments  278KB

My FerretDB instance lives somewhere in northern New Jersey, and our CockroachDB gateway region is aws-us-east-1; recall that we specified it when we set up Docker Compose. So there's bound to be some network latency.

Here's a quick test we can run with the Cockroach CLI:

SELECT 1;
?column?
------------
         1
(1 row)

Time: 20ms total (execution 1ms / network 19ms)

We're dealing with a 19ms round trip, plus the fact that MongoDB is proxied through the FerretDB Docker container. Let's do a quick sanity check. In the terminal running mongosh, issue the following commands:

use sample_mflix
show collections
test> use sample_mflix
switched to db sample_mflix
sample_mflix> show collections
comments

Let's do a quick count of the rows:

db.comments.count()
4257

Looking at the dataset's description, we can verify that the rows exist:

db.comments.findOne({ "name": "Andrea Le" })
{
  _id: ObjectId("5a9427648b0beebeb6957ef5"),
  name: 'Andrea Le',
  email: 'andrea_le@fakegmail.com',
  movie_id: ObjectId("573a1392f29313caabcda653"),
  text: 'Odio expedita impedit sed provident at. Mollitia distinctio laborum optio earum voluptates recusandae ad. Voluptates quas et placeat atque.',
  date: ISODate("1973-08-07T04:00:34.000Z")
}
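Under the hood, FerretDB stores each document in a single _jsonb column on the CockroachDB side, so a roughly equivalent lookup is possible with plain JSONB operators. A minimal sketch, assuming the hashed backing table name we'll see in the show tables output further down:

SELECT _jsonb->>'name' AS name, _jsonb->>'email' AS email
FROM sample_mflix.comments_5886d2d7
WHERE _jsonb->>'name' = 'Andrea Le'
LIMIT 1;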

The sample dataset is 304MB. I'm not sure of the total record count, but I know the restore takes a while. Your mileage may vary, so carry on.

2022-09-08T10:34:07.610-0400    sample_mflix.comments  2.17MB
2022-09-08T10:34:09.218-0400    sample_mflix.comments  2.17MB
2022-09-08T10:34:09.218-0400    finished restoring sample_mflix.comments (7000 documents, 0 failures)
2022-09-08T10:34:09.218-0400    Failed: sample_mflix.comments: error restoring from archive 'sampledata.archive': (InternalError) [pool.go:361 pgdb.(*Pool).InTransaction] read tcp 172.24.0.2:55922->54.208.245.52:26257: read: connection reset by peer
2022-09-08T10:34:09.219-0400    7000 document(s) restored successfully. 0 document(s) failed to restore.

OK, let me pause here and say a few words, ignoring for a moment the restore that was interrupted by a network error: I'm floored that, first and foremost, FerretDB seamlessly and effortlessly proxies MongoDB collections and restores them straight into CockroachDB! Dare I say, this is the easiest migration I've worked on to date!

I think I have enough records in the comments collection. Let's restore the remaining collections from the dataset individually.

mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.movies
mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.sessions
mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.theaters
mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.users

Luckily, the remaining collections are small enough to load quickly. Let me show you what a successful restore looks like.

2022-09-08T14:53:57.086-0400    sample_mflix.users  28.9KB
2022-09-08T14:54:00.086-0400    sample_mflix.users  28.9KB
2022-09-08T14:54:01.347-0400    sample_mflix.users  28.9KB
2022-09-08T14:54:01.347-0400    finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-08T14:54:01.373-0400    restoring indexes for collection sample_mflix.users from metadata
2022-09-08T14:54:01.373-0400    index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-08T14:54:01.376-0400    185 document(s) restored successfully. 0 document(s) failed to restore.

Let's compare the document counts in mongosh with the counts in CockroachDB.

sample_mflix> db.comments.count()
7340
sample_mflix> db.movies.count()
3599
sample_mflix> db.sessions.count()
1
sample_mflix> db.theaters.count()
1564
sample_mflix> db.users.count()
185
show tables;
 schema_name  |     table_name     | type  | owner | estimated_row_count | locality
---------------+--------------------+-------+-------+---------------------+-----------
  sample_mflix | _ferretdb_settings | table | artem |                   1 | NULL
  sample_mflix | comments_5886d2d7  | table | artem |                6027 | NULL
  sample_mflix | movies_257fbbf4    | table | artem |                3599 | NULL
  sample_mflix | sessions_130573cc  | table | artem |                   1 | NULL
  sample_mflix | theaters_cf846063  | table | artem |                1493 | NULL
  sample_mflix | users_5e7cc513     | table | artem |                 185 | NULL

I don't trust the estimated_row_count field, so let's quickly run through the table counts.

select count(*) from sample_mflix.comments_5886d2d7;
  count
---------
   7340

select count(*) from sample_mflix.movies_257fbbf4;
  count
---------
   3599

select count(*) from sample_mflix.sessions_130573cc;
  count
---------
      1

select count(*) from sample_mflix.theaters_cf846063;
  count
---------
   1564

select count(*) from sample_mflix.users_5e7cc513;
  count
---------
    185
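If you prefer to compare everything in one shot, here's a small sketch that unions the counts using the hashed table names from above:

SELECT 'comments' AS collection, count(*) AS doc_count FROM sample_mflix.comments_5886d2d7
UNION ALL SELECT 'movies', count(*) FROM sample_mflix.movies_257fbbf4
UNION ALL SELECT 'sessions', count(*) FROM sample_mflix.sessions_130573cc
UNION ALL SELECT 'theaters', count(*) FROM sample_mflix.theaters_cf846063
UNION ALL SELECT 'users', count(*) FROM sample_mflix.users_5e7cc513;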

Everything seems to match. Let's inspect the schema.

SHOW CREATE TABLE sample_mflix.comments_5886d2d7;
            table_name           |                       create_statement
---------------------------------+----------------------------------------------------------------
  sample_mflix.comments_5886d2d7 | CREATE TABLE sample_mflix.comments_5886d2d7 (
                                 |     _jsonb JSONB NULL,
                                 |     rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
                                 |     CONSTRAINT comments_5886d2d7_pkey PRIMARY KEY (rowid ASC)
                                 | )

Let's extract the CREATE TABLE statement and make a few modifications.

SELECT create_statement FROM [SHOW CREATE TABLE sample_mflix.comments_5886d2d7];
CREATE TABLE sample_mflix.comments_5886d2d7 (
      _jsonb JSONB NULL,
      rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
      CONSTRAINT comments_5886d2d7_pkey PRIMARY KEY (rowid ASC)
  )

We see a rowid column, which serves as the primary key in CockroachDB when one isn't explicitly specified, and a _jsonb column containing the document. We're going to replace rowid with the id stored inside the JSON document, using a computed column to extract it. A Mongo ObjectID looks like a hash, so we'll pass it through an md5() function.

select md5(_jsonb->>'_id') from sample_mflix.comments_5886d2d7 limit 5;
               md5
------------------------------------
  e15a7dcdd0e149effb3305129a924195

The following steps showcase CockroachDB's flexible schema model, which complements MongoDB's schemaless model.

Given that this is a multi-region cluster, the schema changes below, which execute in the background, can take a long time. To speed them up, the following query pins the system database to the local region, making schema changes faster.

ALTER DATABASE system CONFIGURE ZONE USING constraints = '{"+region=aws-us-east-1": 1}', lease_preferences = '[[+region=aws-us-east-1]]';
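As a quick sanity check, we can confirm the pin took effect:

SHOW ZONE CONFIGURATION FROM DATABASE system;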

Add the new column to the existing table:

ALTER TABLE sample_mflix.comments_5886d2d7 ADD COLUMN id STRING NOT NULL AS (md5(_jsonb->>'_id')::STRING) VIRTUAL;

Now we could change the primary key from the internal row ID to the new computed column, but before we do, let's truncate the table. It's not strictly necessary, but let's do it anyway, since there are other considerations when loading data into CockroachDB. I'll get to them shortly.

TRUNCATE TABLE sample_mflix.comments_5886d2d7;

Let's change the primary key from the internal rowid column to the newly added column.

ALTER TABLE sample_mflix.comments_5886d2d7 ALTER PRIMARY KEY USING COLUMNS (id);

Finally, let's drop the rowid column, since we no longer need it.

set sql_safe_updates = false;
ALTER TABLE sample_mflix.comments_5886d2d7 DROP COLUMN rowid;
set sql_safe_updates = true;

Let's look at the schema again:

SELECT create_statement FROM [SHOW CREATE TABLE sample_mflix.comments_5886d2d7];
                          create_statement
---------------------------------------------------------------------
  CREATE TABLE sample_mflix.comments_5886d2d7 (
      _jsonb JSONB NULL,
      id STRING NOT NULL AS (md5(_jsonb->>'_id':::STRING)) VIRTUAL,
      CONSTRAINT comments_5886d2d7_pkey PRIMARY KEY (id ASC)
  )

We've brought back the behavior we had in MongoDB, keyed on the ObjectID. All that's left now is to restore the comments collection.

Given that we now enforce the primary key based on the MongoDB ObjectID, we can lean on CockroachDB's consistency and prevent errors like the following:

2022-09-01T14:28:11.291-0400    demux finishing when there are still outs (1)
2022-09-01T14:28:11.291-0400    Failed: sample_mflix.comments: error restoring from archive 'sampledata.archive': (InternalError) [pool.go:356 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] ERROR: duplicate key value violates unique constraint "comments_5886d2d7_pkey" (SQLSTATE 23505)
2022-09-01T14:28:11.291-0400    0 document(s) restored successfully. 0 document(s) failed to restore.
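To see the new constraint at work, here's a sketch with a hypothetical document; the second, identical INSERT is the one that fails:

INSERT INTO sample_mflix.comments_5886d2d7 (_jsonb)
  VALUES ('{"_id": "ffffffffffffffffffffffff"}');
-- running the same INSERT again is rejected:
-- ERROR: duplicate key value violates unique constraint "comments_5886d2d7_pkey" (SQLSTATE 23505)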

Having a hashed primary key also helps with load distribution. When you start with an empty table, you have a single range, and every read and write request hits that range, creating a hotspot. To achieve even distribution and better performance, we need to balance the load across multiple ranges. Right now the table is empty, and it's backed by a single range.

SELECT start_key, end_key, range_id, range_size_mb FROM [SHOW RANGES FROM TABLE sample_mflix.comments_5886d2d7];
 start_key | end_key | range_id | range_size_mb
------------+---------+----------+----------------
  NULL      | NULL    |       85 |             0

Let's pre-split the table before we reload the data. Since our primary key is now an md5 hash of the ObjectID, we can pre-split into 16 ranges.

ALTER TABLE sample_mflix.comments_5886d2d7 SPLIT AT 
SELECT
  (
    first_letter
    || md5(random()::STRING)
  )::STRING
FROM
  (
    SELECT
      CASE
      WHEN i < 10 THEN i::STRING
      WHEN i = 10 THEN 'a'
      WHEN i = 11 THEN 'b'
      WHEN i = 12 THEN 'c'
      WHEN i = 13 THEN 'd'
      WHEN i = 14 THEN 'e'
      ELSE 'f'
      END
        AS first_letter
    FROM
      generate_series(0, 15) AS g (i)
  );

The next command will scatter the ranges across the cluster:

ALTER TABLE sample_mflix.comments_5886d2d7 SCATTER;
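With the SHOW RANGES output of this era (pre-v23 columns), we can also check which nodes ended up holding the leases for our empty ranges. A sketch:

SELECT range_id, lease_holder, lease_holder_locality
FROM [SHOW RANGES FROM TABLE sample_mflix.comments_5886d2d7];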

Let's look at the new ranges:

SELECT start_key, end_key, range_id, range_size_mb FROM [SHOW RANGES FROM TABLE sample_mflix.comments_5886d2d7];
               start_key               |               end_key                | range_id | range_size_mb
---------------------------------------+--------------------------------------+----------+----------------
  NULL                                 | /"0d545f8e17e030f0c5181e1afc5de6224" |       85 |             0
  /"0d545f8e17e030f0c5181e1afc5de6224" | /"1cab56eb5ce49a61e99ccf31c5ae19c67" |       86 |             0
  /"1cab56eb5ce49a61e99ccf31c5ae19c67" | /"29f6b8beefa6573ec41470968e54bca42" |      106 |             0
  /"29f6b8beefa6573ec41470968e54bca42" | /"33a54c6c02a9989a978aa8e5a5f7e9d73" |      107 |             0
  /"33a54c6c02a9989a978aa8e5a5f7e9d73" | /"4618f2a8a2cc498d98943c6a19b9c6c7b" |      108 |             0
  /"4618f2a8a2cc498d98943c6a19b9c6c7b" | /"5946ca647ea8270a37574c1a898056961" |      109 |             0
  /"5946ca647ea8270a37574c1a898056961" | /"6f2112e4bdad3a2237cb00d50acd0e05e" |      125 |             0
  /"6f2112e4bdad3a2237cb00d50acd0e05e" | /"793517eb63ea208ad2e5baea51c23ad05" |      126 |             0
  /"793517eb63ea208ad2e5baea51c23ad05" | /"8d81118af05e64980c10e5a1625ea250a" |      127 |             0
  /"8d81118af05e64980c10e5a1625ea250a" | /"9c293b5d497d94c029f34e288961415b9" |      128 |             0
  /"9c293b5d497d94c029f34e288961415b9" | /"a497c26e97b1136fb0b7ae1174cf575e6" |      129 |             0
  /"a497c26e97b1136fb0b7ae1174cf575e6" | /"b4fe34ea032cbded1b4004a9548ff1883" |      130 |             0
  /"b4fe34ea032cbded1b4004a9548ff1883" | /"c420a9bf42e53d5c980fc18931bf1dc79" |      131 |             0
  /"c420a9bf42e53d5c980fc18931bf1dc79" | /"d32739d1ed370496ccee3f23f87c36e01" |      132 |             0
  /"d32739d1ed370496ccee3f23f87c36e01" | /"e587ad032ca91e63f1a72999acb6baec5" |      133 |             0
  /"e587ad032ca91e63f1a72999acb6baec5" | /"fbb21b41dbeaf54b21a19f4b0b204195c" |      134 |             0
  /"fbb21b41dbeaf54b21a19f4b0b204195c" | NULL                                 |      135 |             0

We can now reload the data into the comments table and see whether anything differs.

mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.comments

We can rerun the SHOW RANGES command to watch the splits in action:

               start_key               |               end_key                | range_id |     range_size_mb
---------------------------------------+--------------------------------------+----------+-------------------------
  NULL                                 | /"05544a6f1a9e2c370d61e80b08bfc9914" |      450 | 0.39971900000000000000
  /"05544a6f1a9e2c370d61e80b08bfc9914" | /"145af2f7d21c2737757f48b85beaac411" |      382 |  1.0964470000000000000
  /"145af2f7d21c2737757f48b85beaac411" | /"22f8086dd66f929f2a9f02213d8eb2880" |      383 |  1.0559140000000000000
  /"22f8086dd66f929f2a9f02213d8eb2880" | /"340e8a0bbb4340a680a45b6f4d041f338" |      395 |  1.2896460000000000000
  /"340e8a0bbb4340a680a45b6f4d041f338" | /"4cd3f28b0d95bcd3be3be03a47cc3e7bf" |      396 |  1.8605330000000000000
  /"4cd3f28b0d95bcd3be3be03a47cc3e7bf" | /"500650bb144971c1138805e235a7275b3" |      397 | 0.26009000000000000000
  /"500650bb144971c1138805e235a7275b3" | /"6777895c82eab8059a7bbbc3f0953fde3" |      398 |  1.7841590000000000000
  /"6777895c82eab8059a7bbbc3f0953fde3" | /"73a828d7661f43b8696cf15103a061b06" |      401 | 0.85725000000000000000
  /"73a828d7661f43b8696cf15103a061b06" | /"894c50ca8d93d101ae665c14427bf35ce" |      402 |  1.5699020000000000000
  /"894c50ca8d93d101ae665c14427bf35ce" | /"9cdda5fe73ed53904d41ab67a47737a25" |      403 |  1.4688710000000000000
  /"9cdda5fe73ed53904d41ab67a47737a25" | /"a94b701dd474c23494ca834eaf9ed8fb6" |      404 | 0.94120900000000000000
  /"a94b701dd474c23494ca834eaf9ed8fb6" | /"bfb04e1b2ff37d4188c7722575504ca27" |      575 |  1.6290150000000000000
  /"bfb04e1b2ff37d4188c7722575504ca27" | /"cfc85f2b55491a6a480552b135c26d34f" |      576 |  1.1618950000000000000
  /"cfc85f2b55491a6a480552b135c26d34f" | /"db4f0c8dfad0c9c478e15b78ddc91834a" |      577 | 0.84471700000000000000
  /"db4f0c8dfad0c9c478e15b78ddc91834a" | /"e8a406deefeff94205f867661bfbcd269" |      578 |  1.0015310000000000000
  /"e8a406deefeff94205f867661bfbcd269" | /"f7a6e34db298c3e4204ddf43842e06cf1" |      579 |  1.1321730000000000000
  /"f7a6e34db298c3e4204ddf43842e06cf1" | NULL                                 |      580 | 0.60192500000000000000

The data is balanced across all of the ranges, so overall performance is better. We still rely on a single Docker instance of FerretDB, so in your own environment you may fare better with the instances hosting FerretDB. Better yet, since CockroachDB scales reads and writes, you can run multiple FerretDB instances in parallel and load data through all of them.


25 QPS is nothing to write home about, and latency is quite high. I found that the flag --numInsertionWorkersPerCollection=50 helps improve mongorestore performance. Play with the number until you hit a good balance.

mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.comments --numInsertionWorkersPerCollection=100

If you happen to receive the following error while restoring the comments collection:

2022-09-09T15:08:16.551-0400    sample_mflix.comments  6.79MB
2022-09-09T15:08:16.551-0400    finished restoring sample_mflix.comments (17017 documents, 0 failures)
2022-09-09T15:08:16.551-0400    demux finishing when there are still outs (22)
2022-09-09T15:08:16.551-0400    Failed: sample_mflix.comments: error restoring from archive 'sampledata.archive': (InternalError) [pool.go:288 pgdb.(*Pool).InTransaction] ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=cc2e0dc4 key=/Table/121/5/"2ee7d65280b51e55fb4701fdb5787ab1"/0 pri=0.02374602 epo=0 ts=1662749935.838933077,1 min=1662749935.635443487,0 seq=2} lock=true stat=PENDING rts=1662749935.838933077,1 wto=false gul=1662749935.885443487,0 (SQLSTATE 40001)
2022-09-09T15:08:16.551-0400    17017 document(s) restored successfully. 0 document(s) failed to restore.

It means the range lease moved to another region and CockroachDB retried the transaction. Unfortunately, that forces mongorestore to quit. We can pin the leases to the local region and prevent this behavior.

Let's see what the current preferences are:

SHOW ZONE CONFIGURATION FROM TABLE sample_mflix.comments_5886d2d7;
       target       |                raw_config_sql
--------------------+-----------------------------------------------
  DATABASE ferretdb | ALTER DATABASE ferretdb CONFIGURE ZONE USING
                    |     range_min_bytes = 134217728,
                    |     range_max_bytes = 536870912,
                    |     gc.ttlseconds = 600,
                    |     num_replicas = 3,
                    |     constraints = '[]',
                    |     lease_preferences = '[]'

Let's use the CockroachDB multi-region abstractions to achieve the desired result:

ALTER DATABASE ferretdb PRIMARY REGION "aws-us-east-1";
ALTER DATABASE ferretdb ADD REGION "aws-us-east-2";
ALTER DATABASE ferretdb ADD REGION "aws-us-west-2";
SHOW ZONE CONFIGURATION FROM TABLE sample_mflix.comments_5886d2d7;
      target       |                                           raw_config_sql
--------------------+------------------------------------------------------------------------------------------------------
  DATABASE ferretdb | ALTER DATABASE ferretdb CONFIGURE ZONE USING
                    |     range_min_bytes = 134217728,
                    |     range_max_bytes = 536870912,
                    |     gc.ttlseconds = 600,
                    |     num_replicas = 5,
                    |     num_voters = 3,
                    |     constraints = '{+region=aws-us-east-1: 1, +region=aws-us-east-2: 1, +region=aws-us-west-2: 1}',
                    |     voter_constraints = '[+region=aws-us-east-1]',
                    |     lease_preferences = '[[+region=aws-us-east-1]]'

In summary, we are now guaranteed a replica in the us-east-1 region, with the leaseholder in us-east-1 responsible for coordinating reads and writes.
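We can verify the guarantee by aggregating leaseholder localities for the table; every range should now report aws-us-east-1. Again a sketch against the pre-v23 SHOW RANGES columns:

SELECT lease_holder_locality, count(*) AS range_count
FROM [SHOW RANGES FROM TABLE sample_mflix.comments_5886d2d7]
GROUP BY 1;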

Considerations

While writing this article, I had a difficult time restoring the comments collection in its entirety. The comments restore would most often fail due to various issues: errors, moving leases, duplicate primary key violations, and broken connections. I fought with it for the past two days, but all is not lost; the data that was already loaded is accessible and queryable. The hardest lesson I took from this exercise is to read the docs. The change with the biggest impact on performance was --numInsertionWorkersPerCollection=100; we simply weren't pushing enough parallel work. I think pre-splitting the table helped a bit, but at this point I consider it a red herring; perhaps I'll attempt another load without splitting. The leaseholder pinning was necessary, as lease movement occurred during a restore with the --numInsertionWorkersPerCollection=100 flag enabled. On my final attempt, with all of the tweaks turned on, I reached 40044 rows. That's as far as I could take it. There's plenty of work left before this is production-grade.

2022-09-09T15:49:34.466-0400    sample_mflix.comments  11.1MB
2022-09-09T15:49:36.940-0400    sample_mflix.comments  11.1MB
2022-09-09T15:49:36.940-0400    finished restoring sample_mflix.comments (33079 documents, 0 failures)
2022-09-09T15:49:36.940-0400    Failed: sample_mflix.comments: error restoring from archive 'sampledata.archive': (InternalError) [pool.go:288 pgdb.(*Pool).InTransaction] read tcp 172.29.0.2:52130->3.217.93.138:26257: read: connection reset by peer
2022-09-09T15:49:36.940-0400    33079 document(s) restored successfully. 0 document(s) failed to restore.
2022-09-09T15:49:36.940-0400    demux finishing when there are still outs (21)

Here is the output of the last restore; the counts in CockroachDB and mongosh correspond.

sample_mflix> db.comments.count()
40044
  count
---------
  40044

Conclusion

In closing, I'd like to point out that this is an oversimplified example, and by no means do I think all migrations are this easy. That said, this was just an experiment; there is currently no engineering effort underway to make CockroachDB and FerretDB work better together, unless the relevant issues get enough votes. There's a future opportunity to add FerretDB as a MongoDB replica set member, and who knows, maybe then we could move data to CockroachDB via a so-called live migration? Let me also reiterate: MongoDB and CockroachDB are not comparable technologies. Each product has its use case; pick the best tool for the job! I'm merely giving you another option. All things considered, we still have a way to go, but I like where this is headed!

Original title: Migrating MongoDB Collections to CockroachDB With FerretDB
Original author: Artem Ervits
Original link: https://dzone.com/articles/migrating-mongodb-collections-to-cockroachdb-with
