暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

一起学PolarDB - 第18期 - 为什么创建索引慢?

原创 digoal 2022-01-20
262

作者

digoal

日期

2022-01-18

标签

PostgreSQL , PolarDB


懂PostgreSQL, 学PolarDB不难, 就好像有九阳神功护体, 可以快速融会贯通.
对于DBA只要学会PolarDB精髓即可.
对于开发者来说不需要学习, 使用PolarDB和PostgreSQL一样.

为什么创建索引慢?

https://www.bilibili.com/video/BV1aR4y177fK/

社区版本:
以Btree索引为例, 排序, 构建Tree(Meta page, root page, branch page, leaf page).
记录数越大, 越耗时. 超过内存大小时, 还需要使用临时文件存储排序结果.

优化:
PG 11开始支持了并行索引创建. 并行排序, 充分利用CPU多核资源.

算法详见:
《PostgreSQL 11 preview - 并行排序、并行索引 (性能线性暴增) 单实例100亿TOP-K仅40秒》

用到了多核, 所以瓶颈变成了单机的CPU、内存、IO瓶颈.

硬件 数据量 非并行创建索引耗时 64并行度创建索引耗时 性能提升倍数
64线程机器 1亿 26.244 秒 15.55 秒 1.69 倍
64线程机器 10亿 1442.7 秒 160.77 秒 8.97 倍
64线程机器 100亿 7456.9 秒 1869.5 秒 4 倍

并行创建索引用到的是external sorting algorithm 排序方法(external sort + external merge)。

postgres=# set trace_sort =on;    
SET    
postgres=# set client_min_messages =log;    
SET    
set maintenance_work_mem='8GB';    
alter table ttt set ( parallel_workers=4 );    
postgres=# drop index idx_ttt;    
DROP INDEX    
postgres=# create index idx_ttt on ttt(id);    
LOG:  begin index sort: unique = f, workMem = 1677721, randomAccess = f    
LOG:  begin index sort: unique = f, workMem = 1677721, randomAccess = f    
LOG:  begin index sort: unique = f, workMem = 1677721, randomAccess = f    
LOG:  begin index sort: unique = f, workMem = 1677721, randomAccess = f    
LOG:  begin index sort: unique = f, workMem = 1677721, randomAccess = f    
LOG:  performsort of 0 starting: CPU: user: 3.90 s, system: 0.51 s, elapsed: 4.41 s    
LOG:  performsort of 4 starting: CPU: user: 3.74 s, system: 0.66 s, elapsed: 4.40 s    
LOG:  4 switching to external sort with 7 tapes: CPU: user: 3.74 s, system: 0.66 s, elapsed: 4.40 s    
LOG:  performsort of 3 starting: CPU: user: 3.78 s, system: 0.62 s, elapsed: 4.40 s    
LOG:  3 switching to external sort with 7 tapes: CPU: user: 3.78 s, system: 0.62 s, elapsed: 4.40 s    
LOG:  performsort of 2 starting: CPU: user: 3.74 s, system: 0.66 s, elapsed: 4.41 s    
LOG:  2 switching to external sort with 7 tapes: CPU: user: 3.74 s, system: 0.66 s, elapsed: 4.41 s    
LOG:  performsort of 1 starting: CPU: user: 3.76 s, system: 0.64 s, elapsed: 4.41 s    
LOG:  1 switching to external sort with 7 tapes: CPU: user: 3.76 s, system: 0.64 s, elapsed: 4.41 s    
LOG:  1 starting quicksort of run 1: CPU: user: 3.76 s, system: 0.65 s, elapsed: 4.41 s    
LOG:  0 switching to external sort with 7 tapes: CPU: user: 3.90 s, system: 0.51 s, elapsed: 4.41 s    
LOG:  3 starting quicksort of run 1: CPU: user: 3.78 s, system: 0.62 s, elapsed: 4.40 s    
LOG:  2 starting quicksort of run 1: CPU: user: 3.74 s, system: 0.66 s, elapsed: 4.41 s    
LOG:  0 starting quicksort of run 1: CPU: user: 3.90 s, system: 0.51 s, elapsed: 4.41 s    
LOG:  4 starting quicksort of run 1: CPU: user: 3.74 s, system: 0.66 s, elapsed: 4.40 s    
LOG:  1 finished quicksort of run 1: CPU: user: 3.90 s, system: 0.65 s, elapsed: 4.55 s    
LOG:  3 finished quicksort of run 1: CPU: user: 3.92 s, system: 0.62 s, elapsed: 4.55 s    
LOG:  2 finished quicksort of run 1: CPU: user: 3.88 s, system: 0.66 s, elapsed: 4.55 s    
LOG:  4 finished quicksort of run 1: CPU: user: 3.88 s, system: 0.66 s, elapsed: 4.55 s    
LOG:  0 finished quicksort of run 1: CPU: user: 4.05 s, system: 0.51 s, elapsed: 4.56 s    
LOG:  0 finished writing run 1 to tape 0: CPU: user: 5.04 s, system: 1.88 s, elapsed: 6.91 s    
LOG:  4 finished writing run 1 to tape 0: CPU: user: 4.57 s, system: 1.81 s, elapsed: 6.38 s    
LOG:  performsort of 4 done: CPU: user: 4.57 s, system: 1.85 s, elapsed: 6.43 s    
LOG:  parallel external sort of 4 ended, 48940 disk blocks used: CPU: user: 4.57 s, system: 1.85 s, elapsed: 6.43 s    
LOG:  3 finished writing run 1 to tape 0: CPU: user: 4.61 s, system: 1.79 s, elapsed: 6.41 s    
LOG:  performsort of 3 done: CPU: user: 4.61 s, system: 1.83 s, elapsed: 6.45 s    
LOG:  parallel external sort of 3 ended, 48363 disk blocks used: CPU: user: 4.61 s, system: 1.83 s, elapsed: 6.45 s    
LOG:  2 finished writing run 1 to tape 0: CPU: user: 4.58 s, system: 1.78 s, elapsed: 6.37 s    
LOG:  performsort of 2 done: CPU: user: 4.58 s, system: 1.82 s, elapsed: 6.41 s    
LOG:  parallel external sort of 2 ended, 48558 disk blocks used: CPU: user: 4.58 s, system: 1.82 s, elapsed: 6.41 s    
LOG:  1 finished writing run 1 to tape 0: CPU: user: 4.58 s, system: 1.82 s, elapsed: 6.41 s    
LOG:  performsort of 1 done: CPU: user: 4.58 s, system: 1.86 s, elapsed: 6.44 s    
LOG:  parallel external sort of 1 ended, 48234 disk blocks used: CPU: user: 4.58 s, system: 1.86 s, elapsed: 6.44 s    
LOG:  performsort of 0 done: CPU: user: 5.04 s, system: 1.91 s, elapsed: 6.95 s    
LOG:  parallel external sort of 0 ended, 50526 disk blocks used: CPU: user: 5.04 s, system: 1.91 s, elapsed: 6.95 s   
LOG:  begin index sort: unique = f, workMem = 8388608, randomAccess = f    
LOG:  performsort of -1 starting: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s    
LOG:  -1 using 8388553 KB of memory for read buffers among 5 input tapes    
LOG:  performsort of -1 done (except 5-way final merge): CPU: user: 0.11 s, system: 1.08 s, elapsed: 1.20 s    
LOG:  parallel external sort of -1 ended, 244621 disk blocks used: CPU: user: 8.52 s, system: 3.98 s, elapsed: 12.49 s    
CREATE INDEX    
 * Callers must do the following to perform a sort in parallel using multiple    
 * worker processes:    
 *    
 * 1. Request tuplesort-private shared memory for n workers.  Use    
 *    tuplesort_estimate_shared() to get the required size.    
 * 2. Have leader process initialize allocated shared memory using    
 *    tuplesort_initialize_shared().  Launch workers.    
 * 3. Initialize a coordinate argument within both the leader process, and    
 *    for each worker process.  This has a pointer to the shared    
 *    tuplesort-private structure, as well as some caller-initialized fields.    
 *    Leader's coordinate argument reliably indicates number of workers    
 *    launched (this is unused by workers).    
 * 4. Begin a tuplesort using some appropriate tuplesort_begin* routine,    
 *    (passing the coordinate argument) within each worker.  The workMem    
 *    arguments need not be identical.  All other arguments should match    
 *    exactly, though.    
 * 5. tuplesort_attach_shared() should be called by all workers.  Feed tuples    
 *    to each worker, and call tuplesort_performsort() within each when input    
 *    is exhausted.    
 * 6. Call tuplesort_end() in each worker process.  Worker processes can shut    
 *    down once tuplesort_end() returns.    
 * 7. Begin a tuplesort in the leader using the same tuplesort_begin*    
 *    routine, passing a leader-appropriate coordinate argument (this can    
 *    happen as early as during step 3, actually, since we only need to know    
 *    the number of workers successfully launched).  The leader must now wait    
 *    for workers to finish.  Caller must use own mechanism for ensuring that    
 *    next step isn't reached until all workers have called and returned from    
 *    tuplesort_performsort().  (Note that it's okay if workers have already    
 *    also called tuplesort_end() by then.)    
 * 8. Call tuplesort_performsort() in leader.  Consume output using the    
 *    appropriate tuplesort_get* routine.  Leader can skip this step if    
 *    tuplesort turns out to be unnecessary.    
 * 9. Call tuplesort_end() in leader.    

PolarDB:
将sort任务拆分到多个节点(RW和RO节点都可以参与计算), 每个计算节点扫描部分数据并对其进行排序, 充分利用多机的CPU、内存、IO资源.
所以PolarDB节点数越多, 创建索引越快.

经分析建索引过程中:80%是在排序和构建索引页,20%在写索引页。通过使用分布式并行来加速排序过程,同时流水化批量写入。

pic

上述优化能够使得创建索引有 4~5 倍的提升。

pic

本期问题1:
创建索引可能包含哪些阶段?
- a. 排序
- b. 构建tree
- c. 读取数据
- d. 写入磁盘
- e. 删除数据
- f. 写临时空间
- g. 写WAL日志
- h. 并行排序

答案:
- abcdfgh

解释:
- 参考本文内容

本期问题2:
大表创建索引时比较容易遇到的较大瓶颈有哪些?
- a. 排序时消耗CPU
- b. 排序时写临时文件
- c. 写WAL日志
- d. 排序结果写入索引页

答案:
- ab

解释:
- 参考本文内容

本期问题3:
PolarDB 如何解决大表创建索引的瓶颈问题?
- a. 利用多个计算节点的IO能力
- b. 排序时写临时文件
- c. 利用多个计算节点的内存, 减少临时文件的使用
- d. 利用多个计算节点的CPU能力

答案:
- acd

解释:
- 参考本文内容

期望 PostgreSQL 增加什么功能?

PolarDB for PostgreSQL云原生分布式开源数据库

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论