openGauss Daily Practice, Day 21 | "Learning the openGauss Storage Model: Row Store and Column Store": Study Notes and Homework

Original post by 闫伟, 2021-12-21

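Learning the openGauss storage model: row store and column store

Row storage stores a table in disk partitions row by row; column storage stores it column by column. Tables are created as row-store tables by default.

Each model has its strengths and weaknesses. Databases serving TP (transaction processing) workloads normally keep the default row store; column store is worth using only for AP (analytical processing) workloads that run complex queries over large volumes of data.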
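To make the default concrete, the sketch below declares the orientation explicitly on two throwaway tables and reads it back from the system catalog. The table names demo_row_t and demo_col_t are hypothetical and not part of the course. The sketch assumes that pg_class.reloptions records the orientation, which is consistent with the "Options:" line that \d+ prints later in this post.

```sql
-- Hypothetical demo tables; the names are assumptions, not from the course.
CREATE TABLE demo_row_t (id int, note varchar(40)) WITH (ORIENTATION = ROW);
CREATE TABLE demo_col_t (id int, note varchar(40)) WITH (ORIENTATION = COLUMN);

-- Read the storage options back from the system catalog.
SELECT relname, reloptions
FROM pg_class
WHERE relname IN ('demo_row_t', 'demo_col_t');

-- Remove the demo tables again.
DROP TABLE demo_row_t;
DROP TABLE demo_col_t;
```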

Course walkthrough

Connect to the database

# Wait about 15 seconds on first entry
# The database is starting...
su - omm
gsql -r

1. Create a row-store table

CREATE TABLE test_t1
(
col1 CHAR(2),
col2 VARCHAR2(40),
col3 NUMBER
);

-- The compression attribute is "no"

\d+ test_t1
insert into test_t1
select col1, col2, col3
from (select generate_series(1, 100000) as key,
             repeat(chr(int4(random() * 26) + 65), 2) as col1,
             repeat(chr(int4(random() * 26) + 65), 30) as col2,
             (random() * (10^4))::integer as col3);


omm=# \d+ test_t1
                               Table "public.test_t1"
 Column |         Type          | Modifiers | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+--------------+-------------
 col1   | character(2)          |           | extended |              | 
 col2   | character varying(40) |           | extended |              | 
 col3   | numeric               |           | main     |              | 
Has OIDs: no
Options: orientation=row, compression=no

2. Create a column-store table

CREATE TABLE test_t2
(
col1 CHAR(2),
col2 VARCHAR2(40),
col3 NUMBER
)
WITH (ORIENTATION = COLUMN);

-- The compression attribute is "low"

\d+ test_t2;

-- Insert the same data as in the row-store table

insert into test_t2 select * from test_t1;

3. Compare space usage

\d+
omm=# \d+
                                        List of relations
 Schema |  Name   | Type  | Owner |  Size   |               Storage                | Description
--------+---------+-------+-------+---------+--------------------------------------+-------------
 public | test_t1 | table | omm   | 6760 kB | {orientation=row,compression=no}     |
 public | test_t2 | table | omm   | 1112 kB | {orientation=column,compression=low} |
(2 rows)
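
The sizes listed above can also be queried directly, which makes the ratio easy to compute (6760 kB / 1112 kB is roughly 6.1 in this run). A minimal sketch, assuming the PostgreSQL-style functions pg_table_size and pg_size_pretty are available in your openGauss build and that pg_table_size accounts for column-store data; the column aliases are illustrative only.

```sql
-- Human-readable size of each table.
SELECT 'test_t1' AS table_name, pg_size_pretty(pg_table_size('test_t1')) AS size
UNION ALL
SELECT 'test_t2', pg_size_pretty(pg_table_size('test_t2'));

-- Row-store size divided by column-store size.
SELECT round(pg_table_size('test_t1')::numeric / pg_table_size('test_t2'), 2) AS ratio;
```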

4. Compare the speed of reading one column

analyze VERBOSE test_t1;
analyze VERBOSE test_t2;

-- The column-store table takes less time than the row-store table

explain analyze select distinct col1 from test_t1;
explain analyze select distinct col1 from test_t2;
omm=# explain analyze select distinct col1 from test_t1;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2091.00..2091.27 rows=27 width=3) (actual time=51.640..51.646 rows=27 loops=1)
   Group By Key: col1
   ->  Seq Scan on test_t1  (cost=0.00..1841.00 rows=100000 width=3) (actual time=0.012..24.794 rows=100000 loops=1)
 Total runtime: 51.700 ms
(4 rows)

omm=# explain analyze select distinct col1 from test_t2;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Row Adapter  (cost=1008.27..1008.27 rows=27 width=3) (actual time=4.226..4.228 rows=27 loops=1)
   ->  Vector Sonic Hash Aggregate  (cost=1008.00..1008.27 rows=27 width=3) (actual time=4.224..4.224 rows=27 loops=1)
         Group By Key: col1
         ->  CStore Scan on test_t2  (cost=0.00..758.00 rows=100000 width=3) (actual time=0.030..0.302 rows=100000 loops=1)
 Total runtime: 4.327 ms
(5 rows)

5. Compare the speed of inserting one row

-- The row-store table takes less time than the column-store table

explain analyze insert into test_t1 values('x', 'xxxx', '123');
explain analyze insert into test_t2 values('x', 'xxxx', '123');
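
Note that explain analyze actually executes the statement it measures, so each of the two commands above leaves one extra row in its table. If that matters, a common PostgreSQL-style pattern is to wrap the measurement in a transaction and roll it back. The sketch below assumes this behaves the same way in openGauss for both row-store and column-store tables.

```sql
-- Measure the single-row inserts without keeping the inserted rows.
START TRANSACTION;
explain analyze insert into test_t1 values('x', 'xxxx', '123');
explain analyze insert into test_t2 values('x', 'xxxx', '123');
-- Discard the rows that the two measured inserts added.
ROLLBACK;
```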

6. Clean up

drop table test_t1;
drop table test_t2;

Homework

1. Create a row-store table and a column-store table, and bulk-insert 100,000 rows into each (the same data in both)

create table test_t1 (id int,name text);
CREATE TABLE test_t2
(
id int,
name text
)
WITH (ORIENTATION = COLUMN);

insert into test_t1 select id, name from (select generate_series(1,100000) as key, (random()*(6^2))::integer as id, repeat(chr(int4(random()*26)+65),4) as name);

insert into test_t2 select * from test_t1;
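
Before comparing the two tables it is worth confirming that they really hold the same data. A quick sanity check, sketched with standard SQL aggregates only; the column aliases are illustrative.

```sql
-- Both queries should report identical values if the copy succeeded.
select count(*) as row_count, count(distinct name) as distinct_names, sum(id) as id_sum from test_t1;
select count(*) as row_count, count(distinct name) as distinct_names, sum(id) as id_sum from test_t2;
```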




analyze VERBOSE test_t1;
analyze VERBOSE test_t2;

omm=# analyze VERBOSE test_t1;
INFO:  analyzing "public.test_t1"(gaussdb pid=1)
INFO:  ANALYZE INFO : "test_t1": scanned 541 of 541 pages, containing 100000 live rows and 0 dead rows; 30000 rows in sample, 100000 estimated total rows(gaussdb pid=1)
ANALYZE
omm=# analyze VERBOSE test_t2;
INFO:  analyzing "public.test_t2"(gaussdb pid=1)
INFO:  ANALYZE INFO : estimate total rows of "pg_delta_16433": scanned 0 pages of total 0 pages with 1 retry times, containing 0 live rows and 0 dead rows,  estimated 0 total rows(gaussdb pid=1)
INFO:  ANALYZE INFO : "test_t2": scanned 2 of 2 cus, sample 30000 rows, estimated total 100000 rows(gaussdb pid=1)
ANALYZE

2. Compare the sizes of the row-store and column-store tables

omm=# \d+
                                        List of relations
 Schema |  Name   | Type  | Owner |  Size   |               Storage                | Description
--------+---------+-------+-------+---------+--------------------------------------+-------------
 public | test_t1 | table | omm   | 4360 kB | {orientation=row,compression=no}     |
 public | test_t2 | table | omm   | 440 kB  | {orientation=column,compression=low} |
(2 rows)

3. Compare the speed of querying one column and inserting one row

explain analyze insert into test_t1 values(12 , 'xxxx');
explain analyze insert into test_t2 values(12, 'xxxx');
omm=# explain analyze insert into test_t1 values(12 , 'xxxx');
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 [Bypass]
 Insert on test_t1  (cost=0.00..0.01 rows=1 width=0) (actual time=0.069..0.070 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Total runtime: 0.175 ms
(4 rows)

omm=# explain analyze insert into test_t2 values(12, 'xxxx');
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Insert on test_t2  (cost=0.00..0.01 rows=1 width=0) (actual time=0.136..0.137 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
 Total runtime: 0.237 ms
(3 rows)
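
The output above covers only the insert half of this exercise. The query half can be measured the same way as in step 4 of the course walkthrough. A sketch follows; no timings are shown because none were recorded in the original post.

```sql
-- Read a single column from each table and compare the "Total runtime" lines.
explain analyze select distinct name from test_t1;
explain analyze select distinct name from test_t2;
```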

4. Clean up

drop table test_t1;
drop table test_t2;
