引言:
有个朋友问我OB的update本质是不是先delete再insert,他说他遇到了一个场景问题,一个表的字段default值是一个序列,在ob中update其他列,新insert的数据用的序列值被推高了(这句话没看懂没关系,后面我会模拟)。
其实这个理解不完全对,但是结果现象差不多,这个知识点在ob的从0 到1数据库实践教程中的OceanBase存储引擎结构》和《基于LSM-Tree的 OceanBase 存储引擎》的课程中有讲到,如下图所示,相同rowkey的数据有不同的时间戳,访问的是fuse后的最新时间戳的该rowkey下的数据,整个数据链会在转储合并的时候进行compaction。

案例说明:
其实这个点说完,还是不能解释ob的行为与oracle的区别。那就通过实验来看现象吧。
oracle:
先在oracle上创建一个测试表,还原问题场景。
SQL> create table zrycs(a number primary key,b varchar2(20)); Table created. SQL> alter table zrycs modify a default SEQ_CS.nextval; Table altered. SQL> insert into zrycs(b) values('xx'); 1 row created. SQL> commit; Commit complete. SQL> select * from zrycs; A B ---------- -------------------- 1 xx
可以看到插入数据的时候没有指定a的数值,自动补充了default的序列值。
可以看下这个insert的执行计划。
-------------------------------------------------------------------------------- ------------- | Id | Operation | Name | Starts | A-Rows | A-Time | Buffers | Reads | PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------- | 0 | INSERT STATEMENT | | 1 | 0 |00:00:00.02 |19 | 3 | | 1 | LOAD TABLE CONVENTIONAL | ZRYCS | 1 | 0 |00:00:00.02 |19 | 3 | | 2 | SEQUENCE | SEQ_CS | 1 | 1 |00:00:00.01 | --------------------------------------------------------------------------------
是访问了序列,然后插入数据到表中。
再update下,看下执行计划。
SQL> update zrycs set b='zry01' where b='xx'; 1 row updated. SQL> commit; Commit complete. SQL> select * from zrycs; A B ---------- -------------------- 1 zry01
执行计划如下,可以看到没有访问序列的操作了
-------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | -------------------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.01 | 3 | | 1 | UPDATE | ZRYCS | 1 | | 0 |00:00:00.01 | 3 | PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |* 2 | TABLE ACCESS FULL| ZRYCS | 1 | 1 | 0 |00:00:00.01 | 3 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("B"='xx') Note ----- - dynamic statistics used: dynamic sampling (level=2)
再插入一条不带a字段的数据,序列号是正常增加的。
SQL> select * from zrycs; A B ---------- -------------------- 1 zry01 SQL> insert into zrycs(b) values('zz02'); 1 row created. SQL> commit; Commit complete. SQL> select * from zrycs; A B ---------- -------------------- 1 zry01 2 zz02
最后再看下插入一条带a字段数值的数据的计划是什么样的。
SQL> insert into zrycs(a,b) values(111,'zz03'); SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL_ID 12ss7auvd61gr, child number 0 ------------------------------------- insert into zrycs(a,b) values(111,'zz03') ----------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------- | 0 | INSERT STATEMENT | | 1 | 0 |00:00:00.01 | 4 | | 1 | LOAD TABLE CONVENTIONAL | ZRYCS | 1 | 0 |00:00:00.01 | 4 | ----------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Note ----- - cpu costing is off (consider enabling it)
可以看到是没有访问序列的。
OceanBase:
我在OB进行了同样操作的一个验证
obclient [ZRY]> create sequence SEQ_CS MINVALUE 1 MAXVALUE 999999999999999999999999999 start with 1 increment by 1 nocache order ; Query OK, 0 rows affected (0.476 sec) obclient [ZRY]> create table zrycs(a number primary key,b varchar2(20)); Query OK, 0 rows affected (0.467 sec) obclient [ZRY]> alter table zrycs modify a default SEQ_CS.nextval; Query OK, 0 rows affected (0.216 sec) obclient [ZRY]> insert into zrycs(b) values('xx'); Query OK, 1 row affected (0.173 sec) obclient [ZRY]> commit; Query OK, 0 rows affected (0.003 sec) obclient [ZRY]> select * from zrycs; +------+------+ | A | B | +------+------+ | 1 | xx | +------+------+ 1 row in set (0.023 sec) obclient [ZRY]> update zrycs set b='zry01' where b='xx'; Query OK, 1 row affected (0.029 sec) Rows matched: 1 Changed: 1 Warnings: 0 obclient [ZRY]> commit; Query OK, 0 rows affected (0.008 sec) obclient [ZRY]> select * from zrycs; +------+-------+ | A | B | +------+-------+ | 1 | zry01 | +------+-------+ 1 row in set (0.001 sec) obclient [ZRY]> insert into zrycs(b) values('zz02'); Query OK, 1 row affected (0.032 sec) obclient [ZRY]> commit; Query OK, 0 rows affected (0.003 sec) obclient [ZRY]> select * from zrycs; +------+-------+ | A | B | +------+-------+ | 1 | zry01 | | 3 | zz02 | +------+-------+
根据上面的操作可以看到,update一次已有数据后,下一次插入的序列值本应为2,现在实际插入的是3。
其实我们可以看下这个update的执行计划。
obclient [ZRY]> explain extended update zrycs set b='zry01' where b='xx' ; +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ==================================================== | | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | | ---------------------------------------------------- | | |0 |UPDATE | |1 |36 | | | |1 |└─SEQUENCE | |1 |5 | | | |2 | └─TABLE FULL SCAN|ZRYCS|1 |5 | | | ==================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output(nil), filter(nil) | | table_columns([{ZRYCS: ({ZRYCS: (ZRYCS.A(0x7f2623a28f20), ZRYCS.B(0x7f2623a104a0))})}]), | | update([ZRYCS.B(0x7f2623a104a0)=column_conv(VARCHAR2,gbk_bin,length:20,NULL,cast('zry01', VARCHAR2(1048576 ))(0x7f2623a27e70))(0x7f2623a10a50)]) | | 1 - output([ZRYCS.A(0x7f2623a28f20)], [ZRYCS.B(0x7f2623a104a0)]), filter(nil) | | 2 - output([ZRYCS.A(0x7f2623a28f20)], [ZRYCS.B(0x7f2623a104a0)]), filter([ZRYCS.B(0x7f2623a104a0) = cast('xx', VARCHAR2(1048576 ))(0x7f2623a33540)(0x7f2623a32b20)]), rowset=16 | | access([ZRYCS.A(0x7f2623a28f20)], [ZRYCS.B(0x7f2623a104a0)]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false], | | range_key([ZRYCS.A(0x7f2623a28f20)]), range(MIN ; MAX)always true
可以看到尽管这个update没有去操作a列,他还是访问了序列,正如上面开始说的,ob的update是inert一条新数据,除了现在这个现象外,如果在存在数据的表上创建约束后update已有数据的时候也会校验update后的数据是否符合约束。
先看下inert一条使用到序列补充数据的sql的计划。
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | -------------------------------------------------- | | |0 |DISTRIBUTED INSERT| |1 |13 | | | |1 |└─EXPRESSION | |1 |1 | | | |2 | └─SEQUENCE | |0 |1 | | | ================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output(nil), filter(nil) | | columns([{ZRYCS: ({ZRYCS: (ZRYCS.A, ZRYCS.B)})}]), | | column_values([column_conv(NUMBER,PS:(-1,-85),NOT NULL,column_conv(NUMBER,PS:(-1,-85),NOT NULL,SEQ_CS.NEXTVAL))], [column_conv(VARCHAR2,gbk_bin,length:20, | | NULL,__values.B)]) | | 1 - output([__values.B]), filter(nil) | | values({'zz04'}) | | 2 - output(nil), filter(nil) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------+
这个没问题,和oracle的计划差不多,还可以扩展下看看insert一条明确a字段的数据计划是什么样。
obclient [ZRY]> desc insert into zrycs(a,b) values(111,'zz03'); +---------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +---------------------------------------------------------------------------------------------------------------------------------------+ | ================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | -------------------------------------------------- | | |0 |DISTRIBUTED INSERT| |1 |13 | | | |1 |└─EXPRESSION | |1 |1 | | | |2 | └─SEQUENCE | |0 |1 | | | ================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output(nil), filter(nil) | | columns([{ZRYCS: ({ZRYCS: (ZRYCS.A, ZRYCS.B)})}]), | | column_values([column_conv(NUMBER,PS:(-1,-85),NOT NULL,__values.A)], [column_conv(VARCHAR2,gbk_bin,length:20,NULL,__values.B)]) | | 1 - output([__values.A], [__values.B]), filter(nil) | | values({111, 'zz03'}) | | 2 - output(nil), filter(nil) | +---------------------------------------------------------------------------------------------------------------------------------------+ 15 rows in set (0.004 sec)
可以看到虽然实际数据没有用到序列,但是还是访问了序列,这个与oracle的行为就不一样了。有点类似列上有约束或者触发器,在操作前进行下校验。
我们看下这个计划显示调用序列了,序列值有没有变化。
obclient [ZRY]> insert into zrycs(a,b) values(111,'zz03'); Query OK, 1 row affected (0.012 sec) obclient [ZRY]> commit; Query OK, 0 rows affected (0.005 sec) obclient [ZRY]> select * from zrycs; +------+-------+ | A | B | +------+-------+ | 1 | zry01 | | 3 | zz02 | | 111 | zz03 | +------+-------+ 3 rows in set (0.002 sec) obclient [ZRY]> insert into zrycs(b) values('zz04'); Query OK, 1 row affected (0.006 sec) obclient [ZRY]> commit; Query OK, 0 rows affected (0.006 sec) obclient [ZRY]> select * from zrycs; +------+-------+ | A | B | +------+-------+ | 1 | zry01 | | 3 | zz02 | | 5 | zz04 | | 111 | zz03 | +------+-------+ 4 rows in set (0.001 sec)
果然,序列的值被推高了,本来应该是4的现在新插入的数据成了5。
总结:
这个现象是数据库的架构和实现方式造成的体现不同,说不上谁对谁错,也是希望通过这个现象的理解,可以更好的理解ob的存储架构以及原理。
行之所向,莫问远方。




