暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

OGG 大事务处理 特大事务处理

– OGG 大事务处理 特大事务处理[应用进程]

故障分析1 :OGG应用进程延时非常高
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
REPLICAT    RUNNING     R_07        00:00:04      39:27:19  

故障分析2:检查进程日志位置,应用位置不变,应用检查点是6号文件
GGSCI (ogg_db1) 38> info R_07
REPLICAT   R_07   Last Started 2021-08-26 09:18   Status RUNNING
INTEGRATED
Checkpoint Lag       00:00:04 (updated 40:24:14 ago)
Process ID           97630                -- 可以使用processid和v$session关联,查看执行的SQL
Log Read Checkpoint  File ./dirdat/b3000000006
                     2021-08-24 17:38:31.000000  RBA 887034291

故障分析3:send status检查应用进程状态,命令被卡住无反应
GGSCI (ogg_db1) 52> send R_07,status


故障分析4:直接KILL掉应用进程,重新启动
GGSCI (ogg_db1) 53> kill R_07
GGSCI (ogg_db1) 54> start R_07

故障分析5:启用应用进程之后,info R_07 还是6号文件位置,send status在变,说明在应用大事务
GGSCI (ogg_db1) 60> send R_07,status
Sending STATUS request to REPLICAT R_07 ...
  Current status: Processing data
  Sequence #: 8
  RBA: 1,403,433,736
  14,800,801 records in current transaction.
STOP request pending end-of-transaction (14,800,801 records so far).

GGSCI (ogg_db1) 61> send R_07,status
Sending STATUS request to REPLICAT R_07 ...
  Current status: Processing data
  Sequence #: 9        -- sequence 和 rba在变 
  RBA: 874,973,822
  19,601,669 records in current transaction.
STOP request pending end-of-transaction (19,601,669 records so far).  -- 已经读入了1900万数据(未提交)

-- 故障分析6:检查目标数据库中应用进程
SYS@orcl(orcldg): 1> select sid,serial#,event,status from v$session where username='OGG_ADMI' and process=97630;
    SID            SERIAL# EVENT                                    status
------- ------------------ ---------------------------------------- ----------
  10988               3527 latch: shared pool                       ACTIVE

SYS@orcl(orcldg): 10988> @sesscli 10988
SID_SERIAL
INST_PID               USERNAME        OSUSER     status     EVENT                          machine         TERMINAL   PROGRAM              CLIENT_INFO
---------------------- --------------- ---------- ---------- ------------------------------ --------------- ---------- -------------------- --------------------
10988,3527 1 260855    OGG_ADMI        oracle     ACTIVE     Streams: resolve low memory co khfwpt-scoggdb-            replicat@khfwpt-scog
                      
                       SID_SERIAL                                                            
INST_PID               SQL_ID         STATE             LAST_EXECTIME  CALLET   SEQ# BLOCKING
---------------------- -------------- ----------------- -------------- ------ ------ --------
10988,3527 1 260855    .2vh58kxmbt9gc WAITING           08-26 10:44:54 18S     14145         

发现进程等待事件1: latch: shared pool                       
发现进程等待事件2: Streams: resolve low memory condition                       
未直接发现进程应用相关的SQL


-- 检查数据库负载和内存stream参数
SYS@orcl(orcldg): 1> !free -g
             total       used       free     shared    buffers     cached
Mem:          1008        462        546          1          0         11
-/+ buffers/cache:        449        558
Swap:           15          0         15

SYS@orcl(orcldg): 1> @sga
-- V$SGAINFO
NAME                                          SIZE UN RES
------------------------------- ------------------ -- ---
Maximum SGA Size                             408.2 GB No
Buffer Cache Size                            319.5 GB Yes
Streams Pool Size                             45.5 GB Yes
Shared Pool Size                              35.5 GB Yes
Startup overhead in Shared Pool               21.1 GB No
Large Pool Size                                3.5 GB Yes
Java Pool Size                                 3.5 GB Yes
Redo Buffers                                   684 MB No
Granule Size                                   512 MB No
Fixed SGA Size                                   2 MB No
Shared IO Pool Size                              0 B  Yes
Free SGA Memory Available                        0 B

top - 10:55:25 up 4 days, 21:29,  4 users,  load average: 12.52, 13.26, 13.86
Tasks: 7438 total,  16 running, 7422 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.0%us,  1.7%sy,  0.0%ni, 91.0%id,  0.1%wa,  0.0%hi,  0.2%si,  0.0%st



-- 故障分析7:检查UNDO,发现一个大事务,用到了8GB的UNDO空间
                                                                                     T-S        TRA                                                                             OSUSER
SEGMENT_NAME              START_TIME        SQL_ID         SID_SERIAL                STAT used_ublk    SIZE_MB       BLKS MACHINE    PROGRAM         TERMINAL                       USERNAME
------------------------- ----------------- -------------- ------------------------- ---- --------- ---------- ---------- ---------- --------------- ------------------------------ -------------------
_SYSSMU89_178821965$      08/26/21 10:56:03 5k2jgfrr8nyav. 9244,1051 363181          A A          9          8       1024 orcl3     oracle@orcl3 ( UNKNOWN                        oracle OGG_ADMI
_SYSSMU77_682973236$      08/26/21 09:21:46 9x4xnbb41m42r. 13764,7177 269166         A A     968193       7593     971904 orcl3     oracle@orcl3 ( UNKNOWN                        oracle OGG_ADMI


-- 检查大事务的SQL和进程:
SYS@orcl(orcldg): 1> @sql 9x4xnbb41m42r

DELETE /*+ restrict_all_ref_cons */ FROM "gz1"."MK_*_LOG" WHERE "FLOW_SN"=:1
 
SYS@orcl(orcldg): 1> @sesscli 13764
SID_SERIAL
INST_PID               USERNAME        OSUSER     STATUS       EVENT                          machine         TERMINAL   PROGRAM              CLIENT_INFO
---------------------- --------------- ---------- ------------ ------------------------------ --------------- ---------- -------------------- --------------------
13764,7177 1 269166    OGG_ADMI        oracle     ACTIVE       latch: row cache objects       orcl3          UNKNOWN    oracle@orcl3 (AS1E)

SID_SERIAL
INST_PID               USERNAME        OSUSER     status     EVENT                          machine         TERMINAL   PROGRAM              CLIENT_INFO
---------------------- --------------- ---------- ---------- ------------------------------ --------------- ---------- -------------------- --------------------
13764,7177 1 269166    OGG_ADMI        oracle     ACTIVE     gc current request             orcl3          UNKNOWN    oracle@orcl3 (AS1E)

1 row selected.
 

-- 故障分析8: 查询出来的表在OGG应用配置中匹配
-- 发现大事务操作的表在这个异常的进程中,基本定位问题
-- 和应用调协,这个一个不太重要的日志分区表,有13亿数据量,可以先跳过这张日志表的同步,待追平之后再启用这张表的同步,后续由应用自己处理数据表差异
MAP gz1.MK_*_LOG        ,TARGET gz1.MK_*_LOG;


-- 解决方案步骤:
1 OGG端kill掉应用进程
GGSCI (ogg_db1) 53> kill R_07

2 数据库查看并杀掉大事务,等待完成回滚
SYS@orcl(orcldg): 1> ! kill -9 269166
SYS@orcl(orcldg): 1> ! kill -9 260855

3 应用进程中取消大事务表同步
-- MAP gz1.MK_*_LOG        ,TARGET gz1.MK_*_LOG;

4 启动应用进程
GGSCI (ogg_db1) 53> start R_07
-- 跳过这个大事务日志表之后,应用队列文件从sequence# 6 直接跳到了sequence# 19 (2000M一个队列文件)
-- 之前重启应用进程,应用了3个小时,用了8GB的UNDO空间,日志文件读了2个多一点

5 待追平之后,重新启用日志表同步,停启应用进程
-- 加上冲突解决
MAP gz1.MK_*_LOG        ,TARGET gz1.MK_*_LOG,RESOLVECONFLICT(INSERTROWEXISTS,(DEFAULT,OVERWRITE)),RESOLVECONFLICT(DELETEROWMISSING,(DEFAULT,DISCARD)),RESOLVECONFLICT(UPDATEROWMISSING,(DEFAULT,OVERWRITE));



-- 总结
1 特大分区日志表,删除数据操作维护,需要按照分区、子分区,按主键或rowid范围拆分成小事务再提交,或者直接truncate,避免delete操作
2 OGG对用大事务,长事务的支持不是很友好,应尽量避免大事务,长事务
最后修改时间:2021-08-26 14:44:11
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论