背景
mysql定期归档到oracle12c任务,这个月突然卡住了,mysql 批量生成的insert 脚本执行时hang住了。脚本如下:
#!/bin/bash
$ORACLE_HOME/bin/sql usern/pass >> imp.log <<eof
set define off
show user;
@imp.sql
commit;
set time on;
exit;
eof
问题分析
1、等待事件趋势
第一次排查hang后通过event入手
TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION COUNT(*)
------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- ----------
2021-08-12 18 2003 9370 latch: shared pool WAITING 1429 1
2021-08-12 18 2003 9370 ON CPU 1912
2021-08-12 19 2003 9370 ON CPU 1833
2021-08-12 20 2003 9370 latch: shared pool WAITING 1706 1
2021-08-12 20 2003 9370 ON CPU 1912
2021-08-12 21 2003 9370 ON CPU 1880
2021-08-12 22 2003 9370 latch: shared pool WAITING 866 2
2021-08-12 22 2003 9370 ON CPU 1900
2021-08-12 23 2003 9370 ON CPU 1917
2021-08-13 00 2003 9370 SQL*Net break/reset to client WAITING 1
2021-08-13 00 2003 9370 ON CPU 1923
TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION COUNT(*)
------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- ----------
2021-08-13 01 2003 9370 ON CPU 1864
2021-08-13 02 2003 9370 ON CPU 1844
2021-08-13 03 2003 9370 ON CPU 1897
2021-08-13 04 2003 9370 latch: shared pool WAITING 1431 1
2021-08-13 04 2003 9370 ON CPU 1841
2021-08-13 05 2003 9370 ON CPU 1910
2021-08-13 06 2003 9370 ON CPU 881
最后的blcking session 1431,12c⾃动捕获报告特性.但频率不是很高,时间也对应不上,凌晨6点卡住,最后block是凌晨5点
TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION MODULE COUNT(*)
------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- -------------------- ----------
2021-08-12 15 1431 3223 ON CPU MMON_SLAVE 11
2021-08-12 15 1431 31349 ON CPU MMON_SLAVE 1
2021-08-12 16 1431 55435 ON CPU MMON_SLAVE 1
2021-08-12 16 1431 62320 ON CPU MMON_SLAVE 1
2021-08-12 16 1431 64971 ON CPU MMON_SLAVE 1
2021-08-12 17 1431 49301 ON CPU MMON_SLAVE 1
2021-08-12 17 1431 54968 ON CPU MMON_SLAVE 1
2021-08-12 23 1431 23088 ON CPU MMON_SLAVE 1
2021-08-12 23 1431 38171 ON CPU MMON_SLAVE 1
2021-08-13 02 1431 36114 ON CPU MMON_SLAVE 1
2021-08-13 02 1431 39168 ON CPU MMON_SLAVE 1
TO_CHAR(SAMPL SESSION_ID SESSION_SERIAL# EVENT STATUS BLOCKING_SESSION MODULE COUNT(*)
------------- ---------- --------------- ---------------------------------------------------------------- ------- ---------------- -------------------- ----------
2021-08-13 04 1431 14682 ON CPU MMON_SLAVE 1
2021-08-13 04 1431 20053 ON CPU MMON_SLAVE 1
2021-08-13 05 1431 62271 ON CPU MMON_SLAVE 1
2021-08-13 09 1431 47165 ON CPU MMON_SLAVE 1
2、锁信息
EVENT SQL_ID MODULE STATUS STATE BLOCKING_SESSION LOCKWAIT WAIT_TIME
------------------------------ -------------------- ------------------------------ -------- ------------------- ---------------- ---------------- ----------
SQL*Net message from client java@xxxx (TNS V1-V3) INACTIVE WAITING
从这看很奇怪,会话的等待为SQL*Net message from client,从lock看也没有blocking。
SID TYPE HOLD REQUEST ID1 ID2 CTIME BLOCK_OTHERS
---------- ------------ ------------ ------------ ---------- ---------- ---------- ----------------
4 AE Share None 133 0 6964804 Not Blocking
12 AE Share None 133 0 57237 Not Blocking
864 AE Share None 133 0 71450 Not Blocking
866 AE Share None 133 0 2083 Not Blocking
1139 Redo Thread Exclusive None 1 0 6964808 Not Blocking
1141 AE Share None 133 0 64226 Not Blocking
1423 XR Null None 4 0 6964813 Not Blocking
1423 RD Null None 1 0 6964813 Not Blocking
1423 Control File Row-S (SS) None 0 0 6964813 Not Blocking
1423 RS Row-S (SS) None 25 1 6964808 Not Blocking
1429 AE Share None 133 0 74523 Not Blocking
1706 KT Share None 18974 0 32728 Not Blocking
1706 KD Exclusive None 0 0 6964805 Not Blocking
1717 AE Share None 133 0 0 Not Blocking
1991 Temp Segment Row-X (SX) None 3 1 6964805 Not Blocking
2003 AE Share None 133 0 82454 Not Blocking
2003 DML Row-X (SX) None 328259 0 82451 Not Blocking
2003 Transaction Exclusive None 327704 292546 82451 Not Blocking
3、分析10046 trace ,截取部分内容如下:
SO: 0xb1118e88, type: 109, owner: 0xee4b9740, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
proc=0xed7414c8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9855, pg=0, conuid=0
LibraryObjectLock: Address=0xb1118e88 Handle=0x89c36e20 Mode=N
CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0
Context=0x7f04ca789660
User=0xee4b9740 Session=0xee4b9740 ReferenceCount=1
Flags=[0000] SavepointNum=0 Time=08/13/2021 06:28:25
LibraryHandle: Address=0x89c36e20 Hash=0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
Name: Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0
Statistics: InvalidationCount=0 ExecutionCount=1 LoadCount=1 ActiveLocks=1 TotalLockCount=1 TotalPinCount=2
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0
Concurrency: DependencyMutex=0x89c36ed0(0, 0, 0, 0) Mutex=0x7b9e2598(0, 73, 0, 0)
Flags=RON/PIN/PN0/EXP/CHD/[10012111] Flags2=[0000]
WaitersLists:
Lock=0x89c36eb0[0x89c36eb0,0x89c36eb0]
Pin=0x89c36e90[0x89c36e90,0x89c36e90]
LoadLock=0x89c36f08[0x89c36f08,0x89c36f08]
LibraryObject: Address=0x8bbc4a28 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
NamespaceDump:
Child Cursor: Heap0=0x8bbc4af8 Heap6=0xabc80400 Heap0 Load Time=08-13-2021 06:28:25 Heap6 Load Time=08-13-2021 06:28:25
----------------------------------------
SO: 0xc3005ff8, type: 109, owner: 0xee4b9740, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
proc=0xed7414c8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9855, pg=0, conuid=0
LibraryObjectLock: Address=0xc3005ff8 Handle=0x7b9e2448 Mode=N
CanBeBrokenCount=1 Incarnation=1 ExecutionCount=1
User=0xee4b9740 Session=0xee4b9740 ReferenceCount=1
Flags=CNB/[0001] SavepointNum=1a1607ca Time=08/13/2021 06:28:25
LibraryHandle: Address=0x7b9e2448 Hash=51d22eff LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=INSERT INTO table—name VALUES (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
10046的trace中并没有发现异常,在对应insert hang的时间点发现最后执行的insert 语句也是正常的。
FullHashValue=b27cdaeb5a2603c11dfc564551d22eff Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=1372729087 OwnerIdn=80
Statistics: InvalidationCount=0 ExecutionCount=1 LoadCount=2 ActiveLocks=1 TotalLockCount=1 TotalPinCount=1
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0
Concurrency: DependencyMutex=0x7b9e24f8(0, 7, 0, 0) Mutex=0x7b9e2598(0, 73, 0, 0)
Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000]
WaitersLists:
Lock=0x7b9e24d8[0x7b9e24d8,0x7b9e24d8]
Pin=0x7b9e24b8[0x7b9e24b8,0x7b9e24b8]
LoadLock=0x7b9e2530[0x7b9e2530,0x7b9e2530]
Timestamp: Current=08-13-2021 06:28:25
HandleReference: Address=0x7b9e26f0 Handle=(nil) Flags=[00]
LibraryObject: Address=0xb3bae630 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
ChildTable: size='16'
Child: id='0' Table=0xb3baf4b0 Reference=0xb3baefa0 Handle=0x89c36e20
NamespaceDump:
Parent Cursor: sql_id=1vz2q8p8x4brz parent=0xb3bae700 maxchild=1 plk=y ppn=n prsfcnt=0 obscnt=0
处理思路
好像没有头绪了,操作系统信息,alert日志等也没有异常。多次测试都在凌晨4 5点发生异常,怀疑是不是这个时间点受oracle job问题。做了如下操作尝试:
禁用sql tuning 与space advisor
exrc dbms_auto_task_admin.disable('auto space advisor',NULL,NULL);
exec dbms_auto_task_admin.disable('sql tuning advisor',NULL,NULL);
并锁住导入表,避免收集统计信息
EXEC DBMS_STATS.delete_table_stats('<owner name>','<table name>');
EXEC DBMS_STATS.LOCK_TABLE_STATS ('owner name', 'table name');
再次尝试仍是相同问题。并且测试换个时间段在白天也复现了。继续返回trace文件分析。上述trace中记录到的最后一条insert 语句是正常的,分析批量脚本。
grep '001' imp.sql -C 3 截取部分数据,001是trace中记录到的最后insert。
INSERT INTO tf_b_sms_receive_202009 VALUES ('001');
001的下一行002 乱码与\' \0
INSERT INTO tf_b_sms_receive_202009 VALUES ('002','�\0�Ip�x(\'\0\0\0\0\0.��\0v�����C����6ўz��\0��]b\".۰F N�О��^#q�','�\0�Ip�x(\'\0\0\0\0\0.��\0v�����C����6ўz��\0��]b\".۰F N�О��^#q�');
直接删除这条语句再试,成功导入。
模拟分析
删除后虽然跑成功了,但还要细看是什么原因导致的,如何避免,几百万行导入不可能从头排查。
$ORACLE_HOME/bin/sql t1/t1 > /home/oracle/test/imp.log <<eof
set define off
show user;
alter session set max_dump_file_size = unlimited;
alter session set events '10046 trace name context forever, level 12';
@/home/oracle/test/hang.sql
commit;
set time on;
exit;
eof
测试环境执行脚本导入并生成10046 trace分析,结果和上述trace相同,仍然到问题行就hang住,且trace中没有记录解析此sql任何信息。
使用sql developer 图形化工具单独执行报错ORA-01756。
'�\0�Ip�x(\'\0\0\0\0\0.��\0v�����C����6ўz��\0��]b\".۰F N�О��^#q�'
尝试将’替换为\。再次执行通过。这其实也是调用$ORACLE_HOME/bin/sql的原因。使用sqlplus是直接报错跳过的,但sql支持更长的insert字符,并且其他的符号兼容相对较好,各有利弊吧。
预防其实很简单,导入前批量替换 \’。如担心误伤,可以通过grep -n 将带\'的sql先抓出来,然后sed -i 删除指定行。
最后修改时间:2021-08-24 15:50:49
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




