说明:老白将整理一些以前的案例,在公众号上和大家分享。这些案例可能有些已经是几年前,甚至十年前的案例。老白将会对这些案例进行重新的梳理,有些案例当年的处理可能不是最佳的,老白也会做相关的点评。
今天碰到一个奇怪的现象,在一个客户的割接现场,有一些大表,好几个G,需要并行建索引,但是建索引的过程中发现,实际上操作还是串行的。导致索引创建很慢。想要查清为什么不能并行建索引。
由于这个问题是可以重现的,于是老白首先创建一张测试表:
createtable xuji_test tablespace sysaux as select * from dba_objects ;
altertable xuji_test parallel 20;
selectcount(*) from xuji_test;
发现确实没有走并行。查询一张大表
select/*+ full(a) parallel( a 50) */ from xxxx a; 查询执行了10分钟,期间查询V$PX_SESSION发现无并行进程。看样子也没有走并行。
为什么会这样呢?按照常规的分析流程,我们首先查看一下并行执行的相关参数:
SQL> show parameterparallel
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string LOW
parallel_adaptive_multi_user boolean TRUE
parallel_automatic_tuning boolean FALSE
parallel_execution_message_size integer 2152
parallel_instance_group string sjzzw31
parallel_max_servers integer 1000
parallel_min_percent integer 0
parallel_min_servers integer 10
parallel_server boolean TRUE
parallel_server_instances integer 2
parallel_threads_per_cpu integer 2recovery_parallelism integer 0
SQL>
从并行执行相关的参数来看,似乎也没什么问题。下一步怎么办呢?对于并行查询,如果通过表面现象无法定位问题的时候,我们通过_px_trace参数来进行跟踪。
SQL> alter session set"_px_trace"="compilation","execution","messaging"
2会话已更改。
SQL> select count(*) fromxuji_test;
COUNT(*)
----------
61059SQL>
设置了并行执行追踪参数后,一旦执行并行操作,就会生成TRACE文件。我们可以到udump目录下去查看TRACE文件:
*** ACTION NAME:() 2011-10-2020:43:29.336
*** MODULE NAME:(sqlplus@sjzzw31 (TNS V1-V3)) 2011-10-20 20:43:29.336
*** SERVICE NAME:(SYS$USERS) 2011-10-20 20:43:29.336
*** SESSION ID:(2720.703) 2011-10-20 20:43:29.336
kkfdapdml
pgadep:0 pdml mode:0 PQ allowed DMLallowed not autonomous => not allowe
d
kxfplist
Getting instance info for open group
kxfralo
serial - instance group has no openmembers
~
~
从这里我们看到了一个问题,选择serial的原因是instancegroup has no open members,也就是说我们的parallel_instance_group参数不属于instance_groups中指定的GROUP。这个时候我们再来看看这两个参数:
SQL> show parameterinstance_group
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_groups string sjzzw3,sjzzw31
parallel_instance_group string sjzzw31
好像也没什么问题吧。在这里我被忽悠了很久。后来看了一个正常的系统的INSTANCE_GROUPS设置才恍然大悟:
SQL> show parameterinstance_group
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_groups string sjzzw1, sjzzw11
parallel_instance_group string sjzzw11
咋一看,这二者之间似乎也没什么不同吧。不过细心的读者可能发现了,正确的那个instance_groups的两个组之间有一个空格,这就是区别了。我们通过
createpfile='/tmp/init.ora' from spfile生成一个pfile来看看这个参数:
有问题的系统的参数设置为:
instance_groups='sjzzw3,sjzzw31'
正确的系统的参数设置为:
instance_groups='sjzzw1','sjzzw11'
这回我们看到的结果更清晰一些了。一个是用单引号引起了两个组,一个是每个组用单引号引起来,用逗号分割。第一个的错误之处在于实际上这种设置方法把INSTANCE_GROUPS设置为一个叫做sjzzw3,sjzzw31的组,因为逗号是组成组名的合法字符。
由于设置INSTANCE_GROUPS参数要重启实例,为了解决这个问题,我想出了一个简单的办法,将parallel_instance_group设置为那个带逗号的:
SQL> alter session set"_px_trace"="compilation","execution","messaging";
会话已更改。
SQL> alter session setparallel_instance_group='sjzzw3,sjzzw31';
会话已更改。
SQL> select count(*) fromxuji_test;
COUNT(*)
----------
61059
这时再来看TRACE文件:
*** ACTION NAME:() 2011-10-2020:53:48.616
*** MODULE NAME:(sqlplus@sjzzw31 (TNS V1-V3)) 2011-10-20 20:53:48.616
*** SERVICE NAME:(SYS$USERS) 2011-10-20 20:53:48.616
*** SESSION ID:(4121.314) 2011-10-20 20:53:48.616
kkfdapdml
pgadep:0 pdml mode:0 PQ allowed DMLallowed not autonomous => not allowe
d
kxfplist
Getting instance info for open group
kxfrSysInfo
DOP trace -- compute default DOPfrom system info
# instance alive = 1(kxfrsnins)
kxfrDefaultDOP
DOP Trace -- compute default DOP
#CPU = 64
Threads/CPU= 2 ("parallel_threads_per_cpu")
default DOP= 128 (# CPU * Threads/CPU)
default DOP= 128 (DOP * # instance)
kxfrSysInfo
system default DOP = 128 (fromkxfrDefaultDOP())
kxfralo
DOP trace -- requested thread frombest ref obj = 20 (from kxfrIsBestRef
())
kxfralo
threads requested = 20 (fromkxfrComputeThread())
kxfralo
adjusted no. threads = 20 (fromkxfrAdjustDOP())
kxfralo
about to allocate 20 slaves
kxfrAllocSlaves
DOP trace -- call kxfpgsg to get 20slaves
kxfpgsg
num server requested = 20
kxfplist
Getting instance info for open group
kxfpiinfo
inst[cpus:mxslv]
1[64:1000]
kxfpclinfo
inst(load:user:pct:fact)aff
1(3:0:100:2133)
kxfpAdaptDOP
Requested=20 Granted=20 Target=512Load=3 Default=128 users=0 sets=1
kxfpgsggetting 1 sets of 20 threads, client parallel query execution flg=0x30
Height=20, Affinity List Size=0,inst_total=1, coord=1
Insts 1
Threads 20
kxfpg1srv
trying to get slave P000 on instance1
kxfpg1sg
Got It. 1 so far.
kxfpg1srv
trying to get slave P001 on instance1
kxfpg1sg
Got It. 2 so far.
kxfpg1srv
trying to get slave P002 on instance1
kxfpg1sg
Got It. 3 so far.
kxfpg1srv
trying to get slave P003 on instance1
kxfpg1sg
Got It. 4 so far.
kxfpg1srv
trying to get slave P004 on instance1
kxfpg1sg
Got It. 5 so far.
kxfpg1srv
trying to get slave P005 on instance1
kxfpg1sg
Got It. 6 so far.
kxfpg1srv
trying to get slave P006 on instance1
kxfpg1sg
Got It. 7 so far.
kxfpg1srv
trying to get slave P007 on instance1
kxfpg1sg
Got It. 8 so far.
kxfpg1srv
trying to get slave P008 on instance1
kxfpg1sg
Got It. 9 so far.
kxfpg1srv
trying to get slave P009 on instance1
并行执行又回来了。这充分说明了刚才所说的参数设置问题是确实存在的。
这个案例大家主要是掌握并行执行的跟踪和分析方法。除了看参数外,通过_PX_TRACE来跟踪并行操作的执行。另外在某些情况下,找个正常的系统,对比参数设置会有帮助。