由于这个问题是可以重现的,并且当时是在即将进行割接的系统上进行操作,于是很方便的建立测试环境进行测试。老白首先创建一张测试表:
createtable xuji_test tablespace sysaux as select * from dba_objects ; alter table xuji_test parallel 20; select count(*) from xuji_test |
| select/*+ full(a) parallel( a ,50) */ * from xxxx a; |
SQL> show parameterparallel NAME TYPE VALUE recovery_parallelism integer 0 |
SQL> alter session set "_px_trace" = "compilation","execution","messaging"
2会话已更改。
SQL> select count(*) fromxuji_test;
COUNT(*)
----------
61059SQL>
设置了并行执行追踪参数后,一旦执行并行操作,就会生成TRACE文件。我们可以到udump目录下去查看TRACE文件:
kkfdapdml
pgadep:0 pdml mode:0 PQ allowed DML allowed not autonomous => not allowed
kxfplist
Getting instance info for open group
kxfralo
serial - instance group has no open members
~
SQL> show parameterinstance_group
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_groups string sjzzw3,sjzzw31
parallel_instance_group string sjzzw31
好像也没什么问题吧。在这里我被忽悠了很久,也没看出来问题出在哪。这种情况下该如何处理呢?找一个正常的系统去对比一下就可以了。割接前的系统上是能跑并行的,刚刚开发商也用并行在老系统上做了一些数据处理。于是我们把老系统的参数文件和新系统做了一个比较。看到了正常的系统的INSTANCE_GROUPS设置,我才恍然大悟:
SQL> show parameterinstance_group
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_groups string sjzzw1, sjzzw11
parallel_instance_group string sjzzw11
createpfile='/tmp/init.ora' from spfile生成一个pfile来看看这个参数:
有问题的系统的参数设置为:
instance_groups='sjzzw3,sjzzw31'
正确的系统的参数设置为:
instance_groups='sjzzw1','sjzzw11'
SQL> alter session set"_px_trace"="compilation","execution","messaging";
会话已更改。
SQL> alter session setparallel_instance_group='sjzzw3,sjzzw31';
会话已更改。
SQL> select count(*) fromxuji_test;
COUNT(*)
----------
61059
这时再来看TRACE文件:
kkfdapdml
pgadep:0 pdml mode:0 PQ allowed DMLallowed not autonomous => not allowe
d
kxfplist
Getting instance info for open group
kxfrSysInfo
DOP trace -- compute default DOPfrom system info
# instance alive = 1(kxfrsnins)
kxfrDefaultDOP
DOP Trace -- compute default DOP
#CPU = 64
Threads/CPU= 2 ("parallel_threads_per_cpu")
default DOP= 128 (# CPU * Threads/CPU)
default DOP= 128 (DOP * # instance)
kxfrSysInfo
system default DOP = 128 (fromkxfrDefaultDOP())
kxfralo
DOP trace -- requested thread frombest ref obj = 20 (from kxfrIsBestRef
())
kxfralo
threads requested = 20 (fromkxfrComputeThread())
kxfralo
adjusted no. threads = 20 (fromkxfrAdjustDOP())
kxfralo
about to allocate 20 slaves
kxfrAllocSlaves
DOP trace -- call kxfpgsg to get 20slaves
kxfpgsg
num server requested = 20
kxfplist
Getting instance info for open group
kxfpiinfo
inst[cpus:mxslv]
1[64:1000]
kxfpclinfo
inst(load:user:pct:fact)aff
1(3:0:100:2133)
kxfpAdaptDOP
Requested=20 Granted=20 Target=512Load=3 Default=128 users=0 sets=1
kxfpgsggetting 1 sets of 20 threads, client parallel query execution flg=0x30
Height=20, Affinity List Size=0,inst_total=1, coord=1
Insts 1
Threads 20
kxfpg1srv
trying to get slave P000 on instance1
kxfpg1sg
Got It. 1 so far.
kxfpg1srv
trying to get slave P001 on instance1
kxfpg1sg
Got It. 2 so far.
kxfpg1srv
trying to get slave P002 on instance1
kxfpg1sg
Got It. 3 so far.
kxfpg1srv
trying to get slave P003 on instance1
kxfpg1sg
Got It. 4 so far.
kxfpg1srv
trying to get slave P004 on instance1
kxfpg1sg
Got It. 5 so far.
kxfpg1srv
trying to get slave P005 on instance1
kxfpg1sg
Got It. 6 so far.
kxfpg1srv
trying to get slave P006 on instance1
kxfpg1sg
Got It. 7 so far.
kxfpg1srv
trying to get slave P007 on instance1
kxfpg1sg
Got It. 8 so far.
kxfpg1srv
trying to get slave P008 on instance1
kxfpg1sg
Got It. 9 so far.
kxfpg1srv
trying to get slave P009 on instance1
并行执行又回来了。这充分说明了刚才所说的参数设置问题是确实存在的。这个案例大家主要是掌握并行执行的跟踪和分析方法。除了看参数外,通过_PX_TRACE来跟踪并行操作的执行。另外在某些情况下,找个正常的系统,对比参数设置会有帮助。




