
How to Restrict Oracle Data Pump to Run on Only One Node of a RAC

Original article by 杨卓 (Yang Zhuo), 2022-12-08

Overview

While searching MOS I came across a rather interesting note. When Oracle Data Pump export or import runs in parallel on RAC without cluster=n, errors are easy to hit. In the note, MOS also offers another workaround which, although not very practical, is worth knowing about.

Data Pump import fails with ORA-29913 ORA-31640 ORA-19505 ORA-27037

Reference
DataPump Import With PARALLEL > 1 In RAC Environment Fails With Errors ORA-29913 ORA-31640 ORA-19505 ORA-27037 (Doc ID 1173765.1)

报错现象

Oracle Database - Enterprise Edition - Version 11.2.0.1 to 11.2.0.1 [Release 11.2]
When importing a dump file using IMPDP where the dump file is stored on a cluster filesystem, the following errors are raised:

ORA-31693: Table data object "<SCHEMA_NAME>"."<TABLE_NAME>":"<PARTITION_NAME>" failed to load/unload and is being skipped due to error:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-31640: unable to open dump file "<PATH>/<DUMP_NAME>.dmp" for read
ORA-19505: failed to identify file "<PATH>/<DUMP_NAME>.dmp"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
The IMPDP parameters used are:

directory=<DIRECTORY_NAME>
filesize=30G
full=yes
dumpfile=<DUMP_NAME>%U.dmp
logfile=<LOG_NAME>.log
status=200
parallel=7
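
Putting the parameters above into a parameter file makes the failing scenario concrete. This is a hedged sketch, not taken from the MOS note: the directory object, dump-file name, log name and connecting user are all hypothetical placeholders.

```
# imp_full.par -- hypothetical parameter file matching the settings above
directory=DATA_PUMP_DIR
filesize=30G
full=yes
dumpfile=expfull%U.dmp
logfile=imp_full.log
status=200
parallel=7
```

Invoked as `impdp system/<password> parfile=imp_full.par`, this can raise the errors above whenever a worker lands on an instance that cannot see the dump files.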

CAUSE

The cause of this problem has been identified in unpublished bug 9378256. It is caused by a RAC instance not having physical access to the dump file (as indicated by the DIRECTORY parameter) while a parallel degree > 1 is used. The problem with this setup is that the PQ/PX processes are not constrained to the RAC instance from which the IMPDP operation was started, and when such a PQ/PX process doesn't have access to the dump file, the errors are raised.

The behavior is also reported in unpublished Bug 8415620 - DATA PUMP DOES NOT HONOR THE BOUNDARIES OF THE CONNECTED SERVICE ON RAC

DataPump RAC support is provided in 11.2. Prior to 11.2, once you use a service to make the initial connection to the database, an instance is selected and the master process and all worker processes run on that instance. 

In 11.2, the new DataPump parameter CLUSTER is introduced:

CLUSTER : Default=Y

Purpose :
Determines whether Data Pump can use Oracle Real Application Clusters (RAC)
resources and start workers on other Oracle RAC instances.

Syntax and Description : CLUSTER=[Y | N]

SOLUTION

The issue has been fixed in the 11.2.0.2 patchset.

Possible workarounds for the 11.2.0.1 release are:

use the CLUSTER or SERVICE_NAME parameters to constrain the IMPDP job never to run on the instance not having access to the dump file

configure the database parameters in such a way that PQ/PX processes are never started on the instance not having access to the dump file

use the PARALLEL=1 IMPDP command line parameter to disable parallel functionality during import

store all dump files on a local filesystem of one of the RAC nodes and use a parallel degree lower than the number of dump files and import with the CLUSTER=N command line option

mount the cluster filesystem on all nodes and set the parallel degree to be equal to the number of dump files and import with the CLUSTER=N command line option
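
The two simplest workarounds above can be sketched as command lines. These are illustrative only; the directory object, dump-file name and user are hypothetical placeholders, not from the MOS note:

```
# Workaround: disable parallelism entirely
$ impdp system/<password> directory=DATA_PUMP_DIR dumpfile=expfull%U.dmp full=yes parallel=1

# Workaround: keep parallelism but pin all workers to the starting instance
$ impdp system/<password> directory=DATA_PUMP_DIR dumpfile=expfull%U.dmp full=yes parallel=7 cluster=n
```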

Let's recap the whole sequence of events:
1. The Data Pump import used parallelism without setting the cluster parameter;
2. The environment is a RAC, and the directory holding the dump files is not shared across the RAC nodes. Since cluster defaults to y, once parallelism is used Oracle may, for load balancing, start worker processes on other RAC instances. Those workers then need to read the dump file, but because the filesystem directory is not visible to the operating systems of the other nodes, the read fails and the errors above are raised;
3. Oracle's recommended fix is the familiar cluster=n, which confines the import job to a single host so its worker processes never run on other nodes;
4. Oracle also documents a more interesting alternative using SERVICE_NAME, which is what this article wants to share. The relevant MOS note follows.

How To Limit The Data Pump Slaves To Only A Handful Of RAC Instances?

Reference
(Doc ID 1590699.1)

GOAL
In a RAC environment, Data Pump uses all instances of the database, rather than just one instance, for its slave processes. This is the default behavior because cluster=y is the default.
What if you have more than two instances and, while still wanting to run the Data Pump slaves in cluster mode, you want to limit them to only a handful of instances, not all of them?

SOLUTION
Use the Data Pump parameter service_name. Create a service that is defined on only those instances and use it in Data Pump.
Here is an example where you have four instances and you want to use only instances 2 and 3 for the Data Pump job. Create a service named nodes_2_3 as shown below:

$ srvctl add service -d <DatabaseName> -s nodes_2_3 -r RAC2,RAC3 -a RAC1,RAC4 -P BASIC


Then you should use the parameter in Data Pump:

cluster=y service_name=nodes_2_3
Now the DP job will use RAC2 and RAC3 instances only, i.e. nodes 2 and 3.

Note: If you start the expdp process on nodes 1 or 4, the Data Pump processes can start there as well, in addition to nodes 2 and 3.
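
Putting the pieces together, an end-to-end sketch might look like the following. The database name, schema, directory object and dump-file name are hypothetical placeholders; only the service name nodes_2_3 and the instance lists come from the example above:

```
# Create a service whose preferred instances are RAC2 and RAC3 only, then start it
$ srvctl add service -d ORCL -s nodes_2_3 -r RAC2,RAC3 -a RAC1,RAC4 -P BASIC
$ srvctl start service -d ORCL -s nodes_2_3

# Run the Data Pump job constrained to that service;
# with cluster=y, workers may start only on the service's instances
$ expdp system/<password> schemas=SCOTT directory=DATA_PUMP_DIR dumpfile=scott%U.dmp \
    parallel=4 cluster=y service_name=nodes_2_3
```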

Summary: essentially this is the same principle as configuring preferred and available nodes for a RAC service via service_name to avoid GC (global cache) contention. For interested readers, it is enough to know that this trick exists. I did not know about it before, so I wrote this article to record it.
