今天朋友遇到一job的问题,问题如下:
2013-05-31 00:02:03.346000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_j000_14550298.trc:
ORA-12012: 自动执行作业 181 出错
ORA-12005: 不能安排过去时间的自动刷新
2013-05-31 00:06:03.458000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_j001_40371792.trc:
ORA-12012: 自动执行作业 181 出错
ORA-12005: 不能安排过去时间的自动刷新
2013-05-31 00:14:04.258000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_j002_263860.trc:
ORA-12012: 自动执行作业 181 出错
ORA-12005: 不能安排过去时间的自动刷新
2013-05-31 00:30:05.914000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_j001_33424922.trc:
ORA-12012: 自动执行作业 181 出错
ORA-12005: 不能安排过去时间的自动刷新
2013-05-31 01:02:08.932000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_j002_38798042.trc:
ORA-12012: 自动执行作业 181 出错
ORA-12005: 不能安排过去时间的自动刷新
2013-05-31 02:06:09.112000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_j007_26740206.trc:
ORA-12012: 自动执行作业 181 出错
ORA-12005: 不能安排过去时间的自动刷新
2013-05-31 06:33:46.476000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_ora_19793338.trc:
ORA-00245: control file backup failed; target is likely on a local file system
2013-05-31 08:30:11.718000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/param/PARAM1/trace/PARAM1_j001_65667362.trc:
ORA-12012: 自动执行作业 181 出错
ORA-12005: 不能安排过去时间的自动刷新
查看JOB状态:
Job ID LOG_USER PRIV_USER SCHEMA_USER Last Run Date THIS_DATE Next Run Date TOTAL_TIME JOB Interval Failures What
-------- ------------------------------ ------------------------------ ------------------------------ ------------------- ------------------- ------------------- ---------- --- ---------------------------------------- ---------- ----------------------------------------
181 COMM COMM COMM 2013-05-01 10:06:18 2013-05-31 17:02:11 4687 NO trunc(add_months(sysdate,1),'MM')-1 9 del_b_info_capability_intf;
他觉得这里比较奇怪的问题有两个:
1、Next Run Date是2013-05-31 17:02:11 ,为什么这里从凌晨00:02就可以报错了
2、为什么会报错,如果只单独执行这个JOB里面的存储过程时正常的,不会有错
为此在自己的虚拟机上尝试重现该错误,看是否能找到以上两个问题的答案。
一、创建测试表以及测试的存储过程
---给予测试用户相关权限
grant EXECUTE on sys.dbms_lock to hcn;
grant execute on SYS.DBMS_JOB to hcn;
----创建测试表
create table hcn.job_test(t_time date,t_status varchar2(30));
-----创建测试用的存储过程
create or replace procedure hcn.test_job
as
begin
insert into hcn.job_test select sysdate,'Start' from dual;
sys.dbms_lock.sleep(300);
insert into hcn.job_test select sysdate,'Stop' from dual;
commit;
end;
/
SQL> select * from job_test;
no rows selected
SQL>
SQL>
SQL>
----测试存储过程是否能执行
SQL> exec test_job;
PL/SQL procedure successfully completed.
SQL>
SQL> select to_char(T_TIME,'yyyy-mm-dd hh24:mi:ss') t_time,t_status from job_test;
T_TIME T_STATUS
------------------- ------------------------------
2013-05-31 16:50:44 Start
2013-05-31 16:55:44 Stop
SQL>
SQL>
SQL> truncate table job_test;
Table truncated.
SQL>
SQL> select to_char(T_TIME,'yyyy-mm-dd hh24:mi:ss') t_time,t_status from job_test;
no rows selected
SQL>
二、修改虚拟机主机日期为2013年5月30日(当前为31日),在关闭数据库的情况下设置
[root@ora10g ~]#
[root@ora10g ~]# date -s 5/30/13
Thu May 30 00:00:00 CST 2013
[root@ora10g ~]#
[root@ora10g ~]#
[root@ora10g ~]# date
Thu May 30 00:00:02 CST 2013
[root@ora10g ~]#
[root@ora10g ~]#
[root@ora10g ~]#
[root@ora10g ~]# date -s 17:28:30
Thu May 30 17:28:30 CST 2013
[root@ora10g ~]#
[root@ora10g ~]#
[root@ora10g ~]#
[root@ora10g ~]# date
Thu May 30 17:28:33 CST 2013
[root@ora10g ~]#
三、启动数据库后设置JOB
SQL>
----查看当前是否有job
SQL> set lines 200
SQL> col what for a40
SQL> col interval for a40
SQL> select job,
2 log_user,
3 to_char(last_date,'yyyy-mm-dd hh24:mi:ss') last_date,
4 to_char(next_date,'yyyy-mm-dd hh24:mi:ss') next_date,
5 total_time,
6 interval,
7 what
8 from user_jobs;
no rows selected
SQL>
SQL>
---设置JOB
SQL> VARIABLE jobno NUMBER;
SQL> BEGIN
2 DBMS_JOB.SUBMIT(:jobno, 'test_job;', SYSDATE, 'trunc(add_months(sysdate,1),''MM'')- 1');
3 COMMIT;
4 END;
5 /
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL>
等该JOB执行完后查看
SQL>
SQL> set lines 200
SQL> col what for a40
SQL> col interval for a40
SQL> select job,
2 log_user,
3 to_char(last_date,'yyyy-mm-dd hh24:mi:ss') last_date,
4 to_char(next_date,'yyyy-mm-dd hh24:mi:ss') next_date,
5 total_time,
6 interval,
7 what
8 from user_jobs;
JOB LOG_USER LAST_DATE NEXT_DATE TOTAL_TIME INTERVAL WHAT
---------- ------------------------------ ------------------- ------------------- ---------- ---------------------------------------- ----------------------------------------
23 HCN 2013-05-30 17:40:07 2013-05-31 00:00:00 300 trunc(add_months(sysdate,1),'MM')- 1 test_job;
SQL>
---计算出来的next_date为2013-05-31 00:00:00,这里能解释第一个问题为什么00:02分的时候就开始报错了
SQL>
SQL>
SQL> select to_char(T_TIME,'yyyy-mm-dd hh24:mi:ss') t_time,t_status from job_test;
T_TIME T_STATUS
------------------- ------------------------------
2013-05-30 17:40:07 Start
2013-05-30 17:45:07 Stop
SQL>
由于该Job的Interval设置的是trunc(add_months(sysdate,1),'MM')-1,所以会计算出NEXT_DATE的时间为最后一次JOB执行时那个月的最后一天的00:00,所以在最初看到alert里面报错的信息是从00:02开始报的。为什么查到的next_date不是00:02而是2013-05-31 17:02:11,继续看下面的内容。
四、关闭数据库,修改操作系统时间为当前的真实时间
[root@ora10g ~]#
[root@ora10g ~]# date -s 5/31/13
Fri May 31 00:00:00 CST 2013
[root@ora10g ~]# date -s 17:55:30
Fri May 31 17:55:30 CST 2013
[root@ora10g ~]#
[root@ora10g ~]#
[root@ora10g ~]#
[root@ora10g ~]# date
Fri May 31 17:55:31 CST 2013
[root@ora10g ~]#
五、启动库,查看job状态
SQL>
SQL> set lines 200
SQL> col what for a40
SQL> col interval for a40
SQL> select job,
2 log_user,
3 to_char(last_date,'yyyy-mm-dd hh24:mi:ss') last_date,
4 to_char(next_date,'yyyy-mm-dd hh24:mi:ss') next_date,
5 total_time,
6 interval,
7 what
8 from user_jobs;
JOB LOG_USER LAST_DATE NEXT_DATE TOTAL_TIME INTERVAL WHAT
---------- ------------------------------ ------------------- ------------------- ---------- ---------------------------------------- ----------------------------------------
23 HCN 2013-05-30 17:40:07 2013-05-31 18:00:36 300 trunc(add_months(sysdate,1),'MM')- 1 test_job;
SQL>
----这里的Next_DATE已经变为了2013-05-31 18:00:36这里,是我当前的时间
SQL>
SQL>
SQL> select to_char(T_TIME,'yyyy-mm-dd hh24:mi:ss') t_time,t_status from job_test;
no rows selected
SQL>
Fri May 31 17:58:36 CST 2013
Errors in file /u01/oracle/admin/hcndb/bdump/hcndb_j000_5226.trc:
ORA-12012: error on auto execute of job 23
ORA-12005: may not schedule automatic refresh for times in the past
[oracle@ora10g bdump]$
[oracle@ora10g bdump]$
[oracle@ora10g bdump]$
[oracle@ora10g bdump]$ cat /u01/oracle/admin/hcndb/bdump/hcndb_j000_5226.trc
/u01/oracle/admin/hcndb/bdump/hcndb_j000_5226.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/oracle/product/10.2.0/db_1
System name: Linux
Node name: ora10g.oracle.com
Release: 2.6.32-200.13.1.el5uek
Version: #1 SMP Wed Jul 27 21:02:33 EDT 2011
Machine: x86_64
Instance name: hcndb
Redo thread mounted by this instance: 1
Oracle process number: 27
Unix process pid: 5226, image: oracle@ora10g.oracle.com (J000)
*** SERVICE NAME:(SYS$USERS) 2013-05-31 17:58:36.349
*** SESSION ID:(158.9) 2013-05-31 17:58:36.349
*** 2013-05-31 17:58:36.349
ORA-12012: error on auto execute of job 23
ORA-12005: may not schedule automatic refresh for times in the past
[oracle@ora10g bdump]$
错误再现,O(∩_∩)O哈!。现在来介绍下第二个问题,为啥单独执行存储过程是对的,一用到job中就出问题了。
主要原因还是在Interval设置为trunc(add_months(sysdate,1),'MM')-1这个上,当通过Interval计算NEXT_DATE的时候,发现计算出来的值(2013-05-31 00:00:00)已经在当前时间之前(2013-05-31 18:00:36),此时就会报以下错误:
ORA-12012: error on auto execute of job 23
ORA-12005: may not schedule automatic refresh for times in the past
回到第一个问题:
Next Run Date是2013-05-31 17:02:11 ,为什么这里从凌晨00:02就可以报错了
为什么Next Run Date为显示为2013-05-31 17:02:11 ,后面的凌晨00:02执行报错的原因已经解释。
耐心地等待一段时间后,查看告警日志以及JOB的状态
SQL>
SQL> set lines 200
SQL> col what for a40
SQL> col interval for a40
SQL> select job,
2 log_user,
3 to_char(last_date,'yyyy-mm-dd hh24:mi:ss') last_date,
4 to_char(next_date,'yyyy-mm-dd hh24:mi:ss') next_date,
5 total_time,
6 interval,
7 failures,
8 what
9 from user_jobs;
JOB LOG_USER LAST_DATE NEXT_DATE TOTAL_TIME INTERVAL FAILURES WHAT
---------- ------------------------------ ------------------- ------------------- ---------- ---------------------------------------- ---------- ----------------------------------------
23 HCN 2013-05-30 17:40:07 2013-05-31 18:12:36 300 trunc(add_months(sysdate,1),'MM')- 1 3 test_job;
SQL>
SQL>
SQL>
Fri May 31 17:58:36 CST 2013
Completed: ALTER DATABASE OPEN
Fri May 31 17:58:36 CST 2013
db_recovery_file_dest_size of 2048 MB is 7.86% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri May 31 17:58:36 CST 2013
Errors in file /u01/oracle/admin/hcndb/bdump/hcndb_j000_5226.trc:
ORA-12012: error on auto execute of job 23
ORA-12005: may not schedule automatic refresh for times in the past
Fri May 31 18:00:36 CST 2013
Errors in file /u01/oracle/admin/hcndb/bdump/hcndb_j000_5245.trc:
ORA-12012: error on auto execute of job 23
ORA-12005: may not schedule automatic refresh for times in the past
Fri May 31 18:04:36 CST 2013
Errors in file /u01/oracle/admin/hcndb/bdump/hcndb_j000_5261.trc:
ORA-12012: error on auto execute of job 23
ORA-12005: may not schedule automatic refresh for times in the past
[oracle@ora10g bdump]$
[oracle@ora10g bdump]$
[oracle@ora10g bdump]$
SQL>
SQL> set lines 200
SQL> col what for a40
SQL> col interval for a40
SQL> select job,
2 log_user,
3 to_char(last_date,'yyyy-mm-dd hh24:mi:ss') last_date,
4 to_char(next_date,'yyyy-mm-dd hh24:mi:ss') next_date,
5 total_time,
6 interval,
7 failures,
8 broken,
9 what
10 from user_jobs;
JOB LOG_USER LAST_DATE NEXT_DATE TOTAL_TIME INTERVAL FAILURES B WHAT
---------- ------------------------------ ------------------- ------------------- ---------- ---------------------------------------- ---------- - ----------------------------------------
23 HCN 2013-05-30 17:40:07 2013-05-31 18:28:36 300 trunc(add_months(sysdate,1),'MM')- 1 4 N test_job;
SQL>
SQL>
alert日志中,会发现job的错误一直在持续,查看JOB状态后,发现FAILURES次数为3和FAILURES次数为4的时候,NEXT_DATE的值从2013-05-31 18:12:36变为了2013-05-31 18:28:36,当JOB执行失败后,还在反复的重复尝试,随着尝试次数的增加,NEXT_DATE的时间也开始呈现增长的趋势。主要的原因是当一个job运行时返回一个error是,oracle会再次设法执行它,第一次尝试是在一分钟后,第二次是在2分钟后,第三次是4分钟后,以此类推。
每次尝试的周期加倍,直到下一次的运行时刻来到,又再次尝试,当尝试执行失败16次时,oracle自动标示此job为broken,不再执行此任务。
而且从最开始的告警日志中也能从报错的时间上来发现这个趋势。
原因找到,下面就是如何避免该问题,主要的方法还是通过修改INTERVAL的值来进行,此次不说明。感谢朋友提供的案例,O(∩_∩)O哈!




