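Problem Description
This was a customer of a friend's company, and I helped analyze the issue as a favor. The customer runs Oracle 12c (12.1.0.1). Connections from the application through JDBC were sometimes fast and sometimes slow, while access through sqlplus was always normal.
At first I suspected a firewall problem, but the firewall turned out to be disabled already. I even disabled SELinux (SELINUX=disabled), and the problem persisted.
I had handled several similar cases before, all caused by the JDBC driver version, so I first asked them to test with a few different JDBC versions. The problem persisted, with results like the following: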
[oracle@12c_single ~]$ /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101.jar
2013-11-24 18:19:20
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 18:19:21
[oracle@12c_single ~]$ /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101.jar
2013-11-24 18:19:23
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 18:19:43
[oracle@12c_single ~]$ /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101.jar
2013-11-24 18:19:48
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 18:19:56
[oracle@12c_single ~]$ /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101.jar
2013-11-24 18:19:59
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 18:20:17
Expert Answer
Later, tracing with strace turned up some clues. The trace output follows:
strace -fr /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101.jar
.....
[pid 17242] 0.000142 stat("/dev/random", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
[pid 17242] 0.000084 stat("/dev/urandom", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
[pid 17242] 0.000376 lseek(3, 46548757, SEEK_SET) = 46548757
[pid 17242] 0.000037 read(3, "PK\3\4\n\0\0\0\0\0\201^8A\275}\357^K\16\0\0K\16\0\0/\0\0\0", 30) = 30
[pid 17242] 0.000046 lseek(3, 46548834, SEEK_SET) = 46548834
[pid 17242] 0.000032 read(3, "\312\376\272\276\0\0\0001\0\245\10\0\r\10\0\22\10\0)\10\0.\10\0:\10\0@\1\0\3("..., 3659) = 3659
[pid 17242] 0.000275 open("/dev/random", O_RDONLY) = 10
[pid 17242] 0.000042 fstat(10, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
[pid 17242] 0.000033 fcntl(10, F_GETFD) = 0
[pid 17242] 0.000023 fcntl(10, F_SETFD, FD_CLOEXEC) = 0
[pid 17242] 0.000055 open("/dev/urandom", O_RDONLY) = 11
[pid 17242] 0.000032 fstat(11, {st_mode=S_IFCHR|0666, st_rde
........
[pid 17250] 0.000044 futex(0x7fb5ec0b9528, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17250] 0.000058 futex(0x40110734, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1385288524, 652223000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 17250] 0.050426 futex(0x7fb5ec0b9528, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17250] 0.000060 futex(0x40110734, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1385288524, 702709000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 17250] 0.050573 futex(0x7fb5ec0b9528, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17250] 0.000044 futex(0x40110734, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1385288524, 753331000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 17250] 0.050433 futex(0x7fb5ec0b9528, FUTEX_WAKE_PRIVATE, 1) = 0
........
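The futex calls here keep timing out, for a bit over 10 seconds in total. The open("/dev/random", O_RDONLY) call caught my attention; a Google search showed that quite a few people have run into this.
Starting with 11g, Oracle tightened security on the JDBC side. The explanation goes roughly like this: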
The JDBC 11g needs about 40 bytes of secure random numbers, gathered from /dev/random, to encrypt its connect string.
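To see whether seed gathering is what stalls on a given machine, independent of Oracle, one can time a request for the same amount of seed material. This is only a minimal sketch: the class name SeedTiming and its helper are hypothetical, and whether the JDBC driver internally calls SecureRandom.generateSeed in exactly this way is an assumption; the 40-byte figure comes from the quote above.

```java
import java.security.SecureRandom;

public class SeedTiming {

    // Requests numBytes of seed material and returns the elapsed time in milliseconds.
    // On a host that is short of entropy, this call may block if the JVM reads /dev/random.
    static long timeSeedMillis(int numBytes) {
        long start = System.nanoTime();
        byte[] seed = new SecureRandom().generateSeed(numBytes);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        if (seed.length != numBytes) {
            throw new IllegalStateException("unexpected seed length: " + seed.length);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Repeat a few times, since the delay in the runs above was intermittent.
        for (int i = 0; i < 3; i++) {
            System.out.println("40 seed bytes took " + timeSeedMillis(40) + " ms");
        }
    }
}
```

If these timings swing between milliseconds and many seconds in the same way the jar does, the delay is most likely in entropy gathering rather than in the network or the database.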
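The fix, then, is to change the following entry in the java.security file under JAVA_HOME:
securerandom.source=file:/dev/random
to:
securerandom.source=file:/dev/urandom
When I checked the customer's system, however, the file already had securerandom.source=file:/dev/urandom. At that point I began to suspect the JDBC version again, or 12c itself.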
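Reading the file by eye does not show what the running JVM actually picked up, so it can help to print the effective values from inside the same Java installation that runs the application. A small sketch, assuming nothing beyond the standard library; the class name EntropyConfig is hypothetical.

```java
import java.security.Security;

public class EntropyConfig {

    // Returns the two settings that influence where the JVM gathers seed material:
    // the securerandom.source security property (from java.security) and the
    // java.security.egd system property (from -D or System.setProperty).
    static String describe() {
        String fromSecurityFile = Security.getProperty("securerandom.source");
        String fromSystemProperty = System.getProperty("java.security.egd");
        return "securerandom.source=" + fromSecurityFile
                + ", java.security.egd=" + fromSystemProperty;
    }

    public static void main(String[] args) {
        // java.home shows which installation's java.security file is in play.
        System.out.println("java.home=" + System.getProperty("java.home"));
        System.out.println(describe());
    }
}
```

Running it with the same java binary as the application (here /oracle/product/12.1/db_1/jdk/bin/java) makes sure the values come from the installation under test.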
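I copied the customer's jar to my own 12.1.0.1 and 12.1.0.2 environments and tested it there. The behavior was the same: sometimes fast, sometimes slow.
An strace run showed output similar to what I had seen in the customer's environment, which was very odd.
I then suspected that the configuration file was not actually taking effect. A search on MOS finally turned up this note: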
How To Configure Database JVM (JavaVM) To Use /dev/urandom (In Order To Avoid JDBC Connection Delays Due To Lack Of Random Number Entropy) (ID 1594701.1)
Its suggestion is to set the property directly, as follows:
1) Set a system property in your Java code:

   System.setProperty("java.security.egd", "file:///dev/urandom"); // the 3 '/' are important to make it a URL

2) Set a database system property:

   declare
     c_property varchar2(32767);
   begin
     c_property := dbms_java.set_property('java.security.egd', '/dev/urandom');
   end;

   The database user needs a privilege in order to execute this call:

   begin
     dbms_java.grant_permission(
       grantee => '{Database Schema}',
       permission_type => 'SYS:java.util.PropertyPermission',
       permission_name => 'java.security.egd',
       permission_action => 'read,write'
     );
   end;
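For a standalone client, option 1 could be applied roughly as below. This is a sketch, not the code inside the test jar: the class name UrandomSetup and the method useUrandom are hypothetical, and the assumption is that the property has to be set before the JVM first gathers seed material, so the call is placed at the very start of main.

```java
import java.security.SecureRandom;

public class UrandomSetup {

    static final String EGD_PROPERTY = "java.security.egd";
    // URL form taken from the MOS note quoted above (three slashes).
    static final String URANDOM_URL = "file:///dev/urandom";

    // Points the JVM's seed source at /dev/urandom.
    static void useUrandom() {
        System.setProperty(EGD_PROPERTY, URANDOM_URL);
    }

    public static void main(String[] args) {
        useUrandom(); // before any SecureRandom use or JDBC connection attempt

        // Stand-in for the JDBC connect: time a 40-byte seed request instead.
        long start = System.nanoTime();
        byte[] seed = new SecureRandom().generateSeed(40);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(seed.length + " seed bytes in " + elapsedMs + " ms");
    }
}
```

The same system property can also be passed on the command line with -Djava.security.egd=file:///dev/urandom, which avoids changing and rebuilding the application.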
After making the change described in this note, I tested again and everything was back to normal. The test run:
[oracle@12c_single ~]$ /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101_new.jar
2013-11-24 19:06:41
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 19:06:42
[oracle@12c_single ~]$ /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101_new.jar
2013-11-24 19:06:43
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 19:06:43
[oracle@12c_single ~]$ /oracle/product/12.1/db_1/jdk/bin/java -jar a16_12101_new.jar
2013-11-24 19:06:45
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 19:06:45
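The case itself is not complicated; I'm sharing it for reference, and discussion is welcome!
Note: it is best to use the Java that ships with Oracle so that the versions stay consistent. In my tests, running with the OS's own, older Java still gave slow connections.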
[oracle@12c_single ~]$ java -jar a16_12101_new.jar
2013-11-24 19:09:20
start: ?????? ?????? init ?????? ?????? init end.
2013-11-24 19:09:46
[oracle@12c_single ~]$ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.4) (rhel-1.41.1.10.4.el6-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
This version is clearly below what the official Oracle 12.1.0.1 documentation requires, which is 1.6.0_37 or later.
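A client could guard against this by checking the running JVM's version at startup. The sketch below is illustrative only: the class JavaVersionCheck and its helpers are hypothetical, the 1.6.0_37 minimum is the figure cited above, and the parser assumes version strings made of numeric parts separated by '.' or '_', with an optional '-' suffix.

```java
public class JavaVersionCheck {

    // Splits a version such as "1.6.0_22" into its numeric parts: {1, 6, 0, 22}.
    // Anything after a '-' (for example "-ea") is ignored.
    static int[] parse(String version) {
        String core = version.split("-")[0];
        String[] parts = core.split("[._]");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Returns true when actual is the same as or newer than required.
    // Missing trailing parts are treated as 0.
    static boolean isAtLeast(String actual, String required) {
        int[] a = parse(actual);
        int[] r = parse(required);
        int n = Math.max(a.length, r.length);
        for (int i = 0; i < n; i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < r.length ? r[i] : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println("java.version=" + running
                + ", meets 1.6.0_37: " + isAtLeast(running, "1.6.0_37"));
    }
}
```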