load data命令将 OBServer 本地文件系统上的 CSV 文件加载到数据库中,在 4.3.0 版本开始支持客户端加载文件命令
load local data。现在有了外部表后,就不用使用这个加载命令了,可以直接将数据通过外部表读取到 OB 的表中。
secure_file_priv为文件所在目录或父目录。为了控制风险,这个变量的修改只能在 OBServer 节点上通过 socket 文件直连 OB 租户时才能修改。
obclient -S /home/admin/oceanbase/run/sql.sock -usys@oboracle -paaAA11__ -c -A tpch
Welcome to the OceanBase. Commands end with ; or \g.
Your OceanBase connection id is 3221490174
Server version: OceanBase 4.3.2.1 (r101010012024091010-76d71bcca930ac06c8f550ed3827136738e6b81f) (Built Sep 10 2024 10:56:57)
Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
obclient [TPCH]> set global secure_file_priv='/data/1';
Query OK, 0 rows affected (0.060 sec)
obclient [TPCH]> show global variables like '%secure%';
+------------------+---------+
| VARIABLE_NAME | VALUE |
+------------------+---------+
| secure_file_priv | /data/1 |
+------------------+---------+
1 row in set (0.003 sec)
/data/1/tpch/4.2.0/dbgen/s100[root@server065 s100]# ls -lrth customer.tbl
-rw-r--r-- 1 admin admin 2.3G Sep 28 15:09 customer.tbl
[root@server065 s100]# wc -l customer.tbl
15000000 customer.tbl
CREATE EXTERNAL TABLE ext_customer (
c_custkey number ,
c_name varchar(25) ,
c_address varchar(40) ,
c_nationkey number ,
c_phone char(15) ,
c_acctbal number ,
c_mktsegment char(10) ,
c_comment varchar(117)
)
LOCATION = '/data/1/tpch/4.2.0/dbgen/s100'
FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = '|'
)
PATTERN = 'customer.tbl'
;
obclient [TPCH]> CREATE EXTERNAL TABLE ext_customer (
-> c_custkey number ,
-> c_name varchar(25) ,
-> c_address varchar(40) ,
-> c_nationkey number ,
-> c_phone char(15) ,
-> c_acctbal number ,
-> c_mktsegment char(10) ,
-> c_comment varchar(117)
-> )
-> LOCATION = '/data/1/tpch/4.2.0/dbgen/s100'
-> FORMAT = (
-> TYPE = 'CSV'
-> FIELD_DELIMITER = '|'
-> )
-> PATTERN = 'customer.tbl'
-> PARTITION BY HASH(c_custkey) PARTITIONS 4;
ORA-00600: internal error code, arguments: -4007, partition col expr contains non metadata$fileurl pseudo column not supported
obclient [TPCH]> show create table ext_customer\G
*************************** 1. row ***************************
TABLE: EXT_CUSTOMER
CREATE TABLE: CREATE EXTERNAL TABLE "EXT_CUSTOMER" (
"C_CUSTKEY" NUMBER GENERATED ALWAYS AS (METADATA$FILECOL1),
"C_NAME" VARCHAR2(25) GENERATED ALWAYS AS (METADATA$FILECOL2),
"C_ADDRESS" VARCHAR2(40) GENERATED ALWAYS AS (METADATA$FILECOL3),
"C_NATIONKEY" NUMBER GENERATED ALWAYS AS (METADATA$FILECOL4),
"C_PHONE" CHAR(15) GENERATED ALWAYS AS (METADATA$FILECOL5),
"C_ACCTBAL" NUMBER GENERATED ALWAYS AS (METADATA$FILECOL6),
"C_MKTSEGMENT" CHAR(10) GENERATED ALWAYS AS (METADATA$FILECOL7),
"C_COMMENT" VARCHAR2(117) GENERATED ALWAYS AS (METADATA$FILECOL8)
)
LOCATION='file:///data/1/tpch/4.2.0/dbgen/s100'
PATTERN='customer.tbl'
AUTO_REFRESH = OFF
FORMAT (
TYPE = 'CSV',
FIELD_DELIMITER = '|',
ENCODING = 'utf8mb4'
)
1 row in set (0.008 sec)
AUTO_REFRESH就是用于控制外部数据文件内容发生变化时,读取表的时候结果是否立即刷新。这个刷新肯定是有代价的,就看业务场景的需要。也可以设置定时刷新间隔(单位秒)或手动刷新。
obclient [TPCH]> call dbms_external_table.auto_refresh_external_table(60);
Query OK, 0 rows affected (0.162 sec)
obclient [TPCH]> alter external table ext_customer refresh;
Query OK, 0 rows affected (0.032 sec)
obclient [TPCH]> set session ob_enable_show_trace=on;
Query OK, 0 rows affected (0.001 sec)
obclient [TPCH]> select /*+ parallel(4) */ count(*) from ext_customer;
+----------+
| COUNT(*) |
+----------+
| 15000000 |
+----------+
1 row in set (17.634 sec)
obclient [TPCH]> show trace;
+-----------------------------------------------------------+---------------------+--------------+
| OPERATION | START_TIME | ELAPSE_TIME |
+-----------------------------------------------------------+---------------------+--------------+
| obclient | 2024-09-29 20:02:03 | 17634.285 ms |
| ├── com_query_process | 2024-09-29 20:02:03 | 0.257 ms |
| │ └── mpquery_single_stmt | 2024-09-29 20:02:03 | 0.240 ms |
| │ └── sql_compile | 2024-09-29 20:02:03 | 0.173 ms |
| │ └── pc_get_plan | 2024-09-29 20:02:03 | 0.073 ms |
| └── com_query_process | 2024-09-29 20:02:03 | 17633.409 ms |
| └── mpquery_single_stmt | 2024-09-29 20:02:03 | 17633.384 ms |
| ├── sql_compile | 2024-09-29 20:02:03 | 0.108 ms |
| │ └── pc_get_plan | 2024-09-29 20:02:03 | 0.043 ms |
| └── sql_execute | 2024-09-29 20:02:03 | 17624.122 ms |
| ├── open | 2024-09-29 20:02:03 | 0.115 ms |
| ├── response_result | 2024-09-29 20:02:03 | 17623.880 ms |
| │ ├── px_schedule | 2024-09-29 20:02:03 | 10.621 ms |
| │ │ ├── inner_execute_read | 2024-09-29 20:02:03 | 0.374 ms |
| │ │ │ ├── sql_compile | 2024-09-29 20:02:03 | 0.095 ms |
| │ │ │ │ └── pc_get_plan | 2024-09-29 20:02:03 | 0.046 ms |
| │ │ │ ├── open | 2024-09-29 20:02:03 | 0.019 ms |
| │ │ │ └── do_local_das_task | 2024-09-29 20:02:03 | 0.061 ms |
| │ │ ├── close | 2024-09-29 20:02:03 | 0.043 ms |
| │ │ │ ├── close_das_task | 2024-09-29 20:02:03 | 0.012 ms |
| │ │ │ └── end_transaction | 2024-09-29 20:02:03 | 0.005 ms |
| │ │ └── px_task | 2024-09-29 20:02:03 | 17613.154 ms |
| │ │ ├── do_local_das_task | 2024-09-29 20:02:03 | 0.047 ms |
| │ │ └── close_das_task | 2024-09-29 20:02:21 | 0.007 ms |
| │ └── px_schedule | 2024-09-29 20:02:21 | 0.002 ms |
| └── close | 2024-09-29 20:02:21 | 0.080 ms |
| └── end_transaction | 2024-09-29 20:02:21 | 0.016 ms |
+-----------------------------------------------------------+---------------------+--------------+
27 rows in set (0.013 sec)
obclient [TPCH]> explain select /*+ parallel(4) */ count(*) from ext_customer;
+--------------------------------------------------------------------------------------------------------------+
| Query Plan |
+--------------------------------------------------------------------------------------------------------------+
| ===================================================================== |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| --------------------------------------------------------------------- |
| |0 |SCALAR GROUP BY | |1 |400456 | |
| |1 |└─PX COORDINATOR | |4 |400455 | |
| |2 | └─EXCHANGE OUT DISTR |:EX10000 |4 |400454 | |
| |3 | └─MERGE GROUP BY | |4 |400454 | |
| |4 | └─PX BLOCK ITERATOR | |100000 |400000 | |
| |5 | └─EXTERNAL TABLE SCAN|EXT_CUSTOMER|100000 |400000 | |
| ===================================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), rowset=256 |
| group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]) |
| 1 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256 |
| 2 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256 |
| dop=4 |
| 3 - output([T_FUN_COUNT(*)]), filter(nil), rowset=256 |
| group(nil), agg_func([T_FUN_COUNT(*)]) |
| 4 - output(nil), filter(nil), rowset=256 |
| 5 - output(nil), filter(nil), rowset=256 |
| access(nil), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([EXT_CUSTOMER.__file_id], [EXT_CUSTOMER.__line_number]), range(MIN,MIN ; MAX,MAX)always true |
+--------------------------------------------------------------------------------------------------------------+
24 rows in set (0.004 sec)
obclient [TPCH]> insert /*+ parallel(4) enable_parallel_dml append */ into customer select * from ext_customer;
Query OK, 15000000 rows affected (2 min 17.709 sec)
Records: 15000000 Duplicates: 0 Warnings: 0
AK和
SK)。
AK:Access Key ID,接入键标识,用户在OBS系统中的接入键标识,一个接入键标识唯一对应一个用户,一个用户可以同时拥有多个接入键标识。OBS系统通过接入键标识识别访问系统的用户。 SK:Secret Access Key,安全接入键,用户在OBS系统中的安全接入键,是用户访问OBS系统的密钥,用户根据安全接入键和请求头域生成鉴权信息。安全接入键和接入键标识一一对应。

obsutil(阿里云用
ossutil)。下载工具后首先就是配置 OBS 连接。
obsutil config -i=OEKL4************* -k=1K6F******************************* -e=https://obs.cn-north-4.myhuaweicloud.com
[root@server065 ~]# obsutil ls -a
Start at 2024-09-29 10:10:19.518191331 +0000 UTC
Bucket CreationDate Location BucketType
obs://obpilot 2024-09-29T09:02:39Z cn-north-4 OBJECT
Bucket number: 1
[root@server065 ~]# obsutil ls -a obs://obpilot
Start at 2024-09-29 10:10:25.393991546 +0000 UTC
Listing objects .
Folder list:
obs://obpilot/obdata/
obs://obpilot/obdata/tpch/
Object list:
key LastModified Size StorageClass ETag
obs://obpilot/obdata/tpch/customer.tbl 2024-09-29T09:15:56Z 2.29GB standard "73611844e25e28079de3c719befa759f-294"
Total size of bucket: 2.29GB
Folder number: 2
File number: 1
Listing multipart uploads [0].
Upload list:
Key Initiated StorageClass UploadId
obs://obpilot/obdata/tpch/customer.tbl 2024-09-29T09:08:37Z standard 000001923D09FCC6D247B45D48EC0876
Folder number: 0
Upload number: 1
create external table ext_obs_customer (
c_custkey number ,
c_name varchar(25) ,
c_address varchar(40) ,
c_nationkey number ,
c_phone char(15) ,
c_acctbal number ,
c_mktsegment char(10) ,
c_comment varchar(117)
)
format (type = 'CSV')
location 's3://OEKL4************:1K6Fs***********************************@obs.cn-north-4.myhuaweicloud.com/obpilot/obdata/tpch/'
pattern = 'customer.tbl'
;
obclient [TPCH]> select count(*) from ext_obs_customer;
ORA-00600: internal error code, arguments: -9071, no I/O operation permission at the backup destination
obclient [TPCH]> select last_trace_id() from dual;
+-----------------------------------+
| LAST_TRACE_ID() |
+-----------------------------------+
| YB420A000041-000623254B352C3F-0-0 |
+-----------------------------------+
1 row in set (0.002 sec)
obclient [TPCH]> select count(*) from ext_obs_customer;select last_trace_id() from dual;
+----------+
| COUNT(*) |
+----------+
| 15000000 |
+----------+
1 row in set (8 min 37.425 sec)
+-----------------------------------+
| LAST_TRACE_ID() |
+-----------------------------------+
| YB420A000041-0006232554452BFB-0-0 |
+-----------------------------------+
1 row in set (0.000 sec)


更多阅读参考:
文章转载自数据库技术闲谈,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




