
Connecting GBase (南大通用) 8a UP to Transwarp TDH

Original article by wanghongyu, 2023-12-21


1 User/Password Authentication Connection

1.1 Configure Transwarp TDH

1.1.1 Modify the parameter (restart required)

Set hadoop_security.authentication.oauth2.enable to false

1.1.2 Create the data exchange directory and set a space quota

hdfs dfs -mkdir /tmp/gbase8up

hdfs dfs -chmod 777 /tmp/gbase8up

hdfs dfsadmin -setSpaceQuota 10G /tmp/gbase8up
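To confirm the directory and its quota, hdfs dfs -count -q prints the quota and the remaining quota for a path:

hdfs dfs -count -q /tmp/gbase8up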

1.1.3 Provide the authentication user and password

Provide the username and password needed for the connection, and grant them the privileges to create databases, create tables, and run insert/delete/update/select (a sketch of granting these follows below).
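How the privileges are granted depends on the authorization mode configured on the Transwarp side; a minimal sketch, assuming SQL-standard GRANT syntax is enabled in Inceptor and using a hypothetical database name gbase_test:

beeline

> create database if not exists gbase_test;

> grant all on database gbase_test to user ndty01;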

1.2 Operating System Configuration

The Transwarp client must be installed on every management node of the GBase UP cluster. Note that the client differs depending on whether Kerberos is enabled; download the matching package from the Transwarp TDH management page.

  ① Install and configure the Java environment

  ② Configure yum

  ③ Unpack the TDH client package and run the one-click installation script as root

source init.sh

  ④ Add source ..../init.sh to the gbase user's environment file (.bash_profile).

  ⑤ Verify that the beeline client can connect to Transwarp TDH (a quick query after connecting is shown below)

su - gbase

beeline

 

!connect jdbc:hive2://tdh01:10000
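Once connected, any simple read-only statement confirms the session is usable, for example:

> show databases;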

⑥ Verify the hdfs commands

hdfs dfs -ls /tmp

hdfs dfs -touchz /tmp/gbase8up/1.txt

1.3 Configure GBase UP

1.3.1 Modify the configuration file gbase_8a_gcluster.cnf (cluster restart required)

[hive]
# Modify the following configuration according to the actual environment.
hive_hdfs_user_name=ndty01   ### authentication user
# HA ZooKeeper connection
hive_zookeeper_host=tdh01:2181,tdh02:2181,tdh03:2181   ## ZooKeeper hosts
# HA namenode
hive_hdfs_cluster_name=nameservice1       ## Hadoop cluster (nameservice) name

# The NameNode's IP.
hive_hdfs_server_name=tdh03        ## NameNode IP; comma-separated if multiple (if errors occur, list only one)
# The port of the HDFS server.
hive_hdfs_server_port=8020 ## HDFS port

# The HBase Master's IP.
hive_hbase_server_name=n35

# The table that stores Blob data.
hive_hbase_table_name=HbaseStream

# The local directory that stores the BlobCache.
hive_blob_cache_path=/opt/gcluster/userdata/blob_cache

# The HDFS directory that stores Blob data.
hive_hdfs_blob_path=/blobstorage

# During data exchange, the minimum HDFS space must be >1G. Default is 10G; values without a unit are in bytes.
hive_hdfs_tmpspace_limit=10G

# During data exchange, the minimum Hive space must be 1G-100G. Default is 10G; values without a unit are in bytes.
hive_space_limit=10G

# During data exchange, the minimum gnode space must be >1G. Default is 10G; values without a unit are in bytes.
hive_gcluster_space_limit=10G

# The maximum number of connections; range 1-64, default 5.
hive_max_connections=5

# The number of connections kept in the pool; range 0-64, default 3.
hive_max_connections_in_pool=2

# The number of retries when getting a connection fails; range 1-10, default 5.
hive_max_try_connect_times=5

# The column family name of the table that stores Blob data.
hive_hbase_col_family_name=file

# The port of the HBase server.
hive_hbase_server_port=9090

# The maximum number of connections to HBase; range 1-64, default 5.
hive_hbase_max_connections=5

# The maximum length of a single Blob; range 2M-16M, default 16M; values without a unit are in bytes.
# If a Blob exceeds 16M, it is stored in HDFS instead.
hive_blob_max_size_in_hbase=16M

# The maximum length of a single Blob stored in HDFS; range 1G-1T, default 16G; values without a unit are in bytes.
hive_blob_max_size_in_hdfs=16G

# The maximum local-disk space for the BlobCache; range 1G-16G, default 8G; values without a unit are in bytes.
hive_blob_cache_space_limit=8G

# Range: 0 or 1.
# 0: Send Blob data stored in HDFS directly to the client.
# 1: First get the URI from 8a MPP, then fetch the Blob data by URI, and finally send it to the client.
hive_blob_send_from_uri=1
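After saving the file, restart the cluster so the [hive] section is loaded; a minimal sketch, assuming the standard GBase 8a cluster service script gcluster_services is available on the management node:

gcluster_services all restart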

 

1.3.2 Add the Engine Instance

install plugin hive soname 'ha_hive.so';

select create_engine_instance('hive','inst1','param://hive?hostlist=tdh01?port=10000?thriftversion=6?thrifttimeout=60?authmode=1?authtype=PLAIN?mechanism=PLAIN?user=ndty01?password=ndty01');
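Once the plugin is installed and the instance is created, Hive-side tables can be attached from gccli with engine=hive.inst1; the full workflow, including the ORC requirement for writable tables, is shown in section 3, but the basic mapping statement looks like:

gccli

> create table if not exists t_hive(a int) engine=hive.inst1;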

1.3.3 Set the RPC Protocol

set global gbase_hdfs_protocol=RPC;

-- set global gbase_hdfs_namenodes=tdh02:8020,tdh03:8020;

set global gbase_hdfs_port=8020;
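These settings can be verified right after they are changed; a quick sanity check, assuming MySQL-style SHOW VARIABLES is available as in GBase 8a:

show variables like 'gbase_hdfs%';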

 

2 Kerberos Authentication Connection

2.1 Configure Transwarp TDH

2.1.1 Modify the Transwarp parameter (restart required)

Set hadoop_security.authentication.oauth2.enable to false

2.1.2 Create the data exchange directory and set a space quota

hdfs dfs -mkdir /tmp/gbase8up

hdfs dfs -chmod 777 /tmp/gbase8up

hdfs dfsadmin -setSpaceQuota 10G /tmp/gbase8up

2.1.3 Provide the authentication user and keytab file

Provide the user and keytab file needed for the connection, and grant them the privileges to create databases, create tables, and run insert/delete/update/select (see the GRANT sketch in 1.1.3).

2.2 Operating System Configuration

2.2.1 Install the Transwarp Client

The Transwarp client must be installed on every management node of the GBase UP cluster. Note that the client differs depending on whether Kerberos is enabled; download the matching package from the Transwarp TDH management page.

  ① Install and configure the Java environment

  ② Configure yum

  ③ Unpack the TDH client package and run the one-click installation script as root

source init.sh

  ④ Initialize the keytab, put the kinit command directly into the gbase user's environment file (.bash_profile), and copy the keytab file to the same directory on every node (ticket verification with klist is shown after this list)

  kinit -kt /opt/gbase/<keytab file> <username>

  ⑤ Add source ..../init.sh to the gbase user's environment file (.bash_profile).

  ⑥ As the gbase user, verify that the beeline client can connect to Transwarp TDH, and verify the privileges to create databases and tables

  su - gbase

beeline

!connect jdbc:hive2://tdh01:10000/default;principal=hive/tdh01@TDH

⑦ Verify the hdfs read and write commands

hdfs dfs -ls /tmp

hdfs dfs -put 1.txt /tmp/gbase8up/
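The hdfs and beeline commands above succeed only while the Kerberos ticket obtained in step ④ is valid; klist, the standard Kerberos client tool, shows the cached principal and its expiry time:

klist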

2.2.2 Configure the gbase User's Environment Variables

① Create the directory

mkdir -p /home/gbase/tmp

② Add the environment variables

Edit /home/gbase/.bash_profile and add:

export KRB5CCNAME=/home/gbase/tmp/krb5cache

export KRB5CONF=/etc/krb5.conf
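After logging in again as gbase, the variables can be checked and the ticket cache recreated at the custom location; a minimal sketch, using the user and keytab names that appear elsewhere in this document:

su - gbase

echo $KRB5CCNAME     # should print /home/gbase/tmp/krb5cache

kinit -kt /opt/ndty01.keytab ndty01

klist                # the ticket cache path should match KRB5CCNAME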

2.2.3 Keytab File Handling

Run klist -kt /opt/hdfs.headless.keytab to obtain the principal name.
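The principal names appear in the Principal column of the klist -kt output; the matching entry is then used with kinit and as gbase_hdfs_principal in 2.3.1. A sketch (the placeholder must be replaced with a real value from the output):

klist -kt /opt/hdfs.headless.keytab

kinit -kt /opt/hdfs.headless.keytab <principal from the output>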

 

2.3 Configure GBase UP

2.3.1 Modify the configuration file gbase_8a_gbase.cnf (cluster restart required)

[gbased]

........

gbase_hdfs_auth_mode=kerberos

gbase_hdfs_protocol=RPC

gbase_hdfs_keytab=/opt/ndty01.keytab

#gbase_hdfs_namenodes=tdh02:8020,tdh03:8020

_gbase_hdfs_rpcconfig='dfs.block.access.token.enable=true'

gbase_hdfs_port=8020

gbase_hdfs_principal=ndty01@TDH

2.3.2 Add the Engine Instance

install plugin hive soname 'ha_hive.so';

select create_engine_instance('hive','inst1','param://hive?hostlist=tdh01?port=10000?thriftversion=6?thrifttimeout=60?authmode=1?authtype=KERBEROS?mechanism=GSSAPI?principal=ndty01@TDH?keytabfile=/opt/ndty01.keytab?serverid=tdh01?protocol=hive?renewtime=7');

2.3.3 Modify the configuration file gbase_8a_gcluster.cnf (cluster restart required)

[gbased]

........

gbase_hdfs_auth_mode=kerberos

gbase_hdfs_protocol=RPC

gbase_hdfs_keytab=/opt/ndty01.keytab

#gbase_hdfs_namenodes=tdh02:8020,tdh03:8020

_gbase_hdfs_rpcconfig='dfs.block.access.token.enable=true'

gbase_hdfs_port=8020

gbase_hdfs_principal=ndty01@TDH

.......

[hive]

# The [hive] section is identical to the one shown in full in 1.3.1 above.

 

3 Notes

1. GBase UP does not support LDAP authentication; only no-authentication, username/password, and Kerberos connections are supported.

2. A table created by GBase UP directly in Transwarp (a text table) cannot be INSERTed into. First create an ORC table through beeline, then map it through the gccli command line with create table if not exists:

beeline

> create table if not exists t_hive(a int) clustered by (a) into 4 buckets stored as orc tblproperties ("transactional"="true");

gccli

> create table if not exists t_hive(a int) engine=hive.inst1;

3. When synchronizing metadata, every column must be declared NOT NULL; gccli syntax does not accept a DEFAULT clause, and a column without NOT NULL is automatically treated as DEFAULT NULL (see the sketch after this list).

4. gccli cannot create Hive tables in ORC, Holodesk, Parquet, or CSV format; the syntax is rejected.
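A minimal sketch of note 3, using a hypothetical table t_demo: every column carries an explicit NOT NULL, since a bare column would be mapped as DEFAULT NULL and an explicit DEFAULT clause is rejected by gccli:

gccli

> create table if not exists t_demo(a int not null, b varchar(20) not null) engine=hive.inst1;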
