
71. DolphinScheduler Resource Center Configuration Details

大数据技能圈 2023-04-18

1. Resource Center Configuration Details

The Resource Center is typically used for uploading files, managing UDF functions, task group management, and similar operations.


The Resource Center can connect to distributed file storage systems such as Hadoop (2.6+) or a MinIO cluster, and it can also connect to remote object storage such as AWS S3 or Alibaba Cloud OSS.


The Resource Center can also connect directly to the local file system. In standalone mode, you can conveniently try it out against the local file system without depending on an external storage system such as Hadoop or S3.


In addition, for cluster-mode deployments, you can mount S3 locally with S3FS-FUSE, or mount OSS locally with JINDO-FUSE, and then have the Resource Center operate on files in the remote object storage through its local file system mode.
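A minimal S3FS-FUSE sketch of that approach follows; the bucket name, mount point, credentials, and MinIO-style endpoint are illustrative assumptions, not fixed values:

    # write s3fs credentials as ACCESS_KEY:SECRET_KEY (placeholder values)
    echo 'minioadmin:minioadmin' > ${HOME}/.passwd-s3fs
    chmod 600 ${HOME}/.passwd-s3fs

    # mount the bucket "dolphinscheduler" at /data/dolphinscheduler;
    # -o url points at an S3-compatible endpoint (here a local MinIO)
    mkdir -p /data/dolphinscheduler
    s3fs dolphinscheduler /data/dolphinscheduler \
        -o passwd_file=${HOME}/.passwd-s3fs \
        -o url=http://localhost:9000 \
        -o use_path_request_style

    # then point the Resource Center at the mount in common.properties:
    # resource.storage.upload.base.path=/data/dolphinscheduler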


1.1 Connecting to the Local File System

1.1.1 Configuring the common.properties File

The DolphinScheduler Resource Center uses the local file system by default, so no extra configuration is required. However, if you need to change the default configuration, make sure to complete all of the following modifications together.

If you deploy DolphinScheduler in cluster or pseudo-cluster mode, you need to configure the files at the following paths:

api-server/conf/common.properties and worker-server/conf/common.properties

If you deploy DolphinScheduler in standalone mode, you only need to configure standalone-server/conf/common.properties.

The following changes may be involved:

Change resource.storage.upload.base.path to your local storage path, and make sure the user deploying DolphinScheduler has read and write permissions on it, for example: resource.storage.upload.base.path=/tmp/dolphinscheduler. If the path does not exist, the directory is created automatically.
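A minimal sketch for preparing that directory ahead of time; the user/group name dolphinscheduler is an assumption, so substitute your actual deploy user:

    # pre-create the base path and give the deploy user read/write access
    # (the user/group "dolphinscheduler" is an assumption)
    mkdir -p /tmp/dolphinscheduler
    chown -R dolphinscheduler:dolphinscheduler /tmp/dolphinscheduler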

Note

LOCAL mode does not support distributed read/write, which means uploaded resources can only be used on a single machine, unless a shared file system mount point is used.

If you do not want to use the default value as the Resource Center's base path, modify the value of resource.storage.upload.base.path.

When you set resource.storage.type=LOCAL, you are effectively setting two configuration items: resource.storage.type=HDFS and resource.hdfs.fs.defaultFS=file:///. The standalone resource.storage.type=LOCAL value exists for user convenience, and it allows the local Resource Center to be enabled by default.
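A minimal sketch of the equivalence described above:

    # shorthand: enables the local Resource Center by default
    resource.storage.type=LOCAL

    # equivalent explicit form
    #resource.storage.type=HDFS
    #resource.hdfs.fs.defaultFS=file:///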

1.2 Connecting to Distributed or Remote Object Storage

When the Resource Center is used to create or upload files, all files and resources are stored on a distributed file system such as HDFS, or on remote object storage such as S3, so the following configuration is required:

1.2.1 Configuring the common.properties File

Since version 3.0.0-alpha, if you want to upload resources to HDFS or S3 through the Resource Center, you need to configure the files at the following paths: api-server/conf/common.properties and worker-server/conf/common.properties. Refer to the following example:

    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #


    # user data local directory path, please make sure the directory exists and have read write permissions
    data.basedir.path=/tmp/dolphinscheduler


    # resource storage type: LOCAL, HDFS, S3, OSS, GCS
    resource.storage.type=LOCAL


    # resource store on HDFS/S3/OSS path, resource file will store to this hadoop hdfs path, self configuration,
    # please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
    resource.storage.upload.base.path=/tmp/dolphinscheduler


    # The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
    resource.aws.access.key.id=minioadmin
    # The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
    resource.aws.secret.access.key=minioadmin
    # The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
    resource.aws.region=cn-north-1
    # The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
    resource.aws.s3.bucket.name=dolphinscheduler
    # You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
    resource.aws.s3.endpoint=http://localhost:9000


    # alibaba cloud access key id, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.access.key.id=<your-access-key-id>
    # alibaba cloud access key secret, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.access.key.secret=<your-access-key-secret>
    # alibaba cloud region, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.region=cn-hangzhou
    # oss bucket name, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.oss.bucket.name=dolphinscheduler
    # oss bucket endpoint, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com


    # if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
    resource.hdfs.root.user=root
    # if resource.storage.type=S3, the value like: s3a://dolphinscheduler;
    # if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
    resource.hdfs.fs.defaultFS=hdfs://localhost:8020


    # whether to startup kerberos
    hadoop.security.authentication.startup.state=false


    # java.security.krb5.conf path
    java.security.krb5.conf.path=/opt/krb5.conf


    # login user from keytab username
    login.user.keytab.username=hdfs-mycluster@ESZ.COM


    # login user from keytab path
    login.user.keytab.path=/opt/hdfs.headless.keytab


    # kerberos expire time, the unit is hour
    kerberos.expire.time=2
    # resource view suffixs
    #resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js


    # resourcemanager port, the default value is 8088 if not specified
    resource.manager.httpaddress.port=8088
    # if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
    yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
    # if resourcemanager HA is enabled or resourcemanager is not used, please keep the default value;
    # if resourcemanager is single, you only need to replace localhost with the actual resourcemanager hostname
    yarn.application.status.address=http://localhost:%s/ws/v1/cluster/apps/%s
    # job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
    yarn.job.history.status.address=http://localhost:19888/ws/v1/history/mapreduce/jobs/%s


    # datasource encryption enable
    datasource.encryption.enable=false


    # datasource encryption salt
    datasource.encryption.salt=!@#$%^&*


    # data quality option
    data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar


    #data-quality.error.output.path=/tmp/data-quality-error-data


    # Network IP gets priority, default inner outer


    # Whether hive SQL is executed in the same session
    support.hive.oneSession=false


    # use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions;
    # if set false, executing user is the deploy user and doesn't need sudo permissions
    sudo.enable=true


    # network interface preferred like eth0, default: empty
    #dolphin.scheduler.network.interface.preferred=


    # network IP gets priority, default: inner outer
    #dolphin.scheduler.network.priority.strategy=default


    # system env path
    #dolphinscheduler.env.path=env/dolphinscheduler_env.sh


    # development state
    development.state=false


    # rpc port
    alert.rpc.port=50052


    # way to collect applicationId: log(original regex match), aop
    appId.collect: log

Note:

If you only configure api-server/conf/common.properties, resource upload is enabled, but that alone is not enough for normal use. If you want workflows to execute the uploaded files, you must additionally configure worker-server/conf/common.properties.

If you use the resource upload feature, the deployment user must be granted the corresponding operation permissions during installation and deployment.

If the Hadoop cluster's NameNode is configured with HA, you need to enable HDFS-type resource upload and copy core-site.xml and hdfs-site.xml from the Hadoop cluster to worker-server/conf and api-server/conf, as sketched below; skip this step if NameNode HA is not enabled.
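For example, assuming Hadoop's client configuration lives in $HADOOP_CONF_DIR and DolphinScheduler is installed under /opt/dolphinscheduler (both paths are illustrative assumptions):

    # copy Hadoop client configs so DolphinScheduler can resolve the HA NameNode
    # ($HADOOP_CONF_DIR and /opt/dolphinscheduler are illustrative paths)
    for srv in worker-server api-server; do
        cp "$HADOOP_CONF_DIR/core-site.xml" /opt/dolphinscheduler/$srv/conf/
        cp "$HADOOP_CONF_DIR/hdfs-site.xml" /opt/dolphinscheduler/$srv/conf/
    done

With NameNode HA, resource.hdfs.fs.defaultFS should reference the HA nameservice (for example hdfs://mycluster, matching your dfs.nameservices value) rather than a single NameNode host and port.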
