暂无图片
暂无图片
2
暂无图片
暂无图片
暂无图片

Doris 测试环境安装

2258

image.png
大家好,最近公司在做新型数据仓库选型来代替老一代hadoop平台那套东西: 候选的数仓平台有 Doris , GreenPlum 等等。

这篇文章是Doris基础安装部署的, 为公司后续 RDBMS(oracle/MYSQL/PG) + finkCDC + Doris 的架构提供测试环境。

我们准备3台VM测试机器:
(测试环境我们采用的FE/BE混装的方式,生产环境官网建议分布在不同的机器上,做到资源分离)

Server IP CPU RAM Component
10.67.38.170 8 core 32GB FE/BE
10.67.38.171 8 core 32GB FE/BE
10.67.38.173 8 core 32GB FE/BE

关于Doris FE/BE的官方架构图:

Image.png

OS 环境的要求:

1)JDK 1.8+,GCC 4.8.2+

JDK下载: https://www.oracle.com/java/technologies/javase/javase8-archive-downloads.html#license-lightbox

tar -xvf jdk-8u202-linux-x64.tar.gz vi .bash_profile if [ -f ~/.bashrc ]; then . ~/.bashrc fi JAVA_HOME=$HOME/jdk1.8.0_202 # User specific environment and startup programs PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin export PATH --------------------------------------------------------- source .bash_profile java -version

2)设置系统最大打开文件句柄数

vi /etc/security/limits.conf * soft nofile 65536 * hard nofile 65536

3)时钟同步:配置NTP 服务

sudo yum install ntp ntpdate && \ sudo systemctl start ntpd.service && \ sudo systemctl enable ntpd.service

4)关闭 SWAP交换内存

echo "vm.swappiness = 0">> /etc/sysctl.conf swapoff -a && swapon -a sysctl -p

5)网络端口规划:

Image.png

关闭防火墙

sudo systemctl stop firewalld.service sudo systemctl disable firewalld.service

Doris 软件的下载与安装:https://github.com/apache/doris/releases

我们下载2.02版本

wget https://apache-doris-releases.oss-accelerate.aliyuncs.com/apache-doris-2.0.2-bin-x64.tar.gz

我们来配置一个FE的节点:

解压软件:

tar -xf apache-doris-2.0.2-bin-x64.tar.gz

修改FE的配置文件

cd apache-doris-2.0.2-bin-x64/fe/conf/ 修改参数: priority_networks=10.67.38.0/24 meta_dir = ${DORIS_HOME}/doris-meta

启动FE:

./bin/start_fe.sh --daemon

我们测试一下web端口服务:http://10.67.38.170:8030/login(默认管理员的账号root,密码为空)

Image.png

我们可以通过mysql 客户端命令去连接doris: 这里doris 对外暴露mysql协议的端口是 9030

/opt/mysql/product/percona8.0/bin/mysql -h 10.67.38.170 -u root -P 9030

查询FE的状态

mysql> show frontends\G; *************************** 1. row *************************** Name: fe_ee06b840_41cf_43fd_819c_333b08e26f98 Host: 10.67.38.170 EditLogPort: 9010 HttpPort: 8030 QueryPort: 9030 RpcPort: 9020 Role: FOLLOWER IsMaster: true ClusterId: 688693497 Join: true Alive: true ReplayedJournalId: 20898 LastHeartbeat: 2023-11-28 13:24:28 IsHelper: true ErrMsg: Version: doris-2.0.2-rc05-ae923f7 CurrentConnected: Yes 1 row in set (0.05 sec)

接下来,我们在同一个节点启动BE服务:

创建BE的存储路径:

mkdir -p /data/doris/be/data

修改BE文件夹的be.conf文件:

JAVA_HOME=/home/doris/jdk1.8.0_202 priority_networks = 10.67.38.0/24 storage_root_path=/data/doris/be/data

启动BE:

./bin/start_be.sh --daemon

启动成功后,我们可以看见BE暴露的端口:

INFRA [doris@ljzdccapp006 log]# netstat -nltp | grep be (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) tcp 0 0 0.0.0.0:9050 0.0.0.0:* LISTEN 128495/doris_be tcp 0 0 0.0.0.0:8060 0.0.0.0:* LISTEN 128495/doris_be tcp 0 0 0.0.0.0:9060 0.0.0.0:* LISTEN 128495/doris_be tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 128495/doris_be

同样我们可以通过webpage查看BE节点的信息: http://10.67.38.170:8040/

Image.png

下一步,我们通过FE的mysql命令台把BE的节点加入到 doris的集群中:

mysql> ALTER SYSTEM ADD BACKEND "10.67.38.170:9050"; Query OK, 0 rows affected (0.02 sec) mysql> SHOW BACKENDS\G *************************** 1. row *************************** BackendId: 24505 Host: 10.67.38.170 HeartbeatPort: 9050 BePort: 9060 HttpPort: 8040 BrpcPort: 8060 LastStartTime: 2023-11-28 14:03:16 LastHeartbeat: 2023-11-28 14:13:10 Alive: true SystemDecommissioned: false TabletNum: 0 DataUsedCapacity: 0.000 TrashUsedCapcacity: 0.000 AvailCapacity: 1.000 B TotalCapacity: 0.000 UsedPct: 0.00 % MaxDiskUsedPct: 0.00 % RemoteUsedCapacity: 0.000 Tag: {"location" : "default"} ErrMsg: Version: doris-2.0.2-rc05-ae923f7 Status: {"lastSuccessReportTabletsTime":"N/A","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} HeartbeatFailureCounter: 0 NodeRole: mix 1 row in set (0.00 sec)

相同的道理,我们启动另外2个节点的BE服务:
重复上面的操作即可:

mkdir -p /data/doris/be/data JAVA_HOME=/home/doris/jdk1.8.0_202 priority_networks = 10.67.38.0/24 storage_root_path=/data/doris/be/data ./bin/start_be.sh --daemon

我们把新创建启动的2个BE节点,加入到FE的服务中:

mysql> ALTER SYSTEM ADD BACKEND "10.67.38.171:9050"; Query OK, 0 rows affected (0.00 sec) mysql> ALTER SYSTEM ADD BACKEND "10.67.38.173:9050"; Query OK, 0 rows affected (0.00 sec)

查看3个BE节点的运行情况:

mysql> SHOW BACKENDS\G *************************** 1. row *************************** BackendId: 24505 Host: 10.67.38.170 HeartbeatPort: 9050 BePort: 9060 HttpPort: 8040 BrpcPort: 8060 LastStartTime: 2023-11-28 14:03:16 LastHeartbeat: 2023-11-28 14:58:57 Alive: true SystemDecommissioned: false TabletNum: 0 DataUsedCapacity: 0.000 TrashUsedCapcacity: 18.174 KB AvailCapacity: 66.004 GB TotalCapacity: 195.862 GB UsedPct: 66.30 % MaxDiskUsedPct: 66.30 % RemoteUsedCapacity: 0.000 Tag: {"location" : "default"} ErrMsg: Version: doris-2.0.2-rc05-ae923f7 Status: {"lastSuccessReportTabletsTime":"2023-11-28 14:58:22","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} HeartbeatFailureCounter: 0 NodeRole: mix *************************** 2. row *************************** BackendId: 24576 Host: 10.67.38.171 HeartbeatPort: 9050 BePort: 9060 HttpPort: 8040 BrpcPort: 8060 LastStartTime: 2023-11-28 14:48:05 LastHeartbeat: 2023-11-28 14:58:57 Alive: true SystemDecommissioned: false TabletNum: 9 DataUsedCapacity: 0.000 TrashUsedCapcacity: 0.000 AvailCapacity: 105.623 GB TotalCapacity: 195.862 GB UsedPct: 46.07 % MaxDiskUsedPct: 46.07 % RemoteUsedCapacity: 0.000 Tag: {"location" : "default"} ErrMsg: Version: doris-2.0.2-rc05-ae923f7 Status: {"lastSuccessReportTabletsTime":"2023-11-28 14:58:10","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} HeartbeatFailureCounter: 0 NodeRole: mix *************************** 3. row *************************** BackendId: 24577 Host: 10.67.38.173 HeartbeatPort: 9050 BePort: 9060 HttpPort: 8040 BrpcPort: 8060 LastStartTime: 2023-11-28 14:53:32 LastHeartbeat: 2023-11-28 14:58:57 Alive: true SystemDecommissioned: false TabletNum: 12 DataUsedCapacity: 0.000 TrashUsedCapcacity: 0.000 AvailCapacity: 108.372 GB TotalCapacity: 195.862 GB UsedPct: 44.67 % MaxDiskUsedPct: 44.67 % RemoteUsedCapacity: 0.000 Tag: {"location" : "default"} ErrMsg: Version: doris-2.0.2-rc05-ae923f7 Status: {"lastSuccessReportTabletsTime":"2023-11-28 14:58:14","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} HeartbeatFailureCounter: 0 NodeRole: mix 3 rows in set (0.00 sec)

okay, 至此我们完成了向FE节点中加入3个BE的节点。

我们下一步,将FE的节点从单节点扩充到3节点。

我们需要先启动另外2点节点上的FE服务:

./bin/start_fe.sh --daemon

把启动后的2个FE的节点,加入到集群中:

mysql> ALTER SYSTEM ADD FOLLOWER "10.67.38.171:9010"; Query OK, 0 rows affected (0.01 sec) mysql> ALTER SYSTEM ADD FOLLOWER "10.67.38.173:9010"; Query OK, 0 rows affected (0.02 sec)

查看FE的节点状态:

mysql> show frontends\G; *************************** 1. row *************************** Name: fe_670b0313_19e0_4a1c_a7cc_157b18443567 Host: 10.67.38.173 EditLogPort: 9010 HttpPort: 8030 QueryPort: 0 RpcPort: 0 Role: FOLLOWER IsMaster: false ClusterId: 688693497 Join: false Alive: false ReplayedJournalId: 0 LastHeartbeat: NULL IsHelper: true ErrMsg: Socket is closed by peer. Version: NULL CurrentConnected: No *************************** 2. row *************************** Name: fe_ee06b840_41cf_43fd_819c_333b08e26f98 Host: 10.67.38.170 EditLogPort: 9010 HttpPort: 8030 QueryPort: 9030 RpcPort: 9020 Role: FOLLOWER IsMaster: true ClusterId: 688693497 Join: true Alive: true ReplayedJournalId: 23106 LastHeartbeat: 2023-11-28 15:23:43 IsHelper: true ErrMsg: Version: doris-2.0.2-rc05-ae923f7 CurrentConnected: Yes *************************** 3. row *************************** Name: fe_3504c3ca_2d40_421d_be2e_4a9e8e10b18a Host: 10.67.38.171 EditLogPort: 9010 HttpPort: 8030 QueryPort: 0 RpcPort: 0 Role: FOLLOWER IsMaster: false ClusterId: 688693497 Join: false Alive: false ReplayedJournalId: 0 LastHeartbeat: NULL IsHelper: true ErrMsg: Socket is closed by peer. Version: NULL CurrentConnected: No 3 rows in set (0.01 sec)

下面我们参照官网,导入一些数据进行测试:

mysql> create database demo; Query OK, 0 rows affected (0.01 sec) mysql> use demo; Database changed mysql> CREATE TABLE IF NOT EXISTS demo.example_tbl -> ( -> `user_id` LARGEINT NOT NULL COMMENT "用户id", -> `date` DATE NOT NULL COMMENT "数据灌入日期时间", -> `city` VARCHAR(20) COMMENT "用户所在城市", -> `age` SMALLINT COMMENT "用户年龄", -> `sex` TINYINT COMMENT "用户性别", -> `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", -> `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", -> `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", -> `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -> ) -> AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`) -> DISTRIBUTED BY HASH(`user_id`) BUCKETS 1 -> PROPERTIES ( -> "replication_allocation" = "tag.location.default: 1" -> ); Query OK, 0 rows affected (0.01 sec)

导入测试数据CSV格式:

10000,2017-10-01,北京,20,0,2017-10-01 06:00:00,20,10,10 10000,2017-10-01,北京,20,0,2017-10-01 07:00:00,15,2,2 10001,2017-10-01,北京,30,1,2017-10-01 17:05:45,2,22,22 10002,2017-10-02,上海,20,1,2017-10-02 12:59:12,200,5,5 10003,2017-10-02,广州,32,0,2017-10-02 11:20:00,30,11,11 10004,2017-10-01,深圳,35,0,2017-10-01 10:00:15,100,3,3 10004,2017-10-03,深圳,35,0,2017-10-03 10:20:22,11,6,6

命令行导入数据:返回插入7条数据

curl --location-trusted -u root: -T test.csv -H "column_separator:," http://127.0.0.1:8030/api/demo/example_tbl/_stream_load INFRA [doris@ljzdccapp006 ~]# curl --location-trusted -u root: -T test.csv -H "column_separator:," http://127.0.0.1:8030/api/demo/example_tbl/_stream_load { "TxnId": 44, "Label": "40488bbc-6302-4696-839e-2f2f5d54a366", "Comment": "", "TwoPhaseCommit": "false", "Status": "Success", "Message": "OK", "NumberTotalRows": 7, "NumberLoadedRows": 7, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 399, "LoadTimeMs": 232, "BeginTxnTimeMs": 18, "StreamLoadPutTimeMs": 161, "ReadDataTimeMs": 0, "WriteDataTimeMs": 11, "CommitAndPublishTimeMs": 40 }

查询数据:

你可能会好奇明明日志里显示导入"NumberLoadedRows": 7, 为什么查出来只有6条?

答案: 这个表 example_tbl 定义的 AGGREGATE KEY 是 (user_id, date, city, age, sex)

下面2条记录的 AGGREGATE KEY 都是一样的,所以2条记录进行了聚合的操作 cost = sum(20+15) = 35, max_dwell_time = max(10,2),min_dwell_time = min(10,2) 合并成了一条。

10000,2017-10-01,北京,20,0,2017-10-01 06:00:00,20,10,10
10000,2017-10-01,北京,20,0,2017-10-01 07:00:00,15,2,2

mysql> select * from example_tbl; +---------+------------+--------+------+------+---------------------+------+----------------+----------------+ | user_id | date | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time | +---------+------------+--------+------+------+---------------------+------+----------------+----------------+ | 10000 | 2017-10-01 | 北京 | 20 | 0 | 2017-10-01 07:00:00 | 35 | 10 | 2 | | 10001 | 2017-10-01 | 北京 | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | | 10002 | 2017-10-02 | 上海 | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | | 10003 | 2017-10-02 | 广州 | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | | 10004 | 2017-10-01 | 深圳 | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | | 10004 | 2017-10-03 | 深圳 | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | +---------+------------+--------+------+------+---------------------+------+----------------+----------------+ 6 rows in set (0.17 sec) mysql> select * from example_tbl where city='上海'; +---------+------------+--------+------+------+---------------------+------+----------------+----------------+ | user_id | date | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time | +---------+------------+--------+------+------+---------------------+------+----------------+----------------+ | 10002 | 2017-10-02 | 上海 | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | +---------+------------+--------+------+------+---------------------+------+----------------+----------------+ 1 row in set (0.08 sec) mysql> select city, sum(cost) as total_cost from example_tbl group by city; +--------+------------+ | city | total_cost | +--------+------------+ | 北京 | 37 | | 上海 | 200 | | 广州 | 30 | | 深圳 | 111 | +--------+------------+ 4 rows in set (0.11 sec)

本文是入门文章,属于你的第一篇doris 文档, 后续会带来 doris + flinkCDC + mysql & PG的数据集成篇。

Have a fun 🙂 !

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论