Using Helm
1. Introduction to Helm:
1. Helm concepts
Helm is a package manager for Kubernetes that simplifies deploying and managing Kubernetes applications. You can think of Helm as the equivalent of yum on CentOS.
Helm has a few basic concepts:
Chart: the package that Helm installs and manages, containing the resources to be deployed. A chart is to Helm what an rpm file is to CentOS's yum. Each chart contains two parts:
- the package's description file, Chart.yaml
- one or more Kubernetes manifest templates in the templates directory
Release: a deployed instance of a chart. One chart can have multiple releases in a single Kubernetes cluster, i.e. the chart can be installed more than once.
Repository: a chart repository, used to publish and store charts.
With Helm you can (a few typical commands are sketched after this list):
- manage Kubernetes manifest files
- manage Helm packages (charts)
- distribute Kubernetes applications as charts
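For illustration, and assuming a stable repository is already configured (the mysql chart below is just an example, and the release name is a placeholder), a typical Helm v2 workflow looks roughly like this:

helm search mysql                  # search the configured repositories for a chart
helm fetch stable/mysql --untar    # download a chart and unpack it locally
helm install stable/mysql          # deploy a chart as a new release
helm list                          # list the releases installed in the cluster
helm delete --purge <release-name> # remove a release and its history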
2. Helm components
Helm consists of two parts: the helm client and the tiller server.
- tiller runs inside the Kubernetes cluster and manages the releases installed from charts.
- helm is a command-line tool that runs locally, typically on a CI/CD server.
3. Installing Helm
Download the Helm binary: https://github.com/helm/helm/releases/tag/v2.13.1
[root@test111 software]# tar xzvf helm-v2.13.1-linux-amd64.tar.gz
[root@test111 software]# cd linux-amd64/
[root@test111 linux-amd64]# cp -r helm /usr/local/bin/
[root@test111 linux-amd64]# helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Error: Get http://localhost:8080/api/v1/namespaces/kube-system/pods?labelSelector=app%3Dhelm%2Cname%3Dtiller: dial tcp [::1]:8080: connect: connection refused
[root@test111 linux-amd64]#
At this point helm version prints the helm client version, but also reports that it cannot connect to the Tiller server.
To install the tiller server, this machine also needs the kubectl tool and a kubeconfig file configured, so that kubectl can reach the apiserver and work normally from here.
Copy two files from the kubemaster server to the machine where the Helm client is installed: the kubectl binary and /root/.kube/config.
[root@kubemaster ~]# which kubectl
/usr/bin/kubectl
[root@kubemaster ~]# scp -r /usr/bin/kubectl root@test111:/usr/bin/
kubectl                  100%   37MB  12.5MB/s   00:03
[root@kubemaster ~]# scp -r /root/.kube/config root@test111:/root/.kube/
config                   100% 5448   328.6KB/s   00:00
[root@kubemaster ~]#
Then, on any machine where the Helm client is installed, run the following command to check that the cluster is reachable:
[root@test111 linux-amd64]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
Next, install the Tiller server.
# configure the tiller ServiceAccount and the tiller ClusterRoleBinding
vim sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system

vim ClusterRoleBinding.yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

kubectl apply -f sa.yaml
kubectl apply -f ClusterRoleBinding.yaml
Initialize the tiller server:
helm init --service-account tiller --upgrade --tiller-image=registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1
# Note: the tiller server image version must match the helm client version; both are the latest version here, v2.13.1

[root@test111 ~]# helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"}
Set the chart repository to the Microsoft Azure mirror, which tracks the latest helm charts; the Alibaba Cloud mirror has not been updated for a long time.
[root@test111 ~]# cat /root/.helm/repository/repositories.yaml
apiVersion: v1
generated: 2018-11-10T20:32:27.187074383+08:00
repositories:
- caFile: ""
  cache: /root/.helm/repository/cache/stable-index.yaml
  certFile: ""
  keyFile: ""
  name: stable
  password: "password"
  url: http://mirror.azure.cn/kubernetes/charts/
  username: "username"
- caFile: ""
  cache: /root/.helm/repository/cache/local-index.yaml
  certFile: ""
  keyFile: ""
  name: local
  password: ""
  url: http://127.0.0.1:8879/charts
  username: ""
[root@test111 ~]#

# update the chart repos:
helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
4. Managing chart repositories
[root@test111 software]# helm repo list
NAME    URL
stable  https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
local   http://127.0.0.1:8879/charts
[root@test111 software]#
# list the current chart repositories

[root@test111 software]# helm repo remove stable
"stable" has been removed from your repositories
# remove the current stable repository

[root@test111 software]# helm repo add stable https://cnych.github.io/kube-charts-mirror/
"stable" has been added to your repositories
# add a new remote repository

[root@test111 software]# helm repo list
NAME    URL
local   http://127.0.0.1:8879/charts
stable  https://cnych.github.io/kube-charts-mirror/
# list the current chart repositories

[root@test111 software]# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
[root@test111 software]#
# update the repos
5. Installing the hello-svc application with Helm
Use helm create <chart name> to scaffold an application chart; it generates the hello-svc chart's directory tree locally.
helm create hello-svc
[root@test111 ~]# tree hello-svc/
hello-svc/
├── charts
├── Chart.yaml
├── templates
│   ├── deployment.yaml
│   ├── _helpers.tpl
│   ├── ingress.yaml
│   ├── NOTES.txt
│   ├── service.yaml
│   └── tests
│       └── test-connection.yaml
└── values.yaml
Let's look at the directories and files inside the hello-svc chart:
- The charts directory holds the charts this chart depends on; it is currently empty.
- Chart.yaml describes the chart's basic information, such as its name and version.
- The templates directory holds the Kubernetes manifest templates; the templates are rendered with the chart's configuration values to produce Kubernetes manifest files, and they use Go template syntax (see the rendering example after this list).
- templates/NOTES.txt is a plain-text file for usage notes about the chart.
- values.yaml holds the chart's default configuration values.
- Looking at deployment.yaml and values.yaml, you can tell that by default this chart deploys an nginx service on Kubernetes.
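To see how values.yaml feeds the Go templates, you can render individual templates locally without installing anything; this is a minimal sketch, and the overridden image.tag value is just an example:

cd /root
# render a single template on the client side only (-x / --execute limits output to one file)
helm template ./hello-svc -x templates/service.yaml
# override a default value from values.yaml at render time
helm template ./hello-svc -x templates/deployment.yaml --set image.tag=1.15.9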
cd /root/hello-svc
helm install --dry-run --debug ./   # test the chart's templates and configuration
helm install ./                     # install the chart on Kubernetes
helm list                           # list releases

[root@test111 hello-svc]# helm list
NAME              REVISION  UPDATED                   STATUS    CHART                APP VERSION  NAMESPACE
silly-gecko       1         Wed Mar 27 11:25:11 2019  DEPLOYED  hello-svc-0.1.0      1.0          default
stultified-prawn  1         Mon Mar  4 15:21:21 2019  DEPLOYED  nginx-ingress-0.9.5  0.10.2       default
[root@test111 hello-svc]#
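Since helm install generated a random release name (silly-gecko above), the rest of the release lifecycle can be driven with that name; a short sketch, with replicaCount taken from the chart's default values.yaml:

helm status silly-gecko                           # show the release's resources and notes
helm upgrade silly-gecko ./ --set replicaCount=2  # change a value and roll out a new revision
helm history silly-gecko                          # list the release's revisions
helm rollback silly-gecko 1                       # roll back to revision 1
helm delete --purge silly-gecko                   # delete the release and its history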
6. Installing a Redis-HA application with Helm
In an earlier post I described how to install Redis on traditional virtual machines. This article focuses on deploying Redis in a Kubernetes cluster. Before deploying, let's first look at the two common Redis cluster schemes:
1. The redis-HA scheme:
The Redis-HA scheme should really be called a master/replica replication scheme rather than a cluster scheme. The idea is that if the master node fails, a replica that has been standing by as its backup takes over its work at any moment; read requests can also be routed to the replicas, much like MySQL read/write splitting. However, when the master goes down and a replica is promoted to master, applications are still configured with the old master's IP, so someone would have to manually change the Redis server IP the application connects to. That IP therefore needs to be configured as a VIP so that the switch-over happens automatically.
Redis Sentinel is a distributed architecture made up of several Sentinel nodes and Redis data nodes. Each Sentinel node monitors the data nodes and the other Sentinel nodes, and marks a node as down when it becomes unreachable.
If the node marked down is the master, the Sentinel also "negotiates" with the other Sentinel nodes; once a majority of Sentinels agree that the master is unreachable, they elect one Sentinel node to carry out the automatic failover and notify the Redis clients of the change.
Each Sentinel node periodically sends PING commands to check whether the Redis data nodes and the other Sentinels are reachable; if a node does not reply within 30000 milliseconds, it is judged unreachable. When the Sentinels reach agreement that the master has failed, the Sentinel leader performs the failover, elects a new master, and the former replicas start replicating from the new master.
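As an aside on how clients interact with Sentinel, the commands below ask a Sentinel for the current master of the monitored group (mymaster matches the masterGroupName used later in the chart's values; the host is a placeholder):

# ask a Sentinel which address is currently the master of the group "mymaster"
redis-cli -h <sentinel-host> -p 26379 sentinel get-master-addr-by-name mymaster
# list all monitored masters and their state
redis-cli -h <sentinel-host> -p 26379 sentinel masters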
2. The redis-Cluster scheme:
Redis has offered an official cluster solution since version 3.0. A Redis Cluster is an installation that shares data across multiple Redis nodes. The cluster provides a degree of availability through partitioning: even if some nodes fail or cannot communicate, the cluster can continue to process command requests.
Redis Cluster brings two benefits: first, data is automatically split across multiple nodes; second, the cluster keeps processing commands when some of its nodes fail or become unreachable.
Redis Cluster uses data sharding rather than consistent hashing. A Redis cluster contains 16384 hash slots, and every key in the database belongs to exactly one of these slots. The cluster computes the slot for a key with the formula CRC16(key) % 16384, where CRC16(key) is the CRC16 checksum of the key.
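A quick way to see this mapping on a running, cluster-enabled node (the host is a placeholder and the key is an arbitrary example):

# compute which of the 16384 slots a key maps to (CRC16(key) % 16384)
redis-cli -h <cluster-node> -p 6379 cluster keyslot user:1000
# show how the slot ranges are currently distributed across the nodes
redis-cli -h <cluster-node> -p 6379 cluster slots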
Each node in the cluster is responsible for a portion of the hash slots. For example, a cluster with three nodes might look like this:
- node A handles hash slots 0 through 5500
- node B handles hash slots 5501 through 11000
- node C handles hash slots 11001 through 16383
Because hash slots can be moved between nodes like this, it is easy to add nodes to or remove nodes from the cluster. For example (a sketch of the corresponding commands follows this list):
- to add a new node D to the cluster, the cluster only needs to move some of the slots on nodes A, B and C over to node D;
- similarly, to remove node A from the cluster, the cluster moves all of node A's hash slots to nodes B and C, and then removes node A once it is empty (holds no hash slots).
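With Redis 5 these operations are built into redis-cli (earlier versions use the redis-trib.rb script); a rough sketch, where the IP:port pairs and node ID are placeholders:

# add new node D to the cluster through any existing node
redis-cli --cluster add-node 10.0.0.4:6379 10.0.0.1:6379
# interactively move a number of slots onto (or off) a node
redis-cli --cluster reshard 10.0.0.1:6379
# remove node A once all of its slots have been moved away
redis-cli --cluster del-node 10.0.0.1:6379 <node-id-of-A>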
So that the cluster can keep operating when some nodes go down or cannot communicate with the majority of the cluster, Redis Cluster uses master/slave replication for its nodes: each node has 1 to N replicas, one of which acts as the master while the remaining N-1 are slaves.
In the earlier example with nodes A, B and C, if node B goes down the cluster cannot operate normally, because there is no node left to serve hash slots 5501 through 11000.
On the other hand, if we had added a slave B1 for master B when the cluster was created (or at least before B went down), then when master B fails the cluster promotes B1 to be the new master in B's place, and B1 continues to serve slots 5501 through 11000, so the cluster does not stop working because of B's failure.
As another Redis cluster option, the Chinese company Wandoujia developed Codis, a distributed, proxy-based, high-performance Redis cluster solution written in Go (software such as Twemproxy can do a similar job). To upstream applications, connecting to a Codis Proxy is essentially no different from connecting to a native Redis server (apart from a list of unsupported commands); underneath, Codis takes care of forwarding requests, migrating data without downtime, and so on. Everything behind the proxy is transparent to the client, which can simply treat it as a Redis server with unlimited memory. See https://github.com/CodisLabs/codis for details.
Codis consists of four parts:
Codis Proxy (codis-proxy): handles client requests and speaks the Redis protocol, so accessing Codis Proxy is no different from accessing native Redis;
Codis Dashboard (codis-config): Codis's management tool; it supports adding/removing Redis nodes, adding/removing Proxy nodes, triggering data migrations, and so on. codis-config also ships with an HTTP server that serves a dashboard, so you can watch the state of the Codis cluster directly in a browser;
Codis Redis (codis-server): a Redis branch maintained by the Codis project, based on 2.8.21, with added support for slots and atomic data-migration commands;
ZooKeeper/Etcd: Codis relies on ZooKeeper to store the data routing table and the metadata of the codis-proxy nodes; commands issued by codis-config are propagated to every live codis-proxy through ZooKeeper.

3. Installing a redis-sentinel master/replica setup with Helm:
All of the official Helm charts can be downloaded from https://github.com/helm/charts/tree/master/stable/ , including the ones used in this experiment: Redis-HA, MySQL, RabbitMQ, Jenkins, and so on.
Once the helm repositories are configured (preferably the Microsoft Azure mirror, which is kept in sync with GitHub), you can run:
[root@master-01 helm_jenkins]# helm repo list
NAME      URL
local     http://127.0.0.1:8879/charts
stable    http://mirror.azure.cn/kubernetes/charts/
kiwigrid  https://kiwigrid.github.io
[root@master-01 helm_jenkins]# helm search redis
NAME                              CHART VERSION  APP VERSION  DESCRIPTION
stable/prometheus-redis-exporter  1.0.2          0.28.0       Prometheus exporter for Redis metrics
stable/redis                      7.0.0          4.0.14       Open source, advanced key-value store. It is often referr...
stable/redis-ha                   3.4.0          5.0.3        Highly available Kubernetes implementation of Redis
stable/sensu                      0.2.3          0.28         Sensu monitoring framework backed by the Redis transport
[root@master-01 helm_jenkins]#
Then download the redis-ha chart: first create a local directory, then fetch the chart straight into it.
[root@master-01 tmp]# mkdir -p redis-ha
[root@master-01 tmp]# helm fetch stable/redis-ha --untar --untardir ./redis-ha/
The second step is to edit the values.yaml file. The usual way to do this is to follow the official chart documentation at https://github.com/helm/charts/tree/master/stable/redis-ha , which describes in detail what each key is for and what its default value is.
## Configure resource requests and limits ## ref: http://kubernetes.io/docs/user-guide/compute-resources/ ## image: repository: k8s.harbor.test.site/system/redis tag: 5.0.3-alpine pullPolicy: IfNotPresent ## replicas number for each component replicas: 3 ## Custom labels for the redis pod labels: {} ## Pods Service Account ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ serviceAccount: ## Specifies whether a ServiceAccount should be created ## create: true ## The name of the ServiceAccount to use. ## If not set and create is true, a name is generated using the redis-ha.fullname template # name: ## Role Based Access ## Ref: https://kubernetes.io/docs/admin/authorization/rbac/ ## rbac: create: true sysctlImage: enabled: false command: [] registry: docker.io repository: k8s.harbor.test.site/system/minideb:latest tag: latest pullPolicy: Always mountHostSys: false ## Redis specific configuration options redis: port: 6379 masterGroupName: mymaster config: ## Additional redis conf options can be added below ## For all available options see http://download.redis.io/redis-stable/redis.conf min-slaves-to-write: 1 min-slaves-max-lag: 5 # Value in seconds maxmemory: "0" # Max memory to use for each redis instance. Default is unlimited. maxmemory-policy: "volatile-lru" # Max memory policy to use for each redis instance. Default is volatile-lru. # Determines if scheduled RDB backups are created. Default is false. # Please note that local (on-disk) RDBs will still be created when re-syncing with a new slave. The only way to prevent this is to enable diskless replication. save: "900 1" # When enabled, directly sends the RDB over the wire to slaves, without using the disk as intermediate storage. Default is false. repl-diskless-sync: "yes" rdbcompression: "yes" rdbchecksum: "yes" ## Custom redis.conf files used to override default settings. If this file is ## specified then the redis.config above will be ignored. # customConfig: |- # Define configuration here resources: {} # requests: # memory: 200Mi # cpu: 100m # limits: # memory: 700Mi ## Sentinel specific configuration options sentinel: port: 26379 quorum: 2 config: ## Additional sentinel conf options can be added below. Only options that ## are expressed in the format simialar to 'sentinel xxx mymaster xxx' will ## be properly templated. ## For available options see http://download.redis.io/redis-stable/sentinel.conf down-after-milliseconds: 10000 ## Failover timeout value in milliseconds failover-timeout: 180000 parallel-syncs: 5 ## Custom sentinel.conf files used to override default settings. If this file is ## specified then the sentinel.config above will be ignored. # customConfig: |- # Define configuration here resources: {} # requests: # memory: 200Mi # cpu: 100m # limits: # memory: 200Mi securityContext: runAsUser: 1000 fsGroup: 1000 runAsNonRoot: true ## Node labels, affinity, and tolerations for pod assignment ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#taints-and-tolerations-beta-feature ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity affinity: | podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: app: {{ template "redis-ha.name" . 
}} release: {{ .Release.Name }} topologyKey: kubernetes.io/hostname preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchLabels: app: {{ template "redis-ha.name" . }} release: {{ .Release.Name }} topologyKey: failure-domain.beta.kubernetes.io/zone # Prometheus exporter specific configuration options exporter: enabled: true image: k8s.harbor.test.site/system/redis_exporter tag: v0.31.0 pullPolicy: IfNotPresent # prometheus port & scrape path port: 9121 scrapePath: /metrics # cpu/memory resource limits/requests resources: {} # Additional args for redis exporter extraArgs: {} podDisruptionBudget: {} # maxUnavailable: 1 # minAvailable: 1 ## Configures redis with AUTH (requirepass & masterauth conf params) auth: true redisPassword: aaaaaa ## Use existing secret containing "auth" key (ignores redisPassword) # existingSecret: persistentVolume: enabled: true ## redis-ha data Persistent Volume Storage Class ## If defined, storageClassName: <storageClass> ## If set to "-", storageClassName: "", which disables dynamic provisioning ## If undefined (the default) or set to null, no storageClassName spec is ## set, choosing the default provisioner. (gp2 on AWS, standard on ## GKE, AWS & OpenStack) ## storageClass: "dynamic" accessModes: - ReadWriteOnce size: 10Gi annotations: {} init: resources: {} # To use a hostPath for data, set persistentVolume.enabled to false # and define hostPath.path. # Warning: this might overwrite existing folders on the host system! hostPath: ## path is evaluated as template so placeholders are replaced # path: "/data/{{ .Release.Name }}" # if chown is true, an init-container with root permissions is launched to # change the owner of the hostPath folder to the user defined in the # security context chown: true
Before editing values.yaml, it is a good idea to read the official notes first. Also, before installing a stateful application such as redis-ha with Helm, you should understand how that stateful application works; ideally you have built the same environment by hand before and understand its configuration.
My personal habit is to first pull the images referenced in values.yaml, push them to the company's private Harbor registry, and then replace the image addresses in values.yaml with the private-registry addresses, which makes deployments much faster. The redis-ha application needs persistent storage and is mainly built on the Kubernetes StatefulSet resource type, which means every Pod keeps a fixed name after a restart, generally of the form pod name + headless service name + namespace + svc.cluster.local, and each Pod must find its original PVC again after it restarts.
So in the storage section of the configuration we set storageClassName to dynamic, which is the name of the Ceph RBD StorageClass already defined in this environment. The chart also installs redis-exporter, which is dedicated to shipping Redis metrics to Prometheus.
The podAntiAffinity block in the configuration is about anti-affinity scheduling, a runtime scheduling policy. podAntiAffinity expresses which Pods a given Pod must not be placed with in the same topology domain; it governs Pod-to-Pod relationships inside the Kubernetes cluster. The goal here is to keep the Redis master Pod and the Redis slave Pods from starting on the same physical node.
The values file also defines the redis configuration and the sentinel configuration, sets the number of Redis replicas to 3, and sets the password used to connect to Redis.
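With values.yaml edited, the chart can be installed; a sketch assuming the chart was unpacked to /tmp/redis-ha/redis-ha as above, with the release name and namespace matching the helm status output below:

cd /tmp/redis-ha
helm install --name my-redis --namespace kube-system ./redis-ha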
To check whether the redis-ha release is healthy, inspect it with helm status:
[root@master-01 redis-ha]# helm status my-redis
LAST DEPLOYED: Tue Apr 23 15:57:27 2019
NAMESPACE: kube-system
STATUS: DEPLOYED

RESOURCES:
==> v1/ServiceAccount
NAME               SECRETS  AGE
my-redis-redis-ha  1        8d

==> v1/Role
NAME               AGE
my-redis-redis-ha  8d

==> v1/RoleBinding
NAME               AGE
my-redis-redis-ha  8d

==> v1/Service
NAME                          TYPE       CLUSTER-IP      EXTERNAL-IP  PORT(S)                      AGE
my-redis-redis-ha-announce-2  ClusterIP  10.200.111.254  <none>       6379/TCP,26379/TCP           8d
my-redis-redis-ha-announce-0  ClusterIP  10.200.192.130  <none>       6379/TCP,26379/TCP           8d
my-redis-redis-ha-announce-1  ClusterIP  10.200.55.75    <none>       6379/TCP,26379/TCP           8d
my-redis-redis-ha             ClusterIP  None            <none>       6379/TCP,26379/TCP,9121/TCP  8d

==> v1/StatefulSet
NAME                      DESIRED  CURRENT  AGE
my-redis-redis-ha-server  3        3        8d

==> v1/Pod(related)
NAME                        READY  STATUS   RESTARTS  AGE
my-redis-redis-ha-server-0  3/3    Running  3         8d
my-redis-redis-ha-server-1  3/3    Running  4         5d18h
my-redis-redis-ha-server-2  3/3    Running  0         2d19h

==> v1/Secret
NAME               TYPE    DATA  AGE
my-redis-redis-ha  Opaque  1     8d

==> v1/ConfigMap
NAME                         DATA  AGE
my-redis-redis-ha-configmap  3     8d
my-redis-redis-ha-probes     2     8d

NOTES:
Redis can be accessed via port 6379 and Sentinel can be accessed via port 26379 on the following DNS name from within your cluster:
my-redis-redis-ha.kube-system.svc.cluster.local

To connect to your Redis server:
1. To retrieve the redis password:
   echo $(kubectl get secret my-redis-redis-ha -o "jsonpath={.data['auth']}" | base64 --decode)

2. Connect to the Redis master pod that you can use as a client. By default the my-redis-redis-ha-server-0 pod is configured as the master:

   kubectl exec -it my-redis-redis-ha-server-0 sh -n kube-system

3. Connect using the Redis CLI (inside container):

   redis-cli -a <REDIS-PASS-FROM-SECRET>
If another application needs to connect to this redis-ha deployment, it uses the command redis-cli -h my-redis-redis-ha -p 6379 -a aaaaaa. This only works for applications inside the Kubernetes cluster; for external applications, the redis-ha ports have to be exposed through a NodePort-type service first.
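A quick in-cluster connectivity test can be run from a throwaway client pod; a sketch, where the redis:5.0.3-alpine image and the aaaaaa password simply mirror the values used above:

kubectl run redis-client --rm -it --restart=Never -n kube-system \
  --image=redis:5.0.3-alpine -- \
  redis-cli -h my-redis-redis-ha -p 6379 -a aaaaaa ping
# expected reply: PONG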
4. Installing a RabbitMQ cluster with Helm:
In earlier posts I also described how to install a RabbitMQ cluster on virtual machines. From that exercise you should already know the basic concepts and configuration of a RabbitMQ cluster, including producers, consumers, exchanges, queues, bindings, connections, channels, unicast and broadcast exchange types, message persistence, and so on.
The smallest RabbitMQ cluster needs three nodes: two RAM nodes and one disk node. As the names suggest, a RAM node keeps all of its data in memory, while a disk node keeps it on disk. However, if message persistence is turned on when publishing, the data is still safely written to disk even on RAM nodes.
RabbitMQ is written in Erlang and relies entirely on Erlang clustering. Erlang is a natively distributed language, so clustering is very convenient, but it does not provide load balancing by itself. The nodes of an Erlang cluster recognize each other through a magic cookie stored in $HOME/.erlang.cookie (installed as root, mine is in /root/.erlang.cookie), a file with 400 permissions. The cookie must be identical on every node, otherwise the nodes cannot communicate with each other.
RabbitMQ clusters roughly come in two modes:
Normal mode: the default cluster mode.
Mirrored mode: the queues you care about are turned into mirrored queues. (This is the mode we normally use.)
The command rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}' creates a policy named ha-all whose mode is all, i.e. queues are mirrored to every node, including nodes added later; the policy's regular expression "^" matches all queue names.
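A slightly extended variant, applying the policy and then verifying it (the automatic ha-sync-mode is an optional addition, not part of the original command):

# mirror all queues on the default vhost and sync new mirrors automatically
rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
# verify the policy that was just created
rabbitmqctl list_policies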
A good architecture looks like this: in a cluster of three or more machines, one runs in disk mode and the others run in RAM mode. The RAM nodes are faster, so the clients (consumers and producers) connect to them, while the disk-mode node, whose disk I/O is relatively slow, is used only as a data backup.
As with the redis-ha install above, first go to the official page https://github.com/helm/charts/tree/master/stable/rabbitmq-ha and read the documentation carefully; it explains every values.yaml setting in detail and walks you through installing and uninstalling the chart. Then search for rabbitmq and download the chart; I won't repeat those steps here. One issue is worth mentioning: there are several RabbitMQ charts, and with one of them, stable/rabbitmq, I never managed to get a working deployment; every time I configured it, it errored out as soon as it started.
NAME                                 CHART VERSION  APP VERSION  DESCRIPTION
stable/prometheus-rabbitmq-exporter  0.4.0          v0.29.0      Rabbitmq metrics exporter for prometheus
stable/rabbitmq                      5.5.0          3.7.14       Open source message broker software that implements the A...
stable/rabbitmq-ha                   1.25.0         3.7.12       Highly available RabbitMQ cluster, the open source messag...
[root@master-01 redis-ha]#
At first I kept using the stable/rabbitmq chart and spent a long time on it without success: after installing the chart, the application always failed on startup.
kubectl logs -f -n kube-system my-rabbitmq-0 rabbitmq
Error from server (BadRequest): a container name must be specified for pod my-rabbitmq-0, choose one of: [rabbitmq metrics]
I never found the root cause; posts online say it is because the correct node names cannot be resolved, so the cluster cannot be formed. In the Kubernetes RabbitMQ setup there is a dedicated Kubernetes plugin that uses the Kubernetes API to discover pods by name and build the cluster automatically. In the end I switched to the other chart, stable/rabbitmq-ha, and it worked; if I find the time I still want to dig into the difference between the two. My values.yaml is configured as follows:
## RabbitMQ application credentials ## Ref: http://rabbitmq.com/access-control.html ## rabbitmqUsername: guest rabbitmqPassword: guest123 ## RabbitMQ Management user used for health checks managementUsername: management managementPassword: 123456 ## Place any additional key/value configuration to add to rabbitmq.conf ## Ref: https://www.rabbitmq.com/configure.html#config-items extraConfig: | # queue_master_locator = min-masters ## Place advanced.config file in /etc/rabbitmq/advanced.config ## Ref: https://www.rabbitmq.com/configure.html#advanced-config-file advancedConfig: | ## Definitions specification within the secret, will always be mounted ## at /etc/definitions/defintions.json definitionsSource: definitions.json ## Place any additional plugins to enable in /etc/rabbitmq/enabled_plugins ## Ref: https://www.rabbitmq.com/plugins.html extraPlugins: | rabbitmq_shovel, rabbitmq_shovel_management, rabbitmq_federation, rabbitmq_federation_management, definitions: users: |- { "name": "test_producer01", "password": "111111", "tags": "administrator" } vhosts: |- { "name": "/rabbit" } parameters: |- { "value": { "src-uri": "amqp://localhost", "src-queue": "source", "dest-uri": "amqp://localhost", "dest-queue": "destination", "add-forward-headers": false, "ack-mode": "on-confirm", "delete-after": "never" }, "vhost": "/", "component": "shovel", "name": "test" } permissions: |- { "user": "test_producer01", "vhost": "/rabbit", "configure": ".*", "write": ".*", "read": ".*" } queues: |- { "name":"myName", "vhost":"/rabbit", "durable":true, "auto_delete":false, "arguments":{} } exchanges: |- { "name":"myName", "vhost":"/rabbit", "type":"direct", "durable":true, "auto_delete":false, "internal":false, "arguments":{} } bindings: |- { "source":"myName", "vhost":"/rabbit", "destination":"myName", "destination_type":"queue", "routing_key":"myKey", "arguments":{} } ## Sets the policies in definitions.json. This can be used to control the high ## availability of queues by mirroring them to multiple nodes. ## Ref: https://www.rabbitmq.com/ha.html policies: |- { "name": "ha-all", "pattern": ".*", "vhost": "/", "definition": { "ha-mode": "all", "ha-sync-mode": "automatic", "ha-sync-batch-size": 1 } } ## RabbitMQ default VirtualHost ## Ref: https://www.rabbitmq.com/vhosts.html ## rabbitmqVhost: "/" ## Erlang cookie to determine whether different nodes are allowed to communicate with each other ## Ref: https://www.rabbitmq.com/clustering.html ## # rabbitmqErlangCookie: ## RabbitMQ Memory high watermark ## Ref: http://www.rabbitmq.com/memory.html ## rabbitmqMemoryHighWatermark: 256MB rabbitmqMemoryHighWatermarkType: absolute ## EPMD port for peer discovery service used by RabbitMQ nodes and CLI tools ## Ref: https://www.rabbitmq.com/clustering.html ## rabbitmqEpmdPort: 4369 ## Node port rabbitmqNodePort: 5672 ## Manager port rabbitmqManagerPort: 15672 ## Set to true to precompile parts of RabbitMQ with HiPE, a just-in-time ## compiler for Erlang. This will increase server throughput at the cost of ## increased startup time. You might see 20-50% better performance at the cost ## of a few minutes delay at startup. 
rabbitmqHipeCompile: false ## SSL certificates ## Red: http://www.rabbitmq.com/ssl.html rabbitmqCert: enabled: false # Specifies an existing secret to be used for SSL Certs existingSecret: "" ## Create a new secret using these values cacertfile: | certfile: | keyfile: | ## Extra volumes for statefulset extraVolumes: [] ## Extra volume mounts for statefulset extraVolumeMounts: [] ## Authentication mechanism ## Ref: http://www.rabbitmq.com/authentication.html rabbitmqAuth: enabled: false config: | # auth_mechanisms.1 = PLAIN # auth_mechanisms.2 = AMQPLAIN # auth_mechanisms.3 = EXTERNAL ## Automatic Partition Handling Strategy (split brain handling) ## Ref: https://www.rabbitmq.com/partitions.html#automatic-handling ## Note: pause-if-all-down is not supported without using a custom configmap since it requires extra ## configuration. rabbitmqClusterPartitionHandling: autoheal ## Authentication backend ## Ref: https://github.com/rabbitmq/rabbitmq-auth-backend-http rabbitmqAuthHTTP: enabled: false config: | # auth_backends.1 = http # auth_http.user_path = http://some-server/auth/user # auth_http.vhost_path = http://some-server/auth/vhost # auth_http.resource_path = http://some-server/auth/resource # auth_http.topic_path = http://some-server/auth/topic ## LDAP Plugin ## Ref: http://www.rabbitmq.com/ldap.html rabbitmqLDAPPlugin: enabled: false ## LDAP configuration: config: | # auth_backends.1 = ldap # auth_ldap.servers.1 = my-ldap-server # auth_ldap.user_dn_pattern = cn=${username},ou=People,dc=example,dc=com # auth_ldap.use_ssl = false # auth_ldap.port = 389 # auth_ldap.log = false ## MQTT Plugin ## Ref: http://www.rabbitmq.com/mqtt.html rabbitmqMQTTPlugin: enabled: false ## MQTT configuration: config: | # mqtt.default_user = guest # mqtt.default_pass = guest # mqtt.allow_anonymous = true ## Web MQTT Plugin ## Ref: http://www.rabbitmq.com/web-mqtt.html rabbitmqWebMQTTPlugin: enabled: false ## Web MQTT configuration: config: | # web_mqtt.ssl.port = 12345 # web_mqtt.ssl.backlog = 1024 # web_mqtt.ssl.certfile = /etc/cert/cacert.pem # web_mqtt.ssl.keyfile = /etc/cert/cert.pem # web_mqtt.ssl.cacertfile = /etc/cert/key.pem # web_mqtt.ssl.password = changeme ## STOMP Plugin ## Ref: http://www.rabbitmq.com/stomp.html rabbitmqSTOMPPlugin: enabled: false ## STOMP configuration: config: | # stomp.default_user = guest # stomp.default_pass = guest ## Web STOMP Plugin ## Ref: http://www.rabbitmq.com/web-stomp.html rabbitmqWebSTOMPPlugin: enabled: false ## Web STOMP configuration: config: | # web_stomp.ws_frame = binary # web_stomp.cowboy_opts.max_keepalive = 10 ## AMQPS support ## Ref: http://www.rabbitmq.com/ssl.html rabbitmqAmqpsSupport: enabled: false # NodePort amqpsNodePort: 5671 # SSL configuration config: | # listeners.ssl.default = 5671 # ssl_options.cacertfile = /etc/cert/cacert.pem # ssl_options.certfile = /etc/cert/cert.pem # ssl_options.keyfile = /etc/cert/key.pem # ssl_options.verify = verify_peer # ssl_options.fail_if_no_peer_cert = false ## Number of replicas replicaCount: 3 image: repository: k8s.harbor.test.site/system/rabbitmq tag: 3.7.12-alpine pullPolicy: IfNotPresent ## Optionally specify an array of imagePullSecrets. ## Secrets must be manually created in the namespace. 
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ ## # pullSecrets: # - myRegistrKeySecretName busyboxImage: repository: k8s.harbor.test.site/system/busybox tag: latest pullPolicy: Always ## Duration in seconds the pod needs to terminate gracefully terminationGracePeriodSeconds: 10 service: annotations: {} clusterIP: None ## List of IP addresses at which the service is available ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips ## externalIPs: [] loadBalancerIP: "" loadBalancerSourceRanges: [] type: ClusterIP ## Customize nodePort number when the service type is NodePort ### Ref: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types ### epmdNodePort: null amqpNodePort: null managerNodePort: null podManagementPolicy: OrderedReady ## Statefulsets rolling update update strategy ## Ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#rolling-update ## updateStrategy: OnDelete ## Statefulsets Pod Priority ## Ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass ## priorityClassName: "" ## We usually recommend not to specify default resources and to leave this as ## a conscious choice for the user. This also increases chances charts run on ## environments with little resources, such as Minikube. If you do want to ## specify resources, uncomment the following lines, adjust them as necessary, ## and remove the curly braces after 'resources:'. ## If you decide to set the memory limit, make sure to also change the ## rabbitmqMemoryHighWatermark following the formula: ## rabbitmqMemoryHighWatermark = 0.4 * resources.limits.memory ## resources: {} # limits: # cpu: 100m # memory: 1Gi # requests: # cpu: 100m # memory: 1Gi initContainer: resources: {} # limits: # cpu: 100m # memory: 128Mi # requests: # cpu: 100m # memory: 128Mi ## Use an alternate scheduler, e.g. "stork". ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/ ## # schedulerName: ## Data Persistency persistentVolume: enabled: true ## If defined, storageClassName: <storageClass> ## If set to "-", storageClassName: "", which disables dynamic provisioning ## If undefined (the default) or set to null, no storageClassName spec is ## set, choosing the default provisioner. (gp2 on AWS, standard on ## GKE, AWS & OpenStack) ## storageClass: "dynamic" name: data accessModes: - ReadWriteOnce size: 8Gi annotations: {} ## Node labels for pod assignment ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector ## nodeSelector: {} ## Node tolerations for pod assignment ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#taints-and-tolerations-beta-feature ## tolerations: [] ## Extra Annotations to be added to pod podAnnotations: {} ## Pod affinity ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity podAntiAffinity: soft ## Create default configMap ## existingConfigMap: false ## Add additional labels to all resources ## extraLabels: {} ## Role Based Access ## Ref: https://kubernetes.io/docs/admin/authorization/rbac/ ## rbac: create: true ## Service Account ## Ref: https://kubernetes.io/docs/admin/service-accounts-admin/ ## serviceAccount: create: true ## The name of the ServiceAccount to use. 
## If not set and create is true, a name is generated using the fullname template # name: ingress: ## Set to true to enable ingress record generation enabled: true path: / ## The list of hostnames to be covered with this ingress record. ## Most likely this will be just one host, but in the event more hosts are needed, this is an array hostName: k8s.rabbitmq.test.site ## Set this to true in order to enable TLS on the ingress record tls: false ## If TLS is set to true, you must declare what secret will store the key/certificate for TLS tlsSecret: myTlsSecret ## Ingress annotations done as key:value pairs annotations: kubernetes.io/ingress.class: traefik livenessProbe: initialDelaySeconds: 120 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 6 readinessProbe: failureThreshold: 6 initialDelaySeconds: 20 timeoutSeconds: 3 periodSeconds: 5 # Specifies an existing secret to be used for RMQ password and Erlang Cookie existingSecret: "" ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ ## securityContext: fsGroup: 101 runAsGroup: 101 runAsNonRoot: true runAsUser: 100 prometheus: ## Configures Prometheus Exporter to expose and scrape stats. exporter: enabled: true env: {} image: repository: kbudde/rabbitmq-exporter tag: v0.29.0 pullPolicy: IfNotPresent ## Port Prometheus scrapes for metrics port: 9090 ## Comma-separated list of extended scraping capabilities supported by the target RabbitMQ server capabilities: "bert,no_sort" ## Allow overriding of container resources resources: {} # limits: # cpu: 200m # memory: 1Gi # requests: # cpu: 100m # memory: 100Mi ## Prometheus is using Operator. Setting to true will create Operator specific resources like ServiceMonitors and Alerts operator: ## Are you using Prometheus Operator? [Blog Post](https://coreos.com/blog/the-prometheus-operator.html) enabled: true ## Configures Alerts, which will be setup via Prometheus Operator / ConfigMaps. alerts: ## Prometheus exporter must be enabled as well enabled: true ## Selector must be configured to match Prometheus Install, defaulting to whats done by Prometheus Operator ## See [CoreOS Prometheus Chart](https://github.com/coreos/prometheus-operator/tree/master/helm) selector: role: alert-rules labels: {} serviceMonitor: ## Interval at which Prometheus scrapes RabbitMQ Exporter interval: 10s # Namespace Prometheus is installed in namespace: kube-system ## Defaults to whats used if you follow CoreOS [Prometheus Install Instructions](https://github.com/coreos/prometheus-operator/tree/master/helm#tldr) ## [Prometheus Selector Label](https://github.com/coreos/prometheus-operator/blob/master/helm/prometheus/templates/prometheus.yaml#L65) ## [Kube Prometheus Selector Label](https://github.com/coreos/prometheus-operator/blob/master/helm/kube-prometheus/values.yaml#L298) selector: prometheus: kube-prometheus ## Kubernetes Cluster Domain clusterDomain: cluster.local ## Pod Disruption Budget podDisruptionBudget: {} # maxUnavailable: 1 # minAvailable: 1
Following the official documentation, I mainly changed the following settings (the install commands I used are sketched after this list):
- the initial RabbitMQ accounts and passwords;
- a few plugins enabled by default, mainly the components used to replicate queue messages;
- an initial vhost, a producer user and its permissions, a queue name, an exchange, bindings, the mirroring policy, and so on;
- the cluster parameters, including the Erlang cookie, the cluster synchronization port, the web management port 15672, and the service port 5672;
- a Pod replica count of 3, i.e. a three-node MQ cluster, with all images coming from the private registry: the MQ image, the busybox image used for initialization, and the rabbitmq-exporter image used for Prometheus monitoring;
- the Pod management policy OrderedReady, because an MQ cluster is also a stateful application and has to start and stop in order;
- persistent storage using the Ceph StorageClass;
- an ingress, mainly for access to the web management UI.
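For reference, the fetch-and-install steps mirror the redis-ha case; a sketch, assuming the chart is unpacked into the current directory and using the release name and namespace that appear in the helm status output below:

helm fetch stable/rabbitmq-ha --untar
# edit rabbitmq-ha/values.yaml as described above, then install:
helm install --name my-rabbitmq-ha --namespace kube-system ./rabbitmq-ha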
Use helm status my-rabbitmq-ha to see how to access RabbitMQ:
[root@master-01 rabbitmq-ha]# helm status my-rabbitmq-ha
LAST DEPLOYED: Thu Apr 25 19:43:02 2019
NAMESPACE: kube-system
STATUS: DEPLOYED

RESOURCES:
==> v1/Secret
NAME            TYPE    DATA  AGE
my-rabbitmq-ha  Opaque  4     6d16h

==> v1beta1/Role
NAME            AGE
my-rabbitmq-ha  6d16h

==> v1beta1/Ingress
NAME            HOSTS                       ADDRESS  PORTS  AGE
my-rabbitmq-ha  k8s.rabbitmq.maimaiti.site           80     6d16h

==> v1/PrometheusRule
NAME                            AGE
my-rabbitmq-ha-rabbitmq-alerts  6d16h

==> v1/ConfigMap
NAME            DATA  AGE
my-rabbitmq-ha  2     6d16h

==> v1/ServiceAccount
NAME            SECRETS  AGE
my-rabbitmq-ha  1        6d16h

==> v1beta1/RoleBinding
NAME            AGE
my-rabbitmq-ha  6d16h

==> v1/Service
NAME                      TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)                               AGE
my-rabbitmq-ha-discovery  ClusterIP  None        <none>       15672/TCP,5672/TCP,4369/TCP           6d16h
my-rabbitmq-ha            ClusterIP  None        <none>       15672/TCP,5672/TCP,4369/TCP,9090/TCP  6d16h

==> v1beta1/StatefulSet
NAME            DESIRED  CURRENT  AGE
my-rabbitmq-ha  3        3        6d16h

==> v1/ServiceMonitor
NAME            AGE
my-rabbitmq-ha  6d16h

==> v1/Pod(related)
NAME              READY  STATUS   RESTARTS  AGE
my-rabbitmq-ha-0  2/2    Running  2         5d20h
my-rabbitmq-ha-1  2/2    Running  2         6d15h
my-rabbitmq-ha-2  2/2    Running  0         2d20h

NOTES:
** Please be patient while the chart is being deployed **

Credentials:

    Username      : guest
    Password      : $(kubectl get secret --namespace kube-system my-rabbitmq-ha -o jsonpath="{.data.rabbitmq-password}" | base64 --decode)
    ErLang Cookie : $(kubectl get secret --namespace kube-system my-rabbitmq-ha -o jsonpath="{.data.rabbitmq-erlang-cookie}" | base64 --decode)

RabbitMQ can be accessed within the cluster on port 5672 at my-rabbitmq-ha.kube-system.svc.cluster.local

To access the cluster externally execute the following commands:

    export POD_NAME=$(kubectl get pods --namespace kube-system -l "app=rabbitmq-ha" -o jsonpath="{.items[0].metadata.name}")
    kubectl port-forward $POD_NAME --namespace kube-system 5672:5672 15672:15672
    kubectl port-forward $POD_NAME --namespace kube-system 9090:9090

To Access the RabbitMQ AMQP port:

    amqp://127.0.0.1:5672/

To Access the RabbitMQ Exporter metrics port:

    http://127.0.0.1:9090/

To Access the RabbitMQ Management interface:

    URL : http://127.0.0.1:15672

[root@master-01 rabbitmq-ha]#
That is it for today's post. I spent most of the May Day holiday with my family, so I am using today's on-call shift to write up these Kubernetes experiments. I have been busy with Kubernetes projects for the past few months; in follow-up posts I will also cover installing MySQL and Jenkins with Helm, and in particular how CI/CD is done inside Kubernetes containers. Please keep an eye on my blog and official account.
I recommend following my personal WeChat official account, "云时代IT运维" (IT operations in the cloud era), which regularly publishes the latest application-operations documentation, covering virtualization and container technology, CI/CD, automated operations, and other cutting-edge operations topics and trends.





