
Building a Logging System in Kubernetes

乔克的好奇心 2021-09-27

Reference:

Author: 阳明

Blog: https://www.qikqiak.com/k8s-book/

The most popular log collection solution for Kubernetes is the Elasticsearch, Fluentd, and Kibana (EFK) stack, which is also one of the currently recommended approaches.

  • Elasticsearch is a real-time, distributed, scalable search engine that supports full-text and structured search. It is typically used to index and search large volumes of log data, but it can also search many other kinds of documents.

  • Kibana is a powerful data-visualization dashboard for Elasticsearch that lets you browse Elasticsearch log data through a web UI.

  • Fluentd is a popular open-source data collector. We will install Fluentd on the Kubernetes cluster nodes to read container log files, filter and transform the log data, and then ship it to the Elasticsearch cluster, where it is indexed and stored.

Normally this stack is enough, but if the cluster produces too many logs and Elasticsearch cannot keep up, we need a middleware buffer in front of it; Kafka and Redis are the obvious choices. Here we use Kafka, and since we want everything containerized, all of these components are deployed inside Kubernetes.

Notes:

(1) All components are deployed in a dedicated namespace; here I created a new namespace called kube-ops.

(2) The cluster uses distributed storage for persistence; Ceph, NFS, and similar options all work. I use NFS here. If you also use NFS and have not set it up before, see https://www.qikqiak.com/k8s-book/docs/35.StorageClass.html.

Create a Namespace

First create a namespace. You can do it with a single command:

kubectl create ns kube-ops

Or with a YAML manifest (efk-ns.yaml):

apiVersion: v1
kind: Namespace
metadata:
  name: kube-ops

If you use the manifest, apply it:

kubectl apply -f efk-ns.yaml
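Either way, a quick check that the namespace exists (a sanity check, not part of the original walkthrough):

kubectl get ns kube-ops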

Deploy Elasticsearch

First we deploy the Elasticsearch cluster.

We deploy a 3-node Elasticsearch cluster. The key point is to set discovery.zen.minimum_master_nodes=N/2+1, where N is the number of master-eligible nodes in the Elasticsearch cluster. With 3 nodes here, the value should be 2. This way, if one node is temporarily disconnected from the cluster, the other two can elect a new master and the cluster keeps running while the last node attempts to rejoin. Keep this parameter in mind whenever you scale the Elasticsearch cluster.

(1) Create the Elasticsearch headless Service (elasticsearch-svc.yaml):

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: kube-ops
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
  - name: rest
    port: 9200
  - name: inter-node
    port: 9300

We define a headless Service because the Elasticsearch Pods will be created by a StatefulSet, which this Service will be associated with. Port 9200 is the REST API port and 9300 is the inter-node communication port.
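Once the StatefulSet in the next step is running, the headless Service gives each Pod a stable DNS record of the form <pod-name>.elasticsearch.kube-ops.svc.cluster.local. A quick way to verify resolution from inside the cluster (the temporary busybox Pod is just a throwaway helper, not part of the original article):

kubectl run -n kube-ops dns-test --rm -it --restart=Never --image=busybox:1.28 \
  -- nslookup es-cluster-0.elasticsearch.kube-ops.svc.cluster.local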

Then create the resource:

# kubectl apply -f elasticsearch-svc.yaml
service/elasticsearch created
# kubectl get svc -n kube-ops
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   9s

(2) Deploy Elasticsearch with a StatefulSet. The manifest is as follows (elasticsearch-elasticsearch.yaml):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-ops
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 1000m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.zen.ping.unicast.hosts
          value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
        - name: discovery.zen.minimum_master_nodes
          value: "2"
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
      annotations:
        volume.beta.kubernetes.io/storage-class: es-data-db
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: es-data-db
      resources:
        requests:
          storage: 10Gi

Notes on the manifest:

The Pod defines two kinds of containers: regular containers and init containers. There are three init containers, and they run before all other containers start:

  • The fix-permissions container changes the owner and group of the Elasticsearch data directory to 1000:1000 (the Elasticsearch user's UID). By default Kubernetes mounts the data directory as root, which leaves it unreadable by Elasticsearch.

  • The increase-vm-max-map container raises the operating system's limit on mmap counts, which by default may be too low and cause out-of-memory errors.

  • The increase-fd-ulimit container runs ulimit to raise the maximum number of open file descriptors.

Among the regular containers we define one named elasticsearch and expose ports 9200 and 9300; note that the port names must match the ones defined in the Service above. We declare the persistent data directory via a volumeMount, and the VolumeClaims are defined further down. Finally, these are the environment variables set in the container:

  • cluster.name: the name of the Elasticsearch cluster, here k8s-logs;

  • node.name: the node name, taken from metadata.name. It resolves to es-cluster-[0,1,2], depending on the Pod's ordinal;

  • discovery.zen.ping.unicast.hosts: this field configures how nodes in the Elasticsearch cluster discover each other. We use unicast discovery, which gives the cluster a static list of hosts. Thanks to the headless Service configured earlier, each Pod has a unique DNS name of the form es-cluster-[0,1,2].elasticsearch.kube-ops.svc.cluster.local, so we set the variable accordingly. Because everything lives in the same namespace, we can shorten it to es-cluster-[0,1,2].elasticsearch;

  • discovery.zen.minimum_master_nodes: set to (N/2) + 1, where N is the number of master-eligible nodes in the cluster. We have 3 Elasticsearch nodes, so this is 2 (rounded down to the nearest integer);

  • ES_JAVA_OPTS: set to -Xms512m -Xmx512m, telling the JVM to use a minimum and maximum heap of 512 MB. You should tune these values to the resources available in your cluster and your actual needs.

(3) Define a StorageClass (elasticsearch-storage.yaml):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-data-db
provisioner: rookieops/nfs

Note: since we use NFS for storage, the provisioner above must match the provisioner name configured in our nfs-client-provisioner.
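After applying it in the next step, you can double-check that the provisioner recorded in the StorageClass really matches your nfs-client-provisioner deployment (a quick sanity check, not from the original article):

kubectl get storageclass es-data-db -o jsonpath='{.provisioner}'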

Then create the resources:

# kubectl apply -f elasticsearch-storage.yaml
# kubectl apply -f elasticsearch-elasticsearch.yaml
# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-8497494dc6-s6qkh   1/1     Running   0          16m
es-cluster-0                     1/1     Running   0          10m
es-cluster-1                     1/1     Running   0          10m
es-cluster-2                     1/1     Running   0          9m20s
# kubectl get pvc -n kube-ops
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-es-cluster-0   Bound    pvc-9f15c0f8-60a8-485d-b650-91fb8f5f8076   10Gi       RWO            es-data-db     18m
data-es-cluster-1   Bound    pvc-503828ec-d98e-4e94-9f00-eaf6c05f3afd   10Gi       RWO            es-data-db     11m
data-es-cluster-2   Bound    pvc-3d2eb82e-396a-4eb0-bb4e-2dd4fba8600e   10Gi       RWO            es-data-db     10m
# kubectl get svc -n kube-ops
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
dingtalk-hook   ClusterIP   10.68.122.48   <none>        5000/TCP            18m
elasticsearch   ClusterIP   None           <none>        9200/TCP,9300/TCP   19m

Test:

# kubectl port-forward es-cluster-0 9200:9200 --namespace=kube-ops
Forwarding from 127.0.0.1:9200 -> 9200
Forwarding from [::1]:9200 -> 9200
Handling connection for 9200

If you see output like the following, the service is working:

# curl http://localhost:9200/_cluster/state?pretty
{
  "cluster_name" : "k8s-logs",
  "compressed_size_in_bytes" : 337,
  "cluster_uuid" : "nzc4y-eDSuSaYU1TigFAWw",
  "version" : 3,
  "state_uuid" : "6Mvd-WTPT0e7WMJV23Vdiw",
  "master_node" : "KRyMrbS0RXSfRkpS0ZaarQ",
  "blocks" : { },
  "nodes" : {
    "XGP4TrkrQ8KNMpH3pQlaEQ" : {
      "name" : "es-cluster-2",
      "ephemeral_id" : "f-R_IyfoSYGhY27FmA41Tg",
      "transport_address" : "172.20.1.104:9300",
      "attributes" : { }
    },
    "KRyMrbS0RXSfRkpS0ZaarQ" : {
      "name" : "es-cluster-0",
      "ephemeral_id" : "FpTnJTR8S3ysmoZlPPDnSg",
      "transport_address" : "172.20.1.102:9300",
      "attributes" : { }
    },
    "Xzjk2n3xQUutvbwx2h7f4g" : {
      "name" : "es-cluster-1",
      "ephemeral_id" : "FKjRuegwToe6Fz8vgPmSNw",
      "transport_address" : "172.20.1.103:9300",
      "attributes" : { }
    }
  },
  "metadata" : {
    "cluster_uuid" : "nzc4y-eDSuSaYU1TigFAWw",
    "templates" : { },
    "indices" : { },
    "index-graveyard" : {
      "tombstones" : [ ]
    }
  },
  "routing_table" : {
    "indices" : { }
  },
  "routing_nodes" : {
    "unassigned" : [ ],
    "nodes" : {
      "KRyMrbS0RXSfRkpS0ZaarQ" : [ ],
      "XGP4TrkrQ8KNMpH3pQlaEQ" : [ ],
      "Xzjk2n3xQUutvbwx2h7f4g" : [ ]
    }
  },
  "snapshots" : {
    "snapshots" : [ ]
  },
  "restore" : {
    "snapshots" : [ ]
  },
  "snapshot_deletions" : {
    "snapshot_deletions" : [ ]
  }
}

At this point, the Elasticsearch deployment is complete.

Deploy Kibana

Kibana is only a presentation layer, so a plain Deployment is enough.

(1) Define the Kibana Service manifest (kibana-svc.yaml):

apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-ops
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
  type: NodePort
  selector:
    app: kibana

The Service here uses the NodePort type; you could also use an Ingress, which is actually the recommended approach (a rough sketch follows).
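A minimal Ingress sketch for reference only: the nginx ingress class and the host kibana.example.com are assumptions and not part of the original article, and the extensions/v1beta1 API matches clusters of that era; adjust both to your environment:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kibana
  namespace: kube-ops
  annotations:
    kubernetes.io/ingress.class: "nginx"   # assumed ingress controller
spec:
  rules:
  - host: kibana.example.com               # placeholder host
    http:
      paths:
      - path: /
        backend:
          serviceName: kibana
          servicePort: 5601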

(2) Define the Kibana Deployment manifest (kibana-deploy.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-ops
  labels:
    app: kibana
spec:
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana-oss:6.4.3
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        env:
        - name: ELASTICSEARCH_URL
          value: http://elasticsearch:9200
        ports:
        - containerPort: 5601

Apply the manifests (here the Service and Deployment were combined into a single kibana.yaml):

# kubectl apply -f kibana.yaml
service/kibana created
deployment.apps/kibana created

# kubectl get svc -n kube-ops
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
dingtalk-hook   ClusterIP   10.68.122.48   <none>        5000/TCP            47m
elasticsearch   ClusterIP   None           <none>        9200/TCP,9300/TCP   48m
kibana          NodePort    10.68.221.60   <none>        5601:26575/TCP      7m29s
[root@ecs-5704-0003 storage]# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-8497494dc6-s6qkh   1/1     Running   0          47m
es-cluster-0                     1/1     Running   0          41m
es-cluster-1                     1/1     Running   0          41m
es-cluster-2                     1/1     Running   0          40m
kibana-7fc9f8c964-68xbh          1/1     Running   0          7m41s

If you can open the Kibana welcome page, the deployment is complete (the original screenshot is omitted here).
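Since the Service is of type NodePort, Kibana is reachable in a browser at http://<any-node-ip>:<node-port>. The node port (26575 in the output above; yours will differ) can be read back with:

kubectl get svc kibana -n kube-ops -o jsonpath='{.spec.ports[0].nodePort}'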

Deploy Kafka

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle large volumes of data and pass messages from one endpoint to another. Kafka works for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss.

Some of Kafka's benefits:

  • Reliability - Kafka is distributed, partitioned, replicated, and fault tolerant.

  • Scalability - the Kafka messaging system scales out easily with no downtime.

  • Durability - Kafka uses a distributed commit log, meaning messages are persisted to disk as quickly as possible, so they are durable.

  • Performance - Kafka has high throughput for both publishing and subscribing, and it stays stable even with many terabytes of messages stored.

A key dependency of Kafka is ZooKeeper, a distributed configuration and coordination service. ZooKeeper is the coordination interface between Kafka brokers and consumers, and Kafka servers share information through the ZooKeeper cluster. Kafka stores basic metadata in ZooKeeper, such as information about topics, brokers, and consumer offsets (queue readers).

Because all of this critical information is stored in ZooKeeper, which replicates it across its ensemble, the failure of a single Kafka broker or ZooKeeper node does not affect the state of the Kafka cluster. Kafka also relies on ZooKeeper for leader election among brokers when a leader fails.

Deploy ZooKeeper

(1) Define the ZooKeeper StorageClass (zookeeper-storage.yaml):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zk-data-db
provisioner: rookieops/nfs

(2) Define the ZooKeeper headless Service (zookeeper-svc.yaml):

apiVersion: v1
kind: Service
metadata:
  name: zk-svc
  namespace: kube-ops
  labels:
    app: zk-svc
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk

(3) Define the ZooKeeper ConfigMap (zookeeper-config.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: zk-cm
  namespace: kube-ops
data:
  jvm.heap: "1G"
  tick: "2000"
  init: "10"
  sync: "5"
  client.cnxns: "60"
  snap.retain: "3"
  purge.interval: "0"

(4) Define the ZooKeeper PodDisruptionBudget (zookeeper-pdb.yaml):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
  namespace: kube-ops
spec:
  selector:
    matchLabels:
      app: zk
  minAvailable: 2

Note: a PodDisruptionBudget exists to keep the service available (or its SLA intact) during voluntary disruptions. Through the PodDisruptionBudget controller you can set either a minimum number or a minimum percentage of application Pods that must remain running, so that when Pods are evicted deliberately, too many are never destroyed at once and the service does not go down or degrade.
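After it is applied, the budget can be inspected with (a quick check, not in the original text):

kubectl get pdb zk-pdb -n kube-ops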

(5) Define the ZooKeeper StatefulSet (zookeeper-statefulset.yaml):

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
  namespace: kube-ops
spec:
  serviceName: zk-svc
  replicas: 3
  template:
    metadata:
      labels:
        app: zk
    spec:
      containers:
      - name: k8szk
        imagePullPolicy: Always
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/zookeeper:3.4.10
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        env:
        - name: ZK_REPLICAS
          value: "3"
        - name: ZK_HEAP_SIZE
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: jvm.heap
        - name: ZK_TICK_TIME
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: tick
        - name: ZK_INIT_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: init
        - name: ZK_SYNC_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: sync
        - name: ZK_MAX_CLIENT_CNXNS
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: client.cnxns
        - name: ZK_SNAP_RETAIN_COUNT
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: snap.retain
        - name: ZK_PURGE_INTERVAL
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: purge.interval
        - name: ZK_CLIENT_PORT
          value: "2181"
        - name: ZK_SERVER_PORT
          value: "2888"
        - name: ZK_ELECTION_PORT
          value: "3888"
        command:
        - sh
        - -c
        - zkGenConfig.sh && zkServer.sh start-foreground
        readinessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: zk-data-db
      resources:
        requests:
          storage: 1Gi

Then apply the manifests:

# kubectl apply -f zookeeper-storage.yaml
# kubectl apply -f zookeeper-svc.yaml
# kubectl apply -f zookeeper-config.yaml
# kubectl apply -f zookeeper-pdb.yaml
# kubectl apply -f zookeeper-statefulset.yaml
# kubectl get pod -n kube-ops
NAME   READY   STATUS    RESTARTS   AGE
zk-0   1/1     Running   0          12m
zk-1   1/1     Running   0          12m
zk-2   1/1     Running   0          11m

Then check the cluster status:

# for i in 0 1 2; do kubectl exec -n kube-ops zk-$i zkServer.sh status; done
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: follower
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: follower
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: leader

Deploy Kafka

(1) Build the image. The Dockerfile is shown below.

Kafka download, for example: wget https://www-us.apache.org/dist/kafka/2.2.0/kafka_2.11-2.2.0.tgz (note that the Dockerfile below adds kafka_2.11-2.3.1.tgz, so download the version that matches your Dockerfile).

FROM centos:centos7
LABEL "auth"="rookieops" \
      "mail"="rookieops@163.com"
ENV TIME_ZONE Asia/Shanghai

# install JAVA
ADD jdk-8u131-linux-x64.tar.gz /opt/
ENV JAVA_HOME /opt/jdk1.8.0_131
ENV PATH ${JAVA_HOME}/bin:${PATH}

# install kafka
ADD kafka_2.11-2.3.1.tgz /opt/
RUN mv /opt/kafka_2.11-2.3.1 /opt/kafka
WORKDIR /opt/kafka
EXPOSE 9092
CMD ["./bin/kafka-server-start.sh", "config/server.properties"]

Then docker build the image and docker push it to your registry (the original omits these steps; a minimal sketch follows).
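A minimal sketch of those omitted steps, assuming the image name and tag used by the StatefulSet below (substitute your own registry and tag):

docker build -t registry.cn-hangzhou.aliyuncs.com/rookieops/kafka:2.3.1-beta .
docker push registry.cn-hangzhou.aliyuncs.com/rookieops/kafka:2.3.1-beta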

(2) Define the Kafka StorageClass (kafka-storage.yaml):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-data-db
provisioner: rookieops/nfs

(3) Define the Kafka headless Service (kafka-svc.yaml):

apiVersion: v1
kind: Service
metadata:
  name: kafka-svc
  namespace: kube-ops
  labels:
    app: kafka
spec:
  selector:
    app: kafka
  clusterIP: None
  ports:
  - name: server
    port: 9092

(4) Define the Kafka ConfigMap (kafka-config.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-config
  namespace: kube-ops
data:
  server.properties: |
    broker.id=${HOSTNAME##*-}
    listeners=PLAINTEXT://:9092
    num.network.threads=3
    num.io.threads=8
    socket.send.buffer.bytes=102400
    socket.receive.buffer.bytes=102400
    socket.request.max.bytes=104857600
    log.dirs=/data/kafka/logs
    num.partitions=1
    num.recovery.threads.per.data.dir=1
    offsets.topic.replication.factor=1
    transaction.state.log.replication.factor=1
    transaction.state.log.min.isr=1
    log.retention.hours=168
    log.segment.bytes=1073741824
    log.retention.check.interval.ms=300000
    zookeeper.connect=zk-0.zk-svc.kube-ops.svc.cluster.local:2181,zk-1.zk-svc.kube-ops.svc.cluster.local:2181,zk-2.zk-svc.kube-ops.svc.cluster.local:2181
    zookeeper.connection.timeout.ms=6000
    group.initial.rebalance.delay.ms=0

(5) Define the Kafka StatefulSet manifest (kafka.yaml):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: kube-ops
spec:
  serviceName: kafka-svc
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: "app"
                  operator: In
                  values:
                  - zk
              topologyKey: "kubernetes.io/hostname"
      terminationGracePeriodSeconds: 300
      containers:
      - name: kafka
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/kafka:2.3.1-beta
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 500m
            memory: 1Gi
        command:
        - "/bin/sh"
        - "-c"
        - "./bin/kafka-server-start.sh config/server.properties --override broker.id=${HOSTNAME##*-}"
        ports:
        - name: server
          containerPort: 9092
        volumeMounts:
        - name: config
          mountPath: /opt/kafka/config/server.properties
          subPath: server.properties
        - name: data
          mountPath: /data/kafka/logs
      volumes:
      - name: config
        configMap:
          name: kafka-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: kafka-data-db
      resources:
        requests:
          storage: 10Gi

Apply the manifests:

# kubectl apply -f kafka-storage.yaml
# kubectl apply -f kafka-svc.yaml
# kubectl apply -f kafka-config.yaml
# kubectl apply -f kafka.yaml
# kubectl get pod -n kube-ops
NAME      READY   STATUS    RESTARTS   AGE
kafka-0   1/1     Running   0          13m
kafka-1   1/1     Running   0          13m
kafka-2   1/1     Running   0          10m
zk-0      1/1     Running   0          77m
zk-1      1/1     Running   0          77m
zk-2      1/1     Running   0          76m

Test:

(1) Enter one container, create a topic, and start a consumer waiting for a producer to send data:

# kubectl exec -it -n kube-ops kafka-0 -- /bin/bash
$ cd /opt/kafka
$ ./bin/kafka-topics.sh --create --topic test --zookeeper zk-0.zk-svc.kube-ops.svc.cluster.local:2181,zk-1.zk-svc.kube-ops.svc.cluster.local:2181,zk-2.zk-svc.kube-ops.svc.cluster.local:2181 --partitions 3 --replication-factor 2
Created topic "test".
# consume
$ ./bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092

(2) Enter another container and act as the producer:

# kubectl exec -it -n kube-ops kafka-1 -- /bin/bash
$ cd /opt/kafka
$ ./bin/kafka-console-producer.sh --topic test --broker-list localhost:9092
hello
nihao

The consumer then prints the consumed messages:

$ ./bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092
hello
nihao

At this point, the Kafka cluster is up and running.
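If you want to double-check how the test topic's partitions and replicas were placed across the brokers, you can also run the following from inside any kafka Pod (same paths and ZooKeeper address as above; this check is not part of the original article):

$ ./bin/kafka-topics.sh --describe --topic test --zookeeper zk-0.zk-svc.kube-ops.svc.cluster.local:2181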

Deploy Logstash

Logstash here reads messages from Kafka and forwards them to our backend store, Elasticsearch. To keep things simple, I deploy it with a plain Deployment.

(1) Define the ConfigMap manifest (logstash-config.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-k8s-config
  namespace: kube-ops
data:
  containers.conf: |
    input {
      kafka {
        codec => "json"
        topics => ["test"]
        bootstrap_servers => ["kafka-0.kafka-svc.kube-ops:9092, kafka-1.kafka-svc.kube-ops:9092, kafka-2.kafka-svc.kube-ops:9092"]
        group_id => "logstash-g1"
      }
    }
    output {
      elasticsearch {
        hosts => ["es-cluster-0.elasticsearch.kube-ops:9200", "es-cluster-1.elasticsearch.kube-ops:9200", "es-cluster-2.elasticsearch.kube-ops:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }

(2) Define the Deployment manifest (logstash.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
  namespace: kube-ops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/logstash-kubernetes:7.1.1
        volumeMounts:
        - name: config
          mountPath: /opt/logstash/config/containers.conf
          subPath: containers.conf
        command:
        - "/bin/sh"
        - "-c"
        - "/opt/logstash/bin/logstash -f /opt/logstash/config/containers.conf"
      volumes:
      - name: config
        configMap:
          name: logstash-k8s-config

Then apply the configuration:

# kubectl apply -f logstash-config.yaml
# kubectl apply -f logstash.yaml

Then watch the Pod status and check the logs (a log-tailing sketch follows the listing below):

# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-856c5dbbc9-srcm6   1/1     Running   0          3d20h
es-cluster-0                     1/1     Running   0          22m
es-cluster-1                     1/1     Running   0          22m
es-cluster-2                     1/1     Running   0          22m
kafka-0                          1/1     Running   0          3h6m
kafka-1                          1/1     Running   0          3h6m
kafka-2                          1/1     Running   0          3h6m
kibana-7fc9f8c964-dqr68          1/1     Running   0          5d2h
logstash-678c945764-lkl2n        1/1     Running   0          10m
zk-0                             1/1     Running   0          3d21h
zk-1                             1/1     Running   0          3d21h
zk-2                             1/1     Running   0          3d21h
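To confirm that the Logstash pipeline actually started and connected to Kafka and Elasticsearch, tail the Pod's logs (use your own Pod name from the listing above):

kubectl logs -f -n kube-ops logstash-678c945764-lkl2n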

Deploy Fluentd

Fluentd is an efficient log aggregator written in Ruby that scales well. For most companies Fluentd is efficient enough and relatively light on resources. Fluent-bit is an even lighter alternative that uses fewer resources, but its plugin ecosystem is not as rich as Fluentd's. Overall, Fluentd is more mature and more widely used, so we use Fluentd here as the log collection agent.

(1) Install the fluent-plugin-kafka plugin

My installation steps here are: start a fluentd container, install the plugin inside it, commit the container as a new image, and push it to the registry. Concretely:

a. Start a container with docker:

# docker run -it registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4 /bin/bash
$ gem install fluent-plugin-kafka --no-document

b. Exit the container and commit it as a new image:

# docker commit c29b250d8df9 registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4

c. Push the image with the plugin installed to the registry:

# docker push registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4
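An equivalent, more reproducible route (not what the article does) is to bake the plugin in with a small Dockerfile instead of docker commit; the base image and tag below are simply the ones used above:

FROM registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4
# install the Kafka output plugin on top of the base fluentd image
RUN gem install fluent-plugin-kafka --no-document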

(2) Define the Fluentd ConfigMap (fluentd-config.yaml):

kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-config
  namespace: kube-ops
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  system.conf: |-
    <system>
      root_dir /tmp/fluentd-buffers/
    </system>
  containers.input.conf: |-
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      localtime
      tag raw.kubernetes.*
      format json
      read_from_head true
    </source>
    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>
  system.input.conf: |-
    # Logs from systemd-journal for interesting services.
    <source>
      @id journald-docker
      @type systemd
      filters [{ "_SYSTEMD_UNIT": "docker.service" }]
      <storage>
        @type local
        persistent true
      </storage>
      read_from_head true
      tag docker
    </source>
    <source>
      @id journald-kubelet
      @type systemd
      filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
      <storage>
        @type local
        persistent true
      </storage>
      read_from_head true
      tag kubelet
    </source>
  forward.input.conf: |-
    # Takes the messages sent over TCP
    <source>
      @type forward
    </source>
  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>
    <match **>
      @id kafka
      @type kafka2
      @log_level info
      include_tag_key true
      brokers kafka-0.kafka-svc.kube-ops:9092,kafka-1.kafka-svc.kube-ops:9092,kafka-2.kafka-svc.kube-ops:9092
      logstash_format true
      request_timeout 30s
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
      # data type settings
      <format>
        @type json
      </format>
      # topic settings
      topic_key topic
      default_topic test
      # producer settings
      required_acks -1
      compression_codec gzip
    </match>

(3) Define the DaemonSet manifest (fluentd-daemonset.yaml):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: kube-ops
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: kube-ops
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: kube-ops
  labels:
    k8s-app: fluentd-es
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v2.0.4
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4
        command:
        - "/bin/sh"
        - "-c"
        - "/run.sh $FLUENTD_ARGS"
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-config
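Note that because of the nodeSelector above, the DaemonSet only schedules onto nodes carrying the beta.kubernetes.io/fluentd-ds-ready=true label, so label every node whose container logs you want collected, for example:

kubectl label node <node-name> beta.kubernetes.io/fluentd-ds-ready=true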

Apply the manifests:

# kubectl apply -f fluentd-daemonset.yaml
# kubectl apply -f fluentd-config.yaml
# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-856c5dbbc9-srcm6   1/1     Running   0          3d21h
es-cluster-0                     1/1     Running   0          112m
es-cluster-1                     1/1     Running   0          112m
es-cluster-2                     1/1     Running   0          112m
fluentd-es-jvhqv                 1/1     Running   0          4h29m
fluentd-es-s7v6m                 1/1     Running   0          4h29m
kafka-0                          1/1     Running   0          4h36m
kafka-1                          1/1     Running   0          4h36m
kafka-2                          1/1     Running   0          4h36m
kibana-7fc9f8c964-dqr68          1/1     Running   0          5d4h
logstash-678c945764-lkl2n        1/1     Running   0          100m
zk-0                             1/1     Running   0          3d23h
zk-1                             1/1     Running   0          3d23h
zk-2                             1/1     Running   0          3d23h

At this point the whole pipeline is in place. Next, enter one of the kafka containers and watch the consumer messages flow through (the original screenshot is omitted; a sketch of the command follows).
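For reference, watching the topic from inside a broker Pod uses the same console consumer as in the Kafka test earlier:

kubectl exec -it -n kube-ops kafka-0 -- /bin/bash
$ cd /opt/kafka
$ ./bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092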

Then open Kibana and create an index pattern. Since the Logstash configuration defines the index as logstash-%{+YYYY.MM.dd}, create the index pattern accordingly (for example logstash-*; the original screenshots of the index-creation pages are omitted).

Once the index pattern is created, you can browse the collected log entries in Kibana.

At this point, the entire log collection system is complete.



