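The previous article, "Prometheus Series 4: High-Availability Clustering with Thanos", introduced Thanos, a high-availability clustering solution built on Prometheus. Now that you have a general picture of the Thanos architecture, this part of the series examines each component in depth: what it does and what its startup flags mean. It also provides a set of YAML templates for running the components on Kubernetes (k8s).
This section covers two components: store and receive.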
Minio
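Thanos can persist metric data in object storage, so we first deploy a MinIO service to act as the object store. For deployment instructions, see https://docs.min.io/docs/minio-client-quickstart-guide; they are not covered here.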
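The Thanos components that talk to object storage directly are:
sidecar - writes the metrics scraped by Prometheus to object storage;
receive - writes the data pushed by Prometheus to object storage;
store - lets query read metric data from object storage;
compact - compacts the metric data held in object storage;
rule - stores newly generated metric data in object storage.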
These components reach MinIO by reading a storage configuration file. A sample file looks like this:
# bucket_config.yaml
type: s3
config:
  bucket: thanos
  endpoint: 10.6.110.11:9000
  access_key: admin
  secret_key: 12345678  # MinIO password, at least 8 characters
  insecure: true
When deploying on k8s, we can define a Secret that holds this configuration file, and the other components simply read that Secret:
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: thanos
type: Opaque
stringData:
  objectstorage.yaml: |
    type: s3
    config:
      bucket: thanos
      endpoint: 10.6.110.11:9000
      access_key: admin
      secret_key: 12345678
      insecure: true
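If you would rather generate the Secret than maintain it by hand, the wrapping step can be sketched in a few lines of Python. This is only an illustration: the helper name `build_objstore_secret` is hypothetical, and the output is a JSON manifest, which `kubectl apply -f` accepts just like YAML.

```python
import json


def build_objstore_secret(config_text, name="thanos-objectstorage",
                          namespace="thanos", key="objectstorage.yaml"):
    """Wrap an object-storage config file in a Kubernetes Secret manifest.

    Hypothetical helper; the default names mirror the Secret shown above.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # stringData takes plain text, so no manual base64 encoding is needed.
        "stringData": {key: config_text},
    }


# Same content as bucket_config.yaml in the article.
BUCKET_CONFIG = """\
type: s3
config:
  bucket: thanos
  endpoint: 10.6.110.11:9000
  access_key: admin
  secret_key: 12345678
  insecure: true
"""

if __name__ == "__main__":
    print(json.dumps(build_objstore_secret(BUCKET_CONFIG), indent=2))
```

The printed manifest could be piped into `kubectl apply -f -`. Keeping the config text in one place means the standalone file and the Secret cannot drift apart.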
Store
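store is the component through which query reads historical metric data from object storage. It connects to the object store using the bucket_config.yaml configuration defined above.
Its startup flags are: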
store
--data-dir=/var/thanos/store
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902
--objstore.config-file=/etc/thanos/objectstorage.yaml
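store - run as the store component
--data-dir - directory for cached files
--grpc-address - listen address (ip:port) of the gRPC service
--http-address - listen address (ip:port) of the HTTP service
--objstore.config-file - path to the object storage configuration file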
On k8s, you can define a StorageClass and rely on dynamic provisioning to create the PVC that backs the cache directory:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: thanos-data-db
provisioner: fuseim.pri/ifs
parameters:
  archiveOnDelete: "false"
Run the store replicas with a StatefulSet controller:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-store
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store
    spec:
      containers:
      - args:
        - store
        - --log.level=debug
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        #- --experimental.enable-index-header
        image: registry-dev.uihcloud.cn/library/thanos/thanos:v0.21.1
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
      terminationGracePeriodSeconds: 120
      volumes:
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-store
      name: data
    spec:
      storageClassName: thanos-data-db
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
Also define a Service so that query can reach store from inside the cluster:
apiVersion: v1
kind: Service
metadata:
  name: thanos-store
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-store
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/name: thanos-store
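Because the Service is headless (clusterIP: None) and the StatefulSet names it in serviceName, each store pod gets a stable per-pod DNS name. The sketch below shows how those names are composed. The helper `pod_endpoints` is hypothetical, and the cluster domain `cluster.local` is assumed to be the cluster default.

```python
def pod_endpoints(statefulset, service, namespace, replicas, port,
                  cluster_domain="cluster.local"):
    """Return the per-pod DNS endpoints of a StatefulSet behind a headless Service.

    Hypothetical helper. Pods of a StatefulSet are named <statefulset>-<ordinal>,
    and the headless Service publishes them as <pod>.<service>.<namespace>.svc.<domain>.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}:{port}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # The two store replicas defined above, on their gRPC port.
    for endpoint in pod_endpoints("thanos-store", "thanos-store", "thanos", 2, 10901):
        print(endpoint)
```

These are the addresses a query instance inside the cluster could be pointed at when it needs to reach each store replica individually rather than one arbitrary pod.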
Receive
Before looking at how receive works, we need to understand two concepts: remote_write and tenants.
remote_write
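Through its remote write mechanism, Prometheus pushes the metrics it scrapes to an external endpoint, much like a webhook. You add an entry to the Prometheus configuration file that names the target URL, and here that URL is the HTTP endpoint exposed by receive.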
remote_write:
- url: http://10.6.118.123:32291/api/v1/receive
Tenants
receive aggregates the metrics uploaded by many Prometheus instances (or clusters), and each Prometheus instance (or cluster) is treated as one tenant.
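The article does not show how a Prometheus instance tells receive which tenant it is. A common approach is an HTTP header on the remote write request. The sketch below builds such a remote_write entry under two assumptions you should verify for your versions: that receive reads the tenant from a header named `THANOS-TENANT` (believed to be the default of its tenant-header flag), and that your Prometheus version supports a `headers` field under remote_write. The helper name is hypothetical.

```python
import json


def remote_write_entry(receive_host, port, tenant=None,
                       tenant_header="THANOS-TENANT"):
    """Build one remote_write entry pointing at a receive endpoint.

    Hypothetical helper. The header name is an assumption, not taken
    from the article.
    """
    entry = {"url": f"http://{receive_host}:{port}/api/v1/receive"}
    if tenant is not None:
        # Tag every remote write request with the tenant it belongs to.
        entry["headers"] = {tenant_header: tenant}
    return entry


if __name__ == "__main__":
    config = {"remote_write": [remote_write_entry("10.6.118.123", 32291, "tenant-a")]}
    # JSON is also valid YAML, so the output can be merged into prometheus.yml.
    print(json.dumps(config, indent=2))
```

Without a tenant argument the entry reduces to the plain URL form shown above, in which case receive would fall back to whatever default tenant it is configured with.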
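Because receive collects metric data from multiple tenants, it has to be able to scale out as a cluster. The hashring configuration file plays the key role in defining a receive cluster. Let's walk through a sample hashring file: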
[
  {
    "hashring": "default",
    "endpoints": [
      "thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
      "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901"
    ]
  },
  {
    "hashring": "hashring-0",
    "endpoints": [
      "thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"
    ],
    "tenants": ["tenant-a"]
  },
  {
    "hashring": "hashring-1",
    "endpoints": [
      "thanos-receive-3.thanos-receive.thanos.svc.cluster.local:10901"
    ],
    "tenants": ["tenant-b"]
  }
]
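This JSON file says that the receive cluster runs four replicas: thanos-receive-0, thanos-receive-1, thanos-receive-2, and thanos-receive-3. Metrics from tenant-a are collected by thanos-receive-2, metrics from tenant-b by thanos-receive-3, and metrics from every other tenant by thanos-receive-0 and thanos-receive-1.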
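The tenant-to-hashring mapping just described can be sketched in Python. This is a simplified model of the behavior the article describes, not Thanos's own code: explicit tenant lists are checked first, and a hashring without a tenants list acts as the catch-all. The function name `select_hashring` is hypothetical.

```python
import json

# The sample hashring file from the article.
HASHRINGS_JSON = """
[
  {"hashring": "default",
   "endpoints": ["thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
                 "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901"]},
  {"hashring": "hashring-0",
   "endpoints": ["thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"],
   "tenants": ["tenant-a"]},
  {"hashring": "hashring-1",
   "endpoints": ["thanos-receive-3.thanos-receive.thanos.svc.cluster.local:10901"],
   "tenants": ["tenant-b"]}
]
"""


def select_hashring(hashrings, tenant):
    """Pick the hashring responsible for a tenant (simplified, hypothetical)."""
    fallback = None
    for ring in hashrings:
        tenants = ring.get("tenants")
        if tenants:
            if tenant in tenants:
                return ring  # explicit match wins
        elif fallback is None:
            fallback = ring  # first ring without a tenant list is the catch-all
    if fallback is None:
        raise LookupError(f"no hashring accepts tenant {tenant!r}")
    return fallback


if __name__ == "__main__":
    rings = json.loads(HASHRINGS_JSON)
    for tenant in ("tenant-a", "tenant-b", "some-other-tenant"):
        print(tenant, "->", select_hashring(rings, tenant)["hashring"])
```

One caveat: some Thanos versions appear to pick the first matching hashring in file order, in which case a catch-all listed first would capture every tenant. Checking the receive documentation for your version, or placing the catch-all hashring last, is the safer choice.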
Before starting receive, store this configuration in Kubernetes as a ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-receive-hashrings
  namespace: thanos
data:
  thanos-receive-hashrings.json: |
    [
      {
        "hashring": "default",
        "endpoints": [
          "thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
          "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901"
        ]
      },
      {
        "hashring": "hashring-0",
        "endpoints": [
          "thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"
        ],
        "tenants": ["tenant-a"]
      },
      {
        "hashring": "hashring-1",
        "endpoints": [
          "thanos-receive-3.thanos-receive.thanos.svc.cluster.local:10901"
        ],
        "tenants": ["tenant-b"]
      }
    ]
Now let's look at the flags needed to start receive:
receive
--receive.replication-factor=1
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902
--remote-write.address=0.0.0.0:19291
--objstore.config-file=/etc/thanos/objectstorage.yaml
--tsdb.path=/var/thanos/receive
--tsdb.retention=12h
--label=receive_replica="$(NAME)"
--label=receive="true"
--receive.hashrings-file=/etc/thanos/thanos-receive-hashrings.json
--receive.local-endpoint="$(NAME).thanos-receive.thanos.svc.cluster.local:10901"
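What each flag means:
receive - run as the receive component
--receive.replication-factor - number of copies kept of each received metric; when set above 1, the same data is stored on multiple receive instances
--grpc-address - listen address (ip:port) of the gRPC service
--http-address - listen address (ip:port) of the HTTP service
--remote-write.address - listen address (ip:port) of the remote_write endpoint
--objstore.config-file - path to the object storage configuration file
--tsdb.path - local path where received data is staged
--tsdb.retention - how long locally staged data is kept before it is cleaned up
--label=receive_replica - label added to the data handled by the current replica
--receive.hashrings-file - path to the hashring (cluster) configuration file
--receive.local-endpoint - the current replica's own address as it appears in the hashring file; receive uses it to recognize which entry in that file is itself.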
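Since a replica identifies itself by looking up its local endpoint in the hashring file, the expanded flag value presumably has to match one of the listed endpoints exactly. The sketch below shows a quick sanity check along those lines. Both helper names are hypothetical, and the sample list is a shortened version of the hashring file above.

```python
def local_endpoint(pod_name, namespace, service="thanos-receive",
                   port=10901, cluster_domain="cluster.local"):
    """Expand the $(NAME)/$(NAMESPACE) template used for --receive.local-endpoint."""
    return f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}:{port}"


def hashrings_containing(endpoint, hashrings):
    """Return the names of all hashrings that list the given endpoint."""
    return [ring["hashring"] for ring in hashrings if endpoint in ring["endpoints"]]


# Shortened sample of the hashring file from the article.
HASHRINGS = [
    {"hashring": "default",
     "endpoints": ["thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
                   "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901"]},
    {"hashring": "hashring-0",
     "endpoints": ["thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"],
     "tenants": ["tenant-a"]},
]

if __name__ == "__main__":
    for pod in ("thanos-receive-0", "thanos-receive-2", "thanos-receive-9"):
        endpoint = local_endpoint(pod, "thanos")
        rings = hashrings_containing(endpoint, HASHRINGS)
        print(pod, "->", rings if rings else "not listed in any hashring")
```

A pod that is not listed in any hashring is worth investigating before rollout, for example after scaling the StatefulSet without updating the ConfigMap.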
Likewise, we run the receive replicas with a StatefulSet controller.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
    part-of: thanos
  name: thanos-receive
  namespace: thanos
spec:
  replicas: 4
  selector:
    matchLabels:
      app: thanos-receive
      tenant: default-tenant
      controller.receive.thanos.io: thanos-receive-controller
      controller.receive.thanos.io/hashring: default
      part-of: thanos
  serviceName: thanos-receive
  template:
    metadata:
      labels:
        app: thanos-receive
        tenant: default-tenant
        controller.receive.thanos.io: thanos-receive-controller
        controller.receive.thanos.io/hashring: default
        part-of: thanos
    spec:
      affinity: {}
      containers:
      - args:
        - receive
        - --receive.replication-factor=1
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --tsdb.path=/var/thanos/receive
        - --label=receive_replica="$(NAME)"
        - --receive.local-endpoint=$(NAME).thanos-receive.$(NAMESPACE).svc.cluster.local:10901
        - --tsdb.retention=15d
        - --receive.hashrings-file=/etc/thanos/thanos-receive-hashrings.json
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: objectstorage.yaml
              name: thanos-objectstorage
        image: registry-dev.uihcloud.cn/library/thanos/thanos:v0.22.0
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-receive
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        - containerPort: 19291
          name: remote-write
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/receive
          name: data
          readOnly: false
        - mountPath: /etc/thanos/thanos-receive-hashrings.json
          name: thanos-receive-hashrings
          subPath: thanos-receive-hashrings.json
      terminationGracePeriodSeconds: 900
      # The thanos-receive-hashrings mount needs a backing volume;
      # it is taken from the hashring ConfigMap defined earlier.
      volumes:
      - name: thanos-receive-hashrings
        configMap:
          name: thanos-receive-hashrings
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-receive
      name: data
    spec:
      storageClassName: thanos-receiver-data-db
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
Define a Service for access from inside the cluster:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io/hashring: default
    part-of: thanos
  name: thanos-receive
  namespace: thanos
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: 10901
  - name: http
    port: 10902
    protocol: TCP
    targetPort: 10902
  - name: remote-write
    port: 19291
    targetPort: 19291
    protocol: TCP
  selector:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
    part-of: thanos
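If receive also has to be reachable from outside the cluster (for example, when the upstream Prometheus instances run in a different cluster or on a different LAN), define a NodePort Service that exposes it externally: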
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io/hashring: default
    part-of: thanos
  name: thanos-receive-node
  namespace: thanos
spec:
  type: NodePort
  ports:
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: 10901
  - name: http
    port: 10902
    protocol: TCP
    targetPort: 10902
  - name: remote-write
    port: 19291
    targetPort: 19291
    protocol: TCP
    nodePort: 32291
  selector:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
    part-of: thanos
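This gives us a receive component that query can reach and that is also accessible from outside the cluster.
receive handles load distribution internally. When Prometheus uploads metrics, the request lands on an arbitrary replica through the Service. That replica uses the tenant information carried with the request to decide whether it is the one responsible for the data; if it is not, it forwards the data to the responsible replica as defined in the hashring JSON file.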
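The handle-or-forward decision can be illustrated with a small Python sketch. It rests on stated assumptions rather than on Thanos's source: the routing key here combines the tenant with the series labels, SHA-256 stands in for whatever hash function Thanos actually uses, and the function names are hypothetical.

```python
import hashlib


def pick_endpoint(endpoints, tenant, series_labels):
    """Map a (tenant, series) pair onto one endpoint of a hashring.

    Illustrative only: a stable hash of the key, taken modulo the ring size.
    """
    key = tenant + "|" + ",".join(f"{k}={v}" for k, v in sorted(series_labels.items()))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(endpoints)
    return endpoints[index]


def route(local, endpoints, tenant, series_labels):
    """Decide whether the receiving replica keeps the data or forwards it."""
    target = pick_endpoint(endpoints, tenant, series_labels)
    action = "store locally" if target == local else "forward"
    return action, target


if __name__ == "__main__":
    ring = [
        "thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
        "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901",
    ]
    # Pretend the request arrived at replica 0 through the Service.
    print(route(ring[0], ring, "some-tenant", {"__name__": "up", "job": "node"}))
```

Because every replica evaluates the same deterministic function over the same hashring file, it does not matter which replica the Service happens to pick: the data ends up on the same target either way.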




