Prometheus Series 8 - Thanos Components Explained: Sidecar & Ruler

栋总侃技术 2021-09-13

This section covers the last two Thanos components: Sidecar and Ruler.

Sidecar

Sidecar is a component deployed alongside Prometheus. It exposes Prometheus's data to Query and also uploads historical data to object storage.

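Sidecar reads Prometheus's metric data through Prometheus's Remote Read API and serves it to Query over Thanos's gRPC StoreAPI. In addition, whenever Prometheus writes a new TSDB block to disk (every two hours in this setup), Sidecar uploads that block to object storage.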

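If the Sidecar crashes and the most recent data, up to roughly the last two hours, has not yet been persisted to object storage, that data can be lost. When deploying in containers, therefore, mount the data (cache) directory on persistent storage so that a container restart does not cause data loss.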

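Let's first look at Sidecar's usual startup arguments:

sidecar
--tsdb.path=/prometheus
--prometheus.url=http://127.0.0.1:9090
--objstore.config-file=/etc/thanos/objectstorage.yaml
--reloader.config-file=/etc/prometheus/config/prometheus.yaml.tmpl
--reloader.config-envsubst-file=/etc/prometheus/config_out/prometheus.yaml
--reloader.rule-dir=/etc/prometheus/rules/

  • sidecar - run the sidecar component

  • tsdb.path - the TSDB data directory (local cache) Sidecar works on; it is Prometheus's own data directory, shared with the Prometheus container, and Sidecar scans it for new blocks to upload

  • prometheus.url - address of the Prometheus instance to connect to

  • objstore.config-file - object storage configuration file

  • web.enable-lifecycle - this is a Prometheus flag, not a Sidecar flag, so it is set on the Prometheus container (as in the example below). It enables Prometheus's /-/reload endpoint; Sidecar's reloader watches the files given by the reloader.* flags and calls /-/reload whenever they change, which is what makes configuration hot reloading work

  • reloader.config-file - the Prometheus configuration file (template) to watch

  • reloader.config-envsubst-file - output file into which the reloader writes the configuration with environment variables substituted; Prometheus loads this file

  • reloader.rule-dir - directory of Prometheus rule files to watch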
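The file that --objstore.config-file points to is never shown in this article. As a minimal sketch, assuming an S3-compatible backend, it follows Thanos's objstore format: a `type` plus a `config` map. The bucket name, endpoint and credentials are placeholders, not values from the original setup.

```yaml
# objectstorage.yaml: minimal sketch assuming an S3-compatible backend.
# Bucket, endpoint and credentials are hypothetical placeholders.
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.example.com"
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
  insecure: false   # set to true only for plain-HTTP endpoints
```

In the StatefulSet below this file is mounted from the Secret thanos-objectstorage under the key objectstorage.yaml, so one way to create it would be `kubectl -n thanos create secret generic thanos-objectstorage --from-file=objectstorage.yaml`.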

Here is a real example of Sidecar integrated with Prometheus. Prometheus and Sidecar usually run in the same Pod:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-prometheus-sidecar
spec:
  serviceName: prometheus-headless
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - prometheus
            topologyKey: kubernetes.io/hostname
      containers:
      - name: prometheus
        image: quay.io/prometheus/prometheus:v2.15.2
        args:
        - --config.file=/etc/prometheus/config_out/prometheus.yaml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention.time=10d
        - --web.route-prefix=/
        - --web.enable-lifecycle
        - --storage.tsdb.no-lockfile
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        - --log.level=debug
        ports:
        - containerPort: 9090
          name: web
          protocol: TCP
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        readinessProbe:
          failureThreshold: 120
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        volumeMounts:
        - mountPath: /etc/prometheus/config_out
          name: prometheus-config-out
          readOnly: true
        - mountPath: /prometheus
          name: prometheus-storage
        - mountPath: /etc/prometheus/rules
          name: prometheus-rules
      - name: thanos
        image: quay.io/thanos/thanos:v0.11.0
        args:
        - sidecar
        - --tsdb.path=/prometheus
        - --prometheus.url=http://127.0.0.1:9090
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        # --web.enable-lifecycle is a Prometheus flag (set on the prometheus
        # container above), not a Thanos sidecar option.
        - --reloader.config-file=/etc/prometheus/config/prometheus.yaml.tmpl
        - --reloader.config-envsubst-file=/etc/prometheus/config_out/prometheus.yaml
        - --reloader.rule-dir=/etc/prometheus/rules/
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - name: http-sidecar
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            port: 10902
            path: /-/healthy
        readinessProbe:
          httpGet:
            port: 10902
            path: /-/ready
        volumeMounts:
        - name: prometheus-config-tmpl
          mountPath: /etc/prometheus/config
        - name: prometheus-config-out
          mountPath: /etc/prometheus/config_out
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
        - name: prometheus-storage
          mountPath: /prometheus
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
      volumes:
      - name: prometheus-config-tmpl
        configMap:
          defaultMode: 420
          name: prometheus-config-tmpl
      - name: prometheus-config-out
        emptyDir: {}
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  volumeClaimTemplates:
  - metadata:
      name: prometheus-storage
      labels:
        app.kubernetes.io/name: thanos-store
    spec:
      storageClassName: thanos-data-db
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
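The StatefulSet mounts a ConfigMap named prometheus-config-tmpl and sets POD_NAME on the thanos container, but the template itself is not shown. A hypothetical version might look like the following: Sidecar's reloader expands `$(POD_NAME)` when it writes config_out/prometheus.yaml, which gives each replica a distinct external label (Sidecar expects Prometheus to carry external labels that identify it). The label names `cluster` and `replica`, the scrape job and the rule-file glob are assumptions.

```yaml
# Hypothetical contents of the prometheus-config-tmpl ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-tmpl
  namespace: thanos
data:
  # Key name matches --reloader.config-file=/etc/prometheus/config/prometheus.yaml.tmpl
  prometheus.yaml.tmpl: |-
    global:
      scrape_interval: 30s
      external_labels:
        cluster: demo          # hypothetical cluster name
        replica: $(POD_NAME)   # substituted by the sidecar reloader, e.g. prometheus-0
    rule_files:
    - /etc/prometheus/rules/*.yaml   # prometheus-rules ConfigMap is mounted here
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ["127.0.0.1:9090"]
```

Because the two replicas differ only in the `replica` label, Query can later merge their series by treating that label as a replica label.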

Ruler

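Ruler evaluates rules against the metrics collected by Prometheus to compute new metrics, makes them available to Query, and stores them in object storage.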

Precomputed metrics reduce query load. Take, for example, a value that is derived from several other metrics, such as filesystem usage:

(node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"} -node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}) *100/(node_filesystem_avail_bytes {instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}+(node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}-node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}))

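If the result is stored as a new metric, Query no longer has to evaluate this expression over the underlying node_filesystem_* series on every request; it simply reads the precomputed metric. Alerting rules can also be written against the computed values, which makes alert evaluation cheaper as well.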
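In Prometheus-compatible rule files, this kind of precomputation is written as a recording rule. A minimal sketch follows; the group name, the recorded metric name `instance:node_filesystem_usage:percent`, the 85% threshold and the 10m duration are all assumptions. The Grafana variable `$node` is dropped because rule files are not rendered by Grafana, so the rule covers every instance and dashboards filter the recorded series instead.

```yaml
groups:
- name: node-filesystem   # hypothetical group name
  rules:
  # Recording rule: evaluate the usage percentage once per evaluation interval
  # and store it as a new series that keeps the original labels.
  - record: instance:node_filesystem_usage:percent
    expr: |
      (node_filesystem_size_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
        - node_filesystem_free_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}) * 100
      /
      (node_filesystem_avail_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
        + (node_filesystem_size_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
           - node_filesystem_free_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}))
  # Alerting rule on the precomputed series instead of the raw expression.
  - alert: NodeFilesystemUsageHigh
    expr: instance:node_filesystem_usage:percent > 85   # threshold is an assumption
    for: 10m
    labels:
      severity: warning
```

A dashboard panel could then query `instance:node_filesystem_usage:percent{instance=~'$node'}` in place of the long expression.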

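Ruler is also a relatively independent component of Thanos: whether or not it is deployed does not affect the core functionality of the overall architecture. Its role is to improve efficiency.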

Ruler's startup arguments are as follows:

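rule
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902
--rule-file=/etc/thanos/rules/*rules.yaml
--objstore.config-file=/etc/thanos/objectstorage.yaml
--data-dir=/var/thanos/rule
--label=rule_replica="$(NAME)"
--alert.label-drop=rule_replica
--query=dnssrv+_http._tcp.thanos-query.thanos.svc.cluster.local

  • rule-file - path (glob) of the rule files: recording rules that compute the new metrics, plus alerting rules written on top of them

  • objstore.config-file - object storage configuration file

  • data-dir - local data (cache) directory

  • label - external label attached to the data this Ruler produces. Each replica gets a different value (here the Pod name), so the otherwise identical results of multiple Rulers can be told apart, and Query can deduplicate them by treating this label as a replica label

  • alert.label-drop - labels removed from alerts before they are sent to Alertmanager, so that identical alerts from different Ruler replicas can be deduplicated

  • query - address of the Query component that Ruler evaluates its rules against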
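The deduplication mentioned for `label` happens on the Query side. As a hedged sketch (the Query deployment is not part of this article), the Query container would declare `rule_replica` as a replica label and discover Ruler through the headless thanos-rule Service defined below, whose port named `grpc` yields the SRV name used here. The `replica` label for Prometheus and the port name on prometheus-headless are assumptions.

```yaml
# Fragment of a hypothetical thanos-query container spec.
args:
- query
- --http-address=0.0.0.0:10902
- --grpc-address=0.0.0.0:10901
# Series that differ only in these labels are merged into one.
- --query.replica-label=rule_replica   # matches Ruler's --label=rule_replica="$(NAME)"
- --query.replica-label=replica        # assumes Prometheus has an external label "replica"
# Ruler's StoreAPI, found via the SRV record of the headless thanos-rule Service.
- --store=dnssrv+_grpc._tcp.thanos-rule.thanos.svc.cluster.local
# Sidecars; assumes prometheus-headless exposes port 10901 under the name "grpc".
- --store=dnssrv+_grpc._tcp.prometheus-headless.thanos.svc.cluster.local
```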

The YAML template for running Ruler in Kubernetes is as follows:

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-rule
  name: thanos-rule
  namespace: thanos
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-rule
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-rule
  name: thanos-rule
  namespace: thanos
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-rule
  serviceName: thanos-rule
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-rule
    spec:
      containers:
      - args:
        - rule
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --rule-file=/etc/thanos/rules/*rules.yaml
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        - --data-dir=/var/thanos/rule
        - --label=rule_replica="$(NAME)"
        # No quotes here: Kubernetes passes args without shell processing,
        # so quotes would become part of the label name.
        - --alert.label-drop=rule_replica
        - --query=dnssrv+_http._tcp.thanos-query.thanos.svc.cluster.local
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: thanosio/thanos:v0.11.0
        livenessProbe:
          failureThreshold: 24
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        name: thanos-rule
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 18
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/rule
          name: data
          readOnly: false
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
        - name: thanos-rules
          mountPath: /etc/thanos/rules
      volumes:
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
      - name: thanos-rules
        configMap:
          name: thanos-rules
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-rule
      name: data
    spec:
      storageClassName: thanos-data-db
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
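The StatefulSet mounts a ConfigMap named thanos-rules at /etc/thanos/rules, which is not shown. A minimal hypothetical version is sketched below; the only hard constraint taken from the manifest is that each key becomes a file name that must match the `--rule-file=/etc/thanos/rules/*rules.yaml` glob. The group name, interval and rule are illustrative assumptions.

```yaml
# Hypothetical ConfigMap backing the thanos-rules volume.
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-rules
  namespace: thanos
data:
  # Mounted as /etc/thanos/rules/example-rules.yaml; the name must end in
  # "rules.yaml" to be picked up by the --rule-file glob.
  example-rules.yaml: |-
    groups:
    - name: example        # hypothetical group
      interval: 1m         # evaluation interval for this group (assumption)
      rules:
      - record: job:up:sum
        expr: sum by (job) (up)
```

Note that the StatefulSet sets no `--alertmanagers.url`, so alerting rules placed in this ConfigMap would be evaluated but not delivered to any Alertmanager until that flag is added.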

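That concludes the introduction to all of the Thanos components and how to run them. Thanos provides a high-availability architecture that does not intrude on Prometheus; once you are comfortable with each component, you can readily extend an existing Prometheus cluster with it.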

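While using Receive mode to collect Prometheus metrics, I found that the remote_write endpoint did not yet support authentication, so this mode cannot be used over the public internet. Next, I will try modifying the Receiver code so that the remote_write endpoint supports authentication, and a later article will cover that work.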

This article is reposted from 栋总侃技术.
