
Prometheus Monitoring: URL Probing with blackbox

云时代IT运维 2021-02-02

1. Using Prometheus to monitor site outages:

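  This post covers the following topics and practical experience, for your reference:
  1. Install blackbox-exporter

  2. Configure Prometheus to monitor the sites

  3. Import the Grafana site-monitoring dashboard

  4. Configure alerting rules for blackbox probing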

1. Install blackbox-exporter:

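  We already use Prometheus to monitor several things:
  K8S containers, hosts, and the ES, Kafka, Nacos, MQ and Redis clusters.
  One requirement was still unmet: URL probing. The goal is to have Prometheus probe our URLs, so that an alert goes out as soon as one of our websites becomes unreachable.
  The configuration is as follows: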

# Step 1: install blackbox-exporter on any server
# (here it was installed on the logstash host in the test environment)


cd /usr/local/
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
tar xzvf blackbox_exporter-0.18.0.linux-amd64.tar.gz
ln -sv blackbox_exporter-0.18.0.linux-amd64/ blackbox_exporter


# Step 2: create a systemd unit file for blackbox-exporter


vim /usr/lib/systemd/system/blackbox_exporter.service


[Unit]
Description=blackbox_exporter
After=network.target


[Service]
User=root
Type=simple
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml
Restart=on-failure


[Install]
WantedBy=multi-user.target


systemctl daemon-reload
systemctl start blackbox_exporter.service
systemctl enable blackbox_exporter.service


# Check that the service is listening on port 9115:
tcp 0 0 0.0.0.0:9115 0.0.0.0:* LISTEN 28239/blackbox_expo
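If you want to script the check above, the following minimal sketch may help. It reads output in the format shown above, which matches `netstat -lntp`, on stdin. It succeeds only when a socket is in LISTEN state on the given port. The function name `port_is_listening` is hypothetical. The sketch assumes the classic net-tools column order; `ss` prints its columns differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the netstat -lntp output on stdin contains
# a LISTEN socket on port $1. Assumes the net-tools column order:
# proto recv-q send-q local-address foreign-address state pid/program
port_is_listening() {
  local port="$1"
  awk -v p=":${port}" '
    # $4 is the local address (e.g. 0.0.0.0:9115), $6 is the socket state.
    # Compare the address suffix so that port 19115 does not match 9115.
    $6 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
    END { exit(found ? 0 : 1) }'
}

# Usage:
#   netstat -lntp | port_is_listening 9115 && echo "blackbox_exporter is listening"
```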


# If the logstash host is not in the same security group as the K8S cluster,
# port 9115 must also be opened in the logstash host's security group

  After installing blackbox-exporter, edit its blackbox.yml configuration file as follows:

modules:
  http_2xx:
    prober: http
  http_post_2xx:
    prober: http
    http:
      method: POST
  http_ingore_ca: # added: an HTTP probe module that skips SSL certificate verification
    prober: http
    http:
      method: GET
      preferred_ip_protocol: "ip4"
      tls_config:
        insecure_skip_verify: true
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ipv4: # added: an ICMP probe module that forces IPv4
    prober: icmp
    icmp:
      preferred_ip_protocol: "ip4"

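The default blackbox.yml is modified because our site uses a self-signed certificate. When Prometheus probes it with the default http_2xx module, certificate verification fails and the site shows as down. The extra http_ingore_ca module skips that verification. Restart blackbox_exporter after editing the file (systemctl restart blackbox_exporter) so that the new modules are loaded.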
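To compare what the two modules return for the same site, you can call the exporter's /probe endpoint directly. Prometheus sends this same request on every scrape, with the module and target as query parameters. The sketch below only builds that URL and percent-encodes the target. The function names `urlencode` and `probe_url` are hypothetical, and the encoder assumes ASCII input. The exporter address 192.168.142.226:9115 is the one used in this article's Prometheus configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use as a URL query value.
# Assumes ASCII input; unreserved characters are kept as-is.
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;  # byte value as %XX
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build the probe URL for one target,
#   http://<exporter>/probe?module=<module>&target=<encoded target>
probe_url() {
  local exporter="$1" module="$2" target="$3"
  printf 'http://%s/probe?module=%s&target=%s\n' \
    "$exporter" "$module" "$(urlencode "$target")"
}

# Usage (needs network access to the exporter):
#   curl -s "$(probe_url 192.168.142.226:9115 http_2xx https://bssadmin.public.testcloud.net)" | grep '^probe_success'
#   curl -s "$(probe_url 192.168.142.226:9115 http_ingore_ca https://bssadmin.public.testcloud.net)" | grep '^probe_success'
```

Suppose the self-signed certificate is the only problem. Then the first request should report `probe_success 0` and the second `probe_success 1`. If you append `&debug=true` to the URL, blackbox_exporter includes the probe log in its response, which shows why a probe failed.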

2. Configure Prometheus to monitor the sites:

  With blackbox-exporter installed, the next step is to configure the Prometheus custom resource and add the sites to be monitored:

# Step 1: create a secret in the monitoring namespace


vim prometheus-additional.yaml


- job_name: 'blackbox_http_2xx'
  scrape_interval: 45s
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://www.testcloud.com
    - https://web.testcloud.com
    - https://zuul.testcloud.com/swagger-ui.html
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.142.226:9115


- job_name: 'blackbox_http_2xx_insecuressl'
  scrape_interval: 45s
  metrics_path: /probe
  params:
    module: [http_ingore_ca]
  static_configs:
  - targets:
    - https://bssadmin.public.testcloud.net
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.142.226:9115


- job_name: 'blackbox_icmp'
  scrape_interval: 45s
  metrics_path: /probe
  params:
    module: [icmp_ipv4]
  static_configs:
  - targets:
    - 192.168.142.201
    labels:
      instance: 'testprod-k8smaster01'
  - targets:
    - 192.168.142.235
    labels:
      instance: 'testprod-k8smaster02'
  - targets:
    - 192.168.142.8
    labels:
      instance: 'testprod-k8smaster03'
  - targets:
    - 192.168.142.10
    labels:
      instance: 'testprod-k8snode01'
  - targets:
    - 192.168.142.16
    labels:
      instance: 'testprod-k8snode02'

  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  # No "__param_target -> instance" rule in this job: it would overwrite
  # the host names set under static_configs with the raw IP addresses.
  - target_label: __address__
    replacement: 192.168.142.226:9115


- job_name: 'blackbox_tcp'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
  - targets:
    - 192.168.142.74:8848
    - 192.168.142.7:8848
    - 192.168.142.39:8848
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.142.226:9115


kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
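Every job above uses the same three relabel_configs rules to turn a static target into a request to the exporter. The sketch below mimics what those rules do to a single target, written as plain shell variable assignments. It illustrates the label flow only and is not how Prometheus implements relabeling. The function name `relabel_blackbox_target` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the three blackbox relabel rules.
# $1: the original __address__ (the static target)
# $2: the blackbox exporter address
# Prints the resulting labels as name=value lines.
relabel_blackbox_target() {
  local address="$1" exporter="$2"
  local param_target instance
  param_target="$address"   # rule 1: __address__    -> __param_target (?target=...)
  instance="$param_target"  # rule 2: __param_target -> instance
  address="$exporter"       # rule 3: __address__ is replaced by the exporter
  printf '__param_target=%s\ninstance=%s\n__address__=%s\n' \
    "$param_target" "$instance" "$address"
}

# Usage:
#   relabel_blackbox_target 192.168.142.74:8848 192.168.142.226:9115
```

Prometheus therefore scrapes the exporter at `__address__`, passes the original target as the `target` query parameter, and still labels the series with that target. Rule 2 assigns `instance` unconditionally. So wherever that rule is present, it overwrites any `instance` label set under static_configs.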


# Step 2: create the Prometheus custom resource


vim prometheus-prometheus.yaml


apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  additionalScrapeConfigs: # the key change: these three lines load extra scrape configs from the secret
    key: prometheus-additional.yaml
    name: additional-configs
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  retention: 168h
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  secrets:
  - etcd-certs
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 40Gi
        storageClassName: nfs
  version: v2.11.0


kubectl delete prometheus -n monitoring k8s
kubectl apply -f prometheus-prometheus.yaml




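After deployment, check two places in Prometheus. The Configuration page should list the blackbox jobs, and the Targets page should list the blackbox targets. The configuration here covers three kinds of monitoring:

- URL probing of domains such as clientui and zuul. adminui uses a self-signed certificate, so it has a job of its own.
- ICMP ping monitoring of all ECS hosts, so that an email goes out as soon as a host goes down.
- TCP port monitoring of common service ports (databases, Redis, MQ, Nacos and so on), with an email alert as soon as a port becomes unreachable.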

Open the Prometheus Configuration page and check that the new blackbox jobs are present.

Open the Prometheus Targets page and check that the new site-monitoring targets are present.
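When a TCP-port target shows as down, a quick manual cross-check from the exporter host can help. It tells a network problem apart from an exporter or configuration problem. The sketch below is only a rough stand-in for the tcp_connect module: it just tries to open a TCP connection. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command. The function name `tcp_port_open` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if a TCP connection to $1:$2 can be opened
# within $3 seconds (default 3), roughly what tcp_connect checks.
# A child bash opens the /dev/tcp pseudo-device; timeout kills it if it hangs.
tcp_port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage (one of the Nacos targets from the blackbox_tcp job):
#   tcp_port_open 192.168.142.74 8848 && echo open || echo closed
```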

3. Import the Grafana site-monitoring dashboard:

  If your Grafana server can reach the Internet, importing the dashboard is simple: just enter the dashboard ID 7587.

The final result in Grafana:

4. Configure alerting rules for blackbox probing:

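  Once blackbox probing is deployed, the Grafana dashboard shows the state of every monitored target. It shows:

  - whether each probed URL returns status code 200;
  - whether its SSL certificate is about to expire;
  - whether the K8S servers are reachable;
  - whether the Nacos ports accept connections.

  A dashboard, however, can only be viewed; it does not send alerts. Alerting rules are needed as well, for example to email the ops team when a certificate has fewer than 7 days left. The rules are as follows: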
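The certificate rule in blackboxrules.yaml compares `(probe_ssl_earliest_cert_expiry - time()) / 86400` with 7. Both values are Unix timestamps in seconds, and 86400 is the number of seconds in a day, so the left side is the number of days left. The sketch below reproduces that arithmetic, which can help when choosing a threshold. The function names are hypothetical and the timestamps in the usage line are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: days left until expiry, formatted like the alert's
# {{ printf "%.1f" $value }}. $1 = expiry timestamp, $2 = current timestamp.
cert_days_left() {
  local expiry="$1" now="$2"
  awk -v e="$expiry" -v n="$now" 'BEGIN { printf "%.1f\n", (e - n) / 86400 }'
}

# Hypothetical helper: exit 0 when fewer than $3 days remain,
# mirroring the "< 7" comparison in the alert expression.
cert_expiring_soon() {
  local expiry="$1" now="$2" threshold="$3"
  awk -v e="$expiry" -v n="$now" -v t="$threshold" \
    'BEGIN { exit !((e - n) / 86400 < t) }'
}

# Usage (made-up expiry 5.5 days after "now"):
#   cert_days_left 1600475200 1600000000                 # prints 5.5
#   cert_expiring_soon 1600475200 1600000000 7 && echo "would alert"
```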

vim blackboxrules.yaml




apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: blackbox-rules
  namespace: monitoring
spec:
  groups:
  - name: blackbox_url_stats alarm
    rules:
    - alert: blackbox_url_stats Unable to connect
      expr: probe_success == 0
      for: 1m
      labels:
        severity: disaster
      annotations:
        summary: "Interface/host/port {{ $labels.instance }} is unreachable"
        description: "Please check it as soon as possible"

    - alert: SSL Certificate expiration warning
      expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 7
      for: 1h
      labels:
        severity: disaster
      annotations:
        summary: 'The certificate of domain name {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days, please update the certificate as soon as possible'
        description: "SSL certificate expiration warning"


kubectl apply -f blackboxrules.yaml


After the rule file is applied, log in to the Prometheus web UI and check that the new alerting rules are listed.


This article is reposted from 云时代IT运维.
