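1. Monitoring site availability with Prometheus:
This post covers the following topics and practical notes for reference:
1. Installing blackbox-exporter:
2. Configuring Prometheus to monitor sites:
3. Importing the Grafana site-monitoring dashboard:
4. Configuring alert rules for blackbox monitoring:
1. Installing blackbox-exporter:
We already use Prometheus to monitor our K8S containers, the hosts, and the ES, Kafka, Nacos, MQ, and Redis clusters. One requirement is still missing: URL probing. The goal is for Prometheus to probe our URLs and send an alert as soon as a site becomes unreachable. The setup is as follows: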
# Step 1: install blackbox-exporter on any server; I chose the logstash machine in the test environment
cd /usr/local/
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
tar xzvf blackbox_exporter-0.18.0.linux-amd64.tar.gz
ln -sv blackbox_exporter-0.18.0.linux-amd64/ blackbox_exporter

# Step 2: create the systemd unit file for blackbox-exporter
vim /usr/lib/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
User=root
Type=simple
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl start blackbox_exporter.service
systemctl enable blackbox_exporter.service

# Check that the service is listening on port 9115 (the output resembles that of netstat -lntp):
tcp 0 0 0.0.0.0:9115 0.0.0.0:* LISTEN 28239/blackbox_expo

# If the logstash machine and the K8S cluster are not in the same security group, port 9115 must also be opened in the logstash security group
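Once the exporter is listening, a probe can be triggered by hand by requesting its /probe endpoint with a target and a module name, which is the same request Prometheus will send later. The snippet below is a minimal sketch that only builds such a URL; the helper name build_probe_url is made up for illustration, and the exporter address 192.168.142.226:9115 is assumed to be the machine used in this post.

```python
from urllib.parse import urlencode


def build_probe_url(exporter_addr: str, target: str, module: str) -> str:
    """Build the URL of a blackbox-exporter probe request.

    exporter_addr is host:port of the exporter (assumed reachable over plain HTTP),
    target is the site or address to probe, module is a module name from blackbox.yml.
    """
    # urlencode percent-encodes the target so that "://" and "/" survive as one parameter
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter_addr}/probe?{query}"


if __name__ == "__main__":
    # Paste the printed URL into a browser or curl to see the raw probe metrics
    print(build_probe_url("192.168.142.226:9115", "https://www.testcloud.com", "http_2xx"))
```

Opening the printed URL returns the probe metrics as plain text, which is a quick way to confirm the exporter works before Prometheus is involved.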
After installing blackbox-exporter, the blackbox.yml configuration file also needs to be modified. Its contents are as follows:
modules:
  http_2xx:
    prober: http
  http_post_2xx:
    prober: http
    http:
      method: POST
  http_ingore_ca:  # added: an HTTP probe module that skips SSL certificate verification
    prober: http
    http:
      method: GET
      preferred_ip_protocol: "ip4"
      tls_config:
        insecure_skip_verify: true
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ipv4:  # added: an ICMP module pinned to IPv4
    prober: icmp
    icmp:
      preferred_ip_protocol: "ip4"
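The default blackbox.yml is modified because one of our sites uses a self-signed certificate. Without a module that skips certificate verification, probing that site with the default http_2xx module reports it as down.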
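In the probe output, "down" shows up as the probe_success metric having the value 0. The following is a minimal sketch of reading that value from the text returned by /probe. The sample text is hand-written for illustration and is not real exporter output, the function name parse_probe_metrics is hypothetical, and the parser deliberately ignores labels and timestamps.

```python
def parse_probe_metrics(text: str) -> dict:
    """Parse simple 'name value' lines of Prometheus text format into a dict.

    Simplification: comment lines are skipped, and each remaining line is
    assumed to end with a numeric value and carry no trailing timestamp.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(None, 1)  # split on the last run of whitespace
        metrics[name] = float(value)
    return metrics


# Hand-written sample resembling a failed probe (illustrative only)
SAMPLE = """\
# TYPE probe_success gauge
probe_success 0
probe_duration_seconds 0.05
"""

if __name__ == "__main__":
    result = parse_probe_metrics(SAMPLE)
    print("site is up" if result["probe_success"] == 1 else "site is down")
```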
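2. Configuring Prometheus to monitor sites:
After installing blackbox-exporter, the Prometheus custom resource also needs to be configured so that the sites we want to monitor are added to it. The configuration is as follows: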
# Step 1: create a secret in the monitoring namespace
vim prometheus-additional.yaml
- job_name: 'blackbox_http_2xx'
  scrape_interval: 45s
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://www.testcloud.com
    - https://web.testcloud.com
    - https://zuul.testcloud.com/swagger-ui.html
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.142.226:9115
- job_name: 'blackbox_http_2xx_insecuressl'
  scrape_interval: 45s
  metrics_path: /probe
  params:
    module: [http_ingore_ca]
  static_configs:
  - targets:
    - https://bssadmin.public.testcloud.net
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.142.226:9115
- job_name: 'blackbox_icmp'
  scrape_interval: 45s
  metrics_path: /probe
  params:
    module: [icmp_ipv4]
  static_configs:
  - targets:
    - 192.168.142.201
    labels:
      instance: 'testprod-k8smaster01'
  - targets:
    - 192.168.142.235
    labels:
      instance: 'testprod-k8smaster02'
  - targets:
    - 192.168.142.8
    labels:
      instance: 'testprod-k8smaster03'
  - targets:
    - 192.168.142.10
    labels:
      instance: 'testprod-k8snode01'
  - targets:
    - 192.168.142.16
    labels:
      instance: 'testprod-k8snode02'
  # Note: the second rule below rewrites instance to the probed IP, replacing the host names set above
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.142.226:9115
- job_name: 'blackbox_tcp'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
  - targets:
    - 192.168.142.74:8848
    - 192.168.142.7:8848
    - 192.168.142.39:8848
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.142.226:9115

kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring

# Step 2: create the Prometheus custom resource
vim prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  additionalScrapeConfigs:  # the key addition: these three lines load the extra scrape config from the secret
    key: prometheus-additional.yaml
    name: additional-configs
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  retention: 168h
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  secrets:
  - etcd-certs
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 40Gi
        storageClassName: nfs
  version: v2.11.0

kubectl delete prometheus -n monitoring k8s
kubectl apply -f prometheus-prometheus.yaml
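After deployment, check on the Prometheus config page that the blackbox configuration is present, and on the targets page that the blackbox targets appear. The configuration here covers several kinds of monitoring. The first is URL probing, which includes domains such as clientui and zuul; adminui uses a self-signed certificate, so it gets a separate job. The second is ICMP ping monitoring of all ECS instances, so that an email goes out as soon as a machine is down. The third is TCP port monitoring of common service ports such as the database, Redis, MQ, and Nacos, so that an email alert goes out as soon as a port becomes unreachable.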
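Conceptually, the TCP check succeeds when a TCP connection to the host and port can be opened within a timeout. The sketch below reproduces that idea with the Python standard library so a port can be checked by hand. It is an approximation for illustration and not the exporter's own implementation, and the function name tcp_port_open and the 3-second default timeout are assumptions.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established in time."""
    try:
        # create_connection performs the TCP handshake; the context manager closes the socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts
        return False


if __name__ == "__main__":
    # One of the Nacos addresses from the blackbox_tcp job in this post
    print(tcp_port_open("192.168.142.74", 8848))
```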
Log in to the Prometheus config page and check that the new blackbox jobs are listed.
Log in to the Prometheus targets page and check that the new site-monitoring targets are listed.
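The three relabel_configs rules repeated in every job are what make the targets page show the probed site while the scrape itself goes to the exporter. A minimal sketch that traces the three rules by hand on a plain dictionary may make this easier to follow. It only imitates the default replace action for this specific case, and the function name apply_blackbox_relabel is made up for illustration.

```python
def apply_blackbox_relabel(target: str, exporter_addr: str) -> dict:
    """Trace the three relabel rules used by the blackbox jobs in this post."""
    # Before relabeling, Prometheus puts the configured target into __address__
    labels = {"__address__": target}
    # Rule 1: copy __address__ into __param_target, which becomes the ?target= URL parameter
    labels["__param_target"] = labels["__address__"]
    # Rule 2: copy __param_target into instance, so the probed site is what gets displayed
    labels["instance"] = labels["__param_target"]
    # Rule 3: overwrite __address__ so the scrape request is sent to the exporter
    labels["__address__"] = exporter_addr
    return labels


if __name__ == "__main__":
    for name, value in apply_blackbox_relabel(
        "https://www.testcloud.com", "192.168.142.226:9115"
    ).items():
        print(f"{name} = {value}")
```

The result is that Prometheus scrapes 192.168.142.226:9115/probe with the site as the target parameter, and the resulting series carry the site URL in their instance label.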
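3. Importing the Grafana site-monitoring dashboard:
If your Grafana server can reach the internet, importing the dashboard is very simple: just enter the dashboard ID 7587.
Log in to Grafana to see the final result.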
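4. Configuring alert rules for blackbox monitoring:
Once blackbox monitoring is deployed, the Grafana dashboard shows the state of our sites: whether each probed URL returns status code 200, whether an SSL certificate is about to expire, whether the K8S servers respond to ping, and whether the Nacos ports accept connections. A dashboard can only be looked at, though; it cannot alert. We still need alert rules, for example to email the operations team when a certificate has 7 days left before it expires. The alert rules are as follows: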
vim blackboxrules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: blackbox-rules
  namespace: monitoring
spec:
  groups:
  - name: blackbox_url_stats alarm
    rules:
    - alert: blackbox_url_stats Unable to connect
      expr: probe_success == 0
      for: 1m
      labels:
        severity: disaster
      annotations:
        summary: "Interface/host/port {{ $labels.instance }} Unable to connect"
        description: "Please check it as soon as possible"
    - alert: SSL Certificate expiration warning
      expr: (probe_ssl_earliest_cert_expiry - time())/86400 < 7
      for: 1h
      labels:
        severity: disaster
      annotations:
        summary: 'The certificate of domain name {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days. Please update the certificate as soon as possible'
        description: "SSL certificate expiration warning"

kubectl apply -f blackboxrules.yaml
After applying the alert rule file, log in to the Prometheus web UI and check that the new alert rules are listed.
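The certificate rule is plain arithmetic: probe_ssl_earliest_cert_expiry is a Unix timestamp in seconds, so subtracting the current time and dividing by 86400 gives the remaining days. Here is a minimal sketch of the same calculation, with hypothetical function names and made-up timestamps, which can help when checking the 7-day threshold by hand.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_ts: float, now_ts: float) -> float:
    """Mirror (probe_ssl_earliest_cert_expiry - time()) / 86400 from the rule."""
    return (expiry_ts - now_ts) / SECONDS_PER_DAY


def should_warn(expiry_ts: float, now_ts: float, threshold_days: float = 7) -> bool:
    """True when the rule expression would match, i.e. fewer than threshold_days remain."""
    return days_until_expiry(expiry_ts, now_ts) < threshold_days


if __name__ == "__main__":
    now = 1_700_000_000                      # made-up "current" timestamp
    expiry = now + 5 * SECONDS_PER_DAY       # certificate expiring in five days
    remaining = days_until_expiry(expiry, now)
    # Same formatting as printf "%.1f" in the alert summary
    print("expires in %.1f days, warn=%s" % (remaining, should_warn(expiry, now)))
```

Note that the rule also carries for: 1h, so the condition has to hold for an hour before the alert actually fires; the sketch covers only the expression itself.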




