Prometheus Service Discovery

大大的周 02-07

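Contents
I. Deploying Prometheus
II. Monitoring other nodes
  1. The main configuration file explained
  2. Configuring the server nodes
  3. Adding the slave nodes to monitoring
  4. Verifying that the nodes were added
III. The expression browser
  1. Routine use of the expression browser
  2. Memory usage
IV. Service discovery
  1. Prometheus service discovery
  2. The Prometheus service discovery mechanism
  3. Static configuration discovery
  4. Dynamic discovery
    4.1 File-based service discovery
    4.2 What file-based discovery is for
  5. DNS-based automatic discovery
  6. Consul-based discovery
    6.1 Overview
    6.2 Deployment and installation
  7. Deploying Grafana and displaying templates
    7.1 Grafana overview
    7.2 Installation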

I. Deploying Prometheus

Environment preparation

hostnamectl set-hostname prometheus
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
vim /etc/resolv.conf    # add the line below
nameserver 114.114.114.114
ntpdate ntp1.aliyun.com    # time synchronization is mandatory; skipping it causes problems

Unpack and start the service

# copy the tarball to the host, then extract it into the target directory
tar zxvf prometheus-2.27.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
cd prometheus-2.27.1.linux-amd64/
./prometheus

Open another terminal and check that the port is listening

[root@prometheus ~]#netstat -antp | grep 9090
tcp6 0 0 :::9090 :::* LISTEN 2463/./prometheus
tcp6 0 0 ::1:9090 ::1:53170 ESTABLISHED 2463/./prometheus
tcp6 0 0 ::1:53170 ::1:9090 ESTABLISHED 2463/./prometheus

Open 192.168.74.135:9090 in a browser to reach the web UI (the expression browser), and open 192.168.74.135:9090/metrics to view the built-in metrics that Prometheus exposes about itself.
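The /metrics page is plain text, with one sample per line. As a minimal sketch of how that text can be read programmatically, the Python below parses a small excerpt. The excerpt and its values are invented for illustration, and the parser assumes lines without trailing timestamps, so it is not a full implementation of the exposition format.

```python
def parse_metrics(text):
    """Map each series (metric name plus label set) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line
        # (assumption: no optional timestamp follows it).
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Hypothetical excerpt of a /metrics response; the values are made up.
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 32
prometheus_http_requests_total{code="200",handler="/metrics"} 5
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

Each key keeps its label set, which is the same name{label="value"} form that the expression browser accepts as a selector.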

II. Monitoring other nodes

Hostname      Address           Required package
prometheus    192.168.74.135    prometheus-2.27.1.linux-amd64.tar.gz
server1       192.168.74.122    node_exporter-1.1.2.linux-amd64.tar.gz
server2       192.168.74.128    node_exporter-1.1.2.linux-amd64.tar.gz
server3       192.168.74.131    node_exporter-1.1.2.linux-amd64.tar.gz

The Prometheus server was already set up above, so it is not configured again here.

1. The main configuration file explained

cd prometheus-2.27.1.linux-amd64/
vim prometheus.yml

# my global config
global:    # global settings
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  # how often metrics are scraped; defaults to 1 minute when unset
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # how often the alerting rules are evaluated
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
# the Alertmanager (the separate alerting component) that alerts are sent to
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:    # alerting rules; the rules themselves are written in YAML files
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:    # data collection section
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  # the source of the scraped metrics is identified by job_name
  - job_name: 'prometheus'
    # this value becomes the job label on the scraped series and can be used in PromQL label matchers, for example {job='prometheus'}

    # metrics_path defaults to '/metrics'    # the path metrics are collected from
    # scheme defaults to 'http'.             # metrics are scraped over http by default

    static_configs:    # static configuration: where to collect data from; Prometheus itself listens on port 9090 by default
    - targets: ['localhost:9090']

2. Configuring the server nodes

Upload the tarball and install node_exporter

tar zxvf node_exporter-1.1.2.linux-amd64.tar.gz
cd node_exporter-1.1.2.linux-amd64/
cp node_exporter /usr/local/bin/

Start the service

./node_exporter
netstat -antp | grep 9100
./node_exporter --help    # lists the available command-line options

The service can also be managed through a systemd unit file:

[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/node_exporter \
  --collector.ntp \
  --collector.mountstats \
  --collector.systemd \
  --collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
Restart=always

[Install]
WantedBy=multi-user.target

Open a slave node in the browser to view the metrics it exposes, then open the master node to view its content.

3. Adding the slave nodes to monitoring

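On the Prometheus server (192.168.74.135), stop Prometheus, add the static targets to the configuration file, and start it again; only then do the server nodes join.

cd /usr/local/prometheus-2.27.1.linux-amd64/
vim prometheus.yml
# append the following to the end of the configuration file
  - job_name: 'nodes'
    static_configs:
    - targets:
      - 192.168.74.122:9100
      - 192.168.74.128:9100
      - 192.168.74.131:9100

./prometheus    # start the service

4. Verifying that the nodes were added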

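III. The expression browser

1. Routine use of the expression browser

Data can be filtered in the Prometheus web UI.

View the cumulative CPU time:
node_cpu_seconds_total

Compute the CPU idle rate over the past 5 minutes:
irate(node_cpu_seconds_total{mode="idle"}[5m])

Explanation:
irate: a rate function that reacts very quickly to changes, because it computes the per-second rate from the last two samples in the range
node_cpu_seconds_total: cumulative CPU time of the node, in seconds (the metric)
mode="idle": selects the idle time (a label)
5m: the window of samples, here the CPU idle values from the past 5 minutes, that the rate is computed over
{mode="idle"}: as a whole, this is called a label filter (label matcher)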
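To make the behaviour of irate concrete, here is a small Python sketch of the calculation: it takes the last two samples of a counter and divides the increase by the elapsed time. The sample values are invented, and the reset handling is a simplified reading of how a counter reset is treated, so take it as an illustration and not as the Prometheus implementation.

```python
def irate(samples):
    """Per-second rate from the last two samples.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    increase = v_last - v_prev
    if increase < 0:
        # The counter went backwards, so it was reset; count from zero.
        increase = v_last
    return increase / (t_last - t_prev)


if __name__ == "__main__":
    # Hypothetical idle-time samples for one CPU, scraped every 15 seconds.
    idle = [(0, 100.0), (15, 114.0), (30, 128.25)]
    print(irate(idle))  # idle seconds gained per second of wall time
```

Only the last two points matter, which is why the result follows short spikes closely even with a 5m range.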

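Average CPU usage of each host over 5 minutes:
(1 - avg(irate(node_cpu_seconds_total{mode='idle'}[5m])) by (instance)) * 100

Explanation:
avg: the average
avg(irate(node_cpu_seconds_total{mode='idle'}[5m])): can be read as the fraction of CPU time spent idle
by (instance): groups the result by instance, so there is one value per node
(1 - avg(irate(node_cpu_seconds_total{mode='idle'}[5m])) by (instance)) * 100: the average CPU usage over 5 minutes, as a percentage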
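The same arithmetic can be sketched in Python: average the idle rates of all CPUs that share an instance, then turn the idle fraction into a usage percentage. The input dictionary stands in for the result of the irate expression, and its numbers are hypothetical.

```python
from collections import defaultdict


def cpu_usage_percent(idle_rates):
    """Compute (1 - avg(idle rate)) * 100, grouped by instance.

    idle_rates: dict mapping (instance, cpu) to the idle rate of that CPU,
    a value between 0 and 1.
    """
    grouped = defaultdict(list)
    for (instance, _cpu), rate in idle_rates.items():
        grouped[instance].append(rate)
    return {
        instance: (1 - sum(rates) / len(rates)) * 100
        for instance, rates in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical idle rates for two of the monitored nodes.
    rates = {
        ("192.168.74.122:9100", "0"): 0.75,
        ("192.168.74.122:9100", "1"): 0.25,
        ("192.168.74.128:9100", "0"): 0.875,
    }
    print(cpu_usage_percent(rates))
```

Grouping by the instance key plays the role of by (instance): the cpu label is dropped and one number per node remains.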

Find the time series whose 1-minute load average exceeds twice the host's CPU count:
node_load1 > on (instance) 2 * count(node_cpu_seconds_total{mode='idle'}) by (instance)
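As an illustration of what this query does, the Python sketch below counts the idle-mode series per instance (one per CPU) and keeps only the instances whose load exceeds twice that count. The two sides are matched on the instance key, as on (instance) does. The load values are made up for the example.

```python
from collections import Counter


def count_cpus(idle_series):
    """Count CPUs per instance.

    idle_series: iterable of (instance, cpu) pairs, one per idle-mode series.
    """
    return Counter(instance for instance, _cpu in idle_series)


def overloaded(load1, cpu_counts, factor=2):
    """Keep instances whose load is above factor * CPU count.

    Instances missing from cpu_counts have nothing to match against
    and are dropped, as an unmatched series would be.
    """
    return {
        instance: load
        for instance, load in load1.items()
        if instance in cpu_counts and load > factor * cpu_counts[instance]
    }


if __name__ == "__main__":
    cpus = count_cpus([
        ("192.168.74.122:9100", "0"),
        ("192.168.74.122:9100", "1"),
        ("192.168.74.128:9100", "0"),
    ])
    # Hypothetical 1-minute load averages.
    load = {"192.168.74.122:9100": 3.5, "192.168.74.128:9100": 2.5}
    print(overloaded(load, cpus))
```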

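2. Memory usage

node_memory_MemTotal_bytes
node_memory_MemFree_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes

# Computing the usage
Available space: the sum of the last three metrics above
Used space: total space minus available space
Usage: used space divided by total space

IV. Service discovery

1. Prometheus service discovery

① File-based service discovery: a set of "child" configuration files in YAML format that hold only the targets to scrape; Prometheus picks up changes to them dynamically, with no restart needed
② DNS-based service discovery: uses SRV records
③ API-based service discovery: Kubernetes, Consul, Azure; relabeling (target relabeling and metric relabeling)
④ Kubernetes-based service discovery

2. The Prometheus service discovery mechanism

① The Prometheus server scrapes data using the pull model, so it must know the location of every target in advance before it can scrape from the corresponding exporter or instrumentation.
② For small environments, listing each target with static_configs solves the problem and is the simplest configuration method; each target is identified by a network endpoint (ip:port).
③ For medium and large environments, or highly dynamic cloud environments, static configuration is clearly impractical. Prometheus therefore provides a set of service discovery mechanisms that, based on a service registry (service bus), automatically discover, check and classify the targets that can be monitored, and update the scrape lifecycle of targets that have changed.
④ During every scrape_interval, Prometheus checks the jobs to run. Each job first generates its target list from the discovery configuration specified on the job; this is the service discovery process. Service discovery returns a list of targets, each carrying a set of labels called metadata, all prefixed with "__meta_".
⑤ Service discovery also sets other labels based on the target configuration. These labels carry a "__" prefix and suffix and include "__scheme__", "__address__" and "__metrics_path__", which hold the protocol the target supports (http or https, default http), the target's address, and the URI path of the metrics (default /metrics).
⑥ If the URI path carries any parameters, they are set as labels prefixed with "__param_". The target list and labels are returned to Prometheus, and some of the labels can be overridden in the configuration.
⑦ Configuration labels are reused during the scrape lifecycle to generate other labels; for example, the default value of the instance label on a metric comes from the value of the __address__ label.
⑧ For each discovered target, Prometheus offers the chance to relabel it; this is defined in the relabel_configs section of the job configuration.

3. Static configuration discovery

# edit the configuration file on the Prometheus server and list the targets with their ports, as configured above
vim prometheus.yml
  - job_name: 'nodes'
    static_configs:
    - targets:
      - 192.168.74.122:9100
      - 192.168.74.128:9100
      - 192.168.74.131:9100

4. Dynamic discovery

4.1 File-based service discovery

File-based service discovery is only slightly better than static configuration, but it does not depend on any platform or third-party service, which makes it the simplest and most general approach. The Prometheus server periodically loads target information from files (the server pulls metrics, and the job_name section determines which targets it pulls from). The files may be in JSON or YAML format and contain a list of targets plus optional labels. The first configuration in this section converts Prometheus's default static configuration into the form needed for file-based service discovery (Prometheus periodically re-reads and reloads these files, which is how targets are discovered and updated dynamically).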

① Environment preparation

cd /usr/local/prometheus-2.27.1.linux-amd64/
mkdir file_sd
cd file_sd
mkdir targets
# upload the modified prometheus.yml into the file_sd directory
cd targets
# upload nodes_centos.yaml and prometheus_server.yaml into the targets directory

Explanation of the matched files
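As a hedged sketch of the structure such target files hold, the Python below builds one target group (a list of targets plus optional labels) and writes it out as JSON, one of the two formats file-based discovery accepts. The label name app, its value, and the output file name are assumptions made for the example, and the file is only picked up if the files pattern in prometheus.yml matches it.

```python
import json
import os
import tempfile


def build_target_groups(addresses, labels):
    """Return the list-of-groups structure a file-SD target file contains."""
    return [{"targets": list(addresses), "labels": dict(labels)}]


def write_targets(path, groups):
    """Write the groups as JSON, swapping the new file in with one rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    nodes = build_target_groups(
        ["192.168.74.122:9100", "192.168.74.128:9100", "192.168.74.131:9100"],
        {"app": "node-exporter"},  # hypothetical label
    )
    with tempfile.TemporaryDirectory() as demo_dir:
        demo_path = os.path.join(demo_dir, "nodes_centos.json")  # hypothetical name
        write_targets(demo_path, nodes)
        with open(demo_path) as handle:
            print(handle.read())
```

Adding a node later means calling write_targets again with one more address in the list; Prometheus re-reads the file and no restart is involved.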

② Start Prometheus with the specified configuration file

./prometheus --config.file=./file_sd/prometheus.yml

③ Start node_exporter on the three slave nodes

./node_exporter

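④ Open http://192.168.74.135:9090/targets in a browser to check the targets

⑤ Open another terminal, add one more node entry, and check whether that node appears

4.2 What file-based discovery is for

To add a node or another Prometheus server, only the two files nodes_centos.yaml and prometheus_server.yaml need to be edited to include the new address; the service does not have to be stopped.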

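5. DNS-based automatic discovery

DNS-based service discovery periodically queries a set of DNS domain names to discover the targets to monitor. The DNS servers used for the queries are the ones specified in /etc/resolv.conf. The mechanism relies on A, AAAA and SRV resource records and supports only those; the advanced DNS-based discovery described in RFC 6763 is not yet supported.

PS:
## SRV: an SRV record indicates which services a domain name offers. Example:
_http._tcp.example.com. SRV 10 5 80 .
Meaning of the fields after SRV:
10: priority, similar to an MX record
5: weight
80: port
last field: the name of the host that actually provides the service
An SRV record can therefore state which service is offered on which port.
# Prometheus's DNS-based service discovery uses SRV records so that Prometheus can find out which port on a given target belongs to the exporter or instrumentation

6. Consul-based discovery

6.1 Overview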
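To show how the SRV fields map to a scrape address, here is a minimal Python sketch that splits a record in zone-file form into its parts and joins the target host and port. The host name node1.example.com. is a hypothetical target chosen for the example; no DNS query is made.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str       # the queried name, e.g. _http._tcp.example.com.
    priority: int   # lower values are preferred, as with MX records
    weight: int     # relative share among records of equal priority
    port: int       # port the service listens on
    target: str     # host that actually provides the service


def parse_srv(line):
    """Parse one SRV record written in zone-file form."""
    fields = line.split()
    # Locate the record type so an optional TTL or class before it is tolerated.
    idx = fields.index("SRV")
    priority, weight, port = (int(value) for value in fields[idx + 1:idx + 4])
    return SrvRecord(fields[0], priority, weight, port, fields[idx + 4])


def scrape_address(record):
    """Build the host:port address a scraper would connect to."""
    return f"{record.target.rstrip('.')}:{record.port}"


if __name__ == "__main__":
    # node1.example.com. is a made-up target host.
    record = parse_srv("_http._tcp.example.com. SRV 10 5 80 node1.example.com.")
    print(record)
    print(scrape_address(record))
```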

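Consul is an open-source tool written in Go, aimed mainly at distributed, service-oriented systems. It provides service registration, service discovery and configuration management, along with health checks, key/value storage, multi-datacenter support and distributed consistency guarantees.

How it works: services that can be scraped are registered in Consul through JSON definition files so that they can be discovered automatically, and Prometheus acts as a client that fetches the registered services from Consul and then collects their data.

6.2 Deployment and installation

Prometheus discovers its host inventory automatically through Consul

Idea: the prometheus-servers.json file holds the Prometheus host's information, including the matching tag, "tags": ["prometheus"]. Consul loads this configuration file and then exposes the service on port 8500. The Prometheus yml file defines two jobs, "prometheus" and "nodes", and points them at Consul's address, 192.168.74.135:8500. Prometheus periodically asks Consul on port 8500 for the nodes tagged prometheus, obtains the host information there, and then collects metrics directly from http://192.168.74.135:9090/metrics; the results are finally shown in the UI's expression browser.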
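The idea can be sketched in Python: build service definitions carrying a tag, then select the addresses whose tags match the one a job asks for. This only imitates the tag-matching step on local data and does not talk to Consul; the helper names and the node entry are assumptions made for the example.

```python
import json


def service_definition(service_id, name, address, port, tags, interval="5s"):
    """Build one service entry with an HTTP health check on /metrics."""
    return {
        "id": service_id,
        "name": name,
        "address": address,
        "port": port,
        "tags": list(tags),
        "checks": [
            {"http": f"http://{address}:{port}/metrics", "interval": interval}
        ],
    }


def select_by_tag(services, tag):
    """Return host:port for every service that carries the given tag."""
    return [
        f"{service['address']}:{service['port']}"
        for service in services
        if tag in service["tags"]
    ]


if __name__ == "__main__":
    services = [
        service_definition("prometheus-server-node01", "prom-server-node01",
                           "192.168.74.135", 9090, ["prometheus"]),
        # Hypothetical node entry used to show that other tags are skipped.
        service_definition("node_exporter-node01", "node01",
                           "192.168.74.131", 9100, ["nodes"]),
    ]
    print(json.dumps({"services": services}, indent=2))
    print(select_by_tag(services, "prometheus"))
```

Keeping one tag per job, as the two jobs here do, lets a single Consul instance feed several scrape jobs without the target lists overlapping.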

① Install Consul 1.9.0

[root@prometheus ~]#wget http://101.34.22.188/consul/consul_1.9.0_linux_amd64.zip &> /dev/null
[root@prometheus ~]#ls
consul_1.9.0_linux_amd64.zip
[root@prometheus ~]#unzip consul_1.9.0_linux_amd64.zip -d /usr/local/bin/
Archive: consul_1.9.0_linux_amd64.zip
  inflating: /usr/local/bin/consul

② Start in developer mode

Consul's developer mode quickly brings up a fully functional single-node Consul service, which is convenient for development and testing.

[root@prometheus ~]#mkdir -pv /consul/data
mkdir: created directory "/consul"
mkdir: created directory "/consul/data"
[root@prometheus ~]#mkdir /etc/consul
[root@prometheus ~]#cd /etc/consul/
[root@prometheus /etc/consul]#consul agent -dev -ui -data-dir=/consul/data/ -config-dir=/etc/consul/ -client=0.0.0.0
......

# Parameters
consul agent    # run Consul as an agent
-dev            # developer mode
-ui             # enable the web UI
-data-dir       # location of the data files
-config-dir     # location of Consul's configuration files
-client         # address the client interfaces bind to; 0.0.0.0 listens on all addresses

③ Edit the prometheus-servers.json configuration file in the /etc/consul directory

[root@prometheus ~]#vim /etc/consul/prometheus-servers.json
{
  "services": [
    {
      "id": "prometheus-server-node01",
      "name": "prom-server-node01",
      "address": "192.168.74.135",
      "port": 9090,
      "tags": ["prometheus"],
      "checks": [{
        "http": "http://192.168.74.135:9090/metrics",
        "interval": "5s"
      }]
    }
  ]
}

[root@prometheus ~]#consul reload
Configuration reload triggered
[root@prometheus ~]#netstat -antp |grep consul
tcp 0 0 127.0.0.1:8300 0.0.0.0:* LISTEN 64781/consul
tcp 0 0 127.0.0.1:8301 0.0.0.0:* LISTEN 64781/consul
tcp 0 0 127.0.0.1:8302 0.0.0.0:* LISTEN 64781/consul
tcp 0 0 127.0.0.1:45987 127.0.0.1:8300 ESTABLISHED 64781/consul
tcp 0 0 127.0.0.1:8300 127.0.0.1:45987 ESTABLISHED 64781/consul
tcp6 0 0 :::8600 :::* LISTEN 64781/consul
tcp6 0 0 :::8500 :::* LISTEN 64781/consul
tcp6 0 0 :::8502 :::* LISTEN 64781/consul

④ Stop the Prometheus service first, then change the configuration file

[root@prometheus ~]#ps aux|grep prometheus
root 63526 0.3 2.4 1114924 94196 pts/1 Sl+ 12:10 0:22 ./prometheus --config.file=./file_sd/prometheus.yml
root 64823 0.0 0.0 112728 976 pts/2 S+ 14:04 0:00 grep --color=auto prometheus
[root@prometheus ~]#kill -9 63526
[root@prometheus ~]#ps aux|grep prometheus
root 64826 0.0 0.0 112728 976 pts/2 S+ 14:04 0:00 grep --color=auto prometheus
[root@prometheus ~]#cd /usr/local/prometheus-2.27.1.linux-amd64/
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#mkdir consul_sd
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#ls
console_libraries consoles consul_sd data file_sd LICENSE nohup.out NOTICE prometheus prometheus.yml promtool
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#cd consul_sd/
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64/consul_sd]#wget http://101.34.22.188/consul/prometheus/prometheus.yml
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64/consul_sd]#cat prometheus.yml
# my global config
# Author: MageEdu <mage@magedu.com>
# Repo: http://gitlab.magedu.com/MageEdu/prometheus-configs/
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    consul_sd_configs:
    - server: "192.168.10.20:8500"
      tags:
      - "prometheus"
      refresh_interval: 2m

  # All nodes
  - job_name: 'nodes'
    consul_sd_configs:
    - server: "192.168.10.20:8500"
      tags:
      - "nodes"
      refresh_interval: 2m

# remember to change the server IP to your own Consul address (192.168.74.135:8500 in this setup)

# run Prometheus with this configuration file; nohup ... & can be used to run it in the background
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#./prometheus --config.file=./consul_sd/prometheus.yml
......

⑤ Open http://192.168.74.135:8500 in a browser to check whether the nodes have joined

[root@prometheus /etc/consul]#wget http://101.34.22.188/consul/prometheus/nodes.json
......
[root@prometheus /etc/consul]#cat nodes.json
{
  "services": [
    {
      "id": "node_exporter-node01",
      "name": "node01",
      "address": "192.168.74.131",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.74.131:9100/metrics",
        "interval": "5s"
      }]
    },
    {
      "id": "node_exporter-node02",
      "name": "node02",
      "address": "192.168.74.122",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.74.122:9100/metrics",
        "interval": "5s"
      }]
    }
  ]
}
[root@prometheus /etc/consul]#consul reload
Configuration reload triggered

7. Deploying Grafana and displaying templates

7.1 Grafana overview

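Grafana is a general-purpose visualization tool written in Go that loads and displays data from different data sources. Some of the storage systems it can use as data sources:
① TSDB: Prometheus, InfluxDB, OpenTSDB and Graphite
② Log and document stores: Loki and Elasticsearch
③ Distributed request tracing: Zipkin, Jaeger and Tempo
④ SQL DB: MySQL, PostgreSQL and Microsoft SQL Server

By default Grafana listens on TCP port 3000. It supports integration with other authentication services and exposes its built-in metrics at /metrics.

Core components:
① Data source: the storage system that supplies the data to display
② Dashboard: the visual panels used to organize and manage data
③ Teams and users: management capabilities aimed at the organizational levels of an enterprise

7.2 Installation

wget http://101.34.22.188/grafana/grafana-7.3.6-1.x86_64.rpm
yum install -y grafana-7.3.6-1.x86_64.rpm
systemctl enable grafana-server && systemctl start grafana-server
netstat -nuptl|grep 3000

Open http://IP:3000 in a browser; the default username and password are admin / admin.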