Pushgateway in Detail


1 Introduction to Pushgateway

(1) What Pushgateway is

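Pushgateway offers a second way to collect data. It is a Prometheus component that passively accepts metrics pushed to it. It can run on any node and does not have to run on the monitored client.

A user-written script first sends the data to be monitored to Pushgateway, which holds it until the Prometheus server scrapes it. Pushgateway itself never pushes data to Prometheus.

For short-lived jobs that cannot be polled, you can push metric values to Pushgateway for temporary storage, and Prometheus then polls Pushgateway.

Pushgateway therefore acts as a proxy between the monitored objects and Prometheus, which pulls data from it on a schedule.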

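(2) Why use Pushgateway

Reason 1: Prometheus uses a pull model, so when targets sit in a different subnet or behind a firewall, Prometheus may be unable to scrape them directly.

Reason 2: when monitoring business data, data from different sources needs to be aggregated so that Prometheus can collect it in one place.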

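(3) Drawbacks

a: When data from many nodes is funneled through a single Pushgateway, it becomes a single point of failure and a potential bottleneck. If it goes down, the impact is larger than losing individual targets.

b: The up status that Prometheus records on each scrape applies only to Pushgateway itself, not to each node behind it.

c: Pushgateway retains all monitoring data pushed to it. So even after a monitored job has gone offline, Prometheus keeps scraping the old data, and you must manually delete the data Pushgateway no longer needs.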
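The manual cleanup mentioned in drawback c can be done with an HTTP DELETE against the group's URL. The following is a minimal sketch, not a definitive tool: the function name pgw_delete_group, the PGW_BASE address and the CURL override variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: delete one Pushgateway group so Prometheus stops scraping stale data.
# PGW_BASE is an assumed address; CURL can be overridden for a dry run.
PGW_BASE="${PGW_BASE:-http://pushgateway.example.org:9091}"
CURL="${CURL:-curl}"

# pgw_delete_group JOB [INSTANCE]
# Without INSTANCE, only the group keyed by {job="JOB"} is deleted;
# groups that carry additional labels in their URL path are left alone.
pgw_delete_group() {
  local job="$1" instance="${2:-}"
  local url="${PGW_BASE}/metrics/job/${job}"
  if [ -n "$instance" ]; then
    url="${url}/instance/${instance}"
  fi
  "$CURL" -sS -X DELETE "$url"
}
```

A call such as pgw_delete_group some_job some_instance removes every metric in that one group. A job that is about to go away could run it as its last step.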

(4) Data flow

script ---push---> Pushgateway <---pull--- Prometheus

2 Installation and configuration

(1) Unpack

tar -zxvf pushgateway-1.2.0.linux-amd64.tar.gz

cp -r pushgateway-1.2.0.linux-amd64 /usr/local/pushgateway

(2) Write the systemd unit file

vi /usr/lib/systemd/system/pushgateway.service

[Unit]
Description=Prometheus Pushgateway daemon
After=network.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/pushgateway/pushgateway \
    --persistence.file=/usr/local/pushgateway/pushgateway_persist_file \
    --persistence.interval=5m \
    --web.listen-address=:9091
Restart=on-failure

[Install]
WantedBy=multi-user.target

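(3) Configuration notes

--persistence.file=/usr/local/pushgateway/pushgateway_persist_file sets the path of the persistence file. If no file is specified, metrics are kept only in memory, and a Pushgateway restart or crash loses them. By default the persistence file is written every 5 minutes; use --persistence.interval to change the write interval.

--web.listen-address=:9091 sets the listening address and port.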

(4) Reload systemd and start Pushgateway

systemctl daemon-reload

systemctl restart pushgateway

systemctl status pushgateway

(5) Access the web UI

http://192.168.10.131:9091

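(6) Prometheus configuration

When Prometheus scrapes Pushgateway, the job and instance labels set in the scrape configuration describe only the Pushgateway instance itself, not the source of the pushed data. So add honor_labels: true to the Pushgateway scrape job, which keeps the job and instance labels of the pushed data from being overwritten. Without honor_labels: true, Prometheus uses the job and instance of the Pushgateway target and renames the conflicting pushed labels to exported_job and exported_instance. With it, the labels of the pushed data win. job is mandatory in a push; if instance is absent, it is the empty string "".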

  - job_name: pushgateway
    honor_labels: true
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - 127.0.0.1:9091
        labels:
          instance: pushgateway

3 Test and verify

(1) Push one sample into the group {job="some_job"}

echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job

(2) Now add an instance value

echo "some_metrics 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance

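The Pushgateway page now shows two groups. Pushgateway groups metrics by the labels in the URL path (here job and instance), which allows finer-grained separation.

(3) More labels can be added inside the metric itself, but grouping still uses only the labels in the URL path (here job and instance)

The official example is: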

cat <<EOF | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF

A shorter form:

echo "some_metrics{tag=\"test\"} 3.14" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/some_job/instance/some_instance

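This time no new group was added, and no extra metric appeared in the existing group either. The previous some_metrics was overwritten, because a push replaces metrics of the same name within the same group.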

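4 Pushing data

(1) Push URL format

http://<ip>:9091/metrics/job/<JOBNAME>{/<LABEL_NAME>/<LABEL_VALUE>}, where <JOBNAME> is mandatory and becomes the value of the job label. Any number of label pairs may follow. An instance/<INSTANCE_NAME> pair is commonly added to tell metrics from different instances apart.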
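The URL format above can be assembled by a small helper. This is only a sketch; the function name pgw_url is hypothetical, and it assumes label values contain no slash or other character that needs escaping.

```shell
#!/usr/bin/env bash
# pgw_url BASE JOB [LABEL VALUE]...
# Prints the push URL for a job plus any number of label/value pairs.
# Assumption: values need no escaping (no "/" inside them).
pgw_url() {
  local url="$1/metrics/job/$2"
  shift 2
  # Consume the remaining arguments two at a time: label name, then value.
  while [ "$#" -ge 2 ]; do
    url="${url}/$1/$2"
    shift 2
  done
  printf '%s\n' "$url"
}
```

For example, pgw_url http://127.0.0.1:9091 test_job instance test_instance prints http://127.0.0.1:9091/metrics/job/test_job/instance/test_instance, which can be passed straight to curl --data-binary @-.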

(2) Simple push from the command line

echo "test_metric 123456" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/test_job

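Refresh the Pushgateway page and click Metrics.

Besides test_metric, two more metrics appear: push_time_seconds and push_failure_time_seconds. Pushgateway generates both automatically.

The test_metric metric can now also be queried on the Graph page of the Prometheus UI.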

test_metric{job="test_job"}

(3) Pushing more complex data (from the command line)

Push something a little more complex: several metrics in one request, each with TYPE and HELP lines.

Push command:

cat <<EOF | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/test_job/instance/test_instance
# TYPE test_metrics counter
test_metrics{label="app1",name="demo"} 100.00
# TYPE another_test_metrics gauge
# HELP another_test_metrics Just an example.
another_test_metrics 123.45
EOF

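Notes on the command:

/metrics/job/test_job and /metrics/job/test_job/instance/test_instance both belong to test_job, but they are two separate groups, because the instance label tells them apart.

On the Graph page of the Prometheus UI you can query test_metrics: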

test_metrics{instance="test_instance",job="test_job",label="app1",name="demo"}

(4) Submitting bulk data (from a file)

cat pgdata.txt

# TYPE http_request_total counter
# HELP http_request_total get interface request count with different code.
http_request_total{code="200",interface="/v1/save"} 276
http_request_total{code="404",interface="/v1/delete"} 0
http_request_total{code="500",interface="/v1/save"} 1
# TYPE http_request_time gauge
# HELP http_request_time get core interface http request time.
http_request_time{code="200",interface="/v1/core"} 0.122

curl -XPOST --data-binary @pgdata.txt http://pushgateway.example.org:9091/metrics/job/app/instance/app-172.30.0.0

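(5) Special case: base64 push

When a label value contains a special character such as a slash (for example job="directory_cleaner",path="/var/tmp"), the push path would become /metrics/job/directory_cleaner/path//var/tmp

That path is wrong, so the value has to be encoded as URL-safe base64 (the trailing padding may be omitted).

Format: key@base64/value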

/metrics/job/directory_cleaner/path@base64/L3Zhci90bXA
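The encoded value above can be produced on the command line. A minimal sketch, assuming a base64 utility is available (as in GNU coreutils); the helper name b64url is hypothetical.

```shell
#!/usr/bin/env bash
# b64url VALUE
# URL-safe base64: drop line wraps and "=" padding, then map "+/" to "-_".
b64url() {
  printf '%s' "$1" | base64 | tr -d '\n=' | tr '+/' '-_'
}

# Build the push path for a label whose value contains "/".
printf '/metrics/job/directory_cleaner/path@base64/%s\n' "$(b64url /var/tmp)"
# prints /metrics/job/directory_cleaner/path@base64/L3Zhci90bXA
```

Using printf '%s' rather than echo avoids encoding a trailing newline into the label value.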

(6) Push script

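#!/bin/bash
# Push the values 1..99 to Pushgateway, one push every 10 seconds.

PUSHGATEWAY_URL="http://192.168.100.11:14091/metrics/job/lyb/instance/lybanhui"

for i in {1..99}
do
    echo "$i"
    # The heredoc is the request body; curl reads it from stdin (@-).
    cat <<EOF | curl --data-binary @- "$PUSHGATEWAY_URL"
kfklyb{host="192.168.100.10",name="sjtj"} ${i}
EOF
    sleep 10
done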

5 Notes

(1) Metric values must be numeric; a non-numeric value is rejected with an error

echo "test_metric 12.34.56ff" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/test_job_1

"text format parsing error in line 1: expected float as value, got "12.34.56ff""

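(2) Metric values are parsed as 64-bit floating-point numbers, which carry roughly 16 significant decimal digits; digits beyond that are lost and show up as 0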

echo "test_metric 1234567898765432123456789" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/test_job_2

Value actually stored: test_metric{job="test_job_2"} 123456789876543200000000

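(3) Pushgateway data persistence

By default Pushgateway does not persist data, so a restart or crash loses it. Start it with the --persistence.file and --persistence.interval flags to persist data.

--persistence.file is the local file to which pushed metrics are saved.

--persistence.interval is the interval at which that file is written. With 5m, pushed metrics are flushed to the file at most once every 5 minutes. It is not a retention period: stored metrics are not deleted when it elapses.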

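(4) Pushgateway push interval and Prometheus scrape interval

Each time Prometheus scrapes Pushgateway, it does not get every sample pushed during the scrape interval, only the value most recently pushed.

It is therefore recommended to make the push interval less than or equal to the Prometheus scrape interval, so that each scrape sees freshly pushed data.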

This article was written by 六度 and is licensed under the Creative Commons Attribution 4.0 International license.