Prometheus + Grafana
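If you run several servers, you need a monitoring program to keep an eye on their status.
At first I used server probe dashboards such as Nezha Monitoring and ServerStatus.
However, Nezha Monitoring kept running into bugs, and ServerStatus depends on individual maintainers, several of whom have stopped updating their versions.
So I went looking for another open-source stack to build my own.
Grafana + Prometheus + node_exporter is a very good solution for monitoring server status:
- node_exporter runs on each client machine. It collects system data and serves it, in a structured format, on a web page.
- Prometheus periodically scrapes that data from the clients and stores it as time series.
- Grafana reads the data from Prometheus and displays the time series as charts and other visualizations.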
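To make the division of labour concrete: each scrape is essentially an HTTP GET of the `/metrics` page, which node_exporter serves in Prometheus's plain-text exposition format. The sketch below is a simplified, stdlib-only parser for that format. The helper `parse_exposition` is hypothetical, and it skips edge cases such as `}` inside label values. It only illustrates what the scraped data looks like and is not how Prometheus itself is implemented.

```python
import re

# One sample line: name, optional {labels}, value, optional millisecond timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'
)
# key="value" pairs inside the braces (escaped characters are kept verbatim).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (name, labels, value) tuples for every sample line in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # blank, # HELP and # TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        # float() also accepts the special values NaN, +Inf and -Inf
        samples.append((match.group('name'), labels, float(match.group('value'))))
    return samples


page = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_network_receive_bytes_total{device="eth0"} 123456
"""
for sample in parse_exposition(page):
    print(sample)
# ('node_load1', {}, 0.42)
# ('node_network_receive_bytes_total', {'device': 'eth0'}, 123456.0)
```

Each sample line becomes one `(name, labels, value)` tuple. Prometheus adds the scrape time and the `job`/`instance` labels before storing it as a time series.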
Result
Install
```bash
apt update
```
(Clients) node_exporter
Latest release: Latest Release · prometheus/node_exporter
Replace the version numbers below with the latest release, then run the following on every client:
```bash
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar -xzvf node_exporter-1.8.2.linux-amd64.tar.gz
sudo mv node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin
rm node_exporter-*.tar.gz
rm -r node_exporter-*.linux-amd64*
sudo useradd -rs /bin/false node_exporter
sudo vim /etc/systemd/system/node_exporter.service
```
```ini
[Unit]
Description=node_exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter
```
You can now view the exported data at `<Client IP>:9100/metrics`. Then allow only the Prometheus server to reach port 9100:
```bash
ufw allow from <Server IP> to any port 9100 comment 'node_exporter'
```
(Server) Prometheus
Prometheus website: Prometheus
Prometheus latest release: Latest Release · prometheus/prometheus
Replace the version numbers below with the latest release:
```bash
wget https://github.com/prometheus/prometheus/releases/download/v2.53.1/prometheus-2.53.1.linux-amd64.tar.gz
tar -xzvf prometheus-2.53.1.linux-amd64.tar.gz
cd prometheus-2.53.1.linux-amd64
sudo mv prometheus promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
sudo mv consoles/ console_libraries/ /etc/prometheus/
cd ..
rm prometheus-2.53.1.linux-amd64.tar.gz
rm -r prometheus-2.53.1.linux-amd64
sudo useradd -rs /bin/false prometheus
sudo chown -R prometheus: /etc/prometheus /var/lib/prometheus
sudo vim /etc/systemd/system/prometheus.service
```
```ini
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --log.level=info

[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus
```
The Prometheus web UI is now available at `http://<Server IP>:9090`. Open the port in the firewall:
```bash
ufw allow 9090 comment 'prometheus'
```
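The unit file starts Prometheus with `--web.enable-lifecycle`. That flag enables the `/-/reload` endpoint: an HTTP POST to it makes Prometheus re-read `prometheus.yml` without restarting the process, which is handy when clients are added later. Below is a minimal stdlib sketch, assuming Prometheus listens on `localhost:9090`. The function names are hypothetical.

```python
import urllib.request


def build_reload_request(base_url='http://localhost:9090'):
    """Build a POST request for /-/reload (requires --web.enable-lifecycle)."""
    return urllib.request.Request(base_url.rstrip('/') + '/-/reload', method='POST')


def reload_prometheus(base_url='http://localhost:9090', timeout=10):
    """Ask Prometheus to reload its configuration and return the HTTP status code.

    urlopen raises urllib.error.HTTPError if the reload fails, for example
    when the edited prometheus.yml contains an error.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `reload_prometheus()` on the server has the same effect as `curl -X POST http://localhost:9090/-/reload`. If the new file is invalid, the request fails and Prometheus keeps running with the previous configuration.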
(Server) Add Clients to Server
Whenever you add a client, update the Prometheus configuration as follows:
```bash
vim /etc/prometheus/prometheus.yml
```
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  # Add the following
  - job_name: "remote_collector"
    scrape_interval: 1m
    static_configs:
      - targets: ["<Client 1 IP>:9100", "<Client 2 IP>:9100"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 1 Name>'
        regex: '<Client 1 IP>:9100'
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 2 Name>'
        regex: '<Client 2 IP>:9100'
```
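The `relabel_configs` above rely on Prometheus's default `replace` action. It works in three steps:
- The values of `source_labels` are joined with `;`.
- The `regex` must match the whole resulting string.
- On a match, `target_label` is set to `replacement`, with `$1`-style group references expanded. Otherwise the rule does nothing, and `instance` later falls back to the target address.

Below is a simplified Python model of that behaviour. The helper names are hypothetical, and Python's `re` stands in for Prometheus's RE2 engine. Note that the dots in an unescaped IP regex match any character, which is harmless here but worth knowing.

```python
import re


def expand(replacement, match):
    """Expand $1 / ${1} references; groups that do not exist become ''."""
    def group(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ''
        return match.group(index) or ''
    return re.sub(r'\$\{?(\d+)\}?', group, replacement)


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement='$1', separator=';'):
    """Simplified model of a relabel rule with action 'replace' (the default)."""
    value = separator.join(labels.get(name, '') for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex at both ends
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    result = dict(labels)
    result[target_label] = expand(replacement, match)
    return result


# Hypothetical clients: documentation-range IPs mapped to friendly names.
rules = [('203.0.113.10:9100', 'client-1'), ('203.0.113.11:9100', 'client-2')]
labels = {'__address__': '203.0.113.11:9100', 'job': 'remote_collector'}
for regex, name in rules:
    labels = relabel_replace(labels, ['__address__'], regex, 'instance', name)
print(labels['instance'])  # client-2
```

Only the rule whose regex matches the target's address fires. Each client therefore needs one rule per job, and the same friendly name must be used in every job so that dashboards can filter all exporters by a single `instance` value.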
```bash
systemctl restart prometheus
```
Go to `<Server IP>:9090` and open Status, Targets. All clients should be listed there.
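Every exporter job repeats the same target list plus one relabel rule per client, so this boilerplate can be generated instead of typed by hand. The following hypothetical helper, `render_scrape_job` (not part of Prometheus), prints a job in the same shape as the configuration above. It escapes the dots in each IP so the regex matches literally. Paste its output under `scrape_configs:`.

```python
import re


def render_scrape_job(job_name, clients, port, interval='1m'):
    """Render one scrape job as YAML text, indented for use under scrape_configs:.

    `clients` maps a friendly instance name to the client's IP address.
    """
    targets = ', '.join(f'"{ip}:{port}"' for ip in clients.values())
    lines = [
        f'  - job_name: "{job_name}"',
        f'    scrape_interval: {interval}',
        '    static_configs:',
        f'      - targets: [{targets}]',
        '    relabel_configs:',
    ]
    for name, ip in clients.items():
        lines += [
            '      - source_labels: [__address__]',
            '        target_label: instance',
            f"        replacement: '{name}'",
            # re.escape turns '.' into '\.' so the IP is matched literally
            f"        regex: '{re.escape(ip)}:{port}'",
        ]
    return '\n'.join(lines)


# Hypothetical clients (documentation-range addresses).
clients = {'client-1': '203.0.113.10', 'client-2': '203.0.113.11'}
for job, port in [('remote_collector', 9100), ('vnstat_exporter', 9112)]:
    print(render_scrape_job(job, clients, port))
```

YAML single-quoted strings do not treat backslashes as escapes, so `\.` reaches Prometheus unchanged.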
(Server) Grafana
```bash
sudo apt-get install -y apt-transport-https software-properties-common wget
```
- Grafana is now available at `<Server IP>:3000`.
(Server) Link Grafana and Prometheus
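- Open `<Server IP>:3000` in a browser. The initial username and password are both `admin`; change the password after logging in.
- Click the three-line menu at the top left, expand Connections and click Data sources.
- Click Add data source and choose Prometheus.
  - URL: `http://localhost:9090`
- Click Save & test.
- Click the three-line menu at the top left, then Dashboards, New, Import.
- Enter a dashboard ID such as `1860`, click Load, select Prometheus as the data source at the bottom and click Import.
- Done. The dashboard is now available at `<Server IP>:3000`.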
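The data source can also be created without the UI through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request. It assumes:
- Grafana runs on `localhost:3000`.
- Basic authentication uses the admin account.
- The data source is named `Prometheus`.

The helper name and these values are illustrative assumptions.

```python
import base64
import json
import urllib.request


def build_datasource_request(user, password, grafana_url='http://localhost:3000',
                             prometheus_url='http://localhost:9090'):
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    body = json.dumps({
        'name': 'Prometheus',
        'type': 'prometheus',
        'url': prometheus_url,
        'access': 'proxy',  # Grafana's backend queries Prometheus, not the browser
    }).encode()
    token = base64.b64encode(f'{user}:{password}'.encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip('/') + '/api/datasources',
        data=body,
        method='POST',
        headers={'Content-Type': 'application/json',
                 'Authorization': f'Basic {token}'},
    )

# Sending it: urllib.request.urlopen(build_datasource_request('admin', '<new password>'))
```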
Traffic Statistics: vnstat
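Grafana + Prometheus + node_exporter works well for monitoring the data reported by clients in real time.
However, it is quite limited for tasks that summarize a period of time, such as traffic accounting, and the totals can differ considerably from the actual values.
This is because Prometheus only records the value at each sampling point. Unlike a database, it cannot update per-period traffic totals as clients report data.
Therefore I use vnstat to collect traffic statistics, export them to Prometheus and display them in Grafana.
I could not find an existing vnstat exporter, so I wrote one myself. The result is a bit rough; treat it as a reference only.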
(Clients) vnstat
Install vnstat
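```bash
apt install vnstat
systemctl enable --now vnstat
```
vnstat starts collecting traffic statistics once it is installed and updates them every five minutes. If there is no data yet, wait a little.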
View traffic by hour:
```
root@localhost:~# vnstat -h

 eth0  /  hourly

         hour        rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
     2024-08-01
         21:00      9.07 MiB |   22.31 MiB |   31.39 MiB |   73.13 kbit/s
         22:00      9.36 MiB |   22.78 MiB |   32.14 MiB |   74.90 kbit/s
         23:00     10.91 MiB |   44.68 MiB |   55.59 MiB |  129.53 kbit/s
     2024-08-02
         00:00    118.04 MiB |    6.00 GiB |    6.11 GiB |   14.59 Mbit/s
         01:00    124.24 GiB |    7.18 GiB |  131.42 GiB |  313.57 Mbit/s
         02:00     45.43 GiB |    8.37 GiB |   53.80 GiB |  128.36 Mbit/s
     ------------------------+-------------+-------------+---------------
```
Traffic can also be viewed per 5 minutes (`-5`), day (`-d`), month (`-m`) or year (`-y`), and exported as JSON (`--json`).
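Create vnstat_exporter.py
So far I have only managed to write it as a Python script using `prometheus_client`, which is relatively resource-hungry. I may rewrite it as a shell script later.
```bash
apt install python3-pip
pip install prometheus-client
vim /usr/local/bin/vnstat_exporter.py
```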
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140from prometheus_client import start_http_server, Gauge
import subprocess
import json
import time
import argparse
import re
# Define metrics
traffic_gauge = Gauge('vnstat_traffic', 'Traffic usage from vnstat',
['interface', 'time_unit', 'type', 'direction'])
available_traffic_gauge = Gauge('available_traffic', 'Available traffic',
['available_traffic_cycle', 'available_traffic_direction'])
def convert_to_bytes(traffic_str):
"""
Converts a traffic string (e.g. '2TB', '500GB', '250MB') to bytes.
"""
unit_multipliers = {
'B': 1,
'KB': 1024,
'MB': 1024**2,
'GB': 1024**3,
'TB': 1024**4,
}
# Match the number and the unit
match = re.match(r'(\d+(?:\.\d+)?)\s*([KMGTP]?B)', traffic_str.strip())
if match:
value = float(match.group(1))
unit = match.group(2)
return value * unit_multipliers[unit]
else:
raise ValueError(f"Invalid traffic string: {traffic_str}")
def parse_vnstat_output(output):
"""
Parses the vnstat JSON output and updates Prometheus metrics.
"""
data = json.loads(output)
for interface in data.get('interfaces', []):
iface_name = interface.get('name', 'unknown')
traffic = interface.get('traffic', {})
# Process traffic data for each time unit
# Process 5-minute data if available
for entry in traffic.get('fiveminute', []):
timestamp = f"{entry['date']['year']}-{entry['date']['month']:02d}-{entry['date']['day']:02d} {entry['time']['hour']:02d}:{entry['time']['minute']:02d}"
rx = entry.get('rx', 0)
tx = entry.get('tx', 0)
total = rx + tx
print(f"5-min data: {iface_name}, {timestamp}, rx={rx}, tx={tx}, total={total}")
traffic_gauge.labels(interface=iface_name, time_unit='five_minute', type='total', direction='in').set(rx)
traffic_gauge.labels(interface=iface_name, time_unit='five_minute', type='total', direction='out').set(tx)
traffic_gauge.labels(interface=iface_name, time_unit='five_minute', type='total', direction='total').set(total)
# Process hourly data if available
for entry in traffic.get('hour', []):
timestamp = f"{entry['date']['year']}-{entry['date']['month']:02d}-{entry['date']['day']:02d} {entry['time']['hour']:02d}:00"
rx = entry.get('rx', 0)
tx = entry.get('tx', 0)
total = rx + tx
print(f"Hour data: {iface_name}, {timestamp}, rx={rx}, tx={tx}, total={total}")
traffic_gauge.labels(interface=iface_name, time_unit='hour', type='total', direction='in').set(rx)
traffic_gauge.labels(interface=iface_name, time_unit='hour', type='total', direction='out').set(tx)
traffic_gauge.labels(interface=iface_name, time_unit='hour', type='total', direction='total').set(total)
# Process daily data if available
for entry in traffic.get('day', []):
date = f"{entry['date']['year']}-{entry['date']['month']:02d}-{entry['date']['day']:02d}"
rx = entry.get('rx', 0)
tx = entry.get('tx', 0)
total = rx + tx
print(f"Day data: {iface_name}, {date}, rx={rx}, tx={tx}, total={total}")
traffic_gauge.labels(interface=iface_name, time_unit='day', type='total', direction='in').set(rx)
traffic_gauge.labels(interface=iface_name, time_unit='day', type='total', direction='out').set(tx)
traffic_gauge.labels(interface=iface_name, time_unit='day', type='total', direction='total').set(total)
# Process monthly data if available
for entry in traffic.get('month', []):
date = f"{entry['date']['year']}-{entry['date']['month']:02d}"
rx = entry.get('rx', 0)
tx = entry.get('tx', 0)
total = rx + tx
print(f"Month data: {iface_name}, {date}, rx={rx}, tx={tx}, total={total}")
traffic_gauge.labels(interface=iface_name, time_unit='month', type='total', direction='in').set(rx)
traffic_gauge.labels(interface=iface_name, time_unit='month', type='total', direction='out').set(tx)
traffic_gauge.labels(interface=iface_name, time_unit='month', type='total', direction='total').set(total)
# Process yearly data if available
for entry in traffic.get('year', []):
date = f"{entry['date']['year']}"
rx = entry.get('rx', 0)
tx = entry.get('tx', 0)
total = rx + tx
print(f"Year data: {iface_name}, {date}, rx={rx}, tx={tx}, total={total}")
traffic_gauge.labels(interface=iface_name, time_unit='year', type='total', direction='in').set(rx)
traffic_gauge.labels(interface=iface_name, time_unit='year', type='total', direction='out').set(tx)
traffic_gauge.labels(interface=iface_name, time_unit='year', type='total', direction='total').set(total)
def update_metrics(available_traffic_cycle, available_traffic_direction, available_traffic):
"""
Fetches vnstat data and updates Prometheus metrics.
"""
try:
output = subprocess.check_output(['vnstat', '--json'], text=True)
print("Raw vnstat JSON output:")
print(output) # Print the raw JSON data for inspection
parse_vnstat_output(output)
# Check if available traffic is unlimited
if available_traffic == '0':
# Set available traffic to infinity or a very high value
available_traffic_bytes = float('inf') # 表示无限流量
else:
# Convert available traffic to bytes
available_traffic_bytes = convert_to_bytes(available_traffic)
# Set available traffic gauge
available_traffic_gauge.labels(available_traffic_cycle=available_traffic_cycle, available_traffic_direction=available_traffic_direction).set(available_traffic_bytes)
except subprocess.CalledProcessError as e:
print(f"Error fetching vnstat data: {e}")
print(f"Command output: {e.output}")
if __name__ == '__main__':
# Argument parsing
parser = argparse.ArgumentParser(description='vnstat exporter for Prometheus')
parser.add_argument('--available_traffic_cycle', required=True, help='Cycle for available traffic (e.g. monthly)')
parser.add_argument('--available_traffic_direction', required=True, help='Direction for available traffic (e.g. total)')
parser.add_argument('--available_traffic', required=True, help='Amount of available traffic (e.g. 2TB or 0 for unlimited)')
args = parser.parse_args()
# Start Prometheus metrics server
start_http_server(9112)
while True:
update_metrics(args.available_traffic_cycle, args.available_traffic_direction, args.available_traffic)
time.sleep(60) # Update every 60 seconds创建 vnstat_exporter 服务
```bash
vim /etc/systemd/system/vnstat_exporter.service
```
```ini
[Unit]
Description=vnstat exporter

[Service]
ExecStart=/usr/bin/python3 /usr/local/bin/vnstat_exporter.py \
    --available_traffic_cycle "Monthly" \
    --available_traffic_direction "In/Out" \
    --available_traffic "2TB"
WorkingDirectory=/root
Restart=always
User=root

[Install]
WantedBy=multi-user.target
```
- Adjust `available_traffic_cycle`, `available_traffic_direction` and `available_traffic` to match your plan.
- Set `available_traffic` to `0` for unlimited traffic. Sizes are binary, so `2TB` means 2 × 1024⁴ bytes.
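```bash
systemctl daemon-reload
systemctl enable --now vnstat_exporter
```
You can now check the output at `<Client IP>:9112/metrics`. If nothing appears yet, wait five minutes. Then open the firewall to the Prometheus server:
```bash
ufw allow from <Server IP> to any port 9112 comment 'vnstat_exporter'
```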
(Server) Add Clients to Server
Edit the Prometheus configuration:
```bash
vim /etc/prometheus/prometheus.yml
```
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "remote_collector"
    scrape_interval: 1m
    static_configs:
      - targets: ["<Client 1 IP>:9100", "<Client 2 IP>:9100"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 1 Name>'
        regex: '<Client 1 IP>:9100'
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 2 Name>'
        regex: '<Client 2 IP>:9100'

  # Add the following
  - job_name: 'vnstat_exporter'
    scrape_interval: 1m
    static_configs:
      - targets: ["<Client 1 IP>:9112", "<Client 2 IP>:9112"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 1 Name>'
        regex: '<Client 1 IP>:9112'
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 2 Name>'
        regex: '<Client 2 IP>:9112'
```
Add a Grafana variable
In the dashboard, click the gear icon, then Variables, + New variable:
- Name: `Traffic_Unit`
- Label: `Traffic Unit`
- Query type: Label values
- Label: `time_unit`
- Metric: `vnstat_traffic`

Click Apply.
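Configure the Grafana panels
Click Add, then Visualization, switch the query editor to Code, and enter the query for each panel.
Outbound traffic:
```promql
vnstat_traffic{time_unit="$Traffic_Unit",type="total",direction="out",instance="$node"}
```
Inbound traffic:
```promql
vnstat_traffic{time_unit="$Traffic_Unit",type="total",direction="in",instance="$node"}
```
Total traffic (both directions):
```promql
vnstat_traffic{time_unit="$Traffic_Unit",type="total",direction="total",instance="$node"}
```
Available traffic:
```promql
available_traffic{instance="$node"}
```
Traffic direction:
```promql
available_traffic{instance="$node"}
```
Under Options, set Legend to Custom and enter `{{available_traffic_direction}}`. In the right-hand pane, search for Text mode and select Name.
Traffic cycle:
```promql
available_traffic{instance="$node"}
```
Under Options, set Legend to Custom and enter `{{available_traffic_cycle}}`. In the right-hand pane, search for Text mode and select Name.
In the right-hand options, under Standard options, set Unit to bytes(IEC).
You can now choose a host from the Host drop-down at the top and the accounting period from Traffic Unit.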
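If a panel stays empty, it helps to check whether Prometheus has the series at all. Prometheus's HTTP API answers instant queries at `/api/v1/query?query=...` with JSON. The sketch below builds such a URL and flattens a `vector` result into `{instance: value}`, keeping one value per instance, so filter the query down to one series per host. It assumes Prometheus on `localhost:9090`; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def query_url(expr, base_url='http://localhost:9090'):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip('/') + '/api/v1/query?' + urllib.parse.urlencode({'query': expr})


def vector_to_dict(response):
    """Map each result's `instance` label to its value from an instant-query response."""
    if response.get('status') != 'success':
        raise RuntimeError(response.get('error', 'query failed'))
    data = response['data']
    if data['resultType'] != 'vector':
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each result looks like {"metric": {labels...}, "value": [unix_time, "value as string"]}
    return {r['metric'].get('instance', ''): float(r['value'][1]) for r in data['result']}


def instant_query(expr, base_url='http://localhost:9090', timeout=10):
    """Run an instant query and return {instance: value}."""
    with urllib.request.urlopen(query_url(expr, base_url), timeout=timeout) as resp:
        return vector_to_dict(json.load(resp))

# e.g. instant_query('vnstat_traffic{time_unit="month",type="total",direction="total"}')
```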
Server Renewal Information
(Clients) server_exporter
```bash
vim /usr/local/bin/server_exporter.py
```
```python
from prometheus_client import start_http_server, Gauge
```
```bash
vim /etc/systemd/system/server_exporter.service
```
```ini
[Unit]
```
```bash
systemctl daemon-reload
```
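The Grafana renewal panels query two metrics:
- `renewal_date`, with a `renewal_cycle` label
- `renewal_price`, with a `renewal_currency` label

The Prometheus job scrapes them from port 9113. Below is a hypothetical, dependency-free sketch of an exporter exposing those metrics; it is not the author's script. Treating `renewal_date` as a Unix timestamp in seconds is an assumption, based on the `* 1000` in the date query that converts it to the milliseconds Grafana expects. The date, cycle, price and currency values are placeholders.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder values: replace them with the real renewal details of this server.
RENEWAL = {
    'date': datetime(2025, 1, 1, tzinfo=timezone.utc),
    'cycle': 'Yearly',
    'price': 10.0,
    'currency': 'USD',
}


def render_metrics(renewal):
    """Render the renewal metrics in the Prometheus text exposition format."""
    timestamp = int(renewal['date'].timestamp())  # Unix time in seconds
    return (
        '# HELP renewal_date Next renewal date as a Unix timestamp in seconds.\n'
        '# TYPE renewal_date gauge\n'
        f'renewal_date{{renewal_cycle="{renewal["cycle"]}"}} {timestamp}\n'
        '# HELP renewal_price Renewal price.\n'
        '# TYPE renewal_price gauge\n'
        f'renewal_price{{renewal_currency="{renewal["currency"]}"}} {renewal["price"]}\n'
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve render_metrics() on /metrics and 404 everywhere else."""

    def do_GET(self):
        if self.path != '/metrics':
            self.send_error(404)
            return
        body = render_metrics(RENEWAL).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; version=0.0.4')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9113):
    """Serve /metrics on the given port until interrupted."""
    HTTPServer(('0.0.0.0', port), MetricsHandler).serve_forever()
```

To run it as a service, call `serve()` at the bottom of the file (for example under `if __name__ == '__main__':`). As with the other exporters, allow only the Prometheus server to reach port 9113.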
(Server) Add Clients to Server
Edit the Prometheus configuration:
```bash
vim /etc/prometheus/prometheus.yml
```
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "remote_collector"
    scrape_interval: 1m
    static_configs:
      - targets: ["<Client 1 IP>:9100", "<Client 2 IP>:9100"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 1 Name>'
        regex: '<Client 1 IP>:9100'
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 2 Name>'
        regex: '<Client 2 IP>:9100'

  - job_name: 'vnstat_exporter'
    scrape_interval: 1m
    static_configs:
      - targets: ["<Client 1 IP>:9112", "<Client 2 IP>:9112"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 1 Name>'
        regex: '<Client 1 IP>:9112'
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 2 Name>'
        regex: '<Client 2 IP>:9112'

  # Add the following
  - job_name: 'server_exporter'
    # The renewal data rarely changes, but keep the interval below Prometheus's
    # default 5-minute lookback window; with 1d the series looks stale most of the day.
    scrape_interval: 1m
    static_configs:
      - targets: ["<Client 1 IP>:9113", "<Client 2 IP>:9113"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 1 Name>'
        regex: '<Client 1 IP>:9113'
      - source_labels: [__address__]
        target_label: instance
        replacement: '<Client 2 Name>'
        regex: '<Client 2 IP>:9113'
```
Add panels in Grafana
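Set them up the same way as the Available traffic panel.
Renewal date:
```promql
renewal_date{instance=~"$node"} * 1000
```
- Under Options, set Legend to Custom and enter `{{renewal_cycle}}`.
- In the right-hand pane, delete the panel title.

Renewal price:
```promql
renewal_price{instance=~"$node"}
```
- Under Options, set Legend to Custom and enter `{{renewal_currency}}`.
- In the right-hand pane, delete the panel title.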
About this Post
This post is written by OwlllOvO, licensed under CC BY-NC 4.0.