August 2, 2024

Prometheus + Grafana

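If you run several servers, you need a monitoring program to keep an eye on their status.

At first I used probe-style monitors such as Nezha and ServerStatus.

However, Nezha kept running into all kinds of bugs, and ServerStatus depends on individual community developers for maintenance; several of its versions are no longer updated.

So I decided to build a new setup from open-source tools.

Grafana + Prometheus + node_exporter turns out to be a very good solution for monitoring server status.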

Result

Install

apt update
apt install ufw
ufw allow 22
ufw enable

(Clients) node_exporter

(Server) Prometheus

(Server) Add Clients to Server

(Server) Grafana

sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana
systemctl daemon-reload
systemctl enable --now grafana-server.service
systemctl status grafana-server
ufw allow 3000 comment 'grafana'
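  1. Open <Server IP>:3000 in a browser. The initial username and password are both admin; change the password after logging in.
  2. Click the three-line menu in the top-left corner, expand Connections, and click Data sources.
  3. Click Add data source and choose Prometheus.
    • URL: http://localhost:9090
  4. Click Save & test.
  5. Click the three-line menu in the top-left corner, then Dashboards, New, Import.
  6. Enter a dashboard ID, such as 1860, click Load, select Prometheus as the data source at the bottom, and click Import.
  7. Done. You can now view the dashboard at <Server IP>:3000.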
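Steps 2 to 4 can also be scripted instead of clicked. The sketch below is a hedged alternative, assuming Grafana's HTTP API endpoint `POST /api/datasources` with basic authentication as the admin user; the helper name `build_datasource_request` and the credentials are placeholders. It only builds the request so you can inspect it; passing it to `urllib.request.urlopen` would actually create the data source.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password,
                             prometheus_url="http://localhost:9090"):
    """Build (but do not send) a request that registers Prometheus as a
    Grafana data source via the HTTP API (assumed: POST /api/datasources)."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus, not the browser
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_datasource_request("http://localhost:3000", "admin", "your-new-password")
    print(req.get_method(), req.full_url)
    # To actually create the data source:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```

Importing dashboard 1860 can still be done through the UI as in steps 5 and 6.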

Traffic Statistics: vnstat

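Grafana + Prometheus + node_exporter monitors the data reported by the clients in real time and works well for all kinds of live metrics.

However, for tasks that need totals over a period of time, such as traffic accounting, it is quite limited, and the numbers can differ noticeably from the actual values.

This is because Prometheus only records a value at each point in time; unlike a database, it cannot keep updating per-period traffic totals from the data the clients send.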
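To make the difference concrete, here is a simplified illustration of roughly the idea behind vnstat-style accounting (not vnstat's actual implementation): take periodic readings of an interface's cumulative byte counter and add each delta to a bucket for the hour in which the reading was taken, so every hour ends up with a running total. The function name and sample readings are hypothetical, and a counter that goes backwards (for example after a reboot) is treated as having restarted from zero.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_totals(readings):
    """Accumulate byte-counter deltas into per-hour buckets.

    readings: list of (unix_timestamp, cumulative_bytes) tuples, sorted by time.
    Returns a dict mapping 'YYYY-MM-DD HH:00' (UTC) to bytes transferred.
    """
    totals = defaultdict(int)
    prev = None
    for ts, counter in readings:
        if prev is not None:
            # A smaller value than before means the counter was reset,
            # so the new value itself is the traffic since the reset.
            delta = counter - prev if counter >= prev else counter
            # The delta is credited to the hour in which this reading was taken.
            hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
            totals[hour] += delta
        prev = counter
    return dict(totals)


if __name__ == "__main__":
    samples = [
        (1722546000, 1_000),   # 2024-08-01 21:00 UTC
        (1722546300, 4_000),   # 21:05
        (1722549600, 10_000),  # 22:00
        (1722549900, 500),     # 22:05, counter reset
    ]
    print(hourly_totals(samples))
```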

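So I use vnstat for traffic accounting and export its data to Prometheus, which lets Grafana display it on the dashboard.

I searched around but could not find a vnstat exporter, so I wrote one myself. The result is a bit rough; treat it as a reference only.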

(Clients) vnstat

  1. Install vnstat

    apt install vnstat
    systemctl enable vnstat

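    vnstat starts collecting traffic data as soon as it is installed and updates every five minutes; if there is no data yet, wait a little while.

    • View traffic by hour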

      root@localhost:~# vnstat -h

       eth0  /  hourly

          hour            rx      |     tx      |    total    |   avg. rate
      ------------------------+-------------+-------------+---------------
      2024-08-01
          21:00      9.07 MiB |   22.31 MiB |   31.39 MiB |   73.13 kbit/s
          22:00      9.36 MiB |   22.78 MiB |   32.14 MiB |   74.90 kbit/s
          23:00     10.91 MiB |   44.68 MiB |   55.59 MiB |  129.53 kbit/s
      2024-08-02
          00:00    118.04 MiB |    6.00 GiB |    6.11 GiB |   14.59 Mbit/s
          01:00    124.24 GiB |    7.18 GiB |  131.42 GiB |  313.57 Mbit/s
          02:00     45.43 GiB |    8.37 GiB |   53.80 GiB |  128.36 Mbit/s
      ------------------------+-------------+-------------+---------------

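      You can also view the data in 5-minute (-5), daily (-d), monthly (-m), or yearly (-y) intervals, or export it as JSON (--json).

  2. Create the vnstat_exporter.py script

    The only version I got working is a Python script built on prometheus_client. It uses a fair amount of resources; I may rewrite it as a shell script later.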

    apt install python3-pip
    pip install prometheus-client
    vim /usr/local/bin/vnstat_exporter.py
    from prometheus_client import start_http_server, Gauge
    import subprocess
    import json
    import time
    import argparse
    import re

    # Define metrics
    traffic_gauge = Gauge('vnstat_traffic', 'Traffic usage from vnstat',
                          ['interface', 'time_unit', 'type', 'direction'])

    available_traffic_gauge = Gauge('available_traffic', 'Available traffic',
                                    ['available_traffic_cycle', 'available_traffic_direction'])

    def convert_to_bytes(traffic_str):
        """
        Converts a traffic string (e.g. '2TB', '500GB', '250MB') to bytes.
        Units are binary, so 1 KB = 1024 B.
        """
        unit_multipliers = {
            'B': 1,
            'KB': 1024,
            'MB': 1024**2,
            'GB': 1024**3,
            'TB': 1024**4,
            'PB': 1024**5,  # The regex below also accepts PB
        }

        # Match the number and the unit
        match = re.match(r'(\d+(?:\.\d+)?)\s*([KMGTP]?B)', traffic_str.strip())
        if match:
            value = float(match.group(1))
            unit = match.group(2)
            return value * unit_multipliers[unit]
        else:
            raise ValueError(f"Invalid traffic string: {traffic_str}")

    def parse_vnstat_output(output):
        """
        Parses the vnstat JSON output and updates Prometheus metrics.
        Every entry of a time unit writes to the same label set, so each gauge
        ends up holding the last entry vnstat lists for that unit.
        """
        data = json.loads(output)

        for interface in data.get('interfaces', []):
            iface_name = interface.get('name', 'unknown')
            traffic = interface.get('traffic', {})

            # Process traffic data for each time unit

            # Process 5-minute data if available
            for entry in traffic.get('fiveminute', []):
                timestamp = f"{entry['date']['year']}-{entry['date']['month']:02d}-{entry['date']['day']:02d} {entry['time']['hour']:02d}:{entry['time']['minute']:02d}"
                rx = entry.get('rx', 0)
                tx = entry.get('tx', 0)
                total = rx + tx
                print(f"5-min data: {iface_name}, {timestamp}, rx={rx}, tx={tx}, total={total}")
                traffic_gauge.labels(interface=iface_name, time_unit='five_minute', type='total', direction='in').set(rx)
                traffic_gauge.labels(interface=iface_name, time_unit='five_minute', type='total', direction='out').set(tx)
                traffic_gauge.labels(interface=iface_name, time_unit='five_minute', type='total', direction='total').set(total)

            # Process hourly data if available
            for entry in traffic.get('hour', []):
                timestamp = f"{entry['date']['year']}-{entry['date']['month']:02d}-{entry['date']['day']:02d} {entry['time']['hour']:02d}:00"
                rx = entry.get('rx', 0)
                tx = entry.get('tx', 0)
                total = rx + tx
                print(f"Hour data: {iface_name}, {timestamp}, rx={rx}, tx={tx}, total={total}")
                traffic_gauge.labels(interface=iface_name, time_unit='hour', type='total', direction='in').set(rx)
                traffic_gauge.labels(interface=iface_name, time_unit='hour', type='total', direction='out').set(tx)
                traffic_gauge.labels(interface=iface_name, time_unit='hour', type='total', direction='total').set(total)

            # Process daily data if available
            for entry in traffic.get('day', []):
                date = f"{entry['date']['year']}-{entry['date']['month']:02d}-{entry['date']['day']:02d}"
                rx = entry.get('rx', 0)
                tx = entry.get('tx', 0)
                total = rx + tx
                print(f"Day data: {iface_name}, {date}, rx={rx}, tx={tx}, total={total}")
                traffic_gauge.labels(interface=iface_name, time_unit='day', type='total', direction='in').set(rx)
                traffic_gauge.labels(interface=iface_name, time_unit='day', type='total', direction='out').set(tx)
                traffic_gauge.labels(interface=iface_name, time_unit='day', type='total', direction='total').set(total)

            # Process monthly data if available
            for entry in traffic.get('month', []):
                date = f"{entry['date']['year']}-{entry['date']['month']:02d}"
                rx = entry.get('rx', 0)
                tx = entry.get('tx', 0)
                total = rx + tx
                print(f"Month data: {iface_name}, {date}, rx={rx}, tx={tx}, total={total}")
                traffic_gauge.labels(interface=iface_name, time_unit='month', type='total', direction='in').set(rx)
                traffic_gauge.labels(interface=iface_name, time_unit='month', type='total', direction='out').set(tx)
                traffic_gauge.labels(interface=iface_name, time_unit='month', type='total', direction='total').set(total)

            # Process yearly data if available
            for entry in traffic.get('year', []):
                date = f"{entry['date']['year']}"
                rx = entry.get('rx', 0)
                tx = entry.get('tx', 0)
                total = rx + tx
                print(f"Year data: {iface_name}, {date}, rx={rx}, tx={tx}, total={total}")
                traffic_gauge.labels(interface=iface_name, time_unit='year', type='total', direction='in').set(rx)
                traffic_gauge.labels(interface=iface_name, time_unit='year', type='total', direction='out').set(tx)
                traffic_gauge.labels(interface=iface_name, time_unit='year', type='total', direction='total').set(total)

    def update_metrics(available_traffic_cycle, available_traffic_direction, available_traffic):
        """
        Fetches vnstat data and updates Prometheus metrics.
        """
        try:
            output = subprocess.check_output(['vnstat', '--json'], text=True)
            print("Raw vnstat JSON output:")
            print(output)  # Print the raw JSON data for inspection (remove to keep the journal small)
            parse_vnstat_output(output)

            # Check if available traffic is unlimited
            if available_traffic == '0':
                # Set available traffic to infinity
                available_traffic_bytes = float('inf')  # Unlimited traffic
            else:
                # Convert available traffic to bytes
                available_traffic_bytes = convert_to_bytes(available_traffic)

            # Set available traffic gauge
            available_traffic_gauge.labels(available_traffic_cycle=available_traffic_cycle,
                                           available_traffic_direction=available_traffic_direction).set(available_traffic_bytes)
        except subprocess.CalledProcessError as e:
            print(f"Error fetching vnstat data: {e}")
            print(f"Command output: {e.output}")
        except ValueError as e:
            # Malformed JSON (json.JSONDecodeError) or an invalid --available_traffic value
            print(f"Error parsing data: {e}")

    if __name__ == '__main__':
        # Argument parsing
        parser = argparse.ArgumentParser(description='vnstat exporter for Prometheus')
        parser.add_argument('--available_traffic_cycle', required=True, help='Cycle for available traffic (e.g. monthly)')
        parser.add_argument('--available_traffic_direction', required=True, help='Direction for available traffic (e.g. total)')
        parser.add_argument('--available_traffic', required=True, help='Amount of available traffic (e.g. 2TB or 0 for unlimited)')

        args = parser.parse_args()

        # Start Prometheus metrics server
        start_http_server(9112)
        while True:
            update_metrics(args.available_traffic_cycle, args.available_traffic_direction, args.available_traffic)
            time.sleep(60)  # Update every 60 seconds

  3. Create the vnstat_exporter service

    vim /etc/systemd/system/vnstat_exporter.service
    [Unit]
    Description=vnstat exporter

    [Service]
    ExecStart=/usr/bin/python3 /usr/local/bin/vnstat_exporter.py \
    --available_traffic_cycle "Monthly" \
    --available_traffic_direction "In/Out" \
    --available_traffic "2TB"
    WorkingDirectory=/root
    Restart=always
    User=root

    [Install]
    WantedBy=multi-user.target

    • Adjust available_traffic_cycle, available_traffic_direction, and available_traffic to match your plan
    • Set available_traffic to 0 for unlimited traffic
    systemctl daemon-reload
    systemctl enable --now vnstat_exporter
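
  4. On the client, you can now check the output at http://localhost:9112/metrics (for example with curl); if nothing shows up yet, wait five minutes.

  5. Allow the server through the firewall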

    ufw allow from <Server IP> to any port 9112 comment 'vnstat_exporter'
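The exporter in step 2 relies on the layout of `vnstat --json`: for each interface, `traffic` holds lists named `fiveminute`, `hour`, `day`, `month`, and `year`, whose entries carry `date`, an optional `time`, and `rx`/`tx` in bytes (the vnstat 2.x layout the script assumes). Because the script writes every entry to the same gauge labels, each gauge keeps whichever entry comes last in the list. The stdlib-only sketch below makes that choice explicit by picking the entry with the latest date and time instead of relying on list order; `latest_per_unit` and the sample JSON are illustrative, not part of vnstat.

```python
import json


def entry_key(entry):
    """Sort key built from an entry's date/time fields (missing parts count as 0)."""
    d = entry.get("date", {})
    t = entry.get("time", {})
    return (d.get("year", 0), d.get("month", 0), d.get("day", 0),
            t.get("hour", 0), t.get("minute", 0))


def latest_per_unit(vnstat_json):
    """Return {(interface, unit): (rx, tx)} using the most recent entry per unit."""
    data = json.loads(vnstat_json)
    result = {}
    for iface in data.get("interfaces", []):
        traffic = iface.get("traffic", {})
        for unit in ("fiveminute", "hour", "day", "month", "year"):
            entries = traffic.get(unit, [])
            if entries:
                newest = max(entries, key=entry_key)
                result[(iface.get("name", "unknown"), unit)] = (
                    newest.get("rx", 0), newest.get("tx", 0))
    return result


if __name__ == "__main__":
    sample = json.dumps({"interfaces": [{"name": "eth0", "traffic": {
        "hour": [
            {"date": {"year": 2024, "month": 8, "day": 2}, "time": {"hour": 1, "minute": 0},
             "rx": 300, "tx": 40},
            {"date": {"year": 2024, "month": 8, "day": 2}, "time": {"hour": 0, "minute": 0},
             "rx": 100, "tx": 20},
        ],
        "month": [{"date": {"year": 2024, "month": 8}, "rx": 400, "tx": 60}],
    }}]})
    for (iface, unit), (rx, tx) in latest_per_unit(sample).items():
        print(iface, unit, "rx", rx, "tx", tx, "total", rx + tx)
```

On a real client you would feed it `subprocess.check_output(['vnstat', '--json'], text=True)`, exactly as the exporter does.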

(Server) Add Clients to Server

  1. Edit the Prometheus configuration

    vim /etc/prometheus/prometheus.yml
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ["localhost:9090"]

      - job_name: "remote_collector"
        scrape_interval: 1m
        static_configs:
          - targets: ["<Client 1 IP>:9100", "<Client 2 IP>:9100"]
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 1 Name>'
            regex: '<Client 1 IP>:9100'
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 2 Name>'
            regex: '<Client 2 IP>:9100'

      # Add the following
      - job_name: 'vnstat_exporter'
        scrape_interval: 1m
        static_configs:
          - targets: ["<Client 1 IP>:9112", "<Client 2 IP>:9112"]
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 1 Name>'
            regex: '<Client 1 IP>:9112'
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 2 Name>'
            regex: '<Client 2 IP>:9112'

  2. Add a Grafana variable

    On the dashboard, click the gear icon, then Variables, + New variable

    • Name: Traffic_Unit

    • Label: Traffic Unit

    • Query type: Label values

    • Label: time_unit

    • Metric: vnstat_traffic

    Click Apply

  3. Configure the Grafana panels

    Click Add, Visualization, switch the query editor to Code, and enter:

    • Outbound traffic

      vnstat_traffic{time_unit="$Traffic_Unit",type="total",direction="out",instance="$node"}
    • Inbound traffic

      vnstat_traffic{time_unit="$Traffic_Unit",type="total",direction="in",instance="$node"}
    • Total traffic (both directions)

      vnstat_traffic{time_unit="$Traffic_Unit",type="total",direction="total",instance="$node"}
    • Available traffic

      available_traffic{instance="$node"}
    • Traffic direction

      available_traffic{instance="$node"}
      • In Options below, set Legend to Custom and enter {{available_traffic_direction}}

      • In the right-hand options, search for Text mode and select Name

    • Traffic cycle

      available_traffic{instance="$node"}

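      • In Options below, set Legend to Custom and enter {{available_traffic_cycle}}

      • In the right-hand options, search for Text mode and select Name

    • In the right-hand options, under Standard options, set Unit to bytes(IEC)

  4. You can now pick a host from Host at the top and a statistics period from Traffic Unit.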
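The `relabel_configs` in step 1 use Prometheus' default `replace` action: the value of the `source_labels` (here just `__address__`) is matched against `regex`, which Prometheus anchors at both ends, and on a match `replacement` is written into `target_label`; otherwise the label is left alone, and `instance` falls back to the target address. The minimal imitation below covers only the fields used here, uses Python's `re` in place of Prometheus' RE2 engine, and treats `replacement` as a literal (no `$1` group references). `relabel_instance` and the 192.0.2.x addresses are hypothetical. It also shows that the unescaped dots in an IP regex match any character; writing `192\.0\.2\.10:9112` avoids that.

```python
import re


def relabel_instance(address, rules):
    """Imitate the default 'replace' relabel action for the `instance` label.

    rules: list of (regex, replacement) pairs applied to __address__ in order.
    """
    instance = address  # With no matching rule, `instance` ends up as the address
    for regex, replacement in rules:
        # Relabel regexes are anchored, so use fullmatch rather than search.
        if re.fullmatch(regex, address):
            instance = replacement
    return instance


if __name__ == "__main__":
    rules = [
        ("192.0.2.10:9112", "tokyo-1"),
        ("192.0.2.20:9112", "frankfurt-1"),
    ]
    for target in ["192.0.2.10:9112", "192.0.2.20:9112", "192.0.2.30:9112"]:
        print(target, "->", relabel_instance(target, rules))
```

Giving each client the same name in every job (node_exporter, vnstat_exporter, server_exporter) is what lets one `$node` variable filter all panels.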

Server Renewal Information

(Clients) server_exporter

vim /usr/local/bin/server_exporter.py
from prometheus_client import start_http_server, Gauge
import time
import argparse

# Define Prometheus metrics
renewal_date_gauge = Gauge('renewal_date', 'Renewal date of the service (timestamp)',
                           ['renewal_cycle'])
renewal_price_gauge = Gauge('renewal_price', 'Renewal price of the service',
                            ['renewal_currency'])

def update_metrics(renewal_date, renewal_cycle, renewal_price, renewal_currency):
    """
    Update Prometheus metrics
    """
    # Convert renewal date to timestamp
    try:
        # Here we assume renewal_date is a valid date string, e.g., '2024-12-31'
        timestamp = time.mktime(time.strptime(renewal_date, '%Y-%m-%d'))
        renewal_date_gauge.labels(renewal_cycle=renewal_cycle).set(timestamp)
    except ValueError as e:
        print(f"Invalid renewal date format: {renewal_date}. Error: {e}")

    # Update renewal price metric
    try:
        renewal_price_value = float(renewal_price)  # Ensure price is a number
        renewal_price_gauge.labels(renewal_currency=renewal_currency).set(renewal_price_value)
    except ValueError as e:
        print(f"Invalid renewal price format: {renewal_price}. Error: {e}")

if __name__ == '__main__':
    # Argument parsing
    parser = argparse.ArgumentParser(description='Renewal information exporter for Prometheus')
    parser.add_argument('--renewal_date', required=True, help='Renewal date of the service (e.g. YYYY-MM-DD)')
    parser.add_argument('--renewal_cycle', required=True, help='Renewal cycle (e.g. monthly, yearly)')
    parser.add_argument('--renewal_price', required=True, help='Renewal price (e.g. 29.99)')
    parser.add_argument('--renewal_currency', required=True, help='Currency for the renewal price (e.g. USD, EUR)')

    args = parser.parse_args()

    # Start Prometheus metrics server
    start_http_server(9113)  # Use a different port to avoid conflicts with vnstat_exporter
    while True:
        update_metrics(args.renewal_date, args.renewal_cycle, args.renewal_price, args.renewal_currency)
        time.sleep(60*60*24)  # Update once a day
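The exporter publishes a fixed date, so once that date passes, the panel keeps showing a day in the past until you edit the service file. One option, if you want it to roll forward, is to advance the date by whole renewal cycles until it is no longer in the past. The sketch below is a hypothetical helper (`next_renewal` and `add_months` are not part of the original script), supports only the cycle names `Monthly` and `Annually` used in the service file, and clamps month-end dates (January 31 plus one month gives the last day of February).

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return `start` shifted by `months`, clamping the day to the month's length."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_renewal(renewal_date, cycle, today=None):
    """Advance renewal_date ('YYYY-MM-DD') by whole cycles until it is >= today."""
    step = {"Monthly": 1, "Annually": 12}[cycle]  # KeyError for unsupported cycles
    start = date.fromisoformat(renewal_date)
    today = today or date.today()
    n = 0
    current = start
    while current < today:
        n += 1
        # Always offset from the original date so month-end clamping cannot drift.
        current = add_months(start, n * step)
    return current.isoformat()


if __name__ == "__main__":
    print(next_renewal("2024-01-31", "Monthly", today=date(2024, 2, 10)))
    print(next_renewal("2024-12-31", "Annually", today=date(2025, 3, 1)))
```

Inside the exporter, the result could replace `renewal_date` before it is converted to a timestamp on each daily loop.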
vim /etc/systemd/system/server_exporter.service
[Unit]
Description=server exporter

[Service]
ExecStart=/usr/bin/python3 /usr/local/bin/server_exporter.py \
--renewal_date "2024-12-31" \
--renewal_cycle "Annually" \
--renewal_price "12.34" \
--renewal_currency "USD"
WorkingDirectory=/root
Restart=always
User=root

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable --now server_exporter
ufw allow from <Server IP> to any port 9113 comment 'server_exporter'

(Server) Add Clients to Server

  1. Edit the Prometheus configuration

    vim /etc/prometheus/prometheus.yml
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ["localhost:9090"]

      - job_name: "remote_collector"
        scrape_interval: 1m
        static_configs:
          - targets: ["<Client 1 IP>:9100", "<Client 2 IP>:9100"]
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 1 Name>'
            regex: '<Client 1 IP>:9100'
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 2 Name>'
            regex: '<Client 2 IP>:9100'

      - job_name: 'vnstat_exporter'
        scrape_interval: 1m
        static_configs:
          - targets: ["<Client 1 IP>:9112", "<Client 2 IP>:9112"]
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 1 Name>'
            regex: '<Client 1 IP>:9112'
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 2 Name>'
            regex: '<Client 2 IP>:9112'

      # Add the following
      - job_name: 'server_exporter'
        # Keep this under Prometheus' default 5-minute query lookback; with a daily
        # interval the series would be stale for most of the day and panels would go blank.
        scrape_interval: 1m
        static_configs:
          - targets: ["<Client 1 IP>:9113", "<Client 2 IP>:9113"]
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 1 Name>'
            regex: '<Client 1 IP>:9113'
          - source_labels: [__address__]
            target_label: instance
            replacement: '<Client 2 Name>'
            regex: '<Client 2 IP>:9113'

  2. Add panels in Grafana

    • Set them up the same way as the Available traffic panel

    • Renewal date

      renewal_date{instance=~"$node"} * 1000
      • In Options below, set Legend to Custom and enter {{renewal_cycle}}
      • In the right-hand options, delete the panel title
    • Renewal price

      renewal_price{instance=~"$node"}
      • In Options below, set Legend to Custom and enter {{renewal_currency}}
      • In the right-hand options, delete the panel title
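The `* 1000` in the renewal-date query is needed because the exporter publishes seconds since the epoch, while Grafana's date/time units expect milliseconds. The exporter builds that timestamp with `time.mktime`, which reads the date as local midnight on the client, so clients in different time zones produce slightly different instants for the same date. The sketch below shows the conversion with `calendar.timegm` instead, which treats the date as midnight UTC; that swap is an assumption of this sketch, not what the original script does, and both helper names are hypothetical.

```python
import calendar
import time


def renewal_timestamp_utc(renewal_date):
    """Seconds since the epoch for 00:00 UTC on renewal_date ('YYYY-MM-DD')."""
    return calendar.timegm(time.strptime(renewal_date, "%Y-%m-%d"))


def to_grafana_ms(seconds):
    """Grafana date/time units expect milliseconds, hence the `* 1000` in PromQL."""
    return int(seconds * 1000)


if __name__ == "__main__":
    ts = renewal_timestamp_utc("2024-12-31")
    print(ts, to_grafana_ms(ts))  # 1735603200 1735603200000
```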

About this Post

This post is written by OwlllOvO, licensed under CC BY-NC 4.0.

#Server#Monitor