
Prometheus - Testing the Accuracy of Exporters


Test whether node-exporter is correctly collecting the key metrics (CPU usage, file system usage, etc.) and record the results.


1. Test design

  • Collect metrics with Node exporter, cAdvisor, etc.
  • Collect the metrics of the server and the Docker containers (the application) with shell scripts.
  • Visualize the metric values collected directly by the shell scripts and compare them with the values shown in Grafana (a sketch for pulling the Prometheus side of the comparison follows this list).
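
Instead of reading values off a dashboard, the Grafana/Prometheus side of the comparison can also be pulled programmatically through the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090 and that node-exporter exposes the standard node_cpu_seconds_total metric; adjust the address and the query to your environment.

import datetime
import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local Prometheus
# Typical "CPU busy" query against node-exporter (node_cpu_seconds_total assumed)
QUERY = '100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)'

end = datetime.datetime.now()
start = end - datetime.timedelta(minutes=10)

# Query a 10-minute range at 5-second resolution (matches the shell script interval)
resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query_range",
    params={
        "query": QUERY,
        "start": start.timestamp(),
        "end": end.timestamp(),
        "step": "5s",
    },
)
resp.raise_for_status()

# Each result series holds [unix_ts, value] pairs; print them for comparison with the CSV
for series in resp.json()["data"]["result"]:
    for ts, value in series["values"]:
        print(datetime.datetime.fromtimestamp(float(ts)), round(float(value), 2))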

2. Node exporter test

1) Test method

  • Capture the output of the Linux top command.
  • Parse the output into a CSV keyed by timestamp.
  • Visualize the results and compare them (a plotting sketch follows the parsing script below).

2) Test results

3) Test details

(1) TOP to txt shell script

#!/bin/bash

OUTPUT_FILE="top_output.txt"
INTERVAL=5
COUNT=12

echo "Capturing top output every $INTERVAL seconds for $COUNT intervals"

# Clear the output file if it exists
> "$OUTPUT_FILE"

# Run top command and append output to the file at specified intervals
for ((i=1; i<=COUNT; i++))
do
    echo "Timestamp: $(date +"%Y-%m-%d %H:%M:%S")" >> "$OUTPUT_FILE"
    top -b -n 1 | head -n 20 >> "$OUTPUT_FILE"
    sleep "$INTERVAL"
done

echo "Capture complete."

(2) TOP output txt to csv

import re
import csv

# Define the input and output file names
input_file = "top_output.txt"
output_file = "top_output.csv"

# Regex patterns to extract relevant data
timestamp_pattern = re.compile(r"Timestamp: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
cpu_pattern = re.compile(r"%Cpu\(s\):\s+(\d+\.\d+) us,\s+(\d+\.\d+) sy,\s+(\d+\.\d+) ni,\s+(\d+\.\d+) id,\s+(\d+\.\d+) wa,\s+(\d+\.\d+) hi,\s+(\d+\.\d+) si,\s+(\d+\.\d+) st")
mem_pattern = re.compile(r"KiB Mem\s+:\s+(\d+)\+?\s*total,\s+(\d+)\+?\s*free,\s+(\d+)\+?\s*used,\s+(\d+)\+?\s*buff/cache")
swap_pattern = re.compile(r"KiB Swap:\s+(\d+)\+?\s*total,\s+(\d+)\+?\s*free,\s+(\d+)\+?\s*used\.\s+(\d+)\+?\s*avail Mem")

# Open the output CSV file for writing
with open(output_file, mode='w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # Write the header
    csvwriter.writerow(["Timestamp", "User CPU Usage (%)", "System CPU Usage (%)", "Nice CPU Usage (%)",
                        "Idle CPU (%)", "IO Wait CPU (%)", "Hardware Interrupts (%)",
                        "Software Interrupts (%)", "Steal Time (%)",
                        "Memory Total (KiB)", "Memory Free (KiB)", "Memory Used (KiB)", "Memory Usage (%)",
                        "Swap Total (KiB)", "Swap Free (KiB)", "Swap Used (KiB)", "Swap Usage (%)"])

    # Read the input file
    with open(input_file, mode='r') as infile:
        lines = infile.readlines()
        i = 0
        while i < len(lines):
            line = lines[i].strip()

            # Check for timestamp
            timestamp_match = timestamp_pattern.match(line)

            if timestamp_match:
                timestamp = timestamp_match.group(1)
                cpu_usage = None
                system_cpu_usage = None
                nice_cpu_usage = None
                idle_cpu = None
                io_wait_cpu = None
                hardware_interrupts = None
                software_interrupts = None
                steal_time = None
                mem_total = None
                mem_free = None
                mem_used = None
                swap_total = None
                swap_free = None
                swap_used = None

                # Extract CPU usage
                cpu_match = cpu_pattern.search(lines[i + 3])
                if cpu_match:
                    cpu_usage = float(cpu_match.group(1))
                    system_cpu_usage = float(cpu_match.group(2))
                    nice_cpu_usage = float(cpu_match.group(3))
                    idle_cpu = float(cpu_match.group(4))
                    io_wait_cpu = float(cpu_match.group(5))
                    hardware_interrupts = float(cpu_match.group(6))
                    software_interrupts = float(cpu_match.group(7))
                    steal_time = float(cpu_match.group(8))
                else:
                    print("Failed to parse CPU usage from line:", lines[i + 3].strip())

                # Extract memory usage
                mem_match = mem_pattern.search(lines[i + 4])
                if mem_match:
                    mem_total = int(mem_match.group(1))
                    mem_free = int(mem_match.group(2))
                    mem_used = int(mem_match.group(3))
                else:
                    print("Failed to parse memory usage from line:", lines[i + 4].strip())

                # Extract swap usage
                swap_match = swap_pattern.search(lines[i + 5])
                if swap_match:
                    swap_total = int(swap_match.group(1))
                    swap_free = int(swap_match.group(2))
                    swap_used = int(swap_match.group(3))
                else:
                    print("Failed to parse swap usage from line:", lines[i + 5].strip())

                # Calculate usage percentages
                if mem_total is not None and mem_total > 0:
                    mem_usage = (float(mem_used) / mem_total) * 100
                else:
                    mem_usage = 0.0

                if swap_total is not None and swap_total > 0:
                    swap_usage = (float(swap_used) / swap_total) * 100
                else:
                    swap_usage = 0.0

                # Write to CSV, including the derived usage percentages
                csvwriter.writerow([timestamp, cpu_usage, system_cpu_usage, nice_cpu_usage,
                                    idle_cpu, io_wait_cpu, hardware_interrupts,
                                    software_interrupts, steal_time,
                                    mem_total, mem_free, mem_used, mem_usage,
                                    swap_total, swap_free, swap_used, swap_usage])

                # Skip past the lines already parsed; the loop then scans forward to the next timestamp
                i += 5
            else:
                # If line doesn't match timestamp, skip to the next line
                i += 1

print("Data has been written to", output_file)

3. cAdvisor test

Compare the values collected by docker stats with the values collected by cAdvisor (a sketch for pulling the cAdvisor side from Prometheus follows).
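
As with the node-exporter test, the cAdvisor side can be pulled from Prometheus. A minimal sketch, assuming cAdvisor is scraped by the same Prometheus instance at http://localhost:9090 and that the target container carries the label name="myapp" (both assumptions; adjust to your setup):

import datetime
import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local Prometheus
# cAdvisor's per-container CPU counter; its rate is CPU usage in cores, *100 for percent
QUERY = 'rate(container_cpu_usage_seconds_total{name="myapp"}[1m]) * 100'

end = datetime.datetime.now()
start = end - datetime.timedelta(minutes=10)

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query_range",
    params={"query": QUERY, "start": start.timestamp(), "end": end.timestamp(), "step": "5s"},
)
resp.raise_for_status()

# Print (time, value) pairs for comparison against the docker stats CPU % column
for series in resp.json()["data"]["result"]:
    for ts, value in series["values"]:
        print(datetime.datetime.fromtimestamp(float(ts)), round(float(value), 2))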

3) Details

(1) docker stats collection shell script


#!/bin/bash

OUTPUT_FILE="docker_stats_output.txt"
INTERVAL=5
COUNT=12

echo "Capturing Docker stats every $INTERVAL seconds for $COUNT intervals"

# Clear the output file if it exists
> "$OUTPUT_FILE"

# Run docker stats and append output to the file at specified intervals
for ((i=1; i<=COUNT; i++))
do
    echo "Timestamp: $(date +"%Y-%m-%d %H:%M:%S")" >> "$OUTPUT_FILE"
    docker stats --no-stream >> "$OUTPUT_FILE"
    echo "" >> "$OUTPUT_FILE"
    sleep "$INTERVAL"
done

echo "Capture complete."

(2) Python script that parses the output by timestamp
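
The script itself is not included here; the following is a minimal sketch of such a parser, assuming the docker_stats_output.txt layout produced by the capture script above and the default docker stats column order (CONTAINER ID, NAME, CPU %, MEM USAGE / LIMIT, MEM %, ...), which may differ between Docker versions.

import csv
import re

input_file = "docker_stats_output.txt"
output_file = "docker_stats_output.csv"

timestamp_pattern = re.compile(r"Timestamp: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

with open(output_file, mode="w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["Timestamp", "Container", "CPU (%)", "Memory (%)"])

    current_timestamp = None
    with open(input_file) as infile:
        for line in infile:
            line = line.strip()
            ts_match = timestamp_pattern.match(line)
            if ts_match:
                current_timestamp = ts_match.group(1)
                continue
            # Skip blank lines and the repeated docker stats header row
            if not line or line.startswith("CONTAINER"):
                continue
            # Columns are separated by runs of 2+ spaces; default column order assumed
            fields = re.split(r"\s{2,}", line)
            if current_timestamp is None or len(fields) < 5:
                continue
            name = fields[1]
            cpu_percent = fields[2].rstrip("%")
            mem_percent = fields[4].rstrip("%")
            csvwriter.writerow([current_timestamp, name, cpu_percent, mem_percent])

print("Data has been written to", output_file)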