Test whether node-exporter is correctly collecting the key metrics (CPU usage, file system usage, etc.) and record the results.
Prometheus: Accuracy Testing of Exporters
1. Test Design
- Collect metrics with Node exporter, cAdvisor, and the like.
- In parallel, collect the server's and the Docker containers' (the application's) metrics directly with shell scripts.
- Visualize the metrics collected by the shell scripts and compare them with the values shown in Grafana.
- Note: a CSV file can be visualized easily with Python's matplotlib (a minimal sketch follows this list).
- https://matplotlib.org/
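For reference, here is a minimal plotting sketch. It assumes the top_output.csv produced by the parsing script in section 2 below, with its "Timestamp" and "CPU Usage (%)" columns; adjust the file and column names to whatever you actually export.

import pandas as pd
import matplotlib.pyplot as plt

# Assumes the top_output.csv produced by the parser in section 2;
# swap in your own file/column names as needed.
df = pd.read_csv("top_output.csv", parse_dates=["Timestamp"])

plt.figure(figsize=(10, 4))
plt.plot(df["Timestamp"], df["CPU Usage (%)"], marker="o", label="user CPU (top)")
plt.xlabel("time")
plt.ylabel("CPU usage (%)")
plt.title("CPU usage sampled via top")
plt.legend()
plt.tight_layout()
plt.savefig("top_cpu_usage.png")  # compare this chart against the Grafana panel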
2. Node Exporter Test
1) Test Method
- Collect the output of the Linux top command.
- Parse the output into a CSV keyed by timestamp.
- Visualize the parsed data and compare the result against Grafana (a Prometheus query sketch follows this list).
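For the Grafana side of the comparison, the same numbers can be pulled directly from Prometheus's HTTP API instead of being read off a dashboard. A minimal sketch, assuming Prometheus is reachable at localhost:9090 (an assumption; adjust to your environment) and using the common node-exporter CPU expression found in most dashboards:

import requests

# Hypothetical Prometheus address; change to match your deployment.
PROM_URL = "http://localhost:9090/api/v1/query"

# 100 - idle% is the usual "CPU usage" panel query for node-exporter.
query = '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)'

resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    instance = result["metric"].get("instance", "unknown")
    ts, value = result["value"]  # [unix timestamp, value as string]
    print(f"{instance}: CPU usage {float(value):.2f}% at {ts}")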
2) Test Results
3) Test Details
(1) Shell script to capture top output to a text file
#!/bin/bash
OUTPUT_FILE="top_output.txt"
INTERVAL=5
COUNT=12

echo "Capturing top output every $INTERVAL seconds for $COUNT intervals"

# Clear the output file if it exists
> "$OUTPUT_FILE"

# Run top in batch mode and append output to the file at each interval
for ((i=1; i<=COUNT; i++))
do
    echo "Timestamp: $(date +"%Y-%m-%d %H:%M:%S")" >> "$OUTPUT_FILE"
    top -b -n 1 | head -n 20 >> "$OUTPUT_FILE"
    sleep "$INTERVAL"
done

echo "Capture complete."
(2) Python script to convert the top output txt to CSV
import re
import csv

# Input and output file names
input_file = "top_output.txt"
output_file = "top_output.csv"

# Regex patterns to extract the relevant lines.
# Note: newer versions of top print "MiB Mem"/"MiB Swap"; adjust if needed.
timestamp_pattern = re.compile(r"Timestamp: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
cpu_pattern = re.compile(
    r"%Cpu\(s\):\s+(\d+\.\d+) us,\s+(\d+\.\d+) sy,\s+(\d+\.\d+) ni,\s+(\d+\.\d+) id,"
    r"\s+(\d+\.\d+) wa,\s+(\d+\.\d+) hi,\s+(\d+\.\d+) si,\s+(\d+\.\d+) st"
)
mem_pattern = re.compile(
    r"KiB Mem\s*:\s+(\d+)\s+total,\s+(\d+)\s+free,\s+(\d+)\s+used,\s+(\d+)\s+buff/cache"
)
swap_pattern = re.compile(
    r"KiB Swap:\s+(\d+)\s+total,\s+(\d+)\s+free,\s+(\d+)\s+used\.\s+(\d+)\s+avail Mem"
)

# newline='' keeps the csv module from inserting blank rows on Windows
with open(output_file, mode='w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # Write the header
    csvwriter.writerow(["Timestamp", "CPU Usage (%)", "System CPU Usage (%)", "Nice CPU Usage (%)",
                        "Idle CPU (%)", "IO Wait CPU (%)", "Hardware Interrupts (%)",
                        "Software Interrupts (%)", "Steal Time (%)",
                        "Memory Total (KiB)", "Memory Free (KiB)", "Memory Used (KiB)",
                        "Swap Total (KiB)", "Swap Free (KiB)", "Swap Used (KiB)"])

    # Read the input file
    with open(input_file, mode='r') as infile:
        lines = infile.readlines()

    i = 0
    while i < len(lines):
        line = lines[i].strip()

        # Each capture block written by the shell script looks like:
        #   i     Timestamp: ...
        #   i+1   top - <uptime line>
        #   i+2   Tasks: ...
        #   i+3   %Cpu(s): ...
        #   i+4   KiB Mem : ...
        #   i+5   KiB Swap: ...
        timestamp_match = timestamp_pattern.match(line)
        if timestamp_match and i + 5 < len(lines):
            timestamp = timestamp_match.group(1)

            cpu_usage = system_cpu_usage = nice_cpu_usage = idle_cpu = None
            io_wait_cpu = hardware_interrupts = software_interrupts = steal_time = None
            mem_total = mem_free = mem_used = None
            swap_total = swap_free = swap_used = None

            # Extract CPU usage
            cpu_match = cpu_pattern.search(lines[i + 3])
            if cpu_match:
                cpu_usage = float(cpu_match.group(1))
                system_cpu_usage = float(cpu_match.group(2))
                nice_cpu_usage = float(cpu_match.group(3))
                idle_cpu = float(cpu_match.group(4))
                io_wait_cpu = float(cpu_match.group(5))
                hardware_interrupts = float(cpu_match.group(6))
                software_interrupts = float(cpu_match.group(7))
                steal_time = float(cpu_match.group(8))
            else:
                print("Failed to parse CPU usage from line:", lines[i + 3].strip())

            # Extract memory usage
            mem_match = mem_pattern.search(lines[i + 4])
            if mem_match:
                mem_total = int(mem_match.group(1))
                mem_free = int(mem_match.group(2))
                mem_used = int(mem_match.group(3))
            else:
                print("Failed to parse memory usage from line:", lines[i + 4].strip())

            # Extract swap usage
            swap_match = swap_pattern.search(lines[i + 5])
            if swap_match:
                swap_total = int(swap_match.group(1))
                swap_free = int(swap_match.group(2))
                swap_used = int(swap_match.group(3))
            else:
                print("Failed to parse swap usage from line:", lines[i + 5].strip())

            # Write to CSV
            csvwriter.writerow([timestamp, cpu_usage, system_cpu_usage, nice_cpu_usage,
                                idle_cpu, io_wait_cpu, hardware_interrupts,
                                software_interrupts, steal_time,
                                mem_total, mem_free, mem_used,
                                swap_total, swap_free, swap_used])

            # Skip past the lines just parsed; the rest of this top snapshot
            # is consumed line by line in the else branch below
            i += 5
        else:
            # Not a timestamp line: move to the next line
            i += 1

print("Data has been written to", output_file)
3. cAdvisor Test
Compare the values collected by docker stats with the values collected by cAdvisor.
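One caveat worth keeping in mind during this comparison: docker stats' MEM USAGE excludes page cache, so cAdvisor's container_memory_working_set_bytes is usually the closer counterpart, while container_memory_usage_bytes (which includes cache) tends to read higher. A minimal sketch for pulling the cAdvisor-side values, again assuming Prometheus at localhost:9090:

import requests

PROM_URL = "http://localhost:9090/api/v1/query"  # assumed address

# Working-set memory per container, which lines up better with
# docker stats' MEM USAGE than container_memory_usage_bytes does.
query = 'container_memory_working_set_bytes{name!=""}'

resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
resp.raise_for_status()
for r in resp.json()["data"]["result"]:
    name = r["metric"].get("name", "unknown")
    print(f"{name}: {float(r['value'][1]) / 2**20:.1f} MiB")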
3) Details
(1) Shell script to collect docker stats
#!/bin/bash
OUTPUT_FILE="docker_stats_output.txt"
INTERVAL=5
COUNT=12

echo "Capturing Docker stats every $INTERVAL seconds for $COUNT intervals"

# Clear the output file if it exists
> "$OUTPUT_FILE"

# Run docker stats and append output to the file at each interval
for ((i=1; i<=COUNT; i++))
do
    echo "Timestamp: $(date +"%Y-%m-%d %H:%M:%S")" >> "$OUTPUT_FILE"
    docker stats --no-stream >> "$OUTPUT_FILE"
    echo "" >> "$OUTPUT_FILE"
    sleep "$INTERVAL"
done

echo "Capture complete."
(2) Python script to parse the output by timestamp
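The script body is missing from the note. Below is a minimal sketch of what it could look like, assuming docker stats' default column layout (CONTAINER ID, NAME, CPU %, MEM USAGE / LIMIT, MEM %, NET I/O, BLOCK I/O, PIDS) and the file names used by the shell script above; the output columns here are illustrative, not the original author's.

import csv
import re

input_file = "docker_stats_output.txt"
output_file = "docker_stats_output.csv"

timestamp_pattern = re.compile(r"Timestamp: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

with open(output_file, mode='w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["Timestamp", "Container", "CPU (%)", "Mem Usage", "Mem Limit", "Mem (%)"])

    timestamp = None
    with open(input_file) as infile:
        for line in infile:
            line = line.strip()
            ts_match = timestamp_pattern.match(line)
            if ts_match:
                timestamp = ts_match.group(1)
                continue
            # Skip blank lines and the header row docker stats repeats per snapshot
            if not line or line.startswith("CONTAINER ID"):
                continue
            # Columns are separated by runs of 2+ spaces:
            # CONTAINER ID  NAME  CPU %  MEM USAGE / LIMIT  MEM %  NET I/O  BLOCK I/O  PIDS
            cols = re.split(r"\s{2,}", line)
            if len(cols) < 5 or timestamp is None:
                continue
            name = cols[1]
            cpu_pct = cols[2].rstrip("%")
            mem_usage, _, mem_limit = cols[3].partition(" / ")
            mem_pct = cols[4].rstrip("%")
            csvwriter.writerow([timestamp, name, cpu_pct, mem_usage, mem_limit, mem_pct])

print("Data has been written to", output_file)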