The Reveal Playbook: Which Chart to Open First (and Which KPI Actually Matters)

Field Notes #2 · TL;DR — Load test analysis isn't "open every chart." It's a pattern: start with the executive KPIs, follow the symptom, pick the visualization built for that question. PerfSage Reveal ships 29 charts — this playbook tells you which 3–5 to open for tail latency, errors, saturation, flakiness, and stakeholder sign-off. All screenshots from a real 3-minute JMeter run against public APIs.

Analysis is a sequence, not a slideshow

Most engineers do this:

1. Upload .jtl
2. Scroll every tab
3. Screenshot whatever looks red
4. Still not sure what to fix

What senior perf engineers do:

Phase	Question	Where in Reveal
1 · Orient	Pass or fail? What’s the headline?	KPI summary + SLO gauges
2 · Diagnose	Why — tail, errors, saturation, or noise?	Symptom-specific chart
3 · Decide	Ship, investigate, or re-test?	Checklist + export

Reveal is built around that flow. Below is the symptom → chart → KPI map I use on every run.

🧭 Quick reference: pick your chart in 10 seconds

Save this table. Match your stand-up question to a row — open that chart first.

🔍 You need to know…	📊 Open this first	🎯 KPIs to watch
Did we pass release gates?	Executive summary	SLO verdict · p99 · error rate
Are averages hiding pain?	Scatter plot	p99 ÷ median · outlier clusters
Did we fail SLO on latency or errors?	SLO gauges + Apdex by label	Apdex · error % · p99 ms
Which endpoint broke?	Error sunburst	Error count by label + status code
What % of users hit slow responses?	Latency CDF	p90 · p95 · p99 lines
Which transaction has spread/outliers?	Boxplot per label	Whiskers · outlier dots
Is performance flaky vs consistently slow?	CV + IQR outlier scatter	CV by label · IQR fences
Are we hitting throughput ceiling?	RT vs throughput	p90 at knee · req/s plateau
Does latency jitter under fixed load?	RT vs concurrency	p90 band width at steady threads
Server slow or network slow?	Latency components	Connect · TTFB · transfer
Fast failures vs slow successes?	RT by status	Success cloud vs error baseline
At what load do errors start?	Threads vs errors heatmap	Error rate bucket at thread count
Which metrics move together?	Correlation matrix	Pearson r (spot red herrings)
Explain to PM / dev in prose	AI analysis (optional)	Narrative + flagged anomalies

Phase 1 · Orient — start at the summary (always)

Before any deep chart, I need one screen that answers: samples, verdict, headline percentiles, auto-flagged issues.

Samples: 3,587 · SLO verdict: FAIL · Error rate: 31.7% · P99: 1,124 ms

Pattern: If recommendations mention tail ratio, errors, or variability — jump to that section below. If green across the board, still check tail ratio (Field Notes #1).

Pattern A · “Users say it’s slow” — tail latency

Symptom

Stand-up says latency is fine. Product says checkout feels sluggish. Average looks polite.

Chart → Response time scatter

KPIs → p99, p99 ÷ median, outlier clusters

Response time scatter by transaction showing a second cloud of outliers up to 30 seconds above the main band — Two clouds = two experiences. JSONPlaceholder spikes to ~30,000 ms while the main band sits under 1,000 ms.

What I look for: A second band above the main cluster. That is tail latency — invisible in averages, obvious in scatter.

Next moves:

Note which label owns the upper cloud
Check when it starts (ramp vs steady state)
Open boxplot per label to quantify spread
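
The p99 ÷ median KPI for this pattern is easy to sanity-check by hand. A minimal sketch, assuming nearest-rank percentiles (Reveal's exact percentile method is not stated here) and a hypothetical list of elapsed times in milliseconds:

```python
import math
import statistics

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def tail_ratio(elapsed_ms):
    """p99 divided by the median: how much worse the tail is than the typical request."""
    return percentile(elapsed_ms, 99) / statistics.median(elapsed_ms)

# Hypothetical run: 97 fast responses plus three slow ones.
samples = [100] * 97 + [3000] * 3
mean_ms = statistics.mean(samples)   # the "polite" average
ratio = tail_ratio(samples)          # the number the average hides

print(f"mean={mean_ms} ms, p99/median={ratio}")
```

In this made-up run the mean stays under 200 ms while the tail is 30 times the median, which is the gap the scatter makes visible.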

Boxplots per transaction label showing JSONPlaceholder whiskers to 30k ms and RT heatmap density over time — Boxplot confirms *who* · heatmap shows *when* density shifts into slower buckets.

Pattern B · “Did we pass SLO?” — gates and the Apdex paradox

Symptom

Need a release yes/no. Stakeholders want red/green, not a lecture on percentiles.

Chart → SLO gauges + Apdex by label

KPIs → SLO verdict, error rate %, Apdex, p99 ms

SLO gauges showing Apdex 0.966, error rate 31.5% in red, p99 1120ms, and Apdex by label bar chart — Apdex 0.966 looks excellent — but error rate 31.5% fails the run. Never read latency satisfaction without availability.

The Apdex paradox: Successful requests were fast (Apdex ~0.94–1.0 per label). Nearly one-third of requests failed. Green latency + red errors = still a no-ship.

When to use Apdex: translating tail latency for PMs (“12% frustrated” beats “p99 = 480 ms”).

When to ignore Apdex: error rate is elevated — fix availability first.
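
The paradox is easier to see in code. A minimal sketch using the standard Apdex formula, (satisfied + tolerating / 2) / total, with "tolerating" meaning up to 4× the threshold. Two assumptions of mine: Apdex is scored over successful samples only (which is what the figures above suggest, though I have not confirmed how Reveal counts errors), and the gate thresholds are hypothetical.

```python
def apdex(elapsed_ms, threshold_ms):
    """Standard Apdex score: satisfied <= T, tolerating <= 4T, everything else frustrated."""
    satisfied = sum(1 for ms in elapsed_ms if ms <= threshold_ms)
    tolerating = sum(1 for ms in elapsed_ms if threshold_ms < ms <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(elapsed_ms)

def release_gate(samples, threshold_ms=500, min_apdex=0.85, max_error_rate=0.01):
    """samples: list of (elapsed_ms, success). Thresholds here are hypothetical defaults."""
    ok = [ms for ms, success in samples if success]
    error_rate = 1 - len(ok) / len(samples)
    score = apdex(ok, threshold_ms) if ok else 0.0   # assumption: errors excluded from Apdex
    passed = score >= min_apdex and error_rate <= max_error_rate
    return score, error_rate, passed

# Hypothetical run: 70 fast successes, 30 fast failures.
run = [(200, True)] * 70 + [(50, False)] * 30
score, error_rate, passed = release_gate(run)
print(f"Apdex={score:.3f} error_rate={error_rate:.1%} pass={passed}")
```

A perfect Apdex and a failed gate can coexist, so the gate checks both conditions and never Apdex alone.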

Pattern C · “Something broke” — errors

Symptom

Error rate > 0. Need endpoint + status code in under a minute.

Chart → Error sunburst (+ errors over time)

KPIs → error rate %, count by HTTP code, label ownership

Error sunburst showing HTTP 429 and 403 errors both attributed to GET GitHub Zen endpoint — Both 429 and 403 roll up to one label: GET GitHub Zen. Rate limiting under load — not a mystery.

Pattern: Timeline shows when errors spiked; sunburst shows where. In this demo, the fix is test design (public API limits), not server tuning.
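
If you want the same label + status code rollup outside the UI, it is a few lines over the raw file. A minimal sketch, assuming a CSV-format JTL with JMeter's default column names (`label`, `responseCode`, `success`); the rows below are invented for illustration:

```python
import csv
import io
from collections import Counter

# Hypothetical JTL excerpt (CSV format, a subset of JMeter's default columns).
JTL = """timeStamp,elapsed,label,responseCode,success
1,120,GET GitHub Zen,200,true
2,45,GET GitHub Zen,429,false
3,40,GET GitHub Zen,403,false
4,300,GET JSONPlaceholder,200,true
5,38,GET GitHub Zen,429,false
"""

def error_breakdown(jtl_text):
    """Count failed samples per (label, HTTP status code)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(jtl_text)):
        if row["success"].lower() != "true":
            counts[(row["label"], row["responseCode"])] += 1
    return counts

breakdown = error_breakdown(JTL)
for (label, code), n in breakdown.most_common():
    print(f"{label}  HTTP {code}: {n}")
```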

Pattern D · “Define or validate SLO thresholds” — distribution shape

Symptom

Product asks “is 500 ms p99 realistic?” You need the full distribution, not a gut feel.

Chart → Histogram + CDF

KPIs → p90, p95, p99 (CDF lines)

Latency histogram spike near zero and CDF with p90 330ms, p95 672ms, p99 1124ms markers — Histogram: bulk fast, ghost bars at 25 s. CDF: contract lines at p90/p95/p99 for SLO negotiation.

Rule: Set SLOs from the CDF and defend them with the histogram, which proves the outliers exist.
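
"Is 500 ms p99 realistic?" is a CDF lookup: what share of samples finished within 500 ms, and is it at least 99%? A minimal sketch with hypothetical data and nearest-rank percentiles (an assumption, other tools interpolate):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def fraction_within(values, threshold_ms):
    """Empirical CDF evaluated at threshold_ms."""
    return sum(1 for v in values if v <= threshold_ms) / len(values)

# Hypothetical run: bulk fast, a slower shoulder, a thin tail.
samples = [120] * 90 + [600] * 6 + [1100] * 4

share_within_500 = fraction_within(samples, 500)
p99_is_realistic = share_within_500 >= 0.99
cdf_lines = {p: percentile(samples, p) for p in (90, 95, 99)}

print(share_within_500, p99_is_realistic, cdf_lines)
```

Here only 90% of samples land within 500 ms, so a 500 ms p99 target would fail on day one. The percentile lines show where a defensible threshold actually sits.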

Pattern E · “It feels random” — variability and outliers

Symptom

“Sometimes fast, sometimes awful.” Same endpoint, inconsistent experience.

Chart → IQR outlier scatter + CV chart

KPIs → CV (standard deviation ÷ mean), outliers beyond the IQR fences (below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR)

IQR outlier scatter with red points to 30s and CV chart showing JSONPlaceholder at 3.83 exceeding threshold 2.0 — CV 3.83 on JSONPlaceholder = performance roulette. GitHub Zen at 0.60 = stable (when it succeeds).
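
Both KPIs are quick to reproduce. A minimal sketch, assuming population standard deviation for CV and Python's default quartile method (Reveal's exact choices are not documented here); the labels follow this article's CV bands and the sample data is invented:

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / mean (population std dev is an assumption)."""
    return statistics.pstdev(values) / statistics.mean(values)

def iqr_outliers(values):
    """Values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def read_cv(cv):
    """Map a CV value to this article's bands."""
    if cv < 1.0:
        return "Predictable"
    if cv <= 2.0:
        return "Watch"
    return "Flaky"

# Hypothetical endpoints: one steady, one with a single 30 s spike.
steady = [100, 110, 90, 105, 95, 100, 102, 98]
flaky = [100] * 19 + [30000]

for name, data in (("steady", steady), ("flaky", flaky)):
    cv = coefficient_of_variation(data)
    print(name, round(cv, 2), read_cv(cv), iqr_outliers(data))
```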

CV	Read
< 1.0	Predictable
1.0 – 2.0	Watch
> 2.0	Flaky — investigate before peak

Pattern F · “How much headroom?” — saturation

Symptom

Planning capacity. Need the knee where throughput stops climbing and latency spikes.

Chart → RT vs throughput + RT vs concurrency

KPIs → p90 at plateau, req/s at knee, jitter width

RT vs throughput showing p90 spikes to 850ms while req/s flatlines near 22, and RT vs concurrency jitter at 15 threads — Throughput stuck ~22 req/s while p90 hits 850 ms — you're at the wall. At 15 threads, p90 swings 250–850 ms (jitter under fixed load).

Use before: Black Friday, launch day, autoscaling tuning.
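
One rough way to put a number on the knee: walk the load steps and stop at the first one where adding threads buys almost no extra throughput. This is a heuristic of mine, not how Reveal draws the chart; the 5% gain cutoff and the step data are hypothetical.

```python
def find_knee(steps, min_gain=0.05):
    """steps: (threads, req_per_s) pairs in increasing thread order.

    Returns the last step before throughput growth drops below min_gain,
    or None if throughput keeps climbing through the whole test.
    """
    for (t0, r0), (_, r1) in zip(steps, steps[1:]):
        if r0 > 0 and (r1 - r0) / r0 < min_gain:
            return t0, r0
    return None

# Hypothetical ramp: throughput climbs, then flattens near 22 req/s.
steps = [(3, 6.0), (6, 12.0), (9, 17.5), (12, 21.5), (15, 22.0)]
knee = find_knee(steps)
print(knee)
```

Pair the result with p90 at the same step: flat throughput plus rising p90 is the wall, flat throughput with flat p90 may just mean the test stopped ramping.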

Pattern G · “Network or server?” — latency decomposition

Symptom

Latency regressed but errors are zero. Is it backend processing or connection overhead?

Chart → Latency components + RT by status

KPIs → TTFB, connect time, transfer time

RT by status scatter and stacked latency components showing TTFB dominates with connect spike at test start — Successes can still hit 30 s (upper scatter). TTFB dominates — transfer is thin. Cold-start connect spike at t=0.

Pattern: Fat TTFB → server/processing. Fat connect → pool/cold start. Fat transfer → payload/size (rare on APIs).
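
The same split can be derived per sample from a CSV-format JTL. A minimal sketch, assuming JMeter's default `elapsed`, `Latency`, and `Connect` columns, and relying on JMeter's documented behaviour that `Latency` (time to first byte) is not reduced by connect time; the component names and sample values are mine:

```python
def decompose(elapsed_ms, latency_ms, connect_ms):
    """Split one sample into connect, server wait, and transfer time.

    Assumes Latency includes connect time, so server wait is Latency - Connect
    and transfer is whatever remains after the first byte arrives.
    """
    return {
        "connect": connect_ms,
        "server_wait": latency_ms - connect_ms,
        "transfer": elapsed_ms - latency_ms,
    }

def dominant_component(parts):
    """Name of the component that takes the most time."""
    return max(parts, key=parts.get)

# Hypothetical sample: elapsed=480, Latency=450, Connect=60.
parts = decompose(elapsed_ms=480, latency_ms=450, connect_ms=60)
fattest = dominant_component(parts)
print(parts, fattest)
```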

Pattern H · “When do we break?” — load threshold

Symptom

“How many concurrent users until errors?” Need the breaking point, not average error rate.

Chart → Threads vs errors heatmap

KPIs → error rate bucket at thread range

Correlation matrix of JMeter metrics and threads vs errors heatmap showing error rate over 25% at 13-15 threads — 0% errors until 13–15 threads — then >25% error rate. The knee is precise.

Use the correlation matrix beside it to kill red herrings (e.g. Pearson r between bytes and latency ≈ 0 → payload size isn’t your problem).
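
The threads vs errors view boils down to bucketing samples by active thread count and computing an error rate per bucket. A minimal sketch, assuming the thread count comes from JMeter's `allThreads` column; the bucket width of 3 and the sample data are hypothetical:

```python
from collections import defaultdict

def error_rate_by_threads(samples, bucket_size=3):
    """samples: (active_threads, success) pairs. Returns error rate per thread range."""
    totals = defaultdict(lambda: [0, 0])  # bucket start -> [errors, total]
    for threads, success in samples:
        start = (threads - 1) // bucket_size * bucket_size + 1
        totals[start][1] += 1
        if not success:
            totals[start][0] += 1
    return {
        f"{start}-{start + bucket_size - 1}": errors / total
        for start, (errors, total) in sorted(totals.items())
    }

# Hypothetical run: clean at low load, 30% errors once 14 threads are active.
samples = (
    [(2, True)] * 10
    + [(8, True)] * 10
    + [(14, True)] * 7
    + [(14, False)] * 3
)
rates = error_rate_by_threads(samples)
first_failing = next((bucket for bucket, rate in rates.items() if rate > 0), None)
print(rates, first_failing)
```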

Pattern I · “Explain it to someone who wasn’t in the war room”

Symptom

Dev needs prose. PM needs a PDF. You need sleep.

Chart → AI-powered analysis (optional, your API key)

KPIs → Same math as above — AI narrates, never decides

AI-powered analysis section listing GitHub Zen 95% error rate, p99/p50 ratio 19.9x, and JSONPlaceholder CV 3.83 with recommendations — Percentile math runs first. AI summarizes for handoff — copy markdown or export PDF with all 29 charts.

Guardrail I insist on: Pass/fail comes from SLO gauges and KPI cards — not from the LLM. AI explains; thresholds decide.

The 5-minute analysis workflow (copy this)

1. Upload .jtl → read executive summary (verdict + recommendations)
2. If errors > target → sunburst → label + status code
3. If SLO fail on latency → scatter → boxplot → CDF
4. If "flaky" gut feel → CV chart + IQR scatter
5. If capacity question → RT vs throughput/concurrency
6. Export PDF → attach to PR / ticket
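
The branching in steps 2 to 5 can also be written down as a small routing function, handy if you script triage notes. A minimal sketch; the function name, parameters, and the 1% error target are hypothetical, and the chart names are the ones used in this playbook:

```python
def charts_to_open(error_rate, error_target, latency_slo_failed,
                   looks_flaky, capacity_question):
    """Return the charts to open, in workflow order, for one run."""
    charts = ["Executive summary"]                      # step 1: always
    if error_rate > error_target:                       # step 2
        charts.append("Error sunburst")
    if latency_slo_failed:                              # step 3
        charts += ["Scatter", "Boxplot per label", "Latency CDF"]
    if looks_flaky:                                     # step 4
        charts += ["CV chart", "IQR outlier scatter"]
    if capacity_question:                               # step 5
        charts += ["RT vs throughput", "RT vs concurrency"]
    return charts

# The demo run: 31.7% errors against a hypothetical 1% target, latency SLO met.
plan = charts_to_open(0.317, 0.01, latency_slo_failed=False,
                      looks_flaky=False, capacity_question=False)
print(plan)
```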

PerfSage Reveal upload screen — drop JTL, CSV, or XML up to 2GB and click Analyse — One screen in. Full playbook out. No JMeter install required for analysis.

Which KPI for which situation (cheat sheet)

Situation	Primary KPI	Secondary KPI	Chart
Release gate	SLO pass/fail	p99, error %	Summary + gauges
User feels slow, avg OK	p99 ÷ median	p99 ms	Scatter
Stakeholder comms	Apdex	% frustrated (derived)	SLO gauges
Incident: what broke	Error rate by label	HTTP code	Sunburst
SLO negotiation	p99 on CDF	p95 buffer	Histogram/CDF
Flaky endpoint	CV	IQR outlier count	Variability
Capacity plan	req/s at knee	p90 at knee	RT vs throughput
Backend vs network	TTFB share	Connect spike	Latency components
Breaking point	Error % at thread N	—	Threads vs errors

Try it on your last JTL

docker pull aashu3201/reveal:latest

docker run -d \
  --name perfsage-reveal \
  -p 8000:8000 \
  -v perfsage-reveal-data:/app/data \
  -e PERFSAGE_SECRET="change-me-to-a-32-char-random-string" \
  aashu3201/reveal:latest

Open http://localhost:8000, upload any .jtl, and run this playbook against your own run.

GitHub: github.com/perfsage/reveal
Launch story: JMeter Gave Me Reports. I Needed Answers
Tail latency deep dive: The P99 Trap
Book a call: topmate.io/abajpai

Field Notes #2 · Published June 2026 · By Aashish Bajpai