SignalPilot correlates deploy diffs, K8s events, metrics-server, logs, cAdvisor, Prometheus, and git, then ranks findings with copy-paste kubectl fixes. Analysis, not another dashboard.
observe → correlate → explain → recommend → verify → learn
Observe: parallel collectors across the K8s API, logs, metrics, cAdvisor, Prometheus, and optional git history.
Correlate and explain: deterministic rules fuse cross-source evidence, and each finding cites multiple signal types.
Recommend: ranked, copy-paste kubectl fixes, not generic advice.
Verify: capture a baseline before a fix, then compare after the next deploy and classify the result as Fixed, Regressed, or Unchanged.
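The Verify step's baseline comparison can be sketched as a set difference over findings. This is a minimal illustration, not SignalPilot's implementation: the `Severity` scale, the `Finding` fields, and keying findings by `(rule, resource)` are all assumptions, and treating a severity increase as "Regressed" is one plausible reading of the three verdicts.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale; SignalPilot's real severity levels may differ.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    rule: str          # rule id, e.g. "oom_killed"
    resource: str      # e.g. "deployment/my-app"
    severity: Severity


def compare(baseline: list[Finding], current: list[Finding]) -> dict[str, str]:
    """Classify each (rule, resource) pair as Fixed, Regressed, or Unchanged."""
    before = {(f.rule, f.resource): f.severity for f in baseline}
    after = {(f.rule, f.resource): f.severity for f in current}
    verdicts: dict[str, str] = {}
    for rule, resource in before.keys() | after.keys():
        key = (rule, resource)
        label = f"{rule}@{resource}"
        if key not in after:
            verdicts[label] = "Fixed"        # present in baseline, gone after deploy
        elif key not in before or after[key] > before[key]:
            verdicts[label] = "Regressed"    # new finding, or severity increased
        else:
            verdicts[label] = "Unchanged"    # still present, severity not higher
    return verdicts
```

Keying on `(rule, resource)` rather than on finding text keeps the comparison stable when evidence details (exact percentages, log counts) shift between runs.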
| Tier | Source | Always-on |
|---|---|---|
| 0 | Deploy diff (image, env, resources, probes) | Yes |
| 0 | Git repo correlation (commit SHA → suspect files) | Optional |
| 1 | K8s API: restarts, OOMKilled, CrashLoopBackOff, probes | Yes |
| 1 | K8s Events: FailedScheduling, BackOff, Unhealthy | Yes |
| 1 | metrics-server: CPU/memory saturation vs limits | Yes |
| 1 | Container logs: drain3 clustering, new errors | Yes |
| 2 | cAdvisor: CPU throttling, memory working-set | Yes |
| 2 | Network: endpoint readiness, DNS failures | Yes |
| 4 | Prometheus: p95/p99, error rate, CFS throttle | Optional |
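Running always-on and optional collectors concurrently can be sketched with a thread pool. The collector functions below are hypothetical stubs returning canned data; real collectors would call the Kubernetes API, metrics-server, cAdvisor, Prometheus, or a git repo. The `(tier, name, collector, always_on)` tuples loosely mirror the tier table, and the per-collector error handling is an assumed design choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Signals = dict[str, Any]


# Hypothetical stub collectors; real ones would query the K8s API,
# metrics-server, cAdvisor, container logs, Prometheus, or a git repo.
def collect_deploy_diff() -> Signals:
    return {"image": ("app:1.4", "app:1.5")}


def collect_k8s_events() -> Signals:
    return {"events": ["BackOff", "Unhealthy"]}


def collect_prometheus() -> Signals:
    return {"p99_ms": 840}


# (tier, name, collector, always_on), loosely mirroring the tier table.
COLLECTORS: list[tuple[int, str, Callable[[], Signals], bool]] = [
    (0, "deploy_diff", collect_deploy_diff, True),
    (1, "k8s_events", collect_k8s_events, True),
    (4, "prometheus", collect_prometheus, False),
]


def run_collectors(optional_enabled: set[str], timeout_s: float = 30.0) -> dict[str, Signals]:
    """Run always-on collectors plus any enabled optional ones concurrently.

    A collector that raises is recorded as {"error": ...} instead of aborting
    the run, so one unreachable source does not hide the other evidence.
    """
    selected = [(name, fn) for _tier, name, fn, always_on in COLLECTORS
                if always_on or name in optional_enabled]
    results: dict[str, Signals] = {}
    with ThreadPoolExecutor(max_workers=max(len(selected), 1)) as pool:
        futures = {name: pool.submit(fn) for name, fn in selected}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout_s)
            except Exception as exc:  # collector failed or timed out
                results[name] = {"error": repr(exc)}
    return results
```

Threads fit here because collectors are I/O-bound (HTTP calls to the API server and Prometheus), so the GIL is not the bottleneck.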
| Rule | Trigger signals | Typical fix |
|---|---|---|
| `oom_killed` | OOMKilled + memory near limit | Raise memory limit |
| `cpu_throttled` | CFS throttle > 30% + latency regression | Raise CPU limit/request |
| `crash_loop` | CrashLoopBackOff + log patterns + config diff | Fix env vars, roll back |
| `image_pull_error` | ImagePullBackOff / ErrImagePull | Fix image tag, roll back |
| `probe_failure` | Readiness/liveness probe failing | Fix probe path/port/timing |
| `code_regression` | New log fingerprints after deploy + git suspect | Roll back, investigate commit |
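The "new log fingerprints after deploy" signal behind `code_regression` can be approximated by masking variable tokens and diffing templates before and after the deploy. The tier table names drain3 clustering as the real mechanism; this sketch swaps in simple regex masking instead, and `fingerprint` / `new_fingerprints` are hypothetical helpers. On its own it covers only the log half of the rule, not the git-suspect half.

```python
import re
from collections import Counter

# Variable tokens to mask, applied in order: UUIDs, then hex literals, then numbers.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.IGNORECASE), "<UUID>"),
    (re.compile(r"\b0x[0-9a-f]+\b", re.IGNORECASE), "<HEX>"),
    (re.compile(r"\d+(?:\.\d+)?"), "<NUM>"),
]


def fingerprint(line: str) -> str:
    """Reduce a log line to a template by masking variable tokens."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def new_fingerprints(before: list[str], after: list[str]) -> Counter:
    """Count post-deploy log templates that never appeared before the deploy."""
    seen = {fingerprint(line) for line in before}
    return Counter(fp for fp in map(fingerprint, after) if fp not in seen)
```

Masking UUIDs before plain numbers matters: otherwise the digit runs inside a UUID would be replaced first and the UUID pattern would no longer match.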
Prerequisites: Python 3.12+, a configured kubectl context, and the read-only RBAC manifest applied.

```bash
# Install the CLI
pip install perfsage-signalpilot
# Grant SignalPilot read-only access to the cluster
kubectl apply -f deploy/signalpilot-rbac.yaml
# Analyze a deployment and write an HTML report
signalpilot analyze my-namespace --deployment my-app --output report.html
# CI gate (exit 1 on HIGH+ findings)
signalpilot gate my-namespace --deployment my-app --junit-xml results.xml
```

SignalPilot is an open-source Kubernetes RCA copilot that answers why errors and performance degradation happened after your last deployment, by correlating deploy diffs, K8s events, metrics, logs, Prometheus, and git into ranked findings with copy-paste kubectl fixes.
kubectl shows one object at a time, and dashboards show metrics without deploy context. SignalPilot fuses cross-source evidence through deterministic rules. For example, OOMKilled + memory at 94% of limit + a git commit touching heap code points to an undersized memory limit, and the finding comes with a concrete fix.
SignalPilot needs only read-only RBAC, applied via deploy/signalpilot-rbac.yaml. It uses the Kubernetes API, metrics-server, and Prometheus (optional, auto-detected), and installs no agents in your app pods.
SignalPilot also works as a CI gate: `signalpilot gate` exits non-zero on HIGH+ findings and can export JUnit XML for Jenkins or GitHub Actions, complementing load-test SLO gates from PerfSage SLO Reporter.
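The gate's contract (non-zero exit on HIGH+ findings, JUnit XML for CI dashboards) can be sketched with the standard library. This is not SignalPilot's code: the severity names, the finding dictionaries, and the helper names are assumptions, and the XML is the minimal `<testsuite>`/`<testcase>`/`<failure>` shape that Jenkins and GitHub Actions test reporters generally accept.

```python
import xml.etree.ElementTree as ET

# Hypothetical severity scale; "HIGH+" is read as HIGH or anything above it.
SEVERITY = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def is_blocking(finding: dict, threshold: str = "HIGH") -> bool:
    """True if the finding's severity is at or above the gate threshold."""
    return SEVERITY[finding["severity"]] >= SEVERITY[threshold]


def gate_exit_code(findings: list[dict], threshold: str = "HIGH") -> int:
    """Return 1 if any finding is at or above the threshold, else 0."""
    return 1 if any(is_blocking(f, threshold) for f in findings) else 0


def to_junit_xml(findings: list[dict], threshold: str = "HIGH") -> str:
    """Render findings as a minimal JUnit <testsuite>; blocking ones become failures."""
    failures = sum(is_blocking(f, threshold) for f in findings)
    suite = ET.Element("testsuite", name="signalpilot",
                       tests=str(len(findings)), failures=str(failures))
    for f in findings:
        case = ET.SubElement(suite, "testcase", classname=f["resource"], name=f["rule"])
        if is_blocking(f, threshold):
            ET.SubElement(case, "failure", message=f"{f['severity']}: {f['rule']}")
    return ET.tostring(suite, encoding="unicode")
```

A wrapper script would write the XML to a file and finish with `sys.exit(gate_exit_code(findings))`, so a MEDIUM-only run passes while any HIGH or CRITICAL finding fails the pipeline step.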
Prometheus enriches findings but is optional (auto-detected). LLM narrative polish is also optional: the core RCA rules and kubectl recommendations run without any API key.
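When Prometheus is present, the p95/p99 and CFS-throttle signals from the tier table come from queries along these lines. This is an illustrative sketch, not the queries SignalPilot ships: the cAdvisor counters `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total` are standard, but the `http_request_duration_seconds` histogram and the `job` label are assumptions about how the app is instrumented. `/api/v1/query` is the Prometheus HTTP API's instant-query endpoint.

```python
from urllib.parse import urlencode


def latency_quantile_query(quantile: float, job: str, window: str = "5m") -> str:
    """PromQL for a latency quantile (e.g. 0.95 for p95) from a histogram.

    Assumes the app exports a histogram named http_request_duration_seconds.
    """
    return (f"histogram_quantile({quantile}, sum by (le) ("
            f'rate(http_request_duration_seconds_bucket{{job="{job}"}}[{window}])))')


def cfs_throttle_query(namespace: str, pod_regex: str, window: str = "5m") -> str:
    """PromQL for the fraction of CFS periods in which containers were throttled.

    Uses the cAdvisor counters container_cpu_cfs_throttled_periods_total and
    container_cpu_cfs_periods_total.
    """
    sel = f'{{namespace="{namespace}", pod=~"{pod_regex}", container!=""}}'
    return (f"sum(rate(container_cpu_cfs_throttled_periods_total{sel}[{window}])) / "
            f"sum(rate(container_cpu_cfs_periods_total{sel}[{window}]))")


def query_url(base_url: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def exceeds_throttle_threshold(ratio: float, threshold: float = 0.30) -> bool:
    """The throttle half of the cpu_throttled trigger: CFS throttling above 30%."""
    return ratio > threshold
```

The throttle ratio alone does not fire `cpu_throttled`; per the rule table it must coincide with a latency regression, which is what the quantile query measures.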
Typical post-deploy regressions take under 5 minutes to analyze: the deploy diff, events, metrics, and logs are correlated into a single ranked report instead of hours of tab-switching across kubectl and dashboards.
SignalPilot is MIT-licensed open source. Use PerfSage Reveal for test-time analysis and SignalPilot for production-time RCA.