Alert Explanation

AlertExplanationService turns an alert + its surrounding events + the affected entity’s UEBA baseline into a plain-English explanation an analyst can act on.

The service runs only on demand, when one of the following triggers it:

  • A dashboard analyst clicks Explain on an alert detail page
  • POST /api/v1/alerts/{alert_id}/explain is called from a script

The service is not invoked automatically when an alert fires, because that would burn tokens on noise.

The result is cached (LRU, size set by llm.explanation_cache_size, default 256). Subsequent calls for the same alert return the cached explanation without another LLM call.
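To make the caching behaviour concrete, here is a minimal sketch of an LRU cache keyed by alert ID. The class name ExplanationCache and its methods are hypothetical and only illustrate the eviction policy; the service's real cache may be implemented differently.

```python
from collections import OrderedDict
from typing import Optional


class ExplanationCache:
    """Tiny LRU cache keyed by alert_id (illustrative, not the service's class)."""

    def __init__(self, max_size: int = 256) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._entries: "OrderedDict[str, dict]" = OrderedDict()

    def get(self, alert_id: str) -> Optional[dict]:
        explanation = self._entries.get(alert_id)
        if explanation is not None:
            # A hit marks the entry as most recently used.
            self._entries.move_to_end(alert_id)
        return explanation

    def put(self, alert_id: str, explanation: dict) -> None:
        self._entries[alert_id] = explanation
        self._entries.move_to_end(alert_id)
        if len(self._entries) > self._max_size:
            # Evict the least recently used entry (first in the ordering).
            self._entries.popitem(last=False)
```

With max_size=256, generating an explanation for a 257th distinct alert evicts whichever cached explanation was read or written longest ago.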

The service builds a redacted context from three sources:

  1. The alert — title, rule, severity, MITRE tags, entities, dedup_count, risk_score
  2. A ±N-second event window around the alert timestamp (events involving the same entities only)
  3. The entity’s UEBA baseline — warm-up status, top templates, source-IP spread — if available

PII-prone fields (raw message, OS users beyond the alert’s named entities) are summarized rather than passed verbatim.
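The event-window step (source 2 above) can be sketched as a simple filter. This is an illustration under assumptions: the Event dataclass, its field names, and the rule that an event qualifies when it shares at least one entity with the alert are all hypothetical, not the service's actual schema or matching logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

NS_PER_SECOND = 1_000_000_000


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; field names are assumptions.
    timestamp_ns: int
    entities: FrozenSet[str]
    template: str


def select_event_window(
    events: Iterable[Event],
    alert_ts_ns: int,
    alert_entities: FrozenSet[str],
    window_s: int,
) -> List[Event]:
    """Keep events within ±window_s of the alert that share an entity with it."""
    half_width_ns = window_s * NS_PER_SECOND
    return [
        event
        for event in events
        if abs(event.timestamp_ns - alert_ts_ns) <= half_width_ns
        and event.entities & alert_entities  # non-empty intersection
    ]
```

Only the filtered events would then go through redaction and summarization before being placed in the prompt context.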

An explanation has the following shape:

{
  "alert_id": "alr_01HZX...",
  "summary": "Short one-paragraph explanation",
  "root_cause": "Why this likely happened",
  "next_steps": [
    "Check ...",
    "Block ...",
    "Page on-call if ..."
  ],
  "confidence": "high",
  "model": "claude-sonnet-4-6",
  "latency_ms": 1840,
  "generated_at_ns": "1715619300000000000"
}

The parser.py step is strict — malformed LLM output is rejected, not silently surfaced. A failed parse propagates as HTTP 502 to the client; the dashboard surfaces a “regenerate” button.
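As a rough illustration of what "strict" can mean here, the sketch below rejects anything that is not a JSON object with the expected fields and types. It is not the contents of parser.py. It assumes the LLM emits summary, root_cause, next_steps, and confidence (with the service adding the remaining fields itself), and the allowed confidence values other than "high" are guesses.

```python
import json

# Assumption: fields the LLM itself is expected to produce.
REQUIRED_FIELDS = {
    "summary": str,
    "root_cause": str,
    "next_steps": list,
    "confidence": str,
}
# Assumption: only "high" appears in the documented example.
ALLOWED_CONFIDENCE = {"low", "medium", "high"}


class ExplanationParseError(ValueError):
    """Raised when LLM output does not match the expected shape."""


def parse_explanation(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ExplanationParseError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ExplanationParseError("top-level value must be an object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ExplanationParseError(f"missing or mistyped field: {name}")
    if not all(isinstance(s, str) and s.strip() for s in data["next_steps"]):
        raise ExplanationParseError("next_steps must contain non-empty strings")
    if data["confidence"] not in ALLOWED_CONFIDENCE:
        raise ExplanationParseError("unexpected confidence value")
    # Return only the validated fields; extra keys are dropped.
    return {name: data[name] for name in REQUIRED_FIELDS}
```

In a design like this, the route handler would catch ExplanationParseError and translate it into the HTTP 502 described above rather than returning partial output.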

| Method | Path | Purpose |
|--------|------|---------|
| POST | /api/v1/alerts/{alert_id}/explain | Generate (or return cached) explanation. Async; guarded by a per-request wall-clock timeout. |
| GET | /api/v1/alerts/{alert_id}/explanation | Fetch the cached explanation, if any. 404 when never generated. |

Both endpoints require llm.backend to be non-empty; otherwise the routes return 503 Service Unavailable.
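For scripted use, a client might look roughly like the sketch below. It uses only the Python standard library. The base URL is a placeholder, authentication is not covered on this page and is omitted, and the assumption that a successful call returns 200 is not confirmed by the text above. The helper names are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_explain_request(base_url: str, alert_id: str) -> urllib.request.Request:
    """Build the POST request for the explain endpoint (base_url is a placeholder)."""
    safe_id = quote(alert_id, safe="")
    url = base_url.rstrip("/") + "/api/v1/alerts/" + safe_id + "/explain"
    return urllib.request.Request(url, data=b"", method="POST")


def next_action(status: int) -> str:
    """Map the status codes documented on this page to a client-side action."""
    if status == 200:  # assumed success code
        return "show_explanation"
    if status == 404:  # GET only: explanation never generated
        return "not_generated_yet"
    if status == 502:  # parse failure or timeout: retrying hits the backend again
        return "offer_regenerate"
    if status == 503:  # llm.backend is empty
        return "llm_backend_not_configured"
    return "unexpected_status"
```

When the request is sent with urllib.request.urlopen, non-2xx responses raise urllib.error.HTTPError, whose code attribute can be passed to next_action.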

llm:
  backend: ollama
  ollama_url: http://localhost:11434
  ollama_model: phi4-mini
  # Explanation-specific (defaults shown)
  explanation_cache_size: 256

The wall-clock guard uses ollama_timeout_s / cloud_timeout_s depending on the active backend. On timeout the route returns 502 and nothing is cached — a retry will hit the backend again.
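The timeout behaviour can be sketched with asyncio.wait_for. This is an assumption-laden illustration, not the route's actual code: the function names are hypothetical, the cache is a plain dict, a successful call is assumed to return 200, and the rule "ollama uses ollama_timeout_s, everything else uses cloud_timeout_s" is a guess (the page does not say which key applies to llama_cpp).

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


def timeout_for_backend(llm_config: dict) -> float:
    # Assumption: any backend other than "ollama" uses cloud_timeout_s.
    if llm_config["backend"] == "ollama":
        return float(llm_config["ollama_timeout_s"])
    return float(llm_config["cloud_timeout_s"])


async def explain_with_guard(
    generate: Callable[[str], Awaitable[dict]],
    cache: dict,
    alert_id: str,
    timeout_s: float,
) -> Tuple[int, Optional[dict]]:
    """Run generate() under a wall-clock limit; cache only on success."""
    try:
        explanation = await asyncio.wait_for(generate(alert_id), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Nothing is cached on timeout, so a retry calls the backend again.
        return 502, None
    cache[alert_id] = explanation
    return 200, explanation
```

Because the cache write happens only after wait_for returns, a timed-out request leaves no partial entry behind.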

  • The cache means repeat clicks on the same alert are free.
  • Set explanation_cache_size higher (e.g. 2048) if you have many analysts triaging the same incidents.
  • For sensitive logs, use backend: llama_cpp or backend: ollama — both are local and incur no per-token cost.