Alert Explanation

AlertExplanationService turns an alert + its surrounding events + the affected entity’s UEBA baseline into a plain-English explanation an analyst can act on.

When it runs

Only when something triggers it:

Dashboard analyst clicks Explain on an alert detail page
POST /api/v1/alerts/{alert_id}/explain is called from a script
The service is not invoked automatically on alert fire — that would burn tokens on noise

The result is cached (LRU, size set by llm.explanation_cache_size, default 256). Subsequent calls for the same alert return instantly.

What the LLM sees

The service builds a redacted context from three sources:

The alert — title, rule, severity, MITRE tags, entities, dedup_count, risk_score
A ±N-second event window around the alert timestamp (events involving the same entities only)
The entity’s UEBA baseline — warm-up status, top templates, source-IP spread — if available

PII-prone fields (raw message, OS users beyond the alert’s named entities) are summarized rather than passed verbatim.

What the LLM returns

{
  "alert_id": "alr_01HZX...",
  "summary": "Short one-paragraph explanation",
  "root_cause": "Why this likely happened",
  "next_steps": [
    "Check ...",
    "Block ...",
    "Page on-call if ..."
  ],
  "confidence": "high",
  "model": "claude-sonnet-4-6",
  "latency_ms": 1840,
  "generated_at_ns": "1715619300000000000"
}

The parser.py step is strict — malformed LLM output is rejected, not silently surfaced. A failed parse propagates as HTTP 502 to the client; the dashboard surfaces a “regenerate” button.

REST API

Method	Path	Purpose
POST	`/api/v1/alerts/{alert_id}/explain`	Generate (or return cached) explanation. Async — guarded by a per-request wall-clock timeout.
GET	`/api/v1/alerts/{alert_id}/explanation`	Fetch the cached explanation, if any. 404 when never generated.

Both endpoints require llm.backend to be non-empty; otherwise the routes return 503 Service Unavailable.

Configuration

llm:
  backend: ollama
  ollama_url: http://localhost:11434
  ollama_model: phi4-mini

  # Explanation-specific (defaults shown)
  explanation_cache_size: 256

The wall-clock guard uses ollama_timeout_s / cloud_timeout_s depending on the active backend. On timeout the route returns 502 and nothing is cached — a retry will hit the backend again.

Cost control

The cache means repeat clicks on the same alert are free.
Set explanation_cache_size higher (e.g. 2048) if you have many analysts triaging the same incidents.
For sensitive logs, use backend: llama_cpp or backend: ollama — both are local and incur no per-token cost.