Architecture Overview

Processing Pipeline

Every stage below runs in a single Python process. No JVM, no broker, no cluster.

1. Ingestion

Concurrent receivers, all feeding one bounded async queue (default 10K events, backpressure built in):

Syslog UDP/TCP (RFC 3164 + RFC 5424)
OTLP gRPC and OTLP HTTP (protobuf + JSON)
File tailing (glob, rotation, checkpointed offsets, debounce)
Webhook HTTP (JSON / form, per-endpoint auth token, field mapping)
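
As a rough illustration of the bounded-queue design described above, the sketch below uses `asyncio.Queue` with a `maxsize`. The receiver and consumer functions are hypothetical stand-ins, not Seerflow's actual classes; only the 10K default comes from the text.

```python
import asyncio

QUEUE_MAXSIZE = 10_000  # the documented default of 10K events


async def receiver(name: str, queue: asyncio.Queue, n_events: int) -> None:
    """Hypothetical receiver: pushes synthetic events into the shared queue."""
    for i in range(n_events):
        # put() suspends while the queue is full. That suspension is the
        # backpressure: a receiver cannot get more than maxsize events ahead.
        await queue.put({"source": name, "seq": i})


async def consumer(queue: asyncio.Queue, out: list) -> None:
    """Drains the queue; stands in for the parsing stage."""
    while True:
        event = await queue.get()
        out.append(event)
        queue.task_done()


async def run_demo(maxsize: int = QUEUE_MAXSIZE) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    processed: list = []
    worker = asyncio.create_task(consumer(queue, processed))
    # Several receivers feed one queue concurrently.
    await asyncio.gather(
        receiver("syslog", queue, 50),
        receiver("otlp", queue, 50),
        receiver("webhook", queue, 50),
    )
    await queue.join()  # wait until every queued event has been processed
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return processed
```

Running `asyncio.run(run_demo(maxsize=8))` forces the receivers to wait on the consumer repeatedly, which makes the backpressure visible even with a small event count.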

2. Parsing

Drain3 for streaming log template extraction (~120K msgs/sec).
Regex-based entity extraction (IPs, users, hosts, files, domains, processes).
UUID5 entity resolution — the same entity yields the same ID across all sources, no matter which receiver saw it first.
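
The idea behind UUID5 resolution can be sketched with the standard `uuid` module: a name-based UUID is a pure function of its namespace and name, so any receiver that sees the same entity derives the same ID. The namespace value, the `type:value` key format, and the normalization step below are assumptions for illustration, not the project's actual scheme.

```python
import uuid

# Hypothetical namespace; the project's real namespace UUID is not given here.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "entities.example.invalid")


def entity_id(entity_type: str, value: str) -> uuid.UUID:
    """Derive a deterministic ID from an (entity type, value) pair."""
    # Assumed normalization: trim and lowercase. Case-sensitive entities
    # such as file paths would need a different rule.
    normalized = value.strip().lower()
    return uuid.uuid5(ENTITY_NAMESPACE, f"{entity_type}:{normalized}")
```

Because the ID depends only on its inputs, no lookup table or coordination between receivers is needed to agree on it.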

3. Detection

All algorithms are online/streaming — they update with every event, no batch retraining:

Algorithm	Detects	Notes
Half-Space Trees	Content anomalies	River ensemble, constant memory
Holt-Winters	Volume anomalies	Trend + seasonal decomposition
CUSUM	Change points	Bidirectional cumulative sum
Markov chains	Sequence anomalies	Per-entity transition matrices
biDSPOT	Auto-thresholds	Bidirectional EVT (scipy GPD)
pySigma	Known threat patterns	63 bundled rules, logsource-indexed dispatch
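
To make one row of the table concrete, here is a minimal sketch of a bidirectional CUSUM detector in its textbook form. The class name and the `target`, `slack`, and `threshold` parameters are illustrative assumptions and may not match the project's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidirectionalCusum:
    target: float      # expected mean of the monitored signal
    slack: float       # allowance: drift smaller than this is ignored
    threshold: float   # decision limit for signalling a change
    pos: float = 0.0   # accumulated evidence of an upward shift
    neg: float = 0.0   # accumulated evidence of a downward shift

    def update(self, x: float) -> Optional[str]:
        """Consume one observation; return "up", "down", or None."""
        self.pos = max(0.0, self.pos + (x - self.target - self.slack))
        self.neg = max(0.0, self.neg + (self.target - x - self.slack))
        if self.pos > self.threshold:
            self.pos = self.neg = 0.0  # reset after signalling
            return "up"
        if self.neg > self.threshold:
            self.pos = self.neg = 0.0
            return "down"
        return None
```

Each update is O(1) in time and memory, which is what makes this family of detectors suitable for per-event streaming use.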

The DetectionEnsemble z-normalizes the four ML scores and blends them using per-source weights (weights_content, weights_volume, weights_sequence, weights_pattern).
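
A minimal sketch of z-normalized, weighted blending follows. It assumes running statistics maintained with Welford's method and a weighted average of z-scores; `EnsembleSketch`, the short weight keys, and the exact blending formula are hypothetical, since the text does not spell them out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RunningStats:
    """Welford's online mean and variance, so no score history is stored."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to normalize
        std = math.sqrt(self.m2 / (self.n - 1))
        return (x - self.mean) / std if std > 0 else 0.0


@dataclass
class EnsembleSketch:
    weights: dict  # e.g. {"content": 0.4, "volume": 0.3, ...} (assumed keys)
    stats: dict = field(default_factory=dict)

    def blend(self, scores: dict) -> float:
        total = 0.0
        weight_sum = 0.0
        for name, raw in scores.items():
            st = self.stats.setdefault(name, RunningStats())
            z = st.zscore(raw)  # normalize against history seen so far
            st.update(raw)      # then fold the new value into the history
            w = self.weights.get(name, 0.0)
            total += w * z
            weight_sum += w
        return total / weight_sum if weight_sum else 0.0
```

Normalizing first puts detectors with very different raw score ranges on a common scale, so the weights express relative trust rather than compensating for units.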

4. Correlation

The correlation engine fuses signals across entities and time:

Sliding window per entity with watermark-based late-arrival tolerance
Risk accumulation — per-entity risk register with exponential decay; catches slow-burn attacks
Graph-structural scoring — community-crossing edges, betweenness centrality, fan-out outliers
Kill-chain detector — MITRE ATT&CK tactic progression (default: ≥3 distinct tactics within 24h)
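
The default kill-chain rule (at least 3 distinct tactics within 24h) can be sketched as a per-entity sliding window. `KillChainSketch` and its method names are hypothetical, and this version assumes events arrive in timestamp order; the watermark-based late-arrival handling mentioned above is left out.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class KillChainSketch:
    def __init__(self, min_tactics: int = 3,
                 window: timedelta = timedelta(hours=24)):
        self.min_tactics = min_tactics
        self.window = window
        # entity_id -> deque of (timestamp, tactic), oldest first
        self._seen = defaultdict(deque)

    def observe(self, entity_id: str, tactic: str, ts: datetime) -> bool:
        """Record one tagged alert; return True if the entity trips the rule."""
        events = self._seen[entity_id]
        events.append((ts, tactic))
        # Evict everything that has slid out of the window.
        cutoff = ts - self.window
        while events and events[0][0] < cutoff:
            events.popleft()
        distinct = {t for _, t in events}
        return len(distinct) >= self.min_tactics
```

Keeping the window per entity means activity on one host never contributes to the tactic count of another.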

5. Entity Graph

Three pluggable backends, switchable via storage.graph_backend:

Backend	When to use	Extras install
`igraph` (default)	Single-process, in-memory, fastest	(none)
`falkordb`	External, shared graph, query via Cypher	`uv sync --extra graph-falkordb`
`postgres_age`	Co-locate graph with relational store	`uv sync --extra graph-postgres-age`

Migrate between backends with seerflow graph migrate --from <a> --to <b>.

6. UEBA + Threat Intel

UEBA — per-user / per-host behavioural baselines with a configurable warm-up (default 7 days / 50 events) before scoring.
Threat intelligence — pull IoCs from TAXII feeds and match them against ingested events with a Bloom-filter matcher tuned for low false-positive rate.
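
For readers unfamiliar with Bloom-filter matching, the sketch below shows the general technique: a bit array sized for a target false-positive rate, with lookups that can yield false positives but never false negatives. This is a generic standard-library implementation with assumed names and defaults, not the project's matcher.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity: int, error_rate: float = 0.001):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes
        self.size = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A hit in the filter would normally be confirmed against the full IoC set before alerting, since the filter alone only says "possibly present".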

7. Security Rules

pySigma with 63 bundled SigmaHQ rules; add custom directories via detection.sigma_rules_dirs.
MITRE ATT&CK tagging on every rule (tactic + technique).
Logsource-indexed dispatch keeps throughput high even with thousands of rules.
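
One way logsource-indexed dispatch can work is sketched below: rules are bucketed by their (category, product, service) triple, and each event only consults the buckets it could match instead of scanning every rule. `LogSource` and `RuleIndex` are hypothetical names, and treating an unset rule field as a wildcard is an assumption about the matching semantics.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class LogSource(NamedTuple):
    category: Optional[str] = None
    product: Optional[str] = None
    service: Optional[str] = None


class RuleIndex:
    def __init__(self):
        self._by_logsource = defaultdict(list)

    def add(self, rule_id: str, logsource: LogSource) -> None:
        self._by_logsource[logsource].append(rule_id)

    def candidates(self, event_source: LogSource) -> list:
        """Return IDs of rules whose logsource is compatible with the event."""
        c, p, s = event_source
        # A rule field left as None acts as a wildcard, so probe every
        # combination of "event value" and "unset": at most 8 dict lookups.
        keys = {LogSource(cc, pp, ss)
                for cc in (c, None) for pp in (p, None) for ss in (s, None)}
        out = []
        for key in keys:
            out.extend(self._by_logsource.get(key, []))
        return sorted(out)
```

The lookup cost depends on the number of matching buckets rather than the total rule count, which is why throughput can stay flat as the rule set grows.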

8. Alerting

Outbound channels:

Generic webhooks (raw JSON)
Slack and Microsoft Teams (formatted)
PagerDuty Events API v2

Alerts are deduplicated on the alert dedup key within a dedup window (default 15 min).
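
The dedup window can be pictured as a small map from dedup key to the time an alert was last sent. In this sketch the class name is hypothetical, and the choice to measure the window from the last alert actually sent (rather than from the last suppressed duplicate) is an assumption.

```python
import time
from typing import Optional


class AlertDeduper:
    def __init__(self, window_seconds: float = 15 * 60):
        self.window = window_seconds
        self._last_sent = {}  # dedup key -> time the last alert went out

    def should_send(self, dedup_key: str, now: Optional[float] = None) -> bool:
        """Return True if an alert with this key should be delivered now."""
        now = time.monotonic() if now is None else now
        last = self._last_sent.get(dedup_key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: suppress
        self._last_sent[dedup_key] = now
        return True
```

Passing `now` explicitly keeps the logic testable without sleeping; production code would also need to expire old keys so the map does not grow without bound.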

9. Surface (Dashboard + API)

Bundled with the wheel — no separate uvicorn process needed:

React SPA on http://127.0.0.1:8080/
REST API on http://127.0.0.1:8080/api/v1/
WebSocket live stream on ws://127.0.0.1:8080/api/v1/ws

The CLI’s seerflow start boots receivers, the detection engines, the correlation engine, and the FastAPI dashboard in the same process.

Data Model

The core SeerflowEvent is a frozen msgspec.Struct that unifies four log schema standards:

OpenTelemetry LogRecord (ns timestamps, trace context, severity 1-24)
Elastic Common Schema (event.kind / category / type / outcome)
OCSF (numeric taxonomy: category_uid / class_uid / type_uid)
Sigma (logsource.category / product / service)

Key fields: dual timestamps (event vs observed), Drain3 template metadata, entity references, MITRE ATT&CK mapping, risk and anomaly scores.
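
The shape of such an event can be sketched as follows. To stay dependency-free this uses a frozen standard-library dataclass as a stand-in; with msgspec the analogous declaration would be a `msgspec.Struct` subclass defined with `frozen=True`. All field names here are hypothetical and cover only a few of the key fields listed above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EventSketch:
    event_time_ns: int      # when the event happened at the source
    observed_time_ns: int   # when the pipeline received it
    severity_number: int    # OpenTelemetry severity range, 1-24
    message: str
    template_id: Optional[int] = None  # log template metadata
    entity_ids: tuple = ()             # references to resolved entities
    attack_tactics: tuple = ()         # MITRE ATT&CK mapping
    anomaly_score: float = 0.0
    risk_score: float = 0.0

    def __post_init__(self) -> None:
        if not 1 <= self.severity_number <= 24:
            raise ValueError("severity_number must be in 1..24")

    @property
    def ingest_lag_ns(self) -> int:
        """Delay between the event occurring and being observed."""
        return self.observed_time_ns - self.event_time_ns
```

Because instances are immutable, a stage that enriches an event would produce a new copy, for example `replace(event, risk_score=0.8)`, rather than mutating shared state.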

Storage

Protocol-based interfaces — backend switchable via one config line. See the Storage page for SQLite vs PostgreSQL details and the schema layout.

Protocol	Methods	Purpose
`LogStore`	`write_events`, `query_events`, `search_text`	Event persistence + FTS
`AlertStore`	`write_alert`, `query_alerts`, `update_feedback`	Alert management + analyst feedback
`ModelStore`	`save_state`, `load_state`	ML model checkpoints across restarts
`EntityStore`	`get_timeline`, `get_related`	Entity exploration

Backends: SQLite (zero-config default, WAL + FTS5) and PostgreSQL (asyncpg pool, production scale).
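
To illustrate what a protocol-based interface buys, here is a sketch of `ModelStore` as a `typing.Protocol` with an in-memory implementation. The method names come from the table above; the signatures, the synchronous style, and `InMemoryModelStore` are assumptions, and the real interfaces may well be async.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ModelStore(Protocol):
    # Assumed signatures: the table gives only the method names.
    def save_state(self, key: str, state: bytes) -> None: ...
    def load_state(self, key: str) -> Optional[bytes]: ...


class InMemoryModelStore:
    """Hypothetical backend; note that it does not inherit from ModelStore."""

    def __init__(self) -> None:
        self._states = {}

    def save_state(self, key: str, state: bytes) -> None:
        self._states[key] = state

    def load_state(self, key: str) -> Optional[bytes]:
        return self._states.get(key)


def checkpoint(store: ModelStore, key: str, state: bytes) -> Optional[bytes]:
    """Works with any object that structurally satisfies ModelStore."""
    store.save_state(key, state)
    return store.load_state(key)
```

Structural typing is what lets a backend be swapped by configuration alone: callers depend on the method shapes, not on a shared base class.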