Architecture Overview
Processing Pipeline
All five stages run in a single Python process. No JVM, no broker, no cluster.
1. Ingestion
Concurrent receivers all feed one bounded async queue (default capacity 10K events, with built-in backpressure):
- Syslog UDP/TCP (RFC 3164 + RFC 5424)
- OTLP gRPC and OTLP HTTP (protobuf + JSON)
- File tailing (glob, rotation, checkpointed offsets, debounce)
- Webhook HTTP (JSON / form, per-endpoint auth token, field mapping)
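The bounded-queue backpressure described above can be sketched with `asyncio.Queue(maxsize=...)`, whose `put()` suspends the producer while the queue is full. The receiver and consumer coroutines here are hypothetical stand-ins, not Seerflow's actual receivers; only the 10K default comes from this page.

```python
import asyncio

QUEUE_MAXSIZE = 10_000  # mirrors the documented default of 10K events


async def receiver(name: str, queue: asyncio.Queue, n_events: int) -> None:
    """Hypothetical receiver that pushes n_events into the shared queue."""
    for i in range(n_events):
        # put() suspends while the queue is full, so a fast receiver
        # slows down instead of growing memory without bound.
        await queue.put({"source": name, "seq": i})


async def consumer(queue: asyncio.Queue, seen: list) -> None:
    """Drains the queue; stands in for the parsing stage."""
    while True:
        event = await queue.get()
        seen.append(event)
        queue.task_done()


async def run_pipeline(maxsize: int = QUEUE_MAXSIZE) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    seen: list = []
    worker = asyncio.create_task(consumer(queue, seen))
    # Several receivers share the single bounded queue concurrently.
    await asyncio.gather(
        receiver("syslog", queue, 50),
        receiver("otlp", queue, 50),
        receiver("file", queue, 50),
    )
    await queue.join()  # wait until every queued event has been processed
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return seen


if __name__ == "__main__":
    events = asyncio.run(run_pipeline(maxsize=8))
    print(len(events))
```

A tiny `maxsize` like 8 makes the receivers block often, which exercises the backpressure path without changing the result.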
2. Parsing
- Drain3 for streaming log template extraction (~120K msgs/sec).
- Regex-based entity extraction (IPs, users, hosts, files, domains, processes).
- UUID5 entity resolution — the same entity yields the same ID across all sources, no matter which receiver saw it first.
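Regex extraction followed by UUID5 resolution can be sketched with the standard library alone. The namespace UUID, the `type:value` key format, and the lowercase/strip normalization are assumptions for illustration, not Seerflow's actual scheme; the IPv4 pattern is intentionally loose and does not validate octet ranges.

```python
import re
import uuid

# Hypothetical namespace; a real deployment would pin its own constant.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:seerflow-entities")

# Deliberately simple IPv4 pattern (does not check that octets are <= 255).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def entity_id(entity_type: str, value: str) -> uuid.UUID:
    """Deterministic ID: the same (type, normalized value) always maps to the same UUID."""
    key = f"{entity_type}:{value.strip().lower()}"  # assumed normalization
    return uuid.uuid5(ENTITY_NAMESPACE, key)


def extract_ips(message: str) -> dict[str, uuid.UUID]:
    """Find IPv4-looking strings in a message and resolve each to an entity ID."""
    return {ip: entity_id("ip", ip) for ip in IPV4_RE.findall(message)}


if __name__ == "__main__":
    from_syslog = extract_ips("Failed password for root from 10.0.0.5 port 22")
    from_otlp = extract_ips('{"client.address": "10.0.0.5"}')
    print(from_syslog["10.0.0.5"] == from_otlp["10.0.0.5"])
```

Because UUID5 is a pure function of namespace and name, two receivers never need to coordinate to agree on an ID; including the entity type in the key keeps a host named `10.0.0.5` distinct from the IP `10.0.0.5`.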
3. Detection
All algorithms are online (streaming) and update with every event; no batch retraining is needed:
| Algorithm | Detects | Notes |
|---|---|---|
| Half-Space Trees | Content anomalies | River ensemble, constant memory |
| Holt-Winters | Volume anomalies | Trend + seasonal decomposition |
| CUSUM | Change points | Bidirectional cumulative sum |
| Markov chains | Sequence anomalies | Per-entity transition matrices |
| biDSPOT | Auto-thresholds | Bidirectional EVT (scipy GPD) |
| pySigma | Known threat patterns | 63 bundled rules, logsource-indexed dispatch |
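To make the CUSUM row concrete, here is a minimal bidirectional (tabular) CUSUM in the style of Page's test. It assumes a known in-control mean `target`, a slack `k`, and a decision threshold `h`; these names and the reset-after-alarm behaviour are illustrative choices, not Seerflow's configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidirectionalCusum:
    """Tabular CUSUM tracking upward and downward mean shifts.

    target: expected in-control mean (assumed known in this sketch).
    k: slack, often about half the shift size you want to detect.
    h: decision threshold on either cumulative sum.
    """

    target: float
    k: float
    h: float
    s_hi: float = 0.0
    s_lo: float = 0.0

    def update(self, x: float) -> Optional[str]:
        # Accumulate deviations beyond the slack in each direction, clamped at 0.
        self.s_hi = max(0.0, self.s_hi + (x - self.target - self.k))
        self.s_lo = max(0.0, self.s_lo + (self.target - self.k - x))
        if self.s_hi > self.h:
            self.s_hi = self.s_lo = 0.0  # reset after signalling
            return "up"
        if self.s_lo > self.h:
            self.s_hi = self.s_lo = 0.0
            return "down"
        return None


if __name__ == "__main__":
    detector = BidirectionalCusum(target=10.0, k=0.5, h=4.0)
    print([detector.update(x) for x in [10, 9.8, 10.2, 13, 13]])
```

Small wobbles around the target are absorbed by the slack, while a sustained shift of +3 accumulates 2.5 per event and crosses `h = 4` on the second elevated sample.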
The DetectionEnsemble blends the four ML scores with z-normalization and per-source weights (weights_content, weights_volume, weights_sequence, weights_pattern).
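One plausible reading of z-normalized blending: each detector score is standardized against its own running mean and variance (Welford's online algorithm), then combined as a weighted average. The `RunningStats` and `EnsembleSketch` names and the weighted-average formula are assumptions; only the four weight names echo the config keys listed above.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance, so normalization stays streaming."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to standardize yet
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else (x - self.mean) / std


class EnsembleSketch:
    """Hypothetical blend: z-normalize each detector score, then weight it."""

    def __init__(self, weights: dict[str, float]) -> None:
        self.weights = weights
        self.stats = {name: RunningStats() for name in weights}

    def score(self, raw: dict[str, float]) -> float:
        total, weight_sum = 0.0, 0.0
        for name, value in raw.items():
            stats = self.stats[name]
            z = stats.zscore(value)  # compare against history before this event
            stats.update(value)
            w = self.weights[name]
            total += w * z
            weight_sum += w
        return total / weight_sum if weight_sum else 0.0
```

Normalizing first matters because raw scores live on different scales (a Half-Space Trees mass score and a Holt-Winters residual are not comparable), so without it the weights would mostly compensate for units rather than express trust in each detector.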
4. Correlation
The correlation engine fuses signals across entities and time:
- Sliding window per entity with watermark-based late-arrival tolerance
- Risk accumulation — per-entity risk register with exponential decay; catches slow-burn attacks
- Graph-structural scoring — community-crossing edges, betweenness centrality, fan-out outliers
- Kill-chain detector — MITRE ATT&CK tactic progression (default: ≥3 distinct tactics within 24h)
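Two of the correlation ideas above lend themselves to short sketches: a per-entity risk register whose scores decay exponentially, and a kill-chain check that fires once an entity shows at least three distinct ATT&CK tactics within 24 hours. The half-life parameter, the class names, and the in-order-timestamps assumption are hypothetical; only the 3-tactic / 24h default comes from this page.

```python
import math
from collections import defaultdict, deque

DAY = 24 * 3600.0


class RiskRegister:
    """Per-entity risk that halves every half_life seconds (assumed parameter)."""

    def __init__(self, half_life: float = 6 * 3600.0) -> None:
        self.decay_rate = math.log(2) / half_life
        self._state: dict[str, tuple[float, float]] = {}  # entity -> (risk, last_ts)

    def add(self, entity: str, ts: float, points: float) -> float:
        risk, last = self._state.get(entity, (0.0, ts))
        # Decay the stored risk up to time ts, then add the new contribution.
        risk = risk * math.exp(-self.decay_rate * (ts - last)) + points
        self._state[entity] = (risk, ts)
        return risk


class KillChainDetector:
    """Flags an entity once >= min_tactics distinct tactics appear within `window` seconds."""

    def __init__(self, min_tactics: int = 3, window: float = DAY) -> None:
        self.min_tactics = min_tactics
        self.window = window
        self._seen: dict[str, deque] = defaultdict(deque)  # entity -> (ts, tactic)

    def observe(self, entity: str, ts: float, tactic: str) -> bool:
        events = self._seen[entity]
        events.append((ts, tactic))
        # Evict observations older than the window (assumes in-order timestamps).
        while events and ts - events[0][0] > self.window:
            events.popleft()
        return len({t for _, t in events}) >= self.min_tactics


if __name__ == "__main__":
    detector = KillChainDetector()
    for ts, tactic in [(0.0, "initial-access"), (3600.0, "execution"), (7200.0, "persistence")]:
        print(tactic, detector.observe("user-x", ts, tactic))
```

Decay is what lets the register catch slow-burn activity: many small contributions spread over hours still add up, while an isolated old spike fades instead of keeping an entity flagged forever.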
5. Entity Graph
Three pluggable backends, switchable via storage.graph_backend:
| Backend | When to use | Extras install |
|---|---|---|
| igraph (default) | Single-process, in-memory, fastest | (none) |
| falkordb | External, shared graph, queried via Cypher | uv sync --extra graph-falkordb |
| postgres_age | Co-locate the graph with the relational store | uv sync --extra graph-postgres-age |
Migrate between backends with `seerflow graph migrate --from <a> --to <b>`.
6. UEBA + Threat Intel
- UEBA: per-user and per-host behavioural baselines with a configurable warm-up (default 7 days / 50 events) before scoring.
- Threat intelligence: IoCs are pulled from TAXII feeds and matched against ingested events with a Bloom-filter matcher tuned for a low false-positive rate.
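A Bloom-filter matcher can be sketched with `hashlib` and the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The class below is illustrative, not Seerflow's matcher; deriving the k positions by double hashing one SHA-256 digest is an implementation choice made here, and TAXII fetching is omitted.

```python
import hashlib
import math


class BloomFilter:
    """Bloom filter sized for n items at a target false-positive rate p."""

    def __init__(self, n: int, p: float = 0.001) -> None:
        # Standard sizing: m bits and k hash functions for n items at rate p.
        self.m = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    iocs = BloomFilter(n=1000, p=0.001)
    for indicator in ["203.0.113.7", "malicious.example.com"]:
        iocs.add(indicator)
    print("203.0.113.7" in iocs, "192.0.2.1" in iocs)
```

A Bloom filter never yields false negatives but can yield false positives, so a hit would typically be confirmed against the exact IoC set before raising an alert; the filter's job is to make the common "no match" case cheap.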
7. Security Rules
- pySigma with 63 bundled SigmaHQ rules; add custom rule directories via detection.sigma_rules_dirs.
- MITRE ATT&CK tagging on every rule (tactic + technique).
- Logsource-indexed dispatch keeps throughput high even with thousands of rules.
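Logsource-indexed dispatch can be approximated by bucketing rules on their (category, product, service) triple, so each event is tested only against rules whose logsource could apply. The `Rule` and `RuleIndex` types are hypothetical, and treating an unset logsource field as a wildcard is an assumption loosely based on Sigma logsource fields being optional; real pySigma rules are compiled differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical compiled rule: a Sigma-style logsource plus a match predicate."""

    title: str
    category: Optional[str]
    product: Optional[str]
    service: Optional[str]
    matches: Callable[[dict], bool]


class RuleIndex:
    def __init__(self, rules: list[Rule]) -> None:
        self._index: dict[tuple, list[Rule]] = defaultdict(list)
        for rule in rules:
            self._index[(rule.category, rule.product, rule.service)].append(rule)

    def candidates(self, category, product, service) -> list[Rule]:
        # A rule leaving a field unset (None) applies to any value, so probe
        # every combination of "exact value" vs "wildcard": at most 8 lookups.
        out: list[Rule] = []
        for c in {category, None}:
            for p in {product, None}:
                for s in {service, None}:
                    out.extend(self._index.get((c, p, s), []))
        return out

    def evaluate(self, event: dict) -> list[str]:
        rules = self.candidates(event.get("category"), event.get("product"), event.get("service"))
        return [r.title for r in rules if r.matches(event)]
```

The per-event cost is a handful of dictionary lookups plus the rules in the matching buckets, regardless of how many rules target other log sources.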
8. Alerting
Outbound channels:
- Generic webhooks (raw JSON)
- Slack and Microsoft Teams (formatted)
- PagerDuty Events API v2
Repeated alerts that share the same dedup key are suppressed within a dedup window (default 15 min).
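A dedup window keyed on the alert dedup key can be sketched as a map from key to the time it was last sent. Measuring the window from the last *sent* alert (so a steady stream of duplicates still re-notifies every 15 minutes) is an assumption, as is the class name.

```python
class AlertDeduplicator:
    """Suppresses repeats of the same dedup key inside a time window (default 15 min)."""

    def __init__(self, window_seconds: float = 15 * 60) -> None:
        self.window = window_seconds
        self._last_sent: dict[str, float] = {}

    def should_send(self, dedup_key: str, ts: float) -> bool:
        last = self._last_sent.get(dedup_key)
        if last is not None and ts - last < self.window:
            return False  # duplicate inside the window: suppress
        self._last_sent[dedup_key] = ts  # window restarts only when an alert goes out
        return True


if __name__ == "__main__":
    dedup = AlertDeduplicator()
    print([dedup.should_send("brute-force:10.0.0.5", t) for t in (0.0, 60.0, 900.0)])
```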
9. Surface (Dashboard + API)
Bundled with the wheel, so no separate uvicorn process is needed:
- React SPA at http://127.0.0.1:8080/
- REST API at http://127.0.0.1:8080/api/v1/
- WebSocket live stream at ws://127.0.0.1:8080/api/v1/ws
The seerflow start CLI command boots the receivers, the detection engines, the correlation engine, and the FastAPI dashboard in the same process.
Data Model
The core SeerflowEvent is a frozen msgspec.Struct that unifies four log schema standards:
- OpenTelemetry LogRecord (ns timestamps, trace context, severity 1-24)
- Elastic Common Schema (event.kind/category/type/outcome)
- OCSF (numeric taxonomy: category_uid/class_uid/type_uid)
- Sigma (logsource.category/product/service)
Key fields: dual timestamps (event vs observed), Drain3 template metadata, entity references, MITRE ATT&CK mapping, risk and anomaly scores.
Storage
Storage sits behind protocol-based interfaces, so the backend can be switched with a single config line. See the Storage page for SQLite vs PostgreSQL details and the schema layout.
| Protocol | Methods | Purpose |
|---|---|---|
| LogStore | write_events, query_events, search_text | Event persistence + full-text search |
| AlertStore | write_alert, query_alerts, update_feedback | Alert management + analyst feedback |
| ModelStore | save_state, load_state | ML model checkpoints across restarts |
| EntityStore | get_timeline, get_related | Entity exploration |
Backends: SQLite (zero-config default, WAL + FTS5) and PostgreSQL (asyncpg pool, production scale).
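The protocol-based design can be sketched with `typing.Protocol`: any class with matching methods satisfies the interface structurally, which is what allows a backend swap without touching callers. Only the three `LogStore` method names come from the table above; the async signatures, parameters, and the toy in-memory backend are assumptions.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class LogStore(Protocol):
    """Method names from the table above; signatures are assumptions."""

    async def write_events(self, events: list[dict]) -> int: ...

    async def query_events(self, *, source: Optional[str] = None, limit: int = 100) -> list[dict]: ...

    async def search_text(self, query: str, limit: int = 100) -> list[dict]: ...


class InMemoryLogStore:
    """Toy backend: satisfies LogStore structurally, with no inheritance."""

    def __init__(self) -> None:
        self._events: list[dict] = []

    async def write_events(self, events: list[dict]) -> int:
        self._events.extend(events)
        return len(events)

    async def query_events(self, *, source: Optional[str] = None, limit: int = 100) -> list[dict]:
        rows = [e for e in self._events if source is None or e.get("source") == source]
        return rows[-limit:]  # most recent `limit` matches

    async def search_text(self, query: str, limit: int = 100) -> list[dict]:
        # Naive case-insensitive substring match; a real backend would use an index
        # such as SQLite FTS5 or PostgreSQL full-text search instead.
        q = query.lower()
        return [e for e in self._events if q in e.get("message", "").lower()][:limit]
```

Note that `runtime_checkable` only checks that the methods exist, not their signatures, so a static type checker is still the main guard against a backend drifting from the protocol.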