Architecture
This page contains the full architecture blueprint for SeeBOM.
TL;DR
Kubernetes-native SBOM platform as a monorepo. Go backend with four binaries (CronJob Ingestion-Watcher, scalable Parsing-Workers, stateless API-Gateway, background CVE-Refresher). ClickHouse as the analytical database with MergeTree tables and array-based dependency storage. Angular frontend with virtual scrolling, OnPush change detection, full-text search, dark-mode toggle, and custom CSS theming.
Components
| Binary | Type | Purpose |
|---|---|---|
| ingestion-watcher | K8s CronJob | Scans SBOM/VEX directory, hash-dedup, enqueues jobs |
| parsing-worker | Deployment (N replicas) | Processes SBOMs (SPDX→ClickHouse), VEX files, OSV lookups, license resolution, compliance checks |
| api-gateway | Deployment | Stateless REST API (24 endpoints) |
| cve-refresher | K8s CronJob (daily) | Checks all known PURLs for newly disclosed CVEs |
Data Flow
┌────────────────────────────────────────────────────┐
│ SBOM Sources                                       │
│ S3 (default):                                      │
│   s3://cncf-subproject-sboms/k3s-io/...spdx.json   │
│ Local (alternative):                               │
│   sboms/*.spdx.json + *.openvex.json               │
└────────────────────────┬───────────────────────────┘
│ S3 ListObjects (streamed) + filepath.Walk (local)
│ SHA256 hashing + file-type detection (sbom|vex)
▼
Ingestion Watcher (CronJob)
│ Hash dedup → batch INSERT INTO ingestion_queue (500/batch)
▼
ClickHouse: ingestion_queue (status='pending')
│ SELECT + Claim (status='processing')
▼
Parsing Workers (N replicas)
├── Local files: os.Open(filepath.Join(sbomDir, sourceFile))
├── S3 files: s3.GetObject(bucket, key) → io.ReadCloser
├── job_type=sbom:
│ 1. Auto-detect format (SPDX / CycloneDX / in-toto envelope)
│ 2. Parse via appropriate backend (built-in or protobom)
│ 3. Resolve unknown licenses via GitHub API
│    (well-known Go module mappings + API fallback + static overrides)
│ 4. Batch INSERT sboms + sbom_packages (with resolved licenses)
│ 5. OSV Batch Query → INSERT vulnerabilities
│ 6. License Compliance Check → INSERT license_compliance
└── job_type=vex: OpenVEX Parse → INSERT vex_statements
▼
ClickHouse: sboms, sbom_packages, vulnerabilities, license_compliance, vex_statements
│
│ ┌──────────────────────────────────┐
│ │ CVE Refresher (CronJob, daily) │
│ │ OSV BatchQuery (1000/chunk) │
│ │ Dedup + reverse-lookup + INSERT │
│ └──────────────────────────────────┘
▼
API Gateway (REST) → 24 Endpoints → Angular UI
ClickHouse Schema
| Table | Engine | Purpose |
|---|---|---|
| sboms | ReplacingMergeTree | SBOM metadata |
| sbom_packages | MergeTree | Parallel arrays (names, PURLs, licenses, relationships) |
| vulnerabilities | MergeTree | OSV results |
| license_compliance | SummingMergeTree | License compliance per SBOM |
| ingestion_queue | ReplacingMergeTree | Job queue (job_type: sbom/vex) |
| dashboard_stats_mv | SummingMergeTree (MV) | Pre-aggregated daily stats |
| vex_statements | ReplacingMergeTree | OpenVEX statements |
| cve_refresh_log | MergeTree | CVE refresh run history |
| github_license_cache | ReplacingMergeTree | Resolved GitHub licenses cache |
| github_repo_metadata | ReplacingMergeTree | GitHub repo metadata (archived, fork, stars) |
All core tables (sboms, sbom_packages, vulnerabilities, license_compliance, ingestion_queue, vex_statements) include a `cluster LowCardinality(String) DEFAULT ''` column for multi-cluster support.
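sbom_packages stores one row per SBOM with parallel arrays, so index i in every array must describe the same package. The following minimal sketch turns such a row back into per-package records. The struct and field names (`packagesRow`, `Names`, `PURLs`, `Licenses`) are hypothetical stand-ins for the real array columns, and the relationships array is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// packagesRow is a hypothetical Go view of one sbom_packages row.
type packagesRow struct {
	SBOMID   string
	Cluster  string
	Names    []string
	PURLs    []string
	Licenses []string
}

// pkg is one package reassembled from the parallel arrays.
type pkg struct {
	Name, PURL, License string
}

// unzip rebuilds per-package records; index i of every array describes the same package.
func (r packagesRow) unzip() ([]pkg, error) {
	if len(r.PURLs) != len(r.Names) || len(r.Licenses) != len(r.Names) {
		return nil, errors.New("parallel arrays have different lengths")
	}
	out := make([]pkg, len(r.Names))
	for i := range r.Names {
		out[i] = pkg{Name: r.Names[i], PURL: r.PURLs[i], License: r.Licenses[i]}
	}
	return out, nil
}

func main() {
	row := packagesRow{
		SBOMID:   "example-sbom",
		Names:    []string{"a", "b"},
		PURLs:    []string{"pkg:golang/example.com/a@v1.0.0", "pkg:golang/example.com/b@v2.1.0"},
		Licenses: []string{"Apache-2.0", "MIT"},
	}
	pkgs, err := row.unzip()
	if err != nil {
		panic(err)
	}
	fmt.Println(pkgs[1].Name, pkgs[1].License) // b MIT
}
```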
Multi-Cluster Data Model
SeeBOM supports tagging all ingested data with a cluster identifier for multi-cluster deployments. This is fully optional — single-instance deployments work without any configuration.
How it works
┌────────────────────────────────────────┐
│ S3 Buckets with per-bucket cluster     │
│                                        │
│   bucket: prod-eu-sboms                │
│   cluster: "prod-eu"                   │
│                                        │
│   bucket: staging-sboms                │
│   cluster: "staging"                   │
│                                        │
│   bucket: other-sboms                  │
│   cluster: "" (inherits CLUSTER_NAME)  │
└───────────────────┬────────────────────┘
│
▼
Ingestion Watcher
(resolves cluster per object)
│
▼
ingestion_queue.cluster = "prod-eu" | "staging" | ""
│
▼
Parsing Worker
(propagates job.Cluster → all inserts)
│
▼
sboms.cluster / vulnerabilities.cluster / etc.
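The watcher's per-object cluster resolution follows the priority order given under Configuration, and it reduces to a two-step fallback. This sketch assumes S3_BUCKETS is a JSON array of objects with `bucket` and `cluster` keys. Only the per-bucket "cluster" field is documented here, so the `bucket` key and the `bucketConfig`/`resolveCluster` names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// bucketConfig is a hypothetical entry in the S3_BUCKETS JSON array.
type bucketConfig struct {
	Bucket  string `json:"bucket"`
	Cluster string `json:"cluster"`
}

// resolveCluster applies the priority: per-bucket cluster, then the global
// CLUSTER_NAME, then "" (single-instance mode).
func resolveCluster(b bucketConfig, globalClusterName string) string {
	if b.Cluster != "" {
		return b.Cluster
	}
	return globalClusterName // may itself be "", meaning no cluster
}

func main() {
	raw := `[{"bucket":"prod-eu-sboms","cluster":"prod-eu"},
	         {"bucket":"staging-sboms","cluster":"staging"},
	         {"bucket":"other-sboms"}]`
	var buckets []bucketConfig
	if err := json.Unmarshal([]byte(raw), &buckets); err != nil {
		panic(err)
	}
	global := os.Getenv("CLUSTER_NAME") // "" when unset
	for _, b := range buckets {
		fmt.Printf("%s -> %q\n", b.Bucket, resolveCluster(b, global))
	}
}
```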
Configuration
| Method | Use case |
|---|---|
| No config (default) | Single instance, no cluster differentiation |
| CLUSTER_NAME=prod-eu | All data from this instance tagged as prod-eu |
| Per-bucket "cluster" in S3_BUCKETS JSON | One watcher instance ingests from multiple clusters |
| Mix: per-bucket + CLUSTER_NAME fallback | Buckets without explicit cluster inherit the global value |
Priority
- Per-bucket `cluster` field in S3 config (highest)
- Global `CLUSTER_NAME` environment variable (fallback)
- Empty string `""` (no cluster, single-instance mode)
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /healthz | Health check |
| GET | /livez | Liveness probe |
| GET | /readyz | Readiness probe (checks ClickHouse) |
| GET | /api/v1/stats/dashboard | Dashboard statistics |
| GET | /api/v1/stats/dependencies?limit=N | Top-N dependencies cross-project |
| GET | /api/v1/stats/version-skew?page=&page_size=&search= | Version skew detection |
| GET | /api/v1/sboms?page=&page_size= | Paginated SBOM list |
| GET | /api/v1/sboms/{id}/detail | SBOM detail with severity breakdown |
| GET | /api/v1/sboms/{id}/vulnerabilities | Vulnerabilities for an SBOM |
| GET | /api/v1/sboms/{id}/licenses | License breakdown for an SBOM |
| GET | /api/v1/sboms/{id}/dependencies | Dependency tree |
| GET | /api/v1/vulnerabilities?page=&vex_filter= | Paginated vulnerabilities |
| GET | /api/v1/vulnerabilities/{id}/affected-projects | CVE impact across projects |
| GET | /api/v1/licenses/compliance | Global license compliance |
| GET | /api/v1/projects?page=&page_size=&search= | Grouped project listing |
| GET | /api/v1/projects/license-compliance | Projects with license violations |
| GET | /api/v1/license-exceptions | Active license exceptions |
| GET | /api/v1/license-policy | Active license policy |
| GET | /api/v1/vex/statements?page=&page_size= | Paginated VEX statements |
| GET | /api/v1/packages/archived | Archived GitHub repo packages |
| GET | /api/v1/packages/search?q=&page=&page_size= | Fuzzy package name search |
| GET | /api/v1/packages/detail?name=&page=&page_size= | All projects using a specific package |
| GET | /api/v1/clusters | List all clusters with summary stats |
| GET | /api/v1/clusters/{name}/stats | Per-cluster dashboard statistics |
| GET | /api/v1/clusters/{name}/sboms?page=&page_size= | SBOMs for a specific cluster |
VEX Architecture
- Format: OpenVEX (JSON, Spec v0.2.0)
- File Detection: `*.openvex.json` or `*.vex.json`
- Statuses: `not_affected`, `affected`, `fixed`, `under_investigation`
- URL Normalization: VEX vulnerability `@id` URLs are reduced to plain IDs
- Dashboard: `effective_vulnerabilities = total - suppressed_by_vex`
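Here is a minimal sketch of the URL normalization and the dashboard formula. It assumes normalization keeps the last path segment of an http(s) `@id` URL. It also assumes a finding counts as suppressed when a statement marks it `not_affected` or `fixed`. The exact rules SeeBOM applies are not spelled out on this page, and `normalizeVulnID`, `statement` and `effectiveVulnerabilities` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// normalizeVulnID reduces an http(s) @id URL to its last path segment;
// plain IDs such as CVE-2024-1234 pass through unchanged.
func normalizeVulnID(id string) string {
	u, err := url.Parse(id)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return id
	}
	seg := path.Base(u.Path) // path.Base also strips trailing slashes
	if seg == "." || seg == "/" {
		return id // URL without a usable path segment
	}
	return seg
}

// statement is a hypothetical, trimmed-down OpenVEX statement.
type statement struct {
	Vulnerability string
	Status        string // not_affected, affected, fixed, under_investigation
}

// effectiveVulnerabilities computes total - suppressed_by_vex, assuming
// not_affected and fixed statements suppress a finding.
func effectiveVulnerabilities(findings []string, stmts []statement) int {
	suppressed := map[string]bool{}
	for _, s := range stmts {
		if s.Status == "not_affected" || s.Status == "fixed" {
			suppressed[normalizeVulnID(s.Vulnerability)] = true
		}
	}
	n := 0
	for _, f := range findings {
		if suppressed[f] {
			n++
		}
	}
	return len(findings) - n
}

func main() {
	fmt.Println(normalizeVulnID("https://nvd.nist.gov/vuln/detail/CVE-2024-1234")) // CVE-2024-1234
	findings := []string{"CVE-2024-1234", "GHSA-aaaa-bbbb-cccc"}
	stmts := []statement{{Vulnerability: "https://nvd.nist.gov/vuln/detail/CVE-2024-1234", Status: "not_affected"}}
	fmt.Println(effectiveVulnerabilities(findings, stmts)) // 1
}
```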
CVE Refresher
Lightweight daily CronJob that queries all unique PURLs (~20k) against the OSV API in 1000-PURL batch chunks, deduplicates against existing vulnerabilities, and inserts new findings — without re-scanning all SBOMs.
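The refresher's core loop can be sketched as chunking plus set-based deduplication. The OSV call is injected as a `query` function so the example runs offline. Existing findings are assumed to be loaded as a set of (vulnerability, PURL) pairs. `chunk`, `finding` and `newFindings` are hypothetical names, and the reverse lookup from PURL to SBOM is omitted.

```go
package main

import "fmt"

// chunk splits purls into slices of at most size elements (1000 for OSV querybatch).
func chunk(purls []string, size int) [][]string {
	var out [][]string
	for len(purls) > size {
		out = append(out, purls[:size])
		purls = purls[size:]
	}
	if len(purls) > 0 {
		out = append(out, purls)
	}
	return out
}

// finding is a hypothetical (vulnerability, package) pair.
type finding struct{ VulnID, PURL string }

// newFindings queries each chunk and keeps only findings not already stored.
// query stands in for the OSV batch call.
func newFindings(purls []string, existing map[finding]bool, query func([]string) []finding) []finding {
	var fresh []finding
	for _, c := range chunk(purls, 1000) {
		for _, f := range query(c) {
			if !existing[f] {
				existing[f] = true // avoid duplicates across chunks
				fresh = append(fresh, f)
			}
		}
	}
	return fresh
}

func main() {
	purls := make([]string, 2500)
	for i := range purls {
		purls[i] = fmt.Sprintf("pkg:golang/example.com/mod%d@v1.0.0", i)
	}
	fmt.Println(len(chunk(purls, 1000))) // 3
	fake := func(c []string) []finding { return []finding{{"CVE-2024-0001", c[0]}} }
	existing := map[finding]bool{{"CVE-2024-0001", purls[0]}: true}
	fmt.Println(len(newFindings(purls, existing, fake))) // 2
}
```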
OSV Integration
- Endpoint: `POST https://api.osv.dev/v1/querybatch`
- Batch Limit: 1000 PURLs per request
- Rate Limiting: Token bucket (10 req/s, burst 5)
- Retry: Exponential backoff on HTTP 429/503
License Governance
- License Policy (`license-policy.json`): Defines permissive vs. copyleft classifications
- License Exceptions (`license-exceptions.json`): CNCF format, supporting both blanket and specific exceptions
- Permissive licenses (MIT, Apache-2.0, BSD) are never tracked as non-compliant
- Visual: Green = exempted copyleft, Red = violation, Orange = exempted in dependency tree
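The classification rules can be sketched as a single decision function. The JSON layouts of license-policy.json and license-exceptions.json are not shown on this page, so the `policy` and `exceptions` structs are hypothetical in-memory forms. Treating licenses that appear in neither list as "unclassified" is also an assumption.

```go
package main

import "fmt"

// policy is a hypothetical in-memory form of license-policy.json.
type policy struct {
	Permissive map[string]bool
	Copyleft   map[string]bool
}

// exceptions is a hypothetical in-memory form of license-exceptions.json.
type exceptions struct {
	Blanket  map[string]bool            // license exempted for every package
	Specific map[string]map[string]bool // package -> licenses exempted for it
}

// status classifies one package license as ok, exempted, violation or unclassified.
func status(pkgName, license string, p policy, ex exceptions) string {
	switch {
	case p.Permissive[license]:
		return "ok" // permissive licenses are never non-compliant
	case !p.Copyleft[license]:
		return "unclassified" // assumption: handling of unlisted licenses is not documented
	case ex.Blanket[license] || ex.Specific[pkgName][license]:
		return "exempted"
	default:
		return "violation"
	}
}

func main() {
	p := policy{
		Permissive: map[string]bool{"MIT": true, "Apache-2.0": true, "BSD-3-Clause": true},
		Copyleft:   map[string]bool{"GPL-3.0-only": true, "MPL-2.0": true},
	}
	ex := exceptions{
		Blanket:  map[string]bool{"MPL-2.0": true},
		Specific: map[string]map[string]bool{"example.com/gpl-tool": {"GPL-3.0-only": true}},
	}
	fmt.Println(status("any", "MIT", p, ex))                           // ok
	fmt.Println(status("any", "MPL-2.0", p, ex))                       // exempted
	fmt.Println(status("example.com/gpl-tool", "GPL-3.0-only", p, ex)) // exempted
	fmt.Println(status("other", "GPL-3.0-only", p, ex))                // violation
}
```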
SBOM Parsers
SeeBOM supports multiple SBOM formats through a format-detection dispatch layer (internal/sbom):
| Format | Detection | Parser |
|---|---|---|
| SPDX 2.3 JSON | spdxVersion field present | Built-in (internal/spdx) |
| In-toto envelope (SPDX) | predicateType contains “spdx” | Built-in (internal/spdx) |
| CycloneDX 1.0–1.7 JSON | bomFormat: "CycloneDX" | Built-in (internal/cyclonedx) |
| All above via protobom | (opt-in) | internal/protobomparser |
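The detection column of the table can be expressed as a probe over top-level JSON keys. This sketch uses `encoding/json` to stay self-contained, whereas the built-in parsers use `goccy/go-json`. It assumes the in-toto statement is not wrapped in a base64-encoded DSSE envelope. `detectFormat` and its return labels are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// detectFormat inspects top-level JSON fields, following the detection table.
func detectFormat(data []byte) (string, error) {
	var probe struct {
		SPDXVersion   string `json:"spdxVersion"`
		PredicateType string `json:"predicateType"`
		BOMFormat     string `json:"bomFormat"`
	}
	if err := json.Unmarshal(data, &probe); err != nil {
		return "", err
	}
	switch {
	case probe.SPDXVersion != "":
		return "spdx", nil
	case strings.Contains(probe.PredicateType, "spdx"):
		return "in-toto-spdx", nil
	case probe.BOMFormat == "CycloneDX":
		return "cyclonedx", nil
	default:
		return "", errors.New("unrecognized SBOM format")
	}
}

func main() {
	for _, doc := range []string{
		`{"spdxVersion":"SPDX-2.3","name":"example"}`,
		`{"_type":"https://in-toto.io/Statement/v0.1","predicateType":"https://spdx.dev/Document"}`,
		`{"bomFormat":"CycloneDX","specVersion":"1.6"}`,
	} {
		f, err := detectFormat([]byte(doc))
		fmt.Println(f, err)
	}
}
```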
File extensions recognized: .spdx.json, .cdx.json, .json (any JSON file — format auto-detected at parse time)
Files starting with a configurable prefix (SBOM_IGNORE_PREFIX, default _) are skipped during local filesystem scanning. Config files (license-policy.json, license-exceptions.json) are always excluded.
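The local scanning filter can be sketched as one predicate. `shouldIngest` and `ignorePrefix` are hypothetical names. The sketch assumes the prefix and config-file checks apply to the file's base name, and it treats an empty prefix as "skip nothing".

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// configFiles are never treated as SBOMs.
var configFiles = map[string]bool{
	"license-policy.json":     true,
	"license-exceptions.json": true,
}

// ignorePrefix returns SBOM_IGNORE_PREFIX, defaulting to "_" when unset or empty.
func ignorePrefix() string {
	if p := os.Getenv("SBOM_IGNORE_PREFIX"); p != "" {
		return p
	}
	return "_"
}

// shouldIngest decides whether a file found during the local walk is queued.
func shouldIngest(p, prefix string) bool {
	name := filepath.Base(p)
	if configFiles[name] {
		return false
	}
	if prefix != "" && strings.HasPrefix(name, prefix) {
		return false
	}
	// .spdx.json and .cdx.json are covered by the generic .json suffix;
	// the real format is detected later, at parse time.
	return strings.HasSuffix(name, ".json")
}

func main() {
	prefix := ignorePrefix()
	for _, p := range []string{"sboms/app.spdx.json", "sboms/_draft.spdx.json", "sboms/license-policy.json", "sboms/README.md"} {
		fmt.Println(p, shouldIngest(p, prefix))
	}
}
```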
Two parser backends are available:
- Built-in (default): Lightweight, high-performance parsers using `goccy/go-json`. Zero additional dependencies. Best for production with known formats.
- Protobom (opt-in): Uses `github.com/protobom/protobom` for maximum format coverage. Enable with `USE_PROTOBOM=true`.
See the Parsers documentation for configuration details and trade-offs.
GitHub License Resolution
For packages with NOASSERTION or empty licenses (common in container-image SBOMs generated by Syft), the parsing worker resolves licenses via the GitHub API using multiple strategies:
- Direct PURL extraction: `pkg:golang/github.com/{owner}/{repo}` → `github.com/{owner}/{repo}`
- Well-known Go module mappings (50+ entries): Maps non-GitHub import paths to their GitHub repos:
  - `golang.org/x/*` → `github.com/golang/*`
  - `gopkg.in/yaml.v3` → `github.com/go-yaml/yaml`
  - `go.uber.org/zap` → `github.com/uber-go/zap`
  - `k8s.io/client-go` → `github.com/kubernetes/client-go`
  - `oras.land/oras-go` → `github.com/oras-project/oras-go`
  - `dario.cat/mergo` → `github.com/darccio/mergo`
  - And many more (see `internal/github/purl.go`)
- Fallback to the `/license` endpoint: If the repo API returns `NOASSERTION`, the dedicated `/repos/{owner}/{repo}/license` endpoint is tried (it performs deeper file analysis)
- Static overrides: For repos where even GitHub's license detection fails (returns "Other"), manually verified overrides are applied (e.g., `opencontainers/go-digest` → Apache-2.0, `shopspring/decimal` → MIT)
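The first two strategies can be sketched as a PURL-to-repository function. The mapping table below covers only the examples listed above. The full table lives in internal/github/purl.go, whose structure is not shown here, and `wellKnown` and `githubRepoFromPURL` are hypothetical names. The resulting owner/repo would then be looked up through the GitHub REST endpoints `/repos/{owner}/{repo}` and, on `NOASSERTION`, `/repos/{owner}/{repo}/license`.

```go
package main

import (
	"fmt"
	"strings"
)

// wellKnown maps non-GitHub Go module paths to GitHub owner/repo (excerpt only).
var wellKnown = map[string]string{
	"gopkg.in/yaml.v3":  "go-yaml/yaml",
	"go.uber.org/zap":   "uber-go/zap",
	"k8s.io/client-go":  "kubernetes/client-go",
	"oras.land/oras-go": "oras-project/oras-go",
	"dario.cat/mergo":   "darccio/mergo",
}

// githubRepoFromPURL returns "owner/repo" for a pkg:golang PURL, or "" if unknown.
func githubRepoFromPURL(purl string) string {
	const prefix = "pkg:golang/"
	if !strings.HasPrefix(purl, prefix) {
		return ""
	}
	mod := strings.TrimPrefix(purl, prefix)
	// Drop version, qualifiers and subpath: <path>@<version>?<qualifiers>#<subpath>
	if i := strings.IndexAny(mod, "@?#"); i >= 0 {
		mod = mod[:i]
	}
	parts := strings.Split(mod, "/")
	if len(parts) >= 3 && parts[0] == "github.com" {
		return parts[1] + "/" + parts[2] // strategy 1: direct extraction
	}
	if len(parts) >= 3 && parts[0] == "golang.org" && parts[1] == "x" {
		return "golang/" + parts[2] // golang.org/x/* → github.com/golang/*
	}
	// Strategy 2: well-known mappings, trying the longest module prefix first.
	for n := len(parts); n > 0; n-- {
		if repo, ok := wellKnown[strings.Join(parts[:n], "/")]; ok {
			return repo
		}
	}
	return ""
}

func main() {
	fmt.Println(githubRepoFromPURL("pkg:golang/github.com/opencontainers/go-digest@v1.0.0")) // opencontainers/go-digest
	fmt.Println(githubRepoFromPURL("pkg:golang/golang.org/x/net@v0.25.0"))                  // golang/net
	fmt.Println(githubRepoFromPURL("pkg:golang/k8s.io/client-go@v0.30.0"))                  // kubernetes/client-go
}
```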
Results are cached in memory per worker and persisted to the github_license_cache and github_repo_metadata ClickHouse tables for cross-worker reuse.
License resolution runs before the ClickHouse insert so that sbom_packages.package_licenses contains the resolved values from the start. This ensures the dependency tree API returns correct licenses without requiring a separate join or lookup.
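Here is a minimal sketch of the per-worker cache and the resolve-before-insert ordering. `licenseCache`, `fillLicenses` and the injected `resolve` function are hypothetical, and persistence to github_license_cache is omitted. The lock is held while resolving, which keeps the sketch simple at the cost of serializing lookups.

```go
package main

import (
	"fmt"
	"sync"
)

// licenseCache memoizes owner/repo -> license lookups within one worker.
type licenseCache struct {
	mu      sync.Mutex
	entries map[string]string
	resolve func(repo string) string // stands in for the GitHub API strategies
}

func newLicenseCache(resolve func(string) string) *licenseCache {
	return &licenseCache{entries: map[string]string{}, resolve: resolve}
}

// get returns a cached license, or resolves and stores it on first use.
func (c *licenseCache) get(repo string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if lic, ok := c.entries[repo]; ok {
		return lic
	}
	lic := c.resolve(repo)
	c.entries[repo] = lic
	return lic
}

// fillLicenses replaces NOASSERTION or empty licenses before the batch INSERT,
// so the package_licenses array already holds resolved values.
func fillLicenses(licenses, repos []string, c *licenseCache) {
	for i, lic := range licenses {
		if (lic == "" || lic == "NOASSERTION") && repos[i] != "" {
			if resolved := c.get(repos[i]); resolved != "" {
				licenses[i] = resolved
			}
		}
	}
}

func main() {
	calls := 0
	cache := newLicenseCache(func(repo string) string {
		calls++
		return map[string]string{"shopspring/decimal": "MIT"}[repo]
	})
	licenses := []string{"NOASSERTION", "", "Apache-2.0"}
	repos := []string{"shopspring/decimal", "shopspring/decimal", ""}
	fillLicenses(licenses, repos, cache)
	fmt.Println(licenses, calls) // [MIT MIT Apache-2.0] 1
}
```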
Angular UI
13 lazy-loaded routes with virtual scrolling, OnPush change detection, a dark-mode toggle, and theming via CSS custom properties. Includes package search with fuzzy name matching and paginated detail views. External custom-theme.css and ui-config.json files can be mounted without rebuilding the frontend.