Architecture

System architecture, data flow, ClickHouse schema, and component overview.

This page contains the full architecture blueprint for SeeBOM.

TL;DR

Kubernetes-native SBOM platform as a monorepo. Go backend with four binaries (CronJob Ingestion-Watcher, scalable Parsing-Workers, stateless API-Gateway, background CVE-Refresher). ClickHouse as the analytical database with MergeTree tables and array-based dependency storage. Angular frontend with virtual scrolling, OnPush change detection, full-text search, dark-mode toggle, and custom CSS theming.

Components

Binary | Type | Purpose
ingestion-watcher | K8s CronJob | Scans SBOM/VEX directory, hash-dedup, enqueues jobs
parsing-worker | Deployment (N replicas) | Processes SBOMs (SPDX→ClickHouse), VEX files, OSV lookups, license resolution, compliance checks
api-gateway | Deployment | Stateless REST API (24 endpoints)
cve-refresher | K8s CronJob (daily) | Checks all known PURLs for newly disclosed CVEs

Data Flow

┌─────────────────────────────────────────────────────────┐
│                    SBOM Sources                          │
│  S3 (default):                                           │
│    s3://cncf-subproject-sboms/k3s-io/...spdx.json       │
│  Local (alternative):                                    │
│    sboms/*.spdx.json + *.openvex.json                   │
└──────────────────────┬──────────────────────────────────┘
       │ S3 ListObjects (streamed) + filepath.Walk (local)
       │ SHA256 hashing + file-type detection (sbom|vex)
       ▼
Ingestion Watcher (CronJob)
       │ Hash dedup → batch INSERT INTO ingestion_queue (500/batch)
       ▼
ClickHouse: ingestion_queue (status='pending')
       │ SELECT + Claim (status='processing')
       ▼
Parsing Workers (N replicas)
       ├── Local files: os.Open(filepath.Join(sbomDir, sourceFile))
       ├── S3 files:    s3.GetObject(bucket, key) → io.ReadCloser
       ├── job_type=sbom:
       │     1. Auto-detect format (SPDX / CycloneDX / in-toto envelope)
       │     2. Parse via appropriate backend (built-in or protobom)
       │     3. Resolve unknown licenses via GitHub API
       │        (well-known Go module mappings + API fallback + static overrides)
       │     4. Batch INSERT sboms + sbom_packages (with resolved licenses)
       │     5. OSV Batch Query → INSERT vulnerabilities
       │     6. License Compliance Check → INSERT license_compliance
       └── job_type=vex:  OpenVEX Parse → INSERT vex_statements
       ▼
ClickHouse: sboms, sbom_packages, vulnerabilities, license_compliance, vex_statements
       │
       │         ┌──────────────────────────────────┐
       │         │ CVE Refresher (CronJob, daily)   │
       │         │  OSV BatchQuery (1000/chunk)      │
       │         │  Dedup + reverse-lookup + INSERT  │
       │         └──────────────────────────────────┘
       ▼
API Gateway (REST) → 24 Endpoints → Angular UI
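
The watcher stage of the flow above (hash, classify, dedup, batch) can be sketched in Go as follows. This is an illustration under stated assumptions, not SeeBOM's actual code: the candidate type and the function names are hypothetical, the set of known hashes stands in for a lookup against ingestion_queue, and anything that does not match the VEX file suffixes is assumed to be an SBOM.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// candidate is a hypothetical record for one discovered file.
type candidate struct {
	Path    string
	Hash    string
	JobType string
}

// detectJobType returns "vex" for *.openvex.json / *.vex.json names and
// "sbom" for everything else (the fallback is an assumption).
func detectJobType(name string) string {
	lower := strings.ToLower(name)
	if strings.HasSuffix(lower, ".openvex.json") || strings.HasSuffix(lower, ".vex.json") {
		return "vex"
	}
	return "sbom"
}

// hashContent streams r through SHA-256 and returns the hex digest.
func hashContent(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// dedupe drops candidates whose hash is already in seen and records new ones.
func dedupe(seen map[string]bool, in []candidate) []candidate {
	out := make([]candidate, 0, len(in))
	for _, c := range in {
		if seen[c.Hash] {
			continue
		}
		seen[c.Hash] = true
		out = append(out, c)
	}
	return out
}

// batches splits in into slices of at most size elements.
func batches(in []candidate, size int) [][]candidate {
	var out [][]candidate
	for size > 0 && len(in) > 0 {
		n := size
		if len(in) < n {
			n = len(in)
		}
		out = append(out, in[:n])
		in = in[n:]
	}
	return out
}

func main() {
	// In-memory stand-ins for files found via S3 listing or a local walk.
	files := map[string]string{
		"k3s.spdx.json":    `{"spdxVersion":"SPDX-2.3"}`,
		"k3s.openvex.json": `{"@context":"https://openvex.dev/ns/v0.2.0"}`,
	}
	var found []candidate
	for name, body := range files {
		sum, err := hashContent(strings.NewReader(body))
		if err != nil {
			panic(err)
		}
		found = append(found, candidate{Path: name, Hash: sum, JobType: detectJobType(name)})
	}
	seen := map[string]bool{} // hashes already queued (assumed to be loaded from ClickHouse)
	for _, b := range batches(dedupe(seen, found), 500) {
		fmt.Printf("would enqueue %d job(s) in one INSERT\n", len(b))
	}
}
```

Hashing through an io.Reader keeps memory flat for large SBOMs, which matters when objects are streamed from S3 rather than read from disk.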

ClickHouse Schema

Table | Engine | Purpose
sboms | ReplacingMergeTree | SBOM metadata
sbom_packages | MergeTree | Parallel arrays (names, PURLs, licenses, relationships)
vulnerabilities | MergeTree | OSV results
license_compliance | SummingMergeTree | License compliance per SBOM
ingestion_queue | ReplacingMergeTree | Job queue (job_type: sbom/vex)
dashboard_stats_mv | SummingMergeTree (MV) | Pre-aggregated daily stats
vex_statements | ReplacingMergeTree | OpenVEX statements
cve_refresh_log | MergeTree | CVE refresh run history
github_license_cache | ReplacingMergeTree | Resolved GitHub licenses cache
github_repo_metadata | ReplacingMergeTree | GitHub repo metadata (archived, fork, stars)

All core tables (sboms, sbom_packages, vulnerabilities, license_compliance, ingestion_queue, vex_statements) include a cluster LowCardinality(String) DEFAULT '' column for multi-cluster support.
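
To make the parallel-array layout of sbom_packages concrete, here is a minimal Go sketch of how one row might be modeled on the application side. The struct and field names are hypothetical and the real column names may differ. The point is only that element i of every array describes the same package, so the arrays must stay the same length.

```go
package main

import (
	"errors"
	"fmt"
)

// PackageArrays models one hypothetical sbom_packages row: a single SBOM
// with its packages stored as parallel arrays.
type PackageArrays struct {
	SBOMID   string
	Cluster  string // empty in single-instance mode
	Names    []string
	PURLs    []string
	Licenses []string
}

// Package is the row-oriented view of one array index.
type Package struct {
	Name, PURL, License string
}

// Validate checks that all arrays have the same length.
func (p PackageArrays) Validate() error {
	if len(p.PURLs) != len(p.Names) || len(p.Licenses) != len(p.Names) {
		return errors.New("parallel arrays differ in length")
	}
	return nil
}

// Packages zips the parallel arrays back into per-package records.
func (p PackageArrays) Packages() ([]Package, error) {
	if err := p.Validate(); err != nil {
		return nil, err
	}
	out := make([]Package, len(p.Names))
	for i := range p.Names {
		out[i] = Package{Name: p.Names[i], PURL: p.PURLs[i], License: p.Licenses[i]}
	}
	return out, nil
}

func main() {
	row := PackageArrays{
		SBOMID:   "example-sbom",
		Names:    []string{"zap", "yaml"},
		PURLs:    []string{"pkg:golang/go.uber.org/zap@v1.27.0", "pkg:golang/gopkg.in/yaml.v3@v3.0.1"},
		Licenses: []string{"MIT", "NOASSERTION"},
	}
	pkgs, err := row.Packages()
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Printf("%s (%s): %s\n", p.Name, p.PURL, p.License)
	}
}
```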

Multi-Cluster Data Model

SeeBOM supports tagging all ingested data with a cluster identifier for multi-cluster deployments. This is fully optional — single-instance deployments work without any configuration.

How it works

┌────────────────────────────────────┐
│  S3 Buckets with per-bucket cluster │
│                                      │
│  bucket: prod-eu-sboms               │
│  cluster: "prod-eu"                  │
│                                      │
│  bucket: staging-sboms               │
│  cluster: "staging"                  │
│                                      │
│  bucket: other-sboms                 │
│  cluster: "" (inherits CLUSTER_NAME) │
└────────────────┬─────────────────────┘
                 │
                 ▼
    Ingestion Watcher
    (resolves cluster per object)
                 │
                 ▼
    ingestion_queue.cluster = "prod-eu" | "staging" | ""
                 │
                 ▼
    Parsing Worker
    (propagates job.Cluster → all inserts)
                 │
                 ▼
    sboms.cluster / vulnerabilities.cluster / etc.

Configuration

Method | Use case
No config (default) | Single instance, no cluster differentiation
CLUSTER_NAME=prod-eu | All data from this instance tagged as prod-eu
Per-bucket "cluster" in S3_BUCKETS JSON | One watcher instance ingests from multiple clusters
Mix: per-bucket + CLUSTER_NAME fallback | Buckets without explicit cluster inherit the global value

Priority

  1. Per-bucket cluster field in S3 config (highest)
  2. Global CLUSTER_NAME environment variable (fallback)
  3. Empty string "" (no cluster, single-instance mode)
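
The priority order above reduces to a two-step fallback. The sketch below shows it in Go. The bucket type and the resolveCluster name are hypothetical, and the bucket list is hard-coded rather than parsed from S3_BUCKETS, whose exact JSON shape is not described here.

```go
package main

import (
	"fmt"
	"os"
)

// bucket is a hypothetical, already-parsed entry of the S3 bucket config.
type bucket struct {
	Name    string
	Cluster string // empty means "inherit the global value"
}

// resolveCluster applies the priority: per-bucket value first, then the
// global CLUSTER_NAME value, then the empty string.
func resolveCluster(bucketCluster, globalCluster string) string {
	if bucketCluster != "" {
		return bucketCluster
	}
	return globalCluster // may itself be "" in single-instance mode
}

func main() {
	global := os.Getenv("CLUSTER_NAME")
	buckets := []bucket{
		{Name: "prod-eu-sboms", Cluster: "prod-eu"},
		{Name: "staging-sboms", Cluster: "staging"},
		{Name: "other-sboms", Cluster: ""},
	}
	for _, b := range buckets {
		fmt.Printf("%s -> %q\n", b.Name, resolveCluster(b.Cluster, global))
	}
}
```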

API Endpoints

Method | Endpoint | Description
GET | /healthz | Health check
GET | /livez | Liveness probe
GET | /readyz | Readiness probe (checks ClickHouse)
GET | /api/v1/stats/dashboard | Dashboard statistics
GET | /api/v1/stats/dependencies?limit=N | Top-N dependencies cross-project
GET | /api/v1/stats/version-skew?page=&page_size=&search= | Version skew detection
GET | /api/v1/sboms?page=&page_size= | Paginated SBOM list
GET | /api/v1/sboms/{id}/detail | SBOM detail with severity breakdown
GET | /api/v1/sboms/{id}/vulnerabilities | Vulnerabilities for an SBOM
GET | /api/v1/sboms/{id}/licenses | License breakdown for an SBOM
GET | /api/v1/sboms/{id}/dependencies | Dependency tree
GET | /api/v1/vulnerabilities?page=&vex_filter= | Paginated vulnerabilities
GET | /api/v1/vulnerabilities/{id}/affected-projects | CVE impact across projects
GET | /api/v1/licenses/compliance | Global license compliance
GET | /api/v1/projects?page=&page_size=&search= | Grouped project listing
GET | /api/v1/projects/license-compliance | Projects with license violations
GET | /api/v1/license-exceptions | Active license exceptions
GET | /api/v1/license-policy | Active license policy
GET | /api/v1/vex/statements?page=&page_size= | Paginated VEX statements
GET | /api/v1/packages/archived | Archived GitHub repo packages
GET | /api/v1/packages/search?q=&page=&page_size= | Fuzzy package name search
GET | /api/v1/packages/detail?name=&page=&page_size= | All projects using a specific package
GET | /api/v1/clusters | List all clusters with summary stats
GET | /api/v1/clusters/{name}/stats | Per-cluster dashboard statistics
GET | /api/v1/clusters/{name}/sboms?page=&page_size= | SBOMs for a specific cluster
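
As a small client-side illustration, the following Go sketch builds the URL for the paginated SBOM list endpoint. The base address is a placeholder, sbomListURL is a hypothetical helper, and whether page numbering starts at 0 or 1 is not stated in the table, so the values passed are examples only.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// sbomListURL returns base + /api/v1/sboms with page and page_size set.
func sbomListURL(base string, page, pageSize int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/api/v1/sboms"
	q := url.Values{}
	q.Set("page", strconv.Itoa(page))
	q.Set("page_size", strconv.Itoa(pageSize))
	u.RawQuery = q.Encode() // keys are emitted in sorted order
	return u.String(), nil
}

func main() {
	// The host and port are placeholders for wherever the API gateway is exposed.
	target, err := sbomListURL("http://localhost:8080", 2, 50)
	if err != nil {
		panic(err)
	}
	fmt.Println(target)
}
```

Building the query with url.Values instead of string concatenation keeps search terms and package names correctly escaped when the same pattern is reused for the search endpoints.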

VEX Architecture

  • Format: OpenVEX (JSON, Spec v0.2.0)
  • File Detection: *.openvex.json or *.vex.json
  • Statuses: not_affected, affected, fixed, under_investigation
  • URL Normalization: VEX vulnerability @id URLs are reduced to plain IDs
  • Dashboard: effective_vulnerabilities = total - suppressed_by_vex
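
The URL normalization and the dashboard formula can be sketched as below. The exact normalization rule SeeBOM applies is not spelled out here, so this version assumes the plain ID is simply the last path segment of the @id URL. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// validStatus reports whether s is one of the four OpenVEX statuses.
func validStatus(s string) bool {
	switch s {
	case "not_affected", "affected", "fixed", "under_investigation":
		return true
	}
	return false
}

// normalizeVulnID reduces a vulnerability @id URL to a plain ID by taking
// the last path segment (assumed rule). Plain IDs pass through unchanged.
func normalizeVulnID(id string) string {
	id = strings.TrimRight(strings.TrimSpace(id), "/")
	if i := strings.LastIndex(id, "/"); i >= 0 {
		return id[i+1:]
	}
	return id
}

// effectiveVulnerabilities implements total - suppressed_by_vex.
func effectiveVulnerabilities(total, suppressedByVEX int) int {
	return total - suppressedByVEX
}

func main() {
	fmt.Println(normalizeVulnID("https://nvd.nist.gov/vuln/detail/CVE-2024-1234"))
	fmt.Println(validStatus("not_affected"))
	fmt.Println(effectiveVulnerabilities(10, 3))
}
```

Normalizing to a plain ID lets VEX statements be matched against OSV results by string equality, regardless of which advisory site the VEX author linked to.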

CVE Refresher

Lightweight daily CronJob that queries all unique PURLs (~20k) against the OSV API in 1000-PURL batch chunks, deduplicates against existing vulnerabilities, and inserts new findings — without re-scanning all SBOMs.

OSV Integration

  • Endpoint: POST https://api.osv.dev/v1/querybatch
  • Batch Limit: 1000 PURLs per request
  • Rate Limiting: Token bucket (10 req/s, burst 5)
  • Retry: Exponential backoff on HTTP 429/503
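
A minimal Go sketch of the client-side pieces listed above: chunking PURLs to the batch limit, building the querybatch request body, and computing retry delays. The helper names and the backoff base and cap are assumptions. The token-bucket limiter and the HTTP call itself are left out to keep the example dependency-free.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

const osvBatchLimit = 1000 // PURLs per querybatch request

// Request shape for POST https://api.osv.dev/v1/querybatch.
type osvPackage struct {
	PURL string `json:"purl"`
}
type osvQuery struct {
	Package osvPackage `json:"package"`
}
type osvBatchRequest struct {
	Queries []osvQuery `json:"queries"`
}

// chunk splits purls into slices of at most size elements.
func chunk(purls []string, size int) [][]string {
	var out [][]string
	for size > 0 && len(purls) > 0 {
		n := size
		if len(purls) < n {
			n = len(purls)
		}
		out = append(out, purls[:n])
		purls = purls[n:]
	}
	return out
}

// buildBatchBody serializes one chunk into a querybatch JSON body.
func buildBatchBody(purls []string) ([]byte, error) {
	req := osvBatchRequest{Queries: make([]osvQuery, 0, len(purls))}
	for _, p := range purls {
		req.Queries = append(req.Queries, osvQuery{Package: osvPackage{PURL: p}})
	}
	return json.Marshal(req)
}

// shouldRetry reports whether a response status warrants a retry.
func shouldRetry(status int) bool {
	return status == http.StatusTooManyRequests || status == http.StatusServiceUnavailable
}

// backoff doubles base once per attempt and caps the result at max.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

func main() {
	purls := make([]string, 20000) // roughly the PURL count mentioned for the CVE refresher
	for i := range purls {
		purls[i] = fmt.Sprintf("pkg:golang/example.com/mod%d@v1.0.0", i)
	}
	chunks := chunk(purls, osvBatchLimit)
	body, err := buildBatchBody(chunks[0])
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d requests, first body is %d bytes\n", len(chunks), len(body))
	for attempt := 0; attempt < 4; attempt++ {
		fmt.Println("retry delay:", backoff(attempt, 500*time.Millisecond, 10*time.Second))
	}
}
```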

License Governance

  • License Policy (license-policy.json): Defines permissive vs. copyleft classifications
  • License Exceptions (license-exceptions.json): CNCF format, blanket + specific
  • Permissive licenses (MIT, Apache-2.0, BSD) are never tracked as non-compliant
  • Visual: Green = exempted copyleft, Red = violation, Orange = exempted in dependency tree

SBOM Parsers

SeeBOM supports multiple SBOM formats through a format-detection dispatch layer (internal/sbom):

Format | Detection | Parser
SPDX 2.3 JSON | spdxVersion field present | Built-in (internal/spdx)
In-toto envelope (SPDX) | predicateType contains "spdx" | Built-in (internal/spdx)
CycloneDX 1.0–1.7 JSON | bomFormat: "CycloneDX" | Built-in (internal/cyclonedx)
All above via protobom | (opt-in) | internal/protobomparser
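
The detection rules in the table can be sketched as a small probe that decodes only the discriminating top-level fields. This is an illustration, not the internal/sbom implementation: it uses the standard library's encoding/json instead of goccy/go-json to stay self-contained, the returned format labels are made up, and the case-insensitive match on predicateType is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// probe holds only the top-level fields needed to pick a parser.
type probe struct {
	SPDXVersion   string `json:"spdxVersion"`
	BOMFormat     string `json:"bomFormat"`
	PredicateType string `json:"predicateType"`
}

// detectFormat returns a hypothetical label for the detected SBOM format.
func detectFormat(data []byte) (string, error) {
	var p probe
	if err := json.Unmarshal(data, &p); err != nil {
		return "", fmt.Errorf("not a JSON object: %w", err)
	}
	switch {
	case p.SPDXVersion != "":
		return "spdx", nil
	case p.BOMFormat == "CycloneDX":
		return "cyclonedx", nil
	case strings.Contains(strings.ToLower(p.PredicateType), "spdx"):
		return "intoto-spdx", nil
	}
	return "", errors.New("unrecognized SBOM format")
}

func main() {
	samples := []string{
		`{"spdxVersion":"SPDX-2.3","name":"k3s"}`,
		`{"bomFormat":"CycloneDX","specVersion":"1.5"}`,
		`{"predicateType":"https://spdx.dev/Document","predicate":{}}`,
		`{"hello":"world"}`,
	}
	for _, s := range samples {
		format, err := detectFormat([]byte(s))
		fmt.Println(format, err)
	}
}
```

Probing fields rather than trusting the file extension is what allows any .json file to be accepted and classified at parse time.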

File extensions recognized: .spdx.json, .cdx.json, .json (any JSON file — format auto-detected at parse time)

Files starting with a configurable prefix (SBOM_IGNORE_PREFIX, default _) are skipped during local filesystem scanning. Config files (license-policy.json, license-exceptions.json) are always excluded.
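
The skip rules for local scanning might look like the following Go sketch. The shouldSkip helper is hypothetical, and treating every non-.json file as skipped is an assumption derived from the list of recognized extensions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// shouldSkip reports whether a local file is excluded from scanning.
func shouldSkip(name, ignorePrefix string) bool {
	base := filepath.Base(name)
	switch base {
	case "license-policy.json", "license-exceptions.json":
		return true // config files are always excluded
	}
	if ignorePrefix != "" && strings.HasPrefix(base, ignorePrefix) {
		return true
	}
	// Only JSON files are candidates; the format is detected at parse time.
	return !strings.HasSuffix(strings.ToLower(base), ".json")
}

func main() {
	prefix, ok := os.LookupEnv("SBOM_IGNORE_PREFIX")
	if !ok {
		prefix = "_" // documented default
	}
	for _, f := range []string{
		"sboms/k3s.spdx.json",
		"sboms/_draft.spdx.json",
		"sboms/license-policy.json",
		"sboms/README.md",
	} {
		fmt.Printf("%-28s skip=%v\n", f, shouldSkip(f, prefix))
	}
}
```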

Two parser backends are available:

  • Built-in (default) — Lightweight, high-performance parsers using goccy/go-json. Zero additional dependencies. Best for production with known formats.
  • Protobom (opt-in) — Uses github.com/protobom/protobom for maximum format coverage. Enable with USE_PROTOBOM=true.

See the Parsers documentation for configuration details and trade-offs.

GitHub License Resolution

For packages with NOASSERTION or empty licenses (common in container-image SBOMs generated by Syft), the parsing worker resolves licenses via the GitHub API using multiple strategies:

  1. Direct PURL extraction: pkg:golang/github.com/{owner}/{repo} → github.com/{owner}/{repo}
  2. Well-known Go module mappings (50+ entries) — Maps non-GitHub import paths to their GitHub repos:
    • golang.org/x/* → github.com/golang/*
    • gopkg.in/yaml.v3 → github.com/go-yaml/yaml
    • go.uber.org/zap → github.com/uber-go/zap
    • k8s.io/client-go → github.com/kubernetes/client-go
    • oras.land/oras-go → github.com/oras-project/oras-go
    • dario.cat/mergo → github.com/darccio/mergo
    • And many more (see internal/github/purl.go)
  3. Fallback to /license endpoint — If the repo API returns NOASSERTION, the dedicated /repos/{owner}/{repo}/license endpoint is tried (it does deeper file analysis)
  4. Static overrides — For repos where even GitHub’s license detection fails (returns “Other”), manually verified overrides are applied (e.g., opencontainers/go-digest → Apache-2.0, shopspring/decimal → MIT)

Results are cached in-memory per worker and persisted to the github_license_cache and github_repo_metadata ClickHouse tables for cross-worker reuse.
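
The first two strategies (direct extraction and well-known mappings) can be sketched in Go as below. This is not a copy of internal/github/purl.go: the githubRepo function is hypothetical, the mapping table contains only the entries listed above, and percent-encoded PURLs are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// wellKnown maps non-GitHub module paths to GitHub repos (subset only).
var wellKnown = map[string]string{
	"gopkg.in/yaml.v3":  "github.com/go-yaml/yaml",
	"go.uber.org/zap":   "github.com/uber-go/zap",
	"k8s.io/client-go":  "github.com/kubernetes/client-go",
	"oras.land/oras-go": "github.com/oras-project/oras-go",
	"dario.cat/mergo":   "github.com/darccio/mergo",
}

// githubRepo extracts the GitHub owner and repo from a Go module PURL.
func githubRepo(purl string) (owner, repo string, ok bool) {
	const prefix = "pkg:golang/"
	if !strings.HasPrefix(purl, prefix) {
		return "", "", false
	}
	mod := strings.TrimPrefix(purl, prefix)
	// Drop version, qualifiers, and subpath.
	if i := strings.IndexAny(mod, "@?#"); i >= 0 {
		mod = mod[:i]
	}
	// golang.org/x/<name> lives at github.com/golang/<name>.
	if strings.HasPrefix(mod, "golang.org/x/") {
		name := strings.SplitN(strings.TrimPrefix(mod, "golang.org/x/"), "/", 2)[0]
		if name == "" {
			return "", "", false
		}
		return "golang", name, true
	}
	// Exact or path-prefix match against the well-known table.
	for from, to := range wellKnown {
		if mod == from || strings.HasPrefix(mod, from+"/") {
			mod = to
			break
		}
	}
	parts := strings.Split(mod, "/")
	if len(parts) >= 3 && parts[0] == "github.com" && parts[1] != "" && parts[2] != "" {
		return parts[1], parts[2], true
	}
	return "", "", false
}

func main() {
	for _, p := range []string{
		"pkg:golang/github.com/spf13/cobra@v1.8.0",
		"pkg:golang/golang.org/x/net@v0.20.0",
		"pkg:golang/go.uber.org/zap@v1.27.0",
		"pkg:npm/lodash@4.17.21",
	} {
		owner, repo, ok := githubRepo(p)
		fmt.Println(p, "->", owner, repo, ok)
	}
}
```

A PURL that cannot be mapped returns ok=false, which in the flow described above would leave the license unresolved rather than trigger a GitHub API call.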

Angular UI

13 lazy-loaded routes with virtual scrolling, OnPush change detection, dark mode toggle, and CSS custom properties theming. Includes package search with fuzzy name matching and paginated detail views. External custom-theme.css and ui-config.json are mountable without rebuild.