SCANOSS KB | SCANOSS

SCANOSS knowledge base

URLs indexed: 334,630,255

Updated 13 Jul 2026 · across the world's open source

PURLs: 49,271,694

Versioned PURLs: 220,006,227

Total files: 100 billion (4 billion unique)

Lines of code: 3 trillion

Vulnerability layer · the KB cross-references 31,673 GitHub Security Advisories and 303,359 NVD CVEs, mapping known vulnerabilities directly to the code where they live.

How the KB is built

The SCANOSS KB indexes every file from every public open source repository we can reach — package registries, source forges, code hosts, and OS-level distribution mirrors. Each file is fingerprinted at the snippet level using winnowing, which means we don’t just identify whole files: we can match a function, a copied code block, or a single suspect line back to its origin in the wider open source ecosystem.
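To make the snippet-level idea concrete, here is a minimal sketch of generic winnowing in Python. It is not the SCANOSS implementation: the normalization rule, the CRC32 hash, the gram size `k`, the window size, and all function names are assumptions chosen for illustration only.

```python
import zlib


def normalize(text: str) -> str:
    # Assumed normalization: keep only alphanumerics, lowercased, so that
    # whitespace and punctuation changes do not alter the fingerprints.
    return "".join(ch.lower() for ch in text if ch.isalnum())


def kgram_hashes(text: str, k: int) -> list[int]:
    # Hash every run of k consecutive bytes of the normalized text.
    data = normalize(text).encode("utf-8")
    return [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]


def winnow(text: str, k: int = 8, window: int = 4) -> set[int]:
    # Slide a window over the k-gram hashes and keep the minimum of each
    # window. Two texts that share a long enough run of normalized content
    # are guaranteed to share at least one selected hash.
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def containment(snippet: str, source: str) -> float:
    # Fraction of the snippet's fingerprints that also occur in the source.
    a, b = winnow(snippet), winnow(source)
    return len(a & b) / len(a) if a else 0.0


source_file = "def add(a, b):\n    return a + b\n\ndef mul(a, b):\n    return a * b\n"
copied = "def mul(a,b): return a*b"  # reformatted copy of one function
print(containment(copied, source_file))
```

In this sketch a reformatted copy of one function still shares all of its fingerprints with the file it came from, while unrelated text shares none or almost none. A production index would store fingerprints with their positions and origin files so that a match can be traced back to a specific line range.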

That’s the difference between SCA tools that match by package name and tools that match by content. Package-level matching tells you which dependencies you’ve declared. Snippet-level matching tells you what’s actually in your code, including code that was copy-pasted, vendored, modified, or AI-generated without attribution.

Each file is also tied to its origin URL and its package coordinates (PURLs), so the KB resolves cleanly into SBOMs, license obligations, encryption inventories, and vulnerability mappings. New sources are added continuously, and the index is refreshed monthly.
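For readers unfamiliar with package coordinates, a PURL follows the general shape `pkg:type/namespace/name@version`, optionally followed by qualifiers and a subpath. The sketch below is a simplified parser written for illustration; the `Purl` class and `parse_purl` function are hypothetical names, the parser ignores qualifiers and subpaths, and it does not apply the per-type normalization rules of the full specification.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class Purl:
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    # Simplified: drops qualifiers (?...) and subpath (#...), no type-specific rules.
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")
    rest = rest.split("#", 1)[0].split("?", 1)[0].strip("/")
    segments = rest.split("/")
    if len(segments) < 2:
        raise ValueError(f"missing type or name: {purl!r}")
    # The last path segment carries the name and, after '@', the version.
    name, at, version = segments[-1].partition("@")
    if not name:
        raise ValueError(f"missing name: {purl!r}")
    namespace = "/".join(unquote(s) for s in segments[1:-1]) or None
    return Purl(
        type=segments[0].lower(),
        namespace=namespace,
        name=unquote(name),
        version=unquote(version) if at else None,
    )


print(parse_purl("pkg:npm/%40angular/animation@12.3.1"))
print(parse_purl("pkg:pypi/requests"))
```

The distinction between a PURL with and without a version corresponds to the separate "PURLs" and "Versioned PURLs" counts: one package can appear under many versions.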

Four datasets, one foundation

License dataset

Every file in the KB resolved to its declared and detected licenses, with obligation metadata, compatibility classifications, and SPDX mappings. The substrate for SBOM generation and license-risk analysis.

Encryption dataset

Every cryptographic algorithm, primitive, and library detected across the index, including their post-quantum readiness status. The foundation for cryptographic asset inventories and quantum-migration planning.

Security dataset

Vulnerabilities mapped directly to the code where they live, drawing on 28,735 GitHub Security Advisories and 295,218 NVD CVEs. Resolves to the file and version level, not just the package.

Geo provenance dataset

Geographic origin metadata for code contributions across the index, derived from commit signals and repository hosting. Built for export-control compliance and supply-chain due diligence.
