SCANOSS knowledge base

320,670,643 URLs indexed across the world's open source · Updated 26 Apr 2026
PURLs: 85,128,886
Versioned PURLs: 210,967,672
Total files: 100 billion (3.8 billion unique)
Lines of code: 3 trillion
Vulnerability layer · the KB cross-references 28,735 GitHub Security Advisories and 295,218 NVD CVEs, mapping known vulnerabilities directly to the code where they live.

How the KB is built

The SCANOSS KB indexes every file from every public open source repository we can reach — package registries, source forges, code hosts, and OS-level distribution mirrors. Each file is fingerprinted at the snippet level using winnowing, which means we don’t just identify whole files: we can match a function, a copied code block, or a single suspect line back to its origin in the wider open source ecosystem.

That’s the difference between SCA tools that match by package name and tools that match by content. Package-level matching tells you which dependencies you’ve declared. Snippet-level matching tells you what’s actually in your code, including code that was copy-pasted, vendored, modified, or AI-generated without attribution.
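As a rough illustration of snippet-level fingerprinting, the sketch below runs a small winnowing pass in Python. The hash (CRC32), the k-gram length, the window size, and all function names are assumptions chosen for readability; this is not the SCANOSS implementation or its parameters.

```python
import zlib


def normalize(text: str) -> str:
    # Keep only alphanumerics, lowercased, so whitespace and
    # punctuation changes do not alter the fingerprint.
    return "".join(ch.lower() for ch in text if ch.isalnum())


def kgram_hashes(text: str, k: int) -> list[int]:
    # Hash every run of k consecutive normalized characters.
    data = normalize(text).encode("utf-8")
    return [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]


def winnow(text: str, k: int = 5, w: int = 4) -> set[tuple[int, int]]:
    # Slide a window of w hashes and keep the minimum of each window
    # (rightmost on ties) as a (hash, position) fingerprint.
    hashes = kgram_hashes(text, k)
    selected = set()
    for start in range(max(len(hashes) - w + 1, 0)):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def containment(snippet: str, candidate: str) -> float:
    # Fraction of the snippet's fingerprint hashes also found in candidate.
    snippet_hashes = {h for h, _ in winnow(snippet)}
    candidate_hashes = {h for h, _ in winnow(candidate)}
    if not snippet_hashes:
        return 0.0
    return len(snippet_hashes & candidate_hashes) / len(snippet_hashes)


if __name__ == "__main__":
    snippet = "for (i = 0; i < n; i++) total += values[i];"
    host = "int main() {\n    " + snippet + "\n    return 0;\n}"
    print(containment(snippet, host))
```

With these settings, any shared run of at least w + k - 1 normalized characters produces at least one common fingerprint, which is why a copied block can still be found inside a larger, otherwise unrelated file.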

Each file is also tied to its origin URL and its package coordinates (PURLs), so the KB resolves cleanly into SBOMs, license obligations, encryption inventories, and vulnerability mappings. New sources are added continuously, and the index is refreshed monthly.
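To make the package coordinates concrete, here is a minimal sketch that splits a Package URL (PURL) into its parts, following the general `pkg:type/namespace/name@version` shape. The `Purl` class and `parse_purl` function are hypothetical helpers written for this example, not a SCANOSS API, and the parser ignores qualifiers and subpaths for brevity.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class Purl:
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]

    @property
    def is_versioned(self) -> bool:
        # A versioned PURL pins one release; an unversioned one names the package.
        return self.version is not None


def parse_purl(purl: str) -> Purl:
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop the optional subpath (#...) and qualifiers (?...), unused here.
    rest = rest.split("#", 1)[0].split("?", 1)[0].strip("/")
    ptype, sep, path = rest.partition("/")
    if not sep or not path:
        raise ValueError(f"PURL is missing a name: {purl!r}")
    # The version, if present, follows the last '@'.
    head, at, tail = path.rpartition("@")
    path, version = (head, tail) if at else (tail, None)
    # Everything before the last '/' is the namespace.
    namespace, _, name = path.rpartition("/")
    return Purl(
        type=ptype.lower(),
        namespace=unquote(namespace) if namespace else None,
        name=unquote(name),
        version=unquote(version) if version else None,
    )


if __name__ == "__main__":
    print(parse_purl("pkg:pypi/requests@2.31.0"))
    print(parse_purl("pkg:github/madler/zlib"))
```

The same package can therefore appear once as an unversioned PURL and many times as versioned PURLs, one per release.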

Four datasets, one foundation

License dataset

Every file in the KB resolved to its declared and detected licenses, with obligation metadata, compatibility classifications, and SPDX mappings. The substrate for SBOM generation and license-risk analysis.

Explore the license dataset →

Encryption dataset

Every cryptographic algorithm, primitive, and library detected across the index, including their post-quantum readiness status. The foundation for cryptographic asset inventories and quantum-migration planning.

Explore the encryption dataset →

Security dataset

Vulnerabilities mapped directly to the code where they live, drawing on 28,735 GitHub Security Advisories and 295,218 NVD CVEs. Resolves to the file and version level, not just the package.

Explore the security dataset →

Geo provenance dataset

Geographic origin metadata for code contributions across the index, derived from commit signals and repository hosting. Built for export-control compliance and supply-chain due diligence.

Explore the geo provenance dataset →
