SCANOSS knowledge base
How the KB is built
The SCANOSS KB indexes every file from every public open source repository we can reach — package registries, source forges, code hosts, and OS-level distribution mirrors. Each file is fingerprinted at the snippet level using winnowing, which means we don’t just identify whole files: we can match a function, a copied code block, or a single suspect line back to its origin in the wider open source ecosystem.
That’s the difference between SCA tools that match by package name and tools that match by content. Package-level matching tells you which dependencies you’ve declared. Snippet-level matching tells you what’s actually in your code, including code that was copy-pasted, vendored, modified, or AI-generated without attribution.
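To make the snippet-level fingerprinting described above concrete, here is a minimal sketch of winnowing in Python. It is not the SCANOSS implementation: the normalization rule, the use of CRC32 as the hash, and the `k` (gram size) and `w` (window size) values are all assumptions chosen to keep the example small.

```python
import zlib


def normalize(text: str) -> str:
    # Assumed normalization: keep only letters and digits, lowercased,
    # so whitespace and punctuation changes do not affect matching.
    return "".join(ch.lower() for ch in text if ch.isalnum())


def kgram_hashes(data: str, k: int) -> list[int]:
    # Hash every overlapping k-character substring (k-gram).
    return [
        zlib.crc32(data[i:i + k].encode("utf-8"))
        for i in range(len(data) - k + 1)
    ]


def winnow(text: str, k: int = 5, w: int = 4) -> set[int]:
    # Slide a window of w consecutive k-gram hashes over the text and
    # keep the minimum hash of each window as a fingerprint.
    hashes = kgram_hashes(normalize(text), k)
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {
        min(hashes[start:start + w])
        for start in range(len(hashes) - w + 1)
    }


# Hypothetical example: a small function, and a file that contains a
# reformatted copy of it surrounded by other code.
original = "def add(a, b):\n    return a + b\n"
copied = "import os\n\ndef add( a,b ):\n\treturn a+b\n\nprint(add(1, 2))\n"

shared = winnow(original) & winnow(copied)
print(f"{len(shared)} shared fingerprints")
```

Because each fingerprint depends only on a local window, any normalized run of at least `w + k - 1` characters that two texts have in common produces at least one shared fingerprint. That is why a copied block can still be matched when the code around it is different.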
Each file is also tied to its origin URL and its package coordinates (PURLs), so the KB resolves cleanly into SBOMs, license obligations, encryption inventories, and vulnerability mappings. New sources are added continuously, and the index is refreshed monthly.
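As an illustration of what package coordinates look like, the sketch below splits a PURL into its main parts. It assumes the common `pkg:type/namespace/name@version` shape, ignores qualifiers and subpaths, and skips the per-type rules of the full Package URL specification. The `Purl` class and `parse_purl` function are hypothetical names, and the example coordinates are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class Purl:
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    # A PURL starts with the "pkg" scheme.
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a PURL: {purl!r}")

    # Drop the optional "#subpath" and "?qualifiers" parts (simplification).
    rest = rest.split("#", 1)[0].split("?", 1)[0]

    # The version, if present, follows the last "@".
    path, at, version = rest.rpartition("@")
    if not at:
        path, version = rest, ""

    # Remaining path segments: type / optional namespace / name.
    parts = [unquote(p) for p in path.strip("/").split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"PURL needs a type and a name: {purl!r}")

    return Purl(
        type=parts[0].lower(),
        namespace="/".join(parts[1:-1]) or None,
        name=parts[-1],
        version=unquote(version) or None,
    )


print(parse_purl("pkg:pypi/requests@2.31.0"))
print(parse_purl("pkg:npm/%40angular/core@12.0.0"))
```

Coordinates in this form give each matched file a stable key that SBOM entries, license records, and vulnerability mappings can all refer to.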
Four datasets, one foundation
License dataset
Every file in the KB resolved to its declared and detected licenses, with obligation metadata, compatibility classifications, and SPDX mappings. The substrate for SBOM generation and license-risk analysis.
Encryption dataset
Every cryptographic algorithm, primitive, and library detected across the index, including their post-quantum readiness status. The foundation for cryptographic asset inventories and quantum-migration planning.
Security dataset
Vulnerabilities mapped directly to the code where they live, drawing on 28,735 GitHub Security Advisories and 295,218 NVD CVEs. Resolves to the file and version level, not just the package.
Geo provenance dataset
Geographic origin metadata for code contributions across the index, derived from commit signals and repository hosting. Built for export-control compliance and supply-chain due diligence.