Vulnerability database that connects: Vulners data overview


Vulnerability data is only useful when it’s connected. A CVE ID by itself doesn’t help much until you can see its metrics, what products and versions are affected, whether it’s exploited in the wild, and whether exploit code is floating around. That’s the core idea behind how we structure Vulners: we turn “a string identifier” into a record that’s already tied to the signals you’ll use for triage, automation, and audits.

Vulners records are designed to be usable, not just readable: CVE Fusion links scoring (CVSS/CWE), prioritization signals (EPSS/SSVC), exploited-in-the-wild flags (KEV/telemetry), exploit PoCs, web applicability hints, and affected-product scope (CNA/NVD plus SCAP-compatible CPE configs). This post walks through the main collections — CVE, vendor and Linux bulletins, and OSV — and shows the key pivots and queries that make the data practical for audits, SBOM workflows, and automation.

This post is a walkthrough of what we store today and how those pieces relate. It’s also intentionally link-heavy — the links are the point — so you can jump straight into the database and explore each part yourself.


CVE Fusion: one record that pulls the important context together

If you start with CVE IDs, we have what we call CVE Fusion — a collection that links most of the related metrics in one data structure.

One detail that matters in practice: CVE records in this collection can be created before they are published by the CVE program. This happens when we first spot a new CVE ID in any of our monitored sources — meaning you can often track “what’s coming” earlier than you’d expect from the canonical publication timeline.

Once the CVE exists in Vulners, we connect the usual scoring and classification signals. That includes CVSS and CWE, pulled from both CNA and NVD when available. We also add probability and prioritization-style signals such as EPSS and SSVC.

A note on SSVC: it’s one of the “exploitation maturity” signals, but the source tends to provide the value at publication time and does not reliably update it afterward.
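To make the "one record, many signals" idea concrete, here is a sketch of what a fused record might look like and how you could use it in triage. The field names below are illustrative, not the actual Vulners schema.

```python
# Hypothetical sketch of a CVE Fusion record's shape. Field names are
# illustrative placeholders, not the real Vulners schema.
fusion_record = {
    "id": "CVE-2024-12345",                  # the string identifier
    "cvss": {"cna": 9.8, "nvd": 9.8},        # scoring from both CNA and NVD
    "cwe": ["CWE-78"],                       # weakness classification
    "epss": 0.92,                            # exploitation probability
    "ssvc": "Attend",                        # set at publication, rarely updated
    "wildExploited": True,                   # aggregated in-the-wild flag
    "cpeConfigurations": [],                 # SCAP-style affected scope
}

def triage_priority(record):
    """Toy triage key: exploited-in-the-wild first, then highest EPSS."""
    return (not record.get("wildExploited", False), -record.get("epss", 0.0))
```

Because the signals live in one record, a triage sort key like this needs no joins across feeds.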


Exploited in the wild as a first-class signal

From an operator’s point of view, “is it exploited?” is often more important than “how high is the score?”. That’s why we maintain a dedicated flag: wildExploited:true.

This flag isn’t a single feed — it’s an aggregation of sources that represent either authoritative listings or direct observations.

A common starting point is CISA KEV: we set the flag when a CVE appears in the KEV listing, and we also keep KEV as a standalone source so you can query it independently.

We also source exploitation signals from AttackerKB, from ShadowServer observations (these keys will be reorganized soon), and from CIRCL observations.

The practical intent here is simple: if you filter your triage queue down to wildExploited:true, you’re starting from “things that matter in real life”, not just “things that look scary on paper”.
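A filter like that can be expressed as a Lucene-style search query. The sketch below only builds the request payload; the endpoint path and payload shape are assumptions based on Vulners' public v3 API conventions, so check the API documentation for the current contract.

```python
# Sketch of building a "things exploited in real life" triage query.
# The endpoint URL and payload shape are assumptions, not a verified
# contract -- consult the Vulners API docs before relying on them.
API_URL = "https://vulners.com/api/v3/search/lucene/"

def build_triage_query(api_key, extra="type:cve"):
    """Build a search payload filtering on the wildExploited flag."""
    query = f"wildExploited:true AND {extra}"
    return {"query": query, "apiKey": api_key, "size": 100}

payload = build_triage_query("YOUR_API_KEY")
# payload would then be POSTed as JSON to API_URL
```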


Exploit PoCs: collected, scored, and tied back to CVEs

Another “real life” signal is exploit code. It’s not perfect — PoC availability doesn’t always mean widespread exploitation — but it’s a strong indicator for validation and detection work.

We collect exploit PoCs from 15 sources, including GitHub and Gitee. For those platforms we use a proprietary algorithm to score repositories and identify which ones are likely to be actual exploits (as opposed to references, mirrors, or unrelated content).

What matters for workflow automation is that these PoCs aren’t floating separately — they are directly linked to CVEs. We also analyze CVE record references with our algorithm and flag many “vulnerability confirmation” repos as exploit PoCs too.

And because exploit repos tend to disappear (DMCA, account deletions, repo cleanup), we keep direct links to sources, and we don’t delete our copy even if the original repo gets removed.


Web Applicability: is this exploitable on the web surface?

Some signals are less about severity and more about where this matters. One example is our internal web applicability enrichment for CVEs.

When a vulnerability is web-relevant, the enrichment includes the application paths and parameters an attacker can use. This is exactly what a responder ends up extracting manually when they’re rushing: what endpoint, what parameter, what request shape? Having it structured means faster validation, faster rule-writing, and faster “does this apply to my app?” filtering.
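A sketch of what that structured enrichment might look like, and the kind of one-line filter it enables. The field names here are hypothetical, not the actual Vulners schema.

```python
# Hypothetical shape of a web-applicability enrichment (field names are
# illustrative only).
web_applicability = {
    "applicable": True,
    "paths": ["/admin/config.php"],   # endpoints an attacker would hit
    "parameters": ["id"],             # request parameters involved
    "method": "GET",
}

def matches_my_app(enrichment, known_routes):
    """Quick 'does this apply to my app?' filter against known routes."""
    return enrichment["applicable"] and any(
        path in known_routes for path in enrichment["paths"]
    )
```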


Affected products and versions: verbatim where possible, normalized where it helps

Once you’ve decided a vulnerability matters, the next problem is always: do we have it? That boils down to affected products and versions.

We carry the CNA affected and cpeApplicability containers of the CVE record verbatim.

When available, we also include NVD CPE configurations and CNA CPE configurations (the latter are derived from the cpeApplicability containers).

Then comes the part that improves coverage in practice: for approximately 50 CNAs we generate Vulners CPE configurations in CVE records.

Together, these configurations make approximately 85% of CVEs searchable with our Software Audit API. And because they are in SCAP format, you can reuse code built for NVD-style matching on them.
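To show what "reuse code built for NVD-style matching" means in practice, here is a minimal matcher for a single NVD-style CPE match node. Real SCAP matching handles full CPE 2.3 well-formed-name comparison and nested AND/OR nodes; this sketch covers the common flat case of a product plus a version range.

```python
# Minimal sketch of NVD-style CPE applicability matching. Real SCAP
# evaluation handles full WFN comparison and AND/OR node nesting; this
# covers one match node with versionStartIncluding/versionEndExcluding.
def parse_cpe23(cpe):
    parts = cpe.split(":")
    return {"vendor": parts[3], "product": parts[4], "version": parts[5]}

def version_tuple(v):
    return tuple(int(x) for x in v.split(".") if x.isdigit())

def cpe_applies(installed_cpe, match_node):
    inst = parse_cpe23(installed_cpe)
    crit = parse_cpe23(match_node["criteria"])
    if (inst["vendor"], inst["product"]) != (crit["vendor"], crit["product"]):
        return False
    v = version_tuple(inst["version"])
    lo = match_node.get("versionStartIncluding")
    hi = match_node.get("versionEndExcluding")
    if lo and v < version_tuple(lo):
        return False
    if hi and v >= version_tuple(hi):
        return False
    return True

# Example match node in the NVD configuration style (values illustrative).
node = {
    "criteria": "cpe:2.3:a:openbsd:openssh:*:*:*:*:*:*:*:*",
    "versionStartIncluding": "8.5",
    "versionEndExcluding": "9.8",
}
```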


Not just CVEs: making vendor bulletins searchable too

If you’ve ever tracked a vendor advisory that never got a CVE (or got one months later), you know why this matters.

We also generate SCAP-compatible CPE configurations for security bulletins from approximately 50 vendors (Software family). That means you can search and match bulletins even when no CVE exists, when the CVE mapping is incomplete, or when the bulletin is the only place where the affected scope is clearly described.

When a CVE ID appears in a bulletin, the record stops being a standalone advisory and becomes a node in a larger graph — connected to scoring, exploitation signals, and PoC data contributed by other sources around the same identifier.


Linux distributions: affected packages, normalized correctly

Linux advisories are a separate world: packages, epochs, distro-specific versioning, backports, and "fixed" not meaning "upstream fixed".

For over 30 Linux distributions (Unix family) we carry security bulletins with affected packages configurations normalized using distribution versioning rules. These configurations are used by our Linux Audit API, because without normalization you can't reliably automate "is my installed package version affected?".
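The following sketch shows why normalization matters: a naive string comparison gets epochs and multi-digit segments wrong. This is a deliberately simplified comparator; real dpkg/rpm rules also handle `~` pre-releases and mixed alpha/numeric runs, so production code should use the distribution's own comparator (e.g. python-apt or rpm's labelCompare).

```python
import re

# Simplified sketch of distro version comparison (epoch:upstream-revision).
# Real dpkg/rpm ordering is richer ('~', letter runs); this only handles
# epochs and numeric segments, which already breaks naive string compare.
def split_epoch(v):
    if ":" in v:
        epoch, rest = v.split(":", 1)
        return int(epoch), rest
    return 0, v

def segments(v):
    # Split into numeric chunks so "1.10" sorts after "1.9".
    return [int(s) for s in re.findall(r"\d+", v)]

def deb_less_than(a, b):
    ea, ra = split_epoch(a)
    eb, rb = split_epoch(b)
    if ea != eb:
        return ea < eb
    return segments(ra) < segments(rb)

def package_affected(installed, fixed_in):
    """Affected if the installed version predates the fixed version."""
    return deb_less_than(installed, fixed_in)
```

Note that `"1:1.9-1" < "1:1.10-2"` is false as a plain string comparison but true under distro rules, which is exactly the class of bug normalization prevents.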

Distro advisories almost always reference CVEs, and that reference runs in both directions: from a package fix notice you can follow the CVE ID outward to scoring and exploitation context; from a CVE you can follow it inward to see how each distribution resolved it and at which version.


OSV: package-native vulnerabilities and SBOM workflows

CVE-centric workflows work well for classic vendor software and operating systems. For open-source dependency chains, OSV is often a better fit. That's why another entry into the database is the OSV collection, which is based on osv.dev.

We use the affectedLibraries container to normalize affected data across OSV records. It includes PURL and registry/ecosystem/package values, as well as affected version ranges, and it powers our Package Audit and SBOM Audit APIs.
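OSV expresses affected versions as ordered range events ("introduced"/"fixed"). The sketch below evaluates such a range against an installed version; it assumes plain dotted-numeric versions and ascending event order, whereas real OSV evaluation must honor each ecosystem's own version semantics.

```python
# Sketch of evaluating an OSV-style range (ordered "introduced"/"fixed"
# events) against an installed version. Assumes dotted-numeric versions
# and events sorted ascending; real ecosystems need their own semantics.
def vtuple(v):
    return tuple(int(x) for x in v.split("."))

def in_osv_range(version, events):
    v = vtuple(version)
    affected = False
    for event in events:
        if "introduced" in event and v >= vtuple(event["introduced"]):
            affected = True
        if "fixed" in event and v >= vtuple(event["fixed"]):
            affected = False
    return affected

# Example: vulnerable from the beginning, fixed in 2.4.1.
events = [{"introduced": "0"}, {"fixed": "2.4.1"}]
```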

Where an OSV record carries a CVE alias, it inherits the connections that CVE has already accumulated elsewhere in the database — exploitation flags, scoring, vendor bulletins. The alias is a lens, not a copy: the same vulnerability, rendered in the vocabulary that package ecosystems and SBOM tooling actually speak.


Timestamps: understanding the lifecycle of a record

Starting July 2025, every record in the Vulners database carries a timestamp structure that, alongside the published and updated dates from the sources, indicates when the record was created, updated, enriched from other sources, compared against the original record, and so on.

This becomes valuable as soon as you ask questions like: did this record change because the upstream advisory changed, or because an enrichment arrived? It’s also essential for users who want reproducibility in pipelines and want to reason about the “age” of a signal inside a record.
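A sketch of how that question can be answered mechanically. The field names below are hypothetical stand-ins for the timestamp structure, not the exact Vulners schema.

```python
from datetime import datetime

# Hypothetical timestamp structure (field names are illustrative, not
# the exact Vulners schema) and a check for why a record last changed.
record_timestamps = {
    "sourcePublished": "2025-07-01T10:00:00+00:00",
    "sourceUpdated":   "2025-07-03T08:00:00+00:00",
    "enriched":        "2025-07-05T12:30:00+00:00",
}

def last_change_reason(ts):
    """Did the record last change upstream, or via enrichment?"""
    updated = datetime.fromisoformat(ts["sourceUpdated"])
    enriched = datetime.fromisoformat(ts["enriched"])
    return "enrichment" if enriched > updated else "upstream"
```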


Closing thoughts: records should behave like a graph, not a page

If you take one idea from this post, let it be this: a vulnerability database is most useful when records behave like a graph of signals — identifiers connected to metrics, exploitation, PoCs, applicability and affected scope — and when those connections are queryable.

A good way to explore this in Vulners is to start from CVE Fusion, then pivot depending on the question you’re trying to answer.

That’s the model in practice: start with the identifier, enrich it into a fusion, and keep pivots cheap — because the workflow cost is usually what hurts, not the data itself.