1.5 Vulnerability Repository

The Vulnerability Repository is a collection of Vulnerability Sources (VulSource).
It periodically scans public vulnerability databases and builds Supported Open-Source Projects from source. Entries from the public vulnerability databases are first normalized into a baseline Vulnerability structure:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| ID | string | CNVD/CNNVD/CVE ID | CNVD-2025-03269 |
| Title | string | Short description of the vulnerability | SAP NetWeaver Application Server Java 跨站脚本漏洞 |
| Products | []string | List of products affected by the vulnerability | SAP SAP NetWeaver Application Server Java null |
| RiskLevel | int | Risk level (-1: Unknown, 0: Critical, 1: High, 2: Medium, 3: Low) | 2 |
| CVEID | string | Related CVE ID (if applicable) | CVE-2023-12345 |
| SubmittedDate | string | Date of submission (optional) | 2025-02-19 |
| PublishedDate | string | Date of publication | 2025-02-20 |
| UpdatedDate | string | Date of update (optional) | 2025-02-21 |
| Description | string | Description of the vulnerability (HTML escaped) | SAP NetWeaver Application Server Java... |
| Type | string | Type or category of the vulnerability | 通用型漏洞 |
| Patch | string | Patch information (if applicable) | 目前厂商已发布升级补丁以修复漏洞... |
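
As a rough illustration, the baseline structure could be modeled in Python as below. This is only a sketch: the class and attribute names are assumptions made for this example (the source lists field names and types, not an implementation), and the example values are copied from the Example column.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class RiskLevel(IntEnum):
    """Integer codes as listed for the RiskLevel field."""
    UNKNOWN = -1
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Vulnerability:
    """Hypothetical Python mirror of the baseline Vulnerability structure."""
    id: str                                   # ID
    title: str                                # Title
    published_date: str                       # PublishedDate (YYYY-MM-DD)
    products: List[str] = field(default_factory=list)  # Products
    risk_level: RiskLevel = RiskLevel.UNKNOWN  # RiskLevel
    cve_id: Optional[str] = None              # CVEID (if applicable)
    submitted_date: Optional[str] = None      # SubmittedDate (optional)
    updated_date: Optional[str] = None        # UpdatedDate (optional)
    description: str = ""                     # Description (HTML escaped)
    type: str = ""                            # Type
    patch: Optional[str] = None               # Patch (if applicable)


# Values taken from the Example column.
example = Vulnerability(
    id="CNVD-2025-03269",
    title="SAP NetWeaver Application Server Java 跨站脚本漏洞",
    published_date="2025-02-20",
    products=["SAP SAP NetWeaver Application Server Java null"],
    risk_level=RiskLevel(2),
    cve_id="CVE-2023-12345",
)
```

Optional fields default to `None` so that a record scraped from a source that omits them can still be constructed.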

1.5.1 National Vulnerability Database (NVD)

CVE (Common Vulnerabilities and Exposures) records are hosted by NIST in the NVD (see NVD - Vulnerabilities).
They are also available in the GitHub repo CVEProject/cvelistV5, a cache of the official CVE List in CVE JSON 5 format. This repo contains CVEs from the start of the project (1999) and is updated regularly.

Each CVE consists of its CVE ID, publication date, description, associated reference links, vulnerable product configuration, CWE weakness categorization, Common Vulnerability Scoring System (CVSS) score, and other metadata.

1.5.1.1 Archival Data

cvelistV5 is cloned and processed to generate VulSource.
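
The processing step can be sketched as a mapping from one CVE JSON 5 record to the baseline fields. The function name `to_baseline` and the sample record are hypothetical (the record is hand-written and trimmed, not copied from cvelistV5), and the choice of which record keys feed which baseline field is an assumption; only the key names `cveMetadata`, `containers.cna`, `descriptions`, and `affected` come from the CVE JSON 5 record layout.

```python
from typing import Any, Dict


def to_baseline(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a CVE JSON 5 record to baseline Vulnerability fields (sketch)."""
    meta = record.get("cveMetadata", {})
    cna = record.get("containers", {}).get("cna", {})

    # Prefer the English description; rejected records may have none.
    english = next(
        (d.get("value", "") for d in cna.get("descriptions", [])
         if d.get("lang", "").lower().startswith("en")),
        "",
    )

    # Join vendor and product into one string per affected entry.
    products = []
    for affected in cna.get("affected", []):
        parts = [p for p in (affected.get("vendor"), affected.get("product")) if p]
        if parts:
            products.append(" ".join(parts))

    return {
        "ID": meta.get("cveId", ""),
        "Title": cna.get("title", ""),
        "Products": products,
        "RiskLevel": -1,  # Unknown until a score is attached
        "CVEID": meta.get("cveId", ""),
        "PublishedDate": (meta.get("datePublished") or "")[:10],
        "UpdatedDate": (meta.get("dateUpdated") or "")[:10],
        "Description": english,
    }


# Hand-written, made-up record used only to exercise the function.
sample = {
    "cveMetadata": {
        "cveId": "CVE-2099-0001",
        "datePublished": "2099-01-02T00:00:00Z",
        "dateUpdated": "2099-01-03T00:00:00Z",
    },
    "containers": {
        "cna": {
            "title": "Example issue",
            "descriptions": [{"lang": "en", "value": "Example description."}],
            "affected": [{"vendor": "example", "product": "example-project"}],
        }
    },
}
baseline = to_baseline(sample)
```

Missing keys fall back to empty values rather than raising, since not every record carries a title or an affected list.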

1.5.1.2 Live Data

The Vulnerability Repository synchronizes its cvelistV5 clone with upstream to get the latest CVE listing.
Then the updated CVEs are processed to generate VulSource.

This task is scheduled to run once a week.
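
One way to find the updated CVEs after a sync is to compare the clone's head before and after pulling. The sketch below assumes the local clone is updated with `git pull --ff-only` and that records live under `cves/<year>/<bucket>/CVE-<year>-<number>.json`, which matches the cvelistV5 layout at the time of writing; the function names are hypothetical.

```python
import re
import subprocess
from typing import List

# Matches record files such as cves/2022/23xxx/CVE-2022-23222.json.
CVE_PATH = re.compile(r"^cves/\d{4}/[^/]+/(CVE-\d{4}-\d{4,})\.json$")


def cve_ids_from_paths(paths: List[str]) -> List[str]:
    """Keep only CVE record files and return their sorted, de-duplicated IDs."""
    ids = set()
    for path in paths:
        match = CVE_PATH.match(path.strip())
        if match:
            ids.add(match.group(1))
    return sorted(ids)


def sync_and_list_updates(repo_dir: str) -> List[str]:
    """Pull upstream changes and return the IDs of CVE files that changed."""
    def git(*args: str) -> str:
        result = subprocess.run(
            ["git", "-C", repo_dir, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout

    old_head = git("rev-parse", "HEAD").strip()
    git("pull", "--ff-only")
    changed = git("diff", "--name-only", old_head, "HEAD").splitlines()
    return cve_ids_from_paths(changed)
```

Only `cve_ids_from_paths` is pure; `sync_and_list_updates` needs a real clone and network access, so it would be called from the weekly task rather than at import time.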

1.5.2 China National Vulnerability Database of Information Security (CNNVD)

There is no consistent way to associate CNNVD listings directly with open-source projects. Because most CNNVD entries reference a CVE, no project filter is applied during scraping; instead, CNNVD information is used to enrich the corresponding CVE vulnerability.
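
The enrichment by CVE reference can be sketched as a join on the CVEID field. The function name `attach_related` is hypothetical; the input keys follow the baseline Vulnerability fields, and the output keys (`repo_label`, `id`, `note`, `description`) follow the `repositories` entries of the enriched sample shown later in this section. Using Title as `note` is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def attach_related(cve_records: List[dict], other_records: List[dict],
                   repo_label: str) -> List[dict]:
    """Attach records from another database to the CVE they reference."""
    by_cve: Dict[str, List[dict]] = defaultdict(list)
    for rec in other_records:
        if rec.get("CVEID"):          # entries without a CVE reference are skipped
            by_cve[rec["CVEID"]].append(rec)

    for cve in cve_records:
        for rec in by_cve.get(cve["id"], []):
            cve.setdefault("repositories", []).append({
                "repo_label": repo_label,
                "id": rec["ID"],
                "note": rec.get("Title", ""),
                "description": rec.get("Description", ""),
            })
    return cve_records


cves = [{"id": "CVE-2022-23222"}]
cnnvd = [
    {"ID": "CNNVD-202201-1165", "CVEID": "CVE-2022-23222",
     "Title": "Linux kernel 代码问题漏洞"},
    {"ID": "CNNVD-000000-0000", "CVEID": ""},  # made-up entry with no CVE reference
]
enriched = attach_related(cves, cnnvd, "CNNVD")
```

Entries that reference no CVE are dropped here, which is consistent with using the listing only for enrichment.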

1.5.2.1 Archival Data

Archival data is available as daily, monthly, or yearly zip archives from the CNNVD website (国家信息安全漏洞库).
Downloading requires login.

1.5.2.2 Live Data

An automated browser is used to visit the public listing at https://www.cnnvd.org.cn/home/loophole and fetch CNNVD vulnerabilities newer than a "CNNVD last fetch date" stored in Vulnerability Repository's database.

This task is scheduled to run once a week.
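
The "last fetch date" bookkeeping can be sketched as a filter over scraped listing entries. The function name `select_new` and the sample entries are made up for illustration; the only assumption about the data is that each entry carries a PublishedDate in YYYY-MM-DD form, as in the baseline structure.

```python
from datetime import date
from typing import Iterable, List, Tuple


def select_new(entries: Iterable[dict], last_fetch: date) -> Tuple[List[dict], date]:
    """Return entries published after last_fetch and the new last-fetch date."""
    fresh = [
        e for e in entries
        if date.fromisoformat(e["PublishedDate"]) > last_fetch
    ]
    newest = max(
        (date.fromisoformat(e["PublishedDate"]) for e in fresh),
        default=last_fetch,  # nothing new: keep the stored date
    )
    return fresh, newest


listing = [
    {"ID": "EXAMPLE-1", "PublishedDate": "2025-02-19"},
    {"ID": "EXAMPLE-2", "PublishedDate": "2025-02-21"},
]
fresh, new_last_fetch = select_new(listing, date(2025, 2, 20))
```

A strict comparison can miss entries published later on the day of the previous fetch; comparing with `>=` and de-duplicating by ID would be a more conservative variant.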

1.5.3 China National Vulnerability Database (CNVD)

There is no consistent way to associate CNVD listings directly with open-source projects. Because most CNVD entries reference a CVE, we do not apply a project filter during scraping; instead, we use CNVD information to enrich the corresponding CVE vulnerability.

1.5.3.1 Archival Data

Archival data is available as weekly XML files from the CNVD website (国家信息安全漏洞共享平台). Downloading requires login and solving a CAPTCHA.

1.5.3.2 Live Data

A customized automated browser (see 1.5.3.3 Anti-bot measures) is used to visit the public listing at https://www.cnvd.org.cn/flaw/list and fetch CNVD vulnerabilities newer than a "CNVD last fetch date" stored in Vulnerability Repository's database.

This task is scheduled to run once a week.

1.5.3.3 Anti-bot measures

The CNVD website employs anti-bot protections, including a script that checks for the presence of a webdriver property in the window object. To enable automated browsing, the browser must be customized to remove this property.

1.5.4 VulSource

For each vulnerability in the CVE list, a VulSource for a Supported Open-Source Project consists of:

  • the corresponding pseudo assembly code before and after the vulnerability fix
  • syntactic features for the code blocks related to the vulnerability
  • metadata for reporting:
    • path of the source code in the project tree
    • mapping of source code to pseudo assembly code (by line number)
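
The components listed above could be held in a structure like the following. This is a sketch under assumptions: the class names, attribute names, and the shape of the line mapping (one source line to several pseudo assembly lines) are invented for illustration and are not the actual VulSource format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CodeBlock:
    source_path: str                   # path of the source code in the project tree
    line_map: Dict[int, List[int]]     # source line -> pseudo assembly lines
    pseudo_asm_before: str             # pseudo assembly before the fix
    pseudo_asm_after: Optional[str]    # None when no fix is available
    features: Dict[str, object] = field(default_factory=dict)  # syntactic features


@dataclass
class VulSourceEntry:
    cve_id: str
    project: str
    blocks: List[CodeBlock] = field(default_factory=list)

    def asm_lines_for(self, source_path: str, source_line: int) -> List[int]:
        """Look up the pseudo assembly lines mapped to one source line."""
        for block in self.blocks:
            if block.source_path == source_path and source_line in block.line_map:
                return block.line_map[source_line]
        return []


# Made-up mapping and placeholder code, for illustration only.
entry = VulSourceEntry(
    cve_id="CVE-2022-23222",
    project="linux",
    blocks=[CodeBlock(
        source_path="kernel/bpf/verifier.c",
        line_map={10: [100, 101]},
        pseudo_asm_before="<pseudo assembly before fix>",
        pseudo_asm_after="<pseudo assembly after fix>",
    )],
)
```

Keeping the line mapping next to each code block lets a report point from a matched pseudo assembly region back to the source location.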

VulSources are deployed manually to SourceGuard, either from an offline archive or by downloading from a remote Vulnerability Repository. They can be customized according to the license agreement or the customer's use case.

SourceGuard's /vulsource/ API queries the local VulSource snapshot.

1.5.4.1 Vulnerability Source Preparation

  1. Generate baseline Vulnerability records from NVD.
  2. Use the Products field to filter for Supported Open-Source Projects.
  3. We developed a Vulnerability Enrichment Tool that enriches a vulnerability by querying GitHub through both the GitHub API and web scraping.
  4. For each Supported Open-Source Project we:
    1. enrich the vulnerabilities with associated reference links, vulnerable product configuration, affected versions, commit IDs, and version-to-date mappings using the Vulnerability Enrichment Tool
    2. enrich the vulnerabilities with related vulnerabilities in CNNVD and CNVD
    3. perform Vulnerability Feature Extraction (see below) for the vulnerabilities
    4. attach the pseudo assembly code and source code diffs of the code blocks related to the vulnerability fix to the vulnerability
  5. Expose the database and Vulnerability Features through a FastAPI-based web service.
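
Step 2 can be sketched as keyword matching against the Products field. The function name `filter_supported`, the keyword table, and the record IDs are assumptions made for this example; the source does not say how product strings are matched to projects.

```python
from typing import Dict, Iterable, List


def filter_supported(vulns: Iterable[dict],
                     supported: Dict[str, List[str]]) -> Dict[str, List[dict]]:
    """Group vulnerabilities by supported project using Products keywords."""
    matched: Dict[str, List[dict]] = {name: [] for name in supported}
    for vuln in vulns:
        products = [p.lower() for p in vuln.get("Products", [])]
        for name, keywords in supported.items():
            # Case-insensitive substring match of any keyword in any product.
            if any(k.lower() in p for k in keywords for p in products):
                matched[name].append(vuln)
    return matched


vulns = [
    {"ID": "EXAMPLE-A", "Products": ["Linux Linux kernel"]},
    {"ID": "EXAMPLE-B", "Products": ["SAP SAP NetWeaver Application Server Java null"]},
]
by_project = filter_supported(vulns, {"linux": ["linux kernel"]})
```

Substring matching is simple but can over-match (for example, a product whose name merely contains a project name), so a real filter would likely need a curated list of vendor and product pairs.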

Sample of an enriched vulnerability for Linux kernel:

{
  "id": "CVE-2022-23222",
  "note": "NULL Pointer Dereference",
  "poc_collected": true,
  "references": {
    "Ubuntu": "https://ubuntu.com/security/CVE-2022-23222",
    "Debian": "https://security-tracker.debian.org/tracker/CVE-2022-23222",
    "SUSE": "https://www.suse.com/security/cve/CVE-2022-23222",
    "Red Hat": "",
    "ExploitDB": "https://www.exploit-db.com/search?cve=2022-23222",
    "NVD": "https://nvd.nist.gov/vuln/detail/CVE-2022-23222"
  },
  "format": "sourceguard-collected",
  "description": "kernel/bpf/verifier.c in the Linux kernel through 5.15.14 allows local users to gain privileges because of the availability of pointer arithmetic via certain *_OR_NULL pointer types.",
  "poc_location": "https://github.com/JlSakuya/Linux-Privilege-Escalation-Exploits/tree/main/2022/CVE-2022-23222",
  "fix_info": {
    "breaks": "",
    "guidance": "v5.17",
    "message": "",
    "fixes": "c25b2ae136039ffa820c26138ed4a5e5f3ab3841"
  },
  "repositories": [
    {
      "repo_label": "CNNVD",
      "note": "Linux kernel 代码问题漏洞",
      "description": "Linux kernel是美国Linux基金会的开源操作系统Linux所使用的内核。 Linux kernel 5.15.14及之前版本存在代码问题漏洞,攻击者可利用该漏洞获得特权。",
      "id": "CNNVD-202201-1165"
    },
    {
      "repo_label": "CNVD",
      "note": "Linux kernel代码问题漏洞(CNVD-2022-06892)",
      "description": "Linux kernel是美国Linux基金会的开源操作系统Linux所使用的内核。\nLinux kernel 5.15.14及之前版本存在安全漏洞,攻击者可利用该漏洞获得特权。",
      "id": "CNVD-2022-06892"
    },
    {
      "repo_label": "NVD",
      "note": "NULL Pointer Dereference",
      "description": "kernel/bpf/verifier.c in the Linux kernel through 5.15.14 allows local users to gain privileges because of the availability of pointer arithmetic via certain *_OR_NULL pointer types.",
      "id": "CVE-2022-23222"
    }
  ],
  "affected_versions": {
    "date_mappings": {
      "v5.17-rc1": "2022-01-22T16:00:00Z",
      "v2.6.12-rc2": "2005-04-15T16:00:00Z"
    },
    "last_affected": "[UNKNOWN]",
    "from": "v2.6.12-rc2",
    "to": "v5.17-rc1"
  },
  "cvss": {
    "risk_level": 1,
    "score": 7.800000190734863,
    "v2": 7.199999809265137,
    "v3": 7.800000190734863
  }
}
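
In the sample, `risk_level` is 1 (High) and the score is 7.8, which is consistent with the CVSS v3 qualitative severity bands. The sketch below assumes that `risk_level` is derived from the score in this way, which the source does not state; the function name is hypothetical, and mapping a score of 0.0 to Unknown is also an assumption.

```python
from typing import Optional


def risk_level_from_cvss(score: Optional[float]) -> int:
    """Map a CVSS v3 base score to the RiskLevel codes (-1, 0, 1, 2, 3)."""
    if score is None:
        return -1                      # Unknown
    rounded = round(score, 1)          # absorbs float noise such as 7.800000190734863
    if rounded <= 0.0 or rounded > 10.0:
        return -1                      # "None" or out-of-range treated as Unknown
    if rounded >= 9.0:
        return 0                       # Critical: 9.0 to 10.0
    if rounded >= 7.0:
        return 1                       # High: 7.0 to 8.9
    if rounded >= 4.0:
        return 2                       # Medium: 4.0 to 6.9
    return 3                           # Low: 0.1 to 3.9


sample_level = risk_level_from_cvss(7.800000190734863)
```

Rounding to one decimal place before comparing keeps single-precision artifacts in stored scores from shifting a value across a band boundary.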

Pseudo assembly code diff for a kernel vulnerability:
[Figure: Pseudo assembly code diff]

Source code diff for a kernel vulnerability:
[Figure: Source code diff]

1.5.4.2 Vulnerability Feature Extraction

To enhance vulnerability identification, we further process Supported Open-Source Projects listed in CVE records to extract detailed vulnerability features. For each specific vulnerability, the following steps are performed:

1. Source Code Retrieval

  • Clone the source code of the relevant open-source project locally.

2. Binary Compilation

  • Build the project's binaries before and after the vulnerability fix, using the commit IDs provided in the patch information.

  • If no fix is available:

    • Use the latest commit as the before-fix version.
    • Leave the after-fix version empty.
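
The choice of revisions to build can be sketched as follows. The names `BuildPlan` and `plan_builds` are hypothetical, and treating the first parent of the fix commit (`<commit>^` in git revision syntax) as the before-fix version is an assumption; the source only says that commit IDs from the patch information are used.

```python
from typing import NamedTuple, Optional


class BuildPlan(NamedTuple):
    before: str              # revision to build as the vulnerable version
    after: Optional[str]     # revision to build as the fixed version, if any


def plan_builds(fix_commit: Optional[str], latest_commit: str) -> BuildPlan:
    """Pick the before-fix and after-fix revisions for one vulnerability."""
    if fix_commit:
        # Assumption: the commit just before the fix is still vulnerable.
        return BuildPlan(before=f"{fix_commit}^", after=fix_commit)
    # No fix available: the latest commit is the before-fix version.
    return BuildPlan(before=latest_commit, after=None)


with_fix = plan_builds("c25b2ae136039ffa820c26138ed4a5e5f3ab3841", "HEAD")
without_fix = plan_builds(None, "HEAD")
```

A fix spread over several commits or delivered through a merge commit would need a more careful choice of the before-fix revision than the first parent.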

3. Disassembly

  • Disassemble the binaries into architecture-independent pseudo assembly code.
  • This removes variations caused by different CPU architectures and compiler settings.

4. Syntactic Feature Extraction

For each code block, analyze the pseudo assembly code to extract the following features:

  • Unmangled Function Name
    Since the binaries are built from source, the unmangled function name can be reconstructed from its scope and symbol information.
    Example: std::ostream& std::operator<< <std::char_traits<char>>(std::ostream&, char const*)
  • Function Parameter Lists
    The types and order of input and output parameters are extracted.
  • Constant Values
    The declaration site, name, and value of each constant are extracted.
  • Global Variable Usage
    The use site and the global variable accessed are extracted.
  • External Function Calls
    The call site and the function called are extracted.
  • Local Variable Count
    The declaring site and the number of local variables are extracted.

[Figure: Vulnerability Feature Extraction]

5. Integration with Hybrid Vulnerability Identification Engine

  • The extracted syntactic features are used in the Static Analysis phase to quickly identify code blocks related to the vulnerability.
  • The pseudo assembly code from both before and after the vulnerability fix is also used in the Dynamic Analysis phase to determine whether the vulnerability is present in the binary.
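
To illustrate how syntactic features could serve as a fast pre-filter, the sketch below scores two functions by comparing their feature sets. This is not the engine's actual method: the `FunctionFeatures` class, the equal weighting, and the use of Jaccard similarity are all assumptions chosen for a simple example, and the function names in the sample data are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class FunctionFeatures:
    name: str                        # unmangled function name
    params: Tuple[str, ...]          # parameter types, in order
    constants: FrozenSet[str]        # constant values
    globals_used: FrozenSet[str]     # global variables accessed
    external_calls: FrozenSet[str]   # external functions called
    local_count: int                 # number of local variables


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x: FunctionFeatures, y: FunctionFeatures) -> float:
    """Average of six per-feature scores, each in [0, 1]."""
    parts = [
        1.0 if x.name == y.name else 0.0,
        1.0 if x.params == y.params else 0.0,
        jaccard(x.constants, y.constants),
        jaccard(x.globals_used, y.globals_used),
        jaccard(x.external_calls, y.external_calls),
        1.0 if x.local_count == y.local_count else 0.0,
    ]
    return sum(parts) / len(parts)


known = FunctionFeatures("check_ptr", ("int",), frozenset({"0x10"}),
                         frozenset(), frozenset({"memcpy", "printk"}), 3)
candidate = FunctionFeatures("check_ptr", ("int",), frozenset({"0x10"}),
                             frozenset(), frozenset({"memcpy"}), 4)
score = similarity(known, candidate)
```

Candidates scoring above a chosen threshold would then go on to the more expensive comparison of pseudo assembly code; the threshold itself would have to be tuned empirically.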

For more details, refer to Hybrid Vulnerability Identification Engine.