Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vulnerability location + dna #120

Closed
wants to merge 1 commit into from
Closed

Conversation

ErebusZ
Copy link
Member

@ErebusZ ErebusZ commented Feb 27, 2025

This PR computes and adds the dna attribute to the whatweb agent:
dna schema:

{
    "location": "location vulnerability showcasing the file path and bundle/package_name",
    "vuln_data": "vulnerability info containing target, library_type, library name."
}

@ErebusZ ErebusZ requested a review from a team as a code owner February 27, 2025 18:41
@ErebusZ ErebusZ added the github label Feb 27, 2025
Copy link

codecov bot commented Feb 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.58%. Comparing base (f4e84d5) to head (e5fb8ca).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #120      +/-   ##
==========================================
+ Coverage   97.51%   97.58%   +0.07%     
==========================================
  Files           3        3              
  Lines         683      703      +20     
==========================================
+ Hits          666      686      +20     
  Misses         17       17              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@oussamaessaji oussamaessaji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the location in this PR

Copy link

@ostorlab-ai-pr-review ostorlab-ai-pr-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Total Issues Found: 4
Critical Issues: 1
Suggestions: 3

Key Findings:
The review identified four issues across the codebase. In 'agent/whatweb_agent.py', there are suggestions to add inline documentation to clarify the structure of 'vulnerable_target_data', update the type hint for 'vulnerability_location' to Optional[str], and importantly, implement inline validations to ensure required keys are present—this validation issue is critical as it can lead to runtime errors if data is malformed. In 'tests/whatweb_test.py', replacing the hard-coded JSON string with a dictionary constant (and comparing with json.loads) was suggested to improve readability and ensure order-insensitive comparisons. Overall, the code demonstrates a good base structure but would benefit significantly from improved documentation, stricter type handling, and robust validations to prevent potential issues.

@@ -477,6 +477,9 @@ def _send_detected_fingerprints(
elif isinstance(target, IPTarget) and target.version == 6:
self.emit(selector=IP_V6_LIB_SELECTOR, data=msg_data)

dna = _prepare_vulnerability_dna(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Add documentation and inline comments to clarify the expected structure of vulnerable_target_data. 2. Update the type hint for vulnerability_location to Optional[str] to denote that it may be None. 3. Add inline validations to ensure required keys are present in vulnerable_target_data.
Suggested change
dna = _prepare_vulnerability_dna(
from typing import Dict, Optional
def _prepare_vulnerability_dna(self, vulnerable_target_data: Dict, vulnerability_location: Optional[str]) -> str:
"""
Compute the DNA attribute for the vulnerability.
Parameters:
vulnerable_target_data (Dict): A dictionary representing the target vulnerability data. Expected structure:
{
"id": <vulnerability identifier>,
"name": <vulnerability name>,
"details": <vulnerability details>,
// Optionally, "version": <version string> if applicable
}
vulnerability_location (Optional[str]): The location of the vulnerability. If None, an empty string is used.
Returns:
str: The computed DNA string.
"""
# Validate required keys in vulnerable_target_data
required_keys = ["id", "name", "details"]
for key in required_keys:
if key not in vulnerable_target_data:
raise ValueError(f"Missing required key '{key}' in vulnerable_target_data.")
# Ensure vulnerability_location is a valid string
if vulnerability_location is None:
vulnerability_location = ""
# Compute the DNA attribute by concatenating key components
dna_components = [
str(vulnerable_target_data.get("id")),
vulnerable_target_data.get("name", ""),
vulnerability_location
]
dna_str = "-".join(dna_components)
return dna_str

@@ -118,6 +118,11 @@ def testWhatWebAgent_withLinkMsgAndAllChecksEnabled_emitsFingerprints(
assert any(
vuln_msg.data.get("security_issue") is True for vuln_msg in agent_mock
)
assert any(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace the hard-coded JSON string with a dictionary constant and compare using json.loads to ensure order-insensitive comparisons. This also improves readability by breaking long string literals.

Suggested change
assert any(
import json
# Define the expected DNA output as a constant dictionary
EXPECTED_DNA = {
"info": {
"subinfo": "value",
"another": "value"
}
}
# Use json.loads to parse the actual output and compare with the expected dictionary
self.assertEqual(json.loads(result["dna"]), EXPECTED_DNA)

Comment on lines +122 to +125
vuln_msg.data.get("dna")
== '{"location": {"domain_name": {"name": "ostorlab.co"}, "metadata": [{"type": "PORT", "value": "80"}]}, "vuln_data": {"detail": "Found fingerprint `cloudflare`, of type `BACKEND_COMPONENT` in target `ostorlab.co`", "library_name": "cloudflare", "library_type": "BACKEND_COMPONENT", "name": "ostorlab.co", "port": 80, "schema": "http"}}'
for vuln_msg in agent_mock
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do explicit check please. Same for the other asserts.

vuln_msg.data.get("dna")
== '{"location": {"ipv6": {"host": "2a00:1450:4006:80c::2004", "mask": "128", "version": 6}, "metadata": [{"type": "PORT", "value": "443"}]}, "vuln_data": {"detail": "Found fingerprint `lighttpd/1.4.28`, of type `BACKEND_COMPONENT` in target `2a00:1450:4006:80c::2004`", "host": "2a00:1450:4006:80c::2004", "library_name": "lighttpd/1.4.28", "library_type": "BACKEND_COMPONENT", "mask": "128", "port": 443, "version": 6}}'
for vuln_msg in agent_mock
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit please

@@ -495,6 +498,7 @@ def _send_detected_fingerprints(
f"of type `{fingerprint_type}` in target `{target.name}`",
risk_rating=vuln_mixin.RiskRating.INFO,
vulnerability_location=vulnerable_target_data,
dna=dna,
Copy link
Member

@deadly-panda deadly-panda Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an info finding, skip it please, we don't need to compute its DNA..
Skip the whole agent.. you can close the PR.

@ErebusZ ErebusZ closed this Feb 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants