Lightfuzz cleanup #2300

Merged: 34 commits from lightfuzz-cleanup into lightfuzz, Feb 27, 2025
Conversation

liquidsec (Collaborator) commented:

Final cleanup and preparation for move to dev


codecov bot commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 88.60759% with 18 lines in your changes missing coverage. Please review.

Project coverage is 93%. Comparing base (7800fb3) to head (95bf6e1).
Report is 52 commits behind head on lightfuzz.

| Files with missing lines | Patch % | Missing lines |
|--------------------------|--------:|--------------:|
| bbot/modules/lightfuzz/lightfuzz.py | 80% | 5 ⚠️ |
| bbot/modules/lightfuzz/submodules/crypto.py | 87% | 4 ⚠️ |
| bbot/modules/lightfuzz/submodules/serial.py | 72% | 4 ⚠️ |
| bbot/modules/lightfuzz/submodules/cmdi.py | 67% | 2 ⚠️ |
| bbot/modules/lightfuzz/submodules/xss.py | 75% | 1 ⚠️ |
| bbot/test/test_step_1/test__module__tests.py | 67% | 1 ⚠️ |
| .../test_step_2/module_tests/test_module_lightfuzz.py | 93% | 1 ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff            @@
##           lightfuzz   #2300   +/-   ##
=========================================
+ Coverage         93%     93%   +1%
=========================================
  Files            395     396    +1
  Lines          32528   32586   +58
=========================================
+ Hits           30088   30161   +73
+ Misses          2440    2425   -15
```


@liquidsec (Collaborator, Author) commented:

pyahocorasick benchmark

Total times in seconds over 1,000 iterations:

| Text size | Substrings | `string_scan` | `string_scan_yara` | Python `in` |
|----------:|-----------:|--------------:|-------------------:|------------:|
| 10,000 | 10 | 0.07 | 1.39 | 0.02 |
| 10,000 | 100 | 0.21 | 2.97 | 0.19 |
| 10,000 | 1,000 | 0.97 | 64.44 | 1.88 |
| 100,000 | 10 | 0.57 | 1.69 | 0.25 |
| 100,000 | 100 | 1.39 | 3.43 | 2.40 |
| 100,000 | 1,000 | 2.59 | 65.54 | 23.56 |
| 1,000,000 | 10 | 5.84 | 3.29 | 2.41 |
| 1,000,000 | 100 | 13.23 | 5.54 | 23.45 |
| 1,000,000 | 1,000 | 18.80 | 69.83 | 231.80 |

Benchmark code:

```python
import time
import random
import string

from bbot.core.helpers.misc import string_scan, string_scan_yara


def generate_random_text(length):
    """Generate a random alphanumeric string of the specified length."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


def generate_random_substrings(count, length):
    """Generate a list of random substrings."""
    return [generate_random_text(length) for _ in range(count)]


def stress_test(iterations=1000, substring_counts=(10, 100, 1000), text_sizes=(10000, 100000, 1000000)):
    for text_size in text_sizes:
        for count in substring_counts:
            # Accumulators for the total time of each method
            total_ahocorasick_time = 0
            total_yara_time = 0
            total_python_in_time = 0

            for _ in range(iterations):
                # Generate a large random text and a list of random substrings (length 10)
                text = generate_random_text(text_size)
                substrings = generate_random_substrings(count, 10)

                # Test string_scan (pyahocorasick)
                start_time = time.time()
                result_ahocorasick = string_scan(substrings, text)
                total_ahocorasick_time += time.time() - start_time

                # Test string_scan_yara
                start_time = time.time()
                result_yara = string_scan_yara(substrings, text)
                total_yara_time += time.time() - start_time

                # Test the native Python 'in' operator
                start_time = time.time()
                result_python_in = [s for s in substrings if s in text]
                total_python_in_time += time.time() - start_time

                # Verify that all methods return the same results
                assert set(result_ahocorasick) == set(result_yara) == set(result_python_in), "Results differ between methods!"

            # Print total times for this (text_size, count) combination
            print(f"\nTotal time for string_scan with text size {text_size} and {count} substrings: {total_ahocorasick_time:.2f} seconds")
            print(f"Total time for string_scan_yara with text size {text_size} and {count} substrings: {total_yara_time:.2f} seconds")
            print(f"Total time for Python 'in' check with text size {text_size} and {count} substrings: {total_python_in_time:.2f} seconds")


if __name__ == "__main__":
    stress_test()
```

The `string_scan_yara` function used in the benchmark (with the missing `import yara` added, and the return value fixed to map matched rule names back to the substrings themselves rather than their rule indices, so the cross-method assert compares like with like):

```python
import yara


def string_scan_yara(substrings, text, case_insensitive=True):
    # Create one YARA rule per substring; rule_<idx> maps back to substrings[idx]
    rules = []
    for idx, substring in enumerate(substrings):
        condition = f'"{substring}"'
        if case_insensitive:
            condition = f"/{substring}/ nocase"
        rules.append(f"rule rule_{idx} {{ strings: $a = {condition} condition: $a }}")

    # Compile the YARA rules (this happens on every call, which dominates the runtime)
    compiled_rules = yara.compile(source="\n".join(rules))

    # Scan the text and map each matched rule name back to its substring
    matches = compiled_rules.match(data=text)
    return [substrings[int(match.rule.split("_")[1])] for match in matches]
```

My conclusions:

  • For the one use case we have currently, the standard Python `x in y` check is probably fine
  • If we got above roughly the 100-string mark, we'd want to avoid the standard Python approach
  • Any YARA solution where we compile per run is not a good one: around the point where the native Python solution starts to degrade, so does YARA, because of the increased compile time
  • The ahocorasick solution is clearly the best for a large number of strings, and/or large text, where we don't know the strings ahead of time

I'd say the only viable options are:

1. Remove it entirely and use native Python, since we still have a low string count
2. Keep the current pyahocorasick solution in, and keep things the same

But there is very little cost of any kind to having it in there.

@liquidsec liquidsec merged commit 3021ee9 into lightfuzz Feb 27, 2025
12 of 13 checks passed
@liquidsec liquidsec deleted the lightfuzz-cleanup branch February 27, 2025 20:35