Commit

Added chunk_count to hld, fixed bs4 warning suppression
0x41424142 committed Jul 24, 2024
1 parent acbe04f commit 3a4efe3
Showing 4 changed files with 78 additions and 57 deletions.
36 changes: 18 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# qualyspy - A Python Package for Interacting With Qualys APIs
# qualyspy - A Python Package for Interacting With Qualys APIs
```
··············································
: ____ _ :
@@ -188,7 +188,7 @@ You can use any of the VMDR endpoints currently supported:
## Host List Detection
```vmdr.get_hld()``` is the main API for extracting vulnerabilities out of the Qualys platform. It is one of the slowest APIs to return data due to Qualys taking a while to gather all the necessary data, but is arguably the most important. Pagination is controlled via the ```page_count``` parameter. By default, this is set to ```"all"```, pulling all pages. You can specify an int to limit pagination, as well as ```truncation_limit``` to specify how many hosts should be returned per page.

This function implements threading to significantly speed up data pulls. The number of threads is controlled by the ```threads``` parameter, which defaults to 5. A ```Queue``` object is created, containing chunks of hostIDs (pulled via ```get_host_list``` with ```details=None```) that the threads pop from. The threads then call the ```hld_backend``` function with the hostIDs they popped from the queue. The user can control how many IDs are in a chunk via the ```chunk_size``` parameter, which defaults to 3000. You should create a combination of ```threads``` and ```chunk_size``` that keeps all threads busy, while respecting your Qualys concurrency limit.
This function implements threading to significantly speed up data pulls. The number of threads is controlled by the ```threads``` parameter, which defaults to 5. A ```Queue``` object is created, containing chunks of hostIDs (pulled via ```get_host_list``` with ```details=None```) that the threads pop from. The threads then call the ```hld_backend``` function with the hostIDs they popped from the queue. The user can control how many IDs are in a chunk via the ```chunk_size``` parameter, which defaults to 3000. You should create a combination of ```threads``` and ```chunk_size``` that keeps all threads busy, while respecting your Qualys concurrency limit. There is also the ```chunk_count``` parameter, which controls how many chunks a thread will pull out of the ```Queue``` before it exits.
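
The interaction between ```threads```, ```chunk_size``` and ```chunk_count``` can be sketched roughly as follows. This is a minimal, standalone illustration and not the package's code: ```build_chunk_queue```, ```worker``` and ```run``` are hypothetical names, and the call to ```hld_backend``` is replaced with a stand-in that only records the first ID, last ID and size of each chunk. The sketch initializes the per-thread counter once, before the loop, so the count accumulates across chunks.

```python
from queue import Empty, Queue
from threading import Lock, Thread
from typing import List, Tuple, Union


def build_chunk_queue(ids: List[int], chunk_size: int) -> Queue:
    """Split ids into slices of at most chunk_size and enqueue each slice."""
    q: Queue = Queue()
    for start in range(0, len(ids), chunk_size):
        q.put(ids[start:start + chunk_size])
    return q


def worker(q: Queue, results: list, lock: Lock, chunk_count: Union[int, str]) -> None:
    """Pull chunks until the queue is empty or chunk_count chunks were handled."""
    chunks_pulled = 0  # set once, outside the loop, so it persists per thread
    while chunk_count == "all" or chunks_pulled < chunk_count:
        try:
            chunk = q.get_nowait()  # raises Empty instead of blocking forever
        except Empty:
            break
        # Stand-in for the real API call: summarize the chunk.
        summary = (chunk[0], chunk[-1], len(chunk))
        with lock:
            results.append(summary)
        chunks_pulled += 1


def run(
    ids: List[int],
    chunk_size: int = 3000,
    threads: int = 5,
    chunk_count: Union[int, str] = "all",
) -> List[Tuple[int, int, int]]:
    """Start the worker threads, wait for them, and return sorted summaries."""
    q = build_chunk_queue(ids, chunk_size)
    results: list = []
    lock = Lock()
    pool = [
        Thread(target=worker, args=(q, results, lock, chunk_count))
        for _ in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sorted(results)
```

With 10 IDs and a ```chunk_size``` of 3, the queue holds four chunks. Setting ```chunk_count=1``` with two threads means each thread stops after one chunk, so only two chunks are processed in total.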

Some important kwargs this API accepts:
|Kwarg| Possible Values |Description|
@@ -260,8 +260,8 @@ This collection of APIs allows for the management of IP addresses/ranges in VMDR
---
### Get IP List API

The ```get_ip_list()``` API returns a list of all IP addresses or ranges in VMDR, matching the given kwargs. Acceptable args/kwargs are:
|Arg/Kwarg| Possible Values |Description|Required|
The ```get_ip_list()``` API returns a list of all IP addresses or ranges in VMDR, matching the given kwargs. Acceptable params are:
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```ips```|```str(<ip_address/range>)``` or ```BaseList[str, IPV4Address, IPV4Network, IPV6Address, IPV6Network]```|The IP address or range to search for.||
@@ -287,8 +287,8 @@ external_ips = [i for i in get_ip_list(auth) if not i.is_private]
```
---
### Add IPs API
The ```add_ips()``` API allows for the addition of IP addresses or ranges to VMDR. Acceptable args/kwargs are:
|Arg/Kwarg| Possible Values |Description|Required|
The ```add_ips()``` API allows for the addition of IP addresses or ranges to VMDR. Acceptable params are:
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```ips```|```str(<ip_address/range>)``` or ```BaseList[str, IPV4Address, IPV4Network, IPV6Address, IPV6Network]```|The IP address or range to add.||
@@ -318,8 +318,8 @@ add_ips(auth, ips='1.2.3.4', enable_vm=True)
```
---
### Update IPs API
The ```update_ips()``` API allows for the modification of IP addresses or ranges in VMDR. Acceptable args/kwargs are:
|Arg/Kwarg| Possible Values |Description|Required|
The ```update_ips()``` API allows for the modification of IP addresses or ranges in VMDR. Acceptable params are:
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```ips```|```str(<ip_address/range>)``` or ```BaseList[str, IPV4Address, IPV4Network, IPV6Address, IPV6Network]```|The IP address or range to update.||
@@ -352,8 +352,8 @@ This collection of APIs allows for the management of asset groups (AGs) in VMDR,

### Get Asset Group List API

The ```get_ag_list()``` API returns a list of all AGs in VMDR, matching the given kwargs. Acceptable args/kwargs are:
|Arg/Kwarg| Possible Values |Description|Required|
The ```get_ag_list()``` API returns a list of all AGs in VMDR, matching the given kwargs. Acceptable params are:
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```page_count```|```Literal['all']``` (default), ```int >= 0```| How many pages to pull. Note that ```page_count``` does not apply if ```truncation_limit``` is set to 0, or not specified.||
@@ -377,8 +377,8 @@ ag_list = get_ag_list(auth)
```

### Add Asset Group API
The ```add_ag()``` API allows for the addition of asset groups to VMDR. Acceptable args/kwargs are:
|Arg/Kwarg| Possible Values |Description|Required|
The ```add_ag()``` API allows for the addition of asset groups to VMDR. Acceptable params are:
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```title```|```str```|The title of the asset group.||
@@ -410,9 +410,9 @@ add_ag(auth, title='My New Asset Group')
```

### Edit Asset Group API
The ```edit_ag()``` API allows for the modification of asset groups in VMDR. Acceptable args/kwargs are:
The ```edit_ag()``` API allows for the modification of asset groups in VMDR. Acceptable params are:

|Arg/Kwarg| Possible Values |Description|Required|
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```id```|```Union[AssetGroup, BaseList[AssetGroup, int, str], str]```|The ID of the asset group to edit.||
@@ -457,9 +457,9 @@ edit_ag(auth, id=12345, set_title='My New Asset Group Title')
---

### Delete Asset Group API
The ```delete_ag()``` API allows for the deletion of asset groups in VMDR. Acceptable args/kwargs are:
The ```delete_ag()``` API allows for the deletion of asset groups in VMDR. Acceptable params are:

|Arg/Kwarg| Possible Values |Description|Required|
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```id```|```Union[AssetGroup, BaseList[AssetGroup, int, str], str]```|The ID of the asset group to delete.||
@@ -502,9 +502,9 @@ The ```VMScan``` dataclass is used to store the various fields that the VMDR VM
---
### Get Scan List API

The ```get_scan_list()``` API returns a list of all VM scans in VMDR, matching the given kwargs. Acceptable args/kwargs are:
The ```get_scan_list()``` API returns a list of all VM scans in VMDR, matching the given kwargs. Acceptable params are:

|Arg/Kwarg| Possible Values |Description|Required|
|Parameter| Possible Values |Description|Required|
|--|--|--|--|
|```auth```|```qualyspy.auth.BasicAuth```|The authentication object.||
|```scan_ref```|```str```|The reference string of the scan to search for. Formatted like: ```scan/123455677```||
24 changes: 10 additions & 14 deletions qualyspy/vmdr/data_classes/detection.py
@@ -4,20 +4,14 @@

from dataclasses import dataclass, field
from typing import *
from warnings import filterwarnings
from datetime import datetime
from warnings import catch_warnings, simplefilter

from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning
from bs4 import BeautifulSoup

from .qds_factor import QDSFactor
from .qds import QDS as qds

filterwarnings(
"ignore", category=MarkupResemblesLocatorWarning, module="bs4"
) # supress bs4 warnings

filterwarnings("ignore", category=UserWarning, module="bs4")


@dataclass(order=True)
class Detection:
@@ -148,12 +142,14 @@ def __post_init__(self):
setattr(self, field, datetime.fromisoformat(getattr(self, field)))

# clean up fields that have html tags
for field in HTML_FIELDS:
setattr(
self,
field,
BeautifulSoup(getattr(self, field), "html.parser").get_text(),
)
with catch_warnings():
simplefilter("ignore") # suppress warnings raised by bs4 while parsing (e.g. MarkupResemblesLocatorWarning)
for field in HTML_FIELDS:
setattr(
self,
field,
BeautifulSoup(getattr(self, field), "html.parser").get_text(),
)

# convert the BOOL_FIELDS to bool
for field in BOOL_FIELDS:
23 changes: 11 additions & 12 deletions qualyspy/vmdr/data_classes/kb_entry.py
@@ -7,9 +7,9 @@
from dataclasses import dataclass, field
from typing import *
from datetime import datetime
from warnings import filterwarnings
from warnings import catch_warnings, simplefilter

from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning
from bs4 import BeautifulSoup

from .lists import BaseList

@@ -21,9 +21,6 @@
from .compliance import Compliance
from .tag import Tag, CloudTag

# disable the warning for the bs4 module
filterwarnings("ignore", category=MarkupResemblesLocatorWarning, module="bs4")


@dataclass(order=True)
class KBEntry:
@@ -201,13 +198,15 @@ def __post_init__(self):
):
setattr(self, field, bool(getattr(self, field)))

for field in HTML_FIELDS:
if getattr(self, field) is not None:
setattr(
self,
field,
BeautifulSoup(getattr(self, field), "html.parser").get_text(),
)
with catch_warnings():
simplefilter("ignore") # suppress warnings raised by bs4 while parsing (e.g. MarkupResemblesLocatorWarning)
for field in HTML_FIELDS:
if getattr(self, field) is not None:
setattr(
self,
field,
BeautifulSoup(getattr(self, field), "html.parser").get_text(),
)

def __str__(self):
return f"{self.QID}"
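
Both data classes in this commit move from a module-level ```filterwarnings``` call to a ```catch_warnings()``` block around the parsing loop. The difference in scope can be sketched with the standard library alone. In the sketch below, ```noisy_parse``` is a hypothetical stand-in for a parser that emits a ```UserWarning``` and does not use bs4. ```quiet_parse``` silences it only for the duration of the call, and ```count_warnings``` is a small helper for observing the effect.

```python
import warnings


def noisy_parse(text: str) -> str:
    """Hypothetical parser stand-in that warns on every call."""
    warnings.warn("input looks like a filename", UserWarning)
    return text.strip()


def quiet_parse(text: str) -> str:
    """Call noisy_parse with all warnings ignored inside the block only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return noisy_parse(text)
    # On exit the previous warning filters are restored.


def count_warnings(func, text: str) -> int:
    """Return how many warnings func(text) lets through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(text)
    return len(caught)
```

Note that ```simplefilter("ignore")``` with no category hides every warning raised inside the block, not only the bs4 one. The Python documentation also describes ```catch_warnings``` as modifying global state and therefore not thread-safe, which may matter when these classes are instantiated from the worker threads in ```get_hld```.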
52 changes: 39 additions & 13 deletions qualyspy/vmdr/get_host_list_detections.py
@@ -22,7 +22,6 @@

LOCK = Lock()


def normalize_id_list(id_list):
"""
normalize_id_list - formats the kwarg ids, if it is passed.
@@ -163,8 +162,8 @@ def hld_backend(
Returns:
List: A list of VMDRHost objects, with their DETECTIONS attribute populated.
"""
# Set the kwargs

# Set the kwargs
kwargs["action"] = "list"
kwargs["echo_request"] = 0
kwargs["show_results"] = 1
@@ -176,7 +175,7 @@ def hld_backend(
while True:
with LOCK:
print(
f"{current_thread().name} - Pulling page for ids {kwargs.get('ids')}. KWARGS: {kwargs}"
f"{current_thread().name} - Pulling page {pulled+1} for ids {kwargs.get('ids')}. KWARGS: {kwargs}"
)

# make the request:
@@ -277,14 +276,19 @@ def create_id_queue(
else:
id_list = pull_id_set(auth)

if not id_list:
raise QualysAPIError("No IDs returned from API.")

print(f"ID set pulled. Total IDs: {len(id_list)}")

id_queue = Queue()

for i in range(0, len(id_list), chunk_size):
id_queue.put(id_list[i : i + chunk_size])

print(f"Queue created with {id_queue.qsize()} chunks of ~{chunk_size} IDs each.")
singular_chunk = True if id_queue.qsize() == 1 else False

print(f"Queue created with {id_queue.qsize()} {'chunks' if not singular_chunk else 'chunk'} of ~{chunk_size} IDs{' each.' if not singular_chunk else '.'}")

return id_queue

@@ -294,6 +298,7 @@ def get_hld(
chunk_size: int = 3000,
threads: int = 5,
page_count: Union[int, "all"] = "all",
chunk_count: Union[int, "all"] = "all",
**kwargs,
) -> List:
"""
@@ -305,13 +310,18 @@ def get_hld(
chunk_size (int): The size of each chunk. Defaults to 3000.
threads (int): The number of threads to use. Defaults to 5.
page_count (Union[int, "all"]): The number of pages to retrieve. Defaults to "all".
chunk_count (Union[int, "all"]): The number of chunks to retrieve. Defaults to "all".
**kwargs: Additional keyword arguments to pass to the API. See qualyspy.vmdr.get_host_list_detections.hld_backend() for details.
Returns:
BaseList: A list of VMDRHost objects, with their DETECTIONS attribute populated.
"""

# First, make sure the user hasnt set threads to more than the cpu count
# Ensure that threads, chunk_size, page_count (if not 'all') and chunk_count (if not 'all') are all integers above 0
if any([threads < 1, chunk_size < 1, (page_count != "all" and page_count < 1), (chunk_count != "all" and chunk_count < 1)]):
raise ValueError("threads, chunk_size, page_count (if not 'all') and chunk_count (if not 'all') must all be integers above 0.")

# Make sure the user hasn't set threads to more than the cpu count
if threads > cpu_count():
print(
f"Warning: The number of threads ({threads}) is greater than the number of CPUs ({cpu_count()}). This may cause performance issues."
@@ -325,9 +335,12 @@ def get_hld(
)
threads = rl["X-Concurrency-Limit-Limit"]

print("Pulling ID set...")
print(f"Pulling/creating queue for full ID list...") if not kwargs.get("ids") else print(
f"Pulling/creating queue for user-specified IDs: {kwargs.get('ids')}..."
)

id_queue = create_id_queue(auth, chunk_size=chunk_size, ids=kwargs.get("ids"))
print(f"Starting get_hld with {threads} threads.")
print(f"Starting get_hld with {threads} {'threads.' if threads > 1 else 'thread.'}")

threads_list = []

@@ -336,7 +349,7 @@ def get_hld(
for i in range(threads):
thread = Thread(
target=threaded_hld_worker,
args=(auth, id_queue, responses, page_count, kwargs),
args=(auth, id_queue, responses, page_count, chunk_count, kwargs),
)
threads_list.append(thread)
thread.start()
@@ -353,6 +366,7 @@ def threaded_hld_worker(
id_queue: Queue,
responses: BaseList,
page_count: Union[int, "all"],
chunk_count: Union[int, "all"],
kwargs,
):
"""
@@ -363,29 +377,41 @@ def threaded_hld_worker(
id_queue (Queue): The queue of host IDs to pull.
responses (BaseList): The list of responses to append to.
page_count (Union[int, "all"]): The number of pages to retrieve. Defaults to "all".
chunk_count (Union[int, "all"]): The number of chunks to retrieve. Defaults to "all".
**kwargs: Additional keyword arguments to pass to the API. See get_hld() for details.
"""
while True:
pulled = 0
pages_pulled = 0
chunks_pulled = 0
try:
ids = id_queue.get()
ids = id_queue.get_nowait() #nowait allows us to check if the queue is empty without blocking
except Empty:
with LOCK:
print(f"{current_thread().name} - Queue is empty. Terminating thread.")
break

if not ids:
with LOCK:
print(f"{current_thread().name} - No IDs to pull. Terminating thread.")
break

kwargs["ids"] = f"{ids[0]}-{ids[-1]}"
responses.extend(hld_backend(auth, page_count=page_count, **kwargs))
id_queue.task_done()
with LOCK:
print(f"{current_thread().name} - Chunk complete.")
pulled += 1
pages_pulled += 1
chunks_pulled += 1
# check if the queue is empty, or if the threads are done (via pulled var)
if id_queue.empty():
with LOCK:
print(f"{current_thread().name} - Queue is empty. Terminating thread.")
break
if pulled == page_count:
if pages_pulled == page_count:
with LOCK:
print(f"{current_thread().name} - Thread has pulled all pages. Terminating thread.")
break
if chunks_pulled == chunk_count:
with LOCK:
print(f"{current_thread().name} - Pulled all pages. Returning.")
print(f"{current_thread().name} - Thread has pulled all chunks. Terminating thread.")
break
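
The argument check added to ```get_hld``` in this commit compares ```page_count``` and ```chunk_count``` with ```< 1``` whenever they are not ```"all"```, so a string other than ```"all"``` would raise a ```TypeError``` from the comparison rather than the intended ```ValueError```. The annotation ```Union[int, "all"]``` is also read by ```typing``` as a forward reference to a name ```all```, not as the literal string. A possible tightening is sketched below. ```CountOrAll```, ```validate_count``` and ```validate_hld_args``` are hypothetical names and are not part of qualyspy.

```python
from typing import Literal, Union

# Either a positive integer or the literal string "all".
CountOrAll = Union[int, Literal["all"]]


def validate_count(name: str, value: CountOrAll, allow_all: bool = True) -> None:
    """Raise ValueError unless value is an int above 0 (or "all" if allowed)."""
    if allow_all and value == "all":
        return
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        suffix = " or 'all'" if allow_all else ""
        raise ValueError(
            f"{name} must be an integer above 0{suffix}, got {value!r}"
        )


def validate_hld_args(
    threads: int,
    chunk_size: int,
    page_count: CountOrAll = "all",
    chunk_count: CountOrAll = "all",
) -> None:
    """Validate the four numeric arguments that get_hld accepts."""
    validate_count("threads", threads, allow_all=False)
    validate_count("chunk_size", chunk_size, allow_all=False)
    validate_count("page_count", page_count)
    validate_count("chunk_count", chunk_count)
```

Checking the type before the comparison keeps every bad input on the same ```ValueError``` path, and the error message names the offending argument.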
