[WIP] Add dataset format Yolov8 #44
Conversation
@@ -0,0 +1,213 @@
# Copyright (C) 2019-2022 Intel Corporation
Please update all copyright headers. See how it is done in other files.
META_FILE = "data.yaml"

@staticmethod
def _parse_config(path: str) -> Dict[str, str]:
Why did you start the method with underscore?
This function is also used only internally in the scope of this plugin folder. I also took inspiration from the existing implementation of yolo_format in the repo datumaro/plugins/yolo_format. Should I remove it?
    config = {}

    for line in config_lines:
Please use pyyaml to parse YAML file. Don't reinvent the wheel: https://pyyaml.org/wiki/PyYAMLDocumentation
Use safe_load method please.
Resolved. Used yaml.safe_load
@@ -0,0 +1,33 @@
# Copyright (C) 2024 CVAT.ai Corporation
Please avoid code cloning. I see the same code in 4 format.py files.
class YoloDetectionImporter(Importer):
    META_FILE = YoloDetectionPath.META_FILE
This is a pain. Please find a way to redefine the default config file name. We should always be able to treat it as a configuration parameter. If it isn't specified, I would recommend looking for *.yaml files inside. If only one file is found, you can proceed. If several files with the same name are found, you need to report an error.
Algorithm and recommendations:
- please add a CLI argument to specify the config file name
- if you don't have a hint from the command line, please try to find all *.yaml files
- if you find only one yaml file, it is the config
- if you find several yaml files, check that data.yaml exists. If it is the case, use it.
- otherwise, please report an error.

AR
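A minimal sketch of the lookup described above. The function name and the `filename` parameter (standing in for the proposed CLI argument) are illustrative, not part of the current code:

```python
import glob
import os.path as osp

def find_config_file(path: str, filename: str = None) -> str:
    """Resolve the dataset config file following the steps above (illustrative)."""
    if filename:  # hint from the hypothetical CLI argument
        return osp.join(path, filename)
    candidates = glob.glob(osp.join(path, "*.yaml"))
    if len(candidates) == 1:  # exactly one yaml file: it is the config
        return candidates[0]
    default = osp.join(path, "data.yaml")
    if default in candidates:  # several yaml files: fall back to data.yaml
        return default
    raise FileNotFoundError(
        "Can't choose a dataset config: specify it explicitly "
        "or keep a single *.yaml (or a data.yaml) in the dataset root"
    )
```
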
# # classes = 2
# # train = data/train.txt
# # valid = data/test.txt
# # names = data/obj.names
Please remove irrelevant comments. Don't copy and paste; try to rethink each line of code.
@classmethod
def find_sources(cls, path: str) -> List[Dict[str, Any]]:
    # Check obj.names first
Do you actually check obj.names here?
I tried to convert coco8 to coco and back. There are some differences in coordinates. In general, this is normal, but could you please try to minimize it?

$ wdiff coco8/labels/train/000000000009.txt test3/labels/train/000000000009.txt
45 0.479492 0.688771 0.955609 [-0.5955-] {+0.595500+}
45 0.736516 0.247188 0.498875 0.476417
50 0.637063 0.732938 0.494125 0.510583
45 0.339438 0.418896 0.678875 [-0.7815-] {+0.781500+}
49 0.646836 0.132552 0.118047 [-0.0969375-] {+0.096937+}
49 0.773148 0.129802 [-0.0907344 0.0972292-] {+0.090734 0.097229+}
49 0.668297 0.226906 0.131281 0.146896
49 0.642859 [-0.0792187 0.148063-] {+0.079219+} 0.148062 {+0.148063+}
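The diff above is consistent with fixed six-decimal formatting on export: short values gain trailing-zero padding (0.5955 becomes 0.595500) and seven-decimal values lose a digit (0.0969375 becomes 0.096937). One hedged way to shrink the diff is to strip the padding when writing coordinates; the helper name below is made up for illustration and is not how the exporter currently formats values:

```python
def format_coord(value: float, max_decimals: int = 6) -> str:
    """Render a normalized coordinate without trailing-zero padding."""
    text = f"{value:.{max_decimals}f}".rstrip("0").rstrip(".")
    return text or "0"
```

With this, a value read as `0.5955` would round-trip as `0.5955` instead of `0.595500`; the digit lost from `0.0969375` would still require raising `max_decimals`.
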
Please run all linters locally and check the results. You can see how to do that in https://github.com/cvat-ai/datumaro/blob/develop/.github/workflows/linter.yml
To convert a YOLO-OrientedBox dataset to other formats, use the following commands:

```bash
datum convert -if yolo_orientedbox -i <path/to/dataset> -f coco_instances -o <path/to/dataset>
```
or | ||
```bash
datum convert -if yolo_detection -i <path/to/dataset> -f coco_instances -o <path/to/dataset>
```
And after renaming the default to one of the proposed names, the command appears to complete, but the folder with results is empty.
@@ -0,0 +1,674 @@
GNU GENERAL PUBLIC LICENSE
Why do you need the file here?
@ChanBong, the linters have failed. Please look at the comments from SonarCloud. It reports a huge amount of code duplication. Need to fix that.
@CodeRabbit review

@coderabbitai review

Actions performed: Review triggered.
Walkthrough

This update introduces comprehensive support for various YOLO formats (detection, segmentation, pose, and oriented box) within the Datumaro framework. It includes new converters, extractors, tests, and user documentation. The changes focus on enabling datasets to be exported, imported, and tested in YOLOv8-compatible formats, significantly enhancing Datumaro's dataset handling capabilities.
Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Converter
    participant Extractor
    participant YOLOFormatHandler
    User->>+Converter: Apply conversion
    Converter->>+YOLOFormatHandler: Convert datasets to YOLO format
    YOLOFormatHandler-->>-Converter: Converted dataset
    Converter-->>-User: Converted dataset
    User->>+Extractor: Extract dataset
    Extractor->>+YOLOFormatHandler: Handle extraction
    YOLOFormatHandler-->>-Extractor: Extracted data
    Extractor-->>-User: Extracted data
Actionable comments posted: 23
Outside diff range and nitpick comments (2)

site/content/en/docs/formats/yolo_orientedbox.md (1)

88-88: Consider adding a comma for clarity. In the sentence discussing the normalization of bounding box coordinates, a comma after "image width" might improve readability and prevent potential confusion.

site/content/en/docs/user-manual/supported_formats.md (1)

Line range hint 204-204: Consider a more expressive alternative to "To do this," which might appear too direct and less informative.

- To do this, use `dataset_meta.json`.
+ To achieve this, refer to the `dataset_meta.json` configuration as follows.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Files ignored due to path filters (1)

- tests/assets/yolo_detection_dataset/images/train/1.jpg is excluded by `!**/*.jpg`
Files selected for processing (27)
- datumaro/plugins/yolo_detection_format/converter.py (1 hunks)
- datumaro/plugins/yolo_detection_format/extractor.py (1 hunks)
- datumaro/plugins/yolo_detection_format/format.py (1 hunks)
- datumaro/plugins/yolo_detection_format/importer.py (1 hunks)
- datumaro/plugins/yolo_orientedbox_format/converter.py (1 hunks)
- datumaro/plugins/yolo_orientedbox_format/extractor.py (1 hunks)
- datumaro/plugins/yolo_orientedbox_format/format.py (1 hunks)
- datumaro/plugins/yolo_orientedbox_format/importer.py (1 hunks)
- datumaro/plugins/yolo_pose_format/converter.py (1 hunks)
- datumaro/plugins/yolo_pose_format/extractor.py (1 hunks)
- datumaro/plugins/yolo_pose_format/format.py (1 hunks)
- datumaro/plugins/yolo_pose_format/importer.py (1 hunks)
- datumaro/plugins/yolo_segmentation_format/converter.py (1 hunks)
- datumaro/plugins/yolo_segmentation_format/extractor.py (1 hunks)
- datumaro/plugins/yolo_segmentation_format/format.py (1 hunks)
- datumaro/plugins/yolo_segmentation_format/importer.py (1 hunks)
- datumaro/util/os_util.py (1 hunks)
- site/content/en/docs/formats/yolo_detection.md (1 hunks)
- site/content/en/docs/formats/yolo_orientedbox.md (1 hunks)
- site/content/en/docs/formats/yolo_pose.md (1 hunks)
- site/content/en/docs/formats/yolo_segmentation.md (1 hunks)
- site/content/en/docs/user-manual/supported_formats.md (1 hunks)
- tests/assets/yolo_detection_dataset/LICENSE (1 hunks)
- tests/assets/yolo_detection_dataset/README.md (1 hunks)
- tests/assets/yolo_detection_dataset/data.yaml (1 hunks)
- tests/assets/yolo_detection_dataset/labels/train/1.txt (1 hunks)
- tests/test_yolo_detection_format.py (1 hunks)
Files not reviewed due to errors (2)
- datumaro/plugins/yolo_pose_format/converter.py (no review received)
- datumaro/plugins/yolo_segmentation_format/extractor.py (no review received)
Files skipped from review due to trivial changes (6)
- datumaro/plugins/yolo_orientedbox_format/format.py
- site/content/en/docs/formats/yolo_pose.md
- site/content/en/docs/formats/yolo_segmentation.md
- tests/assets/yolo_detection_dataset/LICENSE
- tests/assets/yolo_detection_dataset/README.md
- tests/assets/yolo_detection_dataset/labels/train/1.txt
Additional context used

yamllint

tests/assets/yolo_detection_dataset/data.yaml
[error] 19-19: no new line character at the end of file (new-line-at-end-of-file)

LanguageTool

site/content/en/docs/formats/yolo_orientedbox.md
[uncategorized] ~88-~88: Possible missing comma found. (AI_HYDRA_LEO_MISSING_COMMA)
Context: "...`image width` and `y` by `image height`. Internally datumaro processes these oriented bound..."

site/content/en/docs/user-manual/supported_formats.md
[grammar] ~58-~58: This phrase is duplicated. You should probably use "Dataset example" only once. (PHRASE_REPETITION)
Context: "...et (`classification`, `detection`) - Dataset example - [Dataset example (txt for classification)](https://githu..."
[style] ~204-~204: Consider a more expressive alternative. (DO_ACHIEVE)
Context: "...that are not original to the format. To do this, use `dataset_meta.json`. ```json..."
Additional comments not posted (8)

site/content/en/docs/formats/yolo_detection.md (1)

1-178: Ensure documentation completeness and clarity.
- Consistency: Ensure that the documentation style and format are consistent with other format documentation in Datumaro.
- Error Handling: Include common errors and troubleshooting steps to help users resolve issues they might encounter while using the YOLO-Detection format.
- Examples: The examples provided should be tested to ensure they work as expected, especially commands and code snippets.
datumaro/plugins/yolo_detection_format/converter.py (2)

159-181: Optimize annotation export logic. The method `_export_item_annotation` processes each item's annotations. Consider optimizing by pre-calculating values that don't change within the loop, such as image width and height.

- for bbox in item.annotations:
+ height, width = item.media.size
+ for bbox in item.annotations:

Likely invalid or redundant comment.
94-158: Verify dataset integrity and error handling. The `apply` method orchestrates the dataset conversion process. It's crucial to ensure that all expected errors are handled gracefully and that the dataset's integrity is verified before proceeding with the conversion.

Verification successful. The `apply` method in `datumaro/plugins/yolo_detection_format/converter.py` includes several error-handling mechanisms and dataset integrity checks:

- Dataset integrity check: `_check_dataset()` is called within a try-except block to ensure the dataset's integrity. If a `DatumaroError` is raised, it is handled by the error policy.
- Error handling:
  - Raises `MediaTypeError` if the media type is not an image.
  - Raises `DatasetExportError` if the subset name is reserved.
  - Handles exceptions in the `_export_item_annotation` method and reports item errors.

The code appears to handle errors gracefully and checks dataset integrity before proceeding with the conversion.

Script executed for the analysis:

# Verify dataset integrity and handling in the apply method.
rg 'def apply' -A 100 datumaro/plugins/yolo_detection_format/converter.py
datumaro/plugins/yolo_orientedbox_format/converter.py (2)

171-191: Optimize annotation export logic for oriented boxes. The method `_export_item_annotation` processes each item's annotations specifically for oriented boxes. Consider optimizing by pre-calculating values that don't change within the loop, such as image width and height.

- for bbox in item.annotations:
+ height, width = item.media.size
+ for bbox in item.annotations:

Likely invalid or redundant comment.

106-170: Verify dataset integrity and error handling in the `apply` method. Ensure that all expected errors are handled gracefully and that the dataset's integrity is verified before proceeding with the conversion.

Verification successful. The `apply` method in `datumaro/plugins/yolo_orientedbox_format/converter.py` includes integrity checks and error handling mechanisms:

- `raise MediaTypeError("Media type is not an image")`
- `self._check_dataset()`
- `self._ctx.error_policy.fail(e)`
- `raise DatasetExportError(...)`
- `self._ctx.error_policy.report_item_error(e, item_id=(item.id, item.subset))`

These cover various aspects such as media type validation, dataset checking, and handling exceptions during the export process.

Script executed for the analysis:

# Verify dataset integrity and error handling in the apply method.
rg 'def apply' -A 100 datumaro/plugins/yolo_orientedbox_format/converter.py | rg 'raise|fail|error|check|verify'
datumaro/plugins/yolo_orientedbox_format/extractor.py (2)

45-65: Review initialization of `YoloOrientedboxExtractor`. The constructor of `YoloOrientedboxExtractor` correctly initializes various instance variables and performs essential checks, such as ensuring the provided config path is a directory and that URLs are provided. These checks are crucial for ensuring that the extractor is set up with valid configurations.

103-123: Error handling in data iteration. The implementation of the `__iter__` method in `YoloOrientedboxExtractor` includes robust error handling. By using a progress reporter and handling exceptions for each item, the method ensures that errors in individual items do not halt the entire import process. This approach maintains the integrity of the import operation while providing detailed error reporting.

site/content/en/docs/user-manual/supported_formats.md (1)

164-180: Documentation for YOLO formats. The addition of YOLO formats (detection, segmentation, pose, oriented box) to the supported formats documentation is clear and well-structured. Each format type is linked to its specification, example, and documentation, providing a comprehensive resource for users.

[APPROVED]
def _parse_annotations(
    self, anno_path: str, image: Image, *, item_id: Tuple[str, str]
) -> List[Annotation]:
    lines = []
    with open(anno_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                lines.append(line)

    annotations = []

    if lines:
        # Use image info as late as possible to avoid unnecessary image loading
        if image.size is None:
            raise DatasetImportError(
                f"Can't find image info for '{self.localize_path(image.path)}'"
            )
        image_height, image_width = image.size

        for idx, line in enumerate(lines):
            try:
                parts = line.split()
                if len(parts) != 9:
                    raise InvalidAnnotationError(
                        f"Unexpected field count {len(parts)} in the oriented bbox description. "
                        "Expected 9 fields (label, x1, y1, x2, y2, x3, y3, x4, y4)."
                    )
                label_id, x1, y1, x2, y2, x3, y3, x4, y4 = parts

                label_id = self._parse_field(label_id, int, "oriented bbox label id")
                if label_id not in self._categories[AnnotationType.label]:
                    raise UndeclaredLabelError(str(label_id))

                x1 = self._parse_field(x1, float, "oriented bbox x1")
                y1 = self._parse_field(y1, float, "oriented bbox y1")
                x2 = self._parse_field(x2, float, "oriented bbox x2")
                y2 = self._parse_field(y2, float, "oriented bbox y2")
                x3 = self._parse_field(x3, float, "oriented bbox x3")
                y3 = self._parse_field(y3, float, "oriented bbox y3")
                x4 = self._parse_field(x4, float, "oriented bbox x4")
                y4 = self._parse_field(y4, float, "oriented bbox y4")

                (x, y), (w, h), r = xyxyxyxy2xywhr(
                    np.array([[
                        [x1 * image_width, y1 * image_height],
                        [x2 * image_width, y2 * image_height],
                        [x3 * image_width, y3 * image_height],
                        [x4 * image_width, y4 * image_height],
                    ]])
                )

                annotations.append(
                    Bbox(
                        x,
                        y,
                        w,
                        h,
                        label=label_id,
                        id=idx,
                        group=idx,
                        attributes={"angle": r},
                    )
                )
            except Exception as e:
                self._ctx.error_policy.report_annotation_error(e, item_id=item_id)

    return annotations
Complex Logic in Annotation Parsing
The method _parse_annotations
contains complex logic for parsing oriented bounding box annotations. While the method is comprehensive, it could benefit from further modularization or simplification to enhance readability and maintainability. Consider breaking down this method into smaller, more focused sub-methods, each handling a specific part of the annotation parsing process.
def _load_categories(self, names_path: str) -> LabelCategories:
    if has_meta_file(osp.dirname(names_path)):
        return LabelCategories.from_iterable(parse_meta_file(osp.dirname(names_path)).keys())

    label_categories = LabelCategories()

    with open(names_path, "r") as fp:
        loaded = yaml.safe_load(fp.read())
        if isinstance(loaded["names"], list):
            label_names = loaded["names"]
        elif isinstance(loaded["names"], dict):
            label_names = list(loaded["names"].values())
        else:
            raise DatasetImportError(f"Can't read dataset category file '{names_path}'")

    for label_name in label_names:
        label_categories.add(label_name)

    return label_categories
Improvement in Category Loading
The method _load_categories
effectively loads label categories from a YAML file. However, the error handling could be improved by providing more specific messages related to the expected format of the 'names' field in the YAML file. This would help users diagnose issues with their category files more effectively.
- raise DatasetImportError(f"Can't read dataset category file '{names_path}'")
+ raise DatasetImportError(f"Expected 'names' to be a list or a dictionary in the dataset category file '{names_path}', but got {type(loaded['names']).__name__}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Suggested change:

def _load_categories(self, names_path: str) -> LabelCategories:
    if has_meta_file(osp.dirname(names_path)):
        return LabelCategories.from_iterable(parse_meta_file(osp.dirname(names_path)).keys())

    label_categories = LabelCategories()

    with open(names_path, "r") as fp:
        loaded = yaml.safe_load(fp.read())
        if isinstance(loaded["names"], list):
            label_names = loaded["names"]
        elif isinstance(loaded["names"], dict):
            label_names = list(loaded["names"].values())
        else:
            raise DatasetImportError(
                f"Expected 'names' to be a list or a dictionary in the dataset "
                f"category file '{names_path}', but got {type(loaded['names']).__name__}"
            )

    for label_name in label_names:
        label_categories.add(label_name)

    return label_categories
# Classes
names:
  0: person
  1: bicycle
Add a newline at the end of the file.
The YAML file should end with a newline to comply with common UNIX text file conventions and to ensure proper parsing by tools that read the file.
+
def _parse_config(path: str) -> Dict[str, str]:
    with open(path, "r") as fp:
        loaded = yaml.safe_load(fp.read())

    if not isinstance(loaded, dict):
        raise Exception("Invalid config format")

    return loaded
Ensure robust error handling in YAML parsing.
The method _parse_config
uses yaml.safe_load
which is good for security. However, consider enhancing the error handling to provide more specific feedback about what went wrong if the YAML content is not a dictionary.
- if not isinstance(loaded, dict):
- raise Exception("Invalid config format")
+ if not isinstance(loaded, dict):
+ raise ValueError("Expected the configuration to be a dictionary. Please check the YAML structure.")
Suggested change:

def _parse_config(path: str) -> Dict[str, str]:
    with open(path, "r") as fp:
        loaded = yaml.safe_load(fp.read())

    if not isinstance(loaded, dict):
        raise ValueError("Expected the configuration to be a dictionary. Please check the YAML structure.")

    return loaded
def _parse_config(path: str) -> Dict[str, str]:
    with open(path, "r", encoding="utf-8") as f:
        config_lines = f.readlines()

    config = {}

    for line in config_lines:
        match = re.match(r"^\s*(\w+)\s*=\s*(.+)$", line)
        if not match:
            continue

        key = match.group(1)
        value = match.group(2)
        config[key] = value

    return config
Refactor configuration parsing to use YAML directly.
The manual parsing of the configuration file line by line using regex is error-prone and hard to maintain. Consider using yaml.safe_load
for parsing, which is already used in other parts of the project.
- config = {}
- for line in config_lines:
- match = re.match(r"^\s*(\w+)\s*=\s*(.+)$", line)
- if not match:
- continue
- key = match.group(1)
- value = match.group(2)
- config[key] = value
- return config
+ with open(path, "r", encoding="utf-8") as f:
+ return yaml.safe_load(f)
Suggested change:

def _parse_config(path: str) -> Dict[str, str]:
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)
@classmethod
def patch(cls, dataset, patch, save_dir, **kwargs):
    conv = cls(dataset, save_dir=save_dir, **kwargs)
    conv._patch = patch
    conv.apply()

    for (item_id, subset), status in patch.updated_items.items():
        if status != ItemStatus.removed:
            item = patch.data.get(item_id, subset)
        else:
            item = DatasetItem(item_id, subset=subset)

        if not (status == ItemStatus.removed or not item.media):
            continue

        if subset == DEFAULT_SUBSET_NAME:
            subset = YoloOrientedboxPath.DEFAULT_SUBSET_NAME
        subset_dir = osp.join(save_dir, "obj_%s_data" % subset)

        image_path = osp.join(subset_dir, conv._make_image_filename(item))
        if osp.isfile(image_path):
            os.remove(image_path)

        ann_path = osp.join(subset_dir, "%s.txt" % item.id)
        if osp.isfile(ann_path):
            os.remove(ann_path)
Handle file deletion errors in the patch
method.
When removing files in the patch
method, there's currently no error handling. It's good practice to handle potential exceptions that can occur during file deletion.
- os.remove(image_path)
+ try:
+ os.remove(image_path)
+ except OSError as e:
+ log.error(f"Failed to delete image file {image_path}: {e}")
Suggested change:

        image_path = osp.join(subset_dir, conv._make_image_filename(item))
        if osp.isfile(image_path):
            try:
                os.remove(image_path)
            except OSError as e:
                log.error(f"Failed to delete image file {image_path}: {e}")

        ann_path = osp.join(subset_dir, "%s.txt" % item.id)
        if osp.isfile(ann_path):
            os.remove(ann_path)
def _check_dataset(self):
    if self._extractor.media_type() and not issubclass(self._extractor.media_type(), Image):
        raise MediaTypeError("Media type is not an image")

    subset_names = set(self._extractor.subsets().keys())

    for subset in subset_names:
        if subset not in YoloOrientedboxPath.ALLOWED_SUBSET_NAMES:
            raise DatasetExportError(
                f"The allowed subset name is in {YoloOrientedboxPath.ALLOWED_SUBSET_NAMES}, "
                f'so that subset "{subset}" is not allowed.'
            )

    for must_name in YoloOrientedboxPath.MUST_SUBSET_NAMES:
        if must_name not in subset_names:
            raise DatasetExportError(
                f'Subset "{must_name}" is not in {subset_names}, '
                "but Yolov8 requires both of them."
            )
Improve error messaging for subset name validation.

The error messages in _check_dataset could be more descriptive and actionable. They currently do not guide the user on how to resolve the issues.

- f"The allowed subset name is in {YoloOrientedboxPath.ALLOWED_SUBSET_NAMES}, "
- f'so that subset "{subset}" is not allowed.'
+ f"Invalid subset name '{subset}'. Allowed subset names are: {YoloOrientedboxPath.ALLOWED_SUBSET_NAMES}. Please rename the subset or adjust the ALLOWED_SUBSET_NAMES in YoloOrientedboxPath."

Committable suggestion was skipped due to low confidence.
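A runnable sketch of the suggested, more actionable message; ALLOWED_SUBSET_NAMES and check_subset_name here are stand-ins for the plugin's actual attributes, not the PR's code.

```python
# Illustrative stand-in for YoloOrientedboxPath.ALLOWED_SUBSET_NAMES
ALLOWED_SUBSET_NAMES = {"train", "valid", "test"}


def check_subset_name(subset: str) -> None:
    """Raise a descriptive, actionable error for an invalid subset name."""
    if subset not in ALLOWED_SUBSET_NAMES:
        raise ValueError(
            f"Invalid subset name '{subset}'. "
            f"Allowed subset names are: {sorted(ALLOWED_SUBSET_NAMES)}. "
            "Please rename the subset or adjust ALLOWED_SUBSET_NAMES."
        )
```

The key difference from the original message is that the user is told both what is wrong (the offending name) and how to fix it (rename or extend the allowed set).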
    def _export_media(self, item: DatasetItem, subset_img_dir: str) -> str:
        try:
            if not item.media or not (item.media.has_data or item.media.has_size):
                raise DatasetExportError(
                    "Failed to export item '%s': item has no image info" % item.id
                )

            image_name = self._make_image_filename(item)
            image_fpath = osp.join(subset_img_dir, image_name)

            if self._save_media:
                self._save_image(item, image_fpath)

            return image_fpath

        except Exception as e:
Ensure proper error handling for media export failures.

Similar to the previous file, the error handling for media export failures is broad and catches all exceptions. Handling specific expected errors separately can provide more targeted feedback or recovery options.

- except Exception as e:
+ except (FileNotFoundError, PermissionError) as e:

Committable suggestion was skipped due to low confidence.
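A minimal illustration of catching the specific expected errors rather than a bare Exception; export_media and the file-copy step are hypothetical stand-ins for the converter's real media-saving logic.

```python
import shutil


def export_media(src: str, dst: str) -> str:
    """Copy a media file to its export location.

    Only the failures we expect and can explain (missing file,
    permissions) are translated into an export error; anything
    else propagates unchanged for easier debugging.
    """
    try:
        shutil.copyfile(src, dst)
        return dst
    except (FileNotFoundError, PermissionError) as e:
        raise RuntimeError(f"Failed to export media '{src}': {e}") from e
```

Chaining with `from e` keeps the original exception as `__cause__`, which matches how the tests elsewhere in this PR inspect `capture.exception.__cause__`.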
def _make_yolo_obbox(img_size, box, angle):
    # https://github.com/pjreddie/darknet/blob/master/scripts/voc_label.py
    # <x> <y> <width> <height> - values relative to width and height of image
    # <x> <y> - are center of rectangle
    x = (box[0] + box[2]) / 2
    y = (box[1] + box[3]) / 2
    w = box[2] - box[0]
    h = box[3] - box[1]

    rect = ((x, y), (w, h), angle)
    box = cv2.boxPoints(rect)

    for corner in box:
        corner[0] = corner[0] / img_size[0]
        corner[1] = corner[1] / img_size[1]

    rotated_corners = box.flatten()

    return rotated_corners
Refactor the oriented bounding box calculation to a method inside the class.

The function _make_yolo_obbox is used extensively within the YoloOrientedboxConverter class. It would improve encapsulation and maintainability to make this a method of the class.

- def _make_yolo_obbox(img_size, box, angle):
+ class YoloOrientedboxConverter(Converter):
+     def _make_yolo_obbox(self, img_size, box, angle):

Committable suggestion was skipped due to low confidence.
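For reference, the corner computation that cv2.boxPoints performs here can be sketched in pure Python. This sketch assumes the angle is given in degrees; note that OpenCV may return the four corners in a different order than this clockwise-from-top-left listing.

```python
import math


def make_obbox_corners(img_size, box, angle_deg):
    """Corners of a rotated rectangle built from an axis-aligned box
    (x1, y1, x2, y2), normalized by image (width, height).

    Pure-Python stand-in for cv2.boxPoints; corner ordering may differ.
    """
    cx = (box[0] + box[2]) / 2  # rectangle center
    cy = (box[1] + box[3]) / 2
    w = box[2] - box[0]
    h = box[3] - box[1]

    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)

    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate each half-extent offset about the center, then normalize.
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x / img_size[0], y / img_size[1]))
    return corners
```

With angle 0 this degenerates to the axis-aligned corners, which is a convenient sanity check when validating exported annotation lines.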
        source = Dataset.import_from(DUMMY_DATASET_DIR, format="yolo_detection")

        parsed = pickle.loads(pickle.dumps(source))  # nosec

        compare_datasets_strict(self, source, parsed)


class YoloDetectionExtractorTest(TestCase):
    def _prepare_dataset(self, path: str) -> Dataset:
        dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id="a",
                    subset="train",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[Bbox(0, 2, 4, 2, label=0)],
                )
            ],
            categories=["test"],
        )
        dataset.export(path, "yolo_detection", save_images=True)

        return dataset

    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_parse(self):
        with TestDir() as test_dir:
            expected = self._prepare_dataset(test_dir)

            actual = Dataset.import_from(test_dir, "yolo_detection")
            compare_datasets(self, expected, actual)

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_data_file(self):
        with TestDir() as test_dir:
            with self.assertRaisesRegex(DatasetImportError, f"Can't find data.yaml in {test_dir}"):
                YoloDetectionExtractor(test_dir)

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_ann_line_format(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            with open(osp.join(test_dir, "labels", "train", "a.txt"), "w") as f:
                f.write("1 2 3\n")

            with self.assertRaises(AnnotationImportError) as capture:
                Dataset.import_from(test_dir, "yolo_detection").init_cache()
            self.assertIsInstance(capture.exception.__cause__, InvalidAnnotationError)
            self.assertIn("Unexpected field count", str(capture.exception.__cause__))

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_label(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            with open(osp.join(test_dir, "labels", "train", "a.txt"), "w") as f:
                f.write("10 0.5 0.5 0.5 0.5\n")

            with self.assertRaises(AnnotationImportError) as capture:
                Dataset.import_from(test_dir, "yolo_detection").init_cache()
            self.assertIsInstance(capture.exception.__cause__, UndeclaredLabelError)
            self.assertEqual(capture.exception.__cause__.id, "10")

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_field_type(self):
        for field, field_name in [
            (1, "bbox center x"),
            (2, "bbox center y"),
            (3, "bbox width"),
            (4, "bbox height"),
        ]:
            with self.subTest(field_name=field_name):
                with TestDir() as test_dir:
                    self._prepare_dataset(test_dir)
                    with open(osp.join(test_dir, "labels", "train", "a.txt"), "w") as f:
                        values = [0, 0.5, 0.5, 0.5, 0.5]
                        values[field] = "a"
                        f.write(" ".join(str(v) for v in values))

                    with self.assertRaises(AnnotationImportError) as capture:
                        Dataset.import_from(test_dir, "yolo_detection").init_cache()
                    self.assertIsInstance(capture.exception.__cause__, InvalidAnnotationError)
                    self.assertIn(field_name, str(capture.exception.__cause__))

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_missing_image_info(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            os.remove(osp.join(test_dir, "images", "train", "a.jpg"))

            with self.assertRaises(ItemImportError) as capture:
                Dataset.import_from(test_dir, "yolo_detection").init_cache()

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_missing_subset_info(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            os.remove(osp.join(test_dir, "train.txt"))

            with self.assertRaisesRegex(InvalidAnnotationError, "subset list file"):
                Dataset.import_from(test_dir, "yolo_detection").init_cache()
Enhance test coverage and readability.

- Test Coverage: Ensure that all edge cases, such as handling of unusual or corrupt data files, are covered by the tests.
- Readability and Maintenance: Refactor common test setup tasks into helper methods to reduce redundancy and improve readability. This includes dataset creation, file setup, and assertions.
- Documentation: Add docstrings to each test method to describe what each test aims to verify.

-    def test_can_save_and_load(self):
+    def test_can_save_and_load(self):
+        """
+        Test the ability to save a dataset in YOLO detection format and load it back, ensuring data integrity.
+        """
+        source_dataset = self._create_simple_dataset()
+        with TestDir() as test_dir:
+            self._test_save_and_load(source_dataset, test_dir)
+
+    def _create_simple_dataset(self):
+        return Dataset.from_iterable(
+            [
+                DatasetItem(id=1, subset="train", media=Image(data=np.ones((8, 8, 3))),
+                            annotations=[Bbox(0, 2, 4, 2, label=2), Bbox(0, 1, 2, 3, label=4)]),
+                DatasetItem(id=2, subset="valid", media=Image(data=np.ones((8, 8, 3))),
+                            annotations=[Bbox(0, 1, 5, 2, label=2), Bbox(0, 2, 3, 2, label=5),
+                                         Bbox(0, 2, 4, 2, label=6), Bbox(0, 7, 3, 2, label=7)]),
+            ],
+            categories=["label_" + str(i) for i in range(10)],
+        )
+
+    def _test_save_and_load(self, source_dataset, test_dir):
+        YoloDetectionConverter.convert(source_dataset, test_dir, save_media=True)
+        parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")
+        compare_datasets(self, source_dataset, parsed_dataset)
class YoloDetectionConvertertTest(TestCase):
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_save_and_load(self):
        source_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id=1,
                    subset="train",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[
                        Bbox(0, 2, 4, 2, label=2),
                        Bbox(0, 1, 2, 3, label=4),
                    ],
                ),
                DatasetItem(
                    id=2,
                    subset="valid",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[
                        Bbox(0, 1, 5, 2, label=2),
                        Bbox(0, 2, 3, 2, label=5),
                        Bbox(0, 2, 4, 2, label=6),
                        Bbox(0, 7, 3, 2, label=7),
                    ],
                ),
            ],
            categories=["label_" + str(i) for i in range(10)],
        )

        with TestDir() as test_dir:
            YoloDetectionConverter.convert(source_dataset, test_dir, save_media=True)
            parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")
            compare_datasets(self, source_dataset, parsed_dataset)

    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_save_dataset_with_image_info(self):
        source_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id=1,
                    subset="train",
                    media=Image(path="1.jpg", size=(10, 15)),
                    annotations=[
                        Bbox(0, 2, 4, 1, label=2, id=0),
                    ],
                ),
            ],
            categories=["label_" + str(i) for i in range(10)],
        )

        with TestDir() as test_dir:
            YoloDetectionConverter.convert(source_dataset, test_dir)
            save_image(
                osp.join(test_dir, "images", "train", "1.jpg"), np.ones((10, 15, 3))
            )  # put the image for dataset
            parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")
            compare_datasets(self, source_dataset, parsed_dataset)

    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_save_dataset_with_cyrillic_and_spaces_in_filename(self):
        source_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id="кириллица c пробелом",
                    subset="train",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[
                        Bbox(0, 2, 4, 2, label=2),
                        Bbox(0, 1, 2, 3, label=4),
                    ],
                ),
            ],
            categories=["label_" + str(i) for i in range(10)],
        )

        with TestDir() as test_dir:
            YoloDetectionConverter.convert(source_dataset, test_dir, save_media=True)
            parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")
            compare_datasets(self, source_dataset, parsed_dataset, require_media=True)

    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_save_and_load_image_with_arbitrary_extension(self):
        source_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id="1",
                    subset="train",
                    media=Image(path="1.JPEG", data=np.zeros((4, 3, 3))),
                    annotations=[
                        Bbox(0, 2, 3, 2, label=2),
                    ],
                ),
                DatasetItem(
                    id="2",
                    subset="valid",
                    media=Image(path="2.bmp", data=np.zeros((3, 4, 3))),
                    annotations=[
                        Bbox(0, 1, 5, 2, label=2),
                    ],
                ),
            ],
            categories=["label_" + str(i) for i in range(10)],
        )

        with TestDir() as test_dir:
            YoloDetectionConverter.convert(source_dataset, test_dir, save_media=True)
            parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")
            compare_datasets(self, source_dataset, parsed_dataset)

    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_save_and_load_with_meta_file(self):
        source_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id=1,
                    subset="train",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[
                        Bbox(0, 2, 4, 2, label=2),
                        Bbox(0, 1, 2, 3, label=4),
                    ],
                ),
                DatasetItem(
                    id=2,
                    subset="valid",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[
                        Bbox(0, 1, 5, 2, label=2),
                        Bbox(0, 2, 3, 2, label=5),
                        Bbox(0, 2, 4, 2, label=6),
                        Bbox(0, 7, 3, 2, label=7),
                    ],
                ),
            ],
            categories=["label_" + str(i) for i in range(10)],
        )

        with TestDir() as test_dir:
            YoloDetectionConverter.convert(source_dataset, test_dir, save_media=True, save_dataset_meta=True)
            parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")
            self.assertTrue(osp.isfile(osp.join(test_dir, "dataset_meta.json")))
            compare_datasets(self, source_dataset, parsed_dataset)

    @mark_requirement(Requirements.DATUM_609)
    def test_can_save_and_load_without_path_prefix(self):
        source_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id=3,
                    subset="train",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[
                        Bbox(0, 1, 5, 2, label=1),
                    ],
                ),
            ],
            categories=["a", "b"],
        )

        with TestDir() as test_dir:
            YoloDetectionConverter.convert(source_dataset, test_dir, save_media=True, add_path_prefix=False)
            parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")

            with open(osp.join(test_dir, "data.yaml"), "r") as f:
                lines = f.readlines()
                self.assertIn("train: train.txt\n", lines)

            with open(osp.join(test_dir, "train.txt"), "r") as f:
                lines = f.readlines()
                self.assertIn("./images/train/3.jpg\n", lines)

            compare_datasets(self, source_dataset, parsed_dataset)


DUMMY_DATASET_DIR = osp.join(osp.dirname(__file__), "assets", "yolo_detection_dataset")


class YoloDetectionImporterTest(TestCase):
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_import(self):
        expected_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id=1,
                    subset="train",
                    media=Image(data=np.ones((10, 15, 3))),
                    annotations=[
                        Bbox(0, 3, 14, 5, label=1),
                        Bbox(7, 0, 7, 4, label=0),
                    ],
                ),
            ],
            categories=["person", "bicycle"],
        )

        dataset = Dataset.import_from(DUMMY_DATASET_DIR, "yolo_detection")
        compare_datasets(self, expected_dataset, dataset)

    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_import_with_exif_rotated_images(self):
        expected_dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id=1,
                    subset="train",
                    media=Image(data=np.ones((10, 15, 3))),
                    annotations=[
                        Bbox(0, 3, 14, 5, label=1),
                        Bbox(7, 0, 7, 4, label=0),
                    ],
                ),
            ],
            categories=["person", "bicycle"],
        )

        with TestDir() as test_dir:
            dataset_path = osp.join(test_dir, "dataset")
            shutil.copytree(DUMMY_DATASET_DIR, dataset_path)

            # Add exif rotation for image
            image_path = osp.join(dataset_path, "images", "train", "1.jpg")
            img = PILImage.open(image_path)
            exif = img.getexif()
            exif.update([(296, 3), (282, 28.0), (531, 1), (274, 6), (283, 28.0)])
            img.save(image_path, exif=exif)

            dataset = Dataset.import_from(dataset_path, "yolo_detection")
            compare_datasets(self, expected_dataset, dataset, require_media=True)

    @mark_requirement(Requirements.DATUM_673)
    def test_can_pickle(self):
        source = Dataset.import_from(DUMMY_DATASET_DIR, format="yolo_detection")

        parsed = pickle.loads(pickle.dumps(source))  # nosec

        compare_datasets_strict(self, source, parsed)


class YoloDetectionExtractorTest(TestCase):
    def _prepare_dataset(self, path: str) -> Dataset:
        dataset = Dataset.from_iterable(
            [
                DatasetItem(
                    id="a",
                    subset="train",
                    media=Image(data=np.ones((8, 8, 3))),
                    annotations=[Bbox(0, 2, 4, 2, label=0)],
                )
            ],
            categories=["test"],
        )
        dataset.export(path, "yolo_detection", save_images=True)

        return dataset

    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_parse(self):
        with TestDir() as test_dir:
            expected = self._prepare_dataset(test_dir)

            actual = Dataset.import_from(test_dir, "yolo_detection")
            compare_datasets(self, expected, actual)

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_data_file(self):
        with TestDir() as test_dir:
            with self.assertRaisesRegex(DatasetImportError, f"Can't find data.yaml in {test_dir}"):
                YoloDetectionExtractor(test_dir)

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_ann_line_format(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            with open(osp.join(test_dir, "labels", "train", "a.txt"), "w") as f:
                f.write("1 2 3\n")

            with self.assertRaises(AnnotationImportError) as capture:
                Dataset.import_from(test_dir, "yolo_detection").init_cache()
            self.assertIsInstance(capture.exception.__cause__, InvalidAnnotationError)
            self.assertIn("Unexpected field count", str(capture.exception.__cause__))

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_label(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            with open(osp.join(test_dir, "labels", "train", "a.txt"), "w") as f:
                f.write("10 0.5 0.5 0.5 0.5\n")

            with self.assertRaises(AnnotationImportError) as capture:
                Dataset.import_from(test_dir, "yolo_detection").init_cache()
            self.assertIsInstance(capture.exception.__cause__, UndeclaredLabelError)
            self.assertEqual(capture.exception.__cause__.id, "10")

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_invalid_field_type(self):
        for field, field_name in [
            (1, "bbox center x"),
            (2, "bbox center y"),
            (3, "bbox width"),
            (4, "bbox height"),
        ]:
            with self.subTest(field_name=field_name):
                with TestDir() as test_dir:
                    self._prepare_dataset(test_dir)
                    with open(osp.join(test_dir, "labels", "train", "a.txt"), "w") as f:
                        values = [0, 0.5, 0.5, 0.5, 0.5]
                        values[field] = "a"
                        f.write(" ".join(str(v) for v in values))

                    with self.assertRaises(AnnotationImportError) as capture:
                        Dataset.import_from(test_dir, "yolo_detection").init_cache()
                    self.assertIsInstance(capture.exception.__cause__, InvalidAnnotationError)
                    self.assertIn(field_name, str(capture.exception.__cause__))

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_missing_image_info(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            os.remove(osp.join(test_dir, "images", "train", "a.jpg"))

            with self.assertRaises(ItemImportError) as capture:
                Dataset.import_from(test_dir, "yolo_detection").init_cache()

    @mark_requirement(Requirements.DATUM_ERROR_REPORTING)
    def test_can_report_missing_subset_info(self):
        with TestDir() as test_dir:
            self._prepare_dataset(test_dir)
            os.remove(osp.join(test_dir, "train.txt"))

            with self.assertRaisesRegex(InvalidAnnotationError, "subset list file"):
                Dataset.import_from(test_dir, "yolo_detection").init_cache()


class YoloDetectionConvertertTest(TestCase):
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_can_save_and_load(self):
        """
        Test the ability to save a dataset in YOLO detection format and load it back, ensuring data integrity.
        """
        source_dataset = self._create_simple_dataset()
        with TestDir() as test_dir:
            self._test_save_and_load(source_dataset, test_dir)

    def _create_simple_dataset(self):
        return Dataset.from_iterable(
            [
                DatasetItem(id=1, subset="train", media=Image(data=np.ones((8, 8, 3))),
                            annotations=[Bbox(0, 2, 4, 2, label=2), Bbox(0, 1, 2, 3, label=4)]),
                DatasetItem(id=2, subset="valid", media=Image(data=np.ones((8, 8, 3))),
                            annotations=[Bbox(0, 1, 5, 2, label=2), Bbox(0, 2, 3, 2, label=5),
                                         Bbox(0, 2, 4, 2, label=6), Bbox(0, 7, 3, 2, label=7)]),
            ],
            categories=["label_" + str(i) for i in range(10)],
        )

    def _test_save_and_load(self, source_dataset, test_dir):
        YoloDetectionConverter.convert(source_dataset, test_dir, save_media=True)
        parsed_dataset = Dataset.import_from(test_dir, "yolo_detection")
        compare_datasets(self, source_dataset, parsed_dataset)
Closed in favor of #50
Summary
This PR introduces the capability to export datasets in the YOLOv8 format. It is still a WIP, with only a fraction of the features implemented.
How to test
Checklist
develop branch
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
Documentation
Tests