Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve content scanning #149

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

Abestanis
Copy link
Collaborator

This includes #145 and is based on top of #148.

This makes content scanning more robust and prevents failing the entire content search because one item fails by catching exceptions on an item level and reporting them, which implements the first checkbox on #127.

I think it is valuable to report these as non-fatal errors, so we can see if we are too strict and need to make a content attribute nullable.

@Abestanis Abestanis added the enhancement New feature or request label Jul 21, 2024
@Abestanis Abestanis requested a review from nt4f04uNd July 21, 2024 11:14
@Abestanis Abestanis marked this pull request as draft July 21, 2024 11:14
@Abestanis Abestanis force-pushed the feature/improve_content_scanning branch 2 times, most recently from 96b3628 to 3e31748 Compare July 21, 2024 11:39
@Abestanis Abestanis force-pushed the feature/improve_content_scanning branch 2 times, most recently from cff81ae to 75de00c Compare October 14, 2024 16:21
@Abestanis Abestanis marked this pull request as ready for review October 15, 2024 10:40
@Abestanis Abestanis force-pushed the feature/improve_content_scanning branch from 4ae4508 to 2e33f71 Compare October 19, 2024 18:31
Copy link
Owner

@nt4f04uNd nt4f04uNd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: the solution is fine, but tests are hard to read. If we could make it a bit more simple, or more readable, that would be great

albumWith(id: 19, lastYear: 2000, firstYear: 3000),
),
];
final propertiesThatCanBeMissing = ['albumArt', 'artistId', 'firstYear', 'lastYear', 'numberOfSongs'];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add a new property, this will likely not be updated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it, that's not going to be a problem. The tests verify the current behaviour, any change will (and should) break them. If we add a nullable property, the subSets function below will generate albums maps where that property is null/missing, so the test will fail when we expect parsing of these to fail but they get accepted.

test/logic/models/content_test.dart Outdated Show resolved Hide resolved
...validSong.copyWithout(propertiesThatCanBeMissing).subSets(),
...validSong
.withWrongTypes()
.whereNot((map) => propertiesThatCanBeMissing.any((property) => map[property] == null))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it do?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it's hard to understand what is the output of these function calls, how they work, and why we use them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it's hard to understand what is the output of these function calls, how they work, and why we use them

I agree.

The point of this is to automatically create all possible bad variations of inputs to the parse function.
There is two types of problems that this covers:

  • Missing data: The subSets function gives us variations of the validSong map (without the values that are allowed to be null) with one or more missing elements. We generate all possible subsets instead of just checking if it's ok if individual elements are missing because sometimes we fall back to another element in that case (like Song.dateModified falling back to Song.dateAdded).
  • Invalid data types: The withWrongTypes gives us variations of the validSong map where each element of the map is a different type from the expected one. For example, the title property has a type String, so withWrongTypes will generate one variation where the title is null, one where the title is true, one where it's 0 and so on. We only generate variations where one of the elements is changed and don't bother with variations where multiple element types have changed because there would be a large amount of variations and because I don't believe it is very relevant.

I agree that this is not very easy to parse from the code, but I'm not quite sure how to improve the situation other than adding comments and maybe changing the name of withWrongTypes and subSets. Do you have any recomendations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some helper extension methods to improve the clarity of the tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think splitting the tests into the valid and invalid cases improves clarity a lot.

test/logic/models/content_test.dart Outdated Show resolved Hide resolved
];
final propertiesThatCanBeMissing = ['albumArt', 'artistId', 'firstYear', 'lastYear', 'numberOfSongs'];
final invalidAlbums = [
...validAlbum.copyWithout(propertiesThatCanBeMissing).subSets(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of generating a model, then removing by hand all of its properties?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subSets generates all possible variations of the map where one or more elements are missing to generate a list of invalid maps that can't be parsed into a valid album. The copyWithout removes the elements of the map that are allowed to be missing and would therefore successfully parse.

The reason to not hard code the list of properties is that if we add a new property to Album it will be automatically included here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copyWithout removes the elements of the map that are allowed to be missing and would therefore successfully parse.

But if it removes those fields from map, subSets gets the map with those removed fields. How would it know it needs to generate some variations of those non-existent fields, am I missing something here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would it know it needs to generate some variations of those non-existent fields, am I missing something here?

The idea is for it to not generate any variations with the removed fields. We want to generate invalid maps here, but since these field are nullable it's valid for them to be missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants