-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Export products as dataframe #102
base: master
Are you sure you want to change the base?
feat: Export products as dataframe #102
Conversation
189fddd
to
42725e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not add this unit folder and stick with the existing file layout. That would mean putting your tests in a product folder alongside core
- A new column would be created for unique properties across all products. The property | ||
columns would be named in the format `properties.property_name`. | ||
""" | ||
products_dict_representation = [product.dict() for product in products] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
has an exclude_none
option that would apply to this use since we're just going to drop those columns anyway. If we don't add them to the dict version of the object then that could save some fraction of time to checking which columns are empty. If that is reliable be might be able to drop the dropna
call altogether, which is probably a bigger win. Or maybe pandas is smart and can do this efficiently regardless. This SO post shows how to do a quick performance test to see if it makes any difference. It would probably be more significant on models that have more fields like results.
assert not products_dataframe.empty | ||
assert ( | ||
products_dataframe.columns.to_list() | ||
== expected_products_dataframe.columns.to_list() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to verifying the column names match the expected, would you also check the column data types? In particular that the dates are dates and arrays are arrays
What does this Pull Request accomplish?
Adds functionality to return a normalized data frame, given a list of products as input.
Why should this Pull Request be merged?
This utility can be useful to get the products data in the form of a Data frame which can be useful for further processing.
What testing has been done?
Unit tests are added