Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Export products as dataframe #102

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ShriramS-NI
Copy link

What does this Pull Request accomplish?

Adds functionality to return a normalized data frame, given a list of products as input.

Why should this Pull Request be merged?

This utility can be useful to get the products data in the form of a Data frame which can be useful for further processing.

What testing has been done?

Unit tests are added

@ShriramS-NI ShriramS-NI force-pushed the users/shriram/feat-products-dataframe-utility branch from 189fddd to 42725e5 Compare March 6, 2025 12:01
@ShriramS-NI ShriramS-NI requested a review from SSSantosh18 March 6, 2025 14:46
@ShriramS-NI ShriramS-NI marked this pull request as ready for review March 6, 2025 15:15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not add this unit folder and stick with the existing file layout. That would mean putting your tests in a product folder alongside core

- A new column would be created for unique properties across all products. The property
columns would be named in the format `properties.property_name`.
"""
products_dict_representation = [product.dict() for product in products]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict has an exclude_none option that would apply to this use since we're just going to drop those columns anyway. If we don't add them to the dict version of the object then that could save some fraction of time to checking which columns are empty. If that is reliable be might be able to drop the dropna call altogether, which is probably a bigger win. Or maybe pandas is smart and can do this efficiently regardless. This SO post shows how to do a quick performance test to see if it makes any difference. It would probably be more significant on models that have more fields like results.

assert not products_dataframe.empty
assert (
products_dataframe.columns.to_list()
== expected_products_dataframe.columns.to_list()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to verifying the column names match the expected, would you also check the column data types? In particular that the dates are dates and arrays are arrays

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants