Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PNE-241] Data Manager support #943

Closed
wants to merge 28 commits into from
Closed

Conversation

lkubie
Copy link
Contributor

@lkubie lkubie commented Jun 13, 2024

Citrine Python PR

Jira Ticket

PNE-241

Description

WIP draft for supporting data manager.

PR Type:

  • Breaking change (fix or feature that would cause existing functionality to change)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Maintenance (non-breaking change to assist developers)

Adherence to team decisions

  • I have added tests for 100% coverage
  • I have written Numpy-style docstrings for every method and class.
  • I have communicated the downstream consequences of the PR to others.
  • I have bumped the version in __version__.py

@anoto-moniz
Copy link
Collaborator

Note about how I PR, since I don't think I've done one for you before.
I'll likely start some comments with "nitpick" or the like. That's a non-blocking comment, which I use to mean "this isn't super important, but I figured I'd flag it, and leave it up to you if you wish to make the change."

Comment on lines 8 to 9
import warnings
warnings.filterwarnings('always', category=DeprecationWarning)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just here for testing, but just flagging to ensure we remove it before merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Just testing. They weren't showing up in my notebook otherwise for some reason

)

def test_invalid_collection_construction(invalid_collection):
# assertion is within the construction of the invalid_collection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixtures should be used to construct data. If the construction is the test, it belongs in the test function, unless there's some very good reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I'll fix this and my other occurrences of a similar pattern

def __init__(self, project_id, session):
DatasetCollection.__init__(self, project_id=project_id, session=session)
def __init__(self, project_id, session, team_id):
DatasetCollection.__init__(self, team_id = team_id, project_id=project_id, session=session)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmmmm...not sure why this was never moved over to super. Mind making that change while you're here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do!

@lkubie
Copy link
Contributor Author

lkubie commented Jun 21, 2024

Note about how I PR, since I don't think I've done one for you before. I'll likely start some comments with "nitpick" or the like. That's a non-blocking comment, which I use to mean "this isn't super important, but I figured I'd flag it, and leave it up to you if you wish to make the change."

I appreciate it! The more feedback the better. For example, I didn't know that the standard is always is None and that using == None wasn't OK

@@ -58,8 +58,6 @@ def __str__(self):
class ConditionTemplateCollection(AttributeTemplateCollection[ConditionTemplate]):
"""A collection of condition templates."""

_path_template = 'projects/{project_id}/datasets/{dataset_id}/condition-templates'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , go it, now it is a property on the DataConceptsCollection

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anoto-moniz here we will need to discuss how to pick the appropriate API endpoint for building the table depending on the desired execution scope (whole team or project), I've sent you a pm to start talking about it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be the same as the rest. Add support for a team_id parameter, and expose it on the TeamCollection, while deprecating the project_id parameter.

dataset_uid=self.uid,
project_id=self.project_id
team_id=self.team_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, even the team_id is not mandatory for the dataset's collection both the project and team initialize the collection with the corresponding team_id. This is ok.

query = {
"criteria": [
{
"datasets": str(self.dataset_id),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datasets is an array

@@ -192,14 +193,44 @@ def __str__(self):
class FileCollection(Collection[FileLink]):
"""Represents the collection of all file links associated with a dataset."""

_path_template = 'projects/{project_id}/datasets/{dataset_id}/files'
_path_template = 'teams/{team_id}/datasets/{dataset_id}/files'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the projects version of the endpoint be dead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll kill it after a while, but given that we are using the datasetId we can freely use the teams-based one for both project->dataset and team->dataset scenarios. It is going to return exactly the same data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each of these tests, we need to check that the projects version gives a DeprecatedWarning.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be checking that invoking the data related endpoints through ProjectCollection raises a deprecation warning.

@anoto-moniz anoto-moniz changed the title Feature/data manager pne 241 [PNE-241] Data Manager support Jul 10, 2024
@pacdaemon
Copy link
Contributor

This work was already handled on #949

@pacdaemon pacdaemon closed this Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants