Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dataset cli - add support for schema, round-tripping to yaml #12764

Merged
merged 16 commits into from
Mar 4, 2025

Conversation

chakru-r
Copy link
Collaborator

@chakru-r chakru-r commented Mar 3, 2025

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata docs Issues and Improvements to docs smoke_test Contains changes related to smoke tests labels Mar 3, 2025
Copy link

codecov bot commented Mar 3, 2025

Codecov Report

Attention: Patch coverage is 23.44828% with 333 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...estion/src/datahub/api/entities/dataset/dataset.py 15.38% 286 Missing ⚠️
...-ingestion/src/datahub/cli/specific/dataset_cli.py 55.07% 31 Missing ⚠️
metadata-ingestion/src/datahub/pydantic/compat.py 38.46% 16 Missing ⚠️

❌ Your patch status has failed because the patch coverage (23.44%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

📢 Thoughts on this report? Let us know!

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Mar 3, 2025

# Create a temporary file in a secure way
# The file will be automatically deleted when the context manager exits
with tempfile.NamedTemporaryFile(suffix=".yml", delete=False) as temp:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is delete=False?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delete=False parameter prevents automatic deletion of the temporary file when the context manager closes. This is necessary because we need the file to persist after the with block to compare it with the original, display any differences, and potentially copy it back. We're still ensuring proper cleanup by manually deleting the file in the finally block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it - the finally block with the temp_path.unlink in inside the with block. Even with the current impl, the temp file does not last beyond the with block. If we set delete = True, we could remove the try...finally entirely

dataset.to_yaml(temp_path)

# Compare the files
import filecmp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports should be at the top

description: Union[None, str] = None
doc: Union[None, str] = None # doc is an alias for description
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this useful - seems like it's just extra work to keep these in sync?

exclude.add("nativeDataType")

# if the id is the same as the urn's fieldPath, exclude id from the output
from datahub.metadata.urns import SchemaFieldUrn
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports at the topc

raise ValueError(f"Type {input_type} is not a valid primitive type")

def dict(self, **kwargs):
"""Custom dict method for Pydantic v1 to handle YAML serialization properly."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit scary

did you manually test this locally with both pydantic v1 and v2?

yield dataset

def generate_mcp(
def entity_references(self) -> List[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this use the list_urns method in urn_iter.py?

with open(
existing_dataset.schema_metadata.file
) as schema_fp:
schema_fp_content = schema_fp.read()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a rule of thumb - when you've got this much indentation, you probably need to refactor something

new_data = self.dict(exclude_none=True, exclude_unset=True, by_alias=True)

# Set up ruamel.yaml for preserving comments
yaml_handler = YAML(typ="rt") # round-trip mode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the YamlFileUpdater class might be useful here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Issues and Improvements to docs ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge smoke_test Contains changes related to smoke tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants