-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dataset cli - add support for schema, round-tripping to yaml #12764
Conversation
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found. ❌ Your patch status has failed because the patch coverage (23.44%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage. 📢 Thoughts on this report? Let us know! |
4ab797e
to
7c5a222
Compare
7c5a222
to
6e0db05
Compare
6a0256b
to
9f2ceb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
# Create a temporary file in a secure way | ||
# The file will be automatically deleted when the context manager exits | ||
with tempfile.NamedTemporaryFile(suffix=".yml", delete=False) as temp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is delete=False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delete=False
parameter prevents automatic deletion of the temporary file when the context manager closes. This is necessary because we need the file to persist after the with
block to compare it with the original, display any differences, and potentially copy it back. We're still ensuring proper cleanup by manually deleting the file in the finally
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it - the finally block with the temp_path.unlink in inside the with
block. Even with the current impl, the temp file does not last beyond the with
block. If we set delete = True, we could remove the try...finally entirely
dataset.to_yaml(temp_path) | ||
|
||
# Compare the files | ||
import filecmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imports should be at the top
description: Union[None, str] = None | ||
doc: Union[None, str] = None # doc is an alias for description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this useful - seems like it's just extra work to keep these in sync?
exclude.add("nativeDataType") | ||
|
||
# if the id is the same as the urn's fieldPath, exclude id from the output | ||
from datahub.metadata.urns import SchemaFieldUrn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imports at the topc
raise ValueError(f"Type {input_type} is not a valid primitive type") | ||
|
||
def dict(self, **kwargs): | ||
"""Custom dict method for Pydantic v1 to handle YAML serialization properly.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit scary
did you manually test this locally with both pydantic v1 and v2?
yield dataset | ||
|
||
def generate_mcp( | ||
def entity_references(self) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this use the list_urns
method in urn_iter.py?
with open( | ||
existing_dataset.schema_metadata.file | ||
) as schema_fp: | ||
schema_fp_content = schema_fp.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a rule of thumb - when you've got this much indentation, you probably need to refactor something
new_data = self.dict(exclude_none=True, exclude_unset=True, by_alias=True) | ||
|
||
# Set up ruamel.yaml for preserving comments | ||
yaml_handler = YAML(typ="rt") # round-trip mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the YamlFileUpdater
class might be useful here
Checklist