Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-48975: Add show_dataset_types option to queryCollections #1157

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

leeskelvin
Copy link
Contributor

@leeskelvin leeskelvin commented Feb 17, 2025

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

Copy link

codecov bot commented Feb 17, 2025

Codecov Report

Attention: Patch coverage is 76.54321% with 19 lines in your changes missing coverage. Please review.

Project coverage is 89.33%. Comparing base (c069fb2) to head (816a1f9).

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
python/lsst/daf/butler/script/queryCollections.py 71.64% 15 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1157      +/-   ##
==========================================
- Coverage   89.35%   89.33%   -0.02%     
==========================================
  Files         367      367              
  Lines       49528    49584      +56     
  Branches     6013     6025      +12     
==========================================
+ Hits        44254    44296      +42     
- Misses       3853     3865      +12     
- Partials     1421     1423       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@leeskelvin leeskelvin force-pushed the tickets/DM-48975 branch 9 times, most recently from 22c9f86 to efe52a1 Compare February 18, 2025 13:59
type=click.STRING,
multiple=True,
default=["*_config,*_log,*_metadata,packages"],
callback=lambda ctx, param, value: sum((v.split(",") for v in value), []),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use split_commas (see other examples in this file).

@leeskelvin leeskelvin force-pushed the tickets/DM-48975 branch 4 times, most recently from 6575583 to e9b3c5f Compare February 18, 2025 15:06
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A couple of general comments:

  • Listing all the dataset types in the chain and then listing all the dataset types in the collections that are in the chain separately seems like we are presenting the same information twice.
  • In TABLE mode I'm a bit confused by the output.
butler query-collections -t ../pipelines_check/DATA_REPO/ --chains TABLE "*c*"
                 Name                     Type              Children                       Dataset Types            
------------------------------------- ----------- ---------------------------- -------------------------------------
HSC/calib                             CALIBRATION                                                                   
HSC/calib/curated/2013-01-31T00:00:00 RUN                                                                           
HSC/calib/gen2/2013-06-17             RUN                                                                           
HSC/calib/gen2/2013-11-03             RUN                                                                           
HSC/calib/unbounded                   RUN                                                                           
demo_collection                       RUN

but in TREE mode these collections do have dataset types:

butler query-collections -t ../pipelines_check/DATA_REPO/ --chains TREE "*c*"

                 Name                     Type                Dataset Types            
------------------------------------- ----------- -------------------------------------
HSC/calib                             CALIBRATION bfKernel                             
                                                  bias                                 
                                                  camera                               
                                                  dark                                 
                                                  defects                              
                                                  flat                                 
                                                  transmission_atmosphere              
                                                  transmission_filter                  
                                                  transmission_optics                  
                                                  transmission_sensor                  
HSC/calib/curated/2013-01-31T00:00:00 RUN         defects                              
HSC/calib/gen2/2013-06-17             RUN         flat                                 
HSC/calib/gen2/2013-11-03             RUN         bias                                 
                                                  dark                                 
HSC/calib/unbounded                   RUN         bfKernel

show_dataset_types : `bool`
If True, also show the dataset types present within each collection.
exclude_dataset_types : `~collections.abc.Iterable` [ `str` ]
A glob-style comma-separated list of dataset types to exclude.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By this point it isn't comma-separated any more.

show_dataset_types : `bool`
If True, also show the dataset types present within each collection.
exclude_dataset_types : `~collections.abc.Iterable` [ `str` ]
A glob-style comma-separated list of dataset types to exclude.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not comma-separated?

def addCollection(info: CollectionInfo) -> None:
collection_table = Table([[info.name], [info.type.name]], names=["Name", "Type"])
if show_dataset_types:
dataset_types = [""] if not info.dataset_types else info.dataset_types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this identical code is used in three places (the filtering of dataset_types) please put it in a helper function.

for info_relative in info_relatives:
relative_table = Table([[info_relative]], names=(descriptionCol,))
if show_dataset_types:
cinfo = butler.collections.get_info(info_relative, include_summary=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this code is completely untested.

show_dataset_types : `bool`
If True, also show the dataset types present within each collection.
exclude_dataset_types : `~collections.abc.Iterable` [ `str` ]
A glob-style comma-separated list of dataset types to exclude.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not comma-separated?

collection_type
Same as `queryCollections`
flatten_chains : `bool`
If True, flatten the tree of CHAINED datasets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If True, flatten the tree of CHAINED datasets.
If `True`, flatten the tree of CHAINED datasets.

and elsewhere.

def addCollection(info: CollectionInfo) -> None:
collection_table = Table([[info.name], [info.type.name]], names=["Name", "Type"])
if show_dataset_types:
dataset_types = [""] if not info.dataset_types else info.dataset_types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch also untested.

show_dataset_types : `bool`, optional
If True, include the dataset types present within each collection.
exclude_dataset_types : `~collections.abc.Iterable` [ `str` ], optional
A glob-style comma-separated list of dataset types to exclude.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma-separated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants