Add tutorial / how-to about filtering file (not granule) results by name #428

mfisher87 · 2024-01-16T20:25:54Z

Based on #409 and a recent Slack discussion, this is a common need that we should document, and perhaps also explore convenience features to make it easier. @betolink suggested e.g.

files = earthaccess.open(results, regex=["*B01*", "*B02*"])

Let's create a new ticket for such a feature?
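
Worth noting: patterns like "*B01*" are shell-style globs rather than regular expressions (a regex cannot start with "*"). As a rough sketch of what such a convenience could do, the same selection can be made today on a plain list of URLs with the standard-library fnmatch module. Here select_links is a hypothetical helper, not part of earthaccess, and the URLs are made-up placeholders.

```python
from fnmatch import fnmatchcase


def select_links(links, patterns):
    """Return the links matching at least one glob pattern, in original order."""
    return [
        link
        for link in links
        if any(fnmatchcase(link, pattern) for pattern in patterns)
    ]


# Placeholder URLs for illustration only. With real search results, the list
# could be built with something like:
#   [link for granule in results for link in granule.data_links()]
links = [
    "https://example.com/scene/scene.B01.tif",
    "https://example.com/scene/scene.B02.tif",
    "https://example.com/scene/scene.B10.tif",
]

selected = select_links(links, ["*B01*", "*B02*"])
```

The selected URLs keep their original order, so they can be passed on to whatever opens or downloads them.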

andypbarrett · 2024-01-22T20:39:49Z

Playing devil's advocate here: I'm wondering if this is a level of abstraction too far, and whether the more general use case is filtering on any component of results, not just the data_link. We could instead teach/suggest a filtering step along the lines of

filtered_results = [r for r in results if <add_filter_condition_here>]

where the filter condition could be a regex on data links or on some other element, or even a random selection.
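
A minimal sketch of those three kinds of filter condition, assuming each result exposes a data_links() method and dict-style access. FakeGranule and its contents are stand-ins invented for illustration so the snippet runs without a search.

```python
import random
import re


class FakeGranule(dict):
    """Stand-in for a search result; it only mimics a data_links() method."""

    def data_links(self):
        return self["links"]


results = [
    FakeGranule(name="scene_a", links=["https://example.com/scene_a.B01.tif"]),
    FakeGranule(name="scene_b", links=["https://example.com/scene_b.B10.tif"]),
    FakeGranule(name="scene_c", links=["https://example.com/scene_c.B01.tif"]),
]

band_01 = re.compile(r"\.B01\.tif$")

# Filter with a regex on the data links...
by_link = [r for r in results if any(band_01.search(u) for u in r.data_links())]

# ...or on some other element of the record...
by_name = [r for r in results if r["name"] != "scene_a"]

# ...or take a random selection (seeded so it can be repeated).
sample = random.Random(0).sample(results, k=2)
```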

Also from a reproducibility point of view, someone might want to save filtered_results as a json or some other file. Abstracting this to open hides the filtering and makes documenting the actual files used more difficult.

mfisher87 · 2024-01-22T23:18:19Z

I see your point! I don't really know how to balance "user-friendliness" / "accessibility" with "just learn/write a bit of Python (e.g. list comprehension filtering) if you want to do this". The latter response can be super valuable for learners if coupled with excellent learning materials and guidance. Or it can be off-putting.

In this case, we're talking about list comprehensions, which are a critical Python skill. Maybe our criteria for what's in scope should include "are we abstracting away a critical Python skill?" as a reason to reject a feature request.

We definitely need to be rejecting some subset of feature requests, but I think we need a conversation about how to handle those rejections kindly. I don't want to create experiences where expectations are unclear and people do work and then feel bad when it's not accepted!

andypbarrett · 2024-01-22T23:38:58Z

@mfisher87 I agree 100%. We might want to explore passing a regex to search_data as a keyword argument. I think this is possible, assuming that pycmr allows it.

jhkennedy · 2024-01-23T00:27:16Z

As I understand it, this is not about allowing regex/pattern-based search, which is already supported in CMR:
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#parameter-options

but opening a specific subset of the assets in a granule record -- here's the STAC formatted record for the example in #409:
https://cmr.earthdata.nasa.gov/search/granules.stac?collection_concept_id=C2021957295-LPCLOUD&producer_granule_id=HLS.S30.T01KBU.2023340T220911

So, only wanting to open Band 10, for this record in particular, would mean opening asset data8, but what a user knows it by is "Band 10", which you can only get to via pattern matching the asset title or URL.

While it is a decently simple comprehension to get the URLs, it seems like we should support directly opening and/or downloading a specific asset, or a specific subset of assets, from results. And in that vein, I'd be in favor of filtering based on a pattern.
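
To make the asset-selection idea concrete, here is a small sketch of picking assets out of a STAC-style "assets" mapping by pattern. The mapping below is a trimmed-down stand-in with placeholder keys, titles, and hrefs, not a copy of the real record, and find_assets is a hypothetical helper.

```python
import re

# Hypothetical stand-in for the "assets" mapping of a STAC item.
# All keys, titles, and hrefs are placeholders for illustration.
assets = {
    "data7": {"title": "example.B09.tif", "href": "https://example.com/example.B09.tif"},
    "data8": {"title": "example.B10.tif", "href": "https://example.com/example.B10.tif"},
    "metadata": {"title": "example.cmr.xml", "href": "https://example.com/example.cmr.xml"},
}


def find_assets(assets, pattern):
    """Return {asset_key: href} for assets whose title or href matches the regex."""
    regex = re.compile(pattern)
    return {
        key: asset["href"]
        for key, asset in assets.items()
        if regex.search(asset.get("title", "")) or regex.search(asset["href"])
    }


# The user asks for "Band 10" by name; the opaque asset key is looked up for them.
band_10 = find_assets(assets, r"\.B10\.")
```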

jhkennedy · 2024-01-23T00:43:27Z

> Also from a reproducibility point of view, someone might want to save filtered_results as a json or some other file. Abstracting this to open hides the filtering and makes documenting the actual files used more difficult.

Importantly, this is selecting file URLs out of a record, not filtering down search results, so the filtered_results would be a list of URLs.

I would expect, if you wanted to record for reproducibility, you'd write the resulting search records to disk themselves, with all the metadata that way, not just the URLs (especially since DAACs do delete individual files/records when a scene is reprocessed, for example).
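
A minimal sketch of that, assuming the search records are dict-like and JSON-serializable. The record below is a placeholder, and save_records / load_records are hypothetical helper names.

```python
import json
from pathlib import Path

# Placeholder record standing in for a real search result; values are made up.
records = [
    {
        "meta": {"concept-id": "G0000000000-EXAMPLE"},
        "umm": {"GranuleUR": "example_granule"},
    },
]


def save_records(records, path):
    """Write the full records (metadata included) to a JSON file."""
    Path(path).write_text(json.dumps(records, indent=2))


def load_records(path):
    """Read the records back as plain dicts and lists."""
    return json.loads(Path(path).read_text())
```

Saving the whole record rather than just the URLs keeps the metadata available even if the files behind those URLs are later removed or reprocessed.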

Having .open and .download methods requires you to select, in some way, which specific files/assets you want to open/download. I think earthaccess can easily support that, making a best guess at which assets to load/download if none are specified (e.g., all of them?), without putting a list comprehension step between search and access.

github-project-automation bot added this to earthaccess project Jan 16, 2024

github-project-automation bot moved this to 🆕 New in earthaccess project Jan 16, 2024

mfisher87 added the impact: documentation Improvements or additions to documentation label Jan 16, 2024

mfisher87 mentioned this issue Jan 23, 2024

Update README.md #434

Merged

mfisher87 mentioned this issue Feb 29, 2024

Update tutorial / FAQ / troubleshooting / pitfalls doc to clarify filtering behavior #474

Open

betolink mentioned this issue Mar 7, 2024

Seasonal/recurrent searches #488

Open
