Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PNE-203] Fix reading file links on Windows. #950

Merged
merged 4 commits into from
Sep 3, 2024

Conversation

anoto-moniz
Copy link
Collaborator

If FileCollection.read is called with a FileLink which encodes a local path (i.e. starts with file:///), then it's a local file. Due to the differences in path handling on Windows and Unix, we need to explicitly convert the path portion of that URL into a filepath.

PR Type:

  • Breaking change (fix or feature that would cause existing functionality to change)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Maintenance (non-breaking change to assist developers)

Adherence to team decisions

  • I have added tests for 100% coverage
  • I have written Numpy-style docstrings for every method and class.
  • I have communicated the downstream consequences of the PR to others.
  • I have bumped the version in __version__.py

If FileCollection.read is called with a FileLink which encodes a local
path (i.e. starts with file:///), then it's a local file. Due to the
differences in path handling on Windows and Unix, we need to explicitly
convert the path portion of that URL into a filepath.
@anoto-moniz
Copy link
Collaborator Author

I tested this by spinning up a Windows VM in VirtualBox and running the snippet here on a variety of Windows-style paths (and a few Linux), and ensuring the output looked right.

@anoto-moniz anoto-moniz marked this pull request as ready for review July 18, 2024 20:26
@anoto-moniz anoto-moniz requested a review from kroenlein July 18, 2024 20:26
Copy link
Collaborator

@kroenlein kroenlein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How thoroughly was this tested? It sounds reasonable, but I'd like to maybe see additional tests added, since there were no additions to the test suite in this PR.

Also, I note that the documentation to url2pathname says

This function uses unquote() to decode path.

which sounds like we might have a double-decode bug in this workflow.

@anoto-moniz
Copy link
Collaborator Author

How thoroughly was this tested? It sounds reasonable, but I'd like to maybe see additional tests added, since there were no additions to the test suite in this PR.

Also, I note that the documentation to url2pathname says

This function uses unquote() to decode path.

which sounds like we might have a double-decode bug in this workflow.

While I'm sure there's a way to test Windows filepaths in a Linux environment, no obvious way revealed itself on Googling, which means it will be a lot more effort than it's worth for this bug, given it was reported months ago but seemingly not caused many of our customers any problems.

Also, we use urlunquote_plus, hence why I left it as is. But yeah, you're right about the double decode, so that would need to be fixed.

I was just looking for a quick win in some downtime, but given this has a bit more to it, I'm gonna put it aside and come back to it later.

path = Path(unquote_plus(url2pathname(urlparse(file_link.url).path)))
parsed_url = urlparse(file_link.url)
if parsed_url.netloc not in {'', '.', 'localhost'}:
raise ValueError("Non-local UNCs (e.g., Windows network paths) are not supported.")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new restriction, is it not?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formally it's a new restriction, but we would have broken on this case before because we were not considering the netloc value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, I would have fixed this rather than raising an exception, except without a Windows box to experiment on, I don't trust that my implementation will be reliable. Stackoverflow

Copy link
Collaborator

@kroenlein kroenlein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With changes, I this is ready.

@anoto-moniz anoto-moniz merged commit 69d0f1f into main Sep 3, 2024
16 checks passed
@anoto-moniz anoto-moniz deleted the feature/pne-203-file-links-on-windows branch September 3, 2024 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants