Truncated paths when unarchiving #39

Open
leostera opened this issue Nov 26, 2022 · 0 comments · May be fixed by #55

Comments

@leostera

Hello folks!

While unpacking an async-tar generated archive, file paths got truncated when they were over 100 chars. This seemed strange since macOS could unpack the archive correctly, and so did tar-rs and other tools.

Full context here: https://discord.com/channels/273534239310479360/1045944060650717204

Here's the header of the archive:
image

I can see there that the file path (3rdparty/https/hex.pm/packages/decimal/_build/default/lib/decimal/consolidated/Elixir.Hex.Solver.Constraint.beam) is complete, but at some point during the read process, it gets lost.

I also tried iterating over `.entries` and printing out `entry.path`, `entry.path_bytes`, `entry.header.path` and `entry.header.path_bytes`, and they all have the truncated file path: `3rdparty/https/hex.pm/packages/decimal/_build/default/lib/decimal/consolidated/Elixir.Hex.Solver.Con`.
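
For what it's worth, the truncated value is exactly 100 bytes long, which is the size of the name field in a tar header block. A minimal sketch of that observation, assuming the reader ended up with only the fixed-width field and never applied the record that carries the full path (`name_field_only` is a hypothetical helper, not an async-tar API):

```rust
/// Size in bytes of the `name` field in a ustar header block.
const NAME_FIELD_LEN: usize = 100;

/// Hypothetical helper (not an async-tar API): what a reader would report
/// if it consulted only the fixed-width name field of the header.
fn name_field_only(path: &[u8]) -> &[u8] {
    &path[..path.len().min(NAME_FIELD_LEN)]
}

/// Full path as stored in the archive (from the report above).
const FULL: &str = "3rdparty/https/hex.pm/packages/decimal/_build/default/lib/decimal/consolidated/Elixir.Hex.Solver.Constraint.beam";

/// Path as returned by the entry accessors (from the report above).
const TRUNCATED: &str = "3rdparty/https/hex.pm/packages/decimal/_build/default/lib/decimal/consolidated/Elixir.Hex.Solver.Con";
```

Calling `name_field_only(FULL.as_bytes())` gives back the bytes of `TRUNCATED`, which suggests the long-name data is being dropped rather than corrupted.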

Thanks @rrbrussell for the help debugging this 👋🏽

@endbr64 linked a pull request Dec 24, 2024 that will close this issue
charliermarsh pushed a commit to astral-sh/tokio-tar that referenced this issue Feb 9, 2025
…name truncation (#1)

This is from edera-dev/tokio-tar#3

---

I tracked down this issue in
astral-sh/uv#5450 (comment)

> https://github.com/edera-dev/tokio-tar/blob/4ee357285b5053e6bfada7f117e530b4da94b74a/src/archive.rs#L317
>
> ```rust
> if is_recognized_header && entry.header().entry_type().is_pax_local_extensions() {
>     if self.pax_extensions.is_some() {
>         return Poll::Ready(Some(Err(other(
>             "two pax extensions entries describing \
>              the same member",
>         ))));
>     }
>     let mut ef = EntryFields::from(entry);
>     let val = ready_err!(Pin::new(&mut ef).poll_read_all(cx));
>     self.pax_extensions = Some(val);
>     continue;
> }
> ```
>
> if `Pin::new(&mut ef).poll_read_all(cx)` is `Poll::Pending` then
> `ready_err!` returns it, so the Pax extension is lost. The same would
> apply to a pending poll that occurs while a longlink or longname is
> being prepared. When `poll_next` is called again the next entry header
> is parsed.

This PR demonstrates the issue by creating an AsyncRead impl which pends
every second time it is polled.
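
As a rough illustration of such a reader (not the type used in the PR's test), here is a std-only sketch. `PendEveryOther`, `NoopWake` and `drain` are hypothetical names, and `poll_read` only mirrors the shape of an `AsyncRead`-style method over a blocking `std::io::Read`, so that no async runtime is needed:

```rust
use std::io::{self, Read};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical wrapper: returns `Poll::Pending` on every second poll,
/// otherwise reads from `inner`.
struct PendEveryOther<R> {
    inner: R,
    polls: u64,
}

impl<R: Read + Unpin> PendEveryOther<R> {
    fn new(inner: R) -> Self {
        Self { inner, polls: 0 }
    }

    /// Same shape as an `AsyncRead`-style `poll_read`.
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        self.polls += 1;
        if self.polls % 2 == 0 {
            // Ask to be polled again, then report "not ready yet".
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        Poll::Ready(self.inner.read(buf))
    }
}

/// Waker that does nothing; enough for a hand-rolled polling loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `reader` to EOF, returning the bytes read and how many
/// `Pending` results were seen along the way.
fn drain<R: Read + Unpin>(reader: &mut PendEveryOther<R>) -> io::Result<(Vec<u8>, usize)> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let (mut out, mut pendings, mut buf) = (Vec::new(), 0, [0u8; 4]);
    loop {
        match Pin::new(&mut *reader).poll_read(&mut cx, &mut buf) {
            Poll::Pending => pendings += 1,
            Poll::Ready(Ok(0)) => return Ok((out, pendings)),
            Poll::Ready(Ok(n)) => out.extend_from_slice(&buf[..n]),
            Poll::Ready(Err(e)) => return Err(e),
        }
    }
}
```

A consumer that keeps its state across polls still sees every byte through this wrapper; only code that discards in-progress work on `Pending` is affected.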

Commenting out [this
line](https://github.com/RazerM/tokio-tar/blob/15466052f63c47cf47decd4409a9b0e936302773/tests/all.rs#L816)
makes the test pass, because the reader doesn't enter a pending state in
the "wrong" place.

It is probably also the cause of
dignifiedquire/async-tar#39
charliermarsh added a commit to astral-sh/tokio-tar that referenced this issue Feb 9, 2025
## Summary

Right now, if we hit a pending read while reading an entry, we end up
discarding the data rather than preserving it for the next poll (e.g.,
for a PAX extension). You can also see this reported at
dignifiedquire/async-tar#39.

This PR takes dignifiedquire/async-tar#55, but
applies an additional change as that PR didn't work on its own, in my
testing. Atop dignifiedquire/async-tar#55, we
also store the pending `Entry` to ensure that if we're pending, we don't
advance to the next entry on the next poll.

For more context, see: #1.
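
The difference between dropping and keeping the pending state can be sketched with a small std-only model. This is a simplification under stated assumptions, not the code from either PR: `Body`, `Record`, `Entries`, `keep_pending` and `collect` are hypothetical names, and the archive is reduced to an extension record followed by a file record.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for an extension body whose first read is not ready.
struct Body {
    polls: u32,
    data: &'static str,
}

impl Body {
    fn poll_read_all(&mut self) -> Poll<&'static str> {
        self.polls += 1;
        if self.polls == 1 { Poll::Pending } else { Poll::Ready(self.data) }
    }
}

/// Simplified records: an extension carrying the long path, or a file
/// whose header only holds a short (possibly truncated) name.
enum Record {
    Extension(&'static str),
    File(&'static str),
}

struct Entries {
    records: VecDeque<Record>,
    long_name: Option<&'static str>,
    in_progress: Option<Body>,
    /// `false` models the lossy behaviour, `true` the resumable one.
    keep_pending: bool,
}

impl Entries {
    fn new(records: Vec<Record>, keep_pending: bool) -> Self {
        Self { records: records.into(), long_name: None, in_progress: None, keep_pending }
    }

    fn poll_next(&mut self) -> Poll<Option<&'static str>> {
        loop {
            // Resume a body that was pending on the previous poll, if any.
            if let Some(mut body) = self.in_progress.take() {
                match body.poll_read_all() {
                    Poll::Ready(name) => self.long_name = Some(name),
                    Poll::Pending => {
                        self.in_progress = Some(body);
                        return Poll::Pending;
                    }
                }
                continue;
            }
            match self.records.pop_front() {
                None => return Poll::Ready(None),
                Some(Record::File(short)) => {
                    return Poll::Ready(Some(self.long_name.take().unwrap_or(short)));
                }
                Some(Record::Extension(data)) => {
                    let mut body = Body { polls: 0, data };
                    match body.poll_read_all() {
                        Poll::Ready(name) => self.long_name = Some(name),
                        Poll::Pending => {
                            // If the body is not stored here it is dropped, and
                            // the next poll moves straight on to the next record.
                            if self.keep_pending {
                                self.in_progress = Some(body);
                            }
                            return Poll::Pending;
                        }
                    }
                }
            }
        }
    }
}

/// Polls until the stream ends; a real executor would wait for a wake-up
/// instead of re-polling immediately.
fn collect(mut entries: Entries) -> Vec<&'static str> {
    let mut names = Vec::new();
    loop {
        match entries.poll_next() {
            Poll::Pending => continue,
            Poll::Ready(Some(name)) => names.push(name),
            Poll::Ready(None) => return names,
        }
    }
}
```

With `keep_pending` set to `false` the file is reported under its short header name; with `true` the long name from the extension record is applied once the pending read completes.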