Wrong number of bytes written for parquet dictionary @lib/ParquetOutFile.cpp:402 #122
Comments
Can you try to create a reproducible example? Or share the data frame that triggers this? Thanks!
The thing is, it only fails for me in Docker. Minimal example in Docker.
But it works for me on my laptop.
upd: reproduced on Linux outside Docker (Ubuntu 22.04).
added
and I see
So I changed it to signed. It seems tellp() returned an error; not sure why the size is zero.
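(A standalone sketch of the behaviour described above, using only standard-library calls rather than nanoparquet's code: a failing tellp() returns -1, and if that value lands in an unsigned variable it masquerades as a huge byte count instead of an obvious error.)

```cpp
// Illustrative sketch only (standard-library behaviour, not nanoparquet code):
// a failed tellp() reports -1, and stuffing that into an unsigned variable
// turns it into a huge, plausible-looking byte count.
#include <cstdint>
#include <iostream>
#include <sstream>

int main() {
    std::ostringstream out;
    out.setstate(std::ios::failbit);      // force tellp() to fail
    std::streamoff pos = out.tellp();     // -1 on failure

    std::uint32_t as_unsigned = static_cast<std::uint32_t>(pos);
    std::int64_t  as_signed   = pos;

    std::cout << as_unsigned << "\n";     // 4294967295 -- looks like a size
    std::cout << as_signed   << "\n";     // -1 -- clearly an error
}
```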
Thanks, I can indeed reproduce it with
and then
Seems like a bug that happens on Linux when writing a page that only has missing values.
Thanks. Currently I think this is something in RParquetOutFile::get_size_dictionary that gets the dictionary size wrong. Somehow Rf_xlength returns zero in
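(For illustration, a hypothetical helper showing how this could cascade; none of the names below are nanoparquet's. If the dictionary SEXP is zero-length, Rf_xlength() reports 0 elements, the computed page size is 0 bytes, and the output buffer ends up with no underlying storage.)

```cpp
// Hypothetical sketch, not RParquetOutFile::get_size_dictionary itself.
#include <Rinternals.h>

R_xlen_t dictionary_byte_size(SEXP dict) {   // hypothetical helper name
    R_xlen_t n = Rf_xlength(dict);           // 0 for a zero-length vector
    return n * (R_xlen_t) sizeof(int);       // 0 bytes => nothing to back the buffer
}
```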
I compared a good data frame dump with a bad one.
For the bad data frame it's
So the difference is that this section is executed for bad_df.
Maybe for bad_df the address_id column encoding is set to run length, and then later either in
Hmm, not sure if this is related. Now that I upgraded to
That's probably a different thing; can you please open an issue with an example file? Thanks.
Seems like it is not allowed to have an empty buffer without an underlying container. So now I initialize the buffer with a small static container. For #122.
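(A toy, self-contained illustration of what the commit message describes; the MemBuf class below is hypothetical and is not nanoparquet's buffer. An output stream whose streambuf never got a put area has no position to report, so tellp() comes back as -1 and any byte-count check built on it misfires; backing the buffer with even a tiny static array keeps tellp() meaningful. Whether the empty case yields -1 or 0 in the real code presumably depends on the actual buffer class and standard library, which would explain why only some platforms hit it.)

```cpp
// Illustrative only -- a toy memory-backed streambuf, not nanoparquet's class.
#include <cstddef>
#include <iostream>
#include <streambuf>

struct MemBuf : std::streambuf {
    MemBuf(char* data, std::size_t n) {
        if (n > 0) setp(data, data + n);   // empty storage => no put area
    }
    // tellp() routes through seekoff(); this toy only answers the tellp()
    // query: report the put position if there is one, otherwise fail just
    // like the std::streambuf default does.
    pos_type seekoff(off_type, std::ios_base::seekdir,
                     std::ios_base::openmode) override {
        return pptr() ? pos_type(pptr() - pbase()) : pos_type(off_type(-1));
    }
};

int main() {
    MemBuf empty(nullptr, 0);              // zero-length dictionary page
    std::ostream bad(&empty);
    std::cout << bad.tellp() << "\n";      // -1: "wrong number of bytes"

    static char scratch[8];                // a "small static container"
    MemBuf seeded(scratch, sizeof scratch);
    std::ostream good(&seeded);
    std::cout << good.tellp() << "\n";     // 0: a sane starting position
}
```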
I am fairly sure that this is now fixed in
It works, thanks!
Given this appears to break some common operations, is another patch for CRAN planned in the coming weeks?
@al-obrien Do you see this as well? My guess was that it should be rare, given that only one user reported it over several months, and that you'd have to have a zero-length dictionary page as the first page to write. But I can definitely do a quick release; let me fix #132 first.
That's correct, I did see this error come up |
Release happening here: #135. |
Hi,
I encountered a weird bug where nanoparquet fails to write a data frame.
I narrowed it down to
The file on disk after this operation has only
PAR1
written. Somehow, if I take only 1 or 2 rows, it works.
If I add a character column in front, it works.
It also works without compression but fails with any compression.
I think it fails in
ParquetOutFile::write_dictionary_page
and it's something about the double buffer, and it is not compression specific since it fails with all compression types. It fails for me inside a Docker image and I can't reproduce it on my Mac laptop.
Default parquet options
I've tried changing parquet options with
options()
and it seems to have no effect. I can recompile the package from a branch with any additional debug output; please let me know where I can put debug prints to see what's going on in ParquetOutFile::write_dictionary_page.
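(To make the question concrete, here is a guess at the shape of such a byte-count check, based only on the error message and not copied from ParquetOutFile.cpp: the dictionary page is written into a buffered stream and the byte count is validated by comparing stream positions before and after, so that is where printing the positions and the expected size would be most telling.)

```cpp
// Hypothetical mock-up of a byte-count check around a dictionary write;
// names and structure are guesses, not nanoparquet's code.
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

void write_dictionary(std::ostream& out, const std::string& bytes) {
    std::streamoff start = out.tellp();
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    std::streamoff end = out.tellp();
    // Debug prints of start, end and bytes.size() here would show whether
    // tellp() is failing (-1) or the payload really is short.
    if (end - start != static_cast<std::streamoff>(bytes.size()))
        throw std::runtime_error(
            "Wrong number of bytes written for parquet dictionary");
}

int main() {
    std::ostringstream buf;
    write_dictionary(buf, "dictionary bytes");   // passes for a healthy buffer
    std::cout << "wrote " << buf.str().size() << " bytes\n";
}
```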