Parquet Modular Encryption #3766
Comments
Interesting request! I went through the document, and it seems to be a software development exercise in parquet2 and arrow2 (when not using pyarrow). Does pyarrow support this already?
Thanks for your interest here! It looks like PyArrow describes the Parquet Modular Encryption format: https://arrow.apache.org/docs/dev/python/generated/pyarrow.parquet.encryption.EncryptionConfiguration.html Does your comment mean we'd first need to get arrow2 to support this encryption? If you like the idea, how should I proceed with helping? Thanks!
Yep. I am not familiar with it, so take this with a grain of salt. I think we need to support that in parquet2; arrow2 cares about how pages are deserialized to arrow2. I imagine that parquet2 would provide a fallible streaming iterator adapter to decrypt the pages (prior to decompressing), and decrypt the metadata prior to passing it to thrift, so that arrow2 and other consumers do not need to worry about encryption.
I definitely love the idea :) I created jorgecarleitao/parquet2#154 and jorgecarleitao/parquet2#155 about this there.
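The layering suggested in the comment above (decrypt each page lazily, surface failures, then decompress) can be sketched in Python to make the ordering concrete. This is only an illustration under stated assumptions, not parquet2's API: `decrypt_pages`, `PageDecryptionError`, and `toy_xor` are hypothetical names, `zlib` stands in for whichever page codec is in use, and the XOR "cipher" is a placeholder for the AES-based ciphers the Parquet encryption spec actually uses.

```python
import zlib
from typing import Callable, Iterable, Iterator


class PageDecryptionError(Exception):
    """Raised when a single page cannot be decrypted (hypothetical error type)."""


def decrypt_pages(
    encrypted_pages: Iterable[bytes],
    decrypt: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Lazily decrypt each page, then decompress it.

    Decryption runs before decompression, mirroring the order in the comment.
    A failure is reported for the offending page instead of being swallowed.
    """
    for index, page in enumerate(encrypted_pages):
        try:
            compressed = decrypt(page)
        except Exception as exc:
            raise PageDecryptionError(f"page {index}: {exc}") from exc
        yield zlib.decompress(compressed)


def toy_xor(data: bytes, key: bytes = b"k") -> bytes:
    """Placeholder cipher for demonstration only: NOT AES and NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Writer side (for the demo): compress first, then encrypt.
pages = [b"page-0" * 10, b"page-1" * 10]
encrypted = [toy_xor(zlib.compress(p)) for p in pages]

# Reader side: consumers only ever see decrypted, decompressed pages.
decoded = list(decrypt_pages(encrypted, toy_xor))
```

With this shape, a consumer such as arrow2 would iterate over plain pages and never handle keys or ciphertext itself.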
Deferring to upstream issue: jorgecarleitao/parquet2#154 👍
DuckDB 0.10 has released the Parquet encryption feature. Combined with Polars, it is really easy to encrypt a Parquet file:
```python
import polars as pl
import duckdb

df = pl.DataFrame({"name": range(10)})

# Register a 128-bit (16-character) key under the name 'key128'
duckdb.sql("PRAGMA add_parquet_key('key128', '0123456789112345')")

# Encrypt: write the DataFrame to an encrypted Parquet file
duckdb.sql("COPY df TO 'test.parquet' (ENCRYPTION_CONFIG {footer_key: 'key128'})")

# Decrypt: read the file back into a Polars DataFrame
df = duckdb.sql("SELECT * FROM read_parquet('test.parquet', encryption_config = {footer_key: 'key128'})").pl()
```
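The key in the snippet above is 16 characters long, which corresponds to a 128-bit AES key. As a small sketch, a helper like the one below could catch a key of the wrong length before it is registered. `aes_key_bits` is a hypothetical name, not part of DuckDB or Polars, and it assumes the key string is used as raw bytes, as the snippet suggests.

```python
# AES key sizes in bytes mapped to their size in bits.
VALID_AES_KEY_BYTES = {16: 128, 24: 192, 32: 256}


def aes_key_bits(key: str) -> int:
    """Return the AES key size in bits, or raise if the length is not valid."""
    n_bytes = len(key.encode("utf-8"))
    if n_bytes not in VALID_AES_KEY_BYTES:
        raise ValueError(
            f"key is {n_bytes} bytes; expected one of {sorted(VALID_AES_KEY_BYTES)}"
        )
    return VALID_AES_KEY_BYTES[n_bytes]


bits = aes_key_bits("0123456789112345")  # the key used in the snippet above
```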
Describe your feature request
When turning a Polars DataFrame into a Parquet file, I'd like to be able to pass in some options for modular encryption (https://github.com/apache/parquet-format/blob/master/Encryption.md).
Similarly, when turning a Parquet file into a Polars DataFrame, I'd like to be able to decrypt the Parquet and create the Polars DataFrame in the same function.
This is a good article explaining the benefits and ease-of-use of modular encryption.