Move FileFormat and related pieces to datafusion-datasource #14873

Merged (6 commits) on Feb 26, 2025

Conversation

AdamGS (Contributor) commented Feb 25, 2025

Which issue does this PR close?

Part of #14444.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@AdamGS changed the title from "Move FileFormat and related pieces to datafusion-datasource." to "Move FileFormat and related pieces to datafusion-datasource" on Feb 25, 2025
@github-actions bot added the "core" (Core DataFusion crate) label on Feb 25, 2025
Comment on lines +471 to +472
/// Coerces the file schema if the table schema uses a view type.
#[cfg(not(target_arch = "wasm32"))]
Contributor Author

These functions used to be public but are only used in parquet (especially transform_binary_to_string), so I think it makes sense to put them here in the long term.

Contributor

(so they will long term become part of the datasource/parquet crate?)

Contributor Author

That's my plan. I expect most of this module (and json, csv and avro) to move to the new format-specific crates, hopefully without too many leftovers.

Contributor Author

Already ran into one small issue with csv/json when I tried to move the Decoder trait, but I think that once I make that big split it'll be easier to find a home for all the shared things, hopefully without having to change them too much.

pub use datafusion_datasource::file_compression_type;
use datafusion_datasource::file_scan_config::FileScanConfig;
pub use datafusion_datasource::file_format::*;
Contributor Author

trying to keep existing use statements working
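
For readers unfamiliar with the pattern, here is a minimal sketch of how a glob re-export keeps an old import path alive after items move. The module names `datasource` and `core_crate` and the function `default_extension` are hypothetical stand-ins invented for this illustration; they are not DataFusion's actual items.

```rust
// Hypothetical stand-in for the new, lower-level crate that now owns the items.
mod datasource {
    pub mod file_format {
        /// Hypothetical item that moved to the new location.
        pub fn default_extension() -> &'static str {
            "parquet"
        }
    }
}

// Hypothetical stand-in for the original crate, which keeps its old path alive.
mod core_crate {
    pub mod file_format {
        // Glob re-export: every public item of the new module stays reachable
        // under the old path, so downstream `use` statements keep compiling.
        pub use crate::datasource::file_format::*;
    }
}

// A downstream caller that still imports from the old path.
use core_crate::file_format::default_extension;

fn main() {
    println!("{}", default_extension());
}
```

Both paths resolve to the same function, so the move stays invisible to downstream code that only imports through the old module.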

@alamb alamb mentioned this pull request Feb 25, 2025
/// relevant methods. [FileType] is only used in logical planning and only implements
/// the subset of methods required during logical planning.
#[derive(Debug)]
pub struct DefaultFileType {
Contributor

it is really nice to see this broken up into smaller modules

@alamb (Contributor) left a comment

Looks great to me -- thank you @AdamGS

@alamb alamb merged commit 212f424 into apache:main Feb 26, 2025
24 checks passed
alamb (Contributor) commented Feb 26, 2025

Thanks again @AdamGS
