-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move FileSourceConfig
and FileStream
to the new datafusion-datasource
#14838
Conversation
FileSourceConfig
and FileStream
to the new datafusion-datasource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking very cool!
Thank you @AdamGS
@@ -40,7 +40,7 @@ nested_expressions = ["datafusion-functions-nested"] | |||
# This feature is deprecated. Use the `nested_expressions` feature instead. | |||
array_expressions = ["nested_expressions"] | |||
# Used to enable the avro format | |||
avro = ["apache-avro", "num-traits", "datafusion-common/avro"] | |||
avro = ["apache-avro", "num-traits", "datafusion-common/avro", "datafusion-datasource/avro"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can remove the avro stuff from datafusion-common 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the next PR will try and do that, but some will always stay even if just for stuff like AvroError
/ParquetError
datafusion/datasource/Cargo.toml
Outdated
@@ -69,6 +71,7 @@ xz2 = { version = "0.1", optional = true, features = ["static"] } | |||
zstd = { version = "0.13", optional = true, default-features = false } | |||
|
|||
[dev-dependencies] | |||
datafusion = { workspace = true, default-features = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically to avoid this type of dependnecy there are two options:
- Rewrite the tests to not use stuff (like SessionContext) that are in the core
- Move the relevant tests somewhere into the core tests -- like
datafusion/core/tests/sql_integration.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like some were unnecessary, and some were already moved into memory.rs
, so I just moved it up under #[cfg(test)]
.
@@ -28,7 +28,7 @@ use std::{error::Error, path::PathBuf}; | |||
/// | |||
/// Expects to be called about like this: | |||
/// | |||
/// `assert_batch_eq!(expected_lines: &[&str], batches: &[RecordBatch])` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was just a typo
/// ```ignore | ||
/// # use std::sync::Arc; | ||
/// # use arrow::datatypes::{Field, Fields, DataType, Schema}; | ||
/// # use datafusion_datasource::PartitionedFile; | ||
/// # use datafusion_datasource::file_scan_config::FileScanConfig; | ||
/// # use datafusion_execution::object_store::ObjectStoreUrl; | ||
/// # use datafusion::datasource::physical_plan::ArrowSource; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to ignore this very good rustdoc test because we don't have ArrowSource
here. My plan is to only have that as a temporary thing between PRs, with the fix being either:
- Have some
InMemoryFileSource
here that just getsVec<RecordBatch>
and knows how to return it throughFileSource
(or maybe even theMockSource
I added for tests somewhere in this PR). - Potentially put
ArrowSource
in this crate as the "canonical" source, especially given how small it is at less than 200 lines of actual code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird example anyways because it actually refers to ParquetFiles
I took the liberty of pushing a commit to this branch to restore the doc test by mocking out a ParquetSource in 35e6ab7
FileSourceConfig
and FileStream
to the new datafusion-datasource
FileSourceConfig
and FileStream
to the new datafusion-datasource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @AdamGS -- this is great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
woohoo! |
🥳 |
Which issue does this PR close?
datafusion
crate (datafusion/core
) #14444.Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?