-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split out avro, parquet, json and csv into individual crates #14951
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn default_table_options(&self) -> TableOptions { | ||
Session::default_table_options(self) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think default_table_options
and table_options
only need to be part of impl Session
rather than
SessionState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is that might be a breaking change for some users, but its probably very minor one in the scope of this PR and other changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reread the code around it, and seems like the Session
implementation mostly delegates to the actual implementation, which saves bringing it into scope in a ton of locations across the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that since Session
trait is implemented for SessionState
, you should be able to call some_SessionState_struct.default_table_options()
, It seems that you are essentially overriding that function with itself(hence the original suggestion of removing it) unless I am missing something. Please let me know if that's the case!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change the function here to:
/// return the TableOptions options with its extensions
pub fn default_table_options(&self) -> TableOptions {
self.default_table_options()
}
clippy tells me its a recursion and the stack blows up when I run tests.
I think the default implementation here messes with with just having the trait implementation delegate to the struct. Of course I can copy it here and make the implementation delegate to it but that seems like unnecessary code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this seems like the proper way of doing things. Thanks for clarification!!
pub fn default_table_options(&self) -> TableOptions { | ||
Session::default_table_options(self) | ||
} | ||
|
||
/// Return mutable table options | ||
pub fn table_options_mut(&mut self) -> &mut TableOptions { | ||
&mut self.table_options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps table_options_mut
should also be a part of Session
@@ -501,7 +501,7 @@ impl FileScanConfig { | |||
(schema, constraints, stats, output_ordering) | |||
} | |||
|
|||
#[cfg_attr(not(feature = "avro"), allow(unused))] // Only used by avro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is my understanding that this is being removed to avoid dependency on avro.
I am not sure what is better between this and empty feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its a pub fn -- I think we should simply deprecate it if it isn't needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is still in use (in avro-source with_projection
function), just not detected
Thank you -- I will look at this shortly |
|
||
#[cfg(test)] | ||
#[cfg(feature = "avro")] | ||
mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be nice (eventually) to move these tests to core_integration somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't realize that was a thing. Glad to move everything there (in this PR or a different one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @AdamGS -- given the scope of this PR I can't review it line by line, but I think that is ok. We can get this new structure in and then do some additional cleanup PRs.
I think it makes the code much easier to work with
I'll plan to merge it tomorrow early to try and avoid accumulating conflicts
@@ -501,7 +501,7 @@ impl FileScanConfig { | |||
(schema, constraints, stats, output_ordering) | |||
} | |||
|
|||
#[cfg_attr(not(feature = "avro"), allow(unused))] // Only used by avro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its a pub fn -- I think we should simply deprecate it if it isn't needed
Thank you @logan-keede for your thoughts and suggestions |
|
actually - given that the avro feature is completly gone from |
That would be nice.
I am not too sure about this. It is only meant to be used by avro but maybe, some user might also be using this function as it was not behind a feature gate. (again, not sure if it is possible to be used like this). |
After some more thinking - while I couldn't find any use of this function in an active codebase on github, I can see why its useful (or at least nice). I just removed the |
Thanks again @AdamGS and @logan-keede -- this is amazing progress. Something I have thought was important for the last t |
I am merging this one in so it doesn't accumulate conflicts |
Which issue does this PR close?
Part of #14444.
The PR is obviously huge, but because the file are split (keeping tests in core), github doesn't understand some of the changes as moves, even though that's the majority of the change.
I've read through it a couple of times and kept finding small things that can be improved and I'll probably do that again during the weekend.
What changes are included in this PR?
Moving all format-specific functionality into dedicated crates, introducing:
datafusion-datasource-csv
datafusion-datasource-json
datafusion-datasource-avro
datafusion-datasource-parquet
Along the way I managed to move some stuff into
datasource
and tried multiple times to move tests infrastructure, but I think I ended pretty close to where I started.Are these changes tested?
Are there any user-facing changes?
I tried to maintain the current exposed modules and types, and keep them sane in general.
I'm not sure I managed to keep all of them but I think none got significantly more complex to pull in.
There is more public API surface now, as some of the shared code used its location to stay
pub(crate)
or just private (for example - theDecoder
trait and related pieces).As discussed,
avro
is now fully behind a feature flag.I also moved a couple of functions from
SessionState
to session, one tiny step towards that goal.