Sedona arrow udf example #1859
base: master
I need to adjust it before I reopen it.
I need to add docs for this one.
val batchIter = if (batchSize > 0) new BatchIterator(iter, batchSize) else Iterator(iter)

val columnarBatchIter = new ArrowPythonRunner(
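For readers following along from the Python side, the batching rule in the first line above can be sketched as follows. This is only an illustration of the "fixed-size batches if batchSize > 0, otherwise one batch with everything" logic; `batch_rows` is a hypothetical helper name, not something from Sedona or Spark.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_rows(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: group rows into batches of at most batch_size.

    A non-positive batch_size means "do not split", so the whole input
    is emitted as a single batch.
    """
    it = iter(rows)
    if batch_size <= 0:
        # No batching requested: one batch holding every row.
        yield list(it)
        return
    while True:
        # Pull up to batch_size rows; an empty list means the input is drained.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each emitted batch would then be what gets serialized to Arrow and handed to the Python worker in one go.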
Not a battle for this particular PR, but do we get to choose what the Python function is evaluated on, or are we leaning on built-in Spark machinery such that this is forced to be a function of a pandas Series? (If it could be a function of, for example, two NumPy arrays for points, or Arrow buffers more generally, it would open up some options in terms of speed.)
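To make the distinction concrete, here is a rough sketch of the two function shapes being compared: one that receives a sequence of serialized point geometries and decodes each row, and one that receives the coordinates as two flat arrays. It assumes points arrive as little-endian WKB and uses the standard-library `array` type as a stand-in for NumPy arrays or Arrow buffers so that it runs without extra dependencies; all function names are hypothetical and none of this is the PR's actual code.

```python
import struct
from array import array
from typing import List, Sequence, Tuple

# Little-endian WKB point layout: byte order flag (1), geometry type (1 = Point),
# then x and y as 8-byte doubles. 21 bytes in total.
WKB_POINT = "<BIdd"


def point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB."""
    return struct.pack(WKB_POINT, 1, 1, x, y)


def shift_x_per_row(wkbs: Sequence[bytes], dx: float) -> List[bytes]:
    """Series-of-geometries shape: decode, modify, and re-encode every row."""
    out = []
    for wkb in wkbs:
        _, _, x, y = struct.unpack(WKB_POINT, wkb)
        out.append(point_wkb(x + dx, y))
    return out


def shift_x_columnar(xs: array, ys: array, dx: float) -> Tuple[array, array]:
    """Two-coordinate-arrays shape: no per-row geometry decoding is needed.

    With NumPy this body would be a single vectorized `xs + dx`.
    """
    return array("d", (x + dx for x in xs)), ys
```

The second shape is the one that would let a vectorized library operate directly on the coordinate buffers, which is where the potential speed-up mentioned above would come from.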
I'm not fully sold on this solution either. I just wanted to unlock Arrow-based UDFs in Sedona. I totally agree that we can do better. Right now, based on my internal tests, it's 2 times faster than a normal UDF.
Awesome! At some point my Spark/Scala will be good enough to see if there's any room to improve on that 🙂
I would like to help with that 🙇
Did you read the Contributor Guide?
Yes, I have read the Contributor Rules and Contributor Development Guide
No, I haven't read it.
Is this PR related to a ticket?
Yes, and the PR name follows the format [SEDONA-XXX] my subject.
Yes, and the PR name follows the format [GH-XXX] my subject.
No: [DOCS] my subject, [CI] my subject
What changes were proposed in this PR?
How was this patch tested?
Did this PR include necessary documentation updates?
vX.Y.Z
format.