Sedona arrow udf example #1859

Imbruced · 2025-03-16T16:04:16Z

Did you read the Contributor Guide?

Yes, I have read the Contributor Rules and Contributor Development Guide
No, I haven't read it.

Is this PR related to a ticket?

Yes, and the PR name follows the format [SEDONA-XXX] my subject.
Yes, and the PR name follows the format [GH-XXX] my subject.
No:
- this is a documentation update. The PR name follows the format [DOCS] my subject
- this is a CI update. The PR name follows the format [CI] my subject

What changes were proposed in this PR?

How was this patch tested?

Did this PR include necessary documentation updates?

Yes, I am adding a new API. I am using the current SNAPSHOT version number in vX.Y.Z format.
Yes, I have updated the documentation.
No, this PR does not affect any public API so no need to change the documentation.

Imbruced · 2025-03-16T16:31:57Z

Need to adjust it before I reopen it.

Imbruced · 2025-03-17T22:06:47Z

Need to add docs for this one.

paleolimbot · 2025-03-18T19:33:22Z

spark/spark-3.5/src/main/scala/org/apache/spark/sql/udf/SedonaArrowStrategy.scala

+
+    val batchIter = if (batchSize > 0) new BatchIterator(iter, batchSize) else Iterator(iter)
+
+    val columnarBatchIter = new ArrowPythonRunner(
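The `batchSize` check in the diff above either splits a partition's rows into fixed-size batches (`BatchIterator`) or, when `batchSize` is not positive, passes the whole partition through as a single batch. Below is a minimal Python sketch of that choice. It assumes plain Python rows rather than Spark rows, and `batch_rows` is a hypothetical name. Unlike Spark's lazy `BatchIterator`, it materializes each batch as a list.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_rows(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper mirroring the batching choice in the diff.

    batch_size > 0:  yield consecutive chunks of at most batch_size rows.
    batch_size <= 0: yield the whole partition as a single batch.
    """
    it = iter(rows)
    if batch_size <= 0:
        # Mirrors `Iterator(iter)`: one batch containing every row.
        yield list(it)
        return
    while True:
        # Pull up to batch_size rows; islice stops early at the end of input.
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    print(list(batch_rows(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
    print(list(batch_rows(range(7), 0)))  # [[0, 1, 2, 3, 4, 5, 6]]
```

Smaller batches bound the memory held per Arrow batch. A single batch avoids per-batch overhead but holds the whole partition at once.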


Not a battle for this particular PR, but do we get to choose what the Python function is evaluating on or are we leaning on built-in Spark things such that we are forced to have this be a function of a pandas series? (if it could be a function of, for example, two numpy arrays for points or Arrow buffers more generally, it would open up some options in terms of speed).

Imbruced · 2025-03-18

I haven't heavily optimized this solution either; I just wanted to unlock Arrow-based UDFs in Sedona, and I totally agree that we can do better. Right now, based on my internal tests, it is 2 times faster than a normal UDF.

paleolimbot · 2025-03-18

Awesome! At some point my Spark/Scala will be good enough to see if there's any room to improve on that 🙂

Imbruced · 2025-03-18 (edited)

I would like to help with that 🙇
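The trade-off raised in the review (a Python function over a pandas Series versus one over plain NumPy coordinate arrays) can be sketched as follows. This assumes points are already available either as a pandas Series of `(x, y)` tuples or as two NumPy arrays. The function names are hypothetical and this is not Sedona's UDF API. The first form calls a Python lambda once per row. The second makes one vectorized NumPy call per batch, which is typically where such speedups come from.

```python
import numpy as np
import pandas as pd


def distance_per_element(points: pd.Series) -> pd.Series:
    """Hypothetical Series-of-objects style: one Python call per row."""
    return points.apply(lambda p: float(np.hypot(p[0], p[1])))


def distance_vectorized(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical coordinate-array style: one NumPy call for the batch."""
    return np.hypot(x, y)


if __name__ == "__main__":
    x = np.array([3.0, 0.0, 5.0])
    y = np.array([4.0, 1.0, 12.0])
    # Same points, packed as an object-dtype Series of (x, y) tuples.
    points = pd.Series(list(zip(x, y)))
    print(distance_per_element(points).tolist())
    print(distance_vectorized(x, y).tolist())
```

Both functions return the same distances. Only the amount of per-row Python overhead differs.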

Imbruced added 3 commits March 16, 2025 15:34

SEDONA-721 Add Sedona vectorized udf.

9872845

SEDONA-721 Add docs.

340d0f7

SEDONA-721 Add docs.

a989bb4

github-actions bot added sedona-python github-actions sedona-spark labels Mar 16, 2025

Imbruced closed this Mar 16, 2025

Imbruced added 2 commits March 16, 2025 18:58

SEDONA-721 Add docs.

a904beb

SEDONA-721 Add docs.

99f14e8

Imbruced reopened this Mar 16, 2025

Imbruced added 9 commits March 16, 2025 22:31

SEDONA-721 Add docs.

5c7c9b8

SEDONA-721 Add docs.

fdf778f

SEDONA-721 Add docs.

cb48f80

SEDONA-721 Add docs.

5f5b602

SEDONA-721 Add docs.

357a122

SEDONA-721 Add docs.

717226d

SEDONA-721 Add docs.

ace7001

SEDONA-721 Fix converters

bc0de7b

SEDONA-721 Fix converters

463be7c

SEDONA-721 Fix converters

186feae

paleolimbot reviewed Mar 18, 2025
