This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Apache beam support? #736

Open
azymnis opened this issue Jun 22, 2017 · 5 comments

Comments

@azymnis

azymnis commented Jun 22, 2017

What do you all think about adding support for Apache Beam? If there is enough interest, I could start looking into this.

@pankajroark
Contributor

pankajroark commented Jun 22, 2017 via email

@johnynek
Collaborator

This wouldn't be that hard to do. There are many planners to look at for examples: memory, concurrentmemory, scalding, storm, and an old spark one we removed since we never used it.

@azymnis
Author

azymnis commented Jun 26, 2017

Ah, it seems like this is what scio is doing, albeit using a completely different package. At least they are using Algebird under the hood for aggregations. They also explicitly give a shoutout to Scalding in the README (which makes sense, since Spotify has been using Scalding).

@sriramkrishnan
Collaborator

I was going to ask how it would compare to https://github.com/spotify/scio myself.

How do we envision this? Should Summingbird just be a DSL on top of Beam? If so, what would that buy us? Would we completely drop the Scalding and Storm/Heron counterparts and let Beam take care of the underlying framework (Spark, Dataflow, etc.)? FYI, the Heron community is discussing support for something like Beam on top of it.

@johnynek
Collaborator

Well, scio is like Scalding. Summingbird is intrinsically about streaming: there is a notion of time and of single events. I think Summingbird is, and always should be, about streaming map/reduce.

I think if we could get good performance on top of Beam, great. I imagine the work to make it as performant and correct as the current code won't/can't be funded for maybe 1-2 years, assuming someone even starts (notice that about 4 years ago we were saying similar things about Spark, which we have never found the time to support).
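
For readers unfamiliar with the "streaming map/reduce" idea mentioned above, here is a rough, illustrative sketch in plain Python. It is not Summingbird's API and not a Beam pipeline; `Event`, `flat_map_words`, and `streaming_sum` are hypothetical names made up for this example. It only shows the general shape: single timestamped events are flat-mapped to key/value pairs, and the values are merged with an associative `plus` into a store keyed by (key, time window).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    """A single event with an event time (hypothetical type for this sketch)."""
    timestamp_ms: int
    text: str


def flat_map_words(event: Event) -> Iterator[Tuple[str, int]]:
    """The 'map' side: turn one event into zero or more (key, value) pairs."""
    for word in event.text.lower().split():
        yield word, 1


def streaming_sum(
    events: Iterable[Event],
    mapper: Callable[[Event], Iterable[Tuple[str, int]]],
    window_ms: int,
    plus: Callable[[int, int], int] = lambda a, b: a + b,
) -> Dict[Tuple[str, int], int]:
    """The 'reduce' side: merge values per (key, window) with an associative plus.

    Events are consumed one at a time, so the same loop works for a finite
    batch or for an iterator that is fed incrementally.
    """
    store: Dict[Tuple[str, int], int] = {}
    for event in events:
        window = event.timestamp_ms // window_ms  # bucket by event time
        for key, value in mapper(event):
            slot = (key, window)
            store[slot] = plus(store[slot], value) if slot in store else value
    return store


events = [
    Event(0, "beam beam storm"),
    Event(500, "storm"),
    Event(1200, "beam"),
]
counts = streaming_sum(events, flat_map_words, window_ms=1000)
```

Because `plus` is assumed to be associative, partial sums computed by a batch job and by a streaming job could in principle be merged per window, which is the kind of property any new planner would have to preserve.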

4 participants