Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
Since it seems we do have so much to talk about for Spark 2.0, the answer to the question "ready to talk about Spark 2?" is yes. But that doesn't mean development of the 1.x branch is ready to stop, or that there shouldn't be a 1.7 release. Regarding what should go into the next major version…

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Mark Hamstra
Yes, that's clearer -- at least to me. But before going any further, let me note that we are already sliding past Sean's opening question of "Should we start talking about Spark 2.0?" to actually start talking about Spark 2.0. I'll try to keep the rest of this post at a higher- or meta-level…

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
Hi, thanks for the feedback. I'll try to explain better what I meant. First we had RDDs, then we had DataFrames, so could the next step be something like stored procedures over DataFrames? So I define the whole calculation flow, even if it includes "actions" in between, and the whole thing is…
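
A minimal sketch of one way to read this "stored procedure over DataFrames" idea, against the Spark 1.x SQLContext API. The Procedure wrapper, the submit helper, and the paths are illustrative only, not an existing or proposed Spark API:

import org.apache.spark.sql.{DataFrame, SQLContext}

// The whole flow, including intermediate actions, is captured as a value
// and only runs when it is explicitly submitted as one unit.
case class Procedure[A](run: SQLContext => A)

object Procedure {
  def submit[A](p: Procedure[A])(sqlContext: SQLContext): A = p.run(sqlContext)
}

object Example {
  val flow: Procedure[Long] = Procedure { sqlContext =>
    val events: DataFrame = sqlContext.read.parquet("/data/events") // hypothetical path
    val ok = events.filter("status = 'ok'")
    ok.cache()
    val n = ok.count()            // an "action" in the middle of the flow
    ok.write.parquet("/data/ok")  // another action, still inside the same unit
    n
  }
  // Nothing has executed yet; the whole thing would be handed over in one call:
  // val total = Procedure.submit(flow)(sqlContext)
}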

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Koert Kuipers
Romi, unless I am misunderstanding your suggestion, you might be interested in projects like the new Mahout, where they try to abstract out the engine with bindings so that they can support multiple engines within a single platform. I guess Cascading is heading in a similar direction (although…
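
For context, a rough sketch of the "engine bindings" pattern being described. This is not Mahout's actual API; the trait and method names are made up for illustration:

trait DistributedMatrix

trait EngineBindings {
  def load(path: String): DistributedMatrix
  def multiply(a: DistributedMatrix, b: DistributedMatrix): DistributedMatrix
  def save(m: DistributedMatrix, path: String): Unit
}

// An algorithm written once against the bindings can run on whichever
// engine (Spark, Flink, H2O, ...) supplies the implementation.
object Algorithms {
  def squareAndSave(engine: EngineBindings, in: String, out: String): Unit = {
    val a = engine.load(in)
    engine.save(engine.multiply(a, a), out)
  }
}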

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Sean Owen
Major releases can change APIs, yes. Although Flink is pretty similar in broad design and goals, the APIs are quite different in particulars. Speaking for myself, I can't imagine merging them, as it would either mean significantly changing Spark APIs, or making Flink use Spark APIs. It would mean…

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
A major release usually means giving up on some API backward compatibility? Can this be used as a chance to merge efforts with Apache Flink (https://flink.apache.org/) and create the one ultimate open source big data processing system? Spark currently feels like it was made for interactive use…

Re: Ready to talk about Spark 2.0?

2015-11-06 Thread Jean-Baptiste Onofré
Hi Sean, Happy to see this discussion. I'm working on a PoC to run Camel on Spark Streaming. The purpose is to have an ingestion and integration platform running directly on Spark Streaming. Basically, we would be able to use a Camel Spark DSL like: from("jms:queue:foo").choice().when(predicate)…
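
As a concrete illustration of what such a route could look like, here is a sketch using Camel's Java DSL from Scala. The "spark-streaming:ingest" endpoint URI is an assumption standing in for whatever the PoC's Camel-on-Spark component would expose, not an existing Camel component:

import org.apache.camel.builder.RouteBuilder
import org.apache.camel.impl.DefaultCamelContext

object CamelSparkRouteSketch {
  def main(args: Array[String]): Unit = {
    // A JMS component would still need to be registered for "jms:" URIs to resolve.
    val context = new DefaultCamelContext()
    context.addRoutes(new RouteBuilder() {
      override def configure(): Unit = {
        from("jms:queue:foo")
          .choice()
            .when(header("type").isEqualTo("event"))
              .to("spark-streaming:ingest") // hypothetical endpoint backed by Spark Streaming
            .otherwise()
              .to("log:dropped")
          .end()
      }
    })
    context.start()
    // keep the context running; call context.stop() on shutdown
  }
}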