Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
A major release usually means giving up on some API backward compatibility? Can this be used as a chance to merge efforts with Apache Flink (https://flink.apache.org/) and create the one ultimate open source big data processing system? Spark currently feels like it was made for interactive use

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Sean Owen
Major releases can change APIs, yes. Although Flink is pretty similar in broad design and goals, the APIs are quite different in particulars. Speaking for myself, I can't imagine merging them, as it would either mean significantly changing Spark APIs, or making Flink use Spark APIs. It would mean

Re: PMML version in MLLib

2015-11-08 Thread Vincenzo Selvaggio
Hi, I confirm the models are exported for PMML version 4.2; in fact, you can see it in the generated XML: PMML xmlns="http://www.dmg.org/PMML-4_2". This is the default version when using https://github.com/jpmml/jpmml-model/tree/1.1.X. I didn't realize the attribute version of the PMML root element
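
To make the namespace/version distinction in the message above concrete, here is a minimal sketch of how one might inspect both on a PMML root element. The sample document and the helper name `pmml_versions` are hypothetical and exist only for illustration; real MLlib output contains many more elements, and this is not how Spark or jpmml-model performs the check.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down PMML document used only for illustration.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="example"/>
</PMML>"""


def pmml_versions(xml_text):
    """Return (namespace_uri, version_attribute) for the PMML root element.

    The namespace URI comes from the xmlns declaration; the version comes
    from the separate `version` attribute, which is None when absent.
    """
    root = ET.fromstring(xml_text)
    # ElementTree represents a namespaced tag as "{uri}localname".
    if root.tag.startswith("{"):
        namespace, _, local_name = root.tag[1:].partition("}")
    else:
        namespace, local_name = None, root.tag
    if local_name != "PMML":
        raise ValueError("root element is not PMML: %r" % local_name)
    return namespace, root.get("version")


namespace, version = pmml_versions(SAMPLE_PMML)
print("namespace:", namespace)
print("version attribute:", version)
```

A document that declares the 4.2 namespace but omits the `version` attribute would yield `None` for the second value, which is the kind of gap the thread appears to be discussing.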

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Koert Kuipers
Romi, unless I am misunderstanding your suggestion, you might be interested in projects like the new Mahout, where they try to abstract out the engine with bindings so that they can support multiple engines within a single platform. I guess Cascading is heading in a similar direction (although no

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
Hi, thanks for the feedback. I'll try to explain better what I meant. First we had RDDs, then we had DataFrames, so could the next step be something like stored procedures over DataFrames? So I define the whole calculation flow, even if it includes any "actions" in between, and the whole thing is

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Sean McNamara
+1 Sean On Nov 3, 2015, at 4:28 PM, Reynold Xin wrote: Please vote on releasing the following candidate as Apache Spark version 1.5.2. The vote is open until Sat Nov 7, 2015 at 00:00 UTC and passes if a majority of at least 3 +1 PMC votes

Re: Calling stop on StreamingContext locks up

2015-11-08 Thread vonnagy
Hi Ted, Your fix addresses the issue for me. Thanks again for your help and I saw the PR you submitted to Master. Ivan

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
Since it seems we do have so much to talk about Spark 2.0, then the answer to the question "ready to talk about spark 2" is yes. But that doesn't mean the development of the 1.x branch is ready to stop or that there shouldn't be a 1.7 release. Regarding what should go into the next major version

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Mark Hamstra
Yes, that's clearer -- at least to me. But before going any further, let me note that we are already sliding past Sean's opening question of "Should we start talking about Spark 2.0?" to actually start talking about Spark 2.0. I'll try to keep the rest of this post at a higher- or meta-level in

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
Why did you jump directly to the spark-streaming-mqtt module? Can you drop 'spark-streaming-mqtt' and try again? Not sure why 1.5.0-SNAPSHOT showed up. Were you using the RC2 source? Cheers On Sun, Nov 8, 2015 at 7:28 PM, 欧锐 <494165...@qq.com> wrote: > > build spark-streaming-mqtt_2.10 failed! > >

[build system] emergency restart to temporarily patch a massive java security hole

2015-11-08 Thread shane knapp
hey everyone! i'm about to shut down jenkins to deploy a temporary fix for a massive security hole i found out about late friday: http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/ read the whole thing.

Re: [build system] emergency restart to temporarily patch a massive java security hole

2015-11-08 Thread shane knapp
ok, we're good to go. https://amplab.cs.berkeley.edu/jenkins/cli/ returns a 404, as it should. thanks for your patience... shane On Sun, Nov 8, 2015 at 2:53 PM, shane knapp wrote: > hey everyone! > > i'm about to shut down jenkins to deploy a temporary fix for a massive

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread 欧锐
build spark-streaming-mqtt_2.10 failed! nohup mvn -X -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package -rf :spark-streaming-mqtt_2.10 & [DEBUG] org.scala-tools.testing:test-interface:jar:0.5:test [DEBUG]

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
+1 On Sat, Nov 7, 2015 at 4:35 PM, Denny Lee wrote: > +1 > > > On Sat, Nov 7, 2015 at 12:01 PM Mark Hamstra > wrote: > >> +1 >> >> On Tue, Nov 3, 2015 at 3:22 PM, Reynold Xin wrote: >> >>> Please vote on releasing the

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Krishna Sankar
In addition to the wrong entry point, I suspect there is a cache problem as well. I have seen strange errors that disappear completely once the ivy cache is deleted. Cheers On Sun, Nov 8, 2015 at 7:54 PM, Ted Yu wrote: > Why did you directly jump to spark-streaming-mqtt

Re: PMML version in MLLib

2015-11-08 Thread Fazlan Nazeem
Hi Vincenzo/Owen, I have sent a pull request[1] with necessary changes to add the pmml version attribute to the root node. I have also linked the issue under the PMML improvement umbrella[2] as you suggested. [1] https://github.com/apache/spark/pull/9558 [2]

Re: OLAP query using spark dataframe with cassandra

2015-11-08 Thread Jörn Franke
Is there any distributor supporting these software components in combination? If not, and your core business is not software, then you may want to look for something else, because it might not make sense to build up internal know-how in all of these areas. In any case - it all depends highly on

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Sean Owen
Looks like you are building a module without install-ing other modules. That won't work in general in Maven. Also, it looks like you are building a snapshot, not the release we are talking about. On Mon, Nov 9, 2015 at 3:28 AM, 欧锐 <494165...@qq.com> wrote: > > build spark-streaming-mqtt_2.10

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-08 Thread fightf...@163.com
Hi, Thanks for the suggestion. We are currently evaluating and stress-testing Spark SQL on Cassandra while trying to define business models. FWIW, the solution mentioned here is different from a traditional OLAP cube engine, right? So we are hesitating on the common sense or direction choice

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Reynold Xin
Thanks everybody for voting. I'm going to close the vote now. The vote passes with 14 +1 votes and no -1 votes. I will work on packaging this ASAP. +1: Jean-Baptiste Onofré Egor Pahomov Luc Bourlier Tom Graves* Chester Chen Michael Armbrust* Krishna Sankar Robin East Reynold Xin* Joseph Bradley

OLAP query using spark dataframe with cassandra

2015-11-08 Thread fightf...@163.com
Hi community, we are especially interested in this feature integration, based on some slides from [1]. SMACK (Spark+Mesos+Akka+Cassandra+Kafka) seems to be a good implementation of the lambda architecture in the open-source world, especially for non-Hadoop-based cluster environments. As we can see,

Wrap an RDD with a ShuffledRDD

2015-11-08 Thread Muhammad Haseeb Javed
I am working on a modified Spark core and have a Broadcast variable which I deserialize to obtain an RDD along with its set of dependencies, as is done in ShuffleMapTask, as follows: val taskBinary: Broadcast[Array[Byte]] var (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](