Re: Task not serializable?

2014-03-31 Thread Daniel Liu
Hi, I am new to Spark and I encountered this error when I try to map RDD[A] => RDD[Array[Double]] and then collect the results. A is a custom class that extends Serializable. (Actually it's just a wrapper class which wraps a few variables that are all serializable.) I also tried KryoSerializer according

Re: SequenceFileRDDFunctions cannot be used output of spark package

2014-03-31 Thread pradeeps8
Hi Sonal, There are no custom objects in saveRDD, it is of type RDD[(String, String)]. Thanks, Pradeep -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SequenceFileRDDFunctions-cannot-be-used-output-of-spark-package-tp250p3508.html Sent from the Apache

java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
I am facing different kinds of java.lang.ClassNotFoundException when trying to run spark on mesos. One error has to do with org.apache.spark.executor.MesosExecutorBackend. Another has to do with org.apache.spark.serializer.JavaSerializer. I see other people complaining about similar issues. I

Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Tim St Clair
What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions. Cheers, Tim - Original Message - From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014 8:16:19 AM Subject:

yarn.application.classpath in yarn-site.xml

2014-03-31 Thread Dan
Hi, I've just tested spark in yarn mode, but something confused me. When I *delete* the yarn.application.classpath configuration in yarn-site.xml, the following command works well. *bin/spark-class org.apache.spark.deploy.yarn.Client --jar

Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and the latest git tree. Thanks On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote: What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions. Cheers,

Best practices: Parallelized write to / read from S3

2014-03-31 Thread Nicholas Chammas
Howdy-doody, I have a single, very large file sitting in S3 that I want to read in with sc.textFile(). What are the best practices for reading in this file as quickly as possible? How do I parallelize the read as much as possible? Similarly, say I have a single, very large RDD sitting in memory

Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?

2014-03-31 Thread Michael Armbrust
* unionAll preserve duplicate v/s union that does not This is true, if you want to eliminate duplicate items you should follow the union with a distinct() * SQL union and unionAll result in same output format i.e. another SQL v/s different RDD types here. * Understand the existing union
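
The duplicate-handling difference discussed above can be illustrated outside Spark. The following is only a minimal sketch using Python's built-in sqlite3 module, not the SchemaRDD API; the table names left_t and right_t and the sample values are made up for illustration. It shows SQL UNION ALL keeping duplicates, SQL UNION dropping them, and a plain concatenation followed by a de-duplication step, which mirrors the "union, then distinct()" advice.

```python
import sqlite3


def union_demo():
    """Compare duplicate handling: UNION ALL vs UNION vs concat-then-dedup."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables and values, chosen so that 2 appears on both sides.
    conn.execute("CREATE TABLE left_t (x INTEGER)")
    conn.execute("CREATE TABLE right_t (x INTEGER)")
    conn.executemany("INSERT INTO left_t VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO right_t VALUES (?)", [(2,), (3,)])

    # UNION ALL keeps every row from both inputs, duplicates included.
    keep_dups = sorted(
        row[0]
        for row in conn.execute(
            "SELECT x FROM left_t UNION ALL SELECT x FROM right_t"
        )
    )
    # UNION removes duplicate rows from the combined result.
    no_dups = sorted(
        row[0]
        for row in conn.execute(
            "SELECT x FROM left_t UNION SELECT x FROM right_t"
        )
    )
    conn.close()

    # Plain-Python analogue of "union, then distinct()":
    # concatenate first, de-duplicate as a separate step.
    concatenated = [1, 2] + [2, 3]
    deduped = sorted(set(concatenated))
    return keep_dups, no_dups, deduped


if __name__ == "__main__":
    keep_dups, no_dups, deduped = union_demo()
    print("UNION ALL:", keep_dups)
    print("UNION:", no_dups)
    print("concat + dedup:", deduped)
```

The separate de-duplication step is the point: combining two collections is cheap, while removing duplicates requires comparing rows, so it is reasonable for it to be opt-in.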

Re: groupBy RDD does not have grouping column ?

2014-03-31 Thread Michael Armbrust
This is similar to how SQL works, items in the GROUP BY clause are not included in the output by default. You will need to include 'a in the second parameter list (which is similar to the SELECT clause) as well if you want it included in the output. On Sun, Mar 30, 2014 at 9:52 PM, Manoj Samel
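
The SQL behaviour referred to above can be checked with a small sketch. This uses Python's built-in sqlite3 module rather than the Spark SQL DSL, and the table t, its columns a and b, and the sample rows are all hypothetical. The grouping column only shows up in the result when it is also listed in the SELECT clause.

```python
import sqlite3


def group_by_demo():
    """Show that a GROUP BY column is only output when it is also selected."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: column a is the grouping key, b is aggregated.
    conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)", [("x", 1), ("x", 2), ("y", 5)]
    )

    # Grouping by a without selecting it: only the aggregate comes back.
    without_key = sorted(
        conn.execute("SELECT SUM(b) FROM t GROUP BY a").fetchall()
    )
    # Selecting a as well: each output row carries its group key.
    with_key = sorted(
        conn.execute("SELECT a, SUM(b) FROM t GROUP BY a").fetchall()
    )
    conn.close()
    return without_key, with_key


if __name__ == "__main__":
    without_key, with_key = group_by_demo()
    print("without key:", without_key)
    print("with key:", with_key)
```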

Re: Error in SparkSQL Example

2014-03-31 Thread Michael Armbrust
val people: RDD[Person] // An RDD of case class objects, from the first example. is just a placeholder to avoid cluttering up each example with the same code for creating an RDD. The : RDD[Person] is just there to let you know the expected type of the variable 'people'. Perhaps there is a

Re: Error in SparkSQL Example

2014-03-31 Thread Manoj Samel
Hi Michael, Thanks for the clarification. My question is about the error above error: class $iwC needs to be abstract and what does the RDD bring, since I can do the DSL without the people: people: org.apache.spark.rdd.RDD[Person] Thanks, On Mon, Mar 31, 2014 at 9:13 AM, Michael Armbrust

Re: Best practices: Parallelized write to / read from S3

2014-03-31 Thread Aaron Davidson
Note that you may have minSplits set to more than the number of cores in the cluster, and Spark will just run as many as possible at a time. This is better if certain nodes may be slow, for instance. In general, it is not necessarily the case that doubling the number of cores doing IO will double
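
The idea of having more splits than cores can be modelled with a toy example. This is only a sketch of the scheduling idea in plain Python, not Spark's scheduler or its S3 input handling; split_ranges, process_splits, and the sizes used are hypothetical names and values. A file of total_bytes is cut into num_splits byte ranges, and a pool of num_workers threads (standing in for cores) works through them, running as many at a time as it has workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_bytes, num_splits):
    """Cut [0, total_bytes) into num_splits contiguous, near-equal ranges."""
    base, extra = divmod(total_bytes, num_splits)
    ranges = []
    start = 0
    for i in range(num_splits):
        # The first `extra` splits absorb one leftover byte each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def process_splits(total_bytes, num_splits, num_workers):
    """Process every split using at most num_workers concurrent workers."""
    ranges = split_ranges(total_bytes, num_splits)

    def read_split(byte_range):
        # Stand-in for real I/O: report how many bytes this split covers.
        begin, end = byte_range
        return end - begin

    # More splits than workers is fine: the pool picks up the next split
    # as soon as a worker becomes free.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sizes = list(pool.map(read_split, ranges))
    return ranges, sizes


if __name__ == "__main__":
    ranges, sizes = process_splits(total_bytes=10, num_splits=4, num_workers=2)
    print(ranges)
    print(sizes)
```

With smaller splits, a slow worker only delays the few splits it picked up, while faster workers keep pulling the rest, which is the benefit mentioned for clusters where some nodes may be slow.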

Re: Best practices: Parallelized write to / read from S3

2014-03-31 Thread Nicholas Chammas
OK sweet. Thanks for walking me through that. I wish this were StackOverflow so I could bestow some nice rep on all you helpful people. On Mon, Mar 31, 2014 at 1:06 PM, Aaron Davidson ilike...@gmail.com wrote: Note that you may have minSplits set to more than the number of cores in the

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Martin Goodson
How about London? -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1] On Mon, Mar 31, 2014 at 6:28 PM, Andy Konwinski andykonwin...@gmail.com wrote: Hi folks, We have seen a lot of community growth outside of the Bay Area and we are looking to help spur even

Re: network wordcount example

2014-03-31 Thread Diana Carroll
Not sure what data you are sending in. You could try calling lines.print() instead which should just output everything that comes in on the stream. Just to test that your socket is receiving what you think you are sending. On Mon, Mar 31, 2014 at 12:18 PM, eric perler

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Andy Konwinski
Responses about London, Montreal/Toronto, DC, Chicago. Great coverage so far, and keep 'em coming! (still looking for an NYC connection) I'll reply to each of you off-list to coordinate next-steps for setting up a Spark meetup in your home area. Thanks again, this is super exciting. Andy On

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Chris Gore
We'd love to see a Spark user group in Los Angeles and connect with others working with it here. Ping me if you're in the LA area and use Spark at your company ( ch...@retentionscience.com ). Chris Retention Science call: 734.272.3099 visit: Site | like: Facebook | follow: Twitter On Mar

Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Tim St Clair
It sounds like the protobuf issue. So FWIW, You might want to try updating the 0.9.0 w/pom mods for mesos protobuf. mesos 0.17.0 protobuf 2.5 Cheers, Tim - Original Message - From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014

how spark dstream handles congestion?

2014-03-31 Thread Dong Mo
Dear list, I was wondering how Spark handles congestion when the upstream is generating dstreams faster than downstream workers can handle? Thanks -Mo

Re: how spark dstream handles congestion?

2014-03-31 Thread Dong Mo
Thanks -Mo 2014-03-31 13:16 GMT-05:00 Evgeny Shishkin itparan...@gmail.com: On 31 Mar 2014, at 21:05, Dong Mo monted...@gmail.com wrote: Dear list, I was wondering how Spark handles congestion when the upstream is generating dstreams faster than downstream workers can handle? It

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Yana Kadiyska
Nicholas, I'm in Boston and would be interested in a Spark group. Not sure if you know this -- there was a meetup that never got off the ground. Anyway, I'd be +1 for attending. Not sure what is involved in organizing. Seems a shame that a city like Boston doesn't have one. On Mon, Mar 31, 2014

Calling Spahk enthusiasts in Boston

2014-03-31 Thread Nicholas Chammas
My fellow Bostonians and New Englanders, We cannot allow New York to beat us to having a banging Spark meetup. Respond to me (and I guess also Andy?) if you are interested. Yana, I'm not sure either what is involved in organizing, but we can figure it out. I didn't know about the meetup that

Re: Calling Spahk enthusiasts in Boston

2014-03-31 Thread Nick Pentreath
I would offer to host one in Cape Town but we're almost certainly the only Spark users in the country apart from perhaps one in Johannesburg :) Sent from Mailbox for iPhone On Mon, Mar 31, 2014 at 8:53 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: My fellow Bostonians and New

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Jeremy Freeman
Happy to help with an NYC meet up (just emailed Andy). I recently moved to VA, but am back in NYC quite often, and have been turning several computational people at Columbia / NYU / Simons Foundation onto Spark; there'd definitely be interest in those communities. -- Jeremy

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Patrick Grinaway
Also in NYC, definitely interested in a spark meetup! Sent from my iPhone On Mar 31, 2014, at 3:07 PM, Jeremy Freeman freeman.jer...@gmail.com wrote: Happy to help with an NYC meet up (just emailed Andy). I recently moved to VA, but am back in NYC quite often, and have been turning several

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Denny Lee
If you have any questions on helping to get a Spark Meetup off the ground, please do not hesitate to ping me (denny.g@gmail.com).  I helped jump start the one here in Seattle (and tangentially have been helping the Vancouver and Denver ones as well).  HTH! On March 31, 2014 at 12:35:38

Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
Your suggestion took me past the ClassNotFoundException. I then hit an akka.actor.ActorNotFound exception. I patched PR 568 into my 0.9.0 spark codebase and everything worked. So thanks a lot, Tim. Is there a JIRA/PR for the protobuf issue? Why is it not fixed in the latest git tree? Thanks.

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-03-31 Thread Patrick Wendell
Spark now shades its own protobuf dependency so protobuf 2.4.1 shouldn't be getting pulled in unless you are directly using akka yourself. Are you? Does your project have other dependencies that might be indirectly pulling in protobuf 2.4.1? It would be helpful if you could list all of your

Calling Spark enthusiasts in Austin, TX

2014-03-31 Thread Ognen Duzlevski
In the spirit of everything being bigger and better in TX ;) = if anyone is in Austin and interested in meeting up over Spark - contact me! There seems to be a Spark meetup group in Austin that has never met and my initial email to organize the first gathering was never acknowledged. Ognen On

Re: network wordcount example

2014-03-31 Thread Chris Fregly
@eric- i saw this exact issue recently while working on the KinesisWordCount. are you passing local[2] to your example as the MASTER arg versus just local or local[1]? you need at least 2. it's documented as n>1 in the scala source docs - which is easy to mistake for n=1. i just ran the

Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
I was talking about the protobuf version issue as not fixed. I could not find any reference to the problem or the fix. Reg. SPARK-1052, I could pull in the fix into my 0.9.0 tree (from the tar ball on the website) and I see the fix in the latest git. Thanks On 01-Apr-2014, at 3:28 am, deric

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Sonal Goyal
Hi Andy, I would be interested in setting up a meetup in Delhi/NCR, India. Can you please let me know how to go about organizing it? Best Regards, Sonal Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Tue, Apr 1, 2014 at 10:04 AM, giive chen

Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
Another problem I noticed is that the current 1.0.0 git tree still gives me the ClassNotFoundException. I see that the SPARK-1052 is already fixed there. I then modified the pom.xml for mesos and protobuf and that still gave the ClassNotFoundException. I also tried modifying pom.xml only for