groupByKey() and keys with many values

2015-09-07 Thread kaklakariada
Hi, I already posted this question on the users mailing list (http://apache-spark-user-list.1001560.n3.nabble.com/Using-groupByKey-with-many-values-per-key-td24538.html) but did not get a reply. Maybe this is the correct forum to ask. My problem is that doing groupByKey().mapToPair() loads all v

Re: groupByKey() and keys with many values

2015-09-07 Thread Sean Owen
That's how it's intended to work; if it's a problem, you probably need to re-design your computation to not use groupByKey. Usually you can do so. On Mon, Sep 7, 2015 at 9:02 AM, kaklakariada wrote: > Hi, > > I already posted this question on the users mailing list > (http://apache-spark-user-lis

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-07 Thread james
add a critical bug https://issues.apache.org/jira/browse/SPARK-10474 (Aggregation failed with unable to acquire memory)

Re: groupByKey() and keys with many values

2015-09-07 Thread Antonio Piccolboni
To expand on what Sean said, I would look into replacing groupByKey with reduceByKey. Also take a look at this doc. I happen to have designed a library that was subject to
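
To illustrate the suggestion above, here is a minimal sketch in plain Python. It does not use the Spark API, and the helper names group_by_key and reduce_by_key are hypothetical stand-ins chosen for this example. It only shows the general idea: grouping holds every value of a key in memory at once, while reducing keeps a single running result per key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def group_by_key(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Collect every value for a key into one list (all values held at once)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_by_key(pairs: Iterable[Pair],
                  combine: Callable[[int, int], int]) -> Dict[str, int]:
    """Fold values into one accumulator per key, one value at a time."""
    acc: Dict[str, int] = {}
    for key, value in pairs:
        # Only the running result per key is kept, never the full value list.
        acc[key] = combine(acc[key], value) if key in acc else value
    return acc


pairs = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]

# Group first, then aggregate: the per-key lists exist in full before summing.
grouped_sums = {key: sum(values) for key, values in group_by_key(pairs).items()}

# Aggregate while scanning: memory per key stays constant.
reduced_sums = reduce_by_key(pairs, lambda x, y: x + y)
```

Both approaches give the same sums here; the difference is only in how much per-key state is held while computing them. This rewrite applies when the per-key computation can be expressed as an associative combine step.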

Re: Code generation for GPU

2015-09-07 Thread lonikar
Hi Reynold, Thanks for responding. I was waiting for this on the spark user group and my own email id since I had not posted this on spark dev. Just saw your reply. 1. I figured the various code generation classes have either an *apply* or an *eval* method depending on whether it computes something or

Fast Iteration while developing

2015-09-07 Thread Justin Uang
Hi, What is the normal workflow for the core devs? - Do we need to build the assembly jar to be able to run it from the spark repo? - Do you use sbt or maven to do the build? - Is zinc only usable for maven? I'm asking because the current process I have right now is to do sbt build, which means

Re: Fast Iteration while developing

2015-09-07 Thread Reynold Xin
I usually write a test case for what I want to test, and then run sbt/sbt "~module/test:test-only *MyTestSuite" On Mon, Sep 7, 2015 at 6:02 PM, Justin Uang wrote: > Hi, > > What is the normal workflow for the core devs? > > - Do we need to build the assembly jar to be able to run it from the

Re: groupByKey() and keys with many values

2015-09-07 Thread kaklakariada
Hi Antonio! Thank you very much for your answer! You are right in that in my case the computation could be replaced by a reduceByKey. The thing is that my computation also involves database queries: 1. Fetch key-specific data from database into memory. This is expensive and I only want to do this