Re: Java 8 vs Scala

2015-07-14 Thread Tristan Blakers
We have had excellent results operating on RDDs using Java 8 with Lambdas. It’s slightly more verbose than Scala, but I haven’t found this an issue, and haven’t missed any functionality. The new DataFrame API makes the Spark platform even more language agnostic. Tristan On 15 July 2015 at 06:40,
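The lambda style being compared here can be illustrated without a Spark cluster. Below is a minimal sketch using plain `java.util.stream` as a stand-in for an RDD pipeline — class and method names are mine, not from the thread — to show the "slightly more verbose than Scala" syntax in question:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LambdaDemo {
    // Roughly the shape of a Spark filter/map chain, written with Java 8 lambdas.
    // In Scala the same pipeline drops the types and braces, but little else.
    public static List<Integer> squareEvens(List<Integer> nums) {
        return nums.stream()
                   .filter(n -> n % 2 == 0)   // predicate as a lambda
                   .map(n -> n * n)           // transformation as a lambda
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squareEvens(Arrays.asList(1, 2, 3, 4))); // prints [4, 16]
    }
}
```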

Re: JavaPairRDD

2015-05-13 Thread Tristan Blakers
You could use a map() operation, but the easiest way is probably to just call values() method on the JavaPairRDD to get a JavaRDD. See this link: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html Tristan On 13 May 2015 at 23:12, Yasemin Kaya wrote: > Hi,
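The two routes mentioned — an explicit `map()` versus the dedicated `values()` call — can be sketched with plain collections so the example runs standalone (in Spark itself the one-liner is simply `pairRDD.values()`; the pair type and names below are mine):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ValuesDemo {
    // The map() route from the thread, modelled on key-value pairs:
    // project each pair down to its value, discarding the key.
    public static List<String> viaMap(List<Map.Entry<Integer, String>> pairs) {
        return pairs.stream()
                    .map(Map.Entry::getValue)   // equivalent to JavaPairRDD.values()
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> pairs = Arrays.asList(
                new SimpleEntry<>(1, "a"), new SimpleEntry<>(2, "b"));
        System.out.println(viaMap(pairs)); // prints [a, b]
    }
}
```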

Re: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-06 Thread Tristan Blakers
ahab wrote: > >> Thanks Tristan for sharing this. Actually this happens when I am reading >> a csv file of 3.5 GB. >> >> best, >> /Shahab >> >> >> >> On Tue, May 5, 2015 at 9:15 AM, Tristan Blakers >> wrote: >> >>> Hi S

Re: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-05 Thread Tristan Blakers
Hi Shahab, I’ve seen exceptions very similar to this (it also manifests as negative array size exception), and I believe it’s really a bug in Kryo. See this thread: http://mail-archives.us.apache.org/mod_mbox/spark-user/201502.mbox/%3ccag02ijuw3oqbi2t8acb5nlrvxso2xmas1qrqd_4fq1tgvvj...@mail.gmail

Re: NegativeArraySizeException when doing joins on skewed data

2015-02-26 Thread Tristan Blakers
iced from browsing the code > is that even when writing to a stream, kryo has an internal buffer of > limited size, which it periodically flushes. Perhaps we can get kryo to > turn off that buffer, or we can at least get it to flush more often.) > > thanks, > Imran > > > On

Re: NegativeArraySizeException when doing joins on skewed data

2015-02-25 Thread Tristan Blakers
I get the same exception simply by doing a large broadcast of about 6GB. Note that I’m broadcasting a small number (~3m) of fat objects. There’s plenty of free RAM. This and related kryo exceptions seem to crop up whenever an object graph of more than a couple of GB gets passed around. at

Kryo buffer overflows

2015-01-28 Thread Tristan Blakers
A search shows several historical threads for similar Kryo issues, but none seem to have a definitive solution. Currently using Spark 1.2.0. While collecting/broadcasting/grouping moderately sized data sets (~500MB - 1GB), I regularly see exceptions such as the one below. I’ve tried increasing th
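The usual first step for these buffer overflows is raising the Kryo buffer limits in the Spark configuration. A hedged sketch for the Spark 1.2 line mentioned above (values are illustrative, not recommendations; note that later Spark releases renamed these properties to `spark.kryoserializer.buffer` / `spark.kryoserializer.buffer.max` with size suffixes):

```
# spark-defaults.conf (Spark 1.2-era property names, sizes in MB)
spark.serializer                      org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.mb        64
spark.kryoserializer.buffer.max.mb    512
```

As the thread notes, raising the limit only postpones the failure when a single serialized object graph genuinely exceeds what Kryo can buffer.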

Re: Incorrect results when calling collect() ?

2014-12-18 Thread Tristan Blakers
ally?) > > On Thu, Dec 18, 2014 at 10:42 AM, Tristan Blakers > wrote: > > Suspected the same thing, but because the underlying data classes are > > deserialised by Avro I think they have to be mutable as you need to > provide > > the no-args constructor with settable fie

Re: Incorrect results when calling collect() ?

2014-12-18 Thread Tristan Blakers
at 21:25, Sean Owen wrote: > > It sounds a lot like your values are mutable classes and you are > mutating or reusing them somewhere? It might work until you actually > try to materialize them all and find many point to the same object. > > On Thu, Dec 18, 2014 at 10:06 A
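Sean’s diagnosis — mutable, reused value objects that only break when you materialize them all — is easy to reproduce outside Spark. A minimal stdlib sketch (class and field names are mine) of how reusing one mutable record, as Avro/Hadoop record readers commonly do, makes every collected element point at the same object:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // A mutable holder, analogous to an Avro record with settable fields.
    static class Holder { int value; }

    // Simulates a reader that reuses a single record instance per element.
    public static List<Holder> collectReused(int[] input) {
        List<Holder> out = new ArrayList<>();
        Holder reused = new Holder();   // one shared instance
        for (int v : input) {
            reused.value = v;           // mutated in place on each iteration
            out.add(reused);            // every list entry is the SAME object
        }
        return out;
    }

    public static void main(String[] args) {
        // All three entries now report the last value written, not 1, 2, 3.
        for (Holder h : collectReused(new int[]{1, 2, 3})) {
            System.out.println(h.value); // prints 3 three times
        }
    }
}
```

A `foreach` that runs element-by-element can still print the expected values, which is why the bug surfaces only on `collect()` — matching the symptom reported in this thread.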

Incorrect results when calling collect() ?

2014-12-18 Thread Tristan Blakers
Hi, I’m getting some seemingly invalid results when I collect an RDD. This is happening in both Spark 1.1.0 and 1.2.0, using Java 8 on Mac. See the following code snippet: JavaRDD rdd = pairRDD.values(); rdd.foreach( e -> System.out.println ( "RDD Foreach: " + e ) ); rdd.collect().forEach( e -> Sy