Re: unit testing with spark

2014-02-19 Thread Ameet Kini
At 10:57 AM, Ameet Kini wrote: > Thanks, that really helps. So that helps me cache the Spark context within a suite but not across suites. The closest I could find to caching across suites is extending Suites [1] and adding @DoNotDiscover annotations to the nested suites
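
A rough illustration of that Suites/@DoNotDiscover arrangement — a sketch only, assuming ScalaTest 2.x and a pre-1.0 SparkContext; every class and suite name below is made up for the example:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD functions (lookup) in pre-1.3 Spark
    import org.scalatest.{BeforeAndAfterAll, DoNotDiscover, FunSpec, Suites}

    // Excluded from discovery; only runs via the enclosing Suites instance,
    // so all nested suites share the one SparkContext passed in.
    @DoNotDiscover
    class RddOpsSpec(sc: SparkContext) extends FunSpec {
      describe("map") {
        it("doubles every element") {
          assert(sc.parallelize(1 to 3).map(_ * 2).collect().toSeq === Seq(2, 4, 6))
        }
      }
    }

    @DoNotDiscover
    class LookupSpec(sc: SparkContext) extends FunSpec {
      describe("lookup") {
        it("finds values by key") {
          assert(sc.parallelize(Seq((1, "a"), (2, "b"))).lookup(2) === Seq("b"))
        }
      }
    }

    // One context for the whole run, stopped after every nested suite finishes.
    class AllSparkSuites extends Suites(
      new RddOpsSpec(SharedContext.sc),
      new LookupSpec(SharedContext.sc)) with BeforeAndAfterAll {
      override def afterAll() { SharedContext.sc.stop() }
    }

    object SharedContext {
      lazy val sc = new SparkContext("local", "shared-test-context")
    }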

Re: unit testing with spark

2014-02-19 Thread Ameet Kini
https://github.com/apache/incubator-spark/blob/master/core/src/test/scala/org/apache/spark/SharedSparkContext.scala?source=cc > /Heiko > On 18 Feb 2014, at 22:36, Ameet Kini wrote: > I'm writing unit tests with Spark and need some help.

unit testing with spark

2014-02-18 Thread Ameet Kini
I'm writing unit tests with Spark and need some help. I've already read this helpful article: http://blog.quantifind.com/posts/spark-unit-test/ There are a couple of differences between my testing environment and the blog's. 1. I'm using FunSpec instead of FunSuite. So my tests look like class MyTestSp
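
A minimal sketch of the FunSpec variant of the pattern from the Quantifind post: one local SparkContext per suite, created in beforeAll and stopped in afterAll. This assumes ScalaTest's FunSpec and a pre-SparkConf context; the suite and test names are illustrative.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD functions in pre-1.3 Spark
    import org.scalatest.{BeforeAndAfterAll, FunSpec}

    class MyTestSpec extends FunSpec with BeforeAndAfterAll {

      @transient private var sc: SparkContext = _

      override def beforeAll() {
        sc = new SparkContext("local", "MyTestSpec")   // one context, reused by all tests below
      }

      override def afterAll() {
        sc.stop()
        // Clearing the driver port helps when suites create contexts back to back.
        System.clearProperty("spark.driver.port")
      }

      describe("word counting") {
        it("counts repeated words") {
          val counts = sc.parallelize(Seq("a", "b", "a"))
            .map(w => (w, 1))
            .reduceByKey(_ + _)
            .collectAsMap()
          assert(counts === Map("a" -> 2, "b" -> 1))
        }
      }
    }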

ghost executor messing up UI's stdout/stderr links

2013-12-30 Thread Ameet Kini
I refreshed my Spark version to the master branch as of this morning, and am noticing some strange behavior with executors and the UI reading executor logs while running a job in what used to be standalone mode (is it now called coarse-grained scheduler mode, or is it still standalone mode?). For start

Re: debugging NotSerializableException while using Kryo

2013-12-24 Thread Ameet Kini
should be supported uniformly regardless of where it serializes, but that's the state of things as it stands. > On Mon, Dec 23, 2013 at 8:21 AM, Ameet Kini wrote: >> Thanks Imran. I tried setting "spark.closure.serializer" to

Re: debugging NotSerializableException while using Kryo

2013-12-24 Thread Ameet Kini
Looking at the code Executor.scala, line 195, you will at least know what caused the NPE. We can start from there. > On Dec 23, 2013, at 10:21 AM, Ameet Kini wrote: > Thanks Imran. I tried setting "spark.closure.serializer" to "org.apac

Re: debugging NotSerializableException while using Kryo

2013-12-23 Thread Ameet Kini
Maybe try to make your class Serializable... > 2013/12/23 Ameet Kini >> Thanks Imran. I tried setting "spark.closure.serializer" to "org.apache.spark.serializer.KryoSerializer" and now end up seeing NullPoint

Re: debugging NotSerializableException while using Kryo

2013-12-23 Thread Ameet Kini
that is used to serialize whatever is used by all the functions on an RDD, e.g., map, filter, and lookup. Those closures include referenced variables, like your TileIdWritable. So you need to either change that to use Kryo, or make your object serializable to Java
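
Acting on that advice, one option is to leave TileIdWritable a Writable for sequence-file I/O but also mark it java.io.Serializable so the default Java closure serializer can handle it. A sketch only; the single long field here is invented for illustration, not the real layout of the class:

    import java.io.{DataInput, DataOutput}
    import org.apache.hadoop.io.Writable

    // Writable for Hadoop I/O, Serializable so it can also travel inside
    // closures that Spark serializes with plain Java serialization.
    class TileIdWritable(var id: Long) extends Writable with Serializable {
      def this() = this(0L)                    // Hadoop requires a no-arg constructor
      def write(out: DataOutput) { out.writeLong(id) }
      def readFields(in: DataInput) { id = in.readLong() }
      override def equals(other: Any) = other match {
        case t: TileIdWritable => t.id == id
        case _                 => false
      }
      override def hashCode = id.##
    }

    object TileIdWritable {
      def apply(id: Long) = new TileIdWritable(id)
    }

Sane equals/hashCode also matter here, since lookup(TileIdWritable(200)) compares keys by equality.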

debugging NotSerializableException while using Kryo

2013-12-20 Thread Ameet Kini
I'm getting the below NotSerializableException despite using Kryo to serialize that class (TileIdWritable). The offending line: awtestRdd.lookup(TileIdWritable(200)) Initially I thought Kryo was not being registered properly, so I tried running operations on awtestRdd that force a shuffle (e.g.
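
For comparison, the Kryo wiring I would expect in this era of Spark (system properties set before the context is created; the registrator and app names are illustrative, and TileIdWritable is assumed to be the class from this thread, already on the classpath):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkContext
    import org.apache.spark.serializer.KryoRegistrator

    // Registers the custom key type with Kryo for data (shuffle/cache) serialization.
    class MyKryoRegistrator extends KryoRegistrator {
      def registerClasses(kryo: Kryo) {
        kryo.register(classOf[TileIdWritable])
      }
    }

    object KryoApp {
      def main(args: Array[String]) {
        // Kryo covers data serialization; closures still go through the closure
        // serializer (Java by default), which is why lookup() can complain about
        // a non-serializable key even with Kryo configured.
        System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        System.setProperty("spark.kryo.registrator", "MyKryoRegistrator")
        val sc = new SparkContext("local", "kryo-app")
        // ... build awtestRdd and call lookup(TileIdWritable(200)) here ...
        sc.stop()
      }
    }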

Re: examples of map-side join of two hadoop sequence files

2013-10-23 Thread Ameet Kini
Thanks for your suggestions. Ameet On Mon, Oct 21, 2013 at 2:12 PM, Reynold Xin wrote: > Maybe you can override HadoopRDD's compute method to do that? > On Mon, Oct 21, 2013 at 8:16 AM, Ameet Kini wrote: >> Right, except both my sequence files are large and so

Re: examples of map-side join of two hadoop sequence files

2013-10-21 Thread Ameet Kini
ey)).map { row => join smallTable.get(row.joinKey) with row itself } } > On Fri, Oct 18, 2013 at 2:22 PM, Ameet Kini wrote: >> Forgot to add an important point. My sequence files are sorted (they're actually Hadoop map files). Since th
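
The suggestion quoted above, spelled out as a broadcast hash-join sketch; it only applies when one side fits in the driver's memory, and the paths and key/value types are made up for illustration:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD functions and WritableConverters

    object BroadcastJoinSketch {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "broadcast-join")

        val large = sc.sequenceFile[Long, String]("hdfs:///data/large")   // illustrative paths
        val small = sc.sequenceFile[Long, String]("hdfs:///data/small")

        // Ship the small side to every task and join in a map, avoiding any
        // shuffle of the large side.
        val smallMap = sc.broadcast(small.collectAsMap())
        val joined = large.flatMap { case (k, v) =>
          smallMap.value.get(k).map(w => (k, (v, w)))   // keep only keys present on both sides
        }

        joined.take(10).foreach(println)
        sc.stop()
      }
    }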

Re: examples of map-side join of two hadoop sequence files

2013-10-18 Thread Ameet Kini
Forgot to add an important point. My sequence files are sorted (they're actually Hadoop map files). Since they're sorted, it makes sense to do a partition-level fetch on the inner sequence file. Thanks, Ameet On Fri, Oct 18, 2013 at 5:20 PM, Ameet Kini wrote: > I

examples of map-side join of two hadoop sequence files

2013-10-18 Thread Ameet Kini
I've seen discussions where the suggestion is to do a map-side join, but haven't seen an example yet, and can certainly use one. I have two sequence files where the key is unique within each file, so the join is a one-to-one join, and can hence benefit from a map-side join. However both sequence fi
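
When both inputs really are too large to broadcast, a common fallback (not a true map-side join, but it leaves the join stage itself shuffle-free) is to hash-partition both pair RDDs with the same partitioner and then join the co-partitioned results. A sketch with illustrative paths and types:

    import org.apache.spark.{HashPartitioner, SparkContext}
    import org.apache.spark.SparkContext._   // pair-RDD functions and WritableConverters

    object PartitionedJoinSketch {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "partitioned-join")
        val part = new HashPartitioner(64)

        // Partition both sides identically; keys are unique within each file,
        // so the join is one-to-one.
        val left  = sc.sequenceFile[Long, String]("hdfs:///data/left").partitionBy(part)
        val right = sc.sequenceFile[Long, String]("hdfs:///data/right").partitionBy(part)

        // With the same partitioner on both sides, join() combines co-located
        // partitions without re-shuffling either input again.
        val joined = left.join(right)

        println(joined.count())
        sc.stop()
      }
    }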

Re: job reports as KILLED in standalone mode

2013-10-18 Thread Ameet Kini
KILLED sounds most reasonable for normal termination. I've gone ahead and created https://spark-project.atlassian.net/browse/SPARK-937 to fix this. > On Fri, Oct 18, 2013 at 7:56 AM, Ameet Kini wrote: >> Jey, I don't see a "close()"

Re: job reports as KILLED in standalone mode

2013-10-18 Thread Ameet Kini
that jobs get reported as KILLED even though they run through successfully. Ameet On Thu, Oct 17, 2013 at 5:59 PM, Jey Kottalam wrote: > You can try calling the "close()" method on your SparkContext, which should allow for a cleaner shutdown. > On Thu, Oct 17, 2013 at 2:3

job reports as KILLED in standalone mode

2013-10-17 Thread Ameet Kini
I'm using the Scala 2.10 branch of Spark in standalone mode, and am seeing the job report itself as KILLED in the UI, with the below message in each of the executor logs, even though the job processes correctly and returns the correct result. The job is triggered by a .count on an RDD, and the count
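
For what it's worth, the shutdown method on SparkContext is stop() rather than close(); calling it at the end of the driver lets the executors exit cleanly instead of being torn down with the driver JVM. A minimal sketch (the master URL is illustrative):

    import org.apache.spark.SparkContext

    object CountJob {
      def main(args: Array[String]) {
        val sc = new SparkContext("spark://master:7077", "count-job")
        try {
          val n = sc.parallelize(1 to 1000000).count()
          println("count = " + n)
        } finally {
          sc.stop()   // explicit, clean shutdown of the executors
        }
      }
    }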

executor memory in standalone mode stays at default 512MB

2013-10-17 Thread Ameet Kini
I'm using the Scala 2.10 branch of Spark in standalone mode, and am finding that the executor gets started with the default 512MB even after setting spark.executor.memory to 6G. This leads to my job getting an OOM. I've tried setting spark.executor.memory both programmatically (using System.setPrope
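
In the pre-SparkConf branches the property has to be set before the SparkContext is constructed, otherwise the executors are requested with the default size. A sketch (master URL illustrative):

    import org.apache.spark.SparkContext

    object MemoryConfiguredJob {
      def main(args: Array[String]) {
        // Must come before the context is created; setting it afterwards has no
        // effect on the executors launched for this application.
        System.setProperty("spark.executor.memory", "6g")

        val sc = new SparkContext("spark://master:7077", "memory-configured-job")
        // ... job body ...
        sc.stop()
      }
    }

The standalone worker also has to advertise at least that much memory (SPARK_WORKER_MEMORY in spark-env.sh), or a 6g executor will never be granted.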

Re: Saving compressed sequence files

2013-08-29 Thread Ameet Kini
lob/master/core/src/main/scala/spark/PairRDDFunctions.scala#L609 > You can take a look at how that is done. -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org > On Wed, Aug 28, 2013 at 6:56 AM, Ameet Kini wrote: >> Folks,

Re: Saving compressed sequence files

2013-08-28 Thread Ameet Kini
Folks, Still stuck on this, so would greatly appreciate any pointers as to how to force Spark to recognize the mapred.output.compression.type Hadoop parameter. Thanks, Ameet On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini wrote: > I'm trying to use saveAsSequenceFile to output

Saving compressed sequence files

2013-08-26 Thread Ameet Kini
I'm trying to use saveAsSequenceFile to output compressed sequence files where the "value" in each key/value pair is compressed. In Hadoop, I would set this job configuration parameter: "mapred.output.compression.type=RECORD" for record-level compression. Previous posts have suggested that this is
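
One way to sidestep the question of whether saveAsSequenceFile picks up that property: drop down to saveAsHadoopFile with an explicit JobConf and set the standard Hadoop mapred options on it directly. A sketch with illustrative key/value types and output path:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.io.SequenceFile.CompressionType
    import org.apache.hadoop.io.compress.DefaultCodec
    import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, SequenceFileOutputFormat}
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD functions in pre-1.3 Spark

    object CompressedSequenceFileSketch {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "compressed-seqfile")
        val rdd = sc.parallelize(1 to 100)
          .map(i => (new LongWritable(i.toLong), new Text("value-" + i)))

        val conf = new JobConf(sc.hadoopConfiguration)
        FileOutputFormat.setCompressOutput(conf, true)
        FileOutputFormat.setOutputCompressorClass(conf, classOf[DefaultCodec])
        // Equivalent to mapred.output.compression.type=RECORD: compress each value.
        SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.RECORD)

        rdd.saveAsHadoopFile("/tmp/compressed-seq",
          classOf[LongWritable], classOf[Text],
          classOf[SequenceFileOutputFormat[LongWritable, Text]], conf)

        sc.stop()
      }
    }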

Re: when should I copy object coming out of RDD

2013-08-12 Thread Ameet Kini
e serialization. In fact you might note that you can group and reduce any kind of object in Spark, not just subclasses of Writable. > Matei > On Aug 10, 2013, at 6:20 PM, Ameet Kini wrote: > *copy a Writable object if you expect to use the value after the next one

Re: when should I copy object coming out of RDD

2013-08-10 Thread Ameet Kini
d to allocate another Writable. So as another general rule, just converting the object from a Writable to a "normal" Java type if you want to keep it around longer is another way. Really it's take() and collect() that will be the most confusing. > Matei

when should I copy object coming out of RDD

2013-08-09 Thread Ameet Kini
When iterating over a HadoopRDD created using SparkContext.sequenceFile, I noticed that if I don't copy the key as below, every tuple in the RDD has the same value as the last one seen. Clearly the object is being recycled, so if I don't clone the object, I'm in trouble. Say if my sequence files h
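
The two usual workarounds, sketched with made-up IntWritable/Text types and an illustrative path: either convert to plain Scala types immediately after reading, or copy the Writables before holding on to them.

    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.spark.SparkContext

    object SequenceFileCopySketch {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "seqfile-copy")

        val raw = sc.sequenceFile("hdfs:///data/tiles",
                                  classOf[IntWritable], classOf[Text])

        // Option 1: convert to immutable types right away; safe to cache, sort, collect.
        val converted = raw.map { case (k, v) => (k.get, v.toString) }

        // Option 2: keep Writables but copy them, since Hadoop's RecordReader reuses
        // the same key/value objects for every record it hands back.
        val copied = raw.map { case (k, v) => (new IntWritable(k.get), new Text(v)) }

        println(converted.count() + " " + copied.count())
        sc.stop()
      }
    }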

Re: eclipse search references not working

2013-07-30 Thread Ameet Kini
2013 at 8:46 PM, Jason Dai wrote: > Yes, you can do that using the Scala IDE; it's not perfect though. Thanks, -Jason > On Wed, Jul 31, 2013 at 5:18 AM, Ameet Kini wrote: >> Have any Eclipse users been able to search for references (i.e., Refer

eclipse search references not working

2013-07-30 Thread Ameet Kini
Have any Eclipse users been able to search for references (i.e., References -> Project, or References -> Workspace) on Scala classes in their Spark project? I have a project Foo that depends on Spark. Both Foo and spark-core are Eclipse projects. Within Foo, I'm able to search for references for