At 10:57 AM, Ameet Kini wrote:
> Thanks, that really helps.
>
> So that helps me cache the spark context within a suite but not across
> suites. The closest I could find to caching across suites is extending
> Suites [1] and adding @DoNotDiscover annotations to the nested suites
>
ing:
>
>
> https://github.com/apache/incubator-spark/blob/master/core/src/test/scala/org/apache/spark/SharedSparkContext.scala?source=cc
>
> /Heiko
>
> On 18 Feb 2014, at 22:36, Ameet Kini wrote:
>
>
> I'm writing unit tests with Spark and need some help.
>
> I've
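A minimal sketch of the Suites + @DoNotDiscover pattern Ameet mentions above for sharing one SparkContext across several nested suites. The class names and the shared holder object are illustrative, not from the thread, and assume 2013-era ScalaTest 2.x and Spark APIs:

import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterAll, DoNotDiscover, FunSpec, Suites}

// Illustrative holder so the nested suites can reach the context that the
// enclosing Suites instance creates and tears down.
object SharedContext {
  var sc: SparkContext = _
}

@DoNotDiscover
class FirstSpec extends FunSpec {
  describe("the first group of tests") {
    it("can use the shared context") {
      assert(SharedContext.sc.parallelize(1 to 10).count() == 10)
    }
  }
}

@DoNotDiscover
class SecondSpec extends FunSpec {
  describe("the second group of tests") {
    it("reuses the same context") {
      assert(SharedContext.sc.makeRDD(Seq("a", "b")).count() == 2)
    }
  }
}

// Only this wrapper suite is discovered; it owns the context's lifecycle, so
// the nested suites share a single SparkContext.
class AllSparkSuites extends Suites(new FirstSpec, new SecondSpec) with BeforeAndAfterAll {
  override def beforeAll() { SharedContext.sc = new SparkContext("local", "shared-suites") }
  override def afterAll() {
    SharedContext.sc.stop()
    System.clearProperty("spark.driver.port")
  }
}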
I'm writing unit tests with Spark and need some help.
I've already read this helpful article:
http://blog.quantifind.com/posts/spark-unit-test/
There are a couple of differences between my testing environment and the blog's setup.
1. I'm using FunSpec instead of FunSuite. So my tests look like
class MyTestSp
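For the within-a-suite case that the SharedSparkContext link above covers for FunSuite, a FunSpec analogue might look roughly like this; a sketch only, with the base class and the test below invented for illustration:

import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterAll, FunSpec}

// One SparkContext per suite: created before the first test, stopped after
// the last, mirroring Spark's own SharedSparkContext test helper.
abstract class SharedContextSpec extends FunSpec with BeforeAndAfterAll {
  @transient private var _sc: SparkContext = _
  def sc: SparkContext = _sc

  override def beforeAll() {
    _sc = new SparkContext("local", getClass.getSimpleName)
    super.beforeAll()
  }

  override def afterAll() {
    _sc.stop()
    // Let the next suite bind a fresh driver, as SharedSparkContext does.
    System.clearProperty("spark.driver.port")
    super.afterAll()
  }
}

class MyTestSpec extends SharedContextSpec {
  describe("an RDD") {
    it("counts its elements") {
      assert(sc.parallelize(1 to 100).count() == 100)
    }
  }
}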
I refreshed my Spark version to the master branch as of this morning, and
am noticing some strange behavior with executors and with reading executor
logs from the UI while running a job in what used to be standalone mode (is it
now called coarse-grained scheduler mode, or is it still standalone mode?).
For start
; should be supported uniformly regardless of where it serializes, but that's
> the state of things as it stands.
>
>
>
> On Mon, Dec 23, 2013 at 8:21 AM, Ameet Kini wrote:
>
>> Thanks Imran.
>>
>> I tried setting "spark.closure.serializer" to
>
g at the code Executor.scala
> line 195, you will at least know what caused the NPE.
> We can start from there.
>
>
>
>
> On Dec 23, 2013, at 10:21 AM, Ameet Kini wrote:
>
> Thanks Imran.
>
> I tried setting "spark.closure.serializer" to
> "org.apac
Maybe try to make your class implement Serializable...
>
>
> 2013/12/23 Ameet Kini
>
>> Thanks Imran.
>>
>> I tried setting "spark.closure.serializer" to
>> "org.apache.spark.serializer.KryoSerializer" and now end up seeing
>> NullPoint
tml)
>
> that is used to serialize whatever is used by all the functions on an RDD,
> e.g., map, filter, and lookup. Those closures include referenced variables,
> like your
> TileIdWritable.
>
> So you need to either change that to use Kryo, or make your object
> serializable to j
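A sketch of the two routes described above, using 0.8-era system-property configuration. The TileIdWritable stand-in and the class names here are illustrative only, not the thread's actual code:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Stand-in for the thread's TileIdWritable (the real class wraps a tile id
// and implements Hadoop's Writable); defined here only so the sketch compiles.
class TileIdWritable(var tileId: Long = 0L) extends java.io.Serializable

// Route 1: register the class with Kryo and point Spark at the registrator.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[TileIdWritable])
  }
}

object SerializationSetup {
  // Properties must be set before the SparkContext is constructed.
  def configure() {
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "MyKryoRegistrator")
    // Closures go through a separate serializer; if it stays on Java
    // serialization (the default), anything referenced in a closure (such as
    // a TileIdWritable passed to lookup()) must be java.io.Serializable,
    // which is Route 2: extend Serializable, as the stand-in above does.
    // System.setProperty("spark.closure.serializer",
    //   "org.apache.spark.serializer.KryoSerializer")
  }
}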
I'm getting the below NotSerializableException despite using Kryo to
serialize that class (TileIdWritable).
The offending line: awtestRdd.lookup(TileIdWritable(200))
Initially I thought Kryo was not being registered properly, so I tried
running operations over awtestRDD that force a shuffle (e.g.
your suggestions.
Ameet
On Mon, Oct 21, 2013 at 2:12 PM, Reynold Xin wrote:
> Maybe you can override HadoopRDD's compute method to do that?
>
>
> On Mon, Oct 21, 2013 at 8:16 AM, Ameet Kini wrote:
>
>> Right, except both my sequence files are large and so
ey)).map { row =>
> join smallTable.get(row.joinKey) with row itself
> }
> }
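Filling out the truncated sketch above: one way to do the broadcast-style map-side join, with dataset names and types invented for illustration (this is not the thread's actual code):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object MapSideJoinSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "map-side-join")

    // Illustrative stand-ins for the two keyed datasets in the thread.
    val small = sc.parallelize(Seq(1L -> "a", 2L -> "b"))
    val large = sc.parallelize(Seq(1L -> "x", 2L -> "y", 3L -> "z"))

    // Ship the small side to every executor once, then join on the map side,
    // so the large side is never shuffled.
    val smallMap = sc.broadcast(small.collectAsMap())
    val joined = large.flatMap { case (key, largeValue) =>
      smallMap.value.get(key).map(smallValue => (key, (smallValue, largeValue)))
    }

    joined.collect().foreach(println)
    sc.stop()
  }
}

As Ameet notes above, this breaks down when neither side fits comfortably in memory.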
>
>
>
>
> On Fri, Oct 18, 2013 at 2:22 PM, Ameet Kini wrote:
>
>> Forgot to add an important point. My sequence files are sorted (they're
>> actually Hadoop map files). Since th
Forgot to add an important point. My sequence files are sorted (they're
actually Hadoop map files). Since they're sorted, it makes sense to do a
fetch at the partition level of the inner sequence file.
Thanks,
Ameet
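One hedged illustration of that partition-level fetch, assuming the inner dataset is a Hadoop MapFile and the outer side is converted to plain types first; the paths, key/value types, and object names are invented for the sketch:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, MapFile, Text}
import org.apache.spark.SparkContext

object MapFileLookupJoinSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "mapfile-lookup-join")
    val innerDir = "/data/inner-mapfile"   // illustrative MapFile directory

    // Outer side, converted away from reused Writables right away.
    val outer = sc.sequenceFile("/data/outer", classOf[LongWritable], classOf[Text])
                  .map { case (k, v) => (k.get, v.toString) }

    // For each partition of the outer RDD, open one reader on the sorted
    // inner MapFile and look keys up directly instead of shuffling anything.
    val joined = outer.mapPartitions { iter =>
      val conf = new Configuration()
      val fs = new Path(innerDir).getFileSystem(conf)
      val reader = new MapFile.Reader(fs, innerDir, conf)
      val key = new LongWritable()
      val value = new Text()
      iter.flatMap { case (k, outerValue) =>
        key.set(k)
        if (reader.get(key, value) != null) Some((k, (outerValue, value.toString)))
        else None
      }
      // In a real job the reader should be closed once the iterator is drained.
    }

    joined.collect().foreach(println)
    sc.stop()
  }
}

Whether this beats an ordinary join depends on how selective the lookups are; the broadcast approach above is simpler whenever one side is small.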
On Fri, Oct 18, 2013 at 5:20 PM, Ameet Kini wrote:
>
> I
I've seen discussions where the suggestion is to do a map-side join, but
haven't seen an example yet, and can certainly use one. I have two sequence
files where the key is unique within each file, so the join is a one-to-one
join, and can hence benefit from a map-side join. However both sequence
fi
ILLED sounds most reasonable for normal
> termination.
>
> I've gone ahead and created
> https://spark-project.atlassian.net/browse/SPARK-937 to fix this.
>
>
> On Fri, Oct 18, 2013 at 7:56 AM, Ameet Kini wrote:
>
>> Jey,
>>
>> I don't see a "close()"
at jobs get reported as
KILLED even though they run through successfully.
Ameet
On Thu, Oct 17, 2013 at 5:59 PM, Jey Kottalam wrote:
> You can try calling the "close()" method on your SparkContext, which
> should allow for a cleaner shutdown.
>
> On Thu, Oct 17, 2013 at 2:3
I'm using the Scala 2.10 branch of Spark in standalone mode, and am
seeing the job report itself as KILLED in the UI, with the below
message in each of the executor logs, even though the job processes
correctly and returns the correct result. The job is triggered by a
.count on an RDD and the count
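For what it's worth, the shutdown method on SparkContext in this branch is stop() rather than close(); a minimal driver that ends explicitly might look like this, with the master URL and input path invented for illustration:

import org.apache.spark.SparkContext

object CleanShutdownSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "count-job")
    val n = sc.textFile("/data/input").count()
    println("count = " + n)
    // Stop the context explicitly so the executors shut down cleanly rather
    // than being torn down when the driver JVM exits, which is one way they
    // end up labeled KILLED in the UI.
    sc.stop()
  }
}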
I'm using the Scala 2.10 branch of Spark in standalone mode, and am finding
that the executor gets started with the default 512M even after setting
spark.executor.memory to 6G. This leads to my job getting an OOM. I've
tried setting spark.executor.memory both programmatically (using
System.setPrope
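In the 0.8-era configuration model the property has to be set before the SparkContext is constructed; setting it afterwards is silently ignored. A minimal sketch, with the master URL invented for illustration:

import org.apache.spark.SparkContext

object ExecutorMemorySketch {
  def main(args: Array[String]) {
    // Must run before the SparkContext is created, or it has no effect.
    System.setProperty("spark.executor.memory", "6g")
    // The standalone worker must also offer at least this much
    // (SPARK_WORKER_MEMORY in spark-env.sh), or the app will not get it.
    val sc = new SparkContext("spark://master:7077", "big-job")
    println(sc.parallelize(1 to 1000000).count())
    sc.stop()
  }
}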
lob/master/core/src/main/scala/spark/PairRDDFunctions.scala#L609
>
> You can take a look at how that is done.
>
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Wed, Aug 28, 2013 at 6:56 AM, Ameet Kini wrote:
>
>> Folks,
>>
>>
Folks,
Still stuck on this, so I would greatly appreciate any pointers on how to
force Spark to recognize the mapred.output.compression.type Hadoop
parameter.
Thanks,
Ameet
On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini wrote:
>
> I'm trying to use saveAsSequenceFile to output
I'm trying to use saveAsSequenceFile to output compressed sequence files
where the "value" in each key/value pair is compressed. In Hadoop, I would
set this job configuration parameter:
"mapred.output.compression.type=RECORD" for record-level compression.
Previous posts have suggested that this is
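One workaround, sketched under the assumption that dropping down to saveAsHadoopFile with an explicit JobConf is acceptable; the output path and key/value types are invented for illustration:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.hadoop.mapred.{JobConf, SequenceFileOutputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object CompressedSequenceFileSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "compressed-output")
    val data = sc.parallelize(1 to 100)
                 .map(i => (new LongWritable(i.toLong), new Text("value-" + i)))

    // Hand the compression settings straight to the Hadoop output format.
    val conf = new JobConf()
    conf.setBoolean("mapred.output.compress", true)
    conf.set("mapred.output.compression.type", "RECORD")
    conf.set("mapred.output.compression.codec", classOf[DefaultCodec].getName)

    data.saveAsHadoopFile(
      "/data/compressed-seq",
      classOf[LongWritable], classOf[Text],
      classOf[SequenceFileOutputFormat[LongWritable, Text]],
      conf)

    sc.stop()
  }
}

At least in the version discussed here, saveAsSequenceFile does not appear to expose the compression type directly, which is why the sketch goes through a JobConf.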
e serialization. In fact you might note that you can group
> and reduce any kind of object in Spark, not just subclasses of Writable.
>
> Matei
>
> On Aug 10, 2013, at 6:20 PM, Ameet Kini wrote:
>
>
> *copy a Writable object if you expect to use the value after the next one
d to
> allocate another Writable. So as another general rule, just converting the
> object from a Writable to a "normal" Java type if you want to keep it
> around longer is another way. Really it's take() and collect() that will be
> the most confusing.
>
> Matei
>
When iterating over a HadoopRDD created using SparkContext.sequenceFile, I
noticed that if I don't copy the key as below, every tuple in the RDD has
the same value as the last one seen. Clearly the object is being recycled,
so if I don't clone the object, I'm in trouble.
Say if my sequence files h
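A sketch of the conversion Matei recommends above: turn the reused Writables into plain values as soon as the RDD is created (the key/value types and path here are illustrative):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext

object WritableCopySketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "writable-copy")

    // Hadoop's record reader reuses a single key object and a single value
    // object, so convert (or clone) immediately; after this map each tuple
    // owns its own data.
    val rdd = sc.sequenceFile("/data/input", classOf[LongWritable], classOf[Text])
                .map { case (k, v) => (k.get, v.toString) }

    // Safe to take()/collect() now; without the conversion these calls hand
    // back many references to the same recycled Writable instances.
    rdd.take(5).foreach(println)
    sc.stop()
  }
}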
3 at 8:46 PM, Jason Dai wrote:
> Yes, you can do that using the Scala IDE; it's not perfect though.
>
> Thanks,
> -Jason
>
>
> On Wed, Jul 31, 2013 at 5:18 AM, Ameet Kini wrote:
>>
>> Have any Eclipse users been able to search for references (i.e.,
>> Refer
Have any Eclipse users been able to search for references (i.e.,
References -> Project, or References -> Workspace) on Scala classes in
their Spark project?
I have a project Foo that depends on Spark. Both Foo and spark-core
are Eclipse projects. Within Foo, I'm able to search for references
for