Not sure exactly how you use it. My understanding is that in Spark it is
better to keep the driver's overhead as low as possible. Is it possible to
broadcast the trie to the executors, do the computation there, and then
aggregate the counters in the reduce phase?
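Something along these lines, perhaps (an untested sketch only — buildTrie and countMatches are placeholders for your own trie code, not real APIs):

```scala
// Sketch: build the trie once on the driver, broadcast it, match on the
// executors, and aggregate the counts in the reduce phase.
// buildTrie and countMatches are hypothetical stand-ins for your trie code.
val trie = buildTrie(patterns)              // built once, driver-side
val trieBc = sc.broadcast(trie)             // shipped to each executor once

val counts = textFile
  .flatMap(line => countMatches(trieBc.value, line)) // Seq[(String, Int)] per line
  .reduceByKey(_ + _)                       // aggregation stays on the executors

counts.saveAsTextFile("matches")            // or collect() only if the result is small
```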
Thanks.
Zhan Zhang
On Aug 18, 201
Hi Zhan,
Thanks for looking into this. I'm actually using the hash map as an example
of the simplest snippet of code that fails for me. I know that this is
just word count. In my actual problem I'm using a Trie data structure
to find substring matches.
On Sun, Aug 17, 2014 at 11:35 PM, Z
Is it because countByValue or toArray puts too much stress on the driver
when there are many unique words?
To me this is a typical word count problem, which you can solve as follows
(correct me if I am wrong):
val textFile = sc.textFile("file")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
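On the driver-pressure question: countByValue (like toArray/collect) pulls the entire word -> count map back to the driver, whereas finishing the job with reduceByKey keeps the aggregation distributed. A rough, untested sketch of the contrast (continuing from the textFile above):

```scala
// Driver-heavy: the full word -> count map is returned to the driver,
// which is where a very large number of unique words can hurt.
val words = textFile.flatMap(line => line.split(" "))
val onDriver: Map[String, Long] = words.countByValue()

// Distributed: the counts stay in an RDD on the executors; write them out
// (or take() a small sample) instead of collecting everything.
val distributed = words.map(word => (word, 1L)).reduceByKey(_ + _)
distributed.saveAsTextFile("counts")
```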
Did you verify the driver memory in the Executors tab of the WebUI? I
think you need `--driver-memory 8g` with spark-shell or spark-submit
instead of setting it in spark-defaults.conf.
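On the command line that would look something like this (the 8g value, class name, and jar path are examples, not your actual job):

```shell
# The driver JVM's heap is fixed when the driver starts, so in client mode
# pass --driver-memory on the command line rather than setting it from code.
spark-shell --driver-memory 8g

# Or for a packaged job (MyJob and my-job.jar are placeholders):
spark-submit --driver-memory 8g --class MyJob my-job.jar
```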
On Fri, Aug 15, 2014 at 12:41 PM, jerryye wrote:
Setting spark.driver.memory has no effect. It's still hanging trying to
compute result.count when I'm sampling more than 35%, regardless of what
value of spark.driver.memory I set.
Here are my settings:
export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
export SPARK_MEM=10g
in conf
Did you set the driver memory? You can confirm it in the Executors tab of
the WebUI. Btw, the code may only work in local mode. In cluster
mode, counts will be serialized to remote workers, and the result is
not fetched by the driver after foreach. You can use RDD.countByValue
instead. -Xiangrui
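The pitfall described above, sketched out (untested; `words` stands in for your RDD and `counts` is a driver-side mutable map):

```scala
import scala.collection.mutable

// Broken in cluster mode: the foreach closure captures a *copy* of `counts`,
// each executor increments its own copy, and the driver's map stays empty.
val counts = mutable.HashMap.empty[String, Int].withDefaultValue(0)
words.foreach(word => counts(word) += 1)   // updates never reach the driver

// Works in both local and cluster mode: let Spark aggregate, then fetch.
val byValue: Map[String, Long] = words.countByValue()
```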
On F
Hi All,
I'm not sure if I should file a JIRA or if I'm missing something obvious,
since the test code I'm trying is so simple. I've isolated the problem I'm
seeing to a memory issue, but I don't know which parameter I need to tweak;
it does seem related to spark.akka.frameSize. If I sample my RDD with