Thanks for the response, Aaron!  We'll give it a try tomorrow.

On Tue, May 20, 2014 at 12:13 AM, Aaron Davidson [via Apache Spark User
List] <ml-node+s1001560n6073...@n3.nabble.com> wrote:

> This is very likely because the serialized map output locations buffer
> exceeds the Akka frame size. Please try setting "spark.akka.frameSize"
> (default 10 MB) to some higher number, like 64 or 128.
>
> In the newest version of Spark, this would throw a better error, for what
> it's worth.
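A minimal sketch of raising this setting at application startup, assuming Spark 0.9's SparkConf API (the app name and master here are placeholders for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Raise the Akka frame size from the 10 MB default to 64 MB so that a
// large serialized map-output-locations buffer fits in a single message.
val conf = new SparkConf()
  .setMaster("spark://master:7077") // placeholder standalone master URL
  .setAppName("frame-size-example")
  .set("spark.akka.frameSize", "64") // value is interpreted in MB

val sc = new SparkContext(conf)
```

The same property can alternatively be passed as a JVM system property (`-Dspark.akka.frameSize=64`) if the application code cannot be changed.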
>
>
>
> On Mon, May 19, 2014 at 8:39 PM, jonathan.keebler <[hidden email]> wrote:
>
>> Has anyone observed Spark worker threads stalling during a shuffle phase,
>> with the following message (one per worker host) being echoed to the
>> terminal on the driver thread?
>>
>> INFO spark.MapOutputTrackerActor: Asked to send map output locations for
>> shuffle 0 to [worker host]...
>>
>>
>> At this point, Spark-related activity on the Hadoop cluster completely
>> halts: there is no network activity, disk I/O, or CPU activity, individual
>> tasks are not completing, and the job just sits in this state. At that
>> point we kill the job, and a restart of the Spark server service is
>> required.
>>
>> Using identical jobs, we were able to bypass this stall by increasing the
>> heap memory available to the workers, but it's odd that we don't get an
>> out-of-memory error, or any error at all. Upping the available memory
>> isn't a very satisfying answer to what may be going on :)
>>
>> We're running Spark 0.9.0 on CDH 5.0 in standalone mode.
>>
>> Thanks for any help or ideas you may have!
>>
>> Cheers,
>> Jonathan
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-stalling-during-shuffle-maybe-a-memory-issue-tp6067.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>
>




