Thanks Krishna,
I use a small cluster; each compute node has 16 GB of RAM and eight
2.66 GHz CPU cores.

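For anyone following along, here is roughly how the memory settings and
file-splitting ideas Krishna suggests can be tried with spark-submit. This is
only a sketch: the master URL, input path, and heap sizes are placeholders for
my setup, and the flags are from Spark 1.0's spark-submit (adjust for your
version).

```shell
# Sketch: raising JVM heap sizes via spark-submit (Spark 1.0-era flags).
# Heap values are illustrative for 16 GB nodes -- leave headroom for the OS.
# master URL, jar path, and input path below are placeholders.
spark-submit \
  --class org.apache.spark.examples.SparkPageRank \
  --master spark://master:7077 \
  --driver-memory 8g \
  --executor-memory 12g \
  spark-examples.jar hdfs:///data/pagerank_input.txt 10

# Sketch: splitting the input to test with progressively larger sizes,
# per the suggestion to increase the data size step by step.
# split(1) with -n l/N (line-balanced chunks) is GNU coreutils.
split -n l/10 pagerank_input.txt chunk_
```

While the job runs, memory usage per executor can be watched from the driver's
web UI (port 4040 by default) to confirm the settings actually reached the
workers.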
On Sat, Jun 21, 2014 at 3:16 PM, Krishna Sankar [via Apache Spark User
List] <ml-node+s1001560n8077...@n3.nabble.com> wrote:

> Hi,
>
>    - I have seen similar behavior before. As far as I can tell, the root
>    cause is an out-of-memory error - I verified this by monitoring memory
>    usage.
>       - I had a 30 GB file and was running on a single machine with 16 GB,
>       so I knew it would fail.
>       - But instead of raising an exception, some part of the system
>       keeps on churning.
>    - My suggestion is to review the JVM memory settings (try bigger
>    values), make sure the settings are propagated to all the workers, and
>    monitor memory while the job is running.
>    - Another approach is to split the file and try progressively
>    increasing sizes.
>    - I also see symptoms of failed connections. While I can't positively
>    say that this is the problem, check your topology and network
>    connectivity.
>    - Out of curiosity, what kind of machines are you running? Bare metal?
>    EC2? How much memory? 64-bit OS?
>       - I assume these are big machines, so the resources themselves
>       might not be the problem.
>
> Cheers
> <k/>
>
>
> On Sat, Jun 21, 2014 at 12:55 PM, yxzhao <[hidden email]> wrote:
>
>> I ran the PageRank example on a large data set (5 GB) using 48 machines.
>> The job got stuck at 14/05/20 21:32:17, as the attached log shows. It
>> stayed stuck for more than 10 hours before I finally killed it, but I did
>> not find any information explaining why it was stuck. Any suggestions?
>> Thanks.
>>
>> Spark_OK_48_pagerank.log
>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n8075/Spark_OK_48_pagerank.log>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Processing-Large-Data-Stuck-tp8075.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>
>
