Hi,

   - I have seen similar behavior before. As far as I can tell, the root
   cause was an out-of-memory error - I verified this by monitoring memory
   usage while the job ran.
      - I had a 30 GB file and was running on a single machine with 16 GB
      of RAM, so I knew it would fail.
      - But instead of raising an exception, some part of the system just
      keeps churning.
   - My suggestion is to check the JVM memory settings (try larger values),
   make sure those settings are propagated to all the workers, and then
   monitor memory usage while the job is running - see the first sketch
   after this list.
   - Another angle is to split the input file and retry with progressively
   larger slices - see the second sketch below.
   - I also see symptoms of failed connections. I can't say for certain that
   this is the problem, but it is worth checking your topology and network
   connectivity.
   - Out of curiosity, what kind of machines are you running on? Bare metal?
   EC2? How much memory? 64-bit OS?
      - I assume these are big machines and so the resources themselves
      might not be a problem.
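
To make the memory suggestion concrete, here is a minimal sketch of how I
would raise the settings and check that they actually reach the workers.
The app name and the "12g" / "4g" values are placeholders, not something
taken from your setup:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical sketch: bump executor (and driver) memory, then confirm
    // on the Executors tab of the web UI that the workers picked it up.
    val conf = new SparkConf()
      .setAppName("pagerank-memory-test")     // placeholder name
      .set("spark.executor.memory", "12g")    // try progressively larger values
      .set("spark.driver.memory", "4g")       // the driver needs headroom too
    val sc = new SparkContext(conf)

    // While the job runs, watch the driver web UI (port 4040 by default)
    // and plain OS tools (top, free, jstat) on the worker nodes.

Keep in mind that driver memory only takes effect if it is set before the
driver JVM starts (e.g. on the spark-submit command line), so the line above
is mostly there as documentation of what to set.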

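For the splitting suggestion, instead of cutting the file by hand you can
sample the same input at increasing fractions and see where the job starts
to struggle. A rough sketch, assuming the SparkContext from above and a
placeholder input path:

    // Hypothetical sketch: run the same logic on progressively larger samples.
    val edges = sc.textFile("hdfs:///path/to/edges")   // placeholder path
    for (fraction <- Seq(0.1, 0.25, 0.5, 1.0)) {
      // sample(withReplacement, fraction, seed) without replacement
      val slice = edges.sample(false, fraction, 42)
      println(s"fraction=$fraction  lines=${slice.count()}")
      // ... run the PageRank logic on `slice` and watch memory at each step ...
    }
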
Cheers
<k/>


On Sat, Jun 21, 2014 at 12:55 PM, yxzhao <yxz...@ualr.edu> wrote:

> I ran the PageRank example on a large data set, 5 GB in size, using 48
> machines. The job got stuck at 14/05/20 21:32:17, as the attached log
> shows. It stayed stuck for more than 10 hours before I finally killed it,
> but I did not find any information explaining why. Any suggestions? Thanks.
>
> Spark_OK_48_pagerank.log
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n8075/Spark_OK_48_pagerank.log>
>
