The job gets stuck in the sense that it halts for some time and does no 
processing: CPU usage drops to 0%. After a while, processing resumes and CPU 
usage climbs again. This cycle repeats as the job progresses, until it completes.

But today, while running some other Spark jobs, it isn't happening: the job 
runs seamlessly on multiple cores without halting. It does, however, throw a 
"TooManyFileOpen exception" if I increase the number of cores beyond 4.
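In case it helps anyone hitting the same exception: on Linux this usually means the per-process open-file-descriptor limit was exceeded, which can be checked and (within the hard limit) raised with `ulimit` in the shell that launches the job. A minimal sketch; the value 8192 is only an illustration, not a recommendation:

```shell
# Show the current soft limit on open file descriptors for this shell.
ulimit -n

# Try to raise the soft limit for this session before launching the job.
# This fails if it exceeds the hard limit (`ulimit -Hn`); raising the hard
# limit typically requires root or an /etc/security/limits.conf entry.
ulimit -n 8192 2>/dev/null || echo "could not raise limit; check ulimit -Hn"
```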

I will still try running jstack on the process, especially the one that 
gets stuck.
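For reference, a sketch of how those dumps might be captured during a 0%-CPU period, so several dumps can be compared to see where the threads are parked. The `dump_stacks` name and the grep pattern are my own placeholders; this assumes the JDK's `jps` and `jstack` tools are on the PATH:

```shell
# Take three thread dumps of a JVM, a few seconds apart, so they can be
# diffed to find threads that stay blocked across dumps.
dump_stacks() {
  pid="$1"
  for i in 1 2 3; do
    jstack "$pid" > "jstack-${pid}-${i}.txt"
    sleep 5
  done
}

# Usage: find the Spark driver's PID first, then dump its stacks, e.g.:
#   jps -l | grep -i spark
#   dump_stacks <pid>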
Thx

Vijay Gaikwad
University of Washington MSIM
vijay...@gmail.com
(206) 261-5828

On Nov 27, 2013, at 1:34 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Vijay - you said the job gets stuck but you also said it eventually
> completes. What do you mean by stuck? Do you mean that there are
> periods of low CPU utilization?
> 
> If you can run jstack during one of the periods and post the output
> that would be most helpful.
> 
> On Wed, Nov 27, 2013 at 1:04 AM, Vijay Gaikwad <vijay...@gmail.com> wrote:
>> The server has 100+ GB of memory. Virtual memory for my job is 60 GB and
>> reserved is 20-30 GB, so there is plenty of memory to spare even when the job
>> is stuck. I am not sure it is GC, because there is still a lot of memory the
>> job could have used. The job's memory consumption remains the same after it
>> resumes, and there is no swapping either. (I observe all of this using the
>> top command.) However, as the job progresses (through all the halt/resume
>> cycles), the memory used slowly increases, but it never reaches the maximum.
>> 
>> When the job gets stuck, CPU drops to 0% and memory is unchanged.
>> 
>> I have observed this behavior with my other Spark scripts too, which run on
>> multiple small files. I thought it was because I was using a single machine,
>> but I believe that shouldn't be the case.
>> 
>> Has anyone else observed this behavior?
>> Thx
>> 
>> -Vijay
>> University of Washington
>> 
>> On Nov 27, 2013 12:44 AM, "Liu, Raymond" <raymond....@intel.com> wrote:
>>> 
>>> How about memory usage - any GC problems? When you say it gets stuck, do
>>> you mean 0% or 1200% CPU while making no progress?
>>> 
>>> Raymond
>>> 
>>> From: Vijay Gaikwad [mailto:vijay...@gmail.com]
>>> Sent: Wednesday, November 27, 2013 2:54 PM
>>> To: user@spark.incubator.apache.org
>>> Subject: Re: local[k] job gets stuck - spark 0.8.0
>>> 
>>> Hi Patrick,
>>> 
>>> Sorry, I don't have access to the web UI.
>>> I have been running these jobs on larger servers and letting them run.
>>> I have observed that when I run a job with "local[12]", it runs at full
>>> throttle for some time at 1200% CPU consumption, but after a while the
>>> processing drops to 0%.
>>> After a few seconds it starts processing again and returns to a high
>>> percentage of CPU utilization. This cycle repeats until the job completes.
>>> Interestingly, I observed similar behavior with simple "local" jobs.
>>> 
>>> Is it the nature of the job that is causing this? I am processing a 70GB
>>> file and performing simple map and reduce operations. I have 100GB of RAM,
>>> which should be sufficient.
>>> Any thoughts?
>>> 
>>> Vijay Gaikwad
>>> University of Washington MSIM
>>> vijay...@gmail.com
>>> (206) 261-5828
>>> 
>>> On Nov 25, 2013, at 11:43 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> 
>>> 
>>> When it gets stuck, what does it show in the web UI? Also, can you run
>>> a jstack on the process and attach the output... that might explain
>>> what's going on.
>>> 
>>> On Mon, Nov 25, 2013 at 11:30 AM, Vijay Gaikwad <vijay...@gmail.com>
>>> wrote:
>>> 
>>> I am using Apache Spark 0.8.0 to process a large data file and perform
>>> some basic .map and .reduceByKey operations on the RDD.
>>> 
>>> Since I am using a single machine with multiple processors, I specify
>>> local[8] in the master URL field while creating the SparkContext:
>>> 
>>> val sc = new SparkContext("local[8]", "Tower-Aggs", SPARK_HOME )
>>> 
>>> But whenever I specify multiple processors, the job gets stuck
>>> (pauses/halts) randomly. There is no definite place where it gets stuck;
>>> it's just random. Sometimes it doesn't happen at all. I am not sure whether
>>> it continues after that, but it stays stuck for a long time, after which I
>>> abort the job.
>>> 
>>> But when I use just "local" in place of "local[8]", the job runs
>>> seamlessly without ever getting stuck.
>>> 
>>> val sc = new SparkContext("local", "Tower-Aggs", SPARK_HOME )
>>> 
>>> I am not able to understand where the problem is.
>>> 
>>> I am using Scala 2.9.3 and sbt to build and run the application.
>>> 
>>> 
>>> -
>>> 
>>> http://stackoverflow.com/questions/20187048/apache-spark-localk-master-url-job-gets-stuck
>>> 
>>> Thx
>>> Vijay Gaikwad
>>> University of Washington MSIM
>>> vijay...@gmail.com
>>> (206) 261-5828
>>> 
>> 
