The server has 100+ GB of memory. Virtual memory for my job is 60 GB and reserved is 20-30 GB, so there is plenty of memory to spare even when the job is stuck. I am not sure if it is GC, because there is still a lot of memory the job could have used. The job's memory consumption stays the same after it resumes, and there is no swapping either. (I observe all of this with the top command.) However, as the job progresses through the halt-and-resume cycles, the memory used slowly increases, but it never reaches the max.
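To rule GC in or out properly, I can turn on GC logging for the next run. For reference, this is roughly what I have in mind (the JVM flags are standard HotSpot options, SPARK_JAVA_OPTS is the Spark 0.8 hook for passing them through to the process, and <pid> is a placeholder for the job's process id):

# in conf/spark-env.sh, or exported before launching the job
export SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# alternatively, sample collector activity on the live JVM once per second
jstat -gcutil <pid> 1000

If the 0% CPU windows line up with full collections in the GC log, it is GC after all; if not, something else is blocking.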
When the job gets stuck, CPU drops to 0% and memory is unchanged. I have observed this behavior with my other Spark scripts too, which run on multiple small files. I thought it was because I was using a single machine, but I believe that shouldn't be the case. Have any of you observed such behavior?

Thx
-Vijay
University of Washington

On Nov 27, 2013 12:44 AM, "Liu, Raymond" <raymond....@intel.com> wrote:
> How about memory usage, any GC problem? When you mention getting stuck, do
> you mean 0% or 1200% CPU while making no progress?
>
> Raymond
>
> From: Vijay Gaikwad [mailto:vijay...@gmail.com]
> Sent: Wednesday, November 27, 2013 2:54 PM
> To: user@spark.incubator.apache.org
> Subject: Re: local[k] job gets stuck - spark 0.8.0
>
> Hi Patrick,
>
> Sorry, I don't have access to the web UI.
> I have been running these jobs on larger servers and letting them run.
> I have observed that when I run a job with "local[12]", it runs for some
> time at full throttle at 1200% CPU consumption, but after some time the
> processing drops to 0%.
> After a few seconds it starts processing again and returns to a high
> percentage of CPU utilization. This cycle repeats until the job completes.
> Ironically, I observed similar behavior with simple "local" jobs.
>
> Is it the nature of the job that is causing this? I am processing a 70GB
> file and performing simple map and reduce operations. I have a sufficient
> 100GB of RAM.
> Any thoughts?
>
> Vijay Gaikwad
> University of Washington MSIM
> vijay...@gmail.com
> (206) 261-5828
>
> On Nov 25, 2013, at 11:43 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>
> When it gets stuck, what does it show in the web UI? Also, can you run
> a jstack on the process and attach the output... that might explain
> what's going on.
>
> On Mon, Nov 25, 2013 at 11:30 AM, Vijay Gaikwad <vijay...@gmail.com>
> wrote:
>
> I am using Apache Spark 0.8.0 to process a large data file and perform
> some basic .map and .reduceByKey operations on the RDD.
>
> Since I am using a single machine with multiple processors, I mention
> local[8] in the master URL field while creating the SparkContext:
>
> val sc = new SparkContext("local[8]", "Tower-Aggs", SPARK_HOME)
>
> But whenever I mention multiple processors, the job gets stuck
> (pauses/halts) randomly. There is no definite place where it gets stuck;
> it's just random. Sometimes it won't happen at all. I am not sure if it
> continues after that, but it stays stuck for a long time, after which I
> abort the job.
>
> But when I just use local in place of local[8], the job runs seamlessly
> without ever getting stuck:
>
> val sc = new SparkContext("local", "Tower-Aggs", SPARK_HOME)
>
> I am not able to understand where the problem is.
>
> I am using Scala 2.9.3 and sbt to build and run the application.
>
> -
> http://stackoverflow.com/questions/20187048/apache-spark-localk-master-url-job-gets-stuck
>
> Thx
> Vijay Gaikwad
> University of Washington MSIM
> vijay...@gmail.com
> (206) 261-5828
>
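P.S. Patrick, the next time it hangs I will capture the thread dump you asked for. For reference, the sequence I plan to use (standard JDK tools; <pid> is a placeholder for the job's process id):

# find the JVM running the job
jps -lm

# dump all thread stacks from the stuck process
jstack -l <pid> > threaddump.txt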