By "stuck" I mean that the job halts for some time and does no processing; CPU usage drops to 0%. After a while the processing resumes and CPU usage goes back up. This cycle repeats as the job progresses, until it completes.
But today, while I am running some other Spark jobs, it isn't happening. The job runs seamlessly on multiple cores without halts, although it throws a "TooManyFileOpen exception" if I increase the number of cores beyond 4. I will still try running jstack on the process, especially on the one that gets stuck.

Thx
Vijay Gaikwad
University of Washington MSIM
vijay...@gmail.com
(206) 261-5828

On Nov 27, 2013, at 1:34 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Vijay - you said the job gets stuck but you also said it eventually
> completes. What do you mean by stuck? Do you mean that there are
> periods of low CPU utilization?
>
> If you can run jstack during one of the periods and post the output
> that would be most helpful.
>
> On Wed, Nov 27, 2013 at 1:04 AM, Vijay Gaikwad <vijay...@gmail.com> wrote:
>> The server has 100+ gb of memory. Virtual memory for my job is 60gb and
>> reserved is 20-30 gb. So there is plenty of memory to spare even when the
>> job is stuck. I am not sure if it is GC because there is still a lot of
>> memory which the job could have used. The job's memory consumption remains
>> the same after it resumes and there are no swaps either. (I observe all
>> this using the top command.)
>> However, as the job progresses (with all the halt and resume cycles) the
>> memory used slowly increases but never reaches max.
>>
>> When the job gets stuck, CPU drops to 0% and memory is unchanged.
>>
>> I have observed this behavior with my other Spark scripts too, which run on
>> multiple small files. I thought it was because I was using a single
>> machine, but I believe that shouldn't be the case.
>>
>> Do any of you observe such behavior?
>> Thx
>>
>> -Vijay
>> University of Washington
>>
>> On Nov 27, 2013 12:44 AM, "Liu, Raymond" <raymond....@intel.com> wrote:
>>>
>>> How about memory usage, any GC problem? When you mention get stuck, you
>>> mean 0% or 1200% CPU while no progress?
>>>
>>> Raymond
>>>
>>> From: Vijay Gaikwad [mailto:vijay...@gmail.com]
>>> Sent: Wednesday, November 27, 2013 2:54 PM
>>> To: user@spark.incubator.apache.org
>>> Subject: Re: local[k] job gets stuck - spark 0.8.0
>>>
>>> Hi Patrick,
>>>
>>> Sorry, I don't have access to the web UI.
>>> So I have been running these jobs on larger servers and letting them run.
>>> I have observed that when I run a job with "local[12]", it runs for some
>>> time at full throttle at 1200% CPU consumption, but after some time this
>>> processing goes to 0%.
>>> After a few seconds it starts processing again and goes back to a high
>>> percentage of CPU utilization. This cycle repeats till the job is completed.
>>> Ironically, I observed similar behavior with simple "local" jobs.
>>>
>>> Is it the nature of the job that is causing this? I am processing a 70GB
>>> file and performing simple map and reduce operations. I have sufficient
>>> RAM (100GB).
>>> Any thoughts?
>>>
>>> Vijay Gaikwad
>>> University of Washington MSIM
>>> vijay...@gmail.com
>>> (206) 261-5828
>>>
>>> On Nov 25, 2013, at 11:43 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>
>>> When it gets stuck, what does it show in the web UI? Also, can you run
>>> a jstack on the process and attach the output... that might explain
>>> what's going on.
>>>
>>> On Mon, Nov 25, 2013 at 11:30 AM, Vijay Gaikwad <vijay...@gmail.com>
>>> wrote:
>>>
>>> I am using apache spark 0.8.0 to process a large data file and perform
>>> some basic .map and .reduceByKey operations on the RDD.
>>>
>>> Since I am using a single machine with multiple processors, I mention
>>> local[8] in the Master URL field while creating SparkContext
>>>
>>> val sc = new SparkContext("local[8]", "Tower-Aggs", SPARK_HOME )
>>>
>>> But whenever I mention multiple processors, the job gets stuck
>>> (pauses/halts) randomly. There is no definite place where it gets stuck,
>>> it's just random. Sometimes it won't happen at all.
>>> I am not sure if it continues after that, but it gets stuck for a long
>>> time, after which I abort the job.
>>>
>>> But when I just use local in place of local[8], the job runs seamlessly
>>> without ever getting stuck.
>>>
>>> val sc = new SparkContext("local", "Tower-Aggs", SPARK_HOME )
>>>
>>> I am not able to understand where the problem is.
>>>
>>> I am using Scala 2.9.3 and sbt to build and run the application
>>>
>>>
>>> -
>>>
>>> http://stackoverflow.com/questions/20187048/apache-spark-localk-master-url-job-gets-stuck
>>>
>>> Thx
>>> Vijay Gaikwad
>>> University of Washington MSIM
>>> vijay...@gmail.com
>>> (206) 261-5828
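
A note on the jstack suggestion in this thread: in local and local[k] mode the tasks run as threads inside the driver JVM, so a rough in-process alternative to jstack is to print every thread's state and top stack frames while the job is stalled. The sketch below is only an illustration under that assumption. ThreadDump, render and maxFrames are made-up names, not part of Spark; the only library call it relies on is java.lang.Thread.getAllStackTraces. It avoids string interpolation so that it should also compile on Scala 2.9.x.

```scala
// Illustrative sketch: dump the state and top frames of every live JVM thread.
// ThreadDump, render and maxFrames are hypothetical names, not Spark APIs.
object ThreadDump {

  // Returns one block per live thread: its name, its state, and at most
  // maxFrames stack frames, innermost frame first.
  def render(maxFrames: Int): String = {
    val sb = new StringBuilder
    val it = Thread.getAllStackTraces().entrySet().iterator()
    while (it.hasNext) {
      val entry = it.next()
      val thread = entry.getKey
      sb.append("\"" + thread.getName + "\" state=" + thread.getState + "\n")
      entry.getValue.take(maxFrames).foreach { frame =>
        sb.append("    at " + frame + "\n")
      }
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    println(render(10))
  }
}
```

Calling ThreadDump.render(10) from a small background thread in the driver every few seconds, and comparing a dump taken at 0% CPU with one taken at full load, may show whether the task threads are BLOCKED or WAITING and on what. It is not a replacement for a real jstack dump, which also reports lock ownership.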