Yes, I can certainly use jstack, but it takes 4 to 5 hours for me to reproduce the error, so I will get back as soon as I can.

Thanks a lot!
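In the meantime, a snapshot of the live threads can also be taken from inside the driver JVM itself, which may help while waiting on a reproduction for an external jstack run. A minimal sketch only; the ThreadSnapshot class name and the digit-stripping grouping heuristic are illustrative, not anything from this thread:

    import java.util.Map;
    import java.util.TreeMap;

    public class ThreadSnapshot {
        public static void dump() {
            // Group live threads by name with digits replaced, so that
            // hundreds of "ForkJoinPool-50-worker-11"-style threads
            // collapse into one line with a count, making leaks visible.
            Map<String, Integer> counts = new TreeMap<>();
            for (Thread t : Thread.getAllStackTraces().keySet()) {
                String key = t.getName().replaceAll("\\d+", "N");
                counts.merge(key, 1, Integer::sum);
            }
            counts.forEach((name, n) -> System.out.println(n + "\t" + name));
        }
    }

Calling ThreadSnapshot.dump() periodically from the driver and comparing outputs should show which thread-name family is growing.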
On Mon, Oct 31, 2016 at 12:41 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

Then it should not be a Receiver issue. Could you use `jstack` to find out the names of the leaking threads?

On Mon, Oct 31, 2016 at 12:35 PM, kant kodali <kanth...@gmail.com> wrote:

Hi Ryan,

It happens on the driver side, and I am running in client mode (not cluster mode).

Thanks!

On Mon, Oct 31, 2016 at 12:32 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

Sorry, there is a typo in my previous email: this may **not** be the root cause if the leaked threads are on the driver side.

Does it happen in the driver or the executors?

On Mon, Oct 31, 2016 at 12:20 PM, kant kodali <kanth...@gmail.com> wrote:

Hi Ryan,

Ahh, my Receiver.onStop method is currently empty.

1) I have a hard time seeing why the receiver would crash so many times within a span of 4 to 5 hours, but I understand I should still clean up during onStop.

2) How do I clean up those threads? The documentation at https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html doesn't seem to have any method for cleaning up the threads created during onStart. Any ideas?

Thanks!

On Mon, Oct 31, 2016 at 11:58 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

So in your code, each Receiver will start a new thread. Did you stop the receiver properly in `Receiver.onStop`? Otherwise, you may leak threads after a receiver crashes and is restarted by Spark. However, this may be the root cause since the leaked threads are on the driver side. Could you use `jstack` to check which types of threads are leaking?

On Mon, Oct 31, 2016 at 11:50 AM, kant kodali <kanth...@gmail.com> wrote:

I am also under the assumption that the onStart function of the Receiver is called only once by Spark. Please correct me if I am wrong.

On Mon, Oct 31, 2016 at 11:35 AM, kant kodali <kanth...@gmail.com> wrote:

My driver program runs a Spark Streaming job, and it spawns a thread by itself only in the onStart() function below. Other than that it doesn't spawn any other threads; it only calls mapToPair, reduceByKey, foreachRDD, and collect.

    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;

    public class NSQReceiver extends Receiver<String> {

        private String topic = "";

        public NSQReceiver(String topic) {
            super(StorageLevel.MEMORY_AND_DISK_2());
            this.topic = topic;
        }

        @Override
        public void onStart() {
            // Spawn a worker thread so onStart() can return immediately.
            new Thread() {
                @Override
                public void run() {
                    receive();
                }
            }.start();
        }
    }

Environment info:

Java 8
Scala 2.11.8
Spark 2.0.0

More than happy to share any other info you may need.
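For the cleanup question in 2) above: one way onStop can tear the worker down is to keep a handle to the thread created in onStart. A minimal sketch only; the receiverThread field, the daemon/interrupt handling, and the shape of the receive() loop are assumptions for illustration, not the original code:

    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;

    public class NSQReceiver extends Receiver<String> {

        private String topic = "";

        // Hypothetical field: keeps a handle on the thread created in
        // onStart() so onStop() can shut it down.
        private volatile Thread receiverThread;

        public NSQReceiver(String topic) {
            super(StorageLevel.MEMORY_AND_DISK_2());
            this.topic = topic;
        }

        @Override
        public void onStart() {
            receiverThread = new Thread(this::receive, "nsq-receiver");
            receiverThread.setDaemon(true);
            receiverThread.start();
        }

        @Override
        public void onStop() {
            // Interrupt the worker so its loop can observe the
            // interrupt and exit, instead of lingering after Spark
            // restarts the receiver.
            Thread t = receiverThread;
            if (t != null) {
                t.interrupt();
            }
        }

        private void receive() {
            // Assumed loop shape: run until Spark asks us to stop.
            while (!isStopped() && !Thread.currentThread().isInterrupted()) {
                // ... read from NSQ and call store(...) ...
            }
        }
    }

With this shape, each restart of the receiver interrupts the old worker before a new one is started, so threads should no longer accumulate across crashes.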
On Mon, Oct 31, 2016 at 11:05 AM, Jakob Odersky <ja...@odersky.com> wrote:

> how do I tell my spark driver program to not create so many?

This may depend on your driver program. Do you spawn any threads in it? Could you share some more information on the driver program, the Spark version, and your environment? It would greatly help others help you.

On Mon, Oct 31, 2016 at 3:47 AM, kant kodali <kanth...@gmail.com> wrote:

The source of my problem is that I am running into the following error. It seems to happen after running my driver program for about 4 hours:

    Exception in thread "ForkJoinPool-50-worker-11" Exception in thread "dag-scheduler-event-loop" Exception in thread "ForkJoinPool-50-worker-13" java.lang.OutOfMemoryError: unable to create new native thread

and this wonderful book taught me that the error "unable to create new native thread" can happen because the JVM asks the OS for a new thread and the OS refuses to create one, for the following reasons:

1. The system has actually run out of virtual memory.
2. On Unix-style systems, the user has already created (across all programs the user is running) the maximum number of processes configured for that user login. Individual threads are considered a process in that regard.

Option #2 is ruled out in my case because my driver program is running under the root user, which has the maximum number of processes set to 120242.

ulimit -a gives me the following:

    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 120242
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 120242
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

So at this point I understand that I am running out of memory due to thread allocation, so my biggest question is: how do I tell my Spark driver program not to create so many?

On Mon, Oct 31, 2016 at 3:25 AM, Sean Owen <so...@cloudera.com> wrote:

`ps -L [pid]` is what shows threads. I am not sure this is counting what you think it does. My shell process has about a hundred threads, and I can't imagine why one would have thousands unless your app spawned them.

On Mon, Oct 31, 2016 at 10:20 AM kant kodali <kanth...@gmail.com> wrote:

When I do

    ps -elfT | grep "spark-driver-program.jar" | wc -l

the result is around 32K. Why does it create so many threads, and how can I limit this?
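For anyone hitting the same wall: the thread growth discussed above can also be watched from inside the JVM using the standard ThreadMXBean, as a complement to ps -L. A minimal sketch only; the ThreadCountWatcher class name and the 60-second interval are arbitrary choices for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public final class ThreadCountWatcher {
        // Start once from anywhere in the driver, e.g.
        // ThreadCountWatcher.start();
        public static void start() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            Thread t = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // Live threads now, and the high-water mark since
                    // JVM start; a steadily climbing "live" count
                    // points at a leak.
                    System.out.println("threads live=" + mx.getThreadCount()
                            + " peak=" + mx.getPeakThreadCount());
                    try {
                        Thread.sleep(60_000); // arbitrary interval
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "thread-count-watcher");
            t.setDaemon(true); // don't keep the JVM alive on shutdown
            t.start();
        }
    }

Logging these two numbers over the 4-to-5-hour window should show whether the count grows steadily (a leak, e.g. from restarted receivers) or spikes suddenly.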