I know for sure that the R process gets killed (or quits), but I don't know whether its parent process (interpreter.sh) gets killed too.
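One way to check whether the parent is still around might look like the following. This is only a sketch, assuming Linux with /proc mounted (as on Ubuntu); the function names parent_pid and is_alive are made up for illustration. The parent PID has to be recorded while R is still running, since it cannot be looked up once R has exited.

```shell
#!/bin/sh
# Sketch only: assumes Linux with /proc mounted.
# parent_pid and is_alive are illustrative names, not part of Zeppelin.

# Print the parent PID of the given PID (prints nothing if the process is gone).
parent_pid() {
  awk '/^PPid:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Succeed if the given PID currently refers to an existing process.
is_alive() {
  [ -d "/proc/$1" ]
}
```

With the PID of R in hand (for example from ps), `ppid=$(parent_pid "$r_pid")` records the parent while R is running, and `is_alive "$ppid"` can be checked again after R has disappeared. Here r_pid is a placeholder variable.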
I noticed that I can always restart the interpreter on 0.7.1, whereas on 0.7.0 it was sometimes impossible (I had to manually restart the Zeppelin service). That JIRA probably improved the situation a little.
Now I'm running a bash script that tracks the start and stop times of the R process in order to shed some light on this issue. I enabled DEBUG logging in the log4j properties file.

On 6 May 2017 at 4:43 PM, "Paul Brenner" <pbren...@placeiq.com> wrote:

> Great work documenting repeatable steps for this hard to nail down
> problem. I see similar problems running the spark (scala) interpreter but
> haven’t been as systematic about hunting down the issue as you.
>
> I do wonder if this is related somehow to
> https://issues.apache.org/jira/browse/ZEPPELIN-1832
> which just seems to have addressed killing off zombie processes but I’m
> not sure it covered where zombie processes are coming from. Perhaps we need
> to open a ticket for this?
>
> In the meantime, if you don’t have the ability to restart Zeppelin every
> time you run into this problem, you can probably just kill the interpreter
> process. I find myself doing that multiple times in a normal work day.
>
> Paul Brenner
> Data Scientist, PlaceIQ
>
> On Sat, May 06, 2017 at 6:47 AM Pietro Pugni <pietro.pu...@gmail.com> wrote:
>
>> Hi all,
>> I am facing a strange issue on two different machines that act as
>> servers. Each of them runs an instance of Zeppelin installed as a systemd
>> service.
>> The configuration is:
>> - Ubuntu Server 16.04.2 LTS
>> - Spark 2.1.0
>> - Microsoft R Open 3.3.2
>> - Zeppelin 0.7.1 (0.7.0 gave the same problems)
>>
>> zeppelin-env.sh has the following settings:
>> export SPARK_HOME="/spark/home/directory"
>>
>> spark-env.sh has the following settings:
>> export LANG="en_US"
>> export SPARK_DAEMON_JAVA_OPTS+=" -Dspark.local.dir=/some/dir -Dspark.eventLog.dir=/some/dir/spark-events -Dhadoop.tmp.dir=/some/dir"
>> export _JAVA_OPTIONS+=" -Djava.io.tmpdir=/some/dir"
>>
>> spark-defaults.conf is set as:
>> spark.executor.memory 21g
>> spark.driver.memory 21g
>> spark.python.worker.memory 4g
>> spark.sql.autoBroadcastJoinThreshold 0
>>
>> I use Spark in stand-alone mode and it works perfectly. It also works
>> correctly with Zeppelin, but this is what happens:
>> 1) Start Zeppelin on the server using the command service zeppelin start
>> 2) Connect to port 8080 using Mozilla Firefox from the client
>> 3) Insert username and password (I enabled Shiro authentication)
>> 4) Open a notebook
>> 5) Execute the following code:
>> %spark.r
>> 2+2
>> 6) The code runs correctly and I can see that R is currently running as a
>> process.
>> 7) Repeat steps 2-5 after some time (let’s say 2 or 3 hours) and Zeppelin
>> remains forever on “Running” or, if more time has elapsed since the last
>> run (for example 1 day), it returns “Error”. The
>> “time-to-be-unresponsive” seems to be random and unpredictable. Also, R is
>> not present in the list of running processes. The Spark session remains
>> active because I can access the Spark UI on port 4040 and the application
>> name is “Zeppelin”, so it’s the Spark instance created by Zeppelin.
>>
>> I observed that sometimes I can simply restart the interpreter from the
>> Zeppelin UI, but many other times it doesn’t work and I have to restart
>> Zeppelin ( service zeppelin restart ).
>>
>> This issue affects both 0.7.0 and 0.7.1, but I haven’t tried
>> previous versions. It also happens if Zeppelin isn’t installed as a service.
>>
>> I can’t provide more detail because I can’t see any error or warning in
>> the logs. This is really strange.
>>
>> Thank you all.
>> Kind regards
>> Pietro Pugni
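The process tracking mentioned at the top of this thread could look roughly like the following. It is a sketch, not the actual script in use: it assumes Linux with /proc mounted and that the R process shows up under the command name R. The names count_by_name and watch_process are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes Linux with /proc mounted.
# count_by_name and watch_process are illustrative names.

# Count running processes whose command name equals $1.
# Note: the kernel truncates command names in /proc/PID/comm to 15 characters.
count_by_name() {
  n=0
  for f in /proc/[0-9]*/comm; do
    # A process may exit between the glob and the read, so ignore read errors.
    if [ "$(cat "$f" 2>/dev/null)" = "$1" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Poll for process name $1 every $2 seconds, $3 times in total.
# Print a timestamped line only when the state changes (running/stopped).
watch_process() {
  name=$1
  interval=$2
  polls=$3
  prev=unknown
  i=0
  while [ "$i" -lt "$polls" ]; do
    if [ "$(count_by_name "$name")" -gt 0 ]; then
      cur=running
    else
      cur=stopped
    fi
    if [ "$cur" != "$prev" ]; then
      echo "$(date '+%Y-%m-%d %H:%M:%S') $name $cur"
      prev=$cur
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}
```

Something like `watch_process R 10 8640 >> r_track.log` would poll every 10 seconds for roughly a day. The log file name is arbitrary. Comparing the logged "stopped" timestamps with the Zeppelin DEBUG logs might help narrow down when the R process goes away.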