I know for sure that the R process gets killed (or quits), but I don't know
whether its parent process (interpreter.sh) gets killed too.

I noticed that I can always restart the interpreter on 0.7.1, while on 0.7.0
this was sometimes impossible (I had to manually restart the zeppelin
service). That JIRA issue probably improved the situation a bit.
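When a restart from the UI hangs, it should also be possible to kill the stuck interpreter JVM directly instead of restarting the whole service. A rough sketch (the `RemoteInterpreterServer` class name is the default remote-interpreter entry point; confirm the actual command line on your system with `pgrep -af zeppelin` first):

```shell
#!/usr/bin/env bash
# Sketch: find and kill a stuck Zeppelin interpreter JVM directly,
# as a lighter alternative to `service zeppelin restart`.
# The class name below is the default remote-interpreter entry point;
# verify it on your install before killing anything.

INTERP_PATTERN="org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer"

# List candidate interpreter processes (PID plus full command line).
list_interpreters() {
  pgrep -af "$1" || true   # "|| true": no match is not an error here
}

# Kill every process whose full command line matches the pattern.
kill_interpreters() {
  pkill -f "$1" || true
}

list_interpreters "$INTERP_PATTERN"
# kill_interpreters "$INTERP_PATTERN"   # uncomment once the list looks right
```

Listing first and killing only after inspection avoids taking down an interpreter that is still doing useful work.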

For now I'm running a bash script that tracks the start and stop times of
the R process, hoping to shed some light on this issue. I have also enabled
DEBUG logging in the log4j properties file.
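In case it's useful, the monitor is along these lines (a sketch; the process pattern, log path, and interval are placeholders -- check how the R interpreter actually shows up on your system with `pgrep -af R`):

```shell
#!/usr/bin/env bash
# Sketch: poll for the R interpreter process and log a timestamped line
# on every start/stop transition. PATTERN, LOGFILE and INTERVAL are
# illustrative defaults, not Zeppelin-mandated values.

PATTERN="${PATTERN:-R --no-save}"                 # cmdline fragment of the R process
LOGFILE="${LOGFILE:-/tmp/r-process-monitor.log}"
INTERVAL="${INTERVAL:-5}"                         # seconds between polls

# True (exit 0) if a process matching the pattern is currently running.
is_running() {
  pgrep -f "$1" >/dev/null 2>&1
}

# Append a timestamped state-change line to the log.
log_event() {
  echo "$(date '+%Y-%m-%d %H:%M:%S') R process $1" >> "$2"
}

# Poll forever, logging each transition. Guarded behind an explicit
# "run" argument so the functions can be sourced without starting the loop.
if [ "${1:-}" = "run" ]; then
  state="down"
  while true; do
    if is_running "$PATTERN"; then
      [ "$state" = "down" ] && { log_event STARTED "$LOGFILE"; state="up"; }
    else
      [ "$state" = "up" ] && { log_event STOPPED "$LOGFILE"; state="down"; }
    fi
    sleep "$INTERVAL"
  done
fi
```

Correlating the STOPPED timestamps in this log with the Zeppelin DEBUG log should show whether the R process dies on its own or is killed by the interpreter.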


On 6 May 2017 at 4:43 PM, "Paul Brenner" <pbren...@placeiq.com> wrote:

> Great work documenting repeatable steps for this hard-to-nail-down
> problem. I see similar problems running the Spark (Scala) interpreter but
> haven’t been as systematic about hunting down the issue as you.
>
> I do wonder if this is related somehow to
> https://issues.apache.org/jira/browse/ZEPPELIN-1832, which seems to have
> addressed killing off zombie processes, but I’m not sure it covered where
> the zombie processes are coming from. Perhaps we need to open a ticket for
> this?
>
> In the meantime, if you don’t have the ability to restart Zeppelin every
> time you run into this problem, you can probably just kill the interpreter
> process. I find myself doing that multiple times in a normal work day.
>
> Paul Brenner
> DATA SCIENTIST, PlaceIQ
> (217) 390-3033
>
> On Sat, May 06, 2017 at 6:47 AM Pietro Pugni <pietro.pu...@gmail.com>
> wrote:
>
>> Hi all,
>> I am facing a strange issue on two different machines that act as
>> servers. Each of them runs an instance of Zeppelin installed as a systemd
>> service.
>> The configuration is:
>>  - Ubuntu Server 16.04.2 LTS
>>  - Spark 2.1.0
>>  - Microsoft Open R 3.3.2
>>  - Zeppelin 0.7.1 (0.7.0 gave the same problems)
>>
>> zeppelin-env.sh has the following settings:
>> export SPARK_HOME="/spark/home/directory"
>>
>> spark-env.sh has the following settings:
>> export LANG="en_US"
>> export SPARK_DAEMON_JAVA_OPTS+=" -Dspark.local.dir=/some/dir
>> -Dspark.eventLog.dir=/some/dir/spark-events -Dhadoop.tmp.dir=/some/dir"
>> export _JAVA_OPTIONS+=" -Djava.io.tmpdir=/some/dir"
>>
>> spark-defaults.conf is set as:
>> spark.executor.memory                   21g
>> spark.driver.memory                     21g
>> spark.python.worker.memory              4g
>> spark.sql.autoBroadcastJoinThreshold    0
>>
>> I use Spark in standalone mode and it works perfectly. It also works
>> correctly with Zeppelin, but this is what happens:
>> 1) Start Zeppelin on the server using the command: service zeppelin start
>> 2) Connect to port 8080 using Mozilla Firefox from a client
>> 3) Insert username and password (I enabled Shiro authentication)
>> 4) Open a notebook
>> 5) Execute the following code:
>> %spark.r
>> 2+2
>> 6) The code runs correctly and I can see that R is currently running as a
>> process.
>> 7) Repeat steps 2-5 after some time (let’s say 2 or 3 hours): Zeppelin
>> remains stuck on “Running” forever or, if the elapsed time since the last
>> run is longer (for example 1 day), it returns “Error”. The
>> time-to-unresponsiveness seems random and unpredictable. Also, R is no
>> longer present in the list of running processes. The Spark session remains
>> active, because I can access the Spark UI on port 4040 and the application
>> name is “Zeppelin”, so it’s the Spark instance created by Zeppelin.
>>
>> I observed that sometimes I can simply restart the interpreter from the
>> Zeppelin UI, but many other times that doesn’t work and I have to restart
>> Zeppelin itself (service zeppelin restart).
>>
>> This issue affects both 0.7.0 and 0.7.1, but I haven’t tried previous
>> versions. It also happens if Zeppelin isn’t installed as a service.
>>
>> I can’t provide more detail because I can’t see any error or warning in
>> the logs, which is really strange.
>>
>> Thank you all.
>> Kind regards
>>  Pietro Pugni
>>
>
>
