Hi, thanks for the detailed debugging.

First, NotebookServer won't give any clue about this symptom because
it only handles communication between the browser and the Zeppelin server.

I don't know why R has stopped unexpectedly. Is there any log related to R?
I'm not familiar with R actually.
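
One thing that can be read directly from the quoted interpreter log is the gap
between the last finished SparkR job (11:28:00,188) and the first error line
(13:08:00,187). Below is a quick sketch of that arithmetic. The two timestamps
are copied from the log; the helper name elapsed_seconds is made up for
illustration.

```python
from datetime import datetime

# Timestamps copied from the interpreter log quoted below.
LAST_JOB_FINISHED = "2017-05-08 11:28:00,188"
R_ERROR_LOGGED = "2017-05-08 13:08:00,187"

# log4j-style timestamp: date, time, comma, milliseconds.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_FORMAT)
    t1 = datetime.strptime(end, LOG_FORMAT)
    return (t1 - t0).total_seconds()


gap = elapsed_seconds(LAST_JOB_FINISHED, R_ERROR_LOGGED)
print(f"{gap:.3f} s (~{gap / 60:.0f} min)")
```

The gap comes out at almost exactly 6000 seconds (100 minutes). A figure that
round might point to a timeout rather than a crash, but nothing in this thread
confirms that yet.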

BTW, I'll install R and test it locally.
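
For anyone else who wants to repeat the start/stop tracking described in the
quoted message, a minimal sketch of such a watcher might look like the
following. The original bash script was not posted, so everything here is an
assumption: the function names, the use of "pgrep -x R" to detect the process,
and the polling interval.

```python
import itertools
import subprocess
import time
from datetime import datetime


def r_is_running(name: str = "R") -> bool:
    """Return True if pgrep finds a process whose name is exactly `name`.

    Assumes pgrep is installed and that the interpreter process is named "R".
    """
    # pgrep exits with status 0 when at least one process matches.
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def track(is_running, polls=None, interval=1.0, log=print, sleep=time.sleep):
    """Poll `is_running` and log a line each time the state flips.

    polls=None means poll forever; a number limits the iterations.
    """
    was_running = False
    ticks = itertools.count() if polls is None else range(polls)
    for _ in ticks:
        now_running = is_running()
        if now_running != was_running:
            stamp = datetime.now().strftime("%a %b %d %H:%M:%S %Y")
            state = "started" if now_running else "stopped"
            log(f"{stamp} >>> R {state}")
            was_running = now_running
        sleep(interval)


# Example (runs until interrupted):
#   track(r_is_running)
```

Passing the check and the logger in as arguments keeps the loop easy to try
out without a real R process.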

On Tue, May 9, 2017 at 8:29 AM, Pietro Pugni <pietro.pu...@gmail.com> wrote:

> I am reposting this because it didn't appear on the mailing list board.
>
> These are the steps needed to reproduce the error and to track down the log
> message.
>
> 1) I started a brand new instance of zeppelin issuing:
> service zeppelin start
>
> and started a bash script that tracks R process activity.
> After running a simple R script from Zeppelin, the R interpreter process
> was started:
>
> Mon May  8 11:27:59 CEST 2017 >>> R started
>
> 2) I left the browser open and at 12:26:15 I closed it. Zeppelin
> logged the connection being closed:
> INFO [2017-05-08 12:26:15,879] ({qtp423031029-60}
> NotebookServer.java[onClose]:363) - Closed connection to 127.0.0.1 :
> 33798. (1001) null
>
> 3) At 13:08:00 R was closed. My script returned:
> Mon May  8 13:08:00 CEST 2017 >>> R stopped
>
> This is the output from the interpreter log file (irrelevant lines
> removed):
> INFO [2017-05-08 11:27:43,632] ({Thread-0} 
> RemoteInterpreterServer.java[run]:95)
> - Starting remote interpreter server on port 45227
> INFO [2017-05-08 11:27:44,600] ({pool-1-thread-3}
> RemoteInterpreterServer.java[createInterpreter]:190) - Instantiate
> interpreter org.apache.zeppelin.spark.SparkInterpreter
> INFO [2017-05-08 11:27:44,624] ({pool-1-thread-3}
> RemoteInterpreterServer.java[createInterpreter]:190) - Instantiate
> interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
> INFO [2017-05-08 11:27:44,629] ({pool-1-thread-3}
> RemoteInterpreterServer.java[createInterpreter]:190) - Instantiate
> interpreter org.apache.zeppelin.spark.DepInterpreter
> INFO [2017-05-08 11:27:44,640] ({pool-1-thread-3}
> RemoteInterpreterServer.java[createInterpreter]:190) - Instantiate
> interpreter org.apache.zeppelin.spark.PySparkInterpreter
> INFO [2017-05-08 11:27:44,643] ({pool-1-thread-3}
> RemoteInterpreterServer.java[createInterpreter]:190) - Instantiate
> interpreter org.apache.zeppelin.spark.SparkRInterpreter
> ...
> INFO [2017-05-08 11:28:00,188] ({pool-2-thread-2} 
> SchedulerFactory.java[jobFinished]:137)
> - Job remoteInterpretJob_1494235664723 finished by scheduler
> org.apache.zeppelin.spark.SparkRInterpreter2097894179
> DEBUG [2017-05-08 11:28:00,819] ({pool-1-thread-3}
> RemoteInterpreterServer.java[resourcePoolGetAll]:911) - Request getAll
> from ZeppelinServer
> *DEBUG [2017-05-08 13:08:00,187] ({Exec Stream Pumper}
> InterpreterOutputStream.java[processLine]:72) - Interpreter output:Error in
> handleErrors(returnStatus, conn) : *
> *DEBUG [2017-05-08 13:08:00,188] ({Exec Stream Pumper}
> InterpreterOutputStream.java[processLine]:72) - Interpreter output:  No
> status is returned. Java SparkR backend might have failed.*
> *DEBUG [2017-05-08 13:08:00,188] ({Exec Stream Pumper}
> InterpreterOutputStream.java[processLine]:72) - Interpreter output:Calls:
> <Anonymous> -> invokeJava -> handleErrors*
> *DEBUG [2017-05-08 13:08:00,188] ({Exec Stream Pumper}
> InterpreterOutputStream.java[processLine]:72) - Interpreter
> output:Execution halted*
>
> This is the output from zeppelin log file (it didn't track the R
> interpreter failure):
> INFO [2017-05-08 11:28:00,221] ({pool-2-thread-2} 
> NotebookServer.java[afterStatusChange]:2056)
> - Job 20170506-145151_1585482989 is finished successfully, status: FINISHED
> INFO [2017-05-08 11:28:00,675] ({pool-2-thread-2} 
> SchedulerFactory.java[jobFinished]:137)
> - Job paragraph_1494075111996_-1250116940 finished by scheduler
> org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_
> session2130846287
> *INFO [2017-05-08 12:26:15,879] ({qtp423031029-60}
> NotebookServer.java[onClose]:363) - Closed connection to 127.0.0.1 : 33798.
> (1001) null*
> INFO [2017-05-08 12:27:12,126] ({Thread-33} 
> AbstractValidatingSessionManager.java[validateSessions]:271)
> - Validating all active sessions...
> INFO [2017-05-08 12:27:12,126] ({Thread-33} 
> AbstractValidatingSessionManager.java[validateSessions]:304)
> - Finished session validation.  No sessions were stopped.
>
> Hope this helps.
> Any hints?
>
> On 8 May 2017, at 11:08, Pietro Pugni <
> pietro.pu...@gmail.com> wrote:
>
> I know for sure that the R process gets killed (or quits) but I don't know if
> its parent process (interpreter.sh) gets killed too.
>
> I noticed that I can always restart the interpreter on 0.7.1 while
> sometimes it was impossible to do on 0.7.0 (I had to manually restart
> zeppelin service). Probably that JIRA improved the situation a little bit.
>
> Now I'm running a bash script that tracks the start and stop times of the R
> process in order to shed some light on this issue. I enabled DEBUG logging in
> the log4j properties file.
>
>
> On 6 May 2017 4:43 PM, "Paul Brenner" <pbren...@placeiq.com> wrote:
>
>> Great work documenting repeatable steps for this hard-to-nail-down
>> problem. I see similar problems running the spark (scala) interpreter but
>> haven’t been as systematic about hunting down the issue as you.
>>
>> I do wonder if this is related somehow to
>> https://issues.apache.org/jira/browse/ZEPPELIN-1832
>> which just seems to have addressed killing off zombie processes but I’m
>> not sure it covered where zombie processes are coming from. Perhaps we need
>> to open a ticket for this?
>>
>> In the meantime, if you don’t have the ability to restart zeppelin every
>> time you run into this problem, you can probably just kill the interpreter
>> process. I find myself doing that multiple times in a normal work day.
>>
>> Paul Brenner
>> DATA SCIENTIST
>> *(217) 390-3033 *
>>
>>
>> On Sat, May 06, 2017 at 6:47 AM Pietro Pugni
>> <pietro.pu...@gmail.com> wrote:
>>
>>> Hi all,
>>> I am facing a strange issue on two different machines that act as
>>> servers. Each of them runs an instance of Zeppelin installed as a systemd
>>> service.
>>> The configuration is:
>>>  - Ubuntu Server 16.04.2 LTS
>>>  - Spark 2.1.0
>>>  - Microsoft Open R 3.3.2
>>>  - Zeppelin 0.7.1 (0.7.0 gave the same problems)
>>>
>>> zeppelin-env.sh has the following settings:
>>> export SPARK_HOME="/spark/home/directory"
>>>
>>> spark-env.sh has the following settings:
>>> export LANG="en_US"
>>> export SPARK_DAEMON_JAVA_OPTS+=" -Dspark.local.dir=/some/dir
>>> -Dspark.eventLog.dir=/some/dir/spark-events -Dhadoop.tmp.dir=/some/dir"
>>> export _JAVA_OPTIONS+=" -Djava.io.tmpdir=/some/dir"
>>>
>>> spark-defaults.conf is set as:
>>> spark.executor.memory           21g
>>> spark.driver.memory                     21g
>>> spark.python.worker.memory       4g
>>> spark.sql.autoBroadcastJoinThreshold    0
>>>
>>> I use Spark in stand-alone mode and it works perfectly. It also works
>>> correctly with Zeppelin but this is what happens:
>>> 1) Start zeppelin on the server using the command service zeppelin start
>>> 2) Connect to port 8080 using Mozilla Firefox from client
>>> 3) Insert username and password (I enabled Shiro authentication)
>>> 4) open a notebook
>>> 5) Execute the following code:
>>> %spark.r
>>> 2+2
>>> 6) The code runs correctly and I can see that R is currently running as
>>> a process.
>>> 7) Repeat steps 2-5 after some time (let’s say 2 or 3 hours) and
>>> Zeppelin remains forever on “Running” or, if the elapsed time is higher
>>> (for example 1 day) since the last run, it returns “Error”. The
>>> “time-to-be-unresponsive” seems to be random and unpredictable. Also, R is
>>> not present in the list of running processes. Spark session remains active
>>> because I can access Spark UI from port 4040 and the application name is
>>> “Zeppelin”, so it’s the Spark instance created by Zeppelin.
>>>
>>> I observed that sometimes I can simply restart the interpreter from
>>> Zeppelin UI, but many other times it doesn’t work and I have to restart
>>> Zeppelin ( service zeppelin restart ).
>>>
>>> This issue affects both 0.7.0 and 0.7.1 but I haven’t tried with
>>> previous versions. It also happens if Zeppelin isn’t installed as a service.
>>>
>>> I can’t provide more detail because I can’t see any error or warning in
>>> the logs. This is really strange.
>>>
>>> Thank you all.
>>> Kind regards
>>>  Pietro Pugni
>>>
>>
>>
>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
