I don’t believe that this explains my issue. Running the Scala REPL also keeps
a session alive for as long as the REPL is running. I’ve had REPLs open for
days (shhhhh don’t tell anyone) that have correspondingly kept sessions alive
for the same period of time with no problem. I only see this issue in zeppelin.
We run Zeppelin on a server and allow multiple users to connect, each with
their own interpreters. We also find that Zeppelin’s memory usage on the server
steadily creeps up over time. Executing sys.exit in a Spark paragraph,
restarting the interpreter, or using yarn application -kill will often, but not
always, cause Zeppelin to end the related interpreter process. So over time
many zombie processes pile up and eat up resources.
The only way to stay on top of this is to log in to the Zeppelin server
regularly and kill the zombie processes. Here is a command that I’ve found
helpful. When you know that a specific user has no active Zeppelin interpreters
running, execute the following:
ps aux | grep zeppelin | grep "2BSGYY7S8" | grep java | awk -F " " '{print $2}' | xargs sudo -u yarn kill -9
where "2BSGYY7S8" is the interpreter id (found in interpreter.json) and "yarn"
is the name of the user that originally started Zeppelin with:
zeppelin-daemon.sh start
To kill every interpreter except a specific user’s, just flip it around with:
ps aux | grep zeppelin | grep -v "2BSGYY7S8" | grep -v "zeppelin.server.ZeppelinServer" | grep java | awk -F " " '{print $2}' | xargs sudo -u yarn kill -9
If I do this every few days zeppelin keeps humming along pretty smoothly most
of the time.
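If you’d rather see what would be killed before killing it, here is a rough
sketch of the same filter as a shell function. The name interpreter_pids is
something I made up for illustration, and it assumes that ps aux puts the
command in the 11th column and that the interpreter processes are launched
directly as java. Check both on your own server before relying on it.

```shell
# interpreter_pids is a hypothetical helper, not part of Zeppelin.
# It reads `ps aux`-style text on stdin and prints the PIDs of java processes
# whose command line mentions zeppelin and the given interpreter id,
# skipping the Zeppelin server itself.
# Assumption: the command starts in column 11, as in procps `ps aux` output.
interpreter_pids() {
  awk -v id="$1" '
    $11 ~ /(^|\/)java$/ &&
    /zeppelin/ &&
    index($0, id) > 0 &&
    !/zeppelin\.server\.ZeppelinServer/ { print $2 }
  '
}

# Review the matches first:
#   ps aux | interpreter_pids 2BSGYY7S8
# Then kill them (xargs -r, which skips empty input, is a GNU extension):
#   ps aux | interpreter_pids 2BSGYY7S8 | xargs -r sudo -u yarn kill
```

Trying a plain kill before falling back to kill -9 gives the JVM a chance to
shut down cleanly, though in my experience the stuck ones may still need -9.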
Paul Brenner
DATA SCIENTIST
(217) 390-3033
On Fri, Feb 17, 2017 at 3:23 PM "[email protected]" wrote:
The problem may lie not only with resources but also with the session. When
you run a chunk of Spark code, you should see a running application in the
Spark UI. If your code shuts the session down after the job finishes, the Spark
UI will show the job as finished. Within Zeppelin, the Spark session is started
only once (the interpreter mode controls whether notebooks share the session or
not), so if you close it, it will never be restarted. The only way to get the
same code working again is to restart the interpreter or restart Zeppelin. I'm
not sure I've explained this clearly, but I hope it helps.
From: [email protected]
Date: 2017-02-17 12:14
To: [email protected]
Subject: Re: Re: Zeppelin unable to respond after some time
I’ve definitely had this problem with jobs that don’t take all the resources on
the cluster. Also, my experience matches what others have reported: just
restarting Zeppelin and re-running the stuck paragraph solves the issue.
I’ve also experienced this problem with for loops. Some for loops which write
to disk but absolutely don’t have any variables that are increasing in size
will hang in Zeppelin. If I run the exact same code in the scala REPL it goes
through without problem.
Paul Brenner
On Fri, Feb 17, 2017 at 2:12 PM "[email protected]" wrote:
I have solved a similar issue before. Check the Spark UI; you will probably see
that a single job is taking all the resources, so any further job submitted to
the same cluster just hangs. When you restart Zeppelin, the old job is killed
and all the resources it held are released.
[email protected]
From: [email protected]
Date: 2017-02-17 02:29
To: [email protected]
Subject: Re: Zeppelin unable to respond after some time
Yes, it happens frequently with R and Spark code.
On Feb 17, 2017 3:25 PM, "小野圭二" <[email protected]> wrote:
Yes, almost every time.
I am not doing anything special, just running the tutorial demos.
From what I have seen, it happens most often in the R demo.
2017-02-17 18:50 GMT+09:00 Jeff Zhang <[email protected]>:
Is it easy to reproduce?
小野圭二 <[email protected]> wrote on Fri, Feb 17, 2017 at 5:47 PM:
I am facing the same issue now.
2017-02-17 18:25 GMT+09:00 RUSHIKESH RAUT <[email protected]>:
Hi all,
I am facing an issue while using Zeppelin. I am trying to load some data (not
particularly large) into Zeppelin and then build some visualizations on it. The
problem is that the code works the first time I run it, but after some time the
same code stops working. The paragraph stays in the running state in the GUI,
but nothing is written to the Zeppelin logs, and all further tasks hang in the
pending state.
As soon as I restart Zeppelin it works again, so I am guessing it is a memory
issue. I have read that Zeppelin stores the data in memory, so it is possible
that it runs out of memory after some time.
How do I debug this issue? How much memory does Zeppelin take by default at
startup? Also, is there any way to run Zeppelin with a specified amount of
memory, so that I can start the process with more? It doesn't make sense to
restart Zeppelin every half hour.
Thanks,
Rushikesh Raut