Hi,

Will downloading 0.7.0 and running the R tutorial notebook repeatedly reproduce the problem? Otherwise, can someone clarify the instructions to reproduce it?

Thanks,
moon

On Sat, Feb 18, 2017 at 5:45 AM xyun...@simuwell.com <xyun...@simuwell.com> wrote:

> Within the Scala REPL everything works fine: even if your application session is down, running the same code again creates a new job. But Zeppelin has this problem. As a test, run a notebook and let the job finish at the end of the notebook, then re-run the same notebook; it will get stuck. Running Spark code in the Scala REPL and in Zeppelin are different.
>
> From: Paul Brenner <pbren...@placeiq.com>
> Date: 2017-02-17 12:37
> To: users <users@zeppelin.apache.org>
> Subject: Re: Re: Zeppelin unable to respond after some time
>
> I don't believe that this explains my issue. Running the Scala REPL also keeps a session alive for as long as the REPL is running. I've had REPLs open for days (shhhhh, don't tell anyone) that have correspondingly kept sessions alive for the same period of time with no problem. I only see this issue in Zeppelin.
>
> We run Zeppelin on a server and allow multiple users to connect, each with their own interpreters. We also find that Zeppelin memory usage on the server will steadily creep up over time. Executing sys.exit in a Spark paragraph, restarting the interpreter, and using yarn application -kill will often cause Zeppelin to end the related interpreter process, but not always. So over time we find that many zombie processes pile up and eat up resources.
>
> The only way to keep on top of this is to regularly log in to the Zeppelin server and kill zombie jobs. Here is a command that I've found helpful.
> When you know that a specific user has no active Zeppelin interpreters running, execute the following:
>
>     ps aux | grep zeppelin | grep "2BSGYY7S8" | grep java | awk '{print $2}' | xargs sudo -u yarn kill -9
>
> where "2BSGYY7S8" is the interpreter id (found in interpreter.json) and "yarn" is the name of the user that originally started Zeppelin with:
>
>     zeppelin-daemon.sh start
>
> To kill every interpreter except a specific user's, flip it around with:
>
>     ps aux | grep zeppelin | grep -v "2BSGYY7S8" | grep -v "zeppelin.server.ZeppelinServer" | grep java | awk '{print $2}' | xargs sudo -u yarn kill -9
>
> If I do this every few days, Zeppelin keeps humming along pretty smoothly most of the time.
>
> Paul Brenner
> DATA SCIENTIST, PlaceIQ
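The pipeline above can be wrapped in a small helper so the grep/awk filtering can be checked before anything is killed. A sketch, assuming ps output is piped on stdin; the function name and the sample process lines are made up for illustration, and "2BSGYY7S8" is the interpreter id from the mail above:

```shell
#!/bin/sh
# Print PIDs of Zeppelin interpreter processes matching an interpreter id.
# Reads ps output on stdin, so the filter can be dry-run without killing anything.
zeppelin_interp_pids() {
  grep zeppelin | grep "$1" | grep java | awk '{print $2}'
}

# Dry run against two fabricated ps lines; only the first matches the id,
# the second is the main Zeppelin server and must survive.
printf '%s\n' \
  'zeppelin  4242  java ... 2BSGYY7S8 RemoteInterpreterServer' \
  'zeppelin  4243  java ... zeppelin.server.ZeppelinServer' \
  | zeppelin_interp_pids 2BSGYY7S8
# prints: 4242
```

Once the dry run prints only the PIDs you expect, the destructive step is the same as in the mail: `ps aux | zeppelin_interp_pids 2BSGYY7S8 | xargs sudo -u yarn kill -9`.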
> On Fri, Feb 17, 2017 at 3:23 PM "xyun...@simuwell.com" <xyun...@simuwell.com> wrote:
>
> The problem could be not only the resources but the session. If you run a chunk of Spark code, you should see a running application in the Spark UI; but if your code shuts the session down after the job is finished, the Spark UI will show the job as finished. Within Zeppelin, each notebook starts the Spark session only once (different interpreter modes can be set depending on whether you want notebooks to share the session or not); if you close it, it will never be restarted. The only way to get the same code to work again is to restart the interpreter or restart Zeppelin. I'm not sure if I explained this clearly, but I hope it helps.
>
> From: Paul Brenner <pbren...@placeiq.com>
> Date: 2017-02-17 12:14
> To: users <users@zeppelin.apache.org>
> Subject: Re: Re: Zeppelin unable to respond after some time
>
> I've definitely had this problem with jobs that don't take all the resources on the cluster.
> Also, my experience matches what others have reported: just restarting Zeppelin and re-running the stuck paragraph solves the issue.
>
> I've also experienced this problem with for loops. Some for loops which write to disk, but absolutely don't have any variables that are increasing in size, will hang in Zeppelin. If I run the exact same code in the Scala REPL it goes through without problem.
> On Fri, Feb 17, 2017 at 2:12 PM "xyun...@simuwell.com" <xyun...@simuwell.com> wrote:
>
> I have solved a similar issue before. Check the Spark UI and you will probably see that a single job is taking all the resources, so further jobs submitted to the same cluster just hang. When you restart Zeppelin, the old job is killed and all the resources it took are released.
>
> xyun...@simuwell.com
>
> From: RUSHIKESH RAUT <rushikeshraut...@gmail.com>
> Date: 2017-02-17 02:29
> To: users <users@zeppelin.apache.org>
> Subject: Re: Zeppelin unable to respond after some time
>
> Yes, it happens with R and Spark code frequently.
>
> On Feb 17, 2017 3:25 PM, "小野圭二" <onoke...@gmail.com> wrote:
>
> Yes, almost every time. There are no special operations; I just run the tutorial demos. From my experience, it happens in the R demo frequently.
>
> 2017-02-17 18:50 GMT+09:00 Jeff Zhang <zjf...@gmail.com>:
>
> Is it easy to reproduce it?
>
> 小野圭二 <onoke...@gmail.com> wrote on Fri, Feb 17, 2017 at 5:47 PM:
>
> I am facing the same issue now.
>
> 2017-02-17 18:25 GMT+09:00 RUSHIKESH RAUT <rushikeshraut...@gmail.com>:
>
> Hi all,
>
> I am facing an issue while using Zeppelin. I am trying to load some data (not that big) into Zeppelin and then build some visualizations on it.
> The problem is that when I run the code the first time it works, but after some time the same code doesn't work. It remains in the running state in the GUI, but no logs are generated in the Zeppelin logs, and all further tasks hang in the pending state.
>
> As soon as I restart Zeppelin it works again, so I am guessing it's a memory issue. I have read that Zeppelin stores data in memory, so it is possible that it runs out of memory after some time.
>
> How do I debug this issue? How much memory does Zeppelin take by default at start? Is there any way I can run Zeppelin with a specified amount of memory, so that I can start the process with more? It doesn't make sense to restart Zeppelin every half hour.
>
> Thanks,
> Rushikesh Raut
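On the memory question: the Zeppelin daemon reads its JVM options from conf/zeppelin-env.sh, so the heap for both the server and the interpreter processes can be raised there, and the same file is one place to pass Spark options addressing the "one job takes all the resources" symptom discussed earlier. A sketch of that file, not a definitive recommendation: the sizes are illustrative, and the dynamic-allocation lines assume a YARN cluster with the external shuffle service enabled.

```shell
# conf/zeppelin-env.sh — illustrative values, tune for your cluster.

# Heap for the Zeppelin server itself:
export ZEPPELIN_MEM="-Xms1024m -Xmx4096m -XX:MaxMetaspaceSize=512m"

# Heap for each interpreter process (Spark, R, ...):
export ZEPPELIN_INTP_MEM="-Xms1024m -Xmx4096m"

# Let idle executors be released back to the cluster instead of one
# long-lived Zeppelin session holding every executor:
export SPARK_SUBMIT_OPTIONS="--conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=10"
```

After editing the file, restart with `zeppelin-daemon.sh restart` for the settings to take effect.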