[ https://issues.apache.org/jira/browse/LIVY-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897280#comment-16897280 ]
Jeffrey E Rodriguez commented on LIVY-541:
-------------------------------------------

Is this still being worked on?

> Multiple Livy servers submitting to Yarn results in LivyException: Session is
> finished ... No YARN application is found with tag livy-session-197-uveqmqyj
> in 300 seconds. Please check your cluster status, it is may be very busy
> -----------------------------------------------------------------------------
>
>                 Key: LIVY-541
>                 URL: https://issues.apache.org/jira/browse/LIVY-541
>             Project: Livy
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 0.5.0
>         Environment: Hortonworks HDP 2.6
>            Reporter: Hari Sekhon
>            Priority: Critical
>
> It appears Livy doesn't differentiate sessions properly in Yarn causing
> errors when running multiple Livy servers behind a load balancer for HA /
> performance scaling on the same Hadoop cluster.
> Each livy server uses monotonically incrementing session IDs with a random
> suffix but it appears that the random suffix isn't passed to Yarn which
> results in the following errors on the Livy server which is further behind in
> session numbers because it appears to find the session with the same number
> has already finished (submitted earlier by a different user on another Livy
> server as seen in Yarn RM UI):
> {code:java}
> org.apache.zeppelin.livy.LivyException: Session 197 is finished, appId: null, log: [
>     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2887),
>     at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2904),
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266),
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142),
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617),
>     at java.lang.Thread.run(Thread.java:748),
> YARN Diagnostics: ,
> java.lang.Exception: No YARN application is found with tag livy-session-197-uveqmqyj in 300 seconds. Please check your cluster status, it is may be very busy.,
> org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182)
> org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239)
> org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236)
> scala.Option.getOrElse(Option.scala:120)
> org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236)
> org.apache.livy.Utils$$anon$1.run(Utils.scala:94)]
>     at org.apache.zeppelin.livy.BaseLivyInterpreter.createSession(BaseLivyInterpreter.java:300)
>     at org.apache.zeppelin.livy.BaseLivyInterpreter.initLivySession(BaseLivyInterpreter.java:184)
>     at org.apache.zeppelin.livy.LivySharedInterpreter.open(LivySharedInterpreter.java:57)
>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>     at org.apache.zeppelin.livy.BaseLivyInterpreter.getLivySharedInterpreter(BaseLivyInterpreter.java:165)
>     at org.apache.zeppelin.livy.BaseLivyInterpreter.open(BaseLivyInterpreter.java:139)
>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
>     at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
>     at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)