That looks to be related to Cloudera. I'm sorry I can't provide any useful advice; I've not used that distro. It's probably best to ask on a Cloudera forum or user group, or to contact Cloudera support.
From: Serega Sheypak <serega.shey...@gmail.com>
Sent: Wednesday, 3 May 2017 3:11 AM
To: David Howell <david.how...@zipmoney.com.au>
Cc: users@zeppelin.apache.org
Subject: Re: Can't run simple example with scala and spark SQL. Some non obvious syntax error in SQL

Ah... sorry, my mistake. You are right. I moved

%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField

to the next paragraph and it works!

One more thing: what does this mean? The zeppelin-interpreter-spark log complains every second with:

WARN [2017-05-02 17:05:42,396] ({dispatcher-event-loop-16} ScriptBasedMapping.java[runResolveCommand]:254) - Exception running /etc/hadoop/conf.cloudera.yarn7/topology.py 17.134.172.218
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn7/topology.py" (in directory "/"): error=2, No such file or directory

Does Spark really need it? What am I supposed to do?

2017-05-02 15:16 GMT+02:00 David Howell <david.how...@zipmoney.com.au>:

Hi Serega,

I see this in the error log: "error: ';' expected but ',' found."

Are you running the %sql in the same paragraph as the %spark? I don't think that is supported. I think you have to move the %sql to a new paragraph; you can then run the Spark code and the SQL separately.

From: Serega Sheypak <serega.shey...@gmail.com>
Sent: Tuesday, 2 May 2017 9:58 PM
To: users@zeppelin.apache.org
Subject: Can't run simple example with scala and spark SQL.
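[Editor's note] For readers hitting the same parse error: Zeppelin binds one interpreter per paragraph, so when %spark (Scala) and %sql share a paragraph, the Scala compiler tries to parse the SQL text, which produces the "';' expected but ',' found" error. A minimal sketch of the working two-paragraph layout, reusing the code from the thread (the HDFS path is the poster's example):

```scala
// ── Zeppelin paragraph 1: Scala via the %spark interpreter ──
%spark
val linesText = sc.textFile("hdfs://cluster/user/me/lines.txt")
case class Line(id: Long, firstField: String, secondField: String)
val lines = linesText.map { line =>
  val splitted = line.split(" ")
  Line(splitted(0).toLong, splitted(1), splitted(2))
}
// Expose the RDD-backed DataFrame to the SQL interpreter
lines.toDF().registerTempTable("lines")

// ── Zeppelin paragraph 2: a SEPARATE paragraph for %sql ──
%sql
select firstField, secondField, count(1)
from lines
group by firstField, secondField
order by firstField, secondField
```

Run paragraph 1 first so the temp table exists, then run paragraph 2; this is a sketch of the layout, not runnable outside a Zeppelin notebook with a Spark interpreter configured.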
Some non obvious syntax error in SQL

Here is my sample notebook:

%spark
val linesText = sc.textFile("hdfs://cluster/user/me/lines.txt")
case class Line(id:Long, firstField:String, secondField:String)
val lines = linesText.map{ line =>
  val splitted = line.split(" ")
  println("splitted => " + splitted)
  Line(splitted(0).toLong, splitted(1), splitted(2))
}
lines.toDF().registerTempTable("lines")

%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField

1. I can see that the Spark job was started on my YARN cluster.

2. It failed; the UI shows an exception. I can't understand what I'm doing wrong:

%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField
^

3. There is suspicious output in the Zeppelin log:

INFO [2017-05-02 11:50:02,846] ({pool-2-thread-8} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1493724118696_868476558 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session1712472970
ERROR [2017-05-02 11:50:18,809] ({qtp1286783232-166} NotebookServer.java[onMessage]:380) - Can't handle message
java.lang.NullPointerException
  at org.apache.zeppelin.socket.NotebookServer.addNewParagraphIfLastParagraphIsExecuted(NotebookServer.java:1713)
  at org.apache.zeppelin.socket.NotebookServer.persistAndExecuteSingleParagraph(NotebookServer.java:1741)
  at org.apache.zeppelin.socket.NotebookServer.runAllParagraphs(NotebookServer.java:1641)
  at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:291)
  at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
  at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
  at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
  at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
  at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
  at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
  at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
  at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
  at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
  at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
  at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
  at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
  at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
  at java.lang.Thread.run(Thread.java:745)
INFO [2017-05-02 11:50:18,811] ({pool-2-thread-14} SchedulerFactory.java[jobStarted]:131) - Job paragraph_1493724118696_868476558 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session1712472970
INFO [2017-05-02 11:50:18,812] ({pool-2-thread-14} Paragraph.java[jobRun]:363) - run paragraph 20170502-112158_458502255 using spark org.apache.zeppelin.interpreter.LazyOpenInterpreter@24ca4045
WARN [2017-05-02 11:50:19,810] ({pool-2-thread-14} NotebookServer.java[afterStatusChange]:2162) - Job 20170502-112158_458502255 is finished, status: ERROR, exception: null, result: %text
linesText: org.apache.spark.rdd.RDD[String] = hdfs://path/to/my/file.txt MapPartitionsRDD[21] at textFile at <console>:27
defined class Line
lines: org.apache.spark.rdd.RDD[Line] = MapPartitionsRDD[22] at map at <console>:31
warning:
there was one deprecation warning; re-run with -deprecation for details
<console>:1: error: ';' expected but ',' found.
%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField

4. The interpreter log is also confusing:

INFO [2017-05-02 11:41:52,706] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Warehouse location for Hive client (version 1.1.0) is file:/spark-warehouse
INFO [2017-05-02 11:41:52,983] ({pool-2-thread-2} PerfLogger.java[PerfLogBegin]:122) - <PERFLOG method=create_database from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
INFO [2017-05-02 11:41:52,984] ({pool-2-thread-2} HiveMetaStore.java[logInfo]:795) - 0: create_database: Database(name:default, description:default database, locationUri:file:/spark-warehouse, parameters:{})
INFO [2017-05-02 11:41:52,984] ({pool-2-thread-2} HiveMetaStore.java[logAuditEvent]:388) - ugi=zblenessy ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:file:/spark-warehouse, parameters:{})
ERROR [2017-05-02 11:41:52,992] ({pool-2-thread-2} RetryingHMSHandler.java[invokeInternal]:189) - AlreadyExistsException(message:Database default already exists)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:944)

What should I do about the failed metastore DB creation? Is it fine? I'm stuck; can you give me some ideas, please?
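[Editor's note] An aside on the "one deprecation warning" in the interpreter output above: assuming this is Spark 2.x (the Hive-client warehouse log lines are consistent with that), the warning likely comes from registerTempTable, which Spark 2 deprecates in favor of createOrReplaceTempView. The code compiles either way; the replacement merely silences the warning. A hedged sketch of the same paragraph with the non-deprecated call:

```scala
%spark
// Same pipeline as in the question; only the final call changes.
case class Line(id: Long, firstField: String, secondField: String)
val lines = sc.textFile("hdfs://cluster/user/me/lines.txt").map { line =>
  val splitted = line.split(" ")
  Line(splitted(0).toLong, splitted(1), splitted(2))
}
// Spark 2.x replacement for the deprecated registerTempTable("lines")
lines.toDF().createOrReplaceTempView("lines")
```

A subsequent %sql paragraph can query the "lines" view exactly as before; this assumes a Zeppelin Spark 2 interpreter and is not runnable standalone.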