Piotr Nestorow created ZEPPELIN-2156: ----------------------------------------
Summary: Paragraph with PySpark streaming - running job cannot be canceled Key: ZEPPELIN-2156 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2156 Project: Zeppelin Issue Type: Bug Components: pySpark Affects Versions: 0.7.0 Environment: Linux Ubuntu Reporter: Piotr Nestorow In a 'spark.pyspark' paragraph a StreamingContext to a Kafka stream is created. The paragraph is started and while the job is running the spark context produces correct output from the code. The problem is the job cannot be stopped in the Zeppelin web interface. Installed Kafka version: kafka_2.11-0.8.2.2 Spark Kafka jar: spark-streaming-kafka-0-8_2.11-2.1.0.jar Zeppelin: zeppelin-0.7.0-bin-all Tried: 1. Paragraph Cancel ( || button ) has no effect. 2. Zeppelin Job view Stop All has no effect 3. Another paragraph with %spark.pyspark ssc.stop(stopSparkContext=false, stopGracefully=true) is started by stays in 'Pending' 4. Restarting the 'spark' interpreter stops the job The example logic: %spark.pyspark import sys import json from pyspark import SparkContext from pyspark.streaming import StreamingContext(II()) from pyspark.streaming.kafka import KafkaUtils zkQuorum, topic, interval = ('localhost:2181', 'airport', 60) ssc = StreamingContext(sc, interval) kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1}) parsed = kvs.map(lambda (k, v): json.loads(v)) summed = parsed.\ filter(lambda event: 'kind' in event and event['kind']=='gate').\ map(lambda event: ('count_all', int(event['value']['passengers']))).\ reduceByKey(lambda x,y: x + y).\ map(lambda x: {'sum': x[0], "passengers": x[1]}) summed.pprint() ssc.start() ssc.awaitTermination() -- This message was sent by Atlassian JIRA (v6.3.15#6346)