Piotr Nestorow created ZEPPELIN-2156:
----------------------------------------

             Summary: Paragraph with PySpark streaming - running job cannot be 
canceled
                 Key: ZEPPELIN-2156
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2156
             Project: Zeppelin
          Issue Type: Bug
          Components: pySpark
    Affects Versions: 0.7.0
         Environment: Linux Ubuntu
            Reporter: Piotr Nestorow


In a  'spark.pyspark' paragraph a StreamingContext to a Kafka stream is 
created. 

The paragraph is started and while the job is running the spark context 
produces correct output from the code.

The problem is the job cannot be stopped in the Zeppelin web interface.

Installed Kafka version: kafka_2.11-0.8.2.2
Spark Kafka jar: spark-streaming-kafka-0-8_2.11-2.1.0.jar
Zeppelin: zeppelin-0.7.0-bin-all

Tried:
1. Paragraph Cancel ( || button ) has no effect.

2. Zeppelin Job view Stop All has no effect

3. Another paragraph with 
%spark.pyspark
ssc.stop(stopSparkContext=false, stopGracefully=true)

is started by stays in 'Pending'

4. Restarting the 'spark' interpreter stops the job


The example logic:

%spark.pyspark
import sys
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext(II())
from pyspark.streaming.kafka import KafkaUtils

zkQuorum, topic, interval = ('localhost:2181', 'airport', 60)

ssc = StreamingContext(sc, interval)

kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", 
{topic: 1})

parsed = kvs.map(lambda (k, v): json.loads(v))
summed = parsed.\
                filter(lambda event: 'kind' in event and 
event['kind']=='gate').\
                map(lambda event: ('count_all', 
int(event['value']['passengers']))).\
                reduceByKey(lambda x,y: x + y).\
                map(lambda x: {'sum': x[0], "passengers": x[1]})
summed.pprint()

ssc.start()
ssc.awaitTermination()



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to