The foreachPartition callback is provided with an Iterator by the Spark framework – while iterator.hasNext() ……
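In Python there is no hasNext() on the partition iterator, so the equivalent is to try pulling the first record before opening the connection. A minimal sketch, reusing the placeholder createNewConnection() from the docs example quoted below:

    def sendPartition(iter):
        # Pull the first record up front; an empty partition raises
        # StopIteration and we return without ever opening a connection.
        try:
            first = next(iter)
        except StopIteration:
            return
        connection = createNewConnection()
        connection.send(first)
        # Iterating the same iterator continues from the second record.
        for record in iter:
            connection.send(record)
        connection.close()

    dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))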
Also check whether this is not some sort of Python Spark API bug – Python seems to be the foster child here – Scala and Java are the darlings.

From: John Omernik [mailto:j...@omernik.com]
Sent: Friday, June 5, 2015 4:08 PM
To: user
Subject: Spark Streaming for Each RDD - Exception on Empty

Is there a pythonic/sparkonic way to test for an empty RDD before using foreachRDD? Basically I am using the Python example from https://spark.apache.org/docs/latest/streaming-programming-guide.html to "put records somewhere". When I have data it works fine; when I don't, I get an exception. I am not sure about the performance implications of just throwing an exception every time there is no data, but can I just test before sending it?

I did see one post mentioning take(1) on the stream to test for data, but I am not sure where I would put that in this example... Is that in the lambda function, or somewhere else? Looking for pointers! Thanks!

mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))

Using this example code from the link above:

def sendPartition(iter):
    connection = createNewConnection()
    for record in iter:
        connection.send(record)
    connection.close()

dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
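Re: the take(1) idea – one place that guard could go, as a sketch (assuming the sendPartition function above): use a named function instead of the lambda and test the RDD before touching its partitions. Note that take(1) launches a small Spark job of its own, so the check is not free; if your Spark version has rdd.isEmpty(), that does essentially the same thing.

    def process(rdd):
        # take(1) returns an empty list for an empty RDD, so this skips
        # the write entirely for empty micro-batches.
        if rdd.take(1):
            rdd.foreachPartition(sendPartition)

    mydstream.foreachRDD(process)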