How to check whether the RDD is empty or not

2015-10-21 Thread diplomatic Guru
Hello all, I have a Spark Streaming job that should perform an action only if the RDD is not empty. This is easy with a Spark batch RDD, since I can call .take(1) and check whether the result is empty. But I cannot do the same with a Spark Streaming DStream (JavaPairInputDStream).

Re: How to check whether the RDD is empty or not

2015-10-21 Thread diplomatic Guru
I tried the code below, but it still carries out the action even when there is no new data.

    JavaPairInputDStream input = ssc.fileStream(iFolder, LongWritable.class, Text.class, TextInputFormat.class);
    if (input != null) {
        // do some action if it is not empty
    }

On 21 October 2015 at

Re: How to check whether the RDD is empty or not

2015-10-21 Thread Tathagata Das
What do you mean by checking whether a "DStream is empty"? A DStream represents an endless stream of data, so checking at some point in time whether it is empty does not make sense. FYI, there is RDD.isEmpty().

On Wed, Oct 21, 2015 at 10:03 AM, diplomatic Guru wrote:
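The following is a minimal sketch of RDD.isEmpty() on a plain (batch) RDD. It assumes Spark 1.3 or later, where isEmpty() is available on the Java RDD API, and uses a local master purely for illustration. The class name RddEmptinessCheck and the helper hasData are hypothetical. The last line shows the .take(1) idiom from the original question for comparison.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddEmptinessCheck {

    // Hypothetical helper: true when the RDD holds at least one element.
    // isEmpty() only needs to find one element, so it is usually
    // cheaper than comparing count() with zero.
    static boolean hasData(JavaRDD<?> rdd) {
        return !rdd.isEmpty();
    }

    public static void main(String[] args) {
        // Local master chosen only so the sketch runs standalone.
        SparkConf conf = new SparkConf().setAppName("RddEmptinessCheck").setMaster("local[1]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            JavaRDD<String> empty = sc.emptyRDD();
            JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b"));

            System.out.println(hasData(empty));          // false
            System.out.println(hasData(lines));          // true
            // The older idiom: take(1) and test the returned list.
            System.out.println(lines.take(1).isEmpty()); // false
        } finally {
            sc.stop();
        }
    }
}
```

This only answers "is this one RDD empty?". As the replies point out, in a streaming job the check has to be applied to each interval's RDD rather than to the DStream.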

Re: How to check whether the RDD is empty or not

2015-10-21 Thread Gerard Maas
As TD mentions, there's no such thing as an 'empty DStream'. Some intervals of a DStream can be empty, in which case the RDD for that interval will be empty. This means you should express the condition in terms of the RDDs of the DStream. Translated into code:

    dstream.foreachRDD{ rdd => if
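Gerard's snippet is Scala and is cut off here. The following is a rough Java sketch of the same per-RDD idea, applied to the fileStream from diplomatic Guru's earlier attempt. It assumes Spark 1.6 or later, where the Java foreachRDD accepts a VoidFunction, so a lambda without a return value compiles. The 60-second batch interval is an arbitrary choice. The class name SkipEmptyBatches and the helper shouldAct are hypothetical, and the monitored directory is taken from args[0].

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SkipEmptyBatches {

    // Hypothetical helper: act only if this interval's RDD has records.
    static boolean shouldAct(JavaPairRDD<?, ?> rdd) {
        return !rdd.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        String iFolder = args[0]; // directory to monitor (assumed CLI argument)
        SparkConf conf = new SparkConf().setAppName("SkipEmptyBatches");
        // Assumed batch interval of 60 seconds.
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(60));

        JavaPairInputDStream<LongWritable, Text> input =
            ssc.fileStream(iFolder, LongWritable.class, Text.class, TextInputFormat.class);

        // The DStream object itself is never null, so test each batch's RDD instead.
        input.foreachRDD(rdd -> {
            if (shouldAct(rdd)) {
                // do some action only when this batch contains new data
                System.out.println("New data arrived in this batch");
            }
        });

        ssc.start();
        ssc.awaitTermination();
    }
}
```

The function passed to foreachRDD runs on the driver once per batch interval, including intervals in which no new files arrived. For those intervals, isEmpty() returns true and the action is skipped.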

Re: How to check whether the RDD is empty or not

2015-10-21 Thread diplomatic Guru
Tathagata, thank you for the response. I have two receivers in my Spark Streaming job: one reads an endless stream of data from Flume and the other reads data from an HDFS directory. However, files are not moved into HDFS frequently (say, once every 10 minutes). This is where I need to