I have a basic Spark Streaming job that watches a folder, processes any new file, and updates a column family in Cassandra using the new cassandra-spark-driver.
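For reference, the job is essentially shaped like the minimal sketch below (illustrative only, not my exact code: the folder path, keyspace, table, and column names are placeholders, and I'm assuming the DStream `saveToCassandra` helper from the connector's streaming package):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._

object FolderToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("FolderToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed local Cassandra

    val ssc = new StreamingContext(conf, Seconds(10))

    // Watch the folder; textFileStream should only pick up files that
    // appear after the job starts.
    val lines = ssc.textFileStream("/path/to/watched/folder")

    // Write each line into a (hypothetical) single-column table.
    lines.map(Tuple1(_)).saveToCassandra("my_keyspace", "my_table")

    ssc.start()
    ssc.awaitTermination()
  }
}
```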
I think there is a problem with StreamingContext.textFileStream. If I start my job in local mode with no files in the watched folder and then copy a bunch of files into it, sometimes Spark keeps processing those files again and again.

I have noticed that it usually happens when Spark doesn't detect all the new files in one go. For example, I copied 6 files and Spark detected 3 of them as new and processed them; then it detected the other 3 as new and processed them. After it finished processing all 6 files, it detected the first 3 files as new again and reprocessed them... then the other 3... and again... and again... and again.

Should I raise a JIRA issue?

Regards, Luis