I have figured it out in the meantime. When a file is moved on HDFS it keeps its original timestamp, and the Spark fileStream adapter seems to care as much about timestamps as about filenames. Hence NEW files with OLD timestamps will NOT be processed. Yuk.
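To illustrate the timestamp effect, here is a minimal sketch on a local filesystem. It is an analogy only, not Spark or HDFS code: `is_new` is a hypothetical stand-in for a monitor that accepts only recently modified files, and the directory names are made up. It compares a plain move (old modification time kept) with copying to a staging location first and then moving (fresh modification time).

```python
import os
import shutil
import tempfile
import time


def is_new(path, threshold):
    """Hypothetical monitor check: accept only files modified at or after threshold."""
    return os.path.getmtime(path) >= threshold


def stage_then_move(src, staging_dir, watched_dir):
    """Copy src into staging_dir, then rename the copy into watched_dir."""
    name = os.path.basename(src)
    staged = os.path.join(staging_dir, name)
    # copyfile writes a new file, so its modification time is "now"
    shutil.copyfile(src, staged)
    dest = os.path.join(watched_dir, name)
    # a rename keeps the modification time of the staged copy
    os.replace(staged, dest)
    return dest


def demo():
    root = tempfile.mkdtemp()
    inbox, staging, watched = (
        os.path.join(root, d) for d in ("inbox", "staging", "watched")
    )
    for d in (inbox, staging, watched):
        os.mkdir(d)

    # Create two files and backdate their timestamps by one hour.
    old = time.time() - 3600
    for name in ("a.txt", "b.txt"):
        path = os.path.join(inbox, name)
        with open(path, "w") as f:
            f.write("data\n")
        os.utime(path, (old, old))

    # Assumed rule for this sketch: "new" means modified within the last minute.
    threshold = time.time() - 60

    # Plain move: the file is new to the watched dir but keeps its old timestamp.
    moved = os.path.join(watched, "a.txt")
    os.replace(os.path.join(inbox, "a.txt"), moved)

    # Copy to staging, then move: the file arrives with a recent timestamp.
    staged = stage_then_move(os.path.join(inbox, "b.txt"), staging, watched)

    result = {"moved": is_new(moved, threshold), "staged": is_new(staged, threshold)}
    shutil.rmtree(root)
    return result


if __name__ == "__main__":
    print(demo())
```

In this sketch the staging directory sits outside the watched directory, so the monitor never sees a half-written copy. On HDFS the analogous steps would presumably be `hdfs dfs -cp` followed by `hdfs dfs -mv`, but I have not reproduced the actual fileStream logic here.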
The hack you can use is to a) copy the required file to a temp location and then b) move it from there to the directory monitored by Spark fileStream. The copy gets a fresh timestamp, which the move then preserves, so the file arrives in the monitored directory with a recent timestamp.

-----Original Message-----
From: Evo Eftimov [mailto:evo.efti...@isecc.com]
Sent: Saturday, May 2, 2015 5:09 PM
To: user@spark.apache.org
Subject: spark filestream problem

It seems that on Spark Streaming 1.2 the fileStream API may have a bug: it doesn't detect new files when they are moved or renamed on HDFS, only when they are copied. Copying, however, leads to a well-known problem with .tmp files, which get removed and make Spark Streaming fileStream throw an exception.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-filestream-problem-tp22743.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org