From my understanding, we should copy the file into another folder and move it into the source folder only after the copy has finished; otherwise we will read half-copied data or hit the issue you mentioned.
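A minimal sketch of that stage-then-rename approach using the Hadoop FileSystem API (the local file and the staging and source directories here are hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object StageThenMove {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    // Copy into a staging directory that the stream does not watch;
    // the copy itself is not atomic, so it must not happen in place.
    val staged = new Path("/user/hadoop/staging/file17.xml")
    fs.copyFromLocalFile(new Path("file:///data/file17.xml"), staged)
    // A rename within the same HDFS filesystem is atomic, so the
    // stream either sees the complete file or no file at all.
    fs.rename(staged, new Path("/user/hadoop/source/file17.xml"))
  }
}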
On Wed, May 18, 2016 at 8:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> The following should handle the situation you encountered:
>
> diff --git a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> index ed93058..f79420b 100644
> --- a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> +++ b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> @@ -266,6 +266,10 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
>        logDebug(s"$pathStr already considered")
>        return false
>      }
> +    if (pathStr.endsWith("._COPYING_")) {
> +      logDebug(s"$pathStr is being copied")
> +      return false
> +    }
>      logDebug(s"$pathStr accepted with mod time $modTime")
>      return true
>    }
>
> On Wed, May 18, 2016 at 2:06 AM, Yogesh Vyas <informy...@gmail.com> wrote:
>
>> Hi,
>> I am trying to read files in a streaming way using Spark Streaming.
>> For this, I am copying files from my local folder to the source
>> folder from which Spark reads the files.
>> After reading and printing some of the files, it gives the following
>> error:
>>
>> Caused by:
>> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
>> File does not exist: /user/hadoop/file17.xml._COPYING_
>>
>> I guess Spark Streaming is trying to read the file before it has
>> been copied completely.
>>
>> Does anyone know how to handle this type of exception?
>>
>> Regards,
>> Yogesh
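If patching Spark is not an option, the same "._COPYING_" exclusion can be applied from application code through the filter argument of StreamingContext.fileStream. A minimal sketch, assuming a text input; the watched directory, batch interval, and app name are placeholders:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SkipCopyingFiles {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("SkipCopyingFiles"), Seconds(10))
    // Accept only files whose copy has finished; the HDFS client
    // appends "._COPYING_" to files still being written by -put.
    val copyFinished = (path: Path) => !path.getName.endsWith("._COPYING_")
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        "/user/hadoop/source", copyFinished, newFilesOnly = true)
      .map(_._2.toString)
    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}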