From my understanding, we should copy the file into another folder first
and move it into the source folder only after the copy has finished;
otherwise we will read half-copied data or hit the issue you mentioned
above.
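
For example, a minimal sketch of that stage-then-rename pattern using
the Hadoop FileSystem API (assuming HDFS, where a rename within one
filesystem is atomic; the paths below are placeholders for your layout):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())

// Stage the local file in a directory that Spark Streaming does not watch.
val staging = new Path("/user/hadoop/staging/file17.xml")
fs.copyFromLocalFile(new Path("file:///tmp/file17.xml"), staging)

// rename() is atomic within HDFS, so the file becomes visible in the
// watched directory only once it is complete.
val watched = new Path("/user/hadoop/input/file17.xml")
if (!fs.rename(staging, watched)) {
  sys.error(s"rename $staging -> $watched failed")
}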

On Wed, May 18, 2016 at 8:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> The following should handle the situation you encountered:
>
> diff --git
> a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> index ed93058..f79420b 100644
> ---
> a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> +++
> b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
> @@ -266,6 +266,10 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
>        logDebug(s"$pathStr already considered")
>        return false
>      }
> +    if (pathStr.endsWith("._COPYING_")) {
> +      logDebug(s"$pathStr is being copied")
> +      return false
> +    }
>      logDebug(s"$pathStr accepted with mod time $modTime")
>      return true
>    }
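>
> An alternative that avoids patching Spark is the fileStream overload
> that takes a path filter. A minimal sketch along those lines (the
> directory and batch interval are placeholders):
>
> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.io.{LongWritable, Text}
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
>
> val conf = new SparkConf().setAppName("CopySafeFileStream")
> val ssc = new StreamingContext(conf, Seconds(10))
>
> // Reject files that hdfs -put is still writing (._COPYING_ suffix).
> def notCopying(p: Path): Boolean = !p.getName.endsWith("._COPYING_")
>
> val lines = ssc
>   .fileStream[LongWritable, Text, TextInputFormat](
>     "/user/hadoop/input", notCopying _, newFilesOnly = true)
>   .map(_._2.toString)
>
> lines.print()
> ssc.start()
> ssc.awaitTermination()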
>
> On Wed, May 18, 2016 at 2:06 AM, Yogesh Vyas <informy...@gmail.com> wrote:
>
>> Hi,
>> I am trying to read files in a streaming way using Spark
>> Streaming. To do this, I am copying files from my local folder into
>> the source folder from which Spark reads them.
>> After reading and printing some of the files, it gives the following
>> error:
>>
>> Caused by:
>> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
>> File does not exist: /user/hadoop/file17.xml._COPYING_
>>
>> I guess Spark Streaming is trying to read the file before it has
>> been copied completely.
>>
>> Does anyone know how to handle this type of exception?
>>
>> Regards,
>> Yogesh
>>
>
