Did you move the file into "hdfs://helmhdfs/user/patcharee/cerdata/", or
write into it directly? `textFileStream` requires that files must be
written to the monitored directory by "moving" them from another location
within the same file system.

On Mon, Jan 25, 2016 at 6:30 AM, patcharee <patcharee.thong...@uni.no>
wrote:

> Hi,
>
> My streaming application is receiving data from file system and just
> prints the input count every 1 sec interval, as the code below:
>
> val sparkConf = new SparkConf()
> val ssc = new StreamingContext(sparkConf, Milliseconds(interval_ms))
> val lines = ssc.textFileStream(args(0))
> lines.count().print()
>
> The problem is sometimes the data received from scc.textFileStream is ONLY
> ONE line. But in fact there are multiple lines in the new file found in
> that interval. See log below which shows three intervals. In the 2nd
> interval, the new file is:
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt. This file
> contains 6288 lines. The ssc.textFileStream returns ONLY ONE line (the
> header).
>
> Any ideas/suggestions what the problem is?
>
>
> -----------------------------------------------------------------------------------------
> SPARK LOG
>
> -----------------------------------------------------------------------------------------
>
> 16/01/25 15:11:11 INFO FileInputDStream: Cleared 1 old files that were
> older than 1453731011000 ms: 1453731010000 ms
> 16/01/25 15:11:11 INFO FileInputDStream: Cleared 0 old files that were
> older than 1453731011000 ms:
> 16/01/25 15:11:12 INFO FileInputDStream: Finding new files took 4 ms
> 16/01/25 15:11:12 INFO FileInputDStream: New files at time 1453731072000
> ms:
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19616.txt
> -------------------------------------------
> Time: 1453731072000 ms
> -------------------------------------------
> 6288
>
> 16/01/25 15:11:12 INFO FileInputDStream: Cleared 1 old files that were
> older than 1453731012000 ms: 1453731011000 ms
> 16/01/25 15:11:12 INFO FileInputDStream: Cleared 0 old files that were
> older than 1453731012000 ms:
> 16/01/25 15:11:13 INFO FileInputDStream: Finding new files took 4 ms
> 16/01/25 15:11:13 INFO FileInputDStream: New files at time 1453731073000
> ms:
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19617.txt
> -------------------------------------------
> Time: 1453731073000 ms
> -------------------------------------------
> 1
>
> 16/01/25 15:11:13 INFO FileInputDStream: Cleared 1 old files that were
> older than 1453731013000 ms: 1453731012000 ms
> 16/01/25 15:11:13 INFO FileInputDStream: Cleared 0 old files that were
> older than 1453731013000 ms:
> 16/01/25 15:11:14 INFO FileInputDStream: Finding new files took 3 ms
> 16/01/25 15:11:14 INFO FileInputDStream: New files at time 1453731074000
> ms:
> hdfs://helmhdfs/user/patcharee/cerdata/datetime_19618.txt
> -------------------------------------------
> Time: 1453731074000 ms
> -------------------------------------------
> 6288
>
>
> Thanks,
> Patcharee
>

Reply via email to