Thanks for the reply!
I have solved the problem and found the cause: I was using the Master
node to upload the files to HDFS, and this probably took up a lot of the
Master's network resources. When I switched to uploading the files from
another computer that is not part of the cluster, I got the correct result.
A very crucial thing to remember when using a file stream is that files
must be written to the monitored directory atomically. That is, once the
file system shows a file in its listing, the file should not be appended to
or updated after that. Violating this often causes this kind of issue, as Spark
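One common way to get the atomicity described above is to write each file completely somewhere the streaming job is not watching, then rename it into the monitored directory in a single step. On HDFS that would mean uploading to a staging directory outside the monitored one and then moving the file in with `hdfs dfs -mv`. The sketch below shows the same pattern on a local filesystem only; the function name `publish_atomically` and the `staging_dir` layout are hypothetical, and it assumes the staging and monitored directories are on the same filesystem (otherwise `os.replace` is not atomic and may fail).

```python
import os
import tempfile


def publish_atomically(data: bytes, monitored_dir: str, name: str, staging_dir: str) -> str:
    """Hypothetical helper: write data to staging_dir, then rename it into monitored_dir.

    A listing of monitored_dir never shows a partially written file, because the
    file only appears there after it has been fully written and synced.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(monitored_dir, exist_ok=True)
    # Write the complete file in a directory the streaming job does NOT monitor.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        final_path = os.path.join(monitored_dir, name)
        # os.replace is an atomic rename when both paths are on the same filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the staged file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    watched = os.path.join(root, "incoming")
    staging = os.path.join(root, "_staging")
    for i in range(3):
        publish_atomically(b"fake png bytes %d" % i, watched, f"img{i}.png", staging)
    print(sorted(os.listdir(watched)))  # ['img0.png', 'img1.png', 'img2.png']
    print(os.listdir(staging))  # []
```

The key design choice is that the file is never opened for writing inside the monitored directory, so a job that lists the directory mid-upload sees either nothing or the finished file.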
When I put 200 PNG files into HDFS, I found that Spark Streaming detected all
200 files, but the sum of rdd.count() was less than 200, always between 130
and 170. I don't know why... Is this a bug?
PS: When I put the 200 files into HDFS before the streaming job starts, I get
the correct count and the right result.
Here is