If you look at the code for HdfsWordCount, you will see that it calls
print(), which by default prints only the first 10 elements of each RDD.
If you are going by the console output alone, it was never expected to
show all of the words to begin with.
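
In case it helps, here is a minimal sketch of two ways to see every word
count rather than only the first 10. It is not the shipped example: the
object name HdfsWordCountAll, the 2-second batch interval, and the second
argument (an output path prefix) are assumptions made for illustration. It
uses only foreachRDD and saveAsTextFiles, which are part of the DStream API.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Brings the pair-DStream operations (reduceByKey) into scope on Spark 1.x
// releases before 1.3; later releases pick them up without this import.
import org.apache.spark.streaming.StreamingContext._

// Hypothetical variant of the word count example; names are illustrative.
object HdfsWordCountAll {
  def main(args: Array[String]): Unit = {
    // args(0): directory to monitor, args(1): output path prefix (assumed)
    val conf = new SparkConf().setAppName("HdfsWordCountAll")
    val ssc = new StreamingContext(conf, Seconds(2))

    val lines = ssc.textFileStream(args(0))
    val wordCounts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Shows only the first 10 elements of each batch on the console.
    wordCounts.print()

    // Option 1: pull the whole batch to the driver and print every pair.
    // Only sensible when the number of distinct words per batch is small.
    wordCounts.foreachRDD { (rdd, time) =>
      println(s"Time $time: ${rdd.count()} distinct words")
      rdd.collect().foreach(println)
    }

    // Option 2: write every batch to its own directory under the prefix.
    wordCounts.saveAsTextFiles(args(1))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Option 2 is usually the safer choice for a large input file, since nothing
has to fit in the driver's memory.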

On Wed, Sep 24, 2014 at 2:29 AM, SK <skrishna...@gmail.com> wrote:
>
> I execute it as follows:
>
> $SPARK_HOME/bin/spark-submit   --master <master url>  --class
> org.apache.spark.examples.streaming.HdfsWordCount
> target/scala-2.10/spark_stream_examples-assembly-1.0.jar  <hdfsdir>
>
> After I start the job, I add a new test file in hdfsdir. It is a large text
> file which I will not be able to copy here. But it  probably has at least
> 100 distinct words. But the streaming output has only about 5-6 words along
> with their counts as follows. I then stop the job after some time.
>
> Time ...
>
> (word1, cnt1)
> (word2, cnt2)
> (word3, cnt3)
> (word4, cnt4)
> (word5, cnt5)
>
> Time ...
>
> Time ...
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/HdfsWordCount-only-counts-some-of-the-words-tp14929p14967.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
