I guess this is because the example is stateless: it outputs counts only for
the RDD of the current batch, not a running total across batches. Take a look
at the stateful word counter, StatefulNetworkWordCount.scala.
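
The key difference is the update function passed to updateStateByKey, which
folds each batch's counts into state carried across batches. A minimal sketch
of that function (the DStream wiring in the comment is assumed from the
StatefulNetworkWordCount example, not reproduced here):

```scala
// Update function for updateStateByKey: merge the counts seen in the
// current batch (newValues) into the running count kept as state.
val updateFunc = (newValues: Seq[Int], runningCount: Option[Int]) => {
  Some(newValues.sum + runningCount.getOrElse(0))
}

// In the streaming job this would be wired up roughly as (assumed):
//   val stateDstream = wordDstream.updateStateByKey[Int](updateFunc)
// Note that stateful streaming also requires ssc.checkpoint(<dir>).
```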

On Wed, Sep 24, 2014 at 4:29 AM, SK <skrishna...@gmail.com> wrote:

>
> I execute it as follows:
>
> $SPARK_HOME/bin/spark-submit   --master <master url>  --class
> org.apache.spark.examples.streaming.HdfsWordCount
> target/scala-2.10/spark_stream_examples-assembly-1.0.jar  <hdfsdir>
>
> After I start the job, I add a new test file in hdfsdir. It is a large text
> file which I will not be able to copy here, but it probably has at least
> 100 distinct words. However, the streaming output has only about 5-6 words,
> along with their counts, as follows. I then stop the job after some time.
>
> Time ...
>
> (word1, cnt1)
> (word2, cnt2)
> (word3, cnt3)
> (word4, cnt4)
> (word5, cnt5)
>
> Time ...
>
> Time ...
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/HdfsWordCount-only-counts-some-of-the-words-tp14929p14967.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
