Hi,
I tried it too and got similar output, so this looks like a bug in the
code, although that code seems to have been there since the stone age...
I tried a fix: it seems a "." (period) was missing from the key when
setting the conf, while on retrieval we were looking it up with the period.
I have put the code here:
https://github.com/ayushtkn/hadoop/commit/ab7da425e204903e867855b05b7c8fc2fbdd8b0e
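
To illustrate the kind of mismatch described above, here is a minimal sketch in plain Java. It uses java.util.Properties as a stand-in for the job configuration; the key prefix, the value and the method names are hypothetical and are not taken from the Hadoop source. The only point it shows is that a value written under "prefix" + i cannot be found by a reader that looks up "prefix" + "." + i.

```java
import java.util.Properties;

public class DescriptorKeyMismatch {
    // Hypothetical key prefix, for illustration only (not the real Hadoop key).
    static final String PREFIX = "example.aggregate.descriptor";

    // Writer that omits the period between the prefix and the index.
    static void setWithoutPeriod(Properties conf, int i, String value) {
        conf.setProperty(PREFIX + i, value);
    }

    // Writer that includes the period, matching what the reader expects.
    static void setWithPeriod(Properties conf, int i, String value) {
        conf.setProperty(PREFIX + "." + i, value);
    }

    // Reader that always looks the key up with the period.
    static String get(Properties conf, int i) {
        return conf.getProperty(PREFIX + "." + i);
    }

    public static void main(String[] args) {
        Properties broken = new Properties();
        setWithoutPeriod(broken, 0, "my.Descriptor");
        // The key was stored as "example.aggregate.descriptor0",
        // so the lookup with the period finds nothing.
        System.out.println("without period: " + get(broken, 0));

        Properties fixed = new Properties();
        setWithPeriod(fixed, 0, "my.Descriptor");
        System.out.println("with period:    " + get(fixed, 0));
    }
}
```

With the period missing on the write side the lookup returns null; once both sides agree on the key, the stored value comes back.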

I patched it on top of trunk and tried it locally for your use case; with
the patch the output looks correct. I will check and raise a MAPRED Jira for
the fix. If it gets reviewed and committed, you can either patch your Hadoop
distro or wait for the next release that contains the fix.

hadoop-3.4.0-SNAPSHOT % bin/hadoop jar \
    share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar \
    aggregatewordcount /testData /testOut 1 textinputformat


hadoop-3.4.0-SNAPSHOT % bin/hdfs dfs -cat /testOut/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2

> Does this mean that Aggregate WordCount is merely counting the number of
> files in the input directory?

Not when it works as intended. The JavaDoc says: *It reads the text input
files, breaks each line into words and counts them. The output is a locally
sorted list of words and the count of how often they occurred.*
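
As a rough sketch of what that JavaDoc describes, the plain-Java snippet below splits each line into words, counts them, and keeps the result sorted by word. It is not the MapReduce implementation; the class and method names are made up for illustration, and a TreeMap stands in for the sorted output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExpectedWordCount {
    // Breaks each line into whitespace-separated words and counts them.
    // TreeMap keeps the words in sorted order.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Contents of wc01.txt and wc02.txt from the original question.
        List<String> lines = Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two input files from the question, this gives the same five words and counts as the patched run above.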

On Mon, 2 May 2022 at 10:23, Pratyush Das <reik...@gmail.com> wrote:

> Hi,
>
> I had some questions about what the Aggregate Word Count example in the
> hadoop-mapreduce-examples-3.3.1.jar actually does.
>
> This is how I executed the AggregateWordCount example - hadoop jar
> hadoop-3.3.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar
> aggregatewordcount /examples-input/wordcount/ /examples-output/wordcount/ 1
> textinputformat
>
> /examples-input/wordcount/ contains 2 files - wc01.txt and wc02.txt.
>
> These are the contents of wc01.txt:
> Hello World Bye World
>
> These are the contents of wc02.txt:
> Hello Hadoop Goodbye Hadoop
>
> The generated output file - /examples-output/wordcount/part-r-00000
> contains the following line:
> record_count 2
>
> I tried adding another file - wc03.txt which changed the content of the
> generated file to:
> record_count 3
>
> Does this mean that Aggregate WordCount is merely counting the number of
> files in the input directory?
>
> Regards,
>
>
> --
> Pratyush Das
>
