> Am I correct in understanding then that Aggregate WordCount and WordCount do the same thing, apart from the fact that the Aggregate WordCount example uses the Aggregate framework of Hadoop?

That's what I feel, and the output of both is the same as well. The descriptions of both also seem to say so:
*aggregatewordcount*: An Aggregate based map/reduce program that counts the words in the input files.

*wordcount*: A map/reduce program that counts the words in the input files.

BTW, I have created a Jira and raised a PR for this:
https://issues.apache.org/jira/browse/MAPREDUCE-7376
Once it gets reviewed, you can try patching it or wait for the 3.4.0 release (not anytime soon).

Thanks,
-Ayush

On Tue, 3 May 2022 at 00:12, Pratyush Das <reik...@gmail.com> wrote:

> Thanks!
>
> Am I correct in understanding then that Aggregate WordCount and WordCount
> do the same thing, apart from the fact that the Aggregate WordCount example
> uses the Aggregate framework of Hadoop? - as mentioned here in
> https://stackoverflow.com/questions/24105117/how-to-execute-aggreagatewordcount-example-in-hadoop-which-uses-hadoop-aggregate#comment37203837_24105117
>
> On Mon, 2 May 2022 at 13:16, Ayush Saxena <ayush...@gmail.com> wrote:
>
>> Hi,
>> I tried it too and got a similar output. It looks like a bug in the
>> code, although that code seems to have been there since the stone age...
>> I tried a fix: a "." (period) was missing when setting the conf, but
>> when retrieving it we were looking it up with the period.
>> I have put the code here:
>>
>> https://github.com/ayushtkn/hadoop/commit/ab7da425e204903e867855b05b7c8fc2fbdd8b0e
>>
>> I patched it on top of trunk and tried it locally for your use case;
>> after that the output seems correct. I will check and raise a MAPRED Jira
>> to fix it. If it gets reviewed and committed, you can either patch your
>> Hadoop distro or wait for the next release, which would contain the fix.
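The key mismatch described in the quoted fix can be illustrated with a minimal, self-contained Java sketch. It does not use Hadoop: a plain `HashMap` stands in for the job configuration, and the prefix `aggregate.descriptor`, the helper methods, and the spec string `UserDefined,WordCountPlugIn` are hypothetical placeholders, not the actual constants or code touched by the patch. It only shows why a value written under `prefix + i` cannot be read back under `prefix + "." + i`.

```java
import java.util.HashMap;
import java.util.Map;

class DescriptorKeyMismatch {
    // Hypothetical key prefix; the real Hadoop constant is not shown in this thread.
    static final String DESCRIPTOR = "aggregate.descriptor";

    // Writer as described in the thread: the "." separator is missing,
    // so index 0 is stored under "aggregate.descriptor0".
    static void setDescriptorBuggy(Map<String, String> conf, int i, String spec) {
        conf.put(DESCRIPTOR + i, spec);
    }

    // Writer after the fix: index 0 is stored under "aggregate.descriptor.0".
    static void setDescriptorFixed(Map<String, String> conf, int i, String spec) {
        conf.put(DESCRIPTOR + "." + i, spec);
    }

    // Reader: always looks the key up with the "." separator.
    static String getDescriptor(Map<String, String> conf, int i) {
        return conf.get(DESCRIPTOR + "." + i);
    }

    public static void main(String[] args) {
        // Hypothetical descriptor spec, used only as an opaque value here.
        String spec = "UserDefined,WordCountPlugIn";

        Map<String, String> buggy = new HashMap<>();
        setDescriptorBuggy(buggy, 0, spec);
        System.out.println("buggy: " + getDescriptor(buggy, 0)); // lookup misses

        Map<String, String> fixed = new HashMap<>();
        setDescriptorFixed(fixed, 0, spec);
        System.out.println("fixed: " + getDescriptor(fixed, 0)); // lookup hits
    }
}
```

Running it prints `buggy: null` followed by `fixed: UserDefined,WordCountPlugIn`. A lookup that silently returns nothing means the reader never sees the configured descriptor, which would be consistent with the job skipping the word-count logic and emitting only `record_count`.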
>>
>> hadoop-3.4.0-SNAPSHOT % bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar aggregatewordcount /testData /testOut 1 textinputformat
>>
>> hadoop-3.4.0-SNAPSHOT % bin/hdfs dfs -cat /testOut/part-r-00000
>>
>> Bye 1
>> Goodbye 1
>> Hadoop 2
>> Hello 2
>> World 2
>>
>> > Does this mean that Aggregate WordCount is merely counting the number
>> > of files in the input directory?
>>
>> Not in an ideal situation. The JavaDoc says: *It reads the text input
>> files, breaks each line into words and counts them. The output is a locally
>> sorted list of words and the count of how often they occurred.*
>>
>> On Mon, 2 May 2022 at 10:23, Pratyush Das <reik...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I had some questions about what the Aggregate Word Count example in the
>>> hadoop-mapreduce-examples-3.3.1.jar actually does.
>>>
>>> This is how I executed the AggregateWordCount example:
>>> hadoop jar hadoop-3.3.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar aggregatewordcount /examples-input/wordcount/ /examples-output/wordcount/ 1 textinputformat
>>>
>>> /examples-input/wordcount/ contains 2 files: wc01.txt and wc02.txt.
>>>
>>> These are the contents of wc01.txt:
>>> Hello World Bye World
>>>
>>> These are the contents of wc02.txt:
>>> Hello Hadoop Goodbye Hadoop
>>>
>>> The generated output file, /examples-output/wordcount/part-r-00000,
>>> contains the following line:
>>> record_count 2
>>>
>>> I tried adding another file, wc03.txt, which changed the content of the
>>> generated file to:
>>> record_count 3
>>>
>>> Does this mean that Aggregate WordCount is merely counting the number of
>>> files in the input directory?
>>>
>>> Regards,
>>>
>>> --
>>> Pratyush Das
>>
>
> --
> Pratyush Das
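To make the expected result concrete, here is a minimal single-process Java sketch (Java 9+ for `List.of`) of what the JavaDoc describes, applied to the contents of wc01.txt and wc02.txt. It is not the Hadoop Aggregate framework: splitting lines stands in for the map step, summing per word stands in for the reduce step, and a `TreeMap` keeps the words sorted like the part-r-00000 file. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalWordCount {
    // Counts whitespace-separated words across all lines.
    // A TreeMap keeps keys in sorted order, like the sorted reduce output.
    static Map<String, Long> count(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: break each line into words.
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) continue; // blank lines yield a single ""
                // "Reduce" step: add 1 for every occurrence of the word.
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Contents of wc01.txt and wc02.txt from the original question.
        List<String> lines = List.of(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop");
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

It prints Bye 1, Goodbye 1, Hadoop 2, Hello 2 and World 2 (tab-separated, one per line), the same counts as the patched run quoted above and clearly different from the single `record_count` line the unpatched example produced.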