Hi Jamal,

I took your input, ran it through a sample wordcount program, and it works just fine, giving this output:
author 3
foo234 1
text 3
foo 1
foo123 1
hello 3
this 1
world 2

When we split using String[] words = input.split("\\W+"); it takes care of all non-alphanumeric characters.

Thanks and Regards,
Rishi Yadav

On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalsha...@gmail.com> wrote:

> Hi,
> I am stuck again. :(
> My input data is in HDFS. I am trying to do wordcount again, but with a
> slight difference: the data is in JSON format. Each line of data looks
> like:
>
> {"author":"foo", "text": "hello"}
> {"author":"foo123", "text": "hello world"}
> {"author":"foo234", "text": "hello this world"}
>
> I want to do wordcount on the "text" part only.
> I understand that in the mapper I just have to parse each line as JSON,
> extract "text", and the rest of the code stays the same, but I am trying
> to switch from Python to Java for Hadoop.
> How do I do this?
> Thanks
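For the extract-"text"-then-count step Jamal asked about, here is a minimal plain-Java sketch of the mapper-side logic. It runs standalone (no Hadoop dependencies) and pulls the "text" field out with a simple regex for illustration; in a real job you would do the extraction inside the Mapper, and a proper JSON library (e.g. Jackson or org.json) would be safer than a regex. The class and helper names here are made up for the example:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextWordCount {

    // Hypothetical helper: extract the value of the "text" field from one
    // JSON line. A real mapper should use a JSON parser instead of a regex.
    static String extractText(String json) {
        Matcher m = Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // Sample input lines from the original question.
        String[] lines = {
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
        };

        // In Hadoop the framework does the grouping and summing between the
        // map and reduce phases; here a TreeMap stands in for that.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Same tokenization as in the reply above: split on non-word chars.
            for (String word : extractText(line).split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        counts.forEach((w, c) -> System.out.println(w + " " + c));
        // Prints:
        // hello 3
        // this 1
        // world 2
    }
}
```

Because only the "text" values are tokenized, the author names and JSON keys no longer show up in the counts, unlike the output above where the whole line was split.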