Whatever you are trying to do should work. Here is the modified WordCount map:
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();
    JSONObject line_as_json = new JSONObject(line);
    String text = line_as_json.getString("text");
    StringTokenizer tokenizer = new StringTokenizer(text);
    while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
    }
}

Pramod N <http://atmachinelearner.blogspot.in>
Bruce Wayne of web
@machinelearner <https://twitter.com/machinelearner>

--

On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <rahul.rec....@gmail.com> wrote:

> Whatever you have mentioned, Jamal, should work. You can debug this.
>
> Thanks,
> Rahul
>
>
> On Thu, May 30, 2013 at 5:14 AM, jamal sasha <jamalsha...@gmail.com> wrote:
>
>> Hi,
>> For some reason, this has to be in Java :(
>> I am trying to use the org.json library, something like (in the mapper):
>>
>> JSONObject jsn = new JSONObject(value.toString());
>> String text = (String) jsn.get("text");
>> StringTokenizer itr = new StringTokenizer(text);
>>
>> But it's not working :(
>> It would be better to get this working properly, but I wouldn't mind using a hack as well :)
>>
>>
>> On Wed, May 29, 2013 at 4:30 PM, Michael Segel <michael_se...@hotmail.com> wrote:
>>
>>> Yeah,
>>> I have to agree w/ Russell. Pig is definitely the way to go on this.
>>>
>>> If you want to do it as a Java program you will have to do some work on
>>> the input string, but that too should be trivial.
>>> How formal do you want to go?
>>> Do you want to strip it down or just find the quote after the text part?
>>>
>>>
>>> On May 29, 2013, at 5:13 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>
>>> Seriously consider Pig (free answer, 4 LOC):
>>>
>>> my_data = LOAD 'my_data.json' USING
>>>     com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>>> words = FOREACH my_data GENERATE $0#'author' AS author,
>>>     FLATTEN(TOKENIZE($0#'text')) AS word;
>>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>>     COUNT_STAR(words) AS word_count;
>>> STORE word_counts INTO '/tmp/word_counts.txt';
>>>
>>> It will be faster than the Java you'll likely write.
>>>
>>>
>>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalsha...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am stuck again. :(
>>>> My input data is in HDFS. I am again trying to do word count, but there
>>>> is a slight difference.
>>>> The data is in JSON format.
>>>> So each line of data is:
>>>>
>>>> {"author":"foo", "text": "hello"}
>>>> {"author":"foo123", "text": "hello world"}
>>>> {"author":"foo234", "text": "hello this world"}
>>>>
>>>> So I want to do word count for the text part.
>>>> I understand that in the mapper I just have to parse this data as JSON and
>>>> extract "text", and the rest of the code is just the same, but I am trying to
>>>> switch from Python to Java Hadoop.
>>>> How do I do this?
>>>> Thanks
>>>
>>>
>>> --
>>> Russell Jurney  twitter.com/rjurney  russell.jur...@gmail.com  datasyndrome.com
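For readers following the thread: the "hack" Michael alludes to (just find the quote after the text part) can be sketched and tested outside Hadoop entirely with the JDK, so you can verify the per-line logic before wiring it into a mapper. `extractText` below is a hypothetical helper, not part of any library; it assumes flat, single-line JSON with no escaped quotes inside the `"text"` value, exactly like the sample records in the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class Main {
    // Naive extraction: locate the "text" key, then return the quoted
    // string that follows it. Assumes flat JSON with no escaped quotes.
    static String extractText(String line) {
        int key = line.indexOf("\"text\"");
        if (key < 0) return "";
        int start = line.indexOf('"', key + 6) + 1; // opening quote of the value
        int end = line.indexOf('"', start);          // closing quote of the value
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        // The sample records from jamal's original message.
        String[] lines = {
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
        };
        // Stand-in for the shuffle/reduce side: count tokens locally.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tok = new StringTokenizer(extractText(line));
            while (tok.hasMoreTokens()) {
                counts.merge(tok.nextToken(), 1, Integer::sum);
            }
        }
        System.out.println(counts.get("hello")); // 3
        System.out.println(counts.get("world")); // 2
    }
}
```

In the real mapper you would replace the local `HashMap` with `context.write(word, one)` and let the reducer do the summing; the org.json route (`new JSONObject(line).getString("text")`) is the more robust choice once the jar is on the job's classpath.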