Whatever you have mentioned should work, Jamal. You can debug this.

Thanks,
Rahul
On Thu, May 30, 2013 at 5:14 AM, jamal sasha <jamalsha...@gmail.com> wrote:
> Hi,
> For some reason, this has to be in Java :(
> I am trying to use the org.json library, something like (in the mapper):
>
> JSONObject jsn = new JSONObject(value.toString());
> String text = (String) jsn.get("text");
> StringTokenizer itr = new StringTokenizer(text);
>
> But it's not working :(
> It would be better to get this thing working properly, but I wouldn't mind
> using a hack as well :)
>
>
> On Wed, May 29, 2013 at 4:30 PM, Michael Segel
> <michael_se...@hotmail.com> wrote:
>
>> Yeah,
>> I have to agree w Russell. Pig is definitely the way to go on this.
>>
>> If you want to do it as a Java program you will have to do some work on
>> the input string, but that too should be trivial.
>> How formal do you want to go?
>> Do you want to strip it down or just find the quote after the text part?
>>
>>
>> On May 29, 2013, at 5:13 PM, Russell Jurney <russell.jur...@gmail.com>
>> wrote:
>>
>> Seriously consider Pig (free answer, 4 LOC):
>>
>> my_data = LOAD 'my_data.json' USING
>>     com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>> words = FOREACH my_data GENERATE $0#'author' AS author,
>>     FLATTEN(TOKENIZE($0#'text')) AS word;
>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>     COUNT_STAR(words) AS word_count;
>> STORE word_counts INTO '/tmp/word_counts.txt';
>>
>> It will be faster than the Java you'll likely write.
>>
>>
>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalsha...@gmail.com> wrote:
>>
>>> Hi,
>>> I am stuck again. :(
>>> My input data is in HDFS. I am again trying to do wordcount, but there
>>> is a slight difference.
>>> The data is in JSON format.
>>> So each line of data is:
>>>
>>> {"author":"foo", "text": "hello"}
>>> {"author":"foo123", "text": "hello world"}
>>> {"author":"foo234", "text": "hello this world"}
>>>
>>> So I want to do wordcount for the "text" part.
>>> I understand that in the mapper I just have to parse this data as JSON
>>> and extract "text", and the rest of the code is just the same, but I am
>>> trying to switch from Python to Java for Hadoop.
>>> How do I do this?
>>> Thanks
>>
>>
>>
>> --
>> Russell Jurney  twitter.com/rjurney  russell.jur...@gmail.com  datasyndrome.com
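[Editor's note] The per-line logic Jamal describes can be sketched in plain Java. This is a minimal, Hadoop-free sketch under the assumption that the input is exactly the flat, unescaped JSON lines shown above; the `extractText` helper is a hypothetical hand-rolled extractor added here only so the snippet runs without the org.json jar. In an actual mapper you would keep `new JSONObject(value.toString())` to pull out "text", then tokenize it and emit (word, 1) pairs as in a standard wordcount.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class JsonWordCountSketch {

    // Naive extraction of the "text" value. Assumes flat, unescaped JSON
    // lines like {"author":"foo", "text": "hello"}; a real job should use
    // a JSON parser (e.g. org.json) instead of this illustration.
    static String extractText(String line) {
        int key = line.indexOf("\"text\"");
        if (key < 0) return "";
        int open = line.indexOf('"', key + 6);   // opening quote of the value
        int close = line.indexOf('"', open + 1); // closing quote of the value
        return line.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String[] lines = {
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
        };
        // Stand-in for the shuffle/reduce phase: tally words locally.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(extractText(line));
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        // counts now maps hello->3, world->2, this->1
        System.out.println(counts);
    }
}
```

In the MapReduce version, the tokenize loop lives in `map()` and emits each token with a count of 1, and the tallying above is what the reducer does per key.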