OK, got this thing working. It turns out that -libjars should come before the HDFS input and output paths, not after them. :-/ Thanks, everyone.
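For context on why the ordering matters: -libjars is a generic option consumed by Hadoop's GenericOptionsParser (which also requires the driver to run through ToolRunner), so it must appear before the application's own arguments. HADOOP_CLASSPATH only affects the client JVM that submits the job; -libjars is what ships the jar to the task JVMs, which is why the export alone did not cure the ClassNotFoundException inside the map tasks. A sketch of the ordering, with hypothetical jar, class, and path names:

```shell
# Works: -libjars comes right after the main class, before the
# application arguments (the HDFS input and output paths).
hadoop jar wordcount.jar com.example.JsonWordCount \
    -libjars /path/to/external.jar \
    /user/jamal/input /user/jamal/output

# Fails: placed after the application arguments, -libjars is treated
# as a plain argument and never parsed, so the jar never reaches the
# task nodes and the mapper throws ClassNotFoundException.
hadoop jar wordcount.jar com.example.JsonWordCount \
    /user/jamal/input /user/jamal/output \
    -libjars /path/to/external.jar
```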
On Thu, May 30, 2013 at 1:35 PM, jamal sasha <jamalsha...@gmail.com> wrote:

> Hi,
> I did that, but I still get the same exception. I did:
>
>     export HADOOP_CLASSPATH=/path/to/external.jar
>
> and then added -libjars /path/to/external.jar to my command, but I still
> get the same error.
>
> On Thu, May 30, 2013 at 11:46 AM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
>
>> For starters, you can specify them through the -libjars parameter when
>> you kick off your M/R job. This way the jars will be copied to all TTs.
>>
>> Regards,
>> Shahab
>>
>> On Thu, May 30, 2013 at 2:43 PM, jamal sasha <jamalsha...@gmail.com> wrote:
>>
>>> Hi, thanks guys.
>>> I figured out the issue, so now I have another question.
>>> I am using a third-party library, and I thought that once I had created
>>> the jar file I didn't need to specify the dependencies, but apparently
>>> that's not the case (error below).
>>> A very naive question, probably stupid: how do I specify third-party
>>> libraries (jars) in Hadoop?
>>>
>>> Error:
>>>
>>> java.lang.ClassNotFoundException: org.json.JSONException
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:247)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
>>>     at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>>
>>> On Thu, May 30, 2013 at 2:02 AM, Pramod N <npramo...@gmail.com> wrote:
>>>
>>>> Whatever you are trying to do should work.
>>>> Here is the modified WordCount map:
>>>>
>>>> public void map(LongWritable key, Text value, Context context)
>>>>         throws IOException, InterruptedException {
>>>>     String line = value.toString();
>>>>     JSONObject line_as_json = new JSONObject(line);
>>>>     String text = line_as_json.getString("text");
>>>>     StringTokenizer tokenizer = new StringTokenizer(text);
>>>>     while (tokenizer.hasMoreTokens()) {
>>>>         word.set(tokenizer.nextToken());
>>>>         context.write(word, one);
>>>>     }
>>>> }
>>>>
>>>> Pramod N <http://atmachinelearner.blogspot.in>
>>>> Bruce Wayne of web
>>>> @machinelearner <https://twitter.com/machinelearner>
>>>>
>>>> On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <rahul.rec....@gmail.com> wrote:
>>>>
>>>>> Whatever you have mentioned, Jamal, should work. You can debug this.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>> On Thu, May 30, 2013 at 5:14 AM, jamal sasha <jamalsha...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> For some reason, this has to be in Java. :(
>>>>>> I am trying to use the org.json library, something like this (in the mapper):
>>>>>>
>>>>>>     JSONObject jsn = new JSONObject(value.toString());
>>>>>>     String text = (String) jsn.get("text");
>>>>>>     StringTokenizer itr = new StringTokenizer(text);
>>>>>>
>>>>>> But it's not working. :(
>>>>>> It would be better to get this working properly, but I wouldn't mind
>>>>>> using a hack as well. :)
>>>>>>
>>>>>> On Wed, May 29, 2013 at 4:30 PM, Michael Segel <michael_se...@hotmail.com> wrote:
>>>>>>
>>>>>>> Yeah, I have to agree w/ Russell. Pig is definitely the way to go on this.
>>>>>>>
>>>>>>> If you want to do it as a Java program, you will have to do some work
>>>>>>> on the input string, but it too should be trivial.
>>>>>>> How formal do you want to go? Do you want to strip it down, or just
>>>>>>> find the quote after the text part?
>>>>>>>
>>>>>>> On May 29, 2013, at 5:13 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>
>>>>>>> Seriously consider Pig (free answer, 4 LOC):
>>>>>>>
>>>>>>> my_data = LOAD 'my_data.json' USING
>>>>>>>     com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>>>>>>> words = FOREACH my_data GENERATE $0#'author' AS author,
>>>>>>>     FLATTEN(TOKENIZE($0#'text')) AS word;
>>>>>>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>>>>>>     COUNT_STAR(words) AS word_count;
>>>>>>> STORE word_counts INTO '/tmp/word_counts.txt';
>>>>>>>
>>>>>>> It will be faster than the Java you'll likely write.
>>>>>>>
>>>>>>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalsha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I am stuck again. :(
>>>>>>>> My input data is in HDFS. I am again trying to do wordcount, but
>>>>>>>> there is a slight difference: the data is in JSON format.
>>>>>>>> So each line of data is:
>>>>>>>>
>>>>>>>> {"author":"foo", "text": "hello"}
>>>>>>>> {"author":"foo123", "text": "hello world"}
>>>>>>>> {"author":"foo234", "text": "hello this world"}
>>>>>>>>
>>>>>>>> So I want to do wordcount on the "text" part.
>>>>>>>> I understand that in the mapper I just have to parse each line as
>>>>>>>> JSON and extract "text", and the rest of the code is just the same,
>>>>>>>> but I am trying to switch from Python to Java Hadoop.
>>>>>>>> How do I do this?
>>>>>>>> Thanks
>>>>>>>>
>>>>>>> --
>>>>>>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>>>>>>> datasyndrome.com
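The core logic the thread converges on (parse each JSON line, pull out the "text" field, tokenize, count words) can be sketched outside Hadoop. To keep this snippet dependency-free it substitutes a small regex for org.json's JSONObject, which only works for flat, well-formed single-line objects like the samples above; class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonWordCountSketch {
    // Naive extraction of the "text" field; assumes flat objects with no
    // escaped quotes, unlike a real JSON parser such as org.json.JSONObject.
    private static final Pattern TEXT_FIELD =
            Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"");

    static Map<String, Integer> countWords(String[] jsonLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : jsonLines) {
            Matcher m = TEXT_FIELD.matcher(line);
            if (!m.find()) continue;  // skip lines without a "text" field
            StringTokenizer tokenizer = new StringTokenizer(m.group(1));
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three sample lines from the original question.
        String[] sample = {
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
        };
        System.out.println(countWords(sample));  // prints {hello=3, world=2, this=1}
    }
}
```

In an actual mapper you would emit (word, 1) via context.write instead of accumulating a map, and use org.json as in Pramod's snippet, with the jar shipped via -libjars as resolved at the top of the thread.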