OK, got this thing working. It turns out that -libjars should come before the HDFS input and output paths, not after them. :-/ Thanks, everyone.
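For context on why the ordering matters: -libjars is a generic option consumed by Hadoop's GenericOptionsParser (which also requires the driver to run through ToolRunner), so it must appear before the application's own arguments. HADOOP_CLASSPATH only affects the client JVM that submits the job; -libjars is what ships the jar to the task JVMs, which is why the export alone did not cure the ClassNotFoundException inside the map tasks. A sketch of the ordering, with hypothetical jar, class, and path names:

```shell
# Works: -libjars comes right after the main class, before the
# application arguments (the HDFS input and output paths).
hadoop jar wordcount.jar com.example.JsonWordCount \
    -libjars /path/to/external.jar \
    /user/jamal/input /user/jamal/output

# Fails: placed after the application arguments, -libjars is treated
# as a plain argument and never parsed, so the jar never reaches the
# task nodes and the mapper throws ClassNotFoundException.
hadoop jar wordcount.jar com.example.JsonWordCount \
    /user/jamal/input /user/jamal/output \
    -libjars /path/to/external.jar
```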
On Thu, May 30, 2013 at 1:35 PM, jamal sasha <jamalsha...@gmail.com> wrote:

> Hi,
> I did that, but I still get the same exception. I did:
>
>     export HADOOP_CLASSPATH=/path/to/external.jar
>
> and then added -libjars /path/to/external.jar to my command, but I still
> get the same error.
>
> On Thu, May 30, 2013 at 11:46 AM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
>
>> For starters, you can specify them through the -libjars parameter when
>> you kick off your M/R job. This way the jars will be copied to all TTs.
>>
>> Regards,
>> Shahab
>>
>> On Thu, May 30, 2013 at 2:43 PM, jamal sasha <jamalsha...@gmail.com> wrote:
>>
>>> Hi, thanks guys.
>>> I figured out the issue, so now I have another question.
>>> I am using a third-party library, and I thought that once I had created
>>> the jar file I didn't need to specify the dependencies, but apparently
>>> that's not the case (error below).
>>> A very naive question, probably stupid: how do I specify third-party
>>> libraries (jars) in Hadoop?
>>>
>>> Error:
>>>
>>> java.lang.ClassNotFoundException: org.json.JSONException
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:247)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
>>>     at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>>
>>> On Thu, May 30, 2013 at 2:02 AM, Pramod N <npramo...@gmail.com> wrote:
>>>
>>>> Whatever you are trying to do should work.
>>>> Here is the modified WordCount map:
>>>>
>>>> public void map(LongWritable key, Text value, Context context)
>>>>         throws IOException, InterruptedException {
>>>>     String line = value.toString();
>>>>     JSONObject line_as_json = new JSONObject(line);
>>>>     String text = line_as_json.getString("text");
>>>>     StringTokenizer tokenizer = new StringTokenizer(text);
>>>>     while (tokenizer.hasMoreTokens()) {
>>>>         word.set(tokenizer.nextToken());
>>>>         context.write(word, one);
>>>>     }
>>>> }
>>>>
>>>> Pramod N <http://atmachinelearner.blogspot.in>
>>>> Bruce Wayne of web
>>>> @machinelearner <https://twitter.com/machinelearner>
>>>>
>>>> On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <rahul.rec....@gmail.com> wrote:
>>>>
>>>>> Whatever you have mentioned, Jamal, should work. You can debug this.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>> On Thu, May 30, 2013 at 5:14 AM, jamal sasha <jamalsha...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> For some reason, this has to be in Java. :(
>>>>>> I am trying to use the org.json library, something like this (in the mapper):
>>>>>>
>>>>>>     JSONObject jsn = new JSONObject(value.toString());
>>>>>>     String text = (String) jsn.get("text");
>>>>>>     StringTokenizer itr = new StringTokenizer(text);
>>>>>>
>>>>>> But it's not working. :(
>>>>>> It would be better to get this working properly, but I wouldn't mind
>>>>>> using a hack as well. :)
>>>>>>
>>>>>> On Wed, May 29, 2013 at 4:30 PM, Michael Segel <michael_se...@hotmail.com> wrote:
>>>>>>
>>>>>>> Yeah, I have to agree w/ Russell. Pig is definitely the way to go on this.
>>>>>>>
>>>>>>> If you want to do it as a Java program, you will have to do some work
>>>>>>> on the input string, but it too should be trivial.
>>>>>>> How formal do you want to go? Do you want to strip it down, or just
>>>>>>> find the quote after the text part?
>>>>>>>
>>>>>>> On May 29, 2013, at 5:13 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>
>>>>>>> Seriously consider Pig (free answer, 4 LOC):
>>>>>>>
>>>>>>> my_data = LOAD 'my_data.json' USING
>>>>>>>     com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>>>>>>> words = FOREACH my_data GENERATE $0#'author' AS author,
>>>>>>>     FLATTEN(TOKENIZE($0#'text')) AS word;
>>>>>>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>>>>>>     COUNT_STAR(words) AS word_count;
>>>>>>> STORE word_counts INTO '/tmp/word_counts.txt';
>>>>>>>
>>>>>>> It will be faster than the Java you'll likely write.
>>>>>>>
>>>>>>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalsha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I am stuck again. :(
>>>>>>>> My input data is in HDFS. I am again trying to do wordcount, but
>>>>>>>> there is a slight difference: the data is in JSON format.
>>>>>>>> So each line of data is:
>>>>>>>>
>>>>>>>> {"author":"foo", "text": "hello"}
>>>>>>>> {"author":"foo123", "text": "hello world"}
>>>>>>>> {"author":"foo234", "text": "hello this world"}
>>>>>>>>
>>>>>>>> So I want to do wordcount on the "text" part.
>>>>>>>> I understand that in the mapper I just have to parse each line as
>>>>>>>> JSON and extract "text", and the rest of the code is just the same,
>>>>>>>> but I am trying to switch from Python to Java Hadoop.
>>>>>>>> How do I do this?
>>>>>>>> Thanks
>>>>>>>>
>>>>>>> --
>>>>>>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>>>>>>> datasyndrome.com
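The core logic the thread converges on (parse each JSON line, pull out the "text" field, tokenize, count words) can be sketched outside Hadoop. To keep this snippet dependency-free it substitutes a small regex for org.json's JSONObject, which only works for flat, well-formed single-line objects like the samples above; class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonWordCountSketch {
    // Naive extraction of the "text" field; assumes flat objects with no
    // escaped quotes, unlike a real JSON parser such as org.json.JSONObject.
    private static final Pattern TEXT_FIELD =
            Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"");

    static Map<String, Integer> countWords(String[] jsonLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : jsonLines) {
            Matcher m = TEXT_FIELD.matcher(line);
            if (!m.find()) continue;  // skip lines without a "text" field
            StringTokenizer tokenizer = new StringTokenizer(m.group(1));
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three sample lines from the original question.
        String[] sample = {
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
        };
        System.out.println(countWords(sample));  // prints {hello=3, world=2, this=1}
    }
}
```

In an actual mapper you would emit (word, 1) via context.write instead of accumulating a map, and use org.json as in Pramod's snippet, with the jar shipped via -libjars as resolved at the top of the thread.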