Re: Json Parsing in map reduce.

Shahab Yunus Thu, 30 Apr 2015 10:19:16 -0700

The reason is that the JSON parsing code lives in a third-party library that
is not included in the default MapReduce/Hadoop distribution. Compiling
against the jar is not enough: you also have to add it to the classpath at
*runtime*, which is why the job fails with NoClassDefFoundError for
org/json/JSONException. There are multiple ways to do this, depending on how
you plan to run and package/deploy your code.
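
A quick way to confirm that this is a classpath problem is to check whether
the class is visible at runtime. The snippet below is only a minimal sketch,
not part of Hadoop: ClasspathCheck and isLoadable are hypothetical names, and
it assumes you run it with the same classpath that the job's JVM gets.

```java
// Minimal diagnostic sketch: reports whether a class is visible on the
// runtime classpath. ClasspathCheck and isLoadable are hypothetical names.
class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its
            // static initializers.
            Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace of the failed job.
        String name = args.length > 0 ? args[0] : "org.json.JSONException";
        System.out.println(name
                + (isLoadable(name) ? " is" : " is NOT")
                + " on the runtime classpath");
    }
}
```

If it reports that org.json.JSONException is not on the runtime classpath,
the JSON jar still has to be made available to the job, for example by one of
the approaches described in the links below.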


Check out these:
https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

Regards,
Shahab

On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <spu...@usc.edu> wrote:

> Hi,
>
> I am working on an assignment on Hadoop Map reduce. I am very new to Map
> Reduce.
>
> The assignment has many sections but for now I am trying to parse JSON
> data.
>
> The input (i.e. value) to the map function is a single record of the form
>  xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}
> I am interested only in getting the frequency of value1.
>
> The following is the MapReduce job.
>
> public static class Map extends MapReduceBase implements
> Mapper<LongWritable, Text, Text, IntWritable> {
>               private final static IntWritable one = new IntWritable(1);
>               private Text word = new Text();
>
>
>               public void map(LongWritable key, Text value,
> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
> IOException {
>                       String line = value.toString();
>                       String[] tuple = line.split("(?<=\\}),\\s");
>                       try{
>                       JSONObject obj = new JSONObject(tuple[1]);
>                       String id = obj.getString("key");
>                           word.set(id);
>                           output.collect(word, one);
>                       }
>                       catch(JSONException e){
>                           e.printStackTrace();
>                       }
>                   }
>             }
>
>
>
>
>         public static class Reduce extends MapReduceBase implements
> Reducer<Text, IntWritable, Text, IntWritable> {
>               public void reduce(Text key, Iterator<IntWritable> values,
> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
> IOException {
>                     int sum = 0;
>                     while (values.hasNext()) {
>                           sum += values.next().get();
>                         }
>                     output.collect(key, new IntWritable(sum));
>                   }
>             }
>
> I successfully compiled the Java code using the JSON and Hadoop jars and
> created a jar. But when I run the Hadoop command I get the following
> exceptions.
>
>
> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not loaded
> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> 15/04/30 00:36:49 INFO mapred.JobClient: Running job:
> job_local1121514690_0001
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task:
> attempt_local1121514690_0001_m_000000_0
> 15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split:
> file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
> java.lang.Exception: java.lang.RuntimeException: Error in configuring
> object
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 10 more
> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:344)
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
> at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> ... 15 more
> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 22 more
> 15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete:
> job_local1121514690_0001
> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
> at org.myorg.Wordcount.main(Wordcount.java:64)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
>
> PS: When I modify the same code to exclude the JSON parsing, i.e. find the
> frequency of the {'key':'value1'} section of the example input, all works
> well.
>
>
