Hi,

We call an external MapReduce program inside our Pig script to perform
a specific task. Let's take a crawling process as an example.

-- Load all the seed URLs into the relation crawldata.

crawldata = load 'baseurls' using PigStorage() as (pageid:chararray, pageurl:chararray);
normalizedata = foreach crawldata generate pageid, normalize(pageurl);

-- The URL list above contains both good URLs and bad/blocklisted URLs, and we
need to filter out the blocklisted ones. To do that, we have a Java MapReduce
program, blocklisturls.jar. So instead of writing Pig Latin statements to
filter the blocklisted URLs, we invoke this Java MapReduce program as below.

goodurls = mapreduce 'blocklisturls.jar'
               store normalizedata into '/path/input'
               load '/path/output' as (pageid:chararray, pageurl:chararray);

The Pig Latin statement above is a sequence of steps:
1. store writes normalizedata into HDFS under the path '/path/input'.
2. The blocklisturls Java MapReduce program is invoked on the input
'/path/input'; it processes and filters out the blocklisted URLs, then writes
its output into HDFS under the path '/path/output'.
3. The load operator loads the data from HDFS ('/path/output') into the
goodurls relation.
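
Putting the steps above together, a minimal sketch of the whole script might
look like this. Note that normalize is assumed to be an already-registered UDF,
and the BlockListUrls main class name passed in the backquotes is hypothetical;
the backquoted part is where the MAPREDUCE operator lets you pass the main
class and arguments to the native job:

```pig
-- Load the seed URLs (comma/tab-delimited pageid, pageurl pairs assumed).
crawldata = load 'baseurls' using PigStorage() as (pageid:chararray, pageurl:chararray);

-- normalize() is assumed to be a UDF registered earlier in the script.
normalizedata = foreach crawldata generate pageid, normalize(pageurl) as pageurl;

-- Run the external MapReduce job: Pig first stores normalizedata to
-- '/path/input', then runs the jar (main class and args in backquotes,
-- names hypothetical), then loads '/path/output' into goodurls.
goodurls = mapreduce 'blocklisturls.jar'
               store normalizedata into '/path/input'
               load '/path/output' as (pageid:chararray, pageurl:chararray)
               `BlockListUrls /path/input /path/output`;

dump goodurls;
```

One thing worth checking for the error quoted below: the warning "No job jar
file set. User classes may not be found." usually means the native program's
driver never calls job.setJarByClass(...) (or JobConf#setJar), so the task JVMs
cannot locate classes like WordCount$Map at runtime.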

Thanks
Nagamallikarjuna




On Thu, Jul 10, 2014 at 4:42 PM, Nivetha K <[email protected]> wrote:

> Hi,
>
>    Thanks for replying. Can you please explain how the mapreduce operator
> works in Pig?
> On 5 July 2014 10:35, Darpan R <[email protected]> wrote:
>
> > Looks like a classpath problem: java.lang.RuntimeException:
> > java.lang.ClassNotFoundException:
> > Class WordCount$Map not found
> >
> > Can you make sure your jar is in the class path ?
> >
> >
> > On 4 July 2014 11:19, Nivetha K <[email protected]> wrote:
> >
> > > Hi,
> > >
> > >      I am currently working with Pig. I get struck with following
> script.
> > > A = load 'sample.txt';
> > > B = MAPREDUCE '/home/training/simp.jar' Store A into 'inputDir' Load
> > > 'outputDir' as (word:chararray, count: int) `WordCount inputDir
> > outputDir`;
> > > dump B;
> > >
> > >
> > > Error :
> > >
> > > 2014-07-04 11:17:57,811 [main] WARN
>  org.apache.hadoop.mapred.JobClient -
> > > No job jar file set.  User classes may not be found. See JobConf(Class)
> > or
> > > JobConf#setJar(String).
> > > 2014-07-04 11:18:16,313 [main] INFO
>  org.apache.hadoop.mapred.JobClient -
> > > Task Id : attempt_201407011531_0147_m_000000_2, Status : FAILED
> > > java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> > > WordCount$Map not found
> > >         at
> > > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
> > >         at
> > >
> > >
> >
> org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
> > >         at
> > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
> > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> > >         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> > >         at java.security.AccessController.doPrivileged(Native Method)
> > >         at javax.security.auth.Subject.doAs(Subject.java:396)
> > >         at
> > >
> > >
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> > >         at org.apache.hadoop.mapred.Child.main(Child.java:262)
> > > Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not
> > found
> > >         at
> > >
> > >
> >
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
> > >         at
> > > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
> > >
> > >
> > >
> > > please help me to solve the problem
> > >
> > >
> > > regards,
> > >
> > > Nivetha.
> > >
> >
>



-- 
Thanks and Regards
Nagamallikarjuna
