Hi,

We call an external MapReduce program from inside our Pig script to perform a specific task. Let's take a crawling process as an example.
-- Load all the seed URLs into the relation crawldata.
crawldata = LOAD 'baseurls' USING PigStorage()
            AS (pageid:chararray, pageurl:chararray);

-- normalize() here is a UDF that canonicalizes each URL.
normalizedata = FOREACH crawldata GENERATE pageid, normalize(pageurl);

The URL list above contains both good URLs and bad/blocklisted URLs, and we need to filter out the blocklisted ones. Suppose we already have a Java MapReduce program, blocklisturls.jar, that does this filtering. Instead of writing Pig Latin statements to filter the blocklisted URLs, we can invoke that program with the MAPREDUCE operator:

goodurls = MAPREDUCE 'blocklisturls.jar'
           STORE normalizedata INTO '/path/input'
           LOAD '/path/output' AS (pageid:chararray, pageurl:chararray)
           `BlockListUrls /path/input /path/output`;

(The backtick-quoted string at the end passes the jar's main class and its arguments; here I am assuming the main class is called BlockListUrls.)

This statement executes as a sequence of steps:

1. STORE writes normalizedata into HDFS under the path '/path/input'.
2. The blocklisturls MapReduce program runs on the input '/path/input', filters out the blocklisted URLs, and writes its output into HDFS under the path '/path/output'.
3. LOAD reads the data from HDFS ('/path/output') into the relation goodurls.

Thanks
Nagamallikarjuna

On Thu, Jul 10, 2014 at 4:42 PM, Nivetha K <[email protected]> wrote:

> Hi,
>
> Thanks for replying. Can you please explain how the MAPREDUCE operator
> works in Pig?
>
> On 5 July 2014 10:35, Darpan R <[email protected]> wrote:
>
> > Looks like a classpath problem: java.lang.RuntimeException:
> > java.lang.ClassNotFoundException: Class WordCount$Map not found
> >
> > Can you make sure your jar is in the classpath?
> >
> > On 4 July 2014 11:19, Nivetha K <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am currently working with Pig. I got stuck with the following
> > > script.
> > > A = load 'sample.txt';
> > > B = MAPREDUCE '/home/training/simp.jar' Store A into 'inputDir'
> > > Load 'outputDir' as (word:chararray, count:int)
> > > `WordCount inputDir outputDir`;
> > > dump B;
> > >
> > > Error:
> > >
> > > 2014-07-04 11:17:57,811 [main] WARN org.apache.hadoop.mapred.JobClient -
> > > No job jar file set. User classes may not be found. See JobConf(Class)
> > > or JobConf#setJar(String).
> > > 2014-07-04 11:18:16,313 [main] INFO org.apache.hadoop.mapred.JobClient -
> > > Task Id : attempt_201407011531_0147_m_000000_2, Status : FAILED
> > > java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> > > WordCount$Map not found
> > >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
> > >     at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> > >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> > >     at java.security.AccessController.doPrivileged(Native Method)
> > >     at javax.security.auth.Subject.doAs(Subject.java:396)
> > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> > >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> > > Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
> > >     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
> > >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
> > >
> > > Please help me to solve the problem.
> > >
> > > Regards,
> > > Nivetha.

--
Thanks and Regards
Nagamallikarjuna
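P.S. Regarding the ClassNotFoundException in the quoted thread: the earlier warning "No job jar file set" suggests the driver never told Hadoop which jar to ship to the task nodes, so the task JVMs cannot load WordCount$Map. The usual fix is a job.setJarByClass() call in the driver. Below is a minimal sketch of such a WordCount driver against the Hadoop 1.x org.apache.hadoop.mapreduce API; only the class names WordCount and Map come from the quoted stack trace, and the mapper/reducer bodies are my assumption of a typical word count.

```java
// Sketch only: a typical WordCount driver matching the class names in
// the quoted stack trace (WordCount, WordCount$Map). Mapper/reducer
// bodies are illustrative assumptions, not taken from the original mail.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class Map
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the counts for each word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        // This line is what fixes "No job jar file set": it tells Hadoop
        // which jar to ship to the task nodes, so their JVMs can load
        // WordCount$Map and WordCount$Reduce at runtime.
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is the same driver that Pig invokes through the backtick part of the MAPREDUCE statement (`WordCount inputDir outputDir`), so the jar passed to MAPREDUCE must contain these classes.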
