Hi Claudio,

Thanks very much for tracking this down. Sure, please file a JIRA, thank you.

Cheers,
Chris

On Nov 23, 2010, at 3:38 AM, Claudio Martella wrote:

> Hello list,
>
> in my previous posts i reported about not being able to run nutch on a
> hadoop cluster running cloudera's cdh 0.20.2+737.
> (http://search.lucidimagination.com/search/document/b66fa844b87b2654/failure_running_on_hadoop#52c43d8c4137ea8c
> and
> http://search.lucidimagination.com/search/document/a2b151e6a7041c13/nutch_1_x_doesn_t_run_on_cloudera_s_cdh3#2991508ce0ae5d52)
>
> Basically the problem was hadoop not finding some nutch plugin classes
> like URLNormalizer etc.
>
> I reported back to cloudera directly
> (https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/9147acfc4d18cfaf#)
> and it looks like the problem is connected to MAPREDUCE-967 (now part of
> hadoop 0.21 and backported by cloudera to their cdh 0.20.2).
>
> What the patch does is basically modify the way MapReduce unpacks the
> job's jar. The old way was to unpack the whole of it, now only classes/
> and lib/ are unpacked. This way nutch is missing the plugins/ directory.
> The nutch job format should be changed accordingly.
>
> Todd Lipcon suggested a workaround until that moment: setting
> 'mapreduce.job.jar.unpack.pattern' configuration to
> "(?:classes/|lib/|plugins/).*"
>
> Should I file a JIRA?
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax +39 0471 068 129
> claudio.marte...@tis.bz.it http://www.tis.bz.it

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
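
For anyone reading this thread later, here is a rough sketch of why the workaround pattern quoted above would bring the plugins/ directory back. It is not Hadoop code. It assumes that a jar entry is unpacked only when its whole name matches the configured pattern, and that the pattern in effect after MAPREDUCE-967 is equivalent to "(?:classes/|lib/).*" (inferred from the description in the quoted message, not checked against the patch). The class name and the entry names are made up for illustration.

```java
import java.util.regex.Pattern;

public class UnpackPatternDemo {
    // ASSUMPTION: stand-in for the post-patch default, inferred from
    // "only classes/ and lib/ are unpacked" in the quoted message.
    static final Pattern ASSUMED_DEFAULT = Pattern.compile("(?:classes/|lib/).*");

    // Workaround value quoted in the thread for
    // 'mapreduce.job.jar.unpack.pattern'.
    static final Pattern WORKAROUND = Pattern.compile("(?:classes/|lib/|plugins/).*");

    // ASSUMPTION: an entry is unpacked only if the entire entry name
    // matches the pattern (full-match semantics).
    static boolean isUnpacked(Pattern pattern, String entryName) {
        return pattern.matcher(entryName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical entry names of a job jar, for illustration only.
        String[] entries = {
            "classes/org/example/Foo.class",
            "lib/example.jar",
            "plugins/example-plugin/plugin.xml"
        };
        for (String entry : entries) {
            System.out.println(entry
                + " default=" + isUnpacked(ASSUMED_DEFAULT, entry)
                + " workaround=" + isUnpacked(WORKAROUND, entry));
        }
    }
}
```

Under those assumptions, only the workaround pattern accepts entries below plugins/, which is consistent with the plugin classes not being found on the cluster.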