Thanks, Julien! Your first recommendation worked great! On Jan 27, 2012 5:35 PM, "Julien Nioche" <[email protected]> wrote:
> of course you can also copy nutch-site.xml over to the hadoop conf dir on
> the master node
>
> On 26 January 2012 10:33, Julien Nioche <[email protected]> wrote:
>
>> Hi Ali
>>
>> You need to modify $NUTCH_HOME/conf/nutch-site.xml and rebuild the job
>> file with 'ant job'. In distributed mode the conf files are taken from
>> within the job file.
>>
>> HTH
>>
>> Julien
>>
>>> The configuration files for the "local" mode are set up fine (since a
>>> crawl in local mode succeeded). However, for running in deploy mode (as
>>> output above), since the "deploy" folder did not have any "conf"
>>> subdirectory, I assumed that either:
>>> a) the conf files need to be copied over under "deploy/conf", OR
>>> b) the conf files need to be placed onto HDFS.
>>>
>>> I have verified that option (a) above does not fix the issue. So I'm
>>> assuming that the Nutch configuration files need to exist in HDFS for
>>> the HDFS fetcher to run successfully? However, I don't know at what
>>> path within HDFS I should place these Nutch conf files, or perhaps I'm
>>> barking up the wrong tree?
>>>
>>> If Nutch reads config files during "deploy" mode from the files under
>>> "local/conf", then why is it that the local crawl worked fine, but the
>>> deploy-mode crawl didn't?
>>
>> --
>> Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
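For anyone landing on this thread with the same problem, the fix Julien describes can be sketched as a short shell session. This is a sketch only: `$NUTCH_HOME` and `$HADOOP_HOME` are placeholders for your own install paths, and the job-file location/name pattern assumes a standard Nutch 1.x source checkout.

```shell
# 1. Edit the config that will be baked into the job file.
#    In deploy (distributed) mode Nutch reads its conf from INSIDE the
#    .job file, not from runtime/local/conf -- which is why the local
#    crawl worked while the deploy-mode crawl ignored the same settings.
"${EDITOR:-vi}" "$NUTCH_HOME/conf/nutch-site.xml"

# 2. Rebuild the job file so it picks up the edited conf.
cd "$NUTCH_HOME"
ant job

# 3. The rebuilt job file lands under runtime/deploy in a Nutch 1.x tree.
ls runtime/deploy/apache-nutch-*.job

# Optional sanity check: confirm your edited nutch-site.xml is actually
# packaged inside the job file (a .job is just a zip/jar archive).
unzip -p runtime/deploy/apache-nutch-*.job nutch-site.xml | head

# Alternative per Julien's follow-up: copy nutch-site.xml into the
# Hadoop conf dir on the master node instead of rebuilding.
cp "$NUTCH_HOME/conf/nutch-site.xml" "$HADOOP_HOME/conf/"
```

Either approach works because Hadoop jobs only see configuration that ships with the job archive or sits in the cluster's own conf directory; files under the local Nutch runtime are never consulted by the remote tasks.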

