Hi, I am having trouble with the parameters for the solrindex command. I see that Lewis had similar problems a few months ago. Any idea what I am doing wrong?

bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-09 00:13:24
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text

-----------------------------------------------------
Subscribe to the Nimble Books Mailing List http://eepurl.com/czS- for monthly updates

On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:

> Hi guys,
>
> I have been watching this thread intently and I am very happy to see that
> there is some progress :0)
>
> Radim,
>
> Can I ask that you open a JIRA issue and submit a patch, this way we can not
> only track it, but it will also give the community a chance to test and
> validate the patch prior to integration into the source.
>
> Thanks
>
> Lewis
>
> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
>
> > Hi Radim,
> >
> > Thank you so much for this. I am not familiar with commit process to the
> > core.
> > Is there someone who can help us get this committed and help resolve this
> > issue?
> >
> > Thanks for all your help.
> >
> > Rajesh Ramana
> >
> > -----Original Message-----
> > From: Radim Kolar [mailto:[email protected]]
> > Sent: Thursday, October 06, 2011 2:18 PM
> > To: [email protected]
> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> >
> > - The REGEX normalizer transforms the special characters, but fails to
> > substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > - The fetcher is having trouble interpreting the links with special
> > character ‘ñ’.
> >
> > i can add this transformation to basic-url normalizer if somebody is
> > willing to commit it.
> >
> >
>
> --
> *Lewis*
>
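P.S. The error output suggests the crawl/segments/* glob is handing the indexer some inputs it cannot use. Two segments (20110922143907 and 20110922144329) have none of crawl_fetch, crawl_parse, parse_data or parse_text. 20111008015309 is missing the three parse outputs. nohup.out is a plain file that the glob also matched. Below is a rough sketch for listing only the segment directories that contain all four of those subdirectories. It assumes bash and the local crawl/segments layout from the command above. The helper name complete_segments is hypothetical and not part of Nutch.

```shell
# Hypothetical helper (not part of Nutch): print each directory under the
# given segments path that contains all four subdirectories the indexer
# reported as missing. Plain files (e.g. nohup.out) are skipped.
complete_segments() {
  local dir="$1" seg sub ok
  for seg in "$dir"/*; do
    # Skip anything that is not a directory, including an unmatched glob.
    [ -d "$seg" ] || continue
    ok=1
    for sub in crawl_fetch crawl_parse parse_data parse_text; do
      if [ ! -d "$seg/$sub" ]; then
        ok=0
        break
      fi
    done
    if [ "$ok" -eq 1 ]; then
      printf '%s\n' "$seg"
    fi
  done
}

# Example: list the complete segments under the local crawl directory.
complete_segments crawl/segments
```

Passing the output in place of the glob, as in bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb crawl/linkdb $(complete_segments crawl/segments), only skips the incomplete segments. It does not repair them. The unquoted $(...) relies on segment names (timestamps) containing no spaces. Presumably the unfetched or unparsed segments would still need to be fetched and parsed, or removed, before their pages can be indexed.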

