Hi,

I have updated both the Nutch Gotchas page [1] and the tutorial [2] to incorporate the gotchas highlighted in this thread. Thanks for pointing these out.

[1] http://wiki.apache.org/nutch/NutchGotchas
[2] http://wiki.apache.org/nutch/RunningNutchAndSolr

On Tue, Jul 12, 2011 at 12:03 AM, Jerry E. Craig, Jr. <jcr...@inforeverse.com> wrote:

> Just from a total noob standpoint (just installed my first LAMP box over
> the last month) realizing that I needed to look in the Runtime folder when I
> downloaded the tar.gz file was a HUGE step.
>
> Then we all run the Crawl at least to make sure things work. The main
> tutorial was missing the [-solr] part of the crawl command line to get that
> to index. It wasn't after someone helped me here and pointed me to the
> actual documents that I found it.
>
> Those were the 2 big things for me as a total noob, otherwise I'm really
> happy to have at least that part working. Now, my stupid CentOS install
> only has libxml2 2.6.15 and I need 2.6.17 for php and I'm a few revisions
> off on libcurl also. I have NO idea how to go back and fix that. Not sure
> if I should just try to upgrade to php53 and hope for the best or what.
> But, that's more of a solr / php question than a Nutch question I think.
>
>
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Monday, July 11, 2011 3:19 PM
> To: user@nutch.apache.org
> Cc: lewis john mcgibbney
> Subject: Re: Nutch Gotchas as of release 1.3
>
> Well, now i'm thinking of it: yes.
>
> - there were three (incl. myself) people mentioning the problem described
> in NUTCH-1016;
> - a few users don't seem to catch the part of the tutorial telling them to
> add their robot to the config
> - missing crawl-urlfilter
> - mails about missing solrUrl
>
> I think quite a few users still rely on the crawl command instead of
> running a script.
>
> > Hello list,
> >
> > Do we have any suggestions we wish to discuss regarding the above?
> >
> > thanks
>

--
*Lewis*