Re: Storage of data between crawls

2011-07-28 Thread Chris Alexander
Cheers Lewis, perhaps I should attempt to rephrase the question. Clearly Nutch must download and store the contents of a page during a crawl. However, once you have indexed this content, does Nutch keep this data, or is it cleaned up automatically, or is there a command to do it? Thanks Chris

not able to start nutch 1.3

2011-07-28 Thread Piyush Garg
Hi All, I have installed Nutch 1.3 and hadoop-0.20.2, and I was going through the tutorial at http://wiki.apache.org/nutch/NutchTutorial. I followed the steps mentioned there and edited $NUTCH_HOME/conf/nutch-site.xml as well as nutch-default.xml and $NUTCH_HOME/runtime/local/conf/nutch-site.xml but did

Re: not able to start nutch 1.3

2011-07-28 Thread Julien Nioche
http://wiki.apache.org/nutch/NutchTutorial = 1. Edit $NUTCH_HOME/conf/nutch-site.xml (or $NUTCH_HOME/runtime/local/conf/nutch-site.xml with version = 1.3) and add Have modified the WIKI now that the tutorial is for 1.3 only Thanks Julien On 28 July 2011 11:25, Piyush Garg

Re: not able to start nutch 1.3

2011-07-28 Thread Piyush Garg
Sorry, got it working, I was using runtime/deploy/bin/nutch instead of runtime/local/bin/nutch Thanks Julien for your response. On Thu, Jul 28, 2011 at 4:35 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote: http://wiki.apache.org/nutch/NutchTutorial = 1. Edit

Re: Storage of data between crawls

2011-07-28 Thread lewis john mcgibbney
Well, when Nutch undertakes generate, fetch and parse, i.e. the steps that generate segment data for indexing, the data is stored in various forms within the segment. There is much more purpose to the segment than explained in this reply; however, it does not add to this particular thread. If you
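As a rough way to see how much data a crawl leaves behind on disk, the sketch below sums file sizes per segment subdirectory. It assumes a local (non-HDFS) crawl directory; the path `crawl/segments` and the function names `dir_size` and `segment_usage` are illustrative assumptions, not part of Nutch. It only reads the filesystem and makes no claim about when, or whether, Nutch cleans segments up.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def segment_usage(segments_root):
    """Map each segment directory name to {subdirectory name: size in bytes}.

    segments_root is assumed to be a local directory holding one
    subdirectory per segment (hypothetical example: "crawl/segments").
    """
    usage = {}
    for segment in sorted(os.listdir(segments_root)):
        seg_path = os.path.join(segments_root, segment)
        if not os.path.isdir(seg_path):
            continue  # skip stray files next to the segment directories
        usage[segment] = {
            part: dir_size(os.path.join(seg_path, part))
            for part in sorted(os.listdir(seg_path))
            if os.path.isdir(os.path.join(seg_path, part))
        }
    return usage
```

Calling `segment_usage("crawl/segments")` after each crawl and comparing the results would show whether the stored data grows or shrinks between runs.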

RE: Nutch not indexing full collection

2011-07-28 Thread Chip Calhoun
Thanks! This has solved half of my problem. I am now indexing material from every document I want. However, I'm still not indexing words from toward the end of longer documents. I'm not sure what else I could be missing. The current contents of my nutch-site.xml are: <?xml version="1.0"?>

Re: nutch 1.3 + solr server

2011-07-28 Thread Way Cool
Sorry, my bad, I posted an empty war file yesterday. Click here (http://tutorials.waycoolsearch.com/search/solr.war) to download the war file and let me know if it doesn't work for you. On Wed, Jul 27, 2011 at 1:16 AM, Way Cool way1.wayc...@gmail.com wrote: As promised, I customized Solr browse