Cheers Lewis, perhaps I should attempt to rephrase the question.
Clearly Nutch must download and store the contents of a page during a crawl.
However, once you have indexed this content, does Nutch keep this data, or
is it cleaned up automatically, or is there a command to do it?
Thanks
Chris
Hi All,
I have installed Nutch 1.3 and hadoop-0.20.2, and I was going through the
tutorial http://wiki.apache.org/nutch/NutchTutorial
I followed the steps described there and edited $NUTCH_HOME/conf/nutch-site.xml
as well as nutch-default.xml and
$NUTCH_HOME/runtime/local/conf/nutch-site.xml
but did
http://wiki.apache.org/nutch/NutchTutorial
1. Edit $NUTCH_HOME/conf/nutch-site.xml (or
$NUTCH_HOME/runtime/local/conf/nutch-site.xml
with version = 1.3) and add
Have modified the WIKI now that the tutorial is for 1.3 only
Thanks
Julien
On 28 July 2011 11:25, Piyush Garg
Sorry, got it working, I was using runtime/deploy/bin/nutch instead of
runtime/local/bin/nutch
Thanks Julien for your response.
On Thu, Jul 28, 2011 at 4:35 PM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
http://wiki.apache.org/nutch/NutchTutorial
1. Edit
Well, when Nutch undertakes a generate, fetch and parse, i.e. the steps that
generate segment data for indexing, the data is stored in various forms
within the segment. There is much more purpose to the segment than is explained
in this reply; however, it does not add to this particular thread.
If you
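To see what a segment still holds on disk after indexing, something like the following rough sketch may help. It assumes the usual Nutch 1.x segment layout (crawl_generate, crawl_fetch, content, parse_text, parse_data, crawl_parse); the helper names `dir_size` and `segment_report` are made up for illustration and are not part of Nutch.

```python
import os

# Subdirectories normally found inside a Nutch 1.x segment (assumed layout).
SEGMENT_PARTS = [
    "crawl_generate",  # fetch list produced by the generate step
    "crawl_fetch",     # fetch status of each URL
    "content",         # raw downloaded content
    "parse_text",      # extracted plain text
    "parse_data",      # outlinks and metadata from parsing
    "crawl_parse",     # outlink URLs used to update the crawldb
]


def dir_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def segment_report(segment_dir):
    """Map each expected segment part to its size in bytes, or None if absent."""
    report = {}
    for part in SEGMENT_PARTS:
        part_path = os.path.join(segment_dir, part)
        report[part] = dir_size(part_path) if os.path.isdir(part_path) else None
    return report
```

Calling `segment_report` on one segment directory (the path depends on your own crawl directory) shows which parts are present and how much space each one takes, for example whether the raw `content` part is still there.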
Thanks! This has solved half of my problem. I am now indexing material from
every document I want. However, I'm still not indexing words from toward the
end of longer documents. I'm not sure what else I could be missing.
The current contents of my nutch-site.xml are:
<?xml version="1.0"?>
Sorry, my bad, I posted an empty war file yesterday. Click
here (http://tutorials.waycoolsearch.com/search/solr.war) to download
the war file and let me know if it doesn't work for you.
On Wed, Jul 27, 2011 at 1:16 AM, Way Cool way1.wayc...@gmail.com wrote:
As promised, I customized Solr browse