Hi,
Can anybody explain how to delete already stored fields from the index?
Thanks,
Ratnesh, V2Solutions India
I've been trying to create a plugin for Confluence where I use the
Nutch API. The configuration class causes a lot of headaches for me.
If I create one, there is an assumption about the location of the
hadoop-default.xml and hadoop-site.xml. This doesn't fit my setup
particularly well. I was
We are running a search service on the internet using two machines. We
have a crawler machine which crawls the web and merges new documents
found into the Lucene index. We have a searcher machine which allows
users to perform searches on the Lucene index.
Periodically, we would copy the newest
Unfortunately I don't have many solutions for you, because I don't have
such a big index yet! But it sounds like the weak point is disk access
during the copy?
You could try caching the index in memory (needs a lot of RAM!).
Or have two HDDs on your searcher, one for the current index, the other for
2007/4/3, Chun Wei Ho [EMAIL PROTECTED]:
As the index has been growing in size, we have been noticing that the
search response time on the searcher machine increases drastically
when an index (about 15GB) is being copied from the crawler to the
searcher. Both machines run Fedora Core 4 and are
How about using a load-balancing system on the search server?
Each time, you update only one of the balanced servers; the others keep
running smoothly.
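A minimal sketch of that rolling update, assuming the searchers sit behind a load balancer. The three callbacks (take_out, update_index, put_back) are hypothetical placeholders for whatever the balancer and copy tooling actually provide; nothing here is a Nutch or Lucene API.

```python
def rolling_update(servers, take_out, update_index, put_back):
    """Update the index on one searcher at a time.

    All three callbacks are hypothetical hooks:
      take_out(server)     -- remove the server from the balancer rotation
      update_index(server) -- copy the new index onto that server
      put_back(server)     -- return the server to the rotation
    """
    for server in servers:
        take_out(server)
        # If the copy raises, the exception propagates and the server
        # stays out of rotation instead of serving a partial index.
        update_index(server)
        put_back(server)


if __name__ == "__main__":
    rolling_update(
        ["searcher1", "searcher2"],
        take_out=lambda s: print("out of rotation:", s),
        update_index=lambda s: print("copying index to:", s),
        put_back=lambda s: print("back in rotation:", s),
    )
```

Only one server is out of rotation at any moment, so the slow disk access during the copy never touches a machine that is answering queries.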
On Tuesday 03 April 2007 22:39, Chun Wei Ho wrote:
We are running a search service on the internet using two machines. We
have a crawler machine
Hi,
I would like to know if it is a good idea to use the Nutch web crawler.
Basically, this is what I need:
1. I have a list of web sites.
2. I want the web crawler to go through each site and parse the anchors. If
a link is in the same domain, repeat the same step, up to 3 levels deep.
3. For each link, write to
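The same-domain, three-level walk in step 2 can be sketched roughly as below. This is not how Nutch implements it; it is a plain breadth-first traversal, and it makes two assumptions: "same domain" is taken to mean "same host", and get_links is a hypothetical callback that returns the anchor hrefs found on a page (a real one would fetch and parse the HTML).

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def same_host(url, seed):
    """True when url is on the same host as the seed site (assumption:
    'same domain' in the question means 'same host')."""
    return urlparse(url).netloc.lower() == urlparse(seed).netloc.lower()


def crawl(seed, get_links, max_depth=3):
    """Breadth-first walk from seed, staying on the seed's host.

    get_links(url) is a hypothetical callback returning the anchor
    hrefs found on that page.
    Returns a list of (url, depth) pairs in visiting order.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for href in get_links(url):
            link = urljoin(url, href)  # resolve relative anchors
            if same_host(link, seed) and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Passing the link extractor in as a callback keeps the traversal logic separate from fetching, so it can be exercised against an in-memory map of pages before pointing it at real sites.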
I am quite certain that Nutch is what you are looking for. Take a look
at Nutch's documentation for more details and you will see :).
On 4/3/07, Meryl Silverburgh [EMAIL PROTECTED] wrote:
Hi,
I would like to know if it is a good idea to use the Nutch web
crawler.
Basically, this is