How to get rid of some of the fields that are indexed by default, e.g. content, title, url, etc.

2007-04-03 Thread Ratnesh, V2Solutions India
Hi, can anybody brief me on how to delete already-stored fields from an index? Thanks, Ratnesh, V2Solutions India
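
A minimal sketch of one way to strip an already-stored field using the Lucene API that Nutch builds on: copy every document into a fresh index, dropping the unwanted field along the way. The paths and field name below are illustrative. Note that IndexReader.document() returns stored fields only, so indexed-but-unstored data is not carried over; the cleaner long-term fix is to disable the relevant indexing filters so those fields never get written in the first place.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    public class StripStoredField {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("crawl/index");   // existing index
            IndexWriter writer = new IndexWriter("crawl/index-stripped",
                                                 new StandardAnalyzer(), true);
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i)) continue;     // skip deleted slots
                Document doc = reader.document(i);     // stored fields only
                doc.removeFields("content");           // drop the unwanted field
                writer.addDocument(doc);               // re-add without it
            }
            writer.optimize();
            writer.close();
            reader.close();
        }
    }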

Configuration frustrations

2007-04-03 Thread Trond Andersen
I've been trying to create a plugin for Confluence where I use the Nutch API. The Configuration class causes a lot of headaches for me: when I create one, it makes an assumption about the location of hadoop-default.xml and hadoop-site.xml, which doesn't fit my setup particularly well. I was
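
For what it's worth, a hedged sketch of one way around this: build the Configuration yourself and point it at an explicit copy of the Hadoop file rather than relying on the classpath. The file path is an example, and the exact resource-adding method varies across Hadoop versions (addResource in later releases, addDefaultResource/addFinalResource in earlier ones).

    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.util.NutchConfiguration;

    public class PluginConfiguration {
        public static Configuration create() throws Exception {
            // Loads nutch-default.xml / nutch-site.xml from the classpath.
            Configuration conf = NutchConfiguration.create();
            // Add the Hadoop config from wherever the plugin keeps it,
            // instead of wherever Hadoop assumes (path is an example):
            conf.addResource(new URL("file:///opt/confluence/nutch-conf/hadoop-site.xml"));
            return conf;
        }
    }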

Index updates between machines

2007-04-03 Thread Chun Wei Ho
We are running a search service on the internet using two machines. We have a crawler machine which crawls the web and merges new documents found into the Lucene index. We have a searcher machine which allows users to perform searches on the Lucene index. Periodically, we would copy the newest
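
A rough sketch of one way to keep a periodic copy like this from starving searcher I/O: throttle the transfer rate. The one-second budget below is an arbitrary example, and a tool such as rsync --bwlimit achieves the same effect with less code.

    import java.io.*;

    public class ThrottledCopy {
        // Copies src to dst at roughly bytesPerSec, sleeping whenever the
        // per-second budget is spent so searches keep getting disk time.
        public static void copy(File src, File dst, long bytesPerSec)
                throws IOException, InterruptedException {
            InputStream in = new BufferedInputStream(new FileInputStream(src));
            OutputStream out = new BufferedOutputStream(new FileOutputStream(dst));
            try {
                byte[] buf = new byte[64 * 1024];
                long windowStart = System.currentTimeMillis();
                long sent = 0;
                int n;
                while ((n = in.read(buf)) > 0) {
                    out.write(buf, 0, n);
                    sent += n;
                    if (sent >= bytesPerSec) {
                        long elapsed = System.currentTimeMillis() - windowStart;
                        if (elapsed < 1000) Thread.sleep(1000 - elapsed);
                        windowStart = System.currentTimeMillis();
                        sent = 0;
                    }
                }
            } finally {
                out.close();
                in.close();
            }
        }
    }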

Re: Index updates between machines

2007-04-03 Thread cybercouf
Unfortunately I don't have lots of solutions for you, because I don't have such a big index yet! But it sounds like the weak point is disk access during the copy? Try to cache the index in memory? (needs a lot of RAM!) Or have two HDDs on your searcher, one for the current index, the other for
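
A small sketch of that two-directory/two-disk idea, assuming plain Lucene: copy the new index to a standby location, open a searcher on it, and only then retire the old one. The class and directory names are illustrative.

    import org.apache.lucene.search.IndexSearcher;

    public class SearcherSwapper {
        private volatile IndexSearcher current;

        public SearcherSwapper(String indexDir) throws Exception {
            current = new IndexSearcher(indexDir);
        }

        public IndexSearcher get() {
            return current;
        }

        // Call only after the copy into newIndexDir has fully finished.
        public void swap(String newIndexDir) throws Exception {
            IndexSearcher fresh = new IndexSearcher(newIndexDir); // on the other disk
            IndexSearcher old = current;
            current = fresh;   // new queries now hit the new index
            old.close();       // in production, defer this until in-flight
                               // searches on the old index have completed
        }
    }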

Re: Index updates between machines

2007-04-03 Thread Tomi N/A
2007/4/3, Chun Wei Ho [EMAIL PROTECTED]: As the index has been growing in size, we have noticed that the search response time on the searcher machine increases drastically while an index (about 15GB) is being copied from the crawler to the searcher. Both machines run Fedora Core 4 and are

Re: Index updates between machines

2007-04-03 Thread david euler
How about using a load-balancing system on the search servers? Each time, you would update only one of the balanced servers; the others would keep running smoothly. On Tuesday, 03 April 2007 22:39, Chun Wei Ho wrote: We are running a search service on the internet using two machines. We have a crawler machine
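
A rough sketch of that rotation, with hypothetical hostnames and a placeholder for the copy step: drain one backend from the pool, update it, put it back, repeat.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.atomic.AtomicInteger;

    public class SearchPool {
        private final List<String> live = new CopyOnWriteArrayList<String>();
        private final AtomicInteger next = new AtomicInteger();

        public SearchPool(List<String> hosts) {
            live.addAll(hosts);
        }

        // Round-robin over the hosts currently in rotation.
        public String pick() {
            return live.get(Math.abs(next.getAndIncrement() % live.size()));
        }

        // Update one server at a time; the others keep serving queries.
        public void rollingUpdate(List<String> hosts) {
            for (String host : hosts) {
                live.remove(host);     // drain: no new queries go here
                copyNewIndexTo(host);  // placeholder for the transfer step
                live.add(host);        // back into rotation
            }
        }

        private void copyNewIndexTo(String host) {
            // Placeholder for the actual copy, e.g. rsync over ssh.
        }
    }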

Using nutch as a web crawler

2007-04-03 Thread Meryl Silverburgh
Hi, I would like to know if it is a good idea to use the Nutch web crawler. Basically, this is what I need: 1. I have a list of web sites. 2. I want the web crawler to go through each site and parse the anchors; if a link is in the same domain, follow the same steps for 3 levels. 3. For each link, write to
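
A hedged sketch of how steps 1 and 2 map onto Nutch's one-shot crawl tool (the Java entry point is shown; bin/nutch crawl is the usual command line). Staying inside each seed's domain relies on setting db.ignore.external.links to true in nutch-site.xml, and the per-link output of step 3 would come from reading the link database afterwards (bin/nutch readlinkdb) rather than from the crawl itself. The paths and -topN value are examples.

    import org.apache.nutch.crawl.Crawl;

    public class RunCrawl {
        public static void main(String[] args) throws Exception {
            // Equivalent to: bin/nutch crawl urls -dir crawl -depth 3 -topN 1000
            // Assumes nutch-site.xml sets db.ignore.external.links=true so the
            // crawl stays within each seed's domain.
            Crawl.main(new String[] {
                "urls",           // directory holding the seed URL list
                "-dir", "crawl",  // output: crawldb, segments, index
                "-depth", "3",    // follow links three levels deep
                "-topN", "1000"   // cap on pages fetched per level
            });
        }
    }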

Re: Using nutch as a web crawler

2007-04-03 Thread Lourival Júnior
I am quite certain that Nutch is what you are looking for. Take a look at Nutch's documentation for more details and you will see :). On 4/3/07, Meryl Silverburgh [EMAIL PROTECTED] wrote: Hi, I would like to know if it is a good idea to use the Nutch web crawler. Basically, this is