Lucene index sizes and performance

2007-07-07 Thread Chun Wei Ho
We are currently running a search service with a single Lucene index of about 10 GB. We would like to find out: (a) What is the usual index size of everyone else? How large have Lucene indexes gone in production environments, and is there an optimal size that Lucene indexes should be? (b)

Re: Scaling up to several machines with Lucene

2007-07-07 Thread Chun Wei Ho
you done profiling on your application such that you are sure moving Lucene off the machine is going to help that much? Cheers, Grant P.S. The mailing list strips attachments. On Jun 28, 2007, at 10:19 AM, Samuel LEMOINE wrote: > Chun Wei Ho wrote: >> Hi, >> >> We are

Scaling up to several machines with Lucene

2007-06-28 Thread Chun Wei Ho
Hi, We are currently running a Tomcat web application serving searches over our Lucene index (10GB) on a single server machine (Dual 3GHz CPU, 4GB RAM). Due to performance issues and to scale up to handle more traffic/search requests, we are getting another server machine. We are looking at two

Re: Index updates between machines

2007-04-06 Thread Chun Wei Ho
Thanks for the ideas. We are testing out the methods and changes suggested to see if they work with our current setup, and are checking if the disks are the bottleneck in this case, but feel free to drop more hints. :) At the moment we are copying the index at an off-peak hour, but we would also

Index updates between machines

2007-04-03 Thread Chun Wei Ho
We are running a search service on the internet using two machines. We have a crawler machine which crawls the web and merges new documents found into the Lucene index. We have a searcher machine which allows users to perform searches on the Lucene index. Periodically, we would copy the newest ve

Optimizing search speed & performance for a 10G Index.

2006-12-07 Thread Chun Wei Ho
Hi, We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index has approximately 2 million documents and its physical size is about 10 GB. We run it as a Tomcat web application on a Fedora Core 4 server with dual Xeon 3.2GHz processors and 4GB RAM. We receive about 46500 web sear

Classifieds rotation - weighting Lucene results by previous show frequency?

2006-08-07 Thread Chun Wei Ho
We are starting to run a small index of classifieds alongside our main search items. The classifieds are also in a Lucene index. We show classifieds that match the user's search criteria, which means we do a Lucene search on that index and show the top few results. We also keep track of the number

QueryFilter and Memory

2006-07-13 Thread Chun Wei Ho
Hi, I've been trying to adjust the weightings for my searches (thanks Chris for his replies on that thread), and have been using ConstantScoreQuery to even out scores from portions in my query that I want to match but not to contribute to the ranking of that result. I convert a BooleanQuery/Term

Reducing the boost for a particular Term

2006-07-10 Thread Chun Wei Ho
I have an index containing documents from a number of authors, but would like to drop the relevance/score for documents from one particular author using the query. That is, for documents returned by querying: (content:"miracle cure"), I would like to reduce the relevancy of authorid:3024. How

Giving weight to partial matches

2006-06-21 Thread Chun Wei Ho
I am performing searches on an index that includes a title field and a content field, and return results only if either title or content matches ALL the words searched. So searching for "miracle cure for cancer" might yield: (+title:miracle +title:cure +title:for +title:cancer)^5.0 (+content:mira

Getting all the matching documents for a search

2006-06-01 Thread Chun Wei Ho
Hi, I use Hits to search for and get documents matching a particular query, e.g.: Hits hits = indexSearcher.search(new TermQuery(new Term("startswith","A"))); but it is not returning all the matching documents in the index. From experimentation it appears to return less than half the match

Updating documents in index with some fields not stored

2006-05-10 Thread Chun Wei Ho
I would like to make some updates to values within my large index. I understand that I have to delete and re-insert each document to be changed to do that. However, I do have some large fields that are unstored (only indexed; and no, these are not the fields that I want to change), which means

Adding a new search field but needs searching for all

2006-05-10 Thread Chun Wei Ho
I have a large Lucene index to which I am planning to add one or more search fields, and then perform searches on them. How do I include results from the other documents that do not have the new field? For example, I have 10 million documents in an index, and I update 200 of them adding the field "b" =

Obtain terms for only particular field(s)

2006-05-04 Thread Chun Wei Ho
Hi, I have a pretty large index and I would like to obtain all the Terms for only one or two particular fields. As I understand it, IndexReader.terms() returns a TermEnum of all the terms in the index, and I would have to iterate through all of them to pick out the ones from the fields that I want

Simpler QueryParser

2006-03-20 Thread Chun Wei Ho
I am wondering if anyone has existing code for a simpler QueryParser - one that does not create the more complex prefix/fuzzy/range queries, but still allows the usual term/boolean queries. I use QueryParser to directly parse user input (allowing for more flexible specification of include/exclude a

Hardware Requirements for a large index?

2006-02-15 Thread Chun Wei Ho
Hi, I am in the process of deciding specs for a crawling machine and a searching machine (two machines), which will support merging/indexing and searching operations on a single Lucene index that may scale to several million pages (at which point it would be about 2-10 GB, assuming linear growth w

Re: Suggesting refine searches with Lucene

2006-02-13 Thread Chun Wei Ho
ull; > > > public Query getQuery() { > return query; > } > > > public void setQuery(Query query) { > this.query = query; > } > > > public String toString(){ > return query.toString(); > } > >

Suggesting refine searches with Lucene

2006-02-13 Thread Chun Wei Ho
Hi, I am trying to suggest refined searches for my Lucene search. For example, if a search turned up too many results, it would list a number of document title subsequences that occurred frequently in the results of the previous search, as possible candidates for refining the search. Does anyone

Help: tweaking search - reducing IDF skew and implementing score cutoff

2006-02-09 Thread Chun Wei Ho
Hi, I am running a search for something akin to a news site, where each news document has a date, title, keywords/bylines, summary fields and then the actual content. Using Lucene for this database of documents, it seems that: 1. The relevancy score is skewed drastically by the actual number of ne

Distributed vs Merged Searching

2006-01-31 Thread Chun Wei Ho
I am deploying a web application serving searches on a Lucene index, and am deciding between distributing search across several machines and single searching, and was hoping that someone could tell me from their experiences: + Is there anything particular to watch out for if using distributed sear

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
Thanks for the info :) One last related question. If I delete documents using an IndexReader, can I assume that the internal document numbers of other undeleted documents (obtained using the same IndexReader instance) will not change until I call IndexReader.close()?

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
Hi, Thanks for the help, just a few more questions: On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote: > On Thursday 26 January 2006 09:15, Chun Wei Ho wrote: > > I am attempting to prune an index by getting each document in turn and > > then checking/deleting it: &

Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
I am attempting to prune an index by getting each document in turn and then checking/deleting it: IndexReader ir = IndexReader.open(path); for(int i=0;i