RE: Question regarding sorting and memory consumption in lucene

2008-10-10 Thread Robert Stewart
I have had a similar problem. What I do is load all the date field values at index startup, convert dates (timestamps) to a Julian date (# of seconds since 1970/1/1). Then I pre-sort that array using a very fast O(n) distribution sort, and then keep an array of integers which is the pre-sorted p
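
A rough sketch of the idea in the snippet above, for illustration only. The message is cut off, so the particular distribution sort (an LSD radix sort here), the class and method names, and the reading of the integer array as a per-document rank are all assumptions, not the poster's actual code. It also assumes the timestamps are non-negative seconds that fit in an int.

```java
import java.util.Arrays;

// Hypothetical helper: names and layout are illustrative, not from the thread.
final class PresortedDates {

    /**
     * Returns rank[doc] = position of doc when all docs are ordered by
     * ascending seconds[doc]. Uses a stable LSD radix sort (one kind of
     * O(n) distribution sort) over four 8-bit digits.
     * Assumes every value in seconds is non-negative.
     */
    static int[] ranksBySeconds(int[] seconds) {
        int n = seconds.length;
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i; // start with doc IDs in natural order
        }
        int[] scratch = new int[n];

        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            // Histogram of the current digit, offset by one slot.
            for (int doc : order) {
                count[((seconds[doc] >>> shift) & 0xFF) + 1]++;
            }
            // Prefix sums turn counts into starting offsets per digit.
            for (int b = 0; b < 256; b++) {
                count[b + 1] += count[b];
            }
            // Stable scatter of doc IDs into their bucket positions.
            for (int doc : order) {
                scratch[count[(seconds[doc] >>> shift) & 0xFF]++] = doc;
            }
            int[] tmp = order;
            order = scratch;
            scratch = tmp;
        }

        // Invert the sorted order so each doc ID maps to its sorted position.
        int[] rank = new int[n];
        for (int pos = 0; pos < n; pos++) {
            rank[order[pos]] = pos;
        }
        return rank;
    }
}
```

Under these assumptions, ordering two hits by date reduces to comparing rank[docA] with rank[docB], with no field lookups at query time.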

custom tag scoring question

2008-10-08 Thread Robert Stewart
We have a custom "tagger" application which identifies certain entities (such as companies, etc.) and applies a "relevance" value to each entity based upon overall relevance in some document. Then we index these "tags" into a Lucene index by storing them in an indexed field (same name, different

RE: Replicating Lucene Index with out SOLR

2008-08-28 Thread Robert Stewart
We don't use Solr, since we run on Windows ;(, but we did implement very similar snapshot replication. We have 2 master index servers building indexes, partitioned by document. Every minute, we stop the index writer, create a local snapshot (on the master server), in a directory named MMDDHHM

RE: windows file system cache

2008-08-19 Thread Robert Stewart
iginal Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, August 18, 2008 10:03 AM To: java-user@lucene.apache.org Subject: Re: windows file system cache Mark Miller wrote: > Mark Miller wrote: >> Robert Stewart wrote: >>> Anyone else run on Windows? We have index a

windows file system cache

2008-08-16 Thread Robert Stewart
Anyone else run on Windows? We have an index around 26 GB in size. It seems the file system cache ends up taking nearly all available RAM (26 GB out of 32 GB on a 64-bit box). The Lucene process is around 5 GB, so very little is left over for queries, etc., and the box starts swapping during searches. I think ch

when do internal doc IDs change?

2008-08-12 Thread Robert Stewart
We have a problem using FieldCache (or using TermEnum/TermDocs directly) to pre-cache several fields. It is a bottleneck, because we open a new searchable index snapshot very frequently (every minute). Each time we get a new snapshot of our master index (basically a copy using h

updating existing field values

2008-08-07 Thread Robert Stewart
Given an existing index, is there any way to update the value of a particular field across all documents, without deleting/re-indexing documents? For instance, if I have a date field, and I need to offset all dates based on a change to the stored time zone (subtract 12 hours from each value). The so

RE: Fastest way to get just the "bits" of matching documents

2008-07-28 Thread Robert Stewart
ser@lucene.apache.org Subject: Re: Fastest way to get just the "bits" of matching documents Op Thursday 24 July 2008 23:00:33 schreef Robert Stewart: > Queries are very complex in our case, some have up to 100 or more > clauses (over several fields), including disjunctions and pro

caching fields for query performance

2008-07-25 Thread Robert Stewart
If I have a frequently queried field, which has a single value per document (such as language), how can I pre-cache all field values, such that the underlying query processing always uses memory cache (never disk i/o) for that particular field? I don't know if it is possible without some custom

RE: Fastest way to get just the "bits" of matching documents

2008-07-24 Thread Robert Stewart
y have bottleneck in Scoring (I doubt it) then you have 2: - Wait for Paul to come back from Holidays, he wanted to make "pure Boolean" queries, without Scoring, possible :) - Invest in faster CPU/Memory have fun eks - Original Message > From: Robert Stewart <

RE: Using lucene to search a bunch of keywords?

2008-07-23 Thread Robert Stewart
You need to invert the process. Using Lucene may not be the best option... You need to make your document a key into an index of key words. I've done the same thing, but not with Lucene. You need to pass through the document and for each word (token) lookup in some index (hashtable) to find po
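
The inverted lookup described above might look roughly like the following. This is a minimal sketch under stated assumptions: single-word keywords only, naive lowercase tokenization on non-alphanumeric characters, and a hypothetical KeywordSpotter class that does not come from the thread. The message is truncated, so whatever follows the hashtable lookup in the original (for example multi-word phrase handling) is not covered.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class for illustration; not the poster's implementation.
final class KeywordSpotter {
    // The "index" of keywords: a hash set gives O(1) expected lookups per token.
    private final Set<String> keywords = new HashSet<>();

    KeywordSpotter(Collection<String> words) {
        for (String w : words) {
            keywords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    /** Makes one pass over the text and returns the keywords that occur in it, sorted. */
    Set<String> match(String text) {
        Set<String> found = new TreeSet<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (keywords.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

The cost per document is proportional to the number of tokens in the document rather than the number of keywords, which is the point of inverting the process.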

Fastest way to get just the "bits" of matching documents

2008-07-22 Thread Robert Stewart
I need to execute a boolean query and get back just the bits of all the matching documents. I do additional filtering (date ranges and entitlements) and then do my own sorting later on. I know that using QueryFilter.Bits() will still compute scores for all matching documents. I do not want to