I have had a similar problem. What I do is load all the date field values at
index startup and convert the dates (timestamps) to seconds since 1970-01-01
(i.e., Unix time). Then I pre-sort that array using a very fast O(n)
distribution sort, and keep an array of integers holding the pre-sorted
order of document ids.
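In outline, the presort could look like the following minimal sketch (assuming
day-granularity buckets, with ties within a day left in doc-id order; class and
method names are illustrative, not the actual code):

    public final class DatePresorter {

        /**
         * O(n + k) counting sort over day buckets (k = span in days).
         * tsSeconds holds one timestamp (seconds since 1970-01-01) per
         * document, indexed by Lucene doc id; the result is the doc ids
         * in ascending date order.
         */
        public static int[] presort(long[] tsSeconds) {
            int n = tsSeconds.length;
            if (n == 0) return new int[0];

            long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
            for (long t : tsSeconds) {
                min = Math.min(min, t);
                max = Math.max(max, t);
            }
            int days = (int) ((max - min) / 86400L) + 1;

            // count[b + 1] = number of documents whose timestamp falls in day b
            int[] count = new int[days + 1];
            for (long t : tsSeconds) {
                count[(int) ((t - min) / 86400L) + 1]++;
            }
            // prefix sums: count[b] becomes the first output slot for day b
            for (int b = 1; b <= days; b++) {
                count[b] += count[b - 1];
            }
            // scatter doc ids into their sorted slots
            int[] order = new int[n];
            for (int doc = 0; doc < n; doc++) {
                int bucket = (int) ((tsSeconds[doc] - min) / 86400L);
                order[count[bucket]++] = doc;
            }
            return order;
        }
    }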
We have a custom "tagger" application which identifies certain entities (such
as companies, etc.) and assigns each entity a "relevance" value reflecting how
relevant that entity is to the document as a whole.
Then we index these "tags" into the Lucene index by storing them in an indexed
field (same field name, one value per tag).
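A minimal sketch of what that tag indexing could look like, assuming relevance
is carried as a per-field boost (the field name "entity" and the classic
pre-4.0 Field API are illustrative assumptions, not the actual setup):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public final class TagIndexer {

        /** Adds one "entity" field per tag; relevance rides along as a boost. */
        public static Document buildDocument(String[] entities, float[] relevance) {
            Document doc = new Document();
            for (int i = 0; i < entities.length; i++) {
                Field f = new Field("entity", entities[i],
                                    Field.Store.YES, Field.Index.NOT_ANALYZED);
                f.setBoost(relevance[i]);  // folded into the field norm at index time
                doc.add(f);
            }
            return doc;
        }
    }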
We don't use Solr, since we run on Windows ;(, but we did
implement very similar snapshot replication. We have 2 master index servers
building indexes, partitioned by document. Every minute, we stop the index
writer and create a local snapshot (on the master server) in a directory named
MMDDHHMM
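The snapshot step itself could be as simple as this sketch (assuming the index
writer has just been committed/closed so the segment files are stable on disk;
paths and the timestamp pattern are illustrative assumptions):

    import java.io.IOException;
    import java.nio.file.*;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public final class Snapshotter {

        /** Copies the current index files into a timestamped snapshot dir. */
        public static Path snapshot(Path indexDir, Path snapshotRoot) throws IOException {
            String stamp = LocalDateTime.now()
                    .format(DateTimeFormatter.ofPattern("MMddHHmm"));
            Path dest = snapshotRoot.resolve(stamp);
            Files.createDirectories(dest);
            try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
                for (Path f : files) {
                    Files.copy(f, dest.resolve(f.getFileName()),
                               StandardCopyOption.COPY_ATTRIBUTES);
                }
            }
            return dest;  // searchers open this directory read-only
        }
    }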
-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 18, 2008 10:03 AM
To: java-user@lucene.apache.org
Subject: Re: windows file system cache
Mark Miller wrote:
> Mark Miller wrote:
>> Robert Stewart wrote:
Anyone else run on Windows? We have an index around 26 GB in size. It seems the
file system cache ends up taking nearly all available RAM (26 GB out of 32 GB on
a 64-bit box). The Lucene process is around 5 GB, so very little is left over for
queries, etc., and the box starts swapping during searches. I think ch
We have a problem where using FieldCache (or TermEnum/TermDocs directly)
to pre-cache several fields is a bottleneck, because we open a
new searchable index snapshot very frequently (every minute). Each time we get
a new snapshot of our master index (basically a copy using h
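One way to keep that cost off the query path (a sketch of a mitigation, not a
cure for the re-load itself) is to warm the new reader on a background thread
before swapping it in, so no user query pays the un-inversion cost; field names
are illustrative:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public final class SnapshotWarmer {

        /** Touch the caches once, off the query path, right after opening. */
        public static void warm(IndexReader reader) throws IOException {
            // un-inverts each field into an in-memory array, keyed by the reader
            FieldCache.DEFAULT.getStrings(reader, "language");
            FieldCache.DEFAULT.getStringIndex(reader, "date");
        }
    }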
Given an existing index, is there any way to update the value of a particular
field across all documents, without deleting/re-indexing documents? For
instance, if I have a date field, and I need to offset all dates based on a
change to the stored time zone (subtract 12 hours from each value). The so
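As far as I know, Lucene has no in-place field update, so the usual answer is a
rewrite pass like the sketch below, which assumes every field is stored and
that a unique "id" field exists (both of those are assumptions):

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public final class DateShifter {

        private static final long TWELVE_HOURS_SECS = 12L * 3600L;

        /** Rewrites every document with its "date" field shifted back 12 hours. */
        public static void shiftAll(IndexReader reader, IndexWriter writer) throws IOException {
            for (int id = 0; id < reader.maxDoc(); id++) {
                if (reader.isDeleted(id)) continue;
                Document doc = reader.document(id);  // relies on fields being stored
                long ts = Long.parseLong(doc.get("date"));
                doc.removeField("date");
                doc.add(new Field("date", Long.toString(ts - TWELVE_HOURS_SECS),
                                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                // delete-and-re-add, keyed on the unique "id" field
                writer.updateDocument(new Term("id", doc.get("id")), doc);
            }
        }
    }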
To: java-user@lucene.apache.org
Subject: Re: Fastest way to get just the "bits" of matching documents
On Thursday 24 July 2008 23:00:33 Robert Stewart wrote:
> Queries are very complex in our case; some have 100 or more
> clauses (over several fields), including disjunctions and pro
If I have a frequently queried field which has a single value per document
(such as language), how can I pre-cache all field values, such that the
underlying query processing always uses the memory cache (never disk I/O) for
that particular field? I don't know if it is possible without some custom
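FieldCache is the stock answer here: it un-inverts the field once per reader
into a plain in-memory array, and later lookups are array reads. A minimal
sketch (the field name and class name are illustrative):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public final class LanguageLookup {

        private final String[] langByDoc;  // one value per doc id, fully in RAM

        public LanguageLookup(IndexReader reader) throws IOException {
            // built once per reader; cached internally, keyed by the reader
            this.langByDoc = FieldCache.DEFAULT.getStrings(reader, "language");
        }

        public boolean matches(int docId, String wanted) {
            return wanted.equals(langByDoc[docId]);  // no disk I/O on the query path
        }
    }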
If you really have a bottleneck in Scoring (I doubt it), then you have 2 options:
- Wait for Paul to come back from holidays; he wanted to make "pure Boolean"
queries, without Scoring, possible :)
- Invest in faster CPU/Memory
have fun
eks
----- Original Message -----
> From: Robert Stewart <
You need to invert the process. Using Lucene may not be the best option... You
need to make your document a key into an index of keywords. I've done the
same thing, but not with Lucene. You need to pass through the document and,
for each word (token), look it up in some index (hashtable) to find po
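A minimal sketch of that inverted lookup, with a naive whitespace tokenizer
standing in for real analysis (all names here are illustrative):

    import java.util.*;

    public final class ReverseMatcher {

        // keyword -> ids of the stored queries/profiles containing it
        private final Map<String, List<Integer>> keywordToProfiles = new HashMap<>();

        public void addProfile(int profileId, String... keywords) {
            for (String kw : keywords) {
                keywordToProfiles.computeIfAbsent(kw.toLowerCase(Locale.ROOT),
                                                  k -> new ArrayList<>()).add(profileId);
            }
        }

        /** Returns ids of profiles whose keywords appear in the document text. */
        public Set<Integer> match(String documentText) {
            Set<Integer> hits = new HashSet<>();
            for (String token : documentText.toLowerCase(Locale.ROOT).split("\\s+")) {
                List<Integer> ps = keywordToProfiles.get(token);
                if (ps != null) hits.addAll(ps);
            }
            return hits;
        }
    }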
I need to execute a boolean query and get back just the bits of all the
matching documents. I do additional filtering (date ranges and entitlements)
and then do my own sorting later on. I know that using QueryFilter.Bits() will
still compute scores for all matching documents. I do not want to
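A custom HitCollector gets you the raw bits without any sorting, though the
scorer still computes a score per hit in stock Lucene of this era, which is
exactly the complaint above; a minimal sketch using the classic pre-2.9 API:

    import java.io.IOException;
    import java.util.BitSet;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public final class BitsCollector {

        /** Collects matching doc ids into a BitSet; no sorting, no Hits object. */
        public static BitSet collect(IndexSearcher searcher, Query query) throws IOException {
            final BitSet bits = new BitSet(searcher.maxDoc());
            searcher.search(query, new HitCollector() {
                public void collect(int doc, float score) {
                    bits.set(doc);  // the score is computed by the scorer but ignored
                }
            });
            return bits;
        }
    }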