I agree. This falls into the area where a technical limit is reached. Time to
modify the spec.
I have thought about this issue over the past couple of days, and there is really
NO silver bullet. If the field is a multi-value field and the distinct field
values are not too many, you might reduce memory usage by st
: I'm wondering then if the Sorting infrastructure could be refactored
: to allow some sort of policy/strategy where one can choose a
: point where one is not willing to use memory for sorting, but willing
...
: To accomplish this would require a substantial change to the
: FieldSor
A memory-saving optimization would be to not load the corresponding
String[] in the string index (as discussed previously), but there is
currently no way to tell the FieldCache that the strings are unneeded.
The String values are only needed for merging results in a
MultiSearcher.
Yep, which hap
On 4/9/07, jian chen <[EMAIL PROTECTED]> wrote:
But, on a higher level, my idea is really just to create an array of
integers for each sort field. The array length is NumOfDocs in the index.
Each integer corresponds to a displayable string value. For example, if you
have a field of different colo
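The idea described above, one int per document per sort field, where each int maps to a displayable value, can be sketched as follows. This is illustrative code, not Lucene internals; the method and class names are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Sketch: build an int[] of length numDocs for one sort field.
// Each entry is the rank of the document's value among the sorted
// distinct values (e.g. colors), so sorting compares cheap ints.
public class SortFieldOrds {
    public static int[] buildOrds(String[] valueByDoc) {
        // Collect and sort the distinct values.
        String[] distinct = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < distinct.length; i++) rank.put(distinct[i], i);
        // One int per document.
        int[] ords = new int[valueByDoc.length];
        for (int d = 0; d < valueByDoc.length; d++) ords[d] = rank.get(valueByDoc[d]);
        return ords;
    }

    public static void main(String[] args) {
        String[] color = {"red", "blue", "green", "blue"};
        // Distinct sorted values: blue < green < red
        System.out.println(Arrays.toString(buildOrds(color))); // -> [2, 0, 1, 0]
    }
}
```

With few distinct values, the int array (or even a byte array) is the only per-document cost; the displayable Strings are stored once.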
Hi, Paul,
I think to warm-up or not, it needs some benchmarking for specific
application.
For the implementation of the sort fields, when I talk about norms in
Lucene, I am thinking we could borrow the same implementation of the norms to
do it.
But, on a higher level, my idea is really just to c
In our application, we have to sync up the index pretty frequently, and the
warm-up of the index is killing it.
Yep, it speeds up the first sort, but at the cost of making all the
others slower (maybe significantly so). That's obviously not ideal
but could make use of sorts in larger index
Hi, Paul,
Thanks for your reply. For your previous email about the need for a disk-based
sorting solution, I kind of agree with your points. One incentive for your
approach is that we don't need to warm up the index anymore in case the
index is huge.
In our application, we have to sync up th
Paul Smith wrote:
I don't disagree with the premise that it involves substantial I/O and
would increase the time taken to sort, or that this approach shouldn't
be the default mechanism, but it's not too difficult to build a disk I/O
subsystem that can allocate many spindles to service this and
Now, if we could use integers to represent the sort field values, which is
typically the case for most applications, maybe we can afford to have the
sort field values stored on disk and do a disk lookup for each matched
document? The lookup of the sort field value will be as simple as
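The disk-resident variant suggested above can be sketched with fixed-width 4-byte ints keyed by docId, so a lookup is one seek plus one read. The file layout and class names here are assumptions for illustration, not an existing Lucene format.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: store one 4-byte sort value per document, looked up on demand.
public class DiskSortValues {
    // Write values in docId order, 4 bytes each.
    public static void write(File f, int[] ords) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            for (int o : ords) out.writeInt(o);
        }
    }

    // One random access per lookup: seek to docId * 4, read 4 bytes.
    public static int read(File f, int docId) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek((long) docId * 4);
            return raf.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("ords", ".bin");
        f.deleteOnExit();
        write(f, new int[]{7, 3, 9});
        System.out.println(read(f, 1)); // -> 3
    }
}
```

This trades memory for a random disk access per matched document, which is exactly the cost concern raised in the replies below.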
On 10/04/2007, at 4:18 AM, Doug Cutting wrote:
Paul Smith wrote:
Disadvantages to this approach:
* It's a lot more I/O intensive
I think this would be prohibitive. Queries matching more than a
few hundred documents will take several seconds to sort, since
random disk accesses are requir
Hi, Doug,
I have been thinking about this as well lately and have some thoughts
similar to Paul's approach.
Lucene has the norm data for each document field. Conceptually it is a byte
array with one byte per document for each field. At query time, I think the norm
array is loaded into memory the fir
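The norms-style lazy loading this paragraph describes can be sketched as a load-once cache: the per-document array is read from the index on first use and then reused for the reader's lifetime. Class and field names are illustrative; `loadFromIndex` stands in for the actual index read.

```java
// Sketch: lazily load a per-document sort array once, like Lucene's norms.
public class LazySortCache {
    private int[] ords;     // null until the first sorted query
    private int loads = 0;  // counts actual index reads, for illustration

    // Stand-in for reading the per-document values from the index.
    private int[] loadFromIndex() {
        loads++;
        return new int[]{2, 0, 1};
    }

    public synchronized int[] getOrds() {
        if (ords == null) ords = loadFromIndex(); // first sort pays the cost
        return ords;                              // later sorts reuse the array
    }

    public int loadCount() { return loads; }
}
```

The trade-off is the one discussed earlier in the thread: the first sort after opening (or re-syncing) the index is slow unless a warm-up query pays the load cost up front.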
Paul Smith wrote:
Disadvantages to this approach:
* It's a lot more I/O intensive
I think this would be prohibitive. Queries matching more than a few
hundred documents will take several seconds to sort, since random disk
accesses are required per matching document. Such an approach is only
A discussion on the user list brought my mind to the longer-term
scalability issues of Lucene. Lucene is inherently memory efficient,
except for sorting, where the inverted nature of the index works
against the requirement of having a value for each document to sort
against.
I'm h