Re: Sort Performance Problems across large dataset

2005-01-24 Thread Matt Quail
Peter, Currently we can issue a simple search query and expect a response back in about 0.2 seconds (~3,000 results). You may want to try something like the following (I do this in FishEye; it seems to be performant for moderately large field-spaces). Use a custom HitCollector, and store all the ma
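The rest of this reply is cut off, so the exact approach isn't shown. The general idea of collecting matches with a custom collector and sorting them afterwards can be sketched in plain Java. Assumptions: the sort values are loaded once into a `long[]` indexed by document number, and the hypothetical `SortingCollector` only mirrors the `collect(int doc, float score)` callback shape of Lucene's `HitCollector`; it does not extend the real class.

```java
import java.util.Arrays;

// Hypothetical stand-in for a HitCollector subclass: it only records
// matching document numbers and sorts them afterwards by a per-document
// key that was loaded into memory once, up front.
final class SortingCollector {
    private final long[] sortKeys; // assumption: sortKeys[doc] holds the sort value for doc
    private int[] docs = new int[16];
    private int size = 0;

    SortingCollector(long[] sortKeys) {
        this.sortKeys = sortKeys;
    }

    // Same shape as HitCollector.collect(int doc, float score); the score is ignored.
    void collect(int doc, float score) {
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2);
        }
        docs[size++] = doc;
    }

    // Returns the collected document numbers ordered by ascending sort key.
    int[] sortedDocs() {
        Integer[] boxed = new Integer[size];
        for (int i = 0; i < size; i++) {
            boxed[i] = docs[i];
        }
        Arrays.sort(boxed, (a, b) -> Long.compare(sortKeys[a], sortKeys[b]));
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            out[i] = boxed[i];
        }
        return out;
    }
}
```

In this sketch each search only pays for recording the matching document numbers plus one sort over them; the key array is shared across searches.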

Re: How to handle range queries over large ranges and avoid Too Many Boolean clauses

2004-05-18 Thread Matt Quail
> Is there a simpler, easier way to do this? Yes. I have started implementing a "QuickRangeQuery" class, that doesn't have the BooleanQuery limitation, but scores every matching document as 1.0. I will see if I can get it finished in the next 24 hours, and post back to this thread. =Matt PS: I'
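The QuickRangeQuery class itself isn't in this excerpt. A minimal sketch of the same idea, assuming an in-memory sorted term dictionary (a `TreeMap` from term to document numbers) as a stand-in for Lucene's `TermEnum`/`TermDocs`: walk the terms inside the range and set one bit per matching document, instead of building one boolean clause per term. `ConstantScoreRange` is a hypothetical name.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a term dictionary kept in sorted order
// (term -> document numbers), standing in for Lucene's TermEnum/TermDocs.
final class ConstantScoreRange {
    private final NavigableMap<String, int[]> postings = new TreeMap<>();

    void add(String term, int... docs) {
        postings.put(term, docs);
    }

    // Marks every document containing a term in [lower, upper].
    // No per-term clause is built, so there is no clause limit to hit;
    // every set bit stands for a match with the same score (1.0).
    BitSet matches(String lower, String upper) {
        BitSet bits = new BitSet();
        for (Map.Entry<String, int[]> e : postings.subMap(lower, true, upper, true).entrySet()) {
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```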

Re: hierarchical search

2004-05-17 Thread Matt Quail
Fredrik, I would tackle your problem like this: Say that the field you want to index is "path". I would turn this into *three* indexed fields: 1) multiple path prefixes ("pre-paths") 2) multiple path suffixes ("post-paths") 3) the number of "components" in the path ("path-size"). For example, for
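The worked example in the original message is cut off, so the exact token format is unknown. Assuming a path is split on "/", one way to derive the three field values is shown below; `PathFields` and the suffix format (no leading slash) are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives the three indexed values described above
// for a slash-separated path such as "/a/b/c". The token format is an assumption.
final class PathFields {
    final List<String> prePaths = new ArrayList<>();  // "/a", "/a/b", "/a/b/c"
    final List<String> postPaths = new ArrayList<>(); // "c", "b/c", "a/b/c"
    final int pathSize;                               // number of components

    PathFields(String path) {
        List<String> parts = new ArrayList<>();
        for (String p : path.split("/")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        pathSize = parts.size();

        // Each prefix extends the previous one by one component.
        StringBuilder prefix = new StringBuilder();
        for (String p : parts) {
            prefix.append('/').append(p);
            prePaths.add(prefix.toString());
        }
        // Suffixes grow from the last component back to the first.
        for (int i = parts.size() - 1; i >= 0; i--) {
            postPaths.add(String.join("/", parts.subList(i, parts.size())));
        }
    }
}
```

Under these assumptions, "direct children of /a" could be expressed as pre-path "/a" combined with path-size 2, and "anything ending in b/c" as post-path "b/c".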

Re: Memory Requirements

2004-05-13 Thread Matt Quail
BitSet) for queries where I am not interested in the score. Apart from that, I'm not aware of any other methods for reducing the memory consumption. =Matt Sascha Ottolski wrote: Am Donnerstag, 13. Mai 2004 12:56 schrieb Matt Quail: I noticed that most users have +- 1G of RAM to run L

Re: Memory Requirements

2004-05-13 Thread Matt Quail
I noticed that most users have +- 1G of RAM to run Lucene. Does anyone have experience running it on a 128MB or 256MB machine? I regularly test my app that uses Lucene by passing -Xmx8m to the JVM; this is on a box with 1G of RAM, but the JVM never uses more than 8M. My app runs fine (though there is

Re: Mixing database and lucene searches

2004-05-11 Thread Matt Quail
Is it possible to use float and date ranges in that case? Or maybe I should just read the details in the manual and stop asking stupid questions. :-) There is no such thing as a stupid question ;-) At the end of the day, Lucene just handles strings, and it handles them lexicographically. The Date
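Because terms compare as strings, "9" sorts after "10". A minimal sketch of the usual fix for non-negative integers, padding every value to the same width; the 10-digit width and the `PaddedNumbers` name are assumptions for illustration.

```java
// Terms are compared as strings, so numbers need a fixed width before
// indexing for string order to match numeric order.
final class PaddedNumbers {
    // Assumption: values are non-negative and fit in 10 digits.
    static String pad(long value) {
        if (value < 0 || value > 9_999_999_999L) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return String.format("%010d", value);
    }
}
```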

Re: Mixing database and lucene searches

2004-05-11 Thread Matt Quail
Eric Jain wrote: To ask a silly question: What approach does Lucene use for ranges and sorting? A range such as '10-60' is expanded into a boolean query containing all terms that are in the index and lie within the specified range, e.g. '10 or 11 or 20 or 59'. Yes, using a range search requires
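The expansion described above can be mimicked with a sorted set: every indexed term inside the range becomes one OR clause, so a wide range over many distinct terms runs into the clause limit. The 1024 figure matches Lucene's default `BooleanQuery` clause limit; `RangeExpansion` itself is a hypothetical illustration, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Mimics how a range query is rewritten: each indexed term that falls
// inside the range becomes one OR clause.
final class RangeExpansion {
    static final int MAX_CLAUSES = 1024; // Lucene's default BooleanQuery clause limit

    static List<String> expand(NavigableSet<String> indexedTerms, String lower, String upper) {
        List<String> clauses = new ArrayList<>(indexedTerms.subSet(lower, true, upper, true));
        if (clauses.size() > MAX_CLAUSES) {
            // Lucene signals this case with BooleanQuery.TooManyClauses.
            throw new IllegalStateException("too many clauses: " + clauses.size());
        }
        return clauses;
    }
}
```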

Re: Mixing database and lucene searches

2004-05-10 Thread Matt Quail
But presumably if you make sure that the field "name" is indexed in Lucene (not necessarily stored), it's far quicker to do a two-stage process, where you search Lucene for "text:foo AND name:matt" and then just fill-in any other metadata with a fast little DB lookup for the small returned set of p
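The second stage of that two-stage process can be sketched as a single parameterized query over the ids Lucene returned. The table and column names (`documents`, `id`, `title`, `author`) are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Second stage: after Lucene has narrowed the hits, fetch the remaining
// metadata for just those ids in one query. Table/column names are hypothetical.
final class MetadataLookup {
    static String sqlFor(List<Long> hitIds) {
        if (hitIds.isEmpty()) {
            throw new IllegalArgumentException("no hits to look up");
        }
        // One "?" placeholder per hit id.
        String placeholders = String.join(", ", Collections.nCopies(hitIds.size(), "?"));
        return "SELECT id, title, author FROM documents WHERE id IN (" + placeholders + ")";
    }
}
```

Each id would then be bound with `PreparedStatement.setLong(i + 1, id)`, keeping the lookup cheap as long as the Lucene result set is small.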

Re: Mixing database and lucene searches

2004-05-10 Thread Matt Quail
Glen Stampoultzis wrote: Just one comment about your strategy for combining db and lucene searches. It seems that it would slow down significantly the larger the results, although I can't see a better way to go about it. For example if the lucene search matched 100 records and the database searche
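The approach being questioned here amounts to intersecting two independent result lists. A minimal sketch, assuming both sides return document ids as `Long`s, shows why the cost grows with the size of both lists; `ResultIntersection` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Combines ids from a Lucene search with ids from a database search.
// Work is linear in the size of both lists, so large results on
// either side slow this down.
final class ResultIntersection {
    static List<Long> intersect(List<Long> luceneIds, List<Long> dbIds) {
        Set<Long> db = new HashSet<>(dbIds);
        List<Long> out = new ArrayList<>();
        for (Long id : luceneIds) { // preserves Lucene's relevance order
            if (db.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }
}
```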

Re: Mixing database and lucene searches

2004-05-10 Thread Matt Quail
Glen Stampoultzis wrote: Anyone have any strategies for dealing with this? I'm wondering whether it's better to replicate searchable fields in the lucene index. This means being very careful that updates get done in two places so it is not ideal. If you *can* manage to update your index when the

Re: Range searches for numbers

2004-05-06 Thread Matt Quail
Reece, What's the best way to store numbers for range searching? If someone has some info about this I'd love to see it. I implemented a "LongField" that encodes any +ve or -ve long into a string that sorts correctly. I posted that class here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg0479
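The LongField class itself is only available at the linked post. One common way to get the same property (not necessarily how LongField works) is to flip the sign bit, which turns signed order into unsigned order, and then write the result as fixed-width hex. `SortableLong` is a hypothetical name.

```java
// Encodes any long (positive or negative) as a 16-character string whose
// lexicographic order matches numeric order.
final class SortableLong {
    static String encode(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto
        // 0x0000000000000000..0xFFFFFFFFFFFFFFFF in unsigned order.
        long shifted = value ^ Long.MIN_VALUE;
        String hex = Long.toHexString(shifted); // lowercase, unsigned
        // Left-pad to 16 hex digits so all encodings have the same length.
        StringBuilder sb = new StringBuilder(16);
        for (int i = hex.length(); i < 16; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }
}
```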

Re: Presentation in Mtl

2004-04-14 Thread Matt Quail
I too gave a Lucene presentation to my local JUG (Canberra, Australia) last night. It also went over very well. Lucene totally rocks! =Matt Stephane James Vaucher wrote: Hi everyone, I did a presentation tonight in Montreal at a Java users group meeting. I've got to say that they were maybe 4 c

Re: code works with 1.3-rc1 but not with 1.3-final??

2004-03-22 Thread Matt Quail
Or use IndexWriter.setUseCompoundFile(true) to reduce the number of files created by Lucene. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#setUseCompoundFile(boolean) =Matt Kevin A. Burton wrote: Dan wrote: I have some code that creates a lucene index. It has

Re: PrefixQuery and hieracical queries problem

2004-03-19 Thread Matt Quail
o.File; import java.io.IOException; /** * This code is in the public domain * * @author Matt Quail (http://madbean.com/) */ public class HieracicalTreeExample { private static final Analyzer ANALYZER = new WhitespaceAnalyzer(); private static final File sIndexDir = new File("d

Re: PrefixQuery and hieracical queries problem

2004-03-19 Thread Matt Quail
Dennis Thrysøe wrote: The only alternative I can think of would be to store a whitespace separated list of all ancestors along with a document: /foo /foo/bar /foo/bar/baz I think you will find that this kind of approach works very well (as it has for me). But instead of adding one field named "p
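Storing each ancestor as its own token turns "everything under /foo/bar" into an exact term match rather than a string prefix match, which would also accept "/foo/barn/x". A plain-Java sketch of the ancestor list and the exact-token check; `AncestorTokens` is hypothetical and does not use Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Builds the whitespace-separated ancestor list for a path and checks
// membership by exact token, as a term query on that field would.
final class AncestorTokens {
    static String ancestors(String path) {
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            prefix.append('/').append(part);
            out.add(prefix.toString());
        }
        return String.join(" ", out);
    }

    // True if the document at 'path' is at or below 'ancestor'.
    static boolean isUnder(String path, String ancestor) {
        return Arrays.asList(ancestors(path).split(" ")).contains(ancestor);
    }
}
```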

Re: java.io.tmpdir as lock dir .... once again

2004-03-02 Thread Matt Quail
I had to do something similar to make the application work with Lucene 1.3 final when upgrading from 1.3 RC1. I think it is better to maintain backward compatibility so existing users are not affected too much when a new release is available. I'd like to "me too" this sentiment. That change caused me a

Re: Iterating TermEnum backwards

2004-02-26 Thread Matt Quail
I know I could "invert" my dates (something like MAX_LONG - date) to get the REVERSE order, but I want to be able to do "least recent" and "most recent". Why not have two date fields, one inverted and one not? PS: my current solution is to do a binary search between MIN and MAX, halving my searc
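The two-field suggestion can be sketched as two encodings of the same timestamp: one ascending and one inverted as `Long.MAX_VALUE - date`, both zero-padded so string order matches numeric order. Walking the inverted field forward then visits the most recent dates first. Non-negative millisecond timestamps and the `DateKeys` name are assumptions.

```java
// Two sortable encodings of the same timestamp: ascending and inverted.
final class DateKeys {
    private static final String FORMAT = "%019d"; // Long.MAX_VALUE has 19 digits

    // Assumption: dates are non-negative millisecond timestamps.
    static String ascending(long millis) {
        requireNonNegative(millis);
        return String.format(FORMAT, millis);
    }

    // Larger (more recent) dates produce smaller strings.
    static String inverted(long millis) {
        requireNonNegative(millis);
        return String.format(FORMAT, Long.MAX_VALUE - millis);
    }

    private static void requireNonNegative(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("negative date: " + millis);
        }
    }
}
```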

Iterating TermEnum backwards

2004-02-25 Thread Matt Quail
Hi all, Is there any way to iterate through a TermEnum backwards? Okay, I know that there isn't a way to do this via the TermEnum class, but is it "implementable" on top of the underlying Lucene datastore? My particular problem is this: I have an index of documents, each document has a "date" fie

1.3-final: now giving me java.io.FileNotFoundException (Too many open files)

2004-01-21 Thread Matt Quail
I'm getting the following stack trace from lucene-1.3-final running on JDK 1.4.2_03-b02 on Linux: java.io.FileNotFoundException: /home/matt/blah/idx/_123n.tis (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:20