Re: some thoughts about adding transactions.

2005-01-11 Thread Scott Ganyo
I didn't want to let this drop on the floor, but I haven't had the time to craft a response to it either. So, just for the record, I agree that transactions would be nice. I think that it is important that the solution address change visibility and concurrent transactions within multiple

Re: dotLucene (port of Jakarta Lucene to C#)

2004-12-01 Thread Scott Ganyo
Why does it seem to you that C# is faster than Java? In any case, generally the bottleneck isn't the VM. It's the I/O to the disks... Scott The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on t

Re: BooleanQuery - Too Many Clases on date range.

2004-10-01 Thread Scott Ganyo
You can use: BooleanQuery.setMaxClauseCount(int maxClauseCount); to increase the limit. On Sep 30, 2004, at 8:24 PM, Chris Fraschetti wrote: I recently read in regards to my problem that date_field:[0820483200 TO 110448] is evaluated into a series of boolean queries ... which has a cap of 1024 .

Re: Open-ended range queries

2004-06-10 Thread Scott ganyo
Well, I do like the *, but apparently there are some people that are using this with the null... Scott On Jun 10, 2004, at 7:15 PM, Erik Hatcher wrote: On Jun 10, 2004, at 4:54 PM, Scott ganyo wrote: It looks to me like Revision 1.18 broke it. It seems this could be it: revision 1.18 date: 2002

Re: Open-ended range queries

2004-06-10 Thread Scott ganyo
It looks to me like Revision 1.18 broke it. On Jun 10, 2004, at 3:26 PM, Erik Hatcher wrote: On Jun 10, 2004, at 4:07 PM, Terry Steichen wrote: Well, I'm using 1.4 RC3 and the "null" range upper limit works just fine for searches in two of my fields; one is in the form of a canonical date (eg, 2

Re: Open-ended range queries

2004-06-10 Thread Scott ganyo
At one point it definitely supported null for either term. I think that has been removed/forgotten in the later revisions of the QueryParser... Scott On Jun 10, 2004, at 1:24 PM, Erik Hatcher wrote: On Jun 10, 2004, at 2:13 PM, Terry Steichen wrote: Actually, QueryParser does support open-ended

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-11 Thread Scott ganyo
I don't buy it. HashSet is but one implementation of a Set. By choosing the HashSet implementation you are not only tying the class to a hash-based implementation, you are tying the interface to *that specific* hash-based implementation or its subclasses. In the end, either you buy the con

Re: Index advice...

2004-02-10 Thread Scott ganyo
I have. While document.add() itself doesn't increase over time, the merge does. Ways of partially overcoming this include increasing the mergeFactor (but this will increase the number of file handles used), or building blocks of the index in memory and then merging them to disk. This has bee

Re: Paid support for Lucene

2004-01-29 Thread Scott ganyo
I am willing as well. Scott On Jan 29, 2004, at 12:04 PM, Boris Goldowsky wrote: Strangely, the web site does not seem to list any vendors who provide incident support for Lucene. That can't be right, can it? Can anyone point me to organizations that would be willing to provide support for Luce

Re: BooleanQuery question

2004-01-16 Thread Scott ganyo
No, you don't need required or prohibited, but you can't have both. Here is a rundown: * A required clause will allow a document to be selected if and only if it contains that clause and will exclude any documents that don't. * A prohibited clause will exclude any documents that contain that
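
The rundown above can be modeled in plain Java without Lucene. This is only a sketch of the selection rules as the message describes them: `ClauseSketch`, `Clause`, `Occur`, and `matches` are hypothetical names, not Lucene's API, and scoring is ignored. The handling of clauses that are neither required nor prohibited (at least one must match when nothing is required) is an assumption, since the message is cut off before it gets that far.

```java
import java.util.List;
import java.util.Set;

public class ClauseSketch {
    // Hypothetical stand-in for a clause's required/prohibited flags.
    enum Occur { REQUIRED, PROHIBITED, OPTIONAL }

    static final class Clause {
        final String term;
        final Occur occur;
        Clause(String term, Occur occur) { this.term = term; this.occur = occur; }
    }

    // Returns true if a document (modeled as its set of terms) is selected.
    static boolean matches(Set<String> docTerms, List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyOptionalHit = false;
        for (Clause c : clauses) {
            boolean present = docTerms.contains(c.term);
            switch (c.occur) {
                case REQUIRED:
                    anyRequired = true;
                    if (!present) return false;   // missing a required term
                    break;
                case PROHIBITED:
                    if (present) return false;    // contains a prohibited term
                    break;
                default:
                    anyOptionalHit |= present;    // remember optional matches
            }
        }
        // Assumption: with no required clause, some optional clause must match.
        return anyRequired || anyOptionalHit;
    }

    public static void main(String[] args) {
        List<Clause> query = List.of(
            new Clause("lucene", Occur.REQUIRED),
            new Clause("nutch", Occur.PROHIBITED));
        System.out.println(matches(Set.of("lucene", "java"), query));
        System.out.println(matches(Set.of("lucene", "nutch"), query));
    }
}
```

Under these rules a query made only of prohibited clauses selects nothing, because there is no positive clause to pick candidate documents.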

Re: java.io.IOException: Bad file number

2003-11-10 Thread Scott Ganyo
I don't think adding extensive locking is necessary. What you are probably experiencing is that you've closed the index before you're done using it. If you aren't careful to close the index only after all searches on it have been completed, you'll get an error like this. Scott [EMAIL PROTECT

Re: Multiple writers

2003-10-29 Thread Scott Ganyo
Offhand, I would say that using 2 directories and merging them is exactly what you want. It really shouldn't be all that complicated and Lucene should handle the synchronization for you... Scott Dror Matalon wrote: Hi folks, We're in the process of adding search to our online RSS aggregator.

Re: Limit on number of required/prohibited clauses

2003-09-05 Thread Scott Ganyo
Hi Eugene, Yes. Doug (Cutting) added this to eliminate OutOfMemory errors that apparently some people were having. Unfortunately, it causes backward-compatibility issues if you were used to using version 1.2. So, you'll need to add a call like this: BooleanQuery.setMaxClauseCount(Integer.MA

Re: Reuse IndexSearcher?

2003-08-19 Thread Scott Ganyo
Yes. You can (and should for best performance) reuse an IndexSearcher as long as you don't need access to changes made to the index. An open IndexSearcher won't pick up changes to the index, so if you need to see the changes, you will need to open a new searcher at that point. Scott Aviran M
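
One way to follow this advice is to keep a single searcher and replace it only when the index has changed. The sketch below uses plain Java rather than Lucene types: `SearcherCache`, `opener`, and `indexVersion` are hypothetical names, and the version check stands in for whatever change detection the application has available.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class SearcherCache<S> {
    private final Supplier<S> opener;        // hypothetical: opens a fresh searcher
    private final LongSupplier indexVersion; // hypothetical: current index version
    private S current;
    private long openedAtVersion;

    public SearcherCache(Supplier<S> opener, LongSupplier indexVersion) {
        this.opener = opener;
        this.indexVersion = indexVersion;
    }

    // Reuses the cached searcher unless the index changed since it was opened.
    public synchronized S get() {
        long version = indexVersion.getAsLong();
        if (current == null || version != openedAtVersion) {
            current = opener.get();
            openedAtVersion = version;
        }
        return current;
    }

    public static void main(String[] args) {
        long[] version = {1};
        int[] opens = {0};
        SearcherCache<String> cache =
            new SearcherCache<>(() -> "searcher#" + (++opens[0]), () -> version[0]);
        System.out.println(cache.get()); // searcher#1
        System.out.println(cache.get()); // searcher#1 (reused)
        version[0]++;                    // simulate an index update
        System.out.println(cache.get()); // searcher#2
    }
}
```

The sketch deliberately does not close the replaced searcher. A real implementation would have to wait until all searches still using it have finished before closing it.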

Re: Make Lucene Index distributable

2003-08-18 Thread Scott Ganyo
Be careful with option 1. NFS and the Lucene file-based locking mechanism don't get along extremely well. (See the archives for details...) Scott Lienhard, Andrew wrote: I can think of three options: 1) Single index dir on a shared drive (NFS, etc.) which is mounted on each app server. 2)

Re: NLucene up to date ?

2003-07-31 Thread Scott Ganyo
Do these implementations maintain file compatibility with the Java version? Scott Erik Hatcher wrote: I'd love to see there be quality implementations of the Lucene API in other languages, that are up to date with the latest Java codebase. I'm embarking on a Ruby port, which I'm hosting at rub

Re: Luke - Lucene Index Browser

2003-07-14 Thread Scott Ganyo
Nifty cool! I'm gonna like this, I can tell already! I'm having a really hard time actually using Luke, though, as all the window panes and table columns are apparently of fixed size. Do you think you could throw in the ability to resize the various window panes and table columns? This wou

Re: Directory implementation using NIO

2003-07-07 Thread Scott Ganyo
Wonderful! I can't wait to try this. I'll try to provide some comparisons as I get to it, but I'd love to hear from anyone else that tries this... Thanks, Scott Francesco Bellomi wrote: Hi, I developed a Directory implementation that accesses an index stored on the filesystem using memory-ma

Re: Lucene Benchmarks and Information

2002-12-20 Thread Scott Ganyo
FYI: The best thing I've found for both increasing speed and reducing file handles is to use an IndexWriter on a RAMDirectory for indexing and then use IndexWriter.addIndexes() to write the result to disk. This is subject to the amount of memory you have available, of course... Scott Armbrust,

Re: Incremental indexing

2002-12-05 Thread Scott Ganyo
+1. Support for transactions in Lucene is high on my list of desirable features as well. I would love to have time to look into adding this, but lately... well, you know how that goes. Scott Eric Jain wrote: If you want to update a set of documents, you can remove their previous version firs

Re: optimize()

2002-11-27 Thread Scott Ganyo
We generally optimize only after a full index (re-)build or during periods where the index is not being used. Scott Leo Galambos wrote: Unoptimized index is not a problem for document additions, they take constant time, regardless of the size of the index and regardless of whether the index is

Re: Updating documents

2002-11-22 Thread Scott Ganyo
Not each time you search, but if you've modified the index since you opened the searcher, you need to create a new searcher to get the changes. Scott Rob Outar wrote: There is a reloading issue but I do not think lastModified is it: static long lastModified(Directory directory) Retur

Re: How does delete work?

2002-11-22 Thread Scott Ganyo
It just marks the record as deleted. The record isn't actually removed until the index is optimized. Scott Rob Outar wrote: Hello all, I used the delete(Term) method, then I looked at the index files, only one file changed "_1tx.del" I found references to the file still in some of the ind
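
The mark-then-purge behavior described here can be illustrated with a toy in-memory structure. This is not how Lucene stores its segments; `MarkDeleteSketch` and its methods are hypothetical, and the `BitSet` merely plays the role of the `.del` file mentioned in the question.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class MarkDeleteSketch {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();   // plays the role of the ".del" file

    void add(String doc) { docs.add(doc); }

    // Deleting only flips a bit; the stored record stays where it is.
    void delete(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + docId);
        }
        deleted.set(docId);
    }

    int storedCount() { return docs.size(); }
    int liveCount() { return docs.size() - deleted.cardinality(); }

    // Optimizing rewrites the data without the marked records.
    void optimize() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) kept.add(docs.get(i));
        }
        docs = kept;
        deleted = new BitSet();
    }

    public static void main(String[] args) {
        MarkDeleteSketch index = new MarkDeleteSketch();
        index.add("a"); index.add("b"); index.add("c");
        index.delete(1);
        System.out.println(index.storedCount() + " stored, " + index.liveCount() + " live");
        index.optimize();
        System.out.println(index.storedCount() + " stored, " + index.liveCount() + " live");
    }
}
```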

Re: Fun project?

2002-11-21 Thread Scott Ganyo
I'm rather partial to Jini for distributed systems, but I agree that JXTA would definitely be the way to go on this type of peer-to-peer scenario. Scott [EMAIL PROTECTED] wrote: I'll be doing something very similar some time in the next 12 months for the project I'm working on. I'll be more th

Re: Searching Ranges

2002-11-11 Thread Scott Ganyo
n imagine how this improves the avg efficiency in my case if i have 1 terms in "references". although i may be doing something that was either not intended or ill-designed. thanks, any thoughts? alex On Mon, 2002-11-11 at 10:50, Scott Ganyo wrote: >Hi Alex, > >I just looked

Re: Searching Ranges

2002-11-11 Thread Scott Ganyo
Hi Alex, I just looked at this and had the following thought: The RangeQuery must continue to iterate after the first match is found in order to match everything within the specified range. In other words, if you have a range of "a" to "d", you can't stop with "a", you need to continue to "d"
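
The point about not stopping at the first match can be shown with a sorted set of terms. The sketch below is plain Java, not Lucene's term enumeration; `RangeScanSketch` and `termsInRange` are hypothetical names, and a `TreeSet` stands in for the sorted term dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeScanSketch {
    // Collects every term between lower and upper, both inclusive.
    static List<String> termsInRange(NavigableSet<String> terms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        // Seek to the first term >= lower, then keep iterating.
        for (String term : terms.tailSet(lower, true)) {
            if (term.compareTo(upper) > 0) break;  // past the upper bound: stop
            hits.add(term);                        // still inside the range
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("a", "b", "c", "d", "e"));
        System.out.println(termsInRange(terms, "a", "d")); // [a, b, c, d]
    }
}
```

The cost of the scan grows with the number of terms that fall inside the range, not with the position of the first match.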

Re: Your experiences with Lucene

2002-10-29 Thread Scott Ganyo
Actually, 10k isn't very large. We have indexes with more than 1M records. It hasn't been a problem. Scott Tim Jones wrote: Hi, I am currently starting work on a project that requires indexing and searching on potentially thousands, maybe tens of thousands, of text documents. I'm hoping tha

RE: Concurency in Lucene

2002-10-17 Thread Scott Ganyo
This sounds like an excellent start and would certainly be useful in a number of scenarios, but it is not quite as generally useful as it could be given its asynchronous nature. Generally expected database behavior is that when a change is committed (and not before) it is immediately viewable in

RE: Using Filters in Lucene

2002-07-31 Thread Scott Ganyo
Cool. But instead of adding a new class, why not change Hits to inherit from Filter and add the bits() method to it? Then one could "pipe" the output of one Query into another search without modifying the Queries... Scott > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECT

Forked files? was: RE: Too many open files?

2002-07-23 Thread Scott Ganyo
thing? It would seem that if there was an efficient implementation of a forked file, perhaps that could be used instead of the set of files that Lucene currently uses to represent a segment. Scott > -Original Message- > From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > Sent: Tuesda

RE: Too many open files?

2002-07-23 Thread Scott Ganyo
Are you closing each searcher when done? No: Waiting for the garbage collector is not a good idea. Yes: It could be a timeout on the OS holding the file handles. Either way, the only real option is to avoid thrashing the searchers... Scott > -Original Message- > From: Hang

RE: Too many open files?

2002-07-23 Thread Scott Ganyo
Yup. Cache and reuse your Searcher as much as possible. Scott > -Original Message- > From: Hang Li [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, July 23, 2002 9:59 AM > To: Lucene Users List > Subject: Too many open files? > > > > > > I have seen a lot of postings about this topic. Any fi

RE: CachedSearcher

2002-07-16 Thread Scott Ganyo
done with them rather than allowing finalization to take care of it. Scott > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, July 16, 2002 11:56 AM > To: Lucene Users List > Subject: Re: CachedSearcher > > > Scott Ganyo w

RE: CachedSearcher

2002-07-16 Thread Scott Ganyo
I'd like to see the finalize() methods removed from Lucene entirely. In a system with heavy load and lots of gc, using finalize() causes problems. To wit: 1) I was at a talk at JavaOne last year where the gc performance experts from Sun (the engineers actually writing the HotSpot gc) were givin
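
The alternative to relying on finalize() is to release resources explicitly at a known point. The sketch below uses Java's try-with-resources (a language feature newer than this thread) with a hypothetical `Handle` class that counts how many handles are open; it is an illustration of deterministic cleanup, not Lucene code.

```java
public class ExplicitCloseSketch {
    // Hypothetical resource that tracks how many instances are open.
    static final class Handle implements AutoCloseable {
        static int open = 0;
        Handle() { open++; }
        @Override public void close() { open--; }
    }

    // Returns the number of open handles observed while the resource is in use.
    static int useAndClose() {
        try (Handle h = new Handle()) {
            return Handle.open;
        } // close() runs here, deterministically, not whenever the gc gets to it
    }

    public static void main(String[] args) {
        System.out.println("open during use: " + useAndClose());
        System.out.println("open afterwards: " + Handle.open);
    }
}
```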

RE: IndexReader Pool

2002-07-08 Thread Scott Ganyo
Deadlocks could be created if the order in which locks are obtained is not consistent. Note, though, that the locks are obtained in the same order each time throughout. (BTW: The inner lock is merely needed because the wait/notify calls need to own the monitor.) Naturally, you are free to make
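
A minimal sketch of the lock-ordering rule, assuming two hypothetical monitors `lockA` and `lockB`: every code path takes them in the same order, so no thread can hold one while waiting for a thread that holds the other. This is not the IndexReader pool code under discussion, only an illustration of the principle.

```java
public class LockOrderSketch {
    private final Object lockA = new Object();
    private final Object lockB = new Object();
    private int counter = 0;

    // Every caller takes lockA first and lockB second, never the reverse.
    void update() {
        synchronized (lockA) {
            synchronized (lockB) {
                counter++;
            }
        }
    }

    // Runs several threads through update() and returns the final count.
    int run(int threads, int perThread) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) update();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();  // join makes the final count visible
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new LockOrderSketch().run(4, 1000)); // 4000
    }
}
```

On the parenthetical remark about wait/notify: in Java, calling `wait()` or `notify()` on an object whose monitor the thread does not hold throws `IllegalMonitorStateException`, which is why such calls have to sit inside a `synchronized` block on that object.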

RE: IndexReader Pool

2002-07-05 Thread Scott Ganyo
You are correct. Actually, there have been a few bug fixes since that was posted. Here's a diff to an updated version: @@ -19,11 +19,21 @@ */ public class IndexAccessControl { - public static final Analyzer LUCENE_ANALYZER = new LuceneAnalyzer(); + private static Analyzer s_defa

RE: Stress Testing Lucene

2002-06-27 Thread Scott Ganyo
'm going to need some > insider > help to get through this one. > > N. > > -Original Message- > From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, June 26, 2002 7:15 PM > To: 'Lucene Users List' > Subject: RE: Stress Testing Lucene > >

RE: Stress Testing Lucene

2002-06-26 Thread Scott Ganyo
1) Are you sure that the index is corrupted? Maybe the file handles just haven't been released yet. Did you try to reboot and try again? 2) To avoid the too-many files problem: a) increase the system file handle limits, b) make sure that you reuse IndexReaders as much as you can rather across r

RE: Boolean Query + Memory Monster

2002-06-13 Thread Scott Ganyo
Use the java -Xmx option to increase your heap size. Scott > -Original Message- > From: Nader S. Henein [mailto:[EMAIL PROTECTED]] > Sent: Thursday, June 13, 2002 12:20 PM > To: [EMAIL PROTECTED] > Subject: Boolean Query + Memory Monster > > > > I have 1 Gig of memory on the machine w

RE: Queryparser croaking on "[" and "]"

2002-02-20 Thread Scott Ganyo
Actually, [] denotes an inclusive range of Terms. Anyway, why not change the syntax if this is bad...? Scott > -Original Message- > From: Brian Goetz [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, February 20, 2002 10:08 AM > To: Lucene Users List > Subject: Re: Queryparser croaking on "

RE: JDK 1.1 vs 1.2+

2002-01-22 Thread Scott Ganyo
+1 > -Original Message- > From: Matt Tucker [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, January 22, 2002 11:06 AM > To: 'Lucene Users List' > Subject: RE: JDK 1.1 vs 1.2+ > > > Hey all, > > I'd just like to chime in support for dropping JDK 1.1, > especially if it > would aid i18n in

Re: Industry Use of Lucene?

2001-12-06 Thread Scott Ganyo
We use Lucene extensively as a core part of our ASP product here at eTapestry. In fact, we've built our database query engine on top of it. We have been extremely pleased with the results. Scott Jeff Kunkle asks: > Does anyone know of any companies or agencies using Lucene for their > products

RE: Memory Usage?

2001-11-12 Thread Scott Ganyo
I think something like this would be a HUGE boon for us. We do a lot of complex queries on a lot of different indexes and end up suffering from severe garbage collection issues on our system. I'd be willing to help out in any way to make this issue go away as soon as possible. Scott > -Ori

RE: Indexing problem

2001-11-02 Thread Scott Ganyo
an I get Doug's example of indexing in memory and then > writing it out > to disk? I just recently subscribed to this list and I can't > find it in the > archive. > > Thanks. > Paul > > -Original Message- > From: Scott Ganyo [mailto:[EMAIL

RE: Problems with prohibited BooleanQueries

2001-11-02 Thread Scott Ganyo
ft side of a BooleanQuery subtract. Sure, it works, but it ain't pretty... Scott > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > Sent: Thursday, November 01, 2001 10:49 AM > To: 'Lucene Users List' > Subject: RE: Problems with prohibi

RE: Indexing problem

2001-11-02 Thread Scott Ganyo
Yes. You have too many open files. There are a few things you can try. 1) Increase the number of file handles your system has available. Yes, there is a setting for this in Windows. 2) Make sure that you have the IndexWriter.maxMergeDocs set to Integer.MAX_VALUE (the default). 3) Try smalle

RE: Brackets in query syntax?

2001-11-01 Thread Scott Ganyo
e anything about the range query in the syntax BNF. > > In regards to the exception, I would expect that searching on > the query "[]" > or "name:[]" would either find all documents or no documents, > not throw an > exception? > > > -Original Mes

RE: Problems with prohibited BooleanQueries

2001-11-01 Thread Scott Ganyo
How difficult would it be to get BooleanQuery to do a standalone NOT, do you suppose? That would be very useful in my case. Scott > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, October 31, 2001 2:36 PM > To: 'Lucene Users List' > Subject: RE: Pro

RE: Brackets in query syntax?

2001-11-01 Thread Scott Ganyo
[ and ] are used for RangeQuery. They indicate an inclusive range. For example: "name:[adam-scott]" > -Original Message- > From: Paul Friedman [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, October 31, 2001 2:03 PM > To: '[EMAIL PROTECTED]' > Subject: Brackets in query syntax? > > > Ar

RE: new Lucene release: 1.2 RC2

2001-10-23 Thread Scott Ganyo
Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > Sent: Friday, October 19, 2001 9:33 PM > To: Scott Ganyo; '[EMAIL PROTECTED]' > Subject: RE: new Lucene release: 1.2 RC2 > > > > From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > > Sent: Friday, Octobe

RE: new Lucene release: 1.2 RC2

2001-10-19 Thread Scott Ganyo
Oops... and the WildcardQuery issues that Robert Lebowitz just reported. > -Original Message- > From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > Sent: Friday, October 19, 2001 5:28 PM > To: 'Doug Cutting'; '[EMAIL PROTECTED]' > Subject: RE: new Lucene rel

RE: new Lucene release: 1.2 RC2

2001-10-19 Thread Scott Ganyo
Well, we know of at least two issues: 1) RAMDirectory not merging properly (reported by me) 2) Indexes left in an inconsistent state on crash (I don't remember who reported this) Are these to be left as known issues for 1.2? Thanks, Scott > -Original Message- > From: Doug Cutting [mail

RE: Trying To Understand Query Syntax Details

2001-10-16 Thread Scott Ganyo
Not sure about the rest, but if you've stored your dates in yyyymmdd format, you can use a RangeQuery like so: dateField:[20011001-null] This would return all dates on or after October 1, 2001. Scott > -Original Message- > From: W. Eliot Kimber [mailto:[EMAIL PROTECTED]] > Sent: Tuesda
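
The range works because eight-digit dates such as 20011001 (assumed here to be year, month, day, zero-padded) sort as plain strings in the same order as the dates they represent. The sketch below checks that in plain Java; `DateRangeSketch` and `inRange` are hypothetical names, and a `null` bound is treated as open-ended, mirroring the `null` upper limit in the example query.

```java
public class DateRangeSketch {
    // Inclusive string-range check; a null bound means "no limit on that side".
    static boolean inRange(String value, String lower, String upper) {
        boolean aboveLower = lower == null || value.compareTo(lower) >= 0;
        boolean belowUpper = upper == null || value.compareTo(upper) <= 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        System.out.println(inRange("20011015", "20011001", null)); // true
        System.out.println(inRange("20010930", "20011001", null)); // false
    }
}
```

The same comparison would give wrong answers for formats that put the month or day first, which is why the year-first, zero-padded layout matters.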

RE: File Handles issue

2001-10-16 Thread Scott Ganyo
> > P.S. At one point I tried doing an in-memory index using the > > RAMDirectory > > and then merging it with an on-disk index and it didn't work. The > > RAMDirectory never flushed to disk... leaving me with an > > empty index. I > > think this is because of a bug in the mechanism that is >

RE: File Handles issue

2001-10-15 Thread Scott Ganyo
Thanks for the detailed information, Doug! That helps a lot. Based on what you've said and on taking a closer look at the code, it looks like by setting mergeFactor and maxMergeDocs to Integer.MAX_VALUE, an entire index will be built in a single segment completely in memory (using the RAMDirecto

File Handles issue

2001-10-11 Thread Scott Ganyo
We're having a heck of a time with too many file handles around here. When we create large indexes, we often get thousands of temporary files in a given index! Even worse, we just plain run out of file handles--even on boxes where we've upped the limits as much as we think we can! We've played