Re: If you could have one feature in Lucene...

2010-02-24 Thread Simon Wistow
On Wed, Feb 24, 2010 at 08:42:02AM -0500, Grant Ingersoll said:
> What would it be?

Adding, deleting and updating of individual fields in a document.
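
For context on the request: in the Lucene versions of that era an update replaces a whole document (IndexWriter.updateDocument(Term, Document) deletes the documents matching the term and adds the new one), so changing one field means a read-modify-write of the entire document. The plain-Java sketch below models that workaround with no Lucene dependency; the map-based "index" and the method names are hypothetical stand-ins. In a real index this only works if every field is stored, because unstored fields cannot be read back.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class FieldUpdateSketch {
    // Hypothetical stand-in for an index keyed by a unique id field.
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    // Whole-document replace: the only kind of "update" available.
    void updateDocument(String id, Map<String, String> fields) {
        docs.put(id, Collections.unmodifiableMap(new HashMap<>(fields)));
    }

    // Emulated single-field update: read the stored document, copy it,
    // change one field, then replace the whole document.
    void updateField(String id, String field, String value) {
        Map<String, String> current = docs.get(id);
        if (current == null) throw new NoSuchElementException("no document " + id);
        Map<String, String> copy = new HashMap<>(current);
        copy.put(field, value);
        updateDocument(id, copy);
    }

    Map<String, String> get(String id) { return docs.get(id); }
}
```

Native field-level updates would avoid re-analyzing the untouched fields, which is the cost this workaround pays on every change.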

Re: Unexpected results searching for phrase with stop words

2009-11-12 Thread Simon Wistow
On Thu, Nov 12, 2009 at 09:20:30PM +0100, Uwe Schindler said:
> Which version of Lucene are you using and which Version constant do you pass
> to Analyzer and Query Parser? In 2.9.0 there was a bug/incorrect setting
> between the query parser and the Version.LUCENE_CURRENT / Version.LUCENE_29
> set

Unexpected results searching for phrase with stop words

2009-11-12 Thread Simon Wistow
I have a document with the title "Here, there be dragons" and a body. When I search for Here, there be dragons (no quotes) with a title boost of 2.0 and a body boost of 0.8, I get the document as the first hit, which is what I'd expect. However, if I change the query to "Here, there be dragons"
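
One mechanism that can make a quoted phrase miss while the unquoted terms hit (offered as an illustration, not a diagnosis of this thread): when stop words are removed, analysis can either leave a gap in token positions or close it up. If indexing keeps the gaps but query parsing closes them (or vice versa), the phrase no longer lines up. In Lucene 2.x this is governed by settings such as the enablePositionIncrements flags on StopFilter and QueryParser. The sketch below is plain Java; the tokenizer, the assumed stop list {"there", "be"} and the phrase matcher are simplified stand-ins, not Lucene classes.

```java
import java.util.*;

public class PhraseStopWords {
    // A token and its position in the token stream.
    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    // Lowercases, splits on non-letters and drops stop words. With keepGaps,
    // a removed stop word still advances the position counter.
    static List<Token> analyze(String text, Set<String> stop, boolean keepGaps) {
        List<Token> out = new ArrayList<>();
        int pos = -1;
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.isEmpty()) continue;
            pos++;
            if (stop.contains(raw)) {
                if (!keepGaps) pos--; // close up the hole left by the stop word
                continue;
            }
            out.add(new Token(raw, pos));
        }
        return out;
    }

    // Exact phrase match: every query token must occur in the document at the
    // same offset relative to the first query token.
    static boolean phraseMatches(List<Token> doc, List<Token> query) {
        if (query.isEmpty()) return false;
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (Token t : doc) postings.computeIfAbsent(t.term, k -> new HashSet<>()).add(t.position);
        Token first = query.get(0);
        for (int start : postings.getOrDefault(first.term, Collections.emptySet())) {
            boolean ok = true;
            for (Token q : query) {
                int want = start + (q.position - first.position);
                if (!postings.getOrDefault(q.term, Collections.emptySet()).contains(want)) {
                    ok = false;
                    break;
                }
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("there", "be"));
        String title = "Here, there be dragons";
        List<Token> indexed = analyze(title, stop, true);                       // here@0, dragons@3
        System.out.println(phraseMatches(indexed, analyze(title, stop, true)));  // true
        System.out.println(phraseMatches(indexed, analyze(title, stop, false))); // false
    }
}
```

Unquoted, the same tokens match as independent terms regardless of position, which is why only the phrase form is affected.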

Typical Indexing performance

2008-06-03 Thread Simon Wistow
I know this is one of those "How long is a piece of string?" questions but I'm curious as to the order of magnitude of indexing performance. http://lucene.apache.org/java/docs/benchmarks.html seems to indicate about 100-120 docs/s is pretty good for average sized documents (say, an email or som
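
To put the 100-120 docs/s figure into an order of magnitude, a quick estimate of wall-clock time for a corpus. The one-million-document corpus is a hypothetical example, and a sustained rate is assumed (no slowdown from merges as the index grows).

```java
import java.util.Locale;

public class IndexingEstimate {
    // Wall-clock hours to index docCount documents at a sustained rate.
    static double hoursToIndex(long docCount, double docsPerSecond) {
        if (docsPerSecond <= 0) throw new IllegalArgumentException("rate must be positive");
        return docCount / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // Hypothetical corpus of one million documents.
        System.out.printf(Locale.ROOT, "%.2f h%n", hoursToIndex(1_000_000, 100)); // 2.78 h
        System.out.printf(Locale.ROOT, "%.2f h%n", hoursToIndex(1_000_000, 120)); // 2.31 h
    }
}
```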

Re: Atomicity and AutoCommit

2008-02-27 Thread Simon Wistow
On Wed, Feb 27, 2008 at 09:38:55AM -0500, Michael McCandless said:
>
> When you previously saw corruption was it due to an OS or machine
> crash (or power cord got pulled)? If so, you were likely hitting
> LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4
> at some point) but
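
Crash (as opposed to concurrency) corruption usually comes down to a file that points at other data reaching the disk before that data does. The generic defence is to force contents to stable storage before publishing them. A minimal plain-Java sketch of that pattern, not Lucene's own commit code; the target file name in the test is illustrative only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableWrite {
    // Write to a temporary sibling, force it to stable storage, then atomically
    // rename it over the target so readers see either the old file or the complete new one.
    static void writeDurably(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // fsync contents and metadata before the rename
        }
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

For full durability of the rename itself the parent directory would also need syncing, which Java cannot do portably; the sketch leaves that out.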

Atomicity and AutoCommit

2008-02-27 Thread Simon Wistow
I currently have a setup that indexes into RAM and then periodically merges that into a disk-based index. Searches are done from the disk-based index, and deletes are handled by keeping a list of deleted documents, filtering out search results and applying the deletes to the index at merge tim
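
The delete handling described above can be modelled in a few lines. A sketch under stated assumptions: plain lists of document ids stand in for the RAM and disk indexes, and all names are hypothetical. The ordering inside merge() matters: pending deletes are applied to the disk index before the RAM documents are appended, so a document that was deleted and then re-added survives the merge.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PendingDeletes {
    private final List<String> diskIndex = new ArrayList<>(); // searched
    private final List<String> ramBuffer = new ArrayList<>(); // added since last merge
    private final Set<String> pendingDeletes = new HashSet<>(); // deletes not yet applied to disk

    void add(String id) { ramBuffer.add(id); }

    // A document still only in RAM is simply dropped; otherwise remember the delete.
    void delete(String id) {
        if (!ramBuffer.remove(id)) pendingDeletes.add(id);
    }

    // Searches hit the disk index only, filtering out pending deletes.
    List<String> searchAll() {
        List<String> hits = new ArrayList<>();
        for (String id : diskIndex) {
            if (!pendingDeletes.contains(id)) hits.add(id);
        }
        return hits;
    }

    // Apply deletes first, then append the RAM documents.
    void merge() {
        diskIndex.removeAll(pendingDeletes);
        diskIndex.addAll(ramBuffer);
        pendingDeletes.clear();
        ramBuffer.clear();
    }
}
```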

Some questions on transactions

2007-09-12 Thread Simon Wistow
I'm looking at doing a system which looks something like this - I have an IndexSearcher open with an on-disk index, but all writes go to a RAM-based IndexWriter. Periodically I do:
1. Close IndexSearcher
2. Open new IndexWriter in same location
3. Use addIndexes with old
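
One question a cycle like the one above raises is what happens to writes that arrive while the merge runs. A common answer is double buffering: swap in a fresh RAM buffer under a short lock, then do the slow merge step on the old buffer without blocking writers. A plain-Java sketch; the string lists are hypothetical stand-ins for the RAM IndexWriter and the on-disk index, and the addAll call stands in for addIndexes.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferSwapMerger {
    private final Object bufferLock = new Object();
    private List<String> ramBuffer = new ArrayList<>();       // stand-in for the RAM writer
    private final List<String> diskIndex = new ArrayList<>(); // stand-in for the on-disk index

    // Writers only ever touch the current RAM buffer.
    void write(String doc) {
        synchronized (bufferLock) { ramBuffer.add(doc); }
    }

    // One merge cycle (cycles are serialized): swap buffers under the lock,
    // then merge the old buffer without blocking new writes.
    synchronized void mergeCycle() {
        List<String> toMerge;
        synchronized (bufferLock) {
            toMerge = ramBuffer;
            ramBuffer = new ArrayList<>();
        }
        synchronized (diskIndex) { diskIndex.addAll(toMerge); }
    }

    // What a searcher reopened after the cycle would see.
    List<String> searchableSnapshot() {
        synchronized (diskIndex) { return new ArrayList<>(diskIndex); }
    }
}
```

Documents in the swapped-out buffer are not searchable until the merge finishes, which matches a setup where searches only ever hit the disk index.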

Re: Recovering from a Crash

2007-07-25 Thread Simon Wistow
On Wed, Jul 25, 2007 at 05:49:41AM -0400, Michael McCandless said:
> Ahhh, OK. But do you have a segments_N file?

Yup.

> Yes, this is perfect. This is the "simple" option I described. The
> more complex option is to use a custom deletion policy which enables
> you to safely do backups (even i
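
Checking a backup for a segments_N file, as in the exchange above, can be scripted. A small sketch, assuming the suffix N is the commit generation written in base 36 (Character.MAX_RADIX), which matches how Lucene names these files but should be verified against your version. segments.gen and the other index files are skipped. It only reports the newest commit point; it does not check that the files that commit references are intact.

```java
import java.util.Collection;
import java.util.Optional;

public class LatestSegments {
    private static final String PREFIX = "segments_";

    // Returns the segments_N name with the highest generation, if any.
    static Optional<String> latestSegmentsFile(Collection<String> fileNames) {
        String best = null;
        long bestGen = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) continue; // skips segments.gen, .cfs, write.lock, ...
            long gen;
            try {
                gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
            } catch (NumberFormatException e) {
                continue; // not a generation suffix we understand
            }
            if (gen > bestGen) {
                bestGen = gen;
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Note that segments_a (generation 10) sorts after segments_9, which a plain string comparison would get wrong.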

Re: Recovering from a Crash

2007-07-25 Thread Simon Wistow
On Wed, Jul 25, 2007 at 05:19:31AM -0400, Michael McCandless said:
> It's somewhat spooky that you have a write.lock present because that
> means you backed up while a writer was actively writing to the index
> which is a bit dangerous because if the timing is unlucky (backup does
> an "ls" but bef

Re: Recovering from a Crash

2007-07-25 Thread Simon Wistow
On Wed, Jul 25, 2007 at 10:08:56AM +0100, me said:
> The data appears to be there - please tell me that I'm doing something
> stupid and I can recover from this.

It appears that by deleting the write.lock files, everything has recovered. Is this best practice? Have I just done something so terribly wr

Recovering from a Crash

2007-07-25 Thread Simon Wistow
We were affected by the great SF outage yesterday and apparently the indexing machine crashed without being shut down properly. I've taken a backup of the indexes, which has the usual smattering of write.lock, segments.gen, .cfs, .fdt, .fnm and .fdx etc. files and looks to be about the right size.

Performance of DbDirectory

2007-06-13 Thread Simon Wistow
I recently had a thought to do with DbDirectory - specifically, would it be possible to use something like Oracle's inbuilt replication to have multiple Reader machines able to read the index, with automatic partitioning, redundancy and failover? Also, what is performance like for DbDirectory

Re: Searching on a Rapidly changing Index

2007-05-24 Thread Simon Wistow
On Thu, May 24, 2007 at 09:28:30AM -0400, Erick Erickson said:
> If that's unacceptable, you can *still* open up a new reader in the
> background and warm it up before using it. "immediately" then
> becomes 5-10 seconds or so.

This is currently what I'm doing using a list of previously performed qu
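
Warming with previously performed queries needs a bounded record of recent queries to replay. A sketch, assuming a least-recently-used policy (an assumption; the post does not say how its list is kept). The Consumer passed to warm() is a hypothetical hook that would run each query against the new reader before it is swapped in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class WarmupQueries {
    private final LinkedHashMap<String, Boolean> recent;

    WarmupQueries(final int capacity) {
        // Access-ordered map: re-recording a query moves it to the end;
        // the least recently used query is evicted once capacity is exceeded.
        this.recent = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    synchronized void record(String query) { recent.put(query, Boolean.TRUE); }

    synchronized List<String> snapshot() { return new ArrayList<>(recent.keySet()); }

    // Replay outside the lock so live searches can keep recording queries.
    void warm(Consumer<String> runQuery) {
        for (String q : snapshot()) runQuery.accept(q);
    }
}
```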

Searching on a Rapidly changing Index

2007-05-24 Thread Simon Wistow
I've built a Lucene system that gets rapidly updated - documents are supposed to be searchable immediately after they've been indexed. As such I have a Writer that puts new index, update and delete tasks into a queue and then has a thread which consumes them and applies them to the index using
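
A queue with a single consuming thread, as described above, can also batch: take one task (blocking), drain whatever else is already queued, apply it all, and only then do the expensive "make it visible" step once. A plain-Java sketch; the live-id set stands in for the index and publishedGeneration marks where a real system would reopen its reader. Both are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class IndexTaskQueue {
    enum Op { INDEX, UPDATE, DELETE }

    static final class Task {
        final Op op;
        final String id;
        Task(Op op, String id) { this.op = op; this.id = id; }
    }

    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    final Set<String> liveIds = ConcurrentHashMap.newKeySet();    // stand-in for the index
    final AtomicLong publishedGeneration = new AtomicLong();      // bumped once per batch

    void submit(Op op, String id) { queue.add(new Task(op, id)); }

    // Blocks for at least one task, then applies everything already queued as one batch.
    int processBatch() throws InterruptedException {
        List<Task> batch = new ArrayList<>();
        batch.add(queue.take());
        queue.drainTo(batch);
        for (Task t : batch) {
            switch (t.op) {
                case INDEX:
                case UPDATE: liveIds.add(t.id); break; // update = delete + re-add
                case DELETE: liveIds.remove(t.id); break;
            }
        }
        publishedGeneration.incrementAndGet(); // one reader reopen per batch, not per task
        return batch.size();
    }

    // Body of the single consumer thread; exits when interrupted.
    void runLoop() {
        try {
            while (true) processBatch();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Under bursts of updates this keeps the number of reopens proportional to the number of batches rather than the number of documents.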

Better parsing of Queries

2007-04-04 Thread Simon Wistow
I'm looking for some advice on dealing with malformed queries. If a user searches for "yow!" then I get an exception from the query parser. I can get round this by using QueryParser.escape(query) first, but then that prevents them from searching using other bits of the query syntax such as "
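
One middle ground is to parse the raw input first and fall back to the escaped form only when parsing fails, so valid syntax keeps working and malformed input still gets a result. A plain-Java sketch; the Parser interface is a hypothetical stand-in for something like QueryParser.parse (which throws ParseException), and the set of escaped characters is an assumption modelled on the query-syntax characters, not a copy of QueryParser.escape.

```java
public class QueryFallback {
    // Assumed set of query-syntax characters to escape.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    @FunctionalInterface
    interface Parser<Q> {
        Q parse(String text) throws Exception;
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    // Try the user's syntax as-is; on failure, retry with everything escaped.
    static <Q> Q parseLeniently(String input, Parser<Q> parser) throws Exception {
        try {
            return parser.parse(input);
        } catch (Exception firstFailure) {
            return parser.parse(escape(input));
        }
    }
}
```

The trade-off is that a query with one stray character loses all of its syntax, not just the broken part.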

Re: flush, optimize and FileNotFound exceptions

2007-04-03 Thread Simon Wistow
On Tue, Apr 03, 2007 at 08:31:20AM -0400, Michael McCandless said:
> Optimize actually does its own flush before optimizing, so you don't
> need to call it yourself and in fact calling it after optimize will
> just be a harmless no-op.

Ah, that's good to know.

> You should be worried about this

flush, optimize and FileNotFound exceptions

2007-04-03 Thread Simon Wistow
I have an Indexer which inserts tasks onto a queue and then has a thread which consumes the tasks (Index, Update or Delete) and executes them. If the Indexer is shut down it stops the thread, waits until it's finished its current task and then consumes any other tasks on the queue. Then it runs
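
The shutdown sequence described above (stop, finish the current task, drain the rest, then a final step) can be expressed with a sentinel task queued behind everything already submitted, so the worker drains the queue in order and exits by itself. A plain-Java sketch; finalStep is a hypothetical hook for whatever the indexer runs after draining (the snippet is cut off before naming it). This sentinel approach is one design choice among several, not the post's code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainingIndexer {
    // Sentinel compared by identity; it is queued behind every real task.
    private static final Runnable STOP = () -> { };
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::consume, "indexer");
    private boolean accepting = true; // guarded by "this"

    void start() { worker.start(); }

    synchronized void submit(Runnable task) {
        if (!accepting) throw new IllegalStateException("indexer is shutting down");
        queue.add(task);
    }

    private void consume() {
        try {
            while (true) {
                Runnable task = queue.take();
                if (task == STOP) return; // everything queued before STOP has run
                task.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Refuse new work, wait for the worker to drain the queue, then run the final step.
    void shutdown(Runnable finalStep) throws InterruptedException {
        synchronized (this) {
            accepting = false;
            queue.add(STOP);
        }
        worker.join();
        finalStep.run();
    }
}
```

Because submit() and the enqueueing of STOP share a lock, no task can slip in behind the sentinel and be silently dropped.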

Spelling Correction api

2007-01-10 Thread Simon Wistow
I've been reading through the spelling correction API and I'm confused. It looks like you tell it the directory to hold the spelling correction DB and then give it an IndexReader and a field to retrieve spelling suggestions from. But then I'd have to redo that operation every time a new document
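
An alternative to rebuilding the dictionary wholesale is to add only new words as each document is indexed. The sketch below is plain Java and swaps in a different technique from Lucene's spell checker: a brute-force Levenshtein scan over an in-memory word set instead of an n-gram index. It is meant to show the incremental-update idea, not to scale; all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

public class IncrementalSpeller {
    private final Set<String> words = new HashSet<>();

    // Called per newly indexed document: only words not seen before are added.
    void addDocument(String text) {
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) words.add(w);
        }
    }

    // Closest known word within maxDistance edits (ties broken alphabetically).
    Optional<String> suggest(String word, int maxDistance) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String candidate : words) {
            int d = levenshtein(word, candidate);
            if (d > maxDistance) continue;
            if (best == null || d < bestDist || (d == bestDist && candidate.compareTo(best) < 0)) {
                best = candidate;
                bestDist = d;
            }
        }
        return Optional.ofNullable(best);
    }

    // Classic two-row dynamic-programming edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }
}
```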

Re: Search Suggestions

2006-12-15 Thread Simon Wistow
On Fri, Dec 15, 2006 at 04:01:33PM +0530, Kapil Chhabra said:
> I have implemented such a feature. Just to add on to what Bhavin said,
> your results would be more relevant if you index only 2 & 3 token
> phrases and display a 3 token suggestion if the current search keyword
> consists of 2 toke
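
The approach quoted above (keep 2- and 3-token phrases, suggest 3-token phrases for a 2-token query) can be sketched with a frequency map of shingles. Plain Java; ranking by raw frequency is an assumption, and the whitespace tokenizer and class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ShingleSuggester {
    private final Map<String, Integer> counts = new HashMap<>();

    // Count every 2- and 3-token phrase (shingle) in the text.
    void add(String text) {
        String[] t = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int n = 2; n <= 3; n++) {
            for (int i = 0; i + n <= t.length; i++) {
                counts.merge(String.join(" ", Arrays.copyOfRange(t, i, i + n)), 1, Integer::sum);
            }
        }
    }

    // For a 2-token query, the most frequent 3-token shingles that start with it.
    List<String> suggest(String query, int limit) {
        String prefix = query.toLowerCase(Locale.ROOT).trim() + " ";
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().startsWith(prefix) && e.getKey().split(" ").length == 3) matches.add(e);
        }
        matches.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, matches.size()); i++) out.add(matches.get(i).getKey());
        return out;
    }
}
```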

Search Suggestions

2006-12-14 Thread Simon Wistow
Yahoo! has a search suggestion feature so that if you search for, say, 'shoes' then it also recommends payless shoes, jordan shoes, aldo shoes, nike shoes, bakers shoes and a bunch of others. Has anyone built something like that in Lucene?

Simon

Re: Searching documents on big index by using ParallelMultiSearcher is slow...

2006-10-04 Thread Simon Wistow
On Wed, Oct 04, 2006 at 01:55:06PM +, eks dev said:
> have you considered hadoop "light" messaging RPC, should have
> significantly smaller latencies than RMI

Yes, it's one of the things I'm looking at.

Re: Searching documents on big index by using ParallelMultiSearcher is slow...

2006-10-04 Thread Simon Wistow
On Wed, Oct 04, 2006 at 08:14:38AM -0400, Haines, Ronald C. (LNG-DAY) said:
> I too am interested in learning more about a large scale distributed
> Lucene model.

I'm also building a large scale (billions of documents) Lucene index. Preliminary experimentation with a RemoteSearch/ParallelMultiSear
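
A distributed setup of this kind is roughly a scatter-gather: query every shard in parallel, then merge the per-shard top hits into a global top k. A plain-Java sketch of that pattern; the Shard interface and Hit class are hypothetical stand-ins for remote searchers, and in practice the per-shard round trip (RMI or otherwise) tends to dominate, which is where the RPC discussion comes in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class ScatterGather {
    static final class Hit {
        final String shard;
        final int doc;
        final float score;
        Hit(String shard, int doc, float score) { this.shard = shard; this.doc = doc; this.score = score; }
    }

    // Hypothetical stand-in for a (possibly remote) per-shard searcher.
    interface Shard {
        List<Hit> search(String query, int k) throws Exception;
    }

    static List<Hit> search(List<Shard> shards, String query, int k, ExecutorService pool)
            throws Exception {
        // Scatter: ask every shard for its own top k in parallel.
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Shard s : shards) futures.add(pool.submit(() -> s.search(query, k)));

        // Gather: keep the k best hits overall in a min-heap keyed on score.
        PriorityQueue<Hit> top = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Future<List<Hit>> f : futures) {
            for (Hit h : f.get()) {
                top.add(h);
                if (top.size() > k) top.poll(); // drop the current worst
            }
        }
        List<Hit> result = new ArrayList<>(top);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

Each shard only needs to return k hits, because no document outside a shard's own top k can be in the global top k.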