Lucene scoring and short fields

2008-02-07 Thread daniel rosher
Hi All, Given that Lucene scoring can favour shorter fields in documents, in the past we've had to pad out 'unreasonably' short fields to a set minimum (with basically nonsense words). I'm wondering how others might have dealt with this issue. Another option is to have a custom Similarity class

Re: Lucene scoring and short fields

2008-02-07 Thread Chris Hostetter
: (with basically nonsense words), I'm wondering how others might have
: dealt with this issue.
:
: Another option is to have a custom Similarity class with an altered
: lengthNorm method?
That is what I would recommend ... it's exactly what SweetSpotSimilarity does (you define a plateau of
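To make the plateau idea concrete, here is a minimal standalone sketch. It does not use any Lucene classes; the class name `PlateauLengthNorm`, the method names, and the parameter values are all hypothetical. It contrasts the classic 1/sqrt(numTerms) length norm with a plateau-shaped norm modelled on the formula described for SweetSpotSimilarity, where every field length between `min` and `max` gets the same norm of 1.0 and lengths outside that range fall off at a rate set by `steepness`.

```java
final class PlateauLengthNorm {

    // Classic length normalization: the fewer terms in the field,
    // the larger the norm, so very short fields are favoured.
    static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Plateau-shaped normalization (assumed parameters: min, max, steepness).
    // For min <= numTerms <= max the distance term is 0 and the norm is 1.0;
    // outside the plateau the norm decreases as the length moves away from it.
    static double plateauNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        int min = 3, max = 10;      // hypothetical plateau bounds
        double steepness = 0.5;     // hypothetical fall-off rate
        for (int n : new int[] {1, 3, 5, 10, 50}) {
            System.out.printf("terms=%d classic=%.3f plateau=%.3f%n",
                    n, classicNorm(n), plateauNorm(n, min, max, steepness));
        }
    }
}
```

With a norm of this shape, a field of 3 terms and a field of 10 terms are treated alike, so padding short fields with nonsense words should no longer be needed. In a real index the same arithmetic would live in the lengthNorm method of a custom Similarity.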

Which analyzer

2008-02-07 Thread spring
Hi, I have a huge number of documents which contain mainly numbers and dates (German format dd.MM.), like this: Tgr. gilt ab 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 46X0 01 0480101080512070010

Distributed Indexes

2008-02-07 Thread Ruslan Sivak
I'm wondering if this is a problem that Lucene users have already tackled. I have four copies of the application using a Lucene index. They are located on two physical servers with two copies on each server accessing two copies of the Lucene index. I use Windows FRS (File Replication

Re: Performance guarantees and index format

2008-02-07 Thread Chris Hostetter
: I think this would be too messy - currently we can be sure of the simple rule
: that documents added to the index get incrementally higher docids, i.e. the
: higher the docid the more recent is the document. I think it would be much
: simpler to write a FilterIndexReader that simply reverses

Re: Distributed Indexes

2008-02-07 Thread Ruslan Sivak
My index is only 4 MB. Is there a SQL backend for Lucene?
Russ
Michael McCandless wrote:
If you're able to tell Windows FRS which specific files to copy, then SnapshotDeletionPolicy (in 2.3) should work for this. It basically protects a consistent snapshot of your index, ensuring those

Re: Which analyzer

2008-02-07 Thread Erick Erickson
*How* do you want to search them? If it's simply exact matches, then WhitespaceAnalyzer should work fine. But if you want to, for example, look at date ranges or number ranges, you'll have to be more clever. What do you want to accomplish?
Best
Erick
On Feb 7, 2008 3:25 PM, [EMAIL PROTECTED]
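One way to be "more clever" about date ranges is to rewrite each date into a form that sorts lexicographically before it is indexed, so that a range over string terms matches chronological order. The sketch below is only an illustration under stated assumptions: it uses the plain JDK rather than Lucene, the class and method names are hypothetical, the input is assumed to be dd.MM.yy as in the sample data (01.01.99), and two-digit years are assumed to fall in 1950 to 2049.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

final class GermanDateKeys {

    // Assumption: input looks like "01.01.99" (dd.MM.yy). Two-digit years
    // are resolved into the window 1950..2049, so 99 -> 1999 and 08 -> 2008.
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .appendPattern("dd.MM.")
            .appendValueReduced(ChronoField.YEAR, 2, 2, 1950)
            .toFormatter();

    // yyyyMMdd: string order equals chronological order.
    private static final DateTimeFormatter SORTABLE = DateTimeFormatter.BASIC_ISO_DATE;

    // Converts a German short date into a sortable key such as "19990101".
    static String toSortableKey(String germanDate) {
        return LocalDate.parse(germanDate, INPUT).format(SORTABLE);
    }

    // Inclusive range check done purely on the string keys, which is the
    // kind of comparison a term-based range query relies on.
    static boolean inRange(String germanDate, String fromKey, String toKey) {
        String key = toSortableKey(germanDate);
        return key.compareTo(fromKey) >= 0 && key.compareTo(toKey) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(toSortableKey("01.01.99"));
        System.out.println(inRange("01.01.99", "19980101", "19991231"));
    }
}
```

Keys of this shape would be indexed as single untokenized terms. Whether that fits depends on the answer to Erick's question about how the documents need to be searched.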

Re: Distributed Indexes

2008-02-07 Thread Michael McCandless
If you're able to tell Windows FRS which specific files to copy, then SnapshotDeletionPolicy (in 2.3) should work for this. It basically protects a consistent snapshot of your index, ensuring those files will not be deleted, while not blocking further updates to the index.
Mike
Ruslan

Re: Distributed Indexes

2008-02-07 Thread Ruslan Sivak
No, FRS copies the whole directory. It's fairly fast, but if there is a modification on both servers at the same time, there will be issues.
Russ
Michael McCandless wrote:
If you're able to tell Windows FRS which specific files to copy, then SnapshotDeletionPolicy (in 2.3) should work for

Re: Distributed Indexes

2008-02-07 Thread Erick Erickson
With an index that small, I wonder why you bother with so many copies? What kind of load are you hitting it with and how complex are the queries? Because unless you have a *very* high query rate, I'd look at why my queries were taking so long before complexifying things this way.
Best
Erick
On

Lucene syntax query matched against a string content

2008-02-07 Thread Nilesh Bansal
Hi, I want to create a function that takes a query string (in Lucene syntax) and a string as content, and returns whether the query matches the content. This would mean query = +(apache) +(lucene OR httpd) will match content = HTTPD by Apache foundation is one of the most popular