Re: question about grouping text

2009-03-25 Thread Grant Ingersoll
Hi MFM, This comes down to a preprocessing step that you would have to do before putting into Lucene, although I suppose you might be able to identify it during analysis and use the TeeTokenFilter and the SinkTokenizer. Once you do this, then you can add them as fields on a Document. I

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-25 Thread Jason Rutherglen
LuceneError when executed should reproduce the failure. The contrib/benchmark libraries are required. MultiThreadDocAdd is a multithreaded indexing utility class. On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Each document is being created in a single

Re: MergePolicy public but SegmentInfos package protected?

2009-03-25 Thread Marvin Humphrey
On Wed, Mar 25, 2009 at 06:15:35AM -0400, Michael McCandless wrote: > I'm torn. MergePolicy (and MergeScheduler) are "expected" to be > something expert users could alter; their API is designed to be > exposed & stable. I think they should be visible in the javadocs. > > But, unfortunately, to

Deadlock with concurrent merges and IndexWriter [Lucene 2.4]

2009-03-25 Thread Jeremy Volkman
Just ran into this. I'm using Lucene 2.4 in the following manner: 1. Open IndexWriter 2. Add documents 3. Delete documents 4. Close IndexWriter I haven't touched the out-of-the-box settings WRT merging. A JVM stacktrace shows the following: "Lucene Merge Thread #0" daemon prio=10 tid=0x5

Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4

2009-03-25 Thread Jason Rutherglen
Each document is being created in a single thread, and the fields of the document are not being updated elsewhere. I haven't posted the full code yet as it needs to be cleaned up. Thanks Mike! On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > It looks like y

Re: Lucene index architecture question

2009-03-25 Thread Ian Lea
My vote would be for 2 indexes, one on each server. We do this by having a master update-only index which happens to be on one of the search servers, but could be anywhere, and that is copied to read only versions on the 2 search servers, via rsync. When changes have been installed the search dae

Re: Lucene index architecture question

2009-03-25 Thread Danil ŢORIN
You can use Solr (http://lucene.apache.org/solr/): index on one machine and distribute the index to many. On Wed, Mar 25, 2009 at 18:18, kgeeva wrote: > > I have an application clustered on two servers. Is the best practice to have > two lucene indexes - one on each server for the app or is it bes

Lucene index architecture question

2009-03-25 Thread kgeeva
I have an application clustered on two servers. Is the best practice to have two lucene indexes - one on each server for the app or is it best to have one index (on one physical path) which can be shared by both servers? Both the indexes need to be in sync 24/7. I would need to do updates and sea

Re: query & doc boost difference

2009-03-25 Thread Erick Erickson
Could you provide more information about what you expect and what you are seeing? As well as an example of what you've tried? Just saying "it didn't work" doesn't give us much to go on. Best, Erick On Wed, Mar 25, 2009 at 5:02 AM, m.harig wrote: > > Hello all > Can anyone tell me what i

Re: query & doc boost difference

2009-03-25 Thread Ian Lea
In http://archives.devshed.com/forums/java-118/giving-different-boost-to-different-terms-in-a-same-document-2109816.html Erick quotes Hoss as saying Index time field boosts are a way to express things like "this documents title is worth twice as much as the title of most documents". Query time bo

Re: MergePolicy public but SegmentInfos package protected?

2009-03-25 Thread Michael McCandless
I'm torn. MergePolicy (and MergeScheduler) are "expected" to be something expert users could alter; their API is designed to be exposed & stable. I think they should be visible in the javadocs. But, unfortunately, to do their job they must use other package private APIs (SegmentInfos) which we i

Re: Term level boosting

2009-03-25 Thread Grant Ingersoll
In contrib/analysis there are also some TokenFilters that provide examples of using Payloads. See the org.apache.lucene.analysis.payloads package: http://lucene.apache.org/java/2_4_1/api/contrib-analyzers/org/apache/lucene/analysis/payloads/package-summary.html -Grant On Mar 24, 2009, at 4

query & doc boost difference

2009-03-25 Thread m.harig
Hello all, Can anyone tell me what the difference is between query.setBoost() and doc.setBoost()? Moreover, if I use query.setBoost(4.0f) I am not able to boost my results. Which one makes my results better? Please, can anyone help me out with this... -- View this message in context:

Re: Index Partitioning

2009-03-25 Thread Shashi Kant
Thanks Chris, your suggestion is very appropriate and I am happy to share my work with the Lucene community, Regards, Shashi On Tue, Mar 24, 2009 at 7:15 PM, Chris Hostetter wrote: > > : This is perfect, exactly what I was looking for. Thanks much Andrzej! > > if you code that up and it works o