Question about idf computation for different fields

2014-06-18 Thread Boyan Liu
Hello, I have a question about idf computation for different fields. As we know, idf = Math.log(numDocs/(docFreq+1)) + 1.0. docFreq is field-specific; however, numDocs is shared across all fields. For example, assume there are 1M docs, meaning numDocs=10^6. All of the docs have field_1, but only
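
A minimal sketch of the formula quoted in this message, in plain Java with no Lucene dependency. The class name IdfSketch and both docFreq values are hypothetical, chosen only for illustration, and doing the division in floating point is an assumption about the intended arithmetic. It illustrates that, with numDocs shared across fields, a term gets a much higher idf in a field where its docFreq is small.

```java
// Hypothetical helper, not part of Lucene: evaluates the idf formula quoted in the thread.
class IdfSketch {
    // idf = log(numDocs / (docFreq + 1)) + 1.0, with the division done in double precision.
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;      // shared across all fields, as in the thread
        long docFreqCommon = 999_999L;  // assumed: term present in nearly every doc of a field
        long docFreqRare = 999L;        // assumed: term present in few docs of another field
        System.out.println("common: " + idf(docFreqCommon, numDocs));
        System.out.println("rare:   " + idf(docFreqRare, numDocs));
    }
}
```

Because numDocs stays fixed while docFreq varies per field, the gap between the two printed values comes entirely from the field-specific document frequency.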

searching multiple remote indices

2014-06-18 Thread Christian Reuschling
Hi, we are currently migrating from Lucene 3.5.0 to Lucene 4. So far so good, but in one project we need to access multiple indices, which can also be remote ones. In the past, we solved this by using the Searcher interface, and implemented a subcl

Re: Lucene QueryParser/Analyzer inconsistency

2014-06-18 Thread Luis Pureza
Thanks, that did work. On Tue, Jun 17, 2014 at 8:49 PM, Jack Krupansky wrote: > Yeah, this is kind of tricky and confusing! Here's what happens: > > 1. The query parser "parses" the input string into individual source > terms, each delimited by white space. The escape is removed in this > proc

Re: IndexWriter#updateDocument(Term, Document)

2014-06-18 Thread Michael McCandless
Your first case is supposed to work; if it doesn't it's a bad bug :) Can you reduce it to a small example? Mike McCandless http://blog.mikemccandless.com On Wed, Jun 18, 2014 at 10:08 AM, Clemens Wyss DEV wrote: > I would like to perform a batch update on an index. In order to omit > duplica

IndexWriter#updateDocument(Term, Document)

2014-06-18 Thread Clemens Wyss DEV
I would like to perform a batch update on an index. In order to omit duplicate entries I am making use of IndexWriter#updateDocument(Term, Document) open an IndexWriter; foreach( element in elementsToBeUpdatedWhichHaveDuplicates ) { doc = element.toDoc(); indexWriter.updateDocument( uniqueTermFor

RE: Search degradation on Windows when upgrading from lucene 3.6 to lucene 4.7.2

2014-06-18 Thread De Simone, Alessandro
Hi! We have switched from Lucene 3.6 to Lucene >= 4.7 (Java 7) and we are also experiencing a distinct slowdown using the same dataset. We are running the software under Windows 2008R2. In our case, we have identified that there are a lot more IO calls (= number of times the buffer is refilled in Ind