Re: Repeatability of results

2012-04-02 Thread Benson Margulies
On Mon, Apr 2, 2012 at 5:33 PM, Michael McCandless wrote:
> Hmm that's odd.
>
> If the scores were identical I'd expect different sort order, since we
> tie-break by internal docID.
>
> But if the scores are different... the insertion order shouldn't
> matter. And, the score should not change as

Re: Repeatability of results

2012-04-02 Thread Michael McCandless
Hmm that's odd.

If the scores were identical I'd expect different sort order, since we tie-break by internal docID.

But if the scores are different... the insertion order shouldn't matter. And, the score should not change as a function of insertion order...

Do you have a small test case?

Mike
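A small, self-contained Java sketch of the ordering described in this message: hits sorted by descending score, with equal scores tie-broken by ascending internal docID (smaller docID first is assumed here, matching Lucene's default relevance ordering). The `TieBreakDemo` class, `Hit` record, and `rank`/`names` methods are hypothetical stand-ins, not Lucene's `ScoreDoc` or collector classes. It shows why identical scores can come back in a different order after shuffling: in a freshly built index, the docID a document receives follows the order in which it was added.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {
    // Hypothetical stand-in for a scored hit (not Lucene's ScoreDoc).
    record Hit(String name, int docId, float score) {}

    // Higher score first; equal scores fall back to the smaller docID.
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score()).reversed()
                      .thenComparingInt(Hit::docId);

    // Returns a sorted copy of the hits in ranking order.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        return sorted;
    }

    // Extracts document names so rankings are easy to compare.
    static List<String> names(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) out.add(h.name());
        return out;
    }

    public static void main(String[] args) {
        // Same three documents with identical scores; docIDs follow insertion order.
        List<Hit> insertedABC = List.of(
                new Hit("A", 0, 1.0f), new Hit("B", 1, 1.0f), new Hit("C", 2, 1.0f));
        List<Hit> insertedCAB = List.of(
                new Hit("C", 0, 1.0f), new Hit("A", 1, 1.0f), new Hit("B", 2, 1.0f));
        System.out.println(names(rank(insertedABC))); // [A, B, C]
        System.out.println(names(rank(insertedCAB))); // [C, A, B]

        // With distinct scores, insertion order no longer affects the ranking.
        List<Hit> distinct = List.of(
                new Hit("C", 0, 0.5f), new Hit("A", 1, 0.9f), new Hit("B", 2, 0.7f));
        System.out.println(names(rank(distinct))); // [A, B, C]
    }
}
```

If the scores themselves differ between shuffled runs, as in the original report, this tie-break cannot explain it; that case needs the score computation itself to be examined.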

Repeatability of results

2012-04-02 Thread Benson Margulies
We've observed something that, in some ways, is not surprising. If you take a set of documents that are close in 'score' to some query, and shuffle them in different orders and then see what results you get in what order from the reference query, the scores will vary according to the insertio

RE: TVD, TVX and TVF files

2012-04-02 Thread Luis Paiva
Sorry Mike, I pasted the old code. I've already included something like this to index with TermVector:

    String xpto = fr.toString();
    doc.add(new Field("contents2", xpto, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));

Re: TVD, TVX and TVF files

2012-04-02 Thread Michael McCandless
As far as I can see, you are not indexing term vectors in the code below? Your Fields don't have TermVector.*... Can you boil this down to a small test case showing the missing term vector files...?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Apr 2, 2012 at 1:28 PM, Luis Paiva wrot

RE: TVD, TVX and TVF files

2012-04-02 Thread Luis Paiva
Thank you for your help. I still haven't found a solution. I'm copying all my code below. BTW, I'm working with Lucene version 3.5.0.

@Mike: Yes, I do close it :)

These files are created: .fdt, .fdx, .fnm, .frq, .nrm, .prx, .tii, .tis. I don't know why the TVD, TVX and TVF files are not created.
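One quick diagnostic is to list the index directory and check for the term vector extensions. The following is a minimal Java sketch, assuming Lucene 3.x file naming (`.tvx`, `.tvd`, `.tvf` for term vector index, documents and fields) and a non-compound segment layout like the one listed above; with the compound file format these files would be packed inside `.cfs` and would not appear separately. The class `TermVectorFileCheck` and its methods are hypothetical helpers, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class TermVectorFileCheck {
    // Extensions that Lucene 3.x uses for term vector data (index, documents, fields).
    static final Set<String> TV_EXTENSIONS = Set.of("tvx", "tvd", "tvf");

    // Returns which term vector extensions appear among the given file names.
    static Set<String> fromFileNames(List<String> fileNames) {
        Set<String> found = new TreeSet<>();
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            if (dot < 0) continue;
            String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
            if (TV_EXTENSIONS.contains(ext)) found.add(ext);
        }
        return found;
    }

    // Lists the files in an index directory and checks them for term vector extensions.
    // Assumption: non-compound segments; with .cfs files the vectors live inside the compound file.
    static Set<String> inDirectory(Path indexDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
            for (Path p : stream) names.add(p.getFileName().toString());
        }
        return fromFileNames(names);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Set<String> found = inDirectory(dir);
        System.out.println(found.isEmpty()
                ? "No term vector files found in " + dir
                : "Term vector files found: " + found);
    }
}
```

Pass the index directory as the first argument. An empty result alongside `.fdt`/`.fdx` files would be consistent with no field in the index having been added with a `TermVector` setting other than `NO`, which is what the reply above suspects.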

Re: delete entries from posting list Lucene 4.0

2012-04-02 Thread Andrzej Bialecki
On 29/03/2012 11:14, Andrzej Bialecki wrote: The problem in our implementation is that we use a within-document term frequency (the number of occurrences of t in the current document) and not a collection-wide term frequency... so, it looks to me that the fix would be to first fully traverse the
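A toy in-memory Java sketch of the distinction drawn here: within-document term frequency (occurrences of t in one document) versus collection-wide term frequency (occurrences of t summed over all documents). The `TermFrequencies` class and its methods are hypothetical and do not use Lucene's API; documents are modeled as plain token lists. It illustrates why the collection-wide value is only known after a full first pass over every document, so a per-posting decision that depends on it needs a second pass.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermFrequencies {
    // Within-document term frequency: occurrences of term in a single document.
    static int withinDocTf(List<String> docTokens, String term) {
        int n = 0;
        for (String t : docTokens) {
            if (t.equals(term)) n++;
        }
        return n;
    }

    // Collection-wide term frequency for every term: one full pass over all documents.
    // Only after this pass completes is the collection-wide value known for any term.
    static Map<String, Long> collectionTfs(List<List<String>> docs) {
        Map<String, Long> totals = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : doc) totals.merge(t, 1L, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("lucene", "index", "lucene"),
                List.of("index", "search"),
                List.of("lucene"));
        Map<String, Long> totals = collectionTfs(docs);
        // Second pass: compare each posting's local tf with the collection-wide total.
        for (int d = 0; d < docs.size(); d++) {
            int tf = withinDocTf(docs.get(d), "lucene");
            if (tf > 0) {
                System.out.println("doc " + d + ": tf=" + tf + ", collection tf=" + totals.get("lucene"));
            }
        }
    }
}
```

The within-document value changes from posting to posting (2, then 1), while the collection-wide value stays fixed at 3 for the term.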

RE: Lucene 4 - POS and Syntactic Tagging

2012-04-02 Thread Paul Hill
> Mark McGuire wrote:
> I'm working on a project where I need to tag both the part of speech and
> other syntactic information on tokens

To pick up on this thread from a few weeks back. I've never done this myself, but I think that your desire to put extra information that is not really a token