On Mon, Apr 2, 2012 at 5:33 PM, Michael McCandless wrote:
Hmm that's odd.
If the scores were identical I'd expect different sort order, since we
tie-break by internal docID.
But if the scores are different... the insertion order shouldn't
matter. And, the score should not change as a function of insertion
order...
Do you have a small test case?
Mike
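Mike's point above can be sketched in plain Java (this is a conceptual illustration, not Lucene's actual collector code): hits are ordered by descending score, with ties broken by ascending internal docID. That is why documents with identical scores can come back in a different order when they are indexed (and thus numbered) in a different order, while documents with distinct scores cannot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DocIdTieBreak {
    // Minimal stand-in for a scored hit; names are mine, not Lucene's.
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Returns docIDs in result order: descending score, ascending docID on ties.
    static List<Integer> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> -h.score)
                              .thenComparingInt(h -> h.docId));
        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted) ids.add(h.docId);
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(2, 1.0f));
        hits.add(new Hit(0, 1.0f)); // same score as doc 2: docID decides
        hits.add(new Hit(1, 2.0f));
        System.out.println(rank(hits)); // [1, 0, 2]
    }
}
```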
We've observed something that, in some ways, is not surprising.
If you take a set of documents whose scores for some query are close,
index them shuffled into different orders, and then compare the results
(and their order) returned for the reference query, the scores vary with
the insertion order.
Sorry Mike,
I pasted the old code. I've already included something like this to index
with TermVector:
String xpto = fr.toString();
doc.add(new Field("contents2", xpto,
                  Field.Store.YES,
                  Field.Index.ANALYZED,
                  Field.TermVector.YES));
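Conceptually, what Field.TermVector.YES asks Lucene to store is, roughly, a per-document term-to-frequency map that can later be retrieved without re-analyzing the text. A minimal stdlib-Java sketch of that idea (whitespace tokenization here is a stand-in for the analyzer):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermVectorSketch {
    // Builds the per-document term -> frequency map for one document's text.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        System.out.println(termVector("to be or not to be"));
        // {to=2, be=2, or=1, not=1}
    }
}
```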
As far as I can see, you are not indexing term vectors in the code
below? Your Fields don't have TermVector.*...
Can you boil this down to a small test case showing the missing term
vector files...?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Apr 2, 2012 at 1:28 PM, Luis Paiva wrote:
Thank you for your help.
I still haven't found a solution. I'm copying all my code below.
BTW, I'm working with Lucene version 3.5.0.
@Mike: Yes, I do close it :) Some files are created: .fdt, .fdx,
.fnm, .frq, .nrm, .prx, .tii, .tis.
I don't know why the t* files are not created.
On 29/03/2012 11:14, Andrzej Bialecki wrote:
The problem in our implementation is that we use a within-document term
frequency (the number of occurrences of t in the current document) and
not a collection-wide term frequency... so, it looks to me that the fix
would be to first fully traverse the
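The distinction Andrzej draws can be made concrete with a small sketch (plain Java; the method names are mine, not Lucene's): a within-document term frequency counts occurrences of a term in one document, whereas a collection-wide frequency sums those counts over every document in the collection.

```java
public class TermFreqKinds {
    // Number of occurrences of `term` in a single document's text.
    static int withinDocTf(String doc, String term) {
        int n = 0;
        for (String tok : doc.toLowerCase().split("\\s+")) {
            if (tok.equals(term)) n++;
        }
        return n;
    }

    // Total occurrences of `term` summed across all documents.
    static int collectionTf(String[] docs, String term) {
        int total = 0;
        for (String doc : docs) total += withinDocTf(doc, term);
        return total;
    }

    public static void main(String[] args) {
        String[] docs = { "hello hello world", "hello there" };
        System.out.println(withinDocTf(docs[0], "hello")); // 2
        System.out.println(collectionTf(docs, "hello"));   // 3
    }
}
```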
> Mark McGuire wrote:
> I'm working on a project where I need to tag both the part of speech and
> other syntactic information on tokens
To pick up on this thread from a few weeks back.
I've never done this myself, but I think that your desire to put extra
information that is not really a token
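One common way to carry part-of-speech or other syntactic information alongside tokens, rather than folding it into the token text itself, is to pair each token with its tag. In Lucene this kind of side data is typically stored as a payload or a custom token attribute; the sketch below is plain Java with a hypothetical POS lookup map standing in for a real tagger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PosTagging {
    // Pairs each whitespace-split token with a tag from `posLookup`,
    // falling back to "UNK" for unknown words.
    static List<String> tagTokens(String text, Map<String, String> posLookup) {
        List<String> tagged = new ArrayList<>();
        for (String tok : text.split("\\s+")) {
            String pos = posLookup.getOrDefault(tok.toLowerCase(), "UNK");
            tagged.add(tok + "/" + pos); // token paired with its tag
        }
        return tagged;
    }

    public static void main(String[] args) {
        Map<String, String> pos = Map.of("dogs", "NNS", "run", "VBP");
        System.out.println(tagTokens("Dogs run", pos)); // [Dogs/NNS, run/VBP]
    }
}
```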