Hi,
if you want to store word+value pairs and then have lucene scoring weight
the words with higher values against them, you should look at using payloads
and the DelimitedPayloadTokenFilter, which lets you specify e.g.
word1|value1 word2|value2 ...
and the values are stored as payloads against the words
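A minimal sketch of that analysis chain, assuming the Lucene 2.9/3.x API (DelimitedPayloadTokenFilter with a FloatEncoder); not runnable without the Lucene jars on the classpath:

```java
// Sketch (Lucene 2.9/3.x): an Analyzer whose chain attaches the text after
// '|' to each token as a float payload, so "word1|0.5" indexes the term
// "word1" carrying payload 0.5.
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;

public class PayloadAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // split on whitespace, then strip "|value" off each token and
    // encode the value as that token's payload
    return new DelimitedPayloadTokenFilter(
        new WhitespaceTokenizer(reader), '|', new FloatEncoder());
  }
}
```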
hi
when you are indexing, use termvectors
org.apache.lucene.document.Field.TermVector
set this in the Field object constructor when you create your Field objects
at index time.
i've never done it but i'm pretty sure these can be retrieved at
search time using one of the
IndexReader.getTermFreqVector methods
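A sketch of both halves, assuming the Lucene 2.9/3.x Field and TermFreqVector APIs (field name "content" is just an example); needs the Lucene jars to compile:

```java
// Sketch (Lucene 2.9/3.x): store a term vector at index time, read it back
// at search time via IndexReader.getTermFreqVector.
import java.io.IOException;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermVectorDemo {
  // index time: ask Lucene to store the term vector for this field
  static Field makeField(String text) {
    return new Field("content", text,
        Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
  }

  // search time: pull the vector back out for a given internal doc id
  static void dumpVector(IndexReader reader, int docId) throws IOException {
    TermFreqVector tfv = reader.getTermFreqVector(docId, "content");
    String[] terms = tfv.getTerms();
    int[] freqs = tfv.getTermFrequencies();   // parallel to terms[]
    for (int i = 0; i < terms.length; i++) {
      System.out.println(terms[i] + " x " + freqs[i]);
    }
  }
}
```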
Hi Li Li
If you want to support some query types and not others you should
override/extend the QueryParser so that it throws an exception / makes
a different query type instead.
Similarity doesn't do the actual scoring; it's used by the Query
classes (actually by the Scorer implementation used by the
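The QueryParser approach above might look like this, assuming Lucene 2.9/3.x where the query factory methods are protected and can be overridden (the choice to reject fuzzy and wildcard queries is just an example):

```java
// Sketch (Lucene 2.9/3.x): disallow some query types by overriding the
// protected factory methods QueryParser calls while parsing.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class RestrictedQueryParser extends QueryParser {
  public RestrictedQueryParser(String field) {
    super(Version.LUCENE_30, field, new StandardAnalyzer(Version.LUCENE_30));
  }

  @Override
  protected Query getFuzzyQuery(String field, String termStr,
      float minSimilarity) throws ParseException {
    // reject instead of building a FuzzyQuery
    throw new ParseException("fuzzy queries are not supported");
  }

  @Override
  protected Query getWildcardQuery(String field, String termStr)
      throws ParseException {
    // reject instead of building a WildcardQuery
    throw new ParseException("wildcard queries are not supported");
  }
}
```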
Hi,
i was looking at another post which had this presentation in it - it has
a nice section
on termfreqvectors:
http://www.cnlp.org/presentations/slides/advancedluceneeu.pdf
bec :)
On 2 June 2010 13:56, Rebecca Watson wrote:
> hi
>
> when you are indexing, use termvectors
There are index-time boosts, i.e. calculated at index time, and search-
time boosts. The field f always relates to the field(s) that the term
t appears in.
My understanding is that--
norm(t,d) includes the index-time boosts for each field, but I think t
is only included in this calc in terms of field.getBoost()
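A sketch of where those index-time boosts are set, assuming the Lucene 2.9/3.x API (field name and boost values are just examples); note the resulting norm(t,d) is encoded into a single byte at index time, so it is lossy:

```java
// Sketch (Lucene 2.9/3.x): index-time boosts. norm(t,d) folds together
// doc.getBoost(), the boost of each field with that name, and the length
// normalization for the field.
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class BoostDemo {
  static Document makeDoc() {
    Document doc = new Document();
    doc.setBoost(1.5f);                 // whole-document index-time boost
    Field title = new Field("title", "some title",
        Field.Store.YES, Field.Index.ANALYZED);
    title.setBoost(2.0f);               // per-field index-time boost
    doc.add(title);
    return doc;
  }
}
```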
Hi aad,
See the search.payload package if you want examples of reading in payloads
at query time for scoring purposes, but returning the payload/ using it to
highlight will require you to write more custom lucene classes.
We work with synonyms too, but rather than store the synonym in a payload like
that, we index all terms in the synonym set at the same position --
e.g. 'institute' regardless of which
term in the set matched.
i.e. we know the first term in the same position
was the original one, but leverage the in-built
Highlighter for simplicity (esp. in solr).
bec :)
Sent from my iPhone
On 09/06/2010, at 10:32 AM, Rebecca Watson wrote:
Hi aad
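The same-position synonym trick described above might be sketched like this, assuming the Lucene 2.9-style attribute API (SamePositionSynonymFilter is a hypothetical name, not a stock Lucene class); the original term is always emitted first, and synonyms are stacked on its position with a position increment of 0:

```java
// Sketch (Lucene 2.9-style, hypothetical filter): emit each original token,
// then emit its synonyms at the same position (posIncr = 0).
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.AttributeSource;

public final class SamePositionSynonymFilter extends TokenFilter {
  private final Map<String, String[]> synonyms;   // term -> its synonym set
  private final LinkedList<String> pending = new LinkedList<String>();
  private AttributeSource.State savedState;
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);
  private final PositionIncrementAttribute posIncrAtt =
      addAttribute(PositionIncrementAttribute.class);

  public SamePositionSynonymFilter(TokenStream in,
      Map<String, String[]> synonyms) {
    super(in);
    this.synonyms = synonyms;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!pending.isEmpty()) {
      // emit a synonym stacked on the original token's position
      restoreState(savedState);
      termAtt.setTermBuffer(pending.removeFirst());
      posIncrAtt.setPositionIncrement(0);
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String[] syns = synonyms.get(termAtt.term());
    if (syns != null) {
      pending.addAll(Arrays.asList(syns));
      savedState = captureState();      // original token comes out first
    }
    return true;
  }
}
```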
hi,
i had similar issues migrating to using the new collectors... we use a custom
hitcollector too where we accessed document fields to aid in scoring docs.
when migrating - i chose to extend the Collector class, where the
.collect method is still implemented pretty much as before;
the javadoc for the new abstract methods
discusses the global docid / docid for the current index in the example:
http://lucene.apache.org/java/2_9_0/api/all/index.html
bec :)
On 12 June 2010 10:52, Rebecca Watson wrote:
> hi,
>
> i had similar issues migrating to using the new collectors... we use a custom
> hitcollector too where we accessed document fields to aid in scoring docs.
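The migration described above might be sketched like this, assuming the Lucene 2.9+ Collector API, where collect() now receives segment-local doc ids and setNextReader supplies the docBase needed to rebase them:

```java
// Sketch (Lucene 2.9+): a custom Collector replacing an old HitCollector.
// collect() gets per-segment doc ids; docBase maps them to global ids.
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class MyCollector extends Collector {
  private Scorer scorer;
  private int docBase;   // offset of the current segment in the whole index

  @Override
  public void setScorer(Scorer scorer) {
    this.scorer = scorer;
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase)
      throws IOException {
    this.docBase = docBase;   // called once per index segment
  }

  @Override
  public void collect(int doc) throws IOException {
    int globalDoc = docBase + doc;   // the "global docid" from the javadoc
    float score = scorer.score();
    // ... record/rescore the hit much as the old HitCollector did ...
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;   // we don't rely on docs arriving in order
  }
}
```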
i guess you are using lucene 2.9 or below if you're talking about
Tokens still...
here's some old code i used to use (not sure if i wrote it or grabbed it from
online examples - it's been a while since i used it!)
that grabbed the set of tokens given a field name +
text to analyse (for any Analyzer class)
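Something along those lines, assuming the Lucene 2.9 attribute-based TokenStream API rather than the older Token-returning one (class and method names here are just for illustration):

```java
// Sketch (Lucene 2.9-style): collect the analyzed tokens for a given
// field name + text, for any Analyzer.
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class TokenDump {
  public static List<String> tokens(Analyzer analyzer, String field,
      String text) throws IOException {
    List<String> out = new ArrayList<String>();
    TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
    TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {   // one iteration per emitted token
      out.add(termAtt.term());
    }
    ts.close();
    return out;
  }
}
```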
hi alex,
sounds like you are going to tackle a similar problem to what we're
trying to do
in our XML too -- as it looks like you've got a one-to-many type relationship
you want to search over but return based on the top-level document -- similar to
an XML -- i.e. structured doc -- search problem
--
hi li,
i looked at doing something similar - where we only index the text
but retrieve search results / highlight from files -- we ended up giving
up because of the amount of customisation required in solr -- mainly
because we wanted the distributed search functionality in solr which
meant making
hi,
> 1) Although Lucene uses tf to calculate scoring, it seems to me that term
> frequency has not been normalized. Even if I index several documents, it
> does not normalize the tf value. Therefore, since the total number of words
> in indexed documents varies, can't there be a fault in Lucene's scoring?
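For what it's worth, DefaultSimilarity does account for document length, just not by dividing tf by the total term count: raw frequency is damped with a square root, and length is handled separately by lengthNorm, which is folded into norm(t,d) at index time. A plain-Java sketch of those two documented defaults:

```java
// Plain-Java sketch of DefaultSimilarity's documented defaults:
// tf(freq) = sqrt(freq), lengthNorm(numTerms) = 1/sqrt(numTerms).
public class DefaultSimilarityMath {
  // square-root damping of the raw term frequency
  static float tf(float freq) {
    return (float) Math.sqrt(freq);
  }

  // shorter fields score higher, per-field, folded into norm(t,d)
  static float lengthNorm(int numTerms) {
    return (float) (1.0 / Math.sqrt(numTerms));
  }

  public static void main(String[] args) {
    // a term occurring 4x only scores 2x one occurring once...
    System.out.println(tf(4) / tf(1));
    // ...and a 400-term doc is down-weighted 2x vs. a 100-term doc
    System.out.println(lengthNorm(100) / lengthNorm(400));
  }
}
```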