I want documents that match more of the query terms to be
returned ahead of documents that match fewer of them.
That is, if I submit the query "the quick brown fox" and have two
documents,
doc1: brown fox
doc2: the quick brown fox jumps over the lazy dog.
I want the search result to be doc2.
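
To make the desired ordering concrete, here is a minimal sketch in plain Java. It does not use Lucene and is not Lucene's scoring formula; it only counts how many distinct query terms each document contains and sorts on that count. The class name MatchCountRanker and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class MatchCountRanker {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Number of distinct query terms that occur at least once in the document.
    static int matchedTerms(String query, String doc) {
        Set<String> queryTerms = new HashSet<>(tokenize(query));
        queryTerms.retainAll(new HashSet<>(tokenize(doc)));
        return queryTerms.size();
    }

    // Documents that match more query terms come first; ties keep input order.
    static List<String> rank(String query, List<String> docs) {
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort((a, b) ->
            Integer.compare(matchedTerms(query, b), matchedTerms(query, a)));
        return ranked;
    }

    public static void main(String[] args) {
        String query = "the quick brown fox";
        List<String> docs = Arrays.asList(
            "brown fox",
            "the quick brown fox jumps over the lazy dog");
        for (String d : rank(query, docs)) {
            System.out.println(matchedTerms(query, d) + "  " + d);
        }
    }
}
```

With the two documents from the question, doc1 matches two of the four query terms and doc2 matches all four, so doc2 is listed first.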
Lucene's field sorting will be faster in 2.9.
First, the warming time of a reopened reader (after a writer has
committed some changes) will typically be very much faster. Second,
you'll be able to optionally turn off score computation when sorting
by field. Finally, the actual cost of sorting pe
I think one way to achieve this is to run a phrase query. In your
example:
if you run a PhraseQuery with "the quick brown fox" you will only get matches
like "*"+"the quick brown fox"+"*", where * is any other string. That means a doc
will be considered a match only if the doc contains
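
The matching rule described above can be sketched in plain Java as follows. This is an illustration of exact phrase matching only, not Lucene's PhraseQuery implementation; the class PhraseMatchSketch and its simple tokenizer are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class PhraseMatchSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True only if the phrase tokens appear in the document as one
    // contiguous run, in the same order.
    static boolean containsPhrase(String phrase, String doc) {
        List<String> p = tokenize(phrase);
        List<String> d = tokenize(doc);
        if (p.isEmpty()) {
            return false;
        }
        for (int start = 0; start + p.size() <= d.size(); start++) {
            if (d.subList(start, start + p.size()).equals(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String phrase = "the quick brown fox";
        System.out.println(containsPhrase(phrase, "brown fox"));
        System.out.println(containsPhrase(phrase,
            "the quick brown fox jumps over the lazy dog"));
    }
}
```

Note the trade-off: under this rule doc1 ("brown fox") is not a weaker match, it is no match at all, because it lacks the full phrase.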
Hi,
New to this group.
Question:
Generally, sites like Wikipedia have a template that every page follows.
These templates contain words that occur on every page.
For example, the Wikipedia template has the list of languages in the left panel.
Now these words get indexed every tim
Hi Aditya,
You can use any HTML parser if you are fetching/crawling a page from Wikipedia
and ignore those sections which are repetitive.
If you are using the Jericho parser, here is what you can do.
URL u = new URL("any english wikipedia page");
Source src = new Source(u.openConnecti
Why not remove that content from every doc during indexing?
Or, if that's too harsh, you could massively reduce the score for hits
in that section, eg during indexing store payloads on those term
occurrences falling within the common section, and then use
BoostingTermQuery to down-weight those hit
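
As a rough illustration of the down-weighting idea, here is a plain Java sketch. It does not use Lucene's payload API or BoostingTermQuery; it only shows term occurrences being tagged by section and counted with a smaller weight when they fall in the common template. The class SectionWeightSketch and the 0.1 weight are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class SectionWeightSketch {

    // Arbitrary illustrative weight for hits inside the shared template.
    static final double TEMPLATE_WEIGHT = 0.1;

    // One term occurrence, tagged at "index time" with the section it came from.
    static final class Occurrence {
        final String term;
        final boolean inTemplate;

        Occurrence(String term, boolean inTemplate) {
            this.term = term;
            this.inTemplate = inTemplate;
        }
    }

    // Sum the weights of all occurrences of the term: 1.0 for body text,
    // TEMPLATE_WEIGHT for occurrences in the common section.
    static double score(String term, List<Occurrence> doc) {
        double total = 0.0;
        for (Occurrence o : doc) {
            if (o.term.equals(term)) {
                total += o.inTemplate ? TEMPLATE_WEIGHT : 1.0;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // "deutsch" appears only in the language list of the first page.
        List<Occurrence> pageA = Arrays.asList(
            new Occurrence("deutsch", true),
            new Occurrence("fox", false));
        // The second page mentions it in its body text.
        List<Occurrence> pageB = Arrays.asList(
            new Occurrence("deutsch", false));
        System.out.println(score("deutsch", pageA));
        System.out.println(score("deutsch", pageB));
    }
}
```

A page that mentions the term only in the template still matches, but it scores well below a page that uses the term in its body.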
Thanks, that solves it.
On 5/2/09, Kamal Najib wrote:
> I think one way to achieve this is to run a phrase query. In your
> example:
> if you run a PhraseQuery with "the quick brown fox" you will only get
> matches like "*"+"the quick brown fox"+"*", where * is any other string. That
> means a doc