not the length that matters, but the content

2009-05-02 Thread Seid Mohammed
I want documents that match more of the query terms to be ranked higher, rather than documents being favored just because they contain fewer terms. That is, if I submit the query "the quick brown fox" and have two documents, doc1: brown fox and doc2: the quick brown fox jumps over the lazy dog, I want the search result to be doc2.
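For context, Lucene's default similarity applies a length norm (roughly 1/sqrt of the number of terms in the field), which tends to favor short fields, while its coord factor rewards documents that match more of the query terms. The toy below is plain Java, not Lucene's API: it ranks purely by how many distinct query terms each document contains and ignores length entirely. The class and method names are hypothetical.

```java
import java.util.*;

/**
 * Toy illustration (plain Java, not Lucene's API): rank documents by how many
 * distinct query terms they contain, so a longer document that matches more
 * terms beats a shorter one that matches fewer.
 */
class OverlapRanking {

    // Lowercase and split on anything that is not a letter or digit.
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Number of distinct query terms present in the document; document length plays no role.
    static int overlap(String query, String doc) {
        Set<String> docTerms = terms(doc);
        int count = 0;
        for (String q : terms(query)) {
            if (docTerms.contains(q)) count++;
        }
        return count;
    }

    // Sort by descending overlap; List.sort is stable, so ties keep their input order.
    static List<String> rank(String query, List<String> docs) {
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingInt((String d) -> overlap(query, d)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "brown fox",
            "the quick brown fox jumps over the lazy dog");
        System.out.println(rank("the quick brown fox", docs).get(0));
    }
}
```

With the two documents from the question, doc1 matches 2 of the 4 query terms and doc2 matches all 4, so doc2 is ranked first.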

Re: Search result ordering

2009-05-02 Thread Michael McCandless
Lucene's field sorting will be faster in 2.9. First, the warming time of a reopened reader (after a writer has committed some changes) will typically be very much faster. Second, you'll be able to optionally turn off score computation when sorting by field. Finally, the actual cost of sorting pe
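As a rough illustration of what sorting by a field without computing scores means, here is a toy model in plain Java (not Lucene's FieldCache or Sort API). It assumes field values have been loaded once into an array indexed by doc ID, and it orders the matching doc IDs by that array alone. All names and data are hypothetical.

```java
import java.util.*;

/**
 * Toy model (not Lucene's API): sort matching documents by a per-document
 * field value that was loaded once into an array indexed by doc ID, without
 * computing any relevance score.
 */
class FieldSortSketch {

    // Hypothetical cache: fieldValues[docId] holds, e.g., a date or price for that doc.
    static int[] sortByField(int[] matchingDocIds, int[] fieldValues) {
        Integer[] boxed = new Integer[matchingDocIds.length];
        for (int i = 0; i < boxed.length; i++) boxed[i] = matchingDocIds[i];
        // Only the cached field value is consulted; no score is computed for any hit.
        Arrays.sort(boxed, Comparator.comparingInt((Integer doc) -> fieldValues[doc]));
        int[] out = new int[boxed.length];
        for (int i = 0; i < out.length; i++) out[i] = boxed[i];
        return out;
    }

    public static void main(String[] args) {
        int[] fieldValues = {30, 10, 20, 40}; // field value for docs 0..3
        int[] hits = {0, 2, 3, 1};
        System.out.println(Arrays.toString(sortByField(hits, fieldValues)));
    }
}
```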

Re: not the length that matters, but the content

2009-05-02 Thread Kamal Najib
I think one way to achieve this is to run a phrase query. In your example, if you run a PhraseQuery with "the quick brown fox" you will only get matches like "*" + "the quick brown fox" + "*", where * is any other string. That means a doc will be considered a match only if the doc contains
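A minimal sketch of the matching rule described above, assuming simple lowercase tokenization and no slop (Lucene's PhraseQuery can also allow slop via setSlop, which this toy ignores). It is plain Java, not the PhraseQuery implementation, and the names are hypothetical.

```java
import java.util.*;

/**
 * Toy illustration of what an exact phrase query matches (plain Java, not
 * Lucene's PhraseQuery): a document matches only if the query terms occur at
 * consecutive positions, with anything allowed before or after them.
 */
class PhraseMatchSketch {

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // True if the phrase tokens appear as one contiguous run in the document tokens.
    static boolean matchesPhrase(String doc, String phrase) {
        List<String> d = tokens(doc);
        List<String> p = tokens(phrase);
        if (p.isEmpty()) return false;
        return Collections.indexOfSubList(d, p) >= 0;
    }

    public static void main(String[] args) {
        String phrase = "the quick brown fox";
        System.out.println(matchesPhrase("brown fox", phrase));
        System.out.println(matchesPhrase("the quick brown fox jumps over the lazy dog", phrase));
    }
}
```

With the query from the thread, doc1 ("brown fox") does not match at all and doc2 does, which is why the phrase query returns only doc2.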

REPOST from another list: Question related to improving search results

2009-05-02 Thread Aditya
Hi, I'm new to this group. Question: sites like Wikipedia generally have a template that every page follows. These templates contain words that occur on every page. For example, the Wikipedia template has the list of languages in the left panel. Now these words get indexed every tim

Re: REPOST from another list: Question related to improving search results

2009-05-02 Thread Vaijanathrao
Hi Aditya, you can use any HTML parser if you are fetching/crawling a page from Wikipedia, and ignore the sections that are repetitive. If you are using the Jericho parser, here is what you can do: URL u = new URL("any english wikipedia page"); Source src = new Source(u.openConnecti
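Without reproducing the Jericho code, the same idea can be sketched parser-free: split each page into blocks and drop any block that appears on every page. This is a hypothetical plain-Java illustration that assumes one block per non-blank line; with a real HTML parser you would use elements (such as the language panel) as the blocks instead.

```java
import java.util.*;

/**
 * Sketch of "ignore the sections that repeat on every page" using plain
 * strings instead of an HTML parser. Assumption: one block per non-blank line.
 * Any block that occurs on every page is treated as template text and dropped.
 */
class TemplateStripper {

    static List<String> blocks(String page) {
        List<String> out = new ArrayList<>();
        for (String line : page.split("\\R")) {
            String b = line.trim();
            if (!b.isEmpty()) out.add(b);
        }
        return out;
    }

    static List<String> stripCommonBlocks(List<String> pages) {
        // Count how many pages each distinct block appears on.
        Map<String, Integer> pageCount = new HashMap<>();
        for (String page : pages) {
            for (String b : new HashSet<>(blocks(page))) {
                pageCount.merge(b, 1, Integer::sum);
            }
        }
        List<String> cleaned = new ArrayList<>();
        for (String page : pages) {
            StringBuilder sb = new StringBuilder();
            for (String b : blocks(page)) {
                // Keep only blocks that are not shared by all pages.
                if (pageCount.get(b) < pages.size()) {
                    if (sb.length() > 0) sb.append('\n');
                    sb.append(b);
                }
            }
            cleaned.add(sb.toString());
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
            "Languages: English Deutsch\nThe fox is a small canid.",
            "Languages: English Deutsch\nLucene is a search library.");
        stripCommonBlocks(pages).forEach(System.out::println);
    }
}
```

This needs at least two pages: with a single page, every block counts as common and the whole page would be removed.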

Re: REPOST from another list: Question related to improving search results

2009-05-02 Thread Michael McCandless
Why not remove that content from every doc during indexing? Or, if that's too harsh, you could massively reduce the score for hits in that section, e.g., during indexing store payloads on those term occurrences falling within the common section, and then use BoostingTermQuery to down-weight those hit
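A toy model of the payload idea, assuming a hypothetical weight of 0.1 for occurrences inside the common template section and 1.0 elsewhere. This is plain Java, not Lucene's payload or BoostingTermQuery API, and the real query combines payloads with the normal score differently; the sketch only shows how per-occurrence weights stored at index time can shrink the contribution of template hits.

```java
import java.util.*;

/**
 * Toy model of payload-based down-weighting (plain Java, not Lucene's
 * BoostingTermQuery): each occurrence of a term carries a weight stored at
 * index time, lower for occurrences inside the shared template section, and
 * a query term's score here is simply the sum of its occurrence weights.
 */
class PayloadDownWeightSketch {

    // Hypothetical weights; in Lucene these would be encoded as payload bytes.
    static final float TEMPLATE_WEIGHT = 0.1f;
    static final float BODY_WEIGHT = 1.0f;

    // "Index" one document: term -> list of per-occurrence weights.
    static Map<String, List<Float>> index(String templateText, String bodyText) {
        Map<String, List<Float>> postings = new HashMap<>();
        addOccurrences(postings, templateText, TEMPLATE_WEIGHT);
        addOccurrences(postings, bodyText, BODY_WEIGHT);
        return postings;
    }

    static void addOccurrences(Map<String, List<Float>> postings, String text, float weight) {
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) postings.computeIfAbsent(t, k -> new ArrayList<>()).add(weight);
        }
    }

    // Toy score for a single query term: sum of the stored weights of its occurrences.
    static float score(Map<String, List<Float>> postings, String term) {
        float s = 0f;
        for (float w : postings.getOrDefault(term.toLowerCase(), List.of())) s += w;
        return s;
    }

    public static void main(String[] args) {
        Map<String, List<Float>> doc = index("English Deutsch", "Deutsch grammar notes");
        System.out.println(score(doc, "english")); // occurs only in the template
        System.out.println(score(doc, "deutsch")); // occurs in template and body
    }
}
```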

Re: not the length that matters, but the content

2009-05-02 Thread Seid Mohammed
Thanks, that solves it. On 5/2/09, Kamal Najib wrote: > I think one way to achieve this is to run a phrase query. In your example, > if you run a PhraseQuery with "the quick brown fox" you will only get > matches like "*" + "the quick brown fox" + "*", where * is any other string. That > means a doc