Re: Tf and Df in lucene

2015-06-15 Thread Shay Hummel
Erick and Ahmet - thank you Shay On Mon, Jun 15, 2015 at 6:19 PM Ahmet Arslan wrote: > Hi, > > If you are interested in summed up tf values of multiple terms, > I suggest to extend SimilarityBase class to return raw freq as score. > > float score(BasicStats stats, float freq, float docLen){ > r

Re: CachingWrapperQuery performance

2015-06-15 Thread Adrien Grand
Hi Anton, Thanks for reporting this. It is indeed a bit surprising given that both classes work in a very similar way. Can you confirm that the response times that you are reporting both happen on Lucene 5.2 (even with CachingWrapperFilter) and on a "hot cache" (so that they don't include the gene

Re: Tf and Df in lucene

2015-06-15 Thread Ahmet Arslan
Hi, If you are interested in summed up tf values of multiple terms, I suggest to extend SimilarityBase class to return raw freq as score. float score(BasicStats stats, float freq, float docLen){ return freq; } When you use this similarity, search for three term query, scores will summed tf val
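
The effect described above (a multi-term query whose score is the sum of the raw term frequencies) can be sketched in plain Java without Lucene. This is only an illustration of the arithmetic, not Lucene's scoring code: the class `SummedTf` and its methods are hypothetical names, and tokenization is assumed to be lowercasing plus a whitespace split.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical plain-Java sketch; no Lucene classes are used. */
final class SummedTf {

    /** Raw frequency of one term in a lowercased, whitespace-split document. */
    static int tf(String doc, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : doc.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    /** Score of a document = sum of the raw frequencies of all query terms. */
    static int score(String doc, List<String> queryTerms) {
        int sum = 0;
        for (String term : queryTerms) {
            sum += tf(doc, term);
        }
        return sum;
    }

    public static void main(String[] args) {
        // "a" occurs 3 times, "b" twice, "d" never.
        System.out.println(score("a b a c b a", List.of("a", "b", "d")));
    }
}
```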

Re: Tf and Df in lucene

2015-06-15 Thread Erick Erickson
In a word, no. Terms are, by definition, whatever a "token" is. Tokens are delimited by, say, the WhitespaceTokenizer so a-priori can't do what you want. Unless... you do "something special". In this case, "something special" would be to put shingles (see ShingleFilter in Lucene or ShingleFilterFacto
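
The shingle idea can be illustrated in plain Java: once word n-grams are treated as tokens, a phrase such as "united states" has a term frequency and a document frequency like any single word. The sketch below does not use Lucene's ShingleFilter; the class `ShingleStats` and its methods are hypothetical, and it assumes lowercasing plus whitespace tokenization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical plain-Java sketch; no Lucene classes are used. */
final class ShingleStats {

    /** Lowercases and splits on whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Builds shingles (word n-grams) of exactly n tokens, joined by one space. */
    static List<String> shingles(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    /** Term frequency: occurrences of the phrase in a single document. */
    static int termFreq(String doc, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        if (phraseTokens.isEmpty()) {
            return 0;
        }
        String key = String.join(" ", phraseTokens);
        int count = 0;
        for (String s : shingles(tokenize(doc), phraseTokens.size())) {
            if (s.equals(key)) {
                count++;
            }
        }
        return count;
    }

    /** Document frequency: number of documents containing the phrase at least once. */
    static int docFreq(List<String> docs, String phrase) {
        int df = 0;
        for (String doc : docs) {
            if (termFreq(doc, phrase) > 0) {
                df++;
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "The United States and the united states again",
            "Free speech zones in the United States",
            "Nothing relevant here");
        System.out.println(termFreq(docs.get(0), "united states"));
        System.out.println(docFreq(docs, "united states"));
        System.out.println(docFreq(docs, "free speech zones"));
    }
}
```

In a real index the shingles would be produced at analysis time, so the phrase statistics come from the term dictionary rather than from rescanning documents as this sketch does.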

Re: Tf and Df in lucene

2015-06-15 Thread Shay Hummel
Hi Ahmet Thank you for the reply. Can the term reflect a multi word expression? For example: I want to find the term frequency \ document frequency of "united states" (two terms) or "free speech zones" (three terms). Shay On Mon, Jun 15, 2015 at 4:55 PM Ahmet Arslan wrote: > Hi Hummel, > > reg

Re: Tf and Df in lucene

2015-06-15 Thread Ahmet Arslan
Hi Hummel, regarding df, Term term = new Term(field, word); TermStatistics termStatistics = searcher.termStatistics(term, TermContext.build(reader.getContext(), term)); System.out.println(query + "\t totalTermFreq \t " + termStatistics.totalTermFreq()); System.out.println(query + "\t docFreq \t

[ANNOUNCE] Apache Lucene 5.2.1 released

2015-06-15 Thread Shalin Shekhar Mangar
15 June 2015, Apache Lucene™ 5.2.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 5.2.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-t

CachingWrapperQuery performance

2015-06-15 Thread Anton Lyska
Hi, I have performance issues with CachingWrapperQuery with lucene 5.2 and don't know how to solve it. Prehistory: I have search with different parameters, where some parameters are used more frequently than others. For these params I used filters (and cached them), and my search looked li