Re: SpanTermQuery getSpans

2014-04-02 Thread Martin Líška
Gregory, that was indeed my problem. Thank you very much for your support.

Martin

This is a reply to http://mail-archives.apache.org/mod_mbox/lucene-java-user/201404.mbox/%3CCAASL1-8jRbEG%3DLi96eDLY-Pr_zwev6vk4vk4BW_ryKF1Dnb4KA%40mail.gmail.com%3E

On 1 April 2014 23:52, Martin Líška wrote:
>

RE: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Allison, Timothy B.
I agree entirely with Robert about not doubling up on the filter and the wrapper. To suppress unigrams, consider setOutputUnigrams(false).

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, April 02, 2014 2:50 PM
To: java-user
Subject: Re: Strange behavior of ShingleFi
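The goal in this thread (word bigrams only, with no single words) can be illustrated without Lucene. The sketch below is plain Java, not Lucene's ShingleFilter implementation; the class name `BigramSketch` and the method `shingles` are hypothetical. Its `outputUnigrams` flag imitates the effect of `setOutputUnigrams` as discussed here, and the interleaved order it produces with the flag on is an assumption modeled on the output Natalia reported ("This", "this is", "is", "is a", and so on). It splits on whitespace with `java.util.StringTokenizer` and joins each bigram with a single space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Plain-Java sketch (NOT Lucene code) of word bigram "shingles".
 * The outputUnigrams flag imitates the effect discussed for
 * ShingleFilter.setOutputUnigrams:
 *   true  -> each word, followed by the bigram that starts at it
 *   false -> bigrams only
 */
public class BigramSketch {

    public static List<String> shingles(String sentence, boolean outputUnigrams) {
        // Split on whitespace, roughly as a whitespace tokenizer would.
        List<String> words = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(sentence);
        while (itr.hasMoreTokens()) {
            words.add(itr.nextToken());
        }

        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (outputUnigrams) {
                out.add(words.get(i)); // the single word at position i
            }
            if (i + 1 < words.size()) {
                // Bigram starting at position i, joined with one space.
                out.add(words.get(i) + " " + words.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("This is a dog", false));
        // prints [This is, is a, a dog]
        System.out.println(shingles("This is a dog", true));
        // prints [This, This is, is, is a, a, a dog, dog]
    }
}
```

In Lucene itself, the advice above amounts to using a single ShingleFilter (for example with minimum and maximum shingle size both set to 2) and calling setOutputUnigrams(false) on it, rather than combining ShingleAnalyzerWrapper with a second ShingleFilter.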

Re: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Robert Muir
Either remove the ShingleAnalyzerWrapper or the additional filter...

On Wed, Apr 2, 2014 at 2:44 PM, Natalia Connolly wrote:
> Hi Robert,
>
> No, I did not… I just needed the filter to stop it from outputting
> unigrams; otherwise I was getting "This", "this is", "is", "is a", and so
> on. Is ther

Re: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Robert Muir
Did you really mean to shingle twice (ShingleAnalyzerWrapper already wraps the analyzer with a ShingleFilter, and then the code wraps that in another ShingleFilter)?

On Wed, Apr 2, 2014 at 1:42 PM, Natalia Connolly wrote:
> Hello,
>
> I am very confused about what ShingleFilter seems to be d

Re: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Natalia Connolly
Hi Robert,

No, I did not… I just needed the filter to stop it from outputting unigrams; otherwise I was getting "This", "this is", "is", "is a", and so on. Is there another way I could do it?

Thank you,
Natalia

On Wed, Apr 2, 2014 at 2:40 PM, Robert Muir wrote:
> Did you really

Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Natalia Connolly
Hello,

I am very confused about what ShingleFilter seems to be doing in Lucene 4.6. What I would like to do is extract all possible bigrams from a sentence. For example, if the sentence is "This is a dog", I want "This is", "is a", and "a dog". Here is my code:

StringTokenizer itr = new StringTok

[ANNOUNCE] Apache Lucene 4.7.1 released

2014-04-02 Thread Steve Rowe
April 2014, Apache Lucene™ 4.7.1 available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.7.1.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-