March 2013, Apache Lucene™ 4.2 available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.2
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires full-text sea
Hi,
I was following the tutorial at
http://searchhub.org/2009/05/26/accessing-words-around-a-positional-match-in-lucene/
for counting the number of spans of a query in a document.
But the definition of getSpan(IndexReader) in SpanQuery has changed
to getSpan(IndexReaderContext, Bits, Map) with no inform
On 03/11/2013 01:22 PM, Michael McCandless wrote:
On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
wrote:
On 11.03.2013 13:38, Michael McCandless wrote:
On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler wrote:
Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE,
If you are interested, here is the solution with the "fake" query as rewrite.
Just use GetTermsRewrite as rewrite method. The MTQ then rewrites to
TermHolderQuery (cast to that) and you can get the terms using getTerms():
/** A fake query that is just used to collect all term instances for the
I think we have here different problems:
Carsten wants to just collect the terms a MTQ visits, so using BooleanQuery to
do this is fine, unless you hit the limit. If you don’t execute the query, the
limit can be as high as possible (but it’s a static limit affecting all
instances). To do the sa
On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
wrote:
> On 11.03.2013 13:38, Michael McCandless wrote:
>> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler wrote:
>>
>>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE, then this
>>> should work (after rewrite your query is a Boole
Great links. Thanks Ian.
Good to know that Lucene v4 has a smaller heap footprint.
On Mon, Mar 11, 2013 at 11:18 AM, Ian Lea wrote:
> It's not that simple. More to do with number of terms than raw index
> size. Of course your large index may well have more terms than a
> smaller one.
>
> S
On 11.03.2013 13:38, Michael McCandless wrote:
> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler wrote:
>
>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE, then this
>> should work (after rewrite your query is a BooleanQuery, which supports
>> extractTerms()).
>
> ... as long a
On 11.03.2013 14:13, Uwe Schindler wrote:
>> Regarding the application of IndexSearcher.rewrite(Query) instead: I don't
>> see a way to set the rewrite method there because the Query's rewrite
>> method does not seem to apply to IndexSearcher.rewrite().
>
> Replace:
>> BooleanQuery bq = (Boolea
> Set<Term> terms = new HashSet<>();
> MultiTermQuery query = new RegexpQuery(new Term("text", queryString));
> query.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
> BooleanQuery bq = (BooleanQuery) query.rewrite(reader);
> bq.extractTerms(terms);
>
>
> Regarding the application of Index
On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler wrote:
> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE, then this
> should work (after rewrite your query is a BooleanQuery, which supports
> extractTerms()).
... as long as you don't exceed the max number of terms allowed by BQ
(10
On Mon, Mar 11, 2013 at 7:33 AM, Nils Knappmeier
wrote:
> Hi,
>
>> This is tricky.
>>
>> You could build a separate suggester per category/zip code (or,
>> possibly prefix-code each suggestion with the category/zip code into
>> one suggester), but likely this will blow up (ie, if the same
>> sugge
On 11.03.2013 12:08, Uwe Schindler wrote:
> This works for this query, but in general you have to rewrite until it is
> completely rewritten: A while loop that exits when the result of the rewrite
> is identical to the original query. IndexSearcher.rewrite() does this for
> you.
>
>> 3. Wri
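The "rewrite until nothing changes" loop described in the quoted reply can be sketched in plain Java without any Lucene dependency. This is only an illustration under stated assumptions: `RewriteLoopSketch`, `rewriteFully` and the `rewriteOnce` step are hypothetical names, the step function stands in for a single `Query.rewrite(reader)` call, and "identical" is interpreted here as reference identity. In real code, IndexSearcher.rewrite() already does this for you.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene: applies one rewrite step
// repeatedly until the step returns the very same object it was given.
final class RewriteLoopSketch {

    static <Q> Q rewriteFully(Q original, UnaryOperator<Q> rewriteOnce) {
        Q query = original;
        // Assumption: a fully rewritten query returns itself from the step,
        // so reference identity signals that a fixed point was reached.
        for (Q rewritten = rewriteOnce.apply(query);
             rewritten != query;
             rewritten = rewriteOnce.apply(query)) {
            query = rewritten;
        }
        return query;
    }

    public static void main(String[] args) {
        // Toy stand-in for a query: each step strips one "wrapped:" prefix.
        UnaryOperator<String> step =
            q -> q.startsWith("wrapped:") ? q.substring("wrapped:".length()) : q;
        System.out.println(rewriteFully("wrapped:wrapped:term", step));
    }
}
```

A single call to the step would only strip one layer in this toy example, which is the same reason a single rewrite call may not be enough for nested queries.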
Hi,
This is tricky.
You could build a separate suggester per category/zip code (or,
possibly prefix-code each suggestion with the category/zip code into
one suggester), but likely this will blow up (ie, if the same
suggestion often appears across zip codes / categories). If your
suggestions are
On Mon, Mar 11, 2013 at 3:41 PM, Uwe Schindler wrote:
> In that case, it should be fine. Otherwise you would need to reindex.
>
> Thank you Uwe.
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
>
Hi,
> Hi,
> I'm trying to get the terms that match a certain RegexpQuery. My (naive)
> approach:
>
> 1. Create a RegexpQuery from the queryString (e.g. "abc.*"):
> Query q = new RegexpQuery(new Term("text", queryString));
>
> 2. Rewrite the Query using the IndexReader reader:
> q = q.rewrite(rea
On Mon, Mar 11, 2013 at 6:31 AM, Nils Knappmeier
wrote:
> Dear all,
>
> I have a request to implement an auto-suggest feature for our Lucene-based
> product.
> We have upgraded to Lucene 4.1 and intend to use the AnalyzingSuggester, but
> we cannot determine the correct way of using it for our req
You could call the .getTermsEnum() on the query itself, and then step
through the terms and save them?
But this method is protected ... so you could make a subclass w/ a new
method that calls it and returns it to you.
Mike McCandless
http://blog.mikemccandless.com
On Mon, Mar 11, 2013 at 6:41 A
Hi,
I'm trying to get the terms that match a certain RegexpQuery. My (naive)
approach:
1. Create a RegexpQuery from the queryString (e.g. "abc.*"):
Query q = new RegexpQuery(new Term("text", queryString));
2. Rewrite the Query using the IndexReader reader:
q = q.rewrite(reader);
3. Write the ter
Dear all,
I have a request to implement an auto-suggest feature for our
Lucene-based product.
We have upgraded to Lucene 4.1 and intend to use the AnalyzingSuggester,
but we cannot determine the correct way of using it for our request.
We have problems with two aspects:
1) The suggester shou
In that case, it should be fine. Otherwise you would need to reindex.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Monday, March 11, 2013 8:42 AM
> To:
On Mon, Mar 11, 2013 at 1:11 PM, Uwe Schindler wrote:
> If you use StandardAnalyzer, you are in trouble unless you use
> StandardAnalyzer with Version.LUCENE_23 and you are using a non-Western
> language. If you change your code to use Version.LUCENE_41, you have to
> reindex.
>
> Thank you Uwe. We
It's not that simple. More to do with number of terms than raw index
size. Of course your large index may well have more terms than a
smaller one.
See http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html
and
http://searchhub.org/2011/09/14/estimating-memory-and-storage-
This character lies in the CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A block.
Added detection of the extension blocks; I assume (without really knowing)
that none of these characters are phonetic either.
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
i
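Since the posted code is cut off here, the following is a minimal sketch of block-based detection using only the JDK. It is not the poster's code: the class name `CjkBlockSketch`, the method names and the exact set of blocks are assumptions chosen for illustration, and whether every character in these blocks is non-phonetic is taken from the message above rather than verified.

```java
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: treats a code point as ideographic when it falls
// into one of a few CJK ideograph blocks, including extensions A and B.
final class CjkBlockSketch {

    // Assumed block list; extend it if other blocks matter for your data.
    private static final Set<UnicodeBlock> IDEOGRAPHIC_BLOCKS =
        new HashSet<>(Arrays.asList(
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS,
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
            UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS));

    static boolean isIdeographic(int codePoint) {
        // UnicodeBlock.of returns null for code points outside any block.
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        return block != null && IDEOGRAPHIC_BLOCKS.contains(block);
    }

    static boolean containsIdeograph(String text) {
        // Iterate by code point so supplementary characters (extension B)
        // are not split into surrogate halves.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isIdeographic(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }
}
```

Iterating by code point rather than by `char` matters because characters in CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B lie outside the Basic Multilingual Plane.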
If you use StandardAnalyzer, you are in trouble unless you use StandardAnalyzer
with Version.LUCENE_23 and you are using a non-Western language. If you change
your code to use Version.LUCENE_41, you have to reindex.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eM
25 matches