The Lucene PMC is pleased to announce the release of Apache Lucene 9.6.0.
Apache Lucene is a high-performance, full-featured search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires structured search, full-text search, faceting,
nearest-neighbor search …
23 May 2022, Apache Lucene™ 9.2.0 available
The Lucene PMC is pleased to announce the release of Apache Lucene 9.2.0.
Apache Lucene is a high-performance, full-featured search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires structured …
> …around the IntervalQuery that boosts by the
> number of terms added as sibling should clauses? Other suggestions?
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
Your filtered query should work the same as a SpanFirst, yes. I didn’t add a
shortcut just because you can do it this way, but feel free to add it if you
think it’s useful!
Re sloppy phrases, this one is trickier. The closest you can get at the moment
is an unordered near, but that’s not the
24 March 2020, Apache Lucene 8.5.0 available
The Lucene PMC is pleased to announce the release of Apache Lucene 8.5.0.
Apache Lucene is a high-performance, full-featured text search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires
4 April 2018, Apache Lucene™ 7.3.0 available
The Lucene PMC is pleased to announce the release of Apache Lucene 7.3.0.
Apache Lucene is a high-performance, full-featured text search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires full
You can use SearcherLifetimeManager to keep track of specific IndexSearcher
instances - see Mike’s blog at
http://blog.mikemccandless.com/2011/11/searcherlifetimemanager-prevents-broken.html
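A minimal sketch of the record/acquire/release cycle, assuming Lucene 9.x (lucene-core only) and an in-memory ByteBuffersDirectory. The class name, the "id" field and the two-page flow are hypothetical; the token returned by record() is what you would hand back to the client with the first page of results.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherLifetimeManager;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: paging against the same point-in-time searcher. */
public class LifetimeDemo {

  private static Document doc(String id) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.NO));
    return d;
  }

  /** Returns {hits seen via the recorded searcher, hits seen by the refreshed searcher}. */
  public static int[] run() throws IOException {
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      writer.addDocument(doc("1"));
      SearcherManager manager = new SearcherManager(writer, null); // near-real-time searchers
      SearcherLifetimeManager lifetime = new SearcherLifetimeManager();
      try {
        // Page 1: record the searcher; the token identifies this exact view of the index.
        IndexSearcher first = manager.acquire();
        long token;
        try {
          token = lifetime.record(first); // takes its own reference on the reader
        } finally {
          manager.release(first);
        }

        // The index changes and the SearcherManager moves to a newer view.
        writer.addDocument(doc("2"));
        manager.maybeRefresh();

        // Page 2: get back the same view that page 1 used.
        IndexSearcher old = lifetime.acquire(token);
        if (old == null) {
          throw new IllegalStateException("searcher was pruned; re-run the search instead");
        }
        int oldCount;
        try {
          oldCount = old.count(new MatchAllDocsQuery());
        } finally {
          lifetime.release(old);
        }

        IndexSearcher current = manager.acquire();
        try {
          return new int[] {oldCount, current.count(new MatchAllDocsQuery())};
        } finally {
          manager.release(current);
        }
      } finally {
        lifetime.close();
        manager.close();
      }
    }
  }
}
```

In a real service you would also prune periodically, for example with lifetime.prune(new SearcherLifetimeManager.PruneByAge(600.0)), so old readers are eventually released; acquire() then returns null for pruned tokens, which is why the sketch checks for it.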
Alan Woodward
Hi Bernd,
You add a separate StoredField with the same name.
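A minimal sketch of the two-field approach, assuming Lucene 9.x rather than the 6.6.2 in the question (on 6.x you would use RAMDirectory in place of ByteBuffersDirectory). The field name "modified" comes from the question; the class name and the rest are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: one value indexed as a point and stored separately. */
public class PointAndStoredDemo {

  /** Indexes one document, then returns {number of hits, stored value read back}. */
  public static long[] run(long lastModified) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        Document doc = new Document();
        // Point field: indexed for exact/range queries, but NOT stored.
        doc.add(new LongPoint("modified", lastModified));
        // Separate StoredField with the same name: this is what you read back.
        doc.add(new StoredField("modified", lastModified));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = LongPoint.newRangeQuery("modified", lastModified - 1000, lastModified + 1000);
        TopDocs hits = searcher.search(query, 10);
        Document stored = searcher.doc(hits.scoreDocs[0].doc);
        long value = stored.getField("modified").numericValue().longValue();
        return new long[] {hits.scoreDocs.length, value};
      }
    }
  }
}
```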
> On 25 Oct 2017, at 11:11, Bernd Fehling
> wrote:
>
> With Lucene 6.6.2 I'm trying to get a LongPoint value indexed and stored.
>
> Old code:
> LegacyLongField dateField = new LegacyLongField("modified", lastModified,
> Field.Stor
javascript functions.
Alan Woodward
www.flax.co.uk
> On 12 Oct 2017, at 23:25, Michael McCandless
> wrote:
>
> Hi Mike,
>
> It looks like FunctionValues is a very old API used by many function
> queries, while DoubleValuesSource is relatively new (introduced in
> https:
You have a LowercaseFilter before your SynonymFilter, which means that the
entities in your SynonymMap need to be all lowercase or they won’t be matched.
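To illustrate the ordering problem, here is a sketch that builds the same chain twice, once with a lowercase SynonymMap entry and once with an uppercase one. It assumes Lucene 9.x (SynonymGraphFilter from lucene-analysis-common rather than the older SynonymFilter); the class name and the "usa"/"america" mapping are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.CharsRef;

/** Hypothetical demo class: lowercasing before synonym matching. */
public class SynonymCaseDemo {

  /** Builds a one-entry map; 'from' must match tokens exactly as they reach the filter. */
  public static SynonymMap map(String from, String to) throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);
    builder.add(new CharsRef(from), new CharsRef(to), true); // true = keep the original token
    return builder.build();
  }

  public static Analyzer analyzer(SynonymMap synonyms) {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();
        TokenStream result = new LowerCaseFilter(source); // lowercases BEFORE synonym matching
        result = new SynonymGraphFilter(result, synonyms, false);
        return new TokenStreamComponents(source, result);
      }
    };
  }

  public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }
}
```

With the lowercase entry the output for "Visit USA" includes "america"; with "USA" as the map key nothing matches, because the synonym filter only ever sees "usa".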
Alan Woodward
www.flax.co.uk
> On 25 Jul 2017, at 07:52, Christian Kaufhold
> wrote:
>
> Hi,
>
> I am not able to a
The contract to create a Weight is to repeatedly call rewrite() until the query
is no longer changing, and then call createWeight -
IndexSearcher.createNormalizedWeight() will do this for you.
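The rewrite contract, spelled out as a loop. This is a sketch assuming Lucene 9.x, where IndexSearcher.createNormalizedWeight() is no longer available and the equivalent is a rewrite followed by searcher.createWeight(...). It uses Query.rewrite(IndexReader), which later 9.x releases deprecate in favour of a searcher-based overload; the class and helper names are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

/** Hypothetical demo class: rewrite to a fixed point, then create the Weight. */
public class RewriteDemo {

  /** Calls rewrite() until the query stops changing. */
  public static Query rewriteFully(Query query, IndexReader reader) throws IOException {
    Query current = query;
    for (Query next = current.rewrite(reader); next != current; next = current.rewrite(reader)) {
      current = next;
    }
    return current;
  }

  /** Rewrite first, then build the Weight from the fully rewritten query. */
  public static Weight weightFor(IndexSearcher searcher, Query query) throws IOException {
    Query rewritten = rewriteFully(query, searcher.getIndexReader());
    return searcher.createWeight(rewritten, ScoreMode.COMPLETE, 1f);
  }
}
```

In practice searcher.rewrite(query) performs the same loop for you.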
Alan Woodward
www.flax.co.uk
> On 6 Jul 2017, at 12:34, Ranganath B N wrote:
>
> Th
You need to call SpanNearQuery.rewrite(), and then call createWeight() on the
resulting query.
Alan Woodward
www.flax.co.uk
> On 6 Jul 2017, at 11:54, Ranganath B N wrote:
>
> Hi Adrien,
>
> This SpanQuery spt2 will be a component of the SpanQuery array input to
> the
Hi,
You should be able to use AnalyzerWrapper for this, adding your TokenFilters in
wrapComponents().
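A minimal AnalyzerWrapper sketch, assuming Lucene 9.x, where TokenStreamComponents is rebuilt from getSource() and getTokenStream() (older versions used getTokenizer() instead). LowerCaseFilter stands in for whatever TokenFilters you actually need; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical demo class: adding filters on top of an existing analyzer. */
public class WrapperDemo {

  /** Appends a LowerCaseFilter to whatever chain the delegate builds. */
  public static Analyzer lowercasing(Analyzer delegate) {
    return new AnalyzerWrapper(delegate.getReuseStrategy()) {
      @Override
      protected Analyzer getWrappedAnalyzer(String fieldName) {
        return delegate; // same delegate for every field
      }

      @Override
      protected TokenStreamComponents wrapComponents(
          String fieldName, TokenStreamComponents components) {
        TokenStream filtered = new LowerCaseFilter(components.getTokenStream());
        return new TokenStreamComponents(components.getSource(), filtered);
      }
    };
  }

  public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }
}
```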
Alan Woodward
www.flax.co.uk
> On 23 Jun 2017, at 14:33, Nicola Buso wrote:
>
> Hi,
>
> maybe it's a known question but I could not find an answer.
> I need to base
Hi Michael,
You want to set the positionIncrementGap - either wrap your analyzer with an
AnalyzerWrapper that overrides getPositionIncrementGap(), or use a
CustomAnalyzer builder and set it there.
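A sketch of the CustomAnalyzer route, assuming Lucene 9.x with lucene-analysis-common on the classpath (so the "whitespace" and "lowercase" factory names resolve). It indexes two values into one field and checks whether a phrase can match across the boundary between them; the class name, field name and text are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: positionIncrementGap between values of one field. */
public class GapDemo {

  public static Analyzer withGap(int gap) throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer("whitespace")
        .addTokenFilter("lowercase")
        .withPositionIncrementGap(gap) // positions added between values of a multi-valued field
        .build();
  }

  /** Indexes two values into one field and counts matches for a phrase spanning both. */
  public static int phraseAcrossValues(Analyzer analyzer) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("name", "John Smith", Field.Store.NO));
        doc.add(new TextField("name", "Mary Jones", Field.Store.NO));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        // "smith" ends the first value and "mary" starts the second
        return new IndexSearcher(reader).count(new PhraseQuery("name", "smith", "mary"));
      }
    }
  }
}
```

With a gap of 0, "smith" and "mary" end up at adjacent positions, so the phrase matches; with a gap of 100 it does not. A sloppy phrase would still match if its slop were at least as large as the gap.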
Alan Woodward
www.flax.co.uk
> On 12 Jan 2017, at 10:57, Michael Wilkowski wrote:
>
>
I’ve done this before by appending a special token to text fields via a
TokenFilter. It hasn’t caused a noticeable problem with term stats, and
field:* still works because the token is only added if the document in question
actually has data in that particular field.
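A sketch of that kind of TokenFilter, assuming Lucene 9.x. The marker term "_has_data_" and the class names are hypothetical; pick a marker your tokenizer can never produce from real text. A TermQuery on the marker then finds documents that have data in the field.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

/** Hypothetical demo class: append a marker token when a field has any data. */
public class FieldPresenceDemo {

  /** Hypothetical marker term. */
  public static final String MARKER = "_has_data_";

  /** Passes tokens through, then emits MARKER once if at least one real token was seen. */
  static final class FieldPresentFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private boolean inputDone;
    private boolean sawToken;
    private boolean markerEmitted;
    private int lastEndOffset;

    FieldPresentFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!inputDone) {
        if (input.incrementToken()) {
          sawToken = true;
          lastEndOffset = offsetAtt.endOffset();
          return true;
        }
        inputDone = true; // don't call input.incrementToken() again once it returned false
      }
      if (sawToken && !markerEmitted) {
        markerEmitted = true;
        clearAttributes(); // also resets the position increment to 1
        termAtt.setEmpty().append(MARKER);
        offsetAtt.setOffset(lastEndOffset, lastEndOffset); // keep offsets non-decreasing
        return true;
      }
      return false;
    }

    @Override
    public void reset() throws IOException {
      super.reset();
      inputDone = false;
      sawToken = false;
      markerEmitted = false;
      lastEndOffset = 0;
    }
  }

  public static Analyzer analyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();
        return new TokenStreamComponents(source, new FieldPresentFilter(source));
      }
    };
  }

  public static List<String> tokens(String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (Analyzer analyzer = analyzer();
        TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }
}
```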
Alan Woodward
Using StandardTokenizer should remove punctuation as well.
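A quick way to check, as a sketch assuming Lucene 9.x; the class name is hypothetical. Note that StandardAnalyzer in 5.0 also removed English stop words by default, so the sample text avoids words like "and".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical demo class: what StandardAnalyzer does with trailing commas. */
public class PunctuationDemo {

  public static List<String> tokens(String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (Analyzer analyzer = new StandardAnalyzer();
        TokenStream ts = analyzer.tokenStream("name", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString()); // "Doe," comes out as "doe": the comma is not part of a token
      }
      ts.end();
    }
    return out;
  }
}
```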
Alan Woodward
www.flax.co.uk
> On 28 Nov 2016, at 16:06, Thomas Johnson wrote:
>
> We are using Lucene 5.0. Some of our documents are getting indexed with a
> comma after the value. For example “John Doe, bob smith, and
Hi Chris,
I’ve been working sporadically on a webservice API called marple:
https://github.com/flaxsearch/marple.
Very much a project in development, but more testers and contributors are
always welcome!
Alan Woodward
www.flax.co.uk
> On 10
You need to use a SortedNumericDocValuesField, which allows for multiple
numeric values to be stored per document. I’m not sure whether that’s in Lucene
5.0, though; you may need to upgrade to something more recent.
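A minimal sketch, assuming Lucene 9.x and its iterator-style doc values API (advanceExact/nextValue), which differs from the random-access API in 5.0. The class name and the "price" field are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: several numeric doc values on one document. */
public class MultiValuedDocValuesDemo {

  /** Stores all values on a single document, then reads them back from doc values. */
  public static List<Long> valuesOfFirstDoc(long... values) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        Document doc = new Document();
        for (long v : values) {
          doc.add(new SortedNumericDocValuesField("price", v)); // one field instance per value
        }
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        LeafReader leaf = reader.leaves().get(0).reader();
        SortedNumericDocValues dv = DocValues.getSortedNumeric(leaf, "price");
        List<Long> out = new ArrayList<>();
        if (dv.advanceExact(0)) {
          for (int i = 0; i < dv.docValueCount(); i++) {
            out.add(dv.nextValue()); // returned in ascending order, duplicates kept
          }
        }
        return out;
      }
    }
  }
}
```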
Alan Woodward
www.flax.co.uk
> On 31 Oct 2016, at 15:34, Fielder, Todd Patr
Hi Rob, I think you posted this to the wrong mailing list?
Alan Woodward
www.flax.co.uk
> On 28 Oct 2016, at 12:13, Rob Audenaerde wrote:
>
> Hi all,
>
> I have a DataTable which, in onConfigure(), sets a selected item. I want
> another (detail) panel, outside of this comp
Hi,
You need to add a NumericDocValuesField here as well: the Point field is for searching,
the StoredField is for display, and sorting reads from doc values.
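Putting the three fields side by side, as a sketch assuming Lucene 9.x; the class name, the "size" field and the values are hypothetical. Without the NumericDocValuesField, sorting on "size" is expected to fail because there are no doc values to sort on.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: point + stored + doc values for one numeric field. */
public class SortDemo {

  /** Indexes one document per value and returns the values in ascending sort order. */
  public static List<Long> sortedValues(long... values) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (long v : values) {
          Document doc = new Document();
          doc.add(new LongPoint("size", v));             // searching (range/exact queries)
          doc.add(new StoredField("size", v));           // display (stored fields)
          doc.add(new NumericDocValuesField("size", v)); // sorting (doc values)
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Sort sort = new Sort(new SortField("size", SortField.Type.LONG));
        TopFieldDocs top = searcher.search(new MatchAllDocsQuery(), values.length, sort);
        List<Long> out = new ArrayList<>();
        for (ScoreDoc sd : top.scoreDocs) {
          out.add((Long) ((FieldDoc) sd).fields[0]); // the sort value that was used
        }
        return out;
      }
    }
  }
}
```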
Alan Woodward
www.flax.co.uk
> On 21 Oct 2016, at 10:54, Ludovic Bertin wrote:
>
> Hi,
>
> When I'm trying to launch search with ordering, but it fails with excep
This looks like a bug - can you open a JIRA ticket?
Alan Woodward
www.flax.co.uk
On 13 May 2016, at 22:33, Daniel Bigham wrote:
> I am experimenting with supporting synonyms on the query side by doing query
> expansion.
>
> For example, the query "open webpage"
Try adding your multiple SpanNearQuery objects to a BooleanQuery?
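A sketch of the BooleanQuery approach, assuming Lucene 9.x, where the span classes live in org.apache.lucene.queries.spans (lucene-queries); in 5.x/6.x they were in org.apache.lucene.search.spans. The class name is hypothetical, and "website" is a hypothetical synonym of "webpage".

```java
import java.io.IOException;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.spans.SpanNearQuery;
import org.apache.lucene.queries.spans.SpanQuery;
import org.apache.lucene.queries.spans.SpanTermQuery;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: OR-ing several SpanNearQuery expansions. */
public class SpanBooleanDemo {

  private static SpanQuery phrase(String field, String... words) {
    SpanQuery[] clauses = new SpanQuery[words.length];
    for (int i = 0; i < words.length; i++) {
      clauses[i] = new SpanTermQuery(new Term(field, words[i]));
    }
    return new SpanNearQuery(clauses, 0, true); // slop 0, in order: an exact phrase
  }

  public static int count(String... docs) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
        for (String text : docs) {
          Document doc = new Document();
          doc.add(new TextField("body", text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        Query query = new BooleanQuery.Builder()
            .add(phrase("body", "open", "webpage"), BooleanClause.Occur.SHOULD)
            .add(phrase("body", "open", "website"), BooleanClause.Occur.SHOULD) // hypothetical synonym
            .build();
        return new IndexSearcher(reader).count(query);
      }
    }
  }
}
```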
Alan Woodward
www.flax.co.uk
On 12 May 2016, at 20:35, Daniel Bigham wrote:
> I'm very interested in SpanNearQuery, because it allows for quite powerful
> phrasal searching.
>
> However, unlike BooleanQuery, t
using MultiFields.getMergedFieldInfos()
instead.
Alan Woodward
www.flax.co.uk
On 29 Apr 2016, at 08:57, j.Pardos wrote:
> Hello,
>
> The suggested change worked in part: Luke now shows me the field contents, so
> it's correctly stored, for sure. However, when I ask the IndexReader for the
You should add a StoredField with the same name containing the value:
doc.add(new DoublePoint(name, Double.parseDouble(value)));
doc.add(new StoredField(name, Double.parseDouble(value)));
Alan Woodward
www.flax.co.uk
On 28 Apr 2016, at 13:10, j.Pardos wrote:
> Hello all,
>
> I need
Hi Eva,
This looks like a bug in WeightedSpanTermExtractor, which is rewriting your
PhraseQuery into a SpanNearQuery without checking how many terms there are.
Could you open a JIRA ticket?
Alan Woodward
www.flax.co.uk
> On 18 Apr 2016, at 16:27, Eva Popenda wrote:
>
> Hi,
>
Depending on the type of field, you can normally do:
Field myField = …
index.addField(fieldName, myField.tokenStream(null, null));
I agree that this could be a bit nicer, though. MemoryIndex doesn't support
DocValues yet either, although I think there is an open ticket to add that.
Alan Woodward
www.flax.co.uk
> On 30 Dec 2015, at 20:46, Brian V Zayas wrote:
>
> Hello-
>
> I'm trying to configure a search that captures a term but excludes search
> results that contain that same term if the term only appears in proximity
> to certain o
You may be able to do something along the lines of PayloadScoreQuery? That
overrides the scorer to factor in the value of payloads at each position. In
fact, a generic PositionScoringQuery would be a nice addition to the span
queries.
Alan Woodward
www.flax.co.uk
On 17 Dec 2015, at 13:58
Could you rewrite the query into a searcher-specific Weight, and then call
extractTerms()? i.e., do:
Weight w = searcher.createNormalizedWeight(query, true);
Set<Term> terms = new HashSet<>();
w.extractTerms(terms);
if (!terms.isEmpty()) {
    doStuff();
}
Alan Woodward
www.flax.co.uk
The second parameter passed to SpanCollector.collectLeaf() is the position,
rather than an index of any kind, which I think is going to mess things up for
you. But other than that, you've got the right idea. :-)
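A sketch of a collector that records the position it is handed for each leaf term, assuming Lucene 9.x (spans in lucene-queries). The class names and the "body" field are hypothetical; the point is that position is a token position within the document.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.spans.SpanCollector;
import org.apache.lucene.queries.spans.SpanNearQuery;
import org.apache.lucene.queries.spans.SpanQuery;
import org.apache.lucene.queries.spans.SpanTermQuery;
import org.apache.lucene.queries.spans.SpanWeight;
import org.apache.lucene.queries.spans.Spans;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: what SpanCollector.collectLeaf() receives. */
public class SpanPositionsDemo {

  /** Records "term@position" for every leaf term that contributes to a match. */
  static final class PositionRecorder implements SpanCollector {
    final List<String> seen = new ArrayList<>();

    @Override
    public void collectLeaf(PostingsEnum postings, int position, Term term) {
      // 'position' is the token position in the document, not an index into any array
      seen.add(term.text() + "@" + position);
    }

    @Override
    public void reset() {
      // nothing to reset: we keep everything collected so far
    }
  }

  public static List<String> collect(String text, String first, String second) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
        Document doc = new Document();
        doc.add(new TextField("body", text, Field.Store.NO));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        SpanNearQuery query = new SpanNearQuery(new SpanQuery[] {
            new SpanTermQuery(new Term("body", first)),
            new SpanTermQuery(new Term("body", second))}, 5, true);
        SpanWeight weight = query.createWeight(searcher, ScoreMode.COMPLETE_NO_SCORES, 1f);
        PositionRecorder recorder = new PositionRecorder();
        for (LeafReaderContext ctx : reader.leaves()) {
          Spans spans = weight.getSpans(ctx, SpanWeight.Postings.POSITIONS);
          if (spans == null) {
            continue; // a term is missing from this segment
          }
          while (spans.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
              recorder.reset();
              spans.collect(recorder); // calls collectLeaf once per sub-term of this match
            }
          }
        }
        return recorder.seen;
      }
    }
  }
}
```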
Alan Woodward
www.flax.co.uk
On 3 Nov 2015, at 00:26, Allison, Timothy B.
If you're using 5.3, you can wrap everything with a PayloadScoreQuery. Before
that you'll need to use PayloadTermQuery or PayloadNearQuery, but I'd advise
upgrading as you'll get better performance and slightly more sane APIs.
Alan Woodward
www.flax.co.uk
On 22 Oct 2
Maybe instead of hacking BooleanWeight, you should use a version of
SpanPayloadCheckQuery? There isn't anything that combines checking and scoring
for payloads at the moment, but I don't think it would be too difficult to
write one.
Alan Woodward
www.flax.co.uk
On 22 Oct 2015
You should be able to use a FilterScorer that wraps a ConjunctionScorer and
overrides score().
Alan Woodward
www.flax.co.uk
On 22 Oct 2015, at 13:43, Sheng wrote:
> Thanks for the reply and suggestion. If I search for term A and term B with
> a BooleanQuery in Lucene, normally Lucene r
together.
Alan Woodward
www.flax.co.uk
> On 8 Oct 2015, at 01:22, Trejkaz wrote:
>
> Hi all.
>
> I have a situation where I want to look up some DocValues for each hit
> in the search.
>
> I have a few ways I could go about this:
>
>1. Use search() as normal an
>
> The second question is where I should put in place of "???". The API says
> "pass a prior PostingsEnum for possible reuse", but I don't get how to create
> an instance of it.
You can just pass null.
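For reference, a sketch of reading positions with a null reuse argument, assuming Lucene 9.x; the class name, field name and helper are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

/** Hypothetical demo class: TermsEnum.postings() with no enum to reuse. */
public class PostingsDemo {

  public static List<Integer> positions(String text, String word) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
        Document doc = new Document();
        doc.add(new TextField("body", text, Field.Store.NO)); // TextField indexes positions
        writer.addDocument(doc);
      }
      List<Integer> out = new ArrayList<>();
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        for (LeafReaderContext ctx : reader.leaves()) {
          Terms terms = ctx.reader().terms("body");
          if (terms == null) {
            continue;
          }
          TermsEnum termsEnum = terms.iterator();
          if (!termsEnum.seekExact(new BytesRef(word))) {
            continue;
          }
          // No previous enum to reuse yet, so pass null and let Lucene allocate one.
          PostingsEnum postings = termsEnum.postings(null, PostingsEnum.POSITIONS);
          while (postings.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            for (int i = 0; i < postings.freq(); i++) {
              out.add(postings.nextPosition());
            }
          }
        }
      }
      return out;
    }
  }
}
```

If you iterate many terms, you can pass the previous PostingsEnum back in instead of null so it can be reused.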
Alan Woodward
Hi,
SpanNearQuery will also take into account the ‘width’ of the match, so that
terms that are closer together will score more highly. Is that what you’re
looking for?
Alan Woodward
www.flax.co.uk
On 10 Sep 2015, at 10:43, aurelien.mazo...@francelabs.com wrote:
> Hi all,
>
> Span
What version of Lucene are you using? From Lucene 5.1 you can tell queries to
not report scores, which will give you the speedup you require here.
Alan Woodward
www.flax.co.uk
On 30 Jul 2015, at 05:22, 丁儒 wrote:
>
>
> It seems that ConstantScoreQuery use the Weight and Score of
You'll still need to call rewrite, but it needs to be done per-reader, so
you'll need to cache the queries *before* they're rewritten, and then call
rewrite whenever you create a new IndexReader. Otherwise you'll get incorrect
scores, and possibly missed hits as
itten queries somehow?
Alan Woodward
www.flax.co.uk
On 8 Jun 2015, at 10:49, Anna Maier wrote:
> Hi,
>
> we ran into a memory problem with TermQuery: in our program, we build a
> TermQuery object from the user input and pass it around, to be able to
> different things, like execute
ver a very large collection.
Alan Woodward
www.flax.co.uk
On 30 May 2014, at 11:20, Nicola Buso wrote:
> Hi Alan,
>
> just to make it more typical (yes there are not IndexWriters open on
> that indexes) how solr is caching results? the first thing I would like
> to do is to store t
If the index is truly unchanging (i.e. there's no IndexWriter open on it), then I
guess the document numbers will be stable across reopens. But this is a pretty
specialized situation, and the docs are really there to warn you off trying to
rely on this for more typical uses.
Alan Woodward
…long as the subindexes
are passed to the MultiReader constructor in the same order on both machines,
the docBase assigned to each reader context should be the same.
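A small sketch showing that docBase follows the order in which the sub-readers are passed in, assuming Lucene 9.x. Each sub-index here has a single segment; the class name and index sizes are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: docBase assignment in a MultiReader. */
public class DocBaseDemo {

  private static Directory indexWith(int numDocs) throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < numDocs; i++) {
        writer.addDocument(new Document()); // empty docs are enough to occupy doc ids
      }
    }
    return dir;
  }

  /** Builds one sub-index per size, in order, and returns each leaf's docBase. */
  public static List<Integer> docBases(int... sizes) throws IOException {
    List<Directory> dirs = new ArrayList<>();
    try {
      IndexReader[] subs = new IndexReader[sizes.length];
      for (int i = 0; i < sizes.length; i++) {
        Directory dir = indexWith(sizes[i]);
        dirs.add(dir);
        subs[i] = DirectoryReader.open(dir);
      }
      List<Integer> bases = new ArrayList<>();
      try (MultiReader multi = new MultiReader(subs)) { // also closes the sub-readers
        for (LeafReaderContext ctx : multi.leaves()) {
          bases.add(ctx.docBase);
        }
      }
      return bases;
    } finally {
      for (Directory d : dirs) {
        d.close();
      }
    }
  }
}
```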
Alan Woodward
www.flax.co.uk
On 29 May 2014, at 14:29, Nicola Buso wrote:
> Hi,
>
> from the javadocs:
>
>
really get stats for the index
as a whole at the moment.
Thanks,
Alan Woodward
www.flax.co.uk
Hi Siraj,
At the moment luwak is based on a fork of lucene
(https://github.com/flaxsearch/lucene-solr-intervals, itself based on work done
in LUCENE-2878), which we use to report exact match positions. I'm hoping to
get it working with the main lucene classes soon, though.
Alan Woodward
Cross-posting this from the solr mailing list.
>
> We've now released the library we mentioned in our presentation at Lucene
> Revolution: https://github.com/flaxsearch/luwak
>
> You can use this to apply tens of thousands of stored Lucene queries to an
> incoming document in a second or so
Hi Shahak,
BooleanQuery.setMinimumNumberShouldMatch might help you here.
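A sketch, assuming Lucene 9.x, where the setter lives on BooleanQuery.Builder rather than on BooleanQuery itself; the class name, the three search terms and the documents are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: "at least N of these terms". */
public class MinShouldMatchDemo {

  /** Counts documents containing at least 'minShouldMatch' of three hypothetical terms. */
  public static int count(int minShouldMatch, String... docs) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
        for (String text : docs) {
          Document doc = new Document();
          doc.add(new TextField("body", text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String word : new String[] {"apache", "lucene", "search"}) {
          builder.add(new TermQuery(new Term("body", word)), BooleanClause.Occur.SHOULD);
        }
        builder.setMinimumNumberShouldMatch(minShouldMatch);
        return new IndexSearcher(reader).count(builder.build());
      }
    }
  }
}
```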
Alan Woodward
www.flax.co.uk
On 18 Nov 2013, at 18:35, Shahak Nagiel wrote:
> Initially, I queried our (v4.4) index with a single MultiFieldQueryParser and
> Operator.AND to ensure that all search terms appeared
IIRC, SpanQueries try to match on the smallest interval possible. So if
you've got T1 … T1 … T2, then SpanNear(T1, T2) will match from the second T1.
Alan Woodward
www.flax.co.uk
On 9 Jul 2013, at 09:56, Sébastien Druon wrote:
> Thanks Alan,
>
> Do you know if the search
You can use Integer.MAX_VALUE as the slop parameter.
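A sketch with an ordered SpanNearQuery and Integer.MAX_VALUE slop, assuming Lucene 9.x; the class name is hypothetical and "t1"/"t2" are lowercased placeholders for the T1/T2 in the question.

```java
import java.io.IOException;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.spans.SpanNearQuery;
import org.apache.lucene.queries.spans.SpanQuery;
import org.apache.lucene.queries.spans.SpanTermQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: "t2 anywhere after t1". */
public class MaxSlopDemo {

  public static int count(String... docs) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
        for (String text : docs) {
          Document doc = new Document();
          doc.add(new TextField("body", text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        SpanQuery query = new SpanNearQuery(new SpanQuery[] {
            new SpanTermQuery(new Term("body", "t1")),
            new SpanTermQuery(new Term("body", "t2"))},
            Integer.MAX_VALUE, // no limit on the gap between the two terms
            true);             // but t1 must come before t2
        return new IndexSearcher(reader).count(query);
      }
    }
  }
}
```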
Alan Woodward
www.flax.co.uk
On 9 Jul 2013, at 07:55, Sébastien Druon wrote:
> Hello,
>
> I am looking for a way to search for a token appearing after another and
> retrieve their positions.
>
> ex: T1 (...)*
Hi Glen,
You want the SynonymFilter:
http://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilter.html
Alan Woodward
www.flax.co.uk
On 3 May 2013, at 19:14, Glen Newton wrote:
> Hello,
>
> I know I've seen it go by on this list and
might find that BinaryDocValues are a better fit
here, but it's difficult to tell without knowing what your actual use case is.
Alan Woodward
www.flax.co.uk
On 23 Apr 2013, at 15:06, Carsten Schnober wrote:
> On 23.04.2013 15:27, Alan Woodward wrote:
>> There's the SpanPosition
There's the SpanPositionCheckQuery family - SpanPositionRangeQuery, SpanFirstQuery,
etc. Is that the sort of thing you're looking for?
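A sketch of two members of that family, assuming Lucene 9.x: SpanFirstQuery matches when the span ends at or before a given position, and SpanPositionRangeQuery when it starts at or after one position and ends at or before another. The class name, the "apple" term and the documents are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.spans.SpanFirstQuery;
import org.apache.lucene.queries.spans.SpanPositionRangeQuery;
import org.apache.lucene.queries.spans.SpanQuery;
import org.apache.lucene.queries.spans.SpanTermQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: restricting where a span may occur. */
public class SpanPositionCheckDemo {

  private static SpanQuery apple() {
    return new SpanTermQuery(new Term("body", "apple"));
  }

  /** "apple" must end at or before position 'end'. */
  public static SpanQuery first(int end) {
    return new SpanFirstQuery(apple(), end);
  }

  /** "apple" must start at or after 'start' and end at or before 'end'. */
  public static SpanQuery range(int start, int end) {
    return new SpanPositionRangeQuery(apple(), start, end);
  }

  public static int count(SpanQuery query, String... docs) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
        for (String text : docs) {
          Document doc = new Document();
          doc.add(new TextField("body", text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).count(query);
      }
    }
  }
}
```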
Alan Woodward
www.flax.co.uk
On 23 Apr 2013, at 13:36, Carsten Schnober wrote:
> On 23.04.2013 13:47, Carsten Schnober wrote:
>> I'm tryin
orer that calls advance().
The other thing to look at would be sorted segments, see
https://issues.apache.org/jira/browse/LUCENE-4752.
Alan Woodward
www.flax.co.uk
On 4 Apr 2013, at 02:56, Otis Gospodnetic wrote:
> Hi,
>
> When Lucene scores matching documents, what is the order
Hi Paul,
You need to call tokenizer.reset() before you call incrementToken().
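The full consumer workflow, as a sketch assuming Lucene 9.x, where the tokenizer takes its input via setReader() rather than as a constructor argument as in 4.1. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical demo class: setReader, reset, incrementToken, end, close. */
public class ResetDemo {

  public static List<String> tokenize(String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (Tokenizer tokenizer = new WhitespaceTokenizer()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset(); // required before the first incrementToken()
      while (tokenizer.incrementToken()) {
        out.add(term.toString());
      }
      tokenizer.end(); // sets end-of-stream state such as the final offset
    } // close() releases the reader so the tokenizer could be reused
    return out;
  }
}
```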
Alan Woodward
www.flax.co.uk
On 26 Feb 2013, at 12:26, Paul Taylor wrote:
> This works in 3.6 but fails in 4.1; what's wrong with the code?
>
> public void testTokenization() throws IO
Hi Igor,
You could try wrapping the two cases in a SpanNotQuery:
SpanNot(SpanNear(runs, cat, 10), SpanNear(runs, cat, 3))
That should return documents that have runs within 10 positions of cat, as long
as they don't overlap with runs within 3 positions of cat.
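A sketch of that SpanNotQuery in Java, assuming Lucene 9.x and unordered near queries (the pseudocode doesn't say whether order matters). The class name and the documents are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.spans.SpanNearQuery;
import org.apache.lucene.queries.spans.SpanNotQuery;
import org.apache.lucene.queries.spans.SpanQuery;
import org.apache.lucene.queries.spans.SpanTermQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical demo class: "within 10 but not within 3". */
public class SpanNotDemo {

  private static SpanQuery runsNearCat(int slop) {
    return new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("body", "runs")),
        new SpanTermQuery(new Term("body", "cat"))}, slop, false); // unordered
  }

  public static int count(String... docs) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
        for (String text : docs) {
          Document doc = new Document();
          doc.add(new TextField("body", text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        // keep near(10) matches that do not overlap any near(3) match
        SpanQuery query = new SpanNotQuery(runsNearCat(10), runsNearCat(3));
        return new IndexSearcher(reader).count(query);
      }
    }
  }
}
```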
Alan Woodward
Hi Vignesh,
You might want to have a look at something we put together last year:
http://www.flax.co.uk/blog/2012/06/12/clade-a-freely-available-open-source-taxonomy-and-autoclassification-tool/.
Alan Woodward
a...@flax.co.uk
On 15 Jan 2013, at 05:33, VIGNESH S wrote:
> Hi All,
>
>
Have a look at ShingleFilter:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
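A sketch, assuming Lucene 9.x, with a plain whitespace tokenizer standing in for the whitespace and word-delimiter chain from the question; the class name and sample text are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical demo class: word n-grams ("shingles") from a token stream. */
public class ShingleDemo {

  public static List<String> shingles(String text) throws IOException {
    Tokenizer source = new WhitespaceTokenizer();
    source.setReader(new StringReader(text));
    List<String> out = new ArrayList<>();
    // maxShingleSize 2: unigrams (on by default) plus two-word shingles
    try (TokenStream ts = new ShingleFilter(source, 2)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }
}
```

ShingleFilter can sit after any other filters in the chain, so the word-delimiter step would go between the tokenizer and the shingles.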
On 21 Dec 2012, at 08:42, Xi Shen wrote:
> I have to use the white space and word delimiter to process the input
> first. I tried many combination, and it seems to me
Hi parnab,
You want to look at the similarities package:
http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/search/similarities/package-summary.html
Alan Woodward
On 9 Oct 2012, at 20:04, parnab kumar wrote:
> Hi All,
>How do i incorporate machine learned r
> …subquery will be "B A", and the only span for SpanNear(A, C, 5)
> will be "A x x x x C", and those two are not adjacent, so there's no
> match for the outer SpanNear.
>
> Also, while we're exploring your solution, do you also have a rule to
> cover "
I've just had to implement exactly this - the solution I came up with was to
translate:
A w/5 (B and C) -> SpanNear(A, spanNear(A, B, 5), spanNear(A, C, 5), 0)
A w/5 (B or C) -> OR(spanNear(A, B, 5), spanNear(A, C, 5))
More complex queries (such as (A AND B) w/5 (C AND D)) are dealt with by
app
part of this for me, but I was hoping to only use lucene classes).
Thanks,
Alan Woodward
>>>
>>> We don't yet have a way to drive a query from an FST, but that would
>>> be an interesting addition. EG you could then support weights as
>>> well, to decide how the terms are scored (if certain OCR errors are
>>> more likely than others).
>>
>> We're only allowing expansions within an edit distance of 1, which should
>> keep the numbers of terms down.
>
> Ahh, ok. So even if the term has two occurrences of cl, only one of
> them is allowed to substitute d?
Yes, exactly - "cloocl" will be expanded to "doocl" and "clood" only. I
>> 1) expand query term to sorted list of possible matches
>> 2) create an FST over those matches
>> 3) plug this FST into an AutomatonQuery subclass.
>>
>> 1) is easy. It's 2) and 3) I'm having trouble with.
>>
he various
bits together.
I'm thinking it should work like this:
1) expand query term to sorted list of possible matches
2) create an FST over those matches
3) plug this FST into an AutomatonQuery subclass.
1) is easy. It's 2) and 3) I'm having t
Hi Yuval,
You can just override Similarity, rather than DefaultSimilarity - that way you
don't burn any CPU cycles on TF/IDF calculations.
Alan
On 22 Feb 2012, at 07:17, Yuval Kesten wrote:
> Hi Em,
> 1. Regarding the performances - the similarity class (And my subtype as well)
> gets the IDF
On 13 Feb 2012, at 12:16, Robert Muir wrote:
> On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward
> wrote:
>> Hello,
>>
>> (I'm not interested in Tf or Idf here)
>> I've already extended DefaultSimilarity
>
> In this case, then extending Defau
o override
TFIDFSimilarity#sloppySimScorer to return a custom SloppySimScorer instance.
However, this method has been declared final. Am I going about this the wrong
way? Or should the SimScorer methods on TFIDFSimilarity be unfinalized?
I'm using Lucene trunk, r1241355.
Thanks,