[ANNOUNCE] Apache Lucene 9.6.0 released

2023-05-10 Thread Alan Woodward
The Lucene PMC is pleased to announce the release of Apache Lucene 9.6.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, neares

[ANNOUNCE] Apache Lucene 9.2.0 released

2022-05-24 Thread Alan Woodward
23 May 2022 - Apache Lucene™ 9.2.0 available. The Lucene PMC is pleased to announce the release of Apache Lucene 9.2.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structu

Re: IntervalQuery replacement for SpanFirstQuery? Closest replacement for slops?

2021-10-10 Thread Alan Woodward
around the IntervalQuery that boosts by the > number of terms added as sibling should clauses? Other suggestions? > > Uwe > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > https://www.thetaphi.de > eMail: u...@thetaphi.de

Re: IntervalQuery replacement for SpanFirstQuery? Closest replacement for slops?

2020-09-21 Thread Alan Woodward
Your filtered query should work the same as a SpanFirst, yes. I didn’t add a shortcut just because you can do it this way, but feel free to add it if you think it’s useful! Re sloppy phrases, this one is trickier. The closest you can get at the moment is an unordered near, but that’s not the

[ANNOUNCE] Apache Lucene 8.5.0 released

2020-03-24 Thread Alan Woodward
24 March 2020, Apache Lucene 8.5.0 available. The Lucene PMC is pleased to announce the release of Apache Lucene 8.5.0. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

[ANNOUNCE] Apache Lucene 7.3.0 released

2018-04-04 Thread Alan Woodward
4 April 2018, Apache Lucene™ 7.3.0 available. The Lucene PMC is pleased to announce the release of Apache Lucene 7.3.0. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full

Re: Lucene pagination using searchAfter while index is updated

2017-11-17 Thread Alan Woodward
You can use SearcherLifetimeManager to keep track of specific IndexSearcher instances - see Mike’s blog at http://blog.mikemccandless.com/2011/11/searcherlifetimemanager-prevents-broken.html Alan Wo

Re: howto get LongPoint stored

2017-10-25 Thread Alan Woodward
Hi Bernd, You add a separate StoredField with the same name. > On 25 Oct 2017, at 11:11, Bernd Fehling > wrote: > > With Lucene 6.6.2 I'm trying to get a LongPoint value indexed and stored. > > Old code: > LegacyLongField dateField = new LegacyLongField("modified", lastModified, > Field.Stor

Re: FunctionValues vs DoubleValuesSource

2017-10-13 Thread Alan Woodward
javascript functions. Alan Woodward www.flax.co.uk > On 12 Oct 2017, at 23:25, Michael McCandless > wrote: > > Hi Mike, > > It looks like FunctionValues is a very old API used by many function > queries, while DoubleValuesSource is relatively new (introduced in > https:

Re: synonyms

2017-07-25 Thread Alan Woodward
You have a LowercaseFilter before your SynonymFilter, which means that the entities in your SynonymMap need to be all lowercase or they won’t be matched. Alan Woodward www.flax.co.uk > On 25 Jul 2017, at 07:52, Christian Kaufhold > wrote: > > Hi, > > I am not able to a
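The ordering issue described above can be pictured with a toy model in plain Java. This is only a sketch: the class `LowercaseThenSynonyms`, its methods, and the "NYC" entry are hypothetical stand-ins, and no Lucene classes are used. It shows why a synonym table keyed on mixed-case entries is never hit once tokens have been lowercased upstream.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model (not Lucene code): a lowercasing step followed by a synonym lookup.
public class LowercaseThenSynonyms {

    // Stand-in for a lowercasing filter placed first in the chain.
    static String lowercase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Stand-in for a synonym lookup that only ever sees lowercased tokens.
    // Returns the synonym if one is registered, otherwise the lowercased token.
    static String expand(Map<String, String> synonyms, String token) {
        String normalized = lowercase(token);
        return synonyms.getOrDefault(normalized, normalized);
    }

    public static void main(String[] args) {
        Map<String, String> mixedCaseEntries = new HashMap<>();
        mixedCaseEntries.put("NYC", "new york city"); // key is never looked up

        Map<String, String> lowerCaseEntries = new HashMap<>();
        lowerCaseEntries.put("nyc", "new york city"); // key matches the lowercased token

        System.out.println(expand(mixedCaseEntries, "NYC"));
        System.out.println(expand(lowerCaseEntries, "NYC"));
    }
}
```

With the mixed-case table the token passes through unchanged apart from lowercasing; with the lowercase table the synonym is found.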

Re: SpanMultiTermQueryWrapper issue in lucene 6.6.0

2017-07-06 Thread Alan Woodward
The contract to create a Weight is to repeatedly call rewrite() until the query is no longer changing, and then call createWeight - IndexSearcher.createNormalizedWeight() will do this for you. Alan Woodward www.flax.co.uk > On 6 Jul 2017, at 12:34, Ranganath B N wrote: > > Th

Re: SpanMultiTermQueryWrapper issue in lucene 6.6.0

2017-07-06 Thread Alan Woodward
You need to call SpanNearQuery.rewrite(), and then call createWeight() on the resulting query. Alan Woodward www.flax.co.uk > On 6 Jul 2017, at 11:54, Ranganath B N wrote: > > Hi Adrien, > > This SpanQuery spt2 will be a component of the SpanQueryarray input to > the

Re: Extending Analyzer at runtime

2017-06-23 Thread Alan Woodward
Hi, You should be able to use AnalyzerWrapper for this, adding your TokenFilters in wrapComponents(). Alan Woodward www.flax.co.uk > On 23 Jun 2017, at 14:33, Nicola Buso wrote: > > Hi, > > maybe it's a known question but I could not find an answer. > I need to base

Re: Heavy usage of final in Lucene classes

2017-01-12 Thread Alan Woodward
Hi Michael, You want to set the positionIncrementGap - either wrap your analyzer with an AnalyzerWrapper that overrides getPositionIncrementGap(), or use a CustomAnalyzer builder and set it there. Alan Woodward www.flax.co.uk > On 12 Jan 2017, at 10:57, Michael Wilkowski wrote: > >

Re: Opposite of SpanFirstQuery - Searching for documents by last term in a field

2016-12-14 Thread Alan Woodward
I’ve done this before by appending a special token to text fields via a TokenFilter. It hasn’t caused a noticeable problem with term stats, and field:* still works because the token is only added if the document in question actually has data in that particular field. Alan Woodward

Re: How to ignore a ,

2016-11-28 Thread Alan Woodward
Using StandardTokenizer should remove punctuation as well. Alan Woodward www.flax.co.uk > On 28 Nov 2016, at 16:06, Thomas Johnson wrote: > > We are using Lucene 5.0. Some of our documents are getting indexed with a > comma after the value. For example “John Doe, bob smith, and

Re: Luke alternative

2016-11-10 Thread Alan Woodward
Hi Chris, I’ve been working sporadically on a webservice API called marple: https://github.com/flaxsearch/marple. Very much a project in development, but more testers and contributors are always welcome! Alan Woodward www.flax.co.uk > On 10

Re: Multivalued DocValuesField

2016-10-31 Thread Alan Woodward
You need to use a SortedNumericDocValuesField, which allows for multiple numeric values to be stored per-document. I’m not sure if that’s in Lucene 5.0, though, you may need to upgrade to something more recent. Alan Woodward www.flax.co.uk > On 31 Oct 2016, at 15:34, Fielder, Todd Patr

Re: wicket datatable, row selection, update another component

2016-10-28 Thread Alan Woodward
Hi Rob, I think you posted this to the wrong mailing list? Alan Woodward www.flax.co.uk > On 28 Oct 2016, at 12:13, Rob Audenaerde wrote: > > Hi all, > > I have a DataTable which, in onConfigure(), sets a selected item. I want > another (detail) panel, outside of this comp

Re: Migration Lucene 4 -> Lucene 6

2016-10-21 Thread Alan Woodward
Hi, You need to add a NumericDocValuesField here as well - Point is for searching, Stored is for display. Alan Woodward www.flax.co.uk > On 21 Oct 2016, at 10:54, Ludovic Bertin wrote: > > Hi, > > When I'm trying to launch search with ordering, but it fails with excep

Re: Synonym Query Expansion / Gaps / UnsupportedOperationException wrt SpanNearQuery

2016-05-14 Thread Alan Woodward
This looks like a bug - can you open a JIRA ticket? Alan Woodward www.flax.co.uk On 13 May 2016, at 22:33, Daniel Bigham wrote: > I am experimenting with supporting synonyms on the query side by doing query > expansion. > > For example, the query "open webpage"

Re: SpanNearQuery, Multiple Fields

2016-05-12 Thread Alan Woodward
Try adding your multiple SpanNearQuery objects to a BooleanQuery? Alan Woodward www.flax.co.uk On 12 May 2016, at 20:35, Daniel Bigham wrote: > I'm very interested in SpanNearQuery, because it allows for quite powerful > phrasal searching. > > However, unlike BooleanQuery, t

Re: Storing numeric fields in Apache 6

2016-04-29 Thread Alan Woodward
using MultiFields.getMergedFieldInfos() instead. Alan Woodward www.flax.co.uk On 29 Apr 2016, at 08:57, j.Pardos wrote: > Hello, > > The suggested change worked in part: Luke now shows me the field contents, so > it's correctly stored, for sure. However, when I ask the IndexReader for the

Re: Storing numeric fields in Apache 6

2016-04-28 Thread Alan Woodward
You should add a StoredField with the same name containing the value: doc.add(new DoublePoint(name, Double.parseDouble(value))); doc.add(new StoredField(name, Double.parseDouble(value))); Alan Woodward www.flax.co.uk On 28 Apr 2016, at 13:10, j.Pardos wrote: > Hello all, > > I need

Re: Problem with NGramAnalyzer, PhraseQuery and Highlighter

2016-04-18 Thread Alan Woodward
Hi Eva, This looks like a bug in WeightedSpanTermExtractor, which is rewriting your PhraseQuery into a SpanNearQuery without checking how many terms there are. Could you open a JIRA ticket? Alan Woodward www.flax.co.uk > On 18 Apr 2016, at 16:27, Eva Popenda wrote: > > Hi, >

Re: Is MemoryIndex and Spatial stuff combination supported?

2016-01-20 Thread Alan Woodward
Depending on the type of field, you can normally do: Field myField = … index.addField(fieldName, myField.tokenStream(null, null)) I agree that this could be a bit nicer, though. MemoryIndex doesn't support DocValues yet either, although I think there is an open ticket to add that.

Re: Syntax question

2015-12-30 Thread Alan Woodward
) ) Alan Woodward www.flax.co.uk > On 30 Dec 2015, at 20:46, Brian V Zayas wrote: > > Hello- > > I'm trying to configure a search that captures a term but excludes search > results that contain that same term if the term only appears in proximity > to certain o

Re: propagate Query.rewrite call to super.rewrite after 5.4 upgrade

2015-12-17 Thread Alan Woodward
You may be able to do something along the lines of PayloadScoreQuery? That overrides the scorer to factor in the value of payloads at each position. In fact, a generic PositionScoringQuery would be a nice addition to the span queries. Alan Woodward www.flax.co.uk On 17 Dec 2015, at 13:58

Re: Determine whether a MatchAllQuery or a Query with atleast one Term

2015-11-30 Thread Alan Woodward
Could you rewrite the query into a searcher-specific Weight, and then call extractTerms()? ie, do: Weight w = searcher.createNormalizedWeight(query, true); Set terms = new HashSet<>(); w.extractTerms(terms); if (terms.size() > 0) doStuff(); Alan Woodward www.flax.co.uk

Re: extracting charoffsets from SpanWeight's getSpans() in 5.3.1?

2015-11-03 Thread Alan Woodward
The second parameter passed to SpanCollector.collectLeaf() is the position, rather than an index of any kind, which I think is going to mess things up for you. But other than that, you've got the right idea. :-) Alan Woodward www.flax.co.uk On 3 Nov 2015, at 00:26, Allison, Timothy B.

Re: ConjunctionScorer access

2015-10-22 Thread Alan Woodward
If you're using 5.3, you can wrap everything with a PayloadScoreQuery. Before that you'll need to use PayloadTermQuery or PayloadNearQuery, but I'd advise upgrading as you'll get better performance and slightly more sane APIs. Alan Woodward www.flax.co.uk On 22 Oct 2

Re: ConjunctionScorer access

2015-10-22 Thread Alan Woodward
Maybe instead of hacking BooleanWeight, you should use a version of SpanPayloadCheckQuery? There isn't anything that combines checking and scoring for payloads at the moment, but I don't think it would be too difficult to write one. Alan Woodward www.flax.co.uk On 22 Oct 2015

Re: ConjunctionScorer access

2015-10-22 Thread Alan Woodward
You should be able to use a FilterScorer that wraps a ConjunctionScorer and overrides score(). Alan Woodward www.flax.co.uk On 22 Oct 2015, at 13:43, Sheng wrote: > Thanks for the reply and suggestion. If I search for term A and term B with > a BooleanQuery in Lucene, normally Lucene r

Re: Recommendation for doing a search plus collecting extra information?

2015-10-11 Thread Alan Woodward
together. Alan Woodward www.flax.co.uk > On 8 Oct 2015, at 01:22, Trejkaz wrote: > > Hi all. > > I have a situation where I want to look up some DocValues for each hit > in the search. > > I have a few ways I could go about this: > >1. Use search() as normal an

Re: offsets of a term in a document

2015-09-21 Thread Alan Woodward
> > The second question if where I should put in place of "???". The API says > "pass a prior PostingsEnum for possible reuse", but I don't get how to create > an instance of it. You can just pass null. Alan Wood

Re: Use SloppyPhraseScorer in SpanNearQuery

2015-09-10 Thread Alan Woodward
Hi, SpanNearQuery will also take into account the ‘width’ of the match, so that terms that are closer together will score more highly. Is that what you’re looking for? Alan Woodward www.flax.co.uk On 10 Sep 2015, at 10:43, aurelien.mazo...@francelabs.com wrote: > Hi all, > > Span

Re: ignore score and weight in lucene search

2015-07-30 Thread Alan Woodward
What version of lucene are you using? From Lucene 5.1 you can tell queries to not report scores, which will give you the speedup you require here. Alan Woodward www.flax.co.uk On 30 Jul 2015, at 05:22, 丁儒 wrote: > > > It seems that ConstantScoreQuery use the Weight and Score of

Re: Memory problem with TermQuery

2015-06-08 Thread Alan Woodward
You'll still need to call rewrite, but it needs to be done per-reader, so you'll need to cache the queries *before* they're rewritten, and then call rewrite whenever you create a new IndexReader. Otherwise you'll get incorrect scores, and possibly missed hits as

Re: Memory problem with TermQuery

2015-06-08 Thread Alan Woodward
itten queries somehow? Alan Woodward www.flax.co.uk On 8 Jun 2015, at 10:49, Anna Maier wrote: > Hi, > > we ran into a memory problem with TermQuery: in our program, we build a > TermQuery object from the user input and pass it around, to be able to > different things, like execute

Re: MultiReader docid reliability

2014-05-30 Thread Alan Woodward
ver a very large collection. Alan Woodward www.flax.co.uk On 30 May 2014, at 11:20, Nicola Buso wrote: > Hi Alan, > > just to make it more typical (yes there are not IndexWriters open on > that indexes) how solr is caching results? the first thing I would like > to do is to store t

Re: MultiReader docid reliability

2014-05-30 Thread Alan Woodward
If the index is truly unchanging (ie there's no IndexWriter open on it) then I guess the document numbers will be stable across reopens. But this is a pretty specialized situation, and the docs are really there to warn you off trying to rely on this for more typical uses. Alan Woo

Re: MultiReader docid reliability

2014-05-30 Thread Alan Woodward
ng as the subindexes are passed to the MultiReader constructor in the same order on both machines, the docBase assigned to each reader context should be the same. Alan Woodward www.flax.co.uk On 29 May 2014, at 14:29, Nicola Buso wrote: > Hi, > > from the javadocs: > >
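The point about constructor order can be sketched in plain Java. The assumption here, labeled as such, is that each sub-reader's docBase is the running total of the document counts (maxDoc) of the sub-readers before it. The class `DocBases` and the sample counts are hypothetical and no Lucene classes are involved.

```java
import java.util.Arrays;

// Toy model (not Lucene code): docBase as a running sum of sub-reader sizes.
public class DocBases {

    // Assumption: docBase[i] is the sum of maxDocs[0..i-1], so it depends
    // only on the sizes of the sub-indexes and the order they are given in.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = next;
            next += maxDocs[i];
        }
        return bases;
    }

    public static void main(String[] args) {
        // Same sub-indexes, two different orders.
        System.out.println(Arrays.toString(docBases(new int[] {10, 5, 7})));
        System.out.println(Arrays.toString(docBases(new int[] {5, 10, 7})));
    }
}
```

Under that assumption, two machines that pass identical sub-indexes in the same order compute identical bases, while swapping two sub-indexes shifts the global document numbers of everything after the first difference.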

Getting individual field sizes from an index

2014-03-21 Thread Alan Woodward
really get stats for the index as a whole at the moment. Thanks, Alan Woodward www.flax.co.uk

Re: Reverse Matching

2014-02-17 Thread Alan Woodward
Hi Siraj, At the moment luwak is based on a fork of lucene (https://github.com/flaxsearch/lucene-solr-intervals, itself based on work done in LUCENE-2878), which we use to report exact match positions. I'm hoping to get it working with the main lucene classes soon, though. Alan Woo

Introducing Luwak for high-performance stored Lucene queries

2013-12-06 Thread Alan Woodward
Cross-posting this from the solr mailing list. > > We've now released the library we mentioned in our presentation at Lucene > Revolution: https://github.com/flaxsearch/luwak > > You can use this to apply tens of thousands of stored Lucene queries to an > incoming document in a second or so

Re: QueryParser.Operator with BooleanQuery

2013-11-19 Thread Alan Woodward
Hi Shahak, BooleanQuery.setMinimumNumberShouldMatch might help you here. Alan Woodward www.flax.co.uk On 18 Nov 2013, at 18:35, Shahak Nagiel wrote: > Initially, I queried our (v4.4) index with a single MultiFieldQueryParser and > Operator.AND to ensure that all search terms appeared

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
IIRC, SpanQueries try and match on the smallest interval possible. So if you've got T1 … T1 … T2, then SpanNear(T1, T2) will match from the second T1. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 09:56, Sébastien Druon wrote: > Thanks Alan, > > Do you know if the search
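A toy model of the "smallest interval" behaviour described above, in plain Java. It is a sketch under stated assumptions, not Lucene's span implementation: the class `MinimalIntervals` is hypothetical, positions are plain token offsets sorted in ascending order, and T1 is required to come before T2.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene code): minimal ordered intervals between two terms.
public class MinimalIntervals {

    // For each T2 position, pair it with the nearest T1 position before it.
    // An interval is skipped when it would share its start with one already
    // emitted, because it would then contain that earlier, smaller interval.
    static List<int[]> minimalOrdered(int[] t1, int[] t2) {
        List<int[]> result = new ArrayList<>();
        int i = 0;
        int lastStart = -1;     // latest T1 position seen before the current T2
        int emittedStart = -1;  // start of the most recently emitted interval
        for (int end : t2) {
            while (i < t1.length && t1[i] < end) {
                lastStart = t1[i];
                i++;
            }
            if (lastStart >= 0 && lastStart > emittedStart) {
                result.add(new int[] {lastStart, end});
                emittedStart = lastStart;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // T1 at positions 0 and 3, T2 at position 6: "T1 ... T1 ... T2".
        for (int[] iv : minimalOrdered(new int[] {0, 3}, new int[] {6})) {
            System.out.println(iv[0] + ".." + iv[1]);
        }
    }
}
```

In this model the only interval reported for "T1 … T1 … T2" starts at the second T1, which is the behaviour the reply describes.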

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
You can use Integer.MAX_VALUE as the slop parameter. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 07:55, Sébastien Druon wrote: > Hello, > > I am looking for a way to search for a token appearing after another and > retrieve their positions. > > ex: T1 (...)*

Re: Index-time term expansion

2013-05-03 Thread Alan Woodward
Hi Glen, You want the SynonymFilter: http://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilter.html Alan Woodward www.flax.co.uk On 3 May 2013, at 19:14, Glen Newton wrote: > Hello, > > I know I've seen it go by on this list and

Re: Reading Payloads

2013-04-23 Thread Alan Woodward
might find that BinaryDocValues are a better fit here, but it's difficult to tell without knowing what your actual use case is. Alan Woodward www.flax.co.uk On 23 Apr 2013, at 15:06, Carsten Schnober wrote: > Am 23.04.2013 15:27, schrieb Alan Woodward: >> There's the SpanPosition

Re: Reading Payloads

2013-04-23 Thread Alan Woodward
There's the SpanPositionCheckQuery family - SpanRangeQuery, SpanFirstQuery, etc. Is that the sort of thing you're looking for? Alan Woodward www.flax.co.uk On 23 Apr 2013, at 13:36, Carsten Schnober wrote: > Am 23.04.2013 13:47, schrieb Carsten Schnober: >> I'm tryin

Re: Document scoring order?

2013-04-04 Thread Alan Woodward
orer that calls advance(). The other thing to look at would be sorted segments, see https://issues.apache.org/jira/browse/LUCENE-4752. Alan Woodward www.flax.co.uk On 4 Apr 2013, at 02:56, Otis Gospodnetic wrote: > Hi, > > When Lucene scores matching documents, what is the order

Re: ArrayIndexOutOfBoundsException trying to use tokenizer in Lucene 4.1

2013-02-26 Thread Alan Woodward
Hi Paul, You need to call tokenizer.reset() before you call incrementToken() Alan Woodward www.flax.co.uk On 26 Feb 2013, at 12:26, Paul Taylor wrote: > This works in 3.6, but in 4.1 fails whats wrong with the code > > public void testTokenization() throws IO

Re: SpanNearQuery with two boundaries

2013-01-18 Thread Alan Woodward
Hi Igor, You could try wrapping the two cases in a SpanNotQuery: SpanNot(SpanNear(runs, cat, 10), SpanNear(runs, cat, 3)) That should return documents that have runs within 10 positions of cat, as long as they don't overlap with runs within 3 positions of cat. Alan Woo
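The include/exclude idea behind that SpanNot suggestion can be sketched over raw token positions. This is a simplified model, not Lucene's span semantics: the class `NearButNotTooNear` is hypothetical, distance is taken as the plain difference between positions (Lucene's slop is counted differently), and intervals are inclusive on both ends.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene code): "near within 10, but not overlapping near within 3".
public class NearButNotTooNear {

    // All inclusive [start, end] intervals where a position of term a and a
    // position of term b are at most maxDistance apart, in either order.
    static List<int[]> near(int[] a, int[] b, int maxDistance) {
        List<int[]> out = new ArrayList<>();
        for (int p : a) {
            for (int q : b) {
                if (Math.abs(p - q) <= maxDistance) {
                    out.add(new int[] {Math.min(p, q), Math.max(p, q)});
                }
            }
        }
        return out;
    }

    static boolean overlaps(int[] x, int[] y) {
        return x[0] <= y[1] && y[0] <= x[1];
    }

    // Keep the include intervals that overlap none of the exclude intervals.
    static List<int[]> not(List<int[]> include, List<int[]> exclude) {
        List<int[]> out = new ArrayList<>();
        for (int[] in : include) {
            boolean clash = false;
            for (int[] ex : exclude) {
                if (overlaps(in, ex)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                out.add(in);
            }
        }
        return out;
    }

    // Model of SpanNot(SpanNear(runs, cat, 10), SpanNear(runs, cat, 3)).
    static List<int[]> match(int[] runs, int[] cat) {
        return not(near(runs, cat, 10), near(runs, cat, 3));
    }

    public static void main(String[] args) {
        System.out.println(match(new int[] {0}, new int[] {7}).size()); // wide pair only
        System.out.println(match(new int[] {0}, new int[] {2}).size()); // close pair only
    }
}
```

Note that a wide pair is also dropped in this model when a close pair overlaps it, which mirrors the "as long as they don't overlap" caveat in the reply.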

Re: Reg Lucene Naive Bayesian classifier.

2013-01-15 Thread Alan Woodward
Hi Vignesh, You might want to have a look at something we put together last year: http://www.flax.co.uk/blog/2012/06/12/clade-a-freely-available-open-source-taxonomy-and-autoclassification-tool/. Alan Woodward a...@flax.co.uk On 15 Jan 2013, at 05:33, VIGNESH S wrote: > Hi All, > >

Re: Which token filter can combine 2 terms into 1?

2012-12-21 Thread Alan Woodward
Have a look at ShingleFilter: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html On 21 Dec 2012, at 08:42, Xi Shen wrote: > I have to use the white space and word delimiter to process the input > first. I tried many combination, and it seems to me

Re: machine learned ranking of documents

2012-10-09 Thread Alan Woodward
Hi parnab, You want to look at the similarities package: http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/search/similarities/package-summary.html Alan Woodward On 9 Oct 2012, at 20:04, parnab kumar wrote: > Hi All, >How do i incorporate machine learned r

Re: Approches/semantics for arbitrarily combining boolean and proximity search operators?

2012-05-17 Thread Alan Woodward
subquery will be "B A", and the only span for SpanNear(A, C, 5) > will be "A x x x x C", and those two are not adjacent, so there's no > match for the outer SpanNear. > > Also, while we're exploring your solution, do you also have a rule to > cover "

Re: Approches/semantics for arbitrarily combining boolean and proximity search operators?

2012-05-17 Thread Alan Woodward
I've just had to implement exactly this - the solution I came up with was to translate: A w/5 (B and C) -> SpanNear(A, spanNear(A, B, 5), spanNear(A, C, 5), 0) A w/5 (B or C) -> OR(spanNear(A, B, 5), spanNear(A, C, 5)) More complex queries (such as (A AND B) w/5 (C AND D)) are dealt with by app

Preserving TokenFilters

2012-03-12 Thread Alan Woodward
part of this for me, but I was hoping to only use lucene classes). Thanks, Alan Woodward - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Building FST-like automaton queries

2012-02-28 Thread Alan Woodward
t;>> >>> We don't yet have a way to drive a query from an FST, but that would >>> be an interesting addition. EG you could then support weights as >>> well, to decide how the terms are scored (if certain OCR errors are >>> more likely than others).

Re: Building FST-like automaton queries

2012-02-28 Thread Alan Woodward
>> >> We're only allowing expansions within an edit distance of 1, which should >> keep the numbers of terms down. > > Ahh, ok. So even if the term has two occurrences of cl, only one of > them is allowed to substitute d? Yes, exactly - "cloocl" will be expanded to "doocl" and "clood" only. I
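The single-substitution expansion in that example can be sketched in plain Java. The class `SingleSubstitution` and its `expand` method are hypothetical, and this only illustrates the "one OCR confusion per variant" rule from the thread, not the FST or AutomatonQuery machinery being discussed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: generate variants of a term with exactly one substitution applied.
public class SingleSubstitution {

    // For every occurrence of `from` in `term`, produce one variant in which
    // only that occurrence is replaced by `to`; all others are left alone.
    static List<String> expand(String term, String from, String to) {
        List<String> variants = new ArrayList<>();
        int idx = term.indexOf(from);
        while (idx >= 0) {
            variants.add(term.substring(0, idx) + to + term.substring(idx + from.length()));
            idx = term.indexOf(from, idx + 1);
        }
        return variants;
    }

    public static void main(String[] args) {
        // The example from the thread: "cl" misread as "d".
        System.out.println(expand("cloocl", "cl", "d"));
    }
}
```

Because each variant changes a single occurrence, a term with n occurrences of the confusable sequence yields n variants rather than 2^n, which is what keeps the number of expanded terms down.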

Re: Building FST-like automaton queries

2012-02-28 Thread Alan Woodward
>> 1) expand query term to sorted list of possible matches >> 2) create an FST over those matches >> 3) plug this FST into an AutomatonQuery subclass. >> >> 1) is easy. It's 2) and 3) I'm having trouble with. >>

Building FST-like automaton queries

2012-02-28 Thread Alan Woodward
he various bits together. I'm thinking it should work like this: 1) expand query term to sorted list of possible matches 2) create an FST over those matches 3) plug this FST into an AutomatonQuery subclass. 1) is easy. It's 2) and 3) I'm having t

Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Alan Woodward
Hi Yuval, You can just override Similarity, rather than DefaultSimilarity - that way you don't burn any CPU cycles on TF/IDF calculations. Alan On 22 Feb 2012, at 07:17, Yuval Kesten wrote: > Hi Em, > 1. Regarding the performances - the similarity class (And my subtype as well) > gets the IDF

Re: Overriding SloppySimScorer

2012-02-13 Thread Alan Woodward
On 13 Feb 2012, at 12:16, Robert Muir wrote: > On Mon, Feb 13, 2012 at 6:39 AM, Alan Woodward > wrote: >> Hello, >> >> (I'm not interested in Tf or Idf here) >> I've already extended DefaultSimilarity > > In this case, then extending Defau

Overriding SloppySimScorer

2012-02-13 Thread Alan Woodward
o override TFIDFSimilarity#sloppySimScorer to return a custom SloppySimScorer instance. However, this method has been declared final. Am I going about this the wrong way? Or should the SimScorer methods on TFIDFSimilarity be unfinalized? I'm using Lucene trunk, r1241355. Thanks,