Re: Profiling lucene 5.2.0 based tool

2016-02-22 Thread Rob Audenaerde
Hi Sandeep, How many threads do you use to do the indexing? The benchmarks of Lucene are done on >20 threads IIRC. -Rob On Tue, Feb 23, 2016 at 8:01 AM, sandeep das wrote: > Hi, > > I've implemented a tool using lucene-5.2.0 to index my CSV files. The tool > is reading data from CSV files(resi

Re: Lucene Facets performance problems (version 4.7.2)

2016-02-26 Thread Rob Audenaerde
Hi Simona, In addition to Erick's questions: Are you talking about *search* time or facet-collection time? And how many results are in your result set? I have some experience with collecting facets from large result sets; these are typically slow (as they have to retrieve all the relevant facet

Re: GROUP BY in Lucene

2016-03-19 Thread Rob Audenaerde
Hi Gimantha, You don't need to store the aggregates and don't need to retrieve Documents. The aggregates are calculated during collection using the BinaryDocValues from the facet-module. What I do is store the values in the facets using AssociationFacetFields (for example FloatAssocia

clone RAMDirectory

2016-06-30 Thread Rob Audenaerde
Hi all, For increasing the speed of some of my application tests, I want to re-use/copy a pre-populated RAMDirectory over and over. I'm on Lucene 6.0.1. It seems a RAMDirectory can be a copy of an FSDirectory, but not of another RAMDirectory. Also, RAMDirectory is not Cloneable. What would be the

Re: clone RAMDirectory

2016-06-30 Thread Rob Audenaerde
Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] > > Sent: Thursday, June 30, 2016 12:00 PM > > To: java-user@luce

wicket datatable, row selection, update another component

2016-10-28 Thread Rob Audenaerde
Hi all, I have a DataTable which, in onConfigure(), sets a selected item. I want another (detail) panel, outside of this component, to react to that selection, i.e. set its visibility and render details of the selected item. What I see is that the onConfigure() of the detail component is called B

Re: wicket datatable, row selection, update another component

2016-10-28 Thread Rob Audenaerde
Whoops! You are correct! Sorry 'bout that. On Fri, Oct 28, 2016 at 1:26 PM, Alan Woodward wrote: > Hi Rob, I think you posted this to the wrong mailing list? > > Alan Woodward > www.flax.co.uk > > > > On 28 Oct 2016, at 12:13, Rob Audenaerde > wrote: > >

commit frequency guideline?

2016-11-30 Thread Rob Audenaerde
Hi all, Currently we call commit() many times on our index (about 5M docs, with some 10,000-100,000 modifications during the day). The commit times typically get more expensive as the index grows, up to several seconds, so we want to reduce the number of calls. (Historically, we had Lucene com

Re: commit frequency guideline?

2016-11-30 Thread Rob Audenaerde
on > too, e.g. Kafka, so that your application doesn't need to keep track > of which docs were not yet committed. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Wed, Nov 30, 2016 at 8:50 AM, Rob Audenaerde > wrote: > > Hi all, > >

Autocomplete using facet labels?

2017-04-12 Thread Rob Audenaerde
uthor / John Doe' 'Author / Joan Deville' ... Are there built-in options to create such an autocomplete? Or do I have to build it myself? I prefer not to do a search on all the matching documents and collect facets for those, because that is not very fast. Any hints? Thanks in advan

Re: Autocomplete using facet labels?

2017-04-12 Thread Rob Audenaerde
> literal, i.e. it's case-sensitive but you can send terms.prefix=jo and > case things properly on the app side. > > Best, > Erick > > On Wed, Apr 12, 2017 at 6:33 AM, Rob Audenaerde > wrote: > > I have a Lucene (6.4.2) index with about 2-5M documents, and each >

Re: Lucene update performance

2017-05-09 Thread Rob Audenaerde
Do you update each entire document? (vs updating numeric docvalues?) That is implemented as 'delete and add' so I guess that will be slower than clean sheet indexing. Not sure if it is 3x slower, that seems a bit much? On Tue, May 9, 2017 at 3:24 PM, Kudrettin Güleryüz wrote: > Hi, > > For a 5.

Re: Lucene update performance

2017-05-09 Thread Rob Audenaerde
file and readding them. > > Is there an update method, is it better performance than remove then add? I > was simply removing modified files from the index (which doesn't seem to > take long), and readd them. > > On Tue, May 9, 2017 at 9:33 AM Rob Audenaerde > wrote: >

indexing performance 6.6 vs 7.1

2018-01-18 Thread Rob Audenaerde
increase in indexing time is to be expected as result of the sparse docvalues change? Kind regards, Rob Audenaerde

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
> > where indexing time goes? > > > > If you can run with a profiler, this might also give useful information. > > > > Le jeu. 18 janv. 2018 à 11:23, Rob Audenaerde > a > > écrit : > > > >> Hi all, > >> > >> We recently upgra

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
we > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] > > Sent: Monday, January 29, 2018 11:29 AM > > To

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
> so they commit very seldom. If the system crashes, the changes are replayed >> from translog since last commit. >> >> Uwe >> >> - >> Uwe Schindler >> Achterdiek 19, D-28357 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >>

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
should consider moving to a > time-based policy? eg. commit every 10 minutes? > > Le mer. 31 janv. 2018 à 10:25, Rob Audenaerde a > écrit : > > > Hi all, > > > > We ran the benchmarks (6.6 vs 7.1) with IW info stream and (as attachment > > cannot be too large)
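The time-based commit policy suggested in this message could look roughly like the sketch below. It is a minimal, library-free illustration: `TimedCommitter` and `markDirty` are hypothetical names, and `commitAction` is only a stand-in for whatever performs the real commit (for example a call to `IndexWriter.commit()`), so nothing here is taken from the thread itself.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: commits at most once per interval, and only if something changed. */
class TimedCommitter implements AutoCloseable {
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final Runnable commitAction; // assumption: wraps the real commit call
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    TimedCommitter(Runnable commitAction, long interval, TimeUnit unit) {
        this.commitAction = commitAction;
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                commitIfDirty();
            } catch (RuntimeException e) {
                // Keep the schedule alive and retry on the next tick.
                dirty.set(true);
            }
        }, interval, interval, unit);
    }

    /** Call after every add/update/delete instead of committing directly. */
    void markDirty() {
        dirty.set(true);
    }

    /** Runs the commit action only when changes are pending; returns whether it ran. */
    boolean commitIfDirty() {
        if (dirty.compareAndSet(true, false)) {
            commitAction.run();
            return true;
        }
        return false;
    }

    /** Stops the schedule and commits whatever is still pending. */
    @Override
    public void close() {
        scheduler.shutdown();
        commitIfDirty();
    }
}
```

With this shape, indexing code would call `markDirty()` after each modification, and a ten-minute policy would be `new TimedCommitter(action, 10, TimeUnit.MINUTES)`. Changes made since the last tick are not durable until the next commit, which is the trade-off the thread is discussing.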

Re: Lucene nested query

2018-04-10 Thread Rob Audenaerde
Your query can be seen as an inner join: select t0.* from employee t0 inner join employee t1 on t0.dept_no = t1.dept_no where t1.email='a...@email.com'. Maybe JoinUtil can help you. http://lucene.apache.org/core/7_0_0/join/org/apache/lucene/search/join/JoinUtil.html?is-external=true On Tue, Apr
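The self-join in the SQL above can be read as two passes: first collect the join keys (`dept_no`) of the rows matching the inner condition, then return every row carrying one of those keys. The sketch below shows that reading in plain Java. It is not the JoinUtil API; `DeptJoinSketch`, `Employee`, and `colleaguesOf` are hypothetical names used only for illustration. Unlike the SQL, it returns each employee at most once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of a two-pass self-join on dept_no. */
class DeptJoinSketch {
    static final class Employee {
        final String name;
        final String deptNo;
        final String email;

        Employee(String name, String deptNo, String email) {
            this.name = name;
            this.deptNo = deptNo;
            this.email = email;
        }
    }

    /** Returns the names of all employees sharing a department with the given email. */
    static List<String> colleaguesOf(List<Employee> all, String email) {
        // Pass 1 ("from" side): collect the join keys of rows matching the condition.
        Set<String> depts = new HashSet<>();
        for (Employee e : all) {
            if (email.equals(e.email)) {
                depts.add(e.deptNo);
            }
        }
        // Pass 2 ("to" side): keep every row whose join key was collected.
        List<String> names = new ArrayList<>();
        for (Employee e : all) {
            if (depts.contains(e.deptNo)) {
                names.add(e.name);
            }
        }
        return names;
    }
}
```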

force deletes - terms enum still has deleted terms?

2018-09-28 Thread Rob Audenaerde
something? Thanks in advance. Rob Audenaerde

find documents with big stored fields

2019-07-01 Thread Rob Audenaerde
documents and/or field names/contents with extreme sizes, so we can delete those from the index without needing to re-index all data. What would be the best approach for this? Thanks, Rob Audenaerde

unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
Hello, I'm benchmarking an application which implements security on Lucene by adding a multivalue field "roles". If the user has one of these roles, he can find the document. I implemented this as a Boolean AND query: I added the original query and the restriction with Occur.MUST. I'm having some

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
the number > of clauses is less than 16, so I would not expect major performance > differences between a TermInSetQuery over less than 16 terms and a > BooleanQuery wrapped in a ConstantScoreQuery. > > On Tue, Oct 13, 2020 at 11:35 AM Rob Audenaerde > wrote: > > > Hello

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
AM Rob Audenaerde wrote: > Hello Adrien, > > Thanks for the swift reply. I'll add the details: > > Lucene version: 8.6.2 > > The restrictionQuery is indeed a conjunction, it allows for a document to > be a hit if the 'roles' field is empty as well. It

Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

2020-10-13 Thread Rob Audenaerde
tQuery. > > Also beware that IndexSearcher#count will look at index statistics if your > queries have a single term, which would no longer work if you use this > query as a filter for another query. > > On Tue, Oct 13, 2020 at 12:51 PM Rob Audenaerde > wrote: > > > I reduced

best way (performance wise) to search for field without value?

2020-11-13 Thread Rob Audenaerde
Hi all, We have implemented some security on our index by adding a field 'groups_allowed' to documents, and wrapping a boolean must query around the original query, that checks if one of the given user-groups matches at least one groups_allowed. We chose to leave the groups_allowed field empty when t

Fwd: best way (performance wise) to search for field without value?

2020-11-13 Thread Rob Audenaerde
use DocValuesFieldExistsQuery. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Fri, Nov 13, 2020 at 7:56 AM Rob Audenaerde > wrote: > >> Hi all, >> >> We have implemented some security on our index by adding a field >> 'groups_al

Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

2021-01-21 Thread Rob Audenaerde
There is no attachment in the previous email that I can see? Maybe you can post it online? On Thu, Jan 21, 2021 at 4:54 PM Martynas L wrote: > Hello, > > Are there any comments on this issue? > If there is no workaround, we will be forced to rollback to the 7.5.0 > version. > > Best regards, > M

Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

2021-01-22 Thread Rob Audenaerde
le.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE > > IndexGenerator - creates a dummy index. > IndexReader - retrieves documents - duration time with 7.5.0 version is > ~2s, while ~6s with 8.7.0 > > Regards, > Martynas > > > On Thu, Jan 21, 2021 at 8:21 PM Rob Au

Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

2021-01-22 Thread Rob Audenaerde
> duration > > ratio - 8.7.0 is 3 times slower. I think it will be similar ratio > > retrieving any number of documents. > > > > On Fri, Jan 22, 2021 at 1:39 PM Rob Audenaerde > > > wrote: > > > > > Hi Martynas, > > > > > > In

Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

2021-01-22 Thread Rob Audenaerde
> > > > On Fri, Jan 22, 2021 at 4:48 PM Rob Audenaerde > wrote: > > > Hi Martynas > > > > How did you measure that? > > > > I ask, because writing a good benchmark is not an easy task, since there > > are so many factors (class loading times, J
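One common way to reduce the influence of factors such as class loading and JIT compilation is to run the operation several times untimed before measuring, and to report a median rather than a single run. The sketch below assumes exactly that. `WarmedUpTimer` and `medianNanos` are hypothetical names, and a dedicated harness such as JMH controls for considerably more than this does.

```java
import java.util.Arrays;

/** Hypothetical micro-benchmark helper with a warm-up phase. */
class WarmedUpTimer {
    /**
     * Runs the task `warmup` times without timing it, then `measured` times
     * with timing, and returns the median duration in nanoseconds
     * (the upper median when `measured` is even).
     */
    static long medianNanos(Runnable task, int warmup, int measured) {
        if (measured < 1) {
            throw new IllegalArgumentException("measured must be >= 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // untimed: lets class loading and JIT compilation settle
        }
        long[] samples = new long[measured];
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measured / 2];
    }
}
```

For a comparison like the one in this thread, the same task (for example, retrieving a fixed set of documents) would be timed this way against each version, on the same machine and the same index contents.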

NRT facet issue (bug?), hard to reproduce, please advise

2014-04-11 Thread Rob Audenaerde
Hi all, I have an issue using the near real-time search in the taxonomy. I could really use some advice on how to debug or proceed with this issue. The issue is as follows: I index 100k documents, with about 40 fields each. For each field, I also add a FacetField (the issue arises both with FacetField as Fl

Getting multi-values to use in filter?

2014-04-23 Thread Rob Audenaerde
Hi all, I'm looking for a way to use multi-values in a filter. I want to be able to search on sum(field)=100, where field has values in one document: field=60 field=40 In this case 'field' is a LongField. I examined the code in the FieldCache, but that seems to focus on single-valued fields o

Re: Getting multi-values to use in filter?

2014-04-23 Thread Rob Audenaerde
x is that it lets you look up documents very > quickly based on *precomputed* values. > > -Mike > > > > On 04/23/2014 06:56 AM, Rob Audenaerde wrote: > >> Hi all, >> >> I'm looking for a way to use multi-values in a filter. >> >> I wa

Re: Getting multi-values to use in filter?

2014-04-23 Thread Rob Audenaerde
x is that it lets you look up documents very > > quickly based on *precomputed* values. > > > > -Mike > > > > > > On 04/23/2014 06:56 AM, Rob Audenaerde wrote: > > > >> Hi all, > >> > >> I'm looking for a way to use multi-values

Re: Getting multi-values to use in filter?

2014-04-23 Thread Rob Audenaerde
e multi-valued numeric field, and given that NDV > is single valued, we went w/ BDV. > > If I misunderstood the scenario, I'd appreciate if you clarify it :) > > Shai > > > On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde >wrote: > > > Hi Shai, all

Re: Getting multi-values to use in filter?

2014-04-29 Thread Rob Audenaerde
compile expressions, but the methods should take only double values. So I >> think it should be some sort of binding, but I'm not sure yet how to do it. >> Perhaps it should be a name like max_fieldName, which you add a custom >> Expression to as a binding ... I will try to lo

Re: search performance

2014-06-03 Thread Rob Audenaerde
Hi Jamie, What is included in the 5 minutes? Just the call to the searcher? searcher.search(...) ? Can you show a bit more of the code you use? On Tue, Jun 3, 2014 at 11:32 AM, Jamie wrote: > Vitaly > > Thanks for the contribution. Unfortunately, we cannot use Lucene's > pagination function

fill 'empty' facet-values, sampling, taxoreader

2015-01-12 Thread Rob Audenaerde
Hi all, I'm building an application in which users can add arbitrary documents, and all fields will be added as facets as well. This allows users to browse their documents by their own defined facets easily. However, when the number of documents gets very large, I switch to random-sampled facets

disabling all scoring?

2015-02-04 Thread Rob Audenaerde
Hi all, I'm doing some analytics with a custom Collector on a fairly large number of search results (±100,000, all the hits that return from a query). I need to retrieve them by a query (so using search), but I don't need any scoring nor keeping the documents in any order. When profiling the appl

Re: GROUP BY in Lucene

2015-08-10 Thread Rob Audenaerde
You can write a custom (facet) collector to do this. I have done something similar, I'll describe my approach: For all the values that need grouping or aggregating, I have added a FacetField (an AssociatedFacetField, so I can store the value alongside the ordinal). The main search stays the same

Number of threads in index writer config?

2015-08-27 Thread Rob Audenaerde
Hi all, I was wondering about the number of threads to use for indexing. There is a setting: getMaxThreadStates() in the IndexWriterConfig that determines how many threads can write to the index simultaneously. The luceneutil Indexer.java (that is used for the nightly benchmarks), seems to use

index size growing while deleting

2015-11-05 Thread Rob Audenaerde
Hi all, I'm currently investigating an issue we have with our index. It keeps getting bigger, and I don't get why. Here is our use case: We index a database of about 4 million records, spread over a few hundred tables. The data consists of a mix of text, dates, numbers etc. We also add all these

Re: index size growing while deleting

2015-11-05 Thread Rob Audenaerde
> There's some configuration/runtime activities you don't mention And > you make testing process sound like a mirror of production? (Including > configuration?) > > > -will > > > On 11/5/15 7:33 AM, Rob Audenaerde wrote: > >> Hi all, >> >>

Re: index size growing while deleting

2015-11-06 Thread Rob Audenaerde
> lets Lucene remove any old segments referenced by the prior commit > point. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Fri, Nov 6, 2015 at 2:59 AM, Rob Audenaerde > wrote: > > Hi will, others > > > > Thanks for your reply, > >

Re: index size growing while deleting

2015-11-08 Thread Rob Audenaerde
On Fri, Nov 6, 2015 at 11:29 AM, Michael McCandless < luc...@mikemccandless.com> wrote: It's also important to IndexWriter.commit (as well as open new NRT > readers) periodically or after doing a large set of updates, as that > lets Lucene remove any old segments referenced by the prior commit > p

Re: index size growing while deleting

2015-11-10 Thread Rob Audenaerde
i Rob, > > we had a similar problem. In our case we had open index readers, that > blocked the index from merging its segments and thus deleting the marked > segments. > > Regards, > > Jürgen. > > > Am 06.11.2015 um 08:59 schrieb Rob Audenaerde: > >> Hi wil

Re: index size growing while deleting

2015-11-10 Thread Rob Audenaerde
rom > the Searcher we get the Reader. After the query you call > searcherManager.release(searcher). The SearcherManager takes care of the > rest. > > Regards, > > Jürgen. > > > Am 10.11.2015 um 13:27 schrieb Rob Audenaerde: > >> Hi Jürgen, Michael >> >

debugging growing index size

2015-11-11 Thread Rob Audenaerde
Hi all, I'm still debugging the growing-index size. I think closing index readers might help (work in progress), but I can't really see them holding on to files (at least, using lsof ). Restarting the application sheds some light, I see logging on files that are no longer referenced. What I see i

Re: debugging growing index size

2015-11-12 Thread Rob Audenaerde
EF_COUNTS to true, and then re-generate this > log? This causes IW to log the ref count of each file it's tracking > ... > > I'll also add a bit more verbosity to IW when NRT readers are opened > and closed, for 5.4.0. > > Mike McCandless > > http://blog.mikem

Re: debugging growing index size

2015-11-13 Thread Rob Audenaerde
wrong here. > > Can you set the (public, static) boolean > IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this > log? This causes IW to log the ref count of each file it's tracking > ... > > I'll also add a bit more verbosity to IW when NRT readers

Re: debugging growing index size

2015-11-13 Thread Rob Audenaerde
> Hi Rob, > > A couple more things: > > Can you print the value of MMapDirectory.UNMAP_SUPPORTED? > > Also, can you try your test using NIOFSDirectory instead? Curious if > that changes things... > > Mike McCandless > > http://blog.mikemccandless.com >

RE: debugging growing index size

2015-11-13 Thread Rob Audenaerde
ava.com/view_bug.do?bug_id=4724038 > > > > http://mail-archives.apache.org/mod_mbox/lucene- > > dev/201509.mbox/%3c55f0461a.2070...@gmail.com%3E > > > > hth > > -will > > > > > > > > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde > >

RE: debugging growing index size

2015-11-14 Thread Rob Audenaerde
n't happen? Are you sure? > > > > I'll look at the 6.6 GB infoStream to see what it says about the ref > counts. > > > > Did you fix the issue in your app where you're not closing all opened > > NRT readers? > > > > Mike McCandless > >