Re: Java 17 and Lucene

2021-10-20 Thread Jigar Shah
Michael, is "-XX:+UseZGC options to enable ZGC." recommended, as it claims very low pauses? For the "*DY* (2021-10-19 08:14:33): Upgrade to JDK17+35" execution for "Indexing throughput", is ZGC used for the "Indexing throughput
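
One way to confirm which collector a benchmark run actually used is to ask the JVM itself. The sketch below uses only the JDK management API; the class name GcCheck is hypothetical, and the assumption that ZGC's collector beans contain "ZGC" in their names holds for the JDK builds I am aware of but is not a documented guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Lucene or its nightly benchmarks.
class GcCheck {
    // Names of the garbage collectors registered by the running JVM.
    static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Assumption: ZGC beans carry "ZGC" in their name (e.g. when started with -XX:+UseZGC).
    static boolean usesZgc() {
        for (String name : activeCollectors()) {
            if (name.contains("ZGC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + activeCollectors());
        System.out.println("ZGC in use: " + usesZgc());
    }
}
```

Running it as `java -XX:+UseZGC GcCheck` and again without the flag shows whether the option took effect on a given JDK.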

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Jigar Shah
Thanks, Uwe. Yes, recommended: tmpfs/ramfs worked like a charm in our use case with a read-only index, giving us very high throughput and consistent response times on queries. We had to build some redundancy around that service to make it highly available, so we can do a rolling update on the r

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Jigar Shah
I used a Linux feature (ramfs, basically mounting RAM as a partition) to guarantee that it's always in RAM (no accidental paging ;) cost too). https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux WARN: Only use this if it's a read-only index that can fit in RAM and you have a back-up c

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Jigar Shah
My learnings from dealing with this problem: we faced a similar problem before, and did the following things: 1) Don't request totalGroupCount, and the response was fast, as computing the group count is an expensive task. If you can live without groupCount. Although you can approximate pagination up to total co
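
The first point (skipping the total group count) can be illustrated without any Lucene classes. The sketch below is a plain-Java stand-in, not the grouping module's API: the Hit class, its fields, and firstPerGroup are hypothetical names. It walks already-sorted hits, keeps the first hit per duplicate key, and stops as soon as the requested page is full, so the total number of groups is never computed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for deduplicated paging; not the Lucene grouping API.
class CollapseHits {
    static class Hit {
        final String id;
        final String groupKey; // value that identifies duplicates
        Hit(String id, String groupKey) {
            this.id = id;
            this.groupKey = groupKey;
        }
    }

    // sortedHits must already be in the custom sort order.
    // offset and pageSize are counted in groups, not in raw hits.
    static List<Hit> firstPerGroup(List<Hit> sortedHits, int offset, int pageSize) {
        Set<String> seen = new HashSet<>();
        List<Hit> page = new ArrayList<>();
        if (pageSize <= 0) {
            return page;
        }
        for (Hit hit : sortedHits) {
            if (!seen.add(hit.groupKey)) {
                continue; // duplicate of a group already seen
            }
            if (seen.size() <= offset) {
                continue; // group belongs to an earlier page
            }
            page.add(hit);
            if (page.size() == pageSize) {
                break; // stop early: remaining groups are never counted
            }
        }
        return page;
    }
}
```

The trade-off is the one the message describes: paging works, but the caller cannot show an exact "N groups found" figure.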

Re: Lucene one to many query

2019-09-21 Thread Jigar Shah
The nested documents structure supported by Solr is what you need. But as you are using Lucene, you should denormalize and store each item with its company fields and price. Apply the search on items with a function query on item_price. Once you have results, you can collect the companies in a set. On Sat, Sep 21, 2019, 11
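
A minimal sketch of the denormalization idea, using plain Java collections in place of an index. Everything here is illustrative: ItemDoc, its field names other than item_price, and the simple price-range filter standing in for the function query are assumptions, not code from the thread.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for the denormalized index.
class DenormalizedItems {
    // One flattened document per item, carrying its company's fields.
    static class ItemDoc {
        final String itemName;
        final double itemPrice;
        final String companyId;
        ItemDoc(String itemName, double itemPrice, String companyId) {
            this.itemName = itemName;
            this.itemPrice = itemPrice;
            this.companyId = companyId;
        }
    }

    // Stand-in for the search step: match items priced in [min, max],
    // then collect the distinct companies in hit order.
    static Set<String> companiesWithItemPriced(List<ItemDoc> docs, double min, double max) {
        Set<String> companies = new LinkedHashSet<>();
        for (ItemDoc doc : docs) {
            if (doc.itemPrice >= min && doc.itemPrice <= max) {
                companies.add(doc.companyId);
            }
        }
        return companies;
    }
}
```

Because each item row repeats its company fields, a company with several matching items appears several times in the hits; the set is what collapses those back to one entry per company.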

Re: Nested Facets

2019-08-30 Thread Jigar Shah
You should be looking at the facet pivots feature that Solr provides, based on doc values. As you are using core Lucene, you may have to search a little more on how to do this at a low level with the Lucene Facet API based on DocValues facets. Your starting point should be https://lucene.apache.org/core/8_2_0/fac

Re: SearcherTaxonomyManager Refreshing

2017-08-24 Thread Jigar Shah
Looks like your approach to managing the main index and the taxonomy index is risky. The main index keeps ordinals of the taxonomy index. If you replace directories, then the taxo reader might have ordinals out of sync with the main index. One fact about the taxonomy index is that deletes or cleanup of the main index doesn't affect ta

Re: Lucene 6.6: "Too many open files"

2017-07-31 Thread Jigar Shah
I faced this problem when I used NoMergePolicy and wrote some code to merge segments manually; that code had a bug, and I had the same issue. Make sure you have the default (AFAIR ConcurrentMergeScheduler) enabled, and that it is configured with appropriate settings. On Jul 31, 2017 11:21 PM, "Erick Erickson" wrote: >

ProximityQueryNode doesn't allow distance parameter to be 0

2016-10-31 Thread Jigar Shah
In some cases tokens are indexed at the same position, e.g. when using a synonym filter. The flexible query parser API doesn't allow creating a ProximityQueryNode with distance '0'. {code} if (type == Type.NUMBER) { if (distance <= 0) { throw new QueryNodeError(new MessageImpl(

Re: NOT Operator with Parenthesis

2015-10-28 Thread Jigar Shah
LUCENE-6249 <https://issues.apache.org/jira/browse/LUCENE-6249> and LUCENE-6857 <https://issues.apache.org/jira/browse/LUCENE-6857> will be back-ported to 4.10.5. You may not need to jump to 5.X version for this. Thanks, Jigar Shah. On Wed, Oct 28, 2015 at 5:19 AM, patel mr

Re: NOT Operator with Parenthesis

2015-10-27 Thread Jigar Shah
Most probably LUCENE-6249 changes parser's behavior, for your case. On Tue, Oct 27, 2015 at 5:33 AM, patel mrugesh wrote: > Thanks for your reply Erick, > I have gone through the document p

Re: Taxonomy index and payload

2015-05-12 Thread Jigar Shah
Check Facet Associations section in this video. https://www.youtube.com/watch?v=-CNZxkAMcKk On Tue, May 12, 2015 at 4:15 AM, Federico Tolomei wrote: > Hello, > is it possible to add a payload within the facet in the taxonomy index ? > > Thank you > > -- > https://s17t.net > f...@s17t.net > @s1

Matched docIds for each facet value

2015-05-08 Thread Jigar Shah
Hello, Is it possible to get the matched docIds for each facet value? Currently we only get the count. Let me know which classes internal to Lucene I should look at, in case it's not exposed in the API. Thanks, Jigar Shah.

Re: Top 10 words

2015-02-13 Thread Jigar Shah
If those are the known fields in the documents, you may extract words while indexing and create facets. Lucene supports faceted search which can give you Top n counts of such fields, which is much more efficient. Another option is apply clustering algorithm on results which can provide Top n words

Re: Proximity query

2015-02-12 Thread Jigar Shah
This concept is called Proximity Search in general. In Lucene they are achieved using SpanQuery. On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns wrote: > Hi, > > Can someone help me if this use case is possible or not with lucene > > Use case: I have a string say 'Japan' appearing in 10 documents

Re: Faceted Search Hierarchy

2015-01-08 Thread Jigar Shah
arat and later add a document Asia/India, we cannot go back > to the other document and update the hierarchy. > > On Thu, Jan 8, 2015 at 3:27 PM, Jigar Shah wrote: > > > Is there some way to achieve this at Lucene level. so i can get facet > like > > below ? > > >

Re: Faceted Search Hierarchy

2015-01-08 Thread Jigar Shah
ia/Gujarat > > When you ask for top children, you will get Asia + India, both with a count > of 1. > > Shai > > On Thu, Jan 8, 2015 at 1:48 PM, Jigar Shah wrote: > > > Very simple question, on facet > > > > Index has 2 documents as follows: > > > >

Faceted Search Hierarchy

2015-01-08 Thread Jigar Shah
Very simple question, on facet Index has 2 documents as follows: Doc1 Indexed facet path: Asia/India Doc2 Indexed facet path: India/Gujarat Now while faceted search facets.getTopChildren() Will it return 1(Asia) result or 2(Asia, India) ? So basically will it join values and return hierarchy
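
The reply in this thread states that top children would be Asia and India, each with a count of 1. The sketch below simulates that counting rule in plain Java to show why the two paths are not joined into one hierarchy. It is not the Lucene taxonomy API; PathCounts and its methods are hypothetical names, and it assumes one path per document with "/" as the separator.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of hierarchical facet counting; not Lucene code.
class PathCounts {
    // Each document counts once for every ancestor prefix of its path.
    static Map<String, Integer> countPrefixes(List<String> docPaths) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String path : docPaths) {
            StringBuilder prefix = new StringBuilder();
            for (String part : path.split("/")) {
                if (prefix.length() > 0) {
                    prefix.append('/');
                }
                prefix.append(part);
                counts.merge(prefix.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Top-level children are the prefixes that contain no separator.
    static Map<String, Integer> topLevel(Map<String, Integer> counts) {
        Map<String, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (!entry.getKey().contains("/")) {
                top.put(entry.getKey(), entry.getValue());
            }
        }
        return top;
    }
}
```

For the two documents in the question, "Asia/India" and top-level "India" are distinct nodes, so the simulated top level contains Asia and India with one document each, matching the reply.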

lucene-facet-4.10.1 version not changed in 'DirectoryTaxonomyWriter'

2014-10-15 Thread Jigar Shah
Hello Lucene committers, I saw one inconsistent version usage in lucene-facet-4.10.1.jar. lucene-facet-4.10.1.jar uses the deprecated 'Version.LUCENE_4_10_0' in the 'createIndexWriterConfig' method of class 'DirectoryTaxonomyWriter'. Ignore it if it is deliberate. Thanks,

Re: Exception from FastTaxonomyFacetCounts

2014-10-15 Thread Jigar Shah
r. > > Shai > > On Mon, Oct 13, 2014 at 12:15 PM, Jigar Shah > wrote: > > > In my application I have two instances of SearcherManager. > > > > 1) SearcherManager with 'applyAllDeletes = true' which is used by > Indexer. > >

Re: Exception from FastTaxonomyFacetCounts

2014-10-13 Thread Jigar Shah
onsistent view of the two. > > Shai > > On Tue, Oct 7, 2014 at 10:03 AM, Jigar Shah wrote: > > > Intermittently while search i am getting this exception on huge index. > > (FacetsConfig used while indexing and searching is

Getting min/max of numeric doc-values facets

2014-10-08 Thread Jigar Shah
Is there some way, when a faceted search is executed, to retrieve the possible min/max values of a numeric doc-values field along with the supplied custom ranges (in LongRangeFacetCounts), or some other way to do it? I believe this can give the application a hint, so the next search request can be much smarter, e.

Exception from FastTaxonomyFacetCounts

2014-10-07 Thread Jigar Shah
com.company.search.CustomDrillSideways.buildFacetsResult(LuceneDrillSideways.java:41) 06:28:37,954 ERROR [stderr] at org.apache.lucene.facet.DrillSideways.search(DrillSideways.java:146) 06:28:37,955 ERROR [stderr] at org.apache.lucene.facet.DrillSideways.search(DrillSideways.java:203) Thanks, Jigar Shah

FacetsConfig usage

2014-10-05 Thread Jigar Shah
en you already have some days of index on old FacetsConfig ? Thanks, Jigar Shah

min/max values of numeric facets

2014-09-25 Thread Jigar Shah
ng with counts in the specified ranges ? Thanks, Jigar Shah.

Re: DrillSideways accepting FacetCollector parameter

2014-07-14 Thread Jigar Shah
rillSideways > and override the buildFacetResult method? That method gets the drill > down and all sideways collectors... > > Mike McCandless > > http://blog.mikemccandless.com > > > On Wed, Jul 9, 2014 at 1:40 AM, Jigar Shah wrote: > > Usecase: > > > &

Re: DrillSideways accepting FacetCollector parameter

2014-07-08 Thread Jigar Shah
or. This is not true in case of DrillSideways. Let me know if, there is already some other way provided. Thanks, Jigar Shah. On Tue, Jul 8, 2014 at 8:15 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > We could do this, but what's the use case? > > E.g. Drill

DrillSideways accepting FacetCollector parameter

2014-07-08 Thread Jigar Shah
, i.e. non sideways facets. Thanks, Jigar Shah.

Re: DocIDs from Facet Results

2014-07-07 Thread Jigar Shah
I think you need to execute a DrillDownQuery to get the docIds. On Mon, Jul 7, 2014 at 4:40 PM, Sandeep Khanzode < sandeep_khanz...@yahoo.com.invalid> wrote: > Hi, > > For Lucene 4.7.2 Facets, once we invoke FacetCollector and get the > topNChildren into FacetResult, is there any mechanism that fo

Re: Searching on Large Indexes

2014-06-27 Thread Jigar Shah
Some points based on my experience. You can consider a SolrCloud implementation if you want to distribute your index over multiple servers. Use MMapDirectory locally for each Solr instance in the cluster. Run a warm-up query on server start-up, so most of the documents will be cached; you will start sav

Re: Lucene Facets Module 4.8.1

2014-06-23 Thread Jigar Shah
e counts of all of them, or the > majority of them, that's ok. But if you know you *always* need the count of > a subset of them, then separating that subset to a different field is > better. > > Hope that clarifies. > > Shai > > > On Mon, Jun 23, 2014 at 4:18 P

Re: Lucene Facets Module 4.8.1

2014-06-23 Thread Jigar Shah
acets > > Something like that... > > Shai > > > On Mon, Jun 23, 2014 at 9:04 AM, Jigar Shah wrote: > > > On commenting > > > > //config.setIndexFieldName("CITY", "city"); at search time, this is > before > > i do, getTopChildren

Re: Lucene Facets Module 4.8.1

2014-06-22 Thread Jigar Shah
r indexFieldName. There's no way for the > default ctor of FastTaxonomyFacetCounts to determine which indexFieldName > to use as it doesn't know which dimensions you're going to ask to count. > > Hope that helps. > > Shai > > > On Sun, Jun 22, 2014 at 4:05 P

Re: Lucene Facets Module 4.8.1

2014-06-22 Thread Jigar Shah
etIndexFieldName("CITY", "city") called. > > > > Or, can you try commenting out 'config.setIndexFieldName("CITY", > > "city")' at index time and see if the exception still happens? > > > > Mike McCandless > > > >

Re: Lucene Facets Module 4.8.1

2014-06-22 Thread Jigar Shah
", "city") called. > > Or, can you try commenting out 'config.setIndexFieldName("CITY", > "city")' at index time and see if the exception still happens? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Sat, Jun 21, 201

Re: Lucene Facets Module 4.8.1

2014-06-22 Thread Jigar Shah
myFacetCounts(String indexFieldName, TaxonomyReader taxoReader, FacetsConfig config, FacetsCollector fc) throws IOException { super(indexFieldName, taxoReader, config); ... } Thanks Jigar Shah. On Sat, Jun 21, 2014 at 11:01 PM, Shai Erera wrote: > If you can, while in debug mode try to note the

Re: Lucene Facets Module 4.8.1

2014-06-20 Thread Jigar Shah
acets defined by constant 'public static final String DEFAULT_INDEX_FIELD_NAME = "$facets";' in FacetsConfig. My question is: if I am using the same FacetsConfig while indexing and searching, why is it not identifying the correct field name, and going for "$facets" instead? Please correct me if I have misunderstood, or point me to the correct way to solve the above problem. Many Thanks. Jigar Shah.

Lucene Facets Module 4.8.1

2014-06-20 Thread Jigar Shah
Hello, I am using DrillSideways facets, and while getting children I am getting the exception below: 17:02:10,496 ERROR [stderr:71] (Thread-2 (HornetQ-client-global-threads-790878673)) java.lang.IllegalArgumentException: dimension "CITY" was not indexed into field "$facets 17

Proximity Search for SENTENCE and PARAGRAPH

2014-04-07 Thread Jigar Shah
paragraph boundaries. And then search using the SpanQuery API. Please let me know if some work has been done on such features, or if there is a proven approach. Thanks Jigar Shah.