Re: Recommended number of fields in one lucene index

2017-02-15 Thread Kumaran Ramasubramanian
Hi Adrien Grand, Thanks for the response. > a binary blob that stores all the data so that you can perform updates. Could you elaborate on this? Do you mean to have StoredField as mentioned below to store all other fields which are needed only for updates? Is there any way to use updatedocument

Re: Recommended number of fields in one lucene index

2017-02-15 Thread Adrien Grand
I think it is hard to come up with a general rule, but there is certainly a per-field overhead. There are some things that we need to store per field per segment in memory, so if you multiply the number of fields you have, you could run out of memory. In most cases I have seen where the index had s

Recommended number of fields in one lucene index

2017-02-15 Thread Kumaran Ramasubramanian
Hi All, Elasticsearch allows 1000 fields by default. In Lucene, what are the indexing and searching performance impacts of having 10 fields vs 3000 fields in a Lucene index? In my case, while indexing, I index and store all fields so that I can provide updates on one field where we use to take out

[ANNOUNCE] Apache Lucene 5.5.4 released

2017-02-15 Thread Adrien Grand
15 February 2017, Apache Lucene™ 5.5.4 available. The Lucene PMC is pleased to announce the release of Apache Lucene 5.5.4. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires fu

Re: Numeric Ranges Faceting

2017-02-15 Thread Chitra R
Hi, Thanks for the suggestion. But in the case of drill sideways search, retrieving allDimensions (using Facets.getAllDimension()) threw an exception which is shown below... 1. While opening DocValuesReaderState, global ordinals and ordinals Range map will be computed for '$facets' f

Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)

2017-02-15 Thread Michael McCandless
Actually, that's a great idea to try (Oliver). It would be a relatively simple conversion... maybe Lucene could add some sugar on top, e.g. to convert an FST to an automaton. Hmm, maybe it even exists somewhere already... But even the FST Builder's NodeHash can be non-trivial in its heap usage,

Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)

2017-02-15 Thread Dawid Weiss
Yep, true. I just wonder whether it's worth complicating the code... Could be easier to build an FST and then recreate a RunAutomaton from that directly... :) Dawid On Wed, Feb 15, 2017 at 11:26 AM, Michael McCandless wrote: > We may be able to make DaciukMihovAutomatonBuilder's state registry >

Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)

2017-02-15 Thread Michael McCandless
We may be able to make DaciukMihovAutomatonBuilder's state registry more ram efficient too ... I think it's essentially the same thing as the FST.Builder's NodeHash, just minus the outputs that FSTs have vs automata. Mike McCandless http://blog.mikemccandless.com On Wed, Feb 15, 2017 at 5:14 AM

Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)

2017-02-15 Thread Dawid Weiss
You could try using morfologik's byte-based implementation: https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-fsa-builders/src/test/java/morfologik/fsa/builders/FSABuilderTest.java I can't guarantee it'll be fast enough -- you need to sort those input sequences and even thi

Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)

2017-02-15 Thread Oliver Mannion
Hi Mike, Thanks for the suggestion, I've tried Operations.run on an Automaton and it's fast enough for my use case. However, the real problem I have is in building the Automaton via DaciukMihovAutomatonBuilder. On my input string set it consumes quite a bit of CPU, a lot of which seems to be GC ac

Re: Numeric Ranges Faceting

2017-02-15 Thread Michael McCandless
Hi, have a look at the RangeFacetsExample.java under the lucene/demo module... it shows how to do this. Mike McCandless http://blog.mikemccandless.com On Tue, Feb 14, 2017 at 12:07 PM, Chitra R wrote: > Hi, >We have planned to implement both string and numeric faceting using > docvalues f