Reusable Performance Tests

2014-06-20 Thread Umashanker, Srividhya
Are there any performance test suites in the Lucene codebase that we can reuse to benchmark our Lucene infrastructure? We are mainly looking for multithreaded indexing tests. -Vidhya

Re: search performance

2014-06-20 Thread Jamie
Hi All, Thank you for all your suggestions. Some of the recommendations hadn't yet been implemented, as our code base was using older versions of Lucene with reduced capabilities. Thus far, all the recommendations for fast search have been implemented (e.g. using pagination with searchAfter,

Re: search performance

2014-06-20 Thread Jamie
Greetings Lucene Users, As a follow-up to my earlier mail: as recommended, we are also using Lucene segment warmers; segments per tier is now set to five, and buffer memory is set to (Runtime.getRuntime().totalMemory()*.08)/1024/1024. See below for the code used to instantiate the writer:
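The buffer sizing expression in the message above can be sketched in plain Java. This is only an illustration: the class and method names (RamBufferSizing, ramBufferMb) are hypothetical, and the truncated message does not show where the value is used; presumably it is passed to IndexWriterConfig.setRAMBufferSizeMB.

```java
public class RamBufferSizing {

    // Fraction of the JVM heap given to the indexing buffer (8%, as in the message above).
    public static final double HEAP_FRACTION = 0.08;

    /** Converts a heap size in bytes into a buffer size in megabytes. */
    public static double ramBufferMb(long totalMemoryBytes) {
        return (totalMemoryBytes * HEAP_FRACTION) / 1024 / 1024;
    }

    public static void main(String[] args) {
        // totalMemory() is the heap currently allocated to the JVM, not its upper limit.
        double mb = ramBufferMb(Runtime.getRuntime().totalMemory());
        System.out.printf("RAM buffer: %.2f MB%n", mb);
    }
}
```

Note that Runtime.totalMemory() reports the heap currently allocated, which can grow over the life of the process, whereas Runtime.maxMemory() reports the upper bound. A value computed once at startup from totalMemory() may therefore be smaller than intended.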

RE: search performance

2014-06-20 Thread Uwe Schindler
Hi, > Am I correct that SearcherManager can't be used with a MultiReader and > NRT? I would appreciate all suggestions on how to optimize our search > performance further. Search time has become a usability issue. Just have a SearcherManager for every index. MultiReader construction is cheap

EarlyTerminatingSortingCollector help needed..

2014-06-20 Thread Ravikumar Govindarajan
I was planning to use ETSC in conjunction with SortingMergePolicy and got stuck. In ETSC, we have @Override public void collect(int doc) throws IOException { in.collect(doc); if (++numCollected >= numDocsToCollect) { throw new CollectionTerminatedException(); } } I und
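The control flow of the collect() method quoted above can be illustrated without Lucene. The following is a minimal sketch, not the library's implementation: TerminatedException, LimitedCollector and collectSegment are hypothetical stand-ins, doc IDs are plain ints, and the sketch assumes the documents of a segment arrive already in the desired sort order and that the limit is applied per segment.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminationSketch {

    /** Local stand-in for the exception thrown in the snippet above. */
    static class TerminatedException extends RuntimeException {}

    /** Records doc IDs and signals termination once enough have been collected. */
    static class LimitedCollector {
        private final int numDocsToCollect; // assumed to be at least 1
        private int numCollected = 0;
        final List<Integer> collected = new ArrayList<>();

        LimitedCollector(int numDocsToCollect) {
            this.numDocsToCollect = numDocsToCollect;
        }

        void collect(int doc) {
            collected.add(doc);
            // Same check as in the quoted code: stop once the limit is reached.
            if (++numCollected >= numDocsToCollect) {
                throw new TerminatedException();
            }
        }
    }

    /** Feeds one segment's docs to a fresh collector and stops at the limit. */
    public static List<Integer> collectSegment(int[] docsInSortOrder, int numDocsToCollect) {
        LimitedCollector collector = new LimitedCollector(numDocsToCollect);
        try {
            for (int doc : docsInSortOrder) {
                collector.collect(doc);
            }
        } catch (TerminatedException e) {
            // Expected: the remaining docs of this segment are skipped.
        }
        return collector.collected;
    }
}
```

Because the exception is caught by the caller that drives the segment, it ends collection for that segment only; the caller is free to move on to the next one.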

AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-20 Thread Clemens Wyss DEV
Sorry for re-asking. Has anyone implemented an AnalyzingSuggester which - is fuzzy - is case insensitive (or must/should this be implemented by the analyzer?) - does infix search [- has a small memory footprint] -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch

Lucene Facets Module 4.8.1

2014-06-20 Thread Jigar Shah
Hello, I am getting the exception below while using DrillSideways facets. It occurs while getting children: 17:02:10,496 ERROR [stderr:71] (Thread-2 (HornetQ-client-global-threads-790878673)) java.lang.IllegalArgumentException: dimension "CITY" was not indexed into field "$facets 17

RE: fuzzy/case insensitive AnalyzingSuggester )

2014-06-20 Thread Oliver Christ
Hi Clemens, I haven't yet built a suggester which combines all three, and am not aware of one. I'd love to have one though ;-) Case- and diacritics insensitivity is supported out-of-the-box by the analyzing suggesters, including the FuzzySuggester. The logic is in the Analyzer. I haven't yet t

Concurrent Indexing

2014-06-20 Thread Umashanker, Srividhya
Lucene Experts - Recently we upgraded to Lucene 4. We want to make use of the concurrent flushing feature of Lucene. Indexing for us includes certain DB operations and writing to Lucene, ended by a commit. There may be multiple concurrent calls to the Indexer to publish single/multiple records. So far,

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
You could just avoid calling commit() altogether if your application's semantics allow this (i.e. it's non-transactional in nature). This way, Lucene will do commits when appropriate, based on the buffering settings you chose. It's generally unnecessary and undesirable to call commit at the end of

Re: search performance

2014-06-20 Thread Vitaly Funstein
If you are using stored fields in your index, consider playing with compression settings, or perhaps turning stored field compression off altogether. Ways to do this have been discussed in this forum on numerous occasions. This is highly use case dependent though, as your indexing performance may o

Re: Lucene Facets Module 4.8.1

2014-06-20 Thread Shai Erera
How do you add facets to your documents? Did you play with the FacetsConfig, such as altering the field under which the CITY dimension is indexed? If you can reproduce this failure in a simple program, I guess it will be easy to spot the error. Looks like a configuration error to me... Shai On Fri

Re: Concurrent Indexing

2014-06-20 Thread Umashanker, Srividhya
It is non-transactional. We first write the same data to the database in a transaction and then call the writer's addDocument. If Lucene fails we still hold the data to recover. I can avoid the commit if we use an NRT reader. We do need this to be searchable immediately. Another question. I did try removi

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
Hmm, I might have actually given you a slightly incorrect explanation wrt what happens when internal buffers fill up. There will definitely be a flush of the buffer, and segment files will be written to, but it's not actually considered a full commit, i.e. an external reader will not see these chan

Re: Lucene Facets Module 4.8.1

2014-06-20 Thread Jigar Shah
Thanks for helping me. Yes, I did a couple of things: Below is the simple indexing code I use. TrackingIndexWriter nrtWriter DirectoryTaxonomyWriter taxoWriter = ... FacetsConfig config = new FacetsConfig(); config.setHierarchical("CITY", true); config.setMultiValued("CITY", true); config

Re: Concurrent Indexing

2014-06-20 Thread Umashanker, Srividhya
Let me try NRT with a periodic commit, say every 5 minutes, in a committer thread on an as-needed basis. Is there a limit on how long we can go without committing? I think the buffers get flushed to disk, but not in a crash-proof way. So we should be good on memory. I should also verify
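The committer thread described above could look roughly like the following sketch. It is written against plain java.util.concurrent rather than Lucene: PeriodicCommitter and its methods are hypothetical names, and the commit action is passed in as a Runnable that stands in for whatever actually commits the index (for example a wrapper around IndexWriter.commit()). "On need basis" is interpreted here as "only if something was indexed since the last commit", which is an assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PeriodicCommitter {

    private final Runnable commitAction; // stand-in for the real commit call
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public PeriodicCommitter(Runnable commitAction) {
        this.commitAction = commitAction;
    }

    /** Indexing threads call this after adding, updating or deleting documents. */
    public void markDirty() {
        dirty.set(true);
    }

    /** Runs the commit only if there are uncommitted changes; returns true if it ran. */
    public boolean commitIfDirty() {
        if (!dirty.compareAndSet(true, false)) {
            return false;
        }
        try {
            commitAction.run();
            return true;
        } catch (RuntimeException e) {
            dirty.set(true); // commit failed, so the changes are still pending
            throw e;
        }
    }

    /** Starts the background committer, e.g. start(5, TimeUnit.MINUTES). */
    public void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                commitIfDirty();
            } catch (RuntimeException e) {
                // Swallow so that later runs are not cancelled; real code should log this.
            }
        }, period, period, unit);
    }

    public void stop() {
        scheduler.shutdown();
    }
}
```

A single-threaded scheduler keeps commits from overlapping, and the dirty flag avoids a commit every period when nothing has changed.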

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
This is a better idea than what you had before, but I don't think there's any point in doing any commits manually at all unless you have a way of detecting and recovering exactly the data that hasn't been committed. In other words, what difference does it make whether you lost 1 index record or 1M,

Re: Concurrent Indexing

2014-06-20 Thread Umashanker, Srividhya
We do have a way to recover partially with a version number for each transaction. The same version is maintained in Lucene as one document. During startup these numbers define what has to be synced up. Unfortunately Lucene is used in a webapp, so this happens "only" during a Jetty restart. - Vidhy

Re: Concurrent Indexing

2014-06-20 Thread Vitaly Funstein
Hmm, I'm not sure you want to rely on the presence or absence of a particular document in the index to determine the recovery point. It may work for inserts, but not likely for updates or removes. I would look into driving the version numbers from the committer to the DB, and record them as commit u