RE: Optimize Lucene 4.4 for CPU usage

2013-08-30 Thread Rose, Stuart J
I've noticed that processes that were previously IO bound (in 3.5) are now CPU bound (in 4.4) and I expect it is due to the compression/decompression of term vector fields in 4.4. It would be nice if users of 4.4 could turn the compression OFF entirely. -Original Message- From: Iva

optimal way to access many TermVectors

2013-10-07 Thread Rose, Stuart J
Is there an optimal way to access many document TermVectors (in the same chunk) consecutively when using the LZ4 termvector compression? I'm curious to know whether all TermVectors in a single compressed chunk are decompressed and cached when one TermVector in the same chunk is accessed? Also w

updating docs when using SortedSetDocValuesFacetFields

2014-01-21 Thread Rose, Stuart J
I'm using Lucene 4.4 with SortedSetDocValuesFacetFields and would like to add and/or remove CategoryPaths for certain documents in the index. Basically, as additional sets of docs are added, the CategoryPaths for some of the previously indexed documents need to be changed. My current testing with

RE: BytesRef equals() method

2014-01-21 Thread Rose, Stuart J
I agree that comparing the BytesRef lengths in an equals() method seems counter to the purpose of having a BytesRef class. I'd recommend taking a look at the BytesRefHash which maps BytesRef objects to unique ids as it 'may' be more efficient than converting to Strings. Stuart -Original

RE: Only highlight terms that caused a search hit/match

2014-02-16 Thread Rose, Stuart J
Hi Steve, We leveraged the SpanQuery and Highlighting APIs in 3.5 a couple of years ago to do this. In order to get accurate doc hits for the types of phrases that we needed to support search on, we defined a phrase query syntax and then implemented a span query parser to create a nested struc

RE: Order docIds to reduce disk seeks

2014-11-17 Thread Rose, Stuart J
Hi Vijay, ...sorting the documents you need to retrieve by docID order first... means sorting them by their 'document number' which is the value in the 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve' the document from the index. If you write a comparator to sort the
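A minimal sketch of the docID-ordering idea described in this message, assuming the hits are held in an array. The `Hit` class is a hypothetical stand-in for Lucene's `ScoreDoc` (which exposes the document number as `scoreDoc.doc`) so that the example compiles without Lucene on the classpath; `DocIdOrder` and `sortedByDocId` are illustrative names, not part of any Lucene API.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocIdOrder {
    // Hypothetical stand-in for Lucene's ScoreDoc: 'doc' is the document number.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns a copy of the hits sorted by ascending document number,
    // leaving the original (score-ordered) array untouched.
    static Hit[] sortedByDocId(Hit[] hits) {
        Hit[] copy = Arrays.copyOf(hits, hits.length);
        Arrays.sort(copy, Comparator.comparingInt((Hit h) -> h.doc));
        return copy;
    }

    public static void main(String[] args) {
        // Hits as a search might return them: ordered by score, not by doc number.
        Hit[] byScore = { new Hit(42, 1.9f), new Hit(7, 1.5f), new Hit(19, 0.8f) };
        for (Hit h : sortedByDocId(byScore)) {
            // With Lucene, the stored document would be retrieved here by h.doc.
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Sorting a copy keeps the score-ordered array available for display, while the docID-ordered copy is the one walked when retrieving stored documents.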

Q on facets performance with CachingWrapperFilter in Lucene 4.4

2015-07-26 Thread Rose, Stuart J
I've noticed some interesting and unexpected behavior regarding performance of the Facets aggregation in Lucene 4.4 and am wondering if anyone has come across this before and can offer insight to potential factors. In a nutshell, the CachingWrapperFilter results in significant performance gain

retrieved doc field values being cached?

2012-02-24 Thread Rose, Stuart J
Lucene (using 3.5) seems to be caching field values for documents (after they have been retrieved) and I am hoping someone can provide more information on how and where exactly the field values are stored. The table below lists the times (in milliseconds) associated with retrieving for a set o

RE: retrieved doc field values being cached?

2012-02-24 Thread Rose, Stuart J
...warmup queries before you swap your search in to serve user queries to get best performance. hope that helps simon On Fri, Feb 24, 2012 at 10:18 PM, Rose, Stuart J wrote: > > Lucene (using 3.5) seems to be caching field values for documents (after they > have been retrieved) and I am hopi