Why we need org.apache.lucene.codecs.Codec

2016-08-03 Thread aravinth thangasami
Hi all,

Re: Why we need org.apache.lucene.codecs.Codec

2016-08-03 Thread aravinth thangasami
I don't understand why we need to add a custom codec name in this file. Thanks & Regards, Aravinth On Thu, Aug 4, 2016 at 11:52 AM, aravinth thangasami < aravinththangas...@gmail.com> wrote: > Hi all, > >

Clarification Regarding TieredMergePolicy

2016-09-27 Thread aravinth thangasami
Hi All, In TieredMergePolicy, once the sum of the segment sizes reaches *MaxMergedSegmentSize*, a merge is triggered. In my case, the first merge is triggered by *MaxMergedSegmentSize*. After that, on addition of one more segment to the index, the sum of the segment sizes may be greater than th

Re: Clarification Regarding TieredMergePolicy

2016-09-27 Thread aravinth thangasami
again, > unless it accumulates > 50% deleted documents. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Sep 27, 2016 at 4:03 PM, aravinth thangasami > wrote: > > Hi All, > > > > In TieredMergePolicy, After the sum of segments size r

Clarification Regarding Directory & Merging

2016-10-11 Thread aravinth thangasami
Hi all, Do the Directory implementations (SimpleFSDirectory, NIOFSDirectory, MMapDirectory) have any performance impact while indexing? If a Directory improves reading depending on the platform, will it have any impact on merging? Thanks, Aravinth

Clarification on using Docvalues

2017-02-27 Thread aravinth thangasami
Hi all, I'm trying to implement sorting using DocValues. As SortedDocValue is the equivalent of a pre-sorted BinaryDocvalue, and SortedNumericDocvalue is a pre-sorted version of NumericDocvalue, I'm able to sort on a NumericDocvalue field but not on a BinaryDocValue field. Why does Lucene allow sorting NumericDocValue

Best way to do time sort

2017-03-13 Thread aravinth thangasami
I am indexing a time field. I need to get the latest results, and my index contains millions of documents with frequent updates (almost 90% of the index is updated). Option 1: index the time as a string. Option 2: index the time field as a number. Option 3: index the time field as a DocValue. Wh

Docvalue - Sorting on Numeric Docvalue

2017-03-24 Thread aravinth thangasami
Hi all, I'm analysing sorting using doc values. Please correct me if I am wrong. 1. In SortedNumericDocvalueField, the sorting applies across the field, not across the segment. 2. While sorting, the SortedNumericSelector forms a NumericDocvalue from SortedSetDocvaluesField. So, I thought for a SinglevalueFiel

Indexing Numeric value in Lucene 4.10.4

2017-04-06 Thread aravinth thangasami
Hi all, I'm searching on a numeric value and will not perform range queries on that field. I thought of indexing it as a String field instead of a NumericField, so that indexing time improves by avoiding the numeric tries. What are your opinions on this? Kind regards, Aravinth

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-06 Thread aravinth thangasami
the index would be smaller with strings. > I'm certain comparisons would be slower. I really can't come up with > much of any reason why strings would be better. > > Not to mention that sorting won't work unless you left-pad with zeros. > > Best, > Erick &
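Erick's remark about left-padding can be illustrated with a small sketch. It uses only the Java standard library, not Lucene; the class and method names (PaddedSortDemo, pad, sortedAsStrings) are hypothetical and exist only for this illustration. It assumes non-negative values and a fixed width chosen by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: shows why numbers indexed as plain
// strings only sort in numeric order when left-padded to a fixed width.
final class PaddedSortDemo {

    // Left-pads a non-negative value with zeros up to the given width.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Returns a lexicographically sorted copy, which is how string terms compare.
    static List<String> sortedAsStrings(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Unpadded: string order differs from numeric order.
        System.out.println(sortedAsStrings(Arrays.asList("9", "10", "100")));
        // Padded to width 3: string order matches numeric order.
        System.out.println(sortedAsStrings(Arrays.asList(pad(9, 3), pad(10, 3), pad(100, 3))));
    }
}
```

With unpadded terms, "9" sorts after "100" because strings are compared character by character; once every term has the same width, string order and numeric order agree.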

Re: Indexing Numeric value in Lucene 4.10.4

2017-04-07 Thread aravinth thangasami
> > Uwe > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: aravinth thangasami [mailto:aravinththangas...@gmail.com] > > Sent: Friday, April 7, 2017 8:54 AM >

Early Termination of Queries

2017-04-18 Thread aravinth thangasami
Hi all, *EarlyTerminatingSortingCollector* in Lucene takes N documents from each segment. I have a case where the results from the latest segment alone will be enough. On finding N results in the latest segment, I will stop searching. What is your opinion on this? wi

Adding Docvalues to a Field

2017-05-05 Thread aravinth thangasami
Hi all, In the process of moving from Lucene 4 to Lucene 5, we faced the following issue. We have enabled doc values in Lucene 5; we previously did not use doc values in Lucene 4. Using UninvertingReader, sorting works fine until the first merge happens. On merge documents in the older version without d

Re: Adding Docvalues to a Field

2017-05-05 Thread aravinth thangasami
> > I'd create a new collection and re-index it entirely, then use > collection aliasing to point the applications at the new collection. > > Best, > Erick > > On Fri, May 5, 2017 at 2:49 AM, aravinth thangasami > wrote: > > Hi all, > > > > On process o

Re: Adding Docvalues to a Field

2017-05-05 Thread aravinth thangasami
onger needed. > > > > I hope that helps. I can post code that should do this. There is no > ready to > > use tool available, because you need to correctly configure the > uninverter. > > > > Uwe > > > > Am 5. Mai 2017 22:12:13 MESZ schrieb aravinth th

Lucene Grouping Search - Performance

2017-05-11 Thread aravinth thangasami
Hi all, While experimenting with Lucene group search in Lucene 4.10, we recorded better performance with the field cache than with doc values once the field cache had been built. So I decided to avoid doc values on that field. Our index involves 80% updates. How much will this affect the field cache? Is it

Improving Performance by Combining Multiple Fields into Single Field

2017-06-21 Thread aravinth thangasami
Hi all, We are experimenting with combining multiple fields into a single field and using it as a StoredField. While retrieving, instead of retrieving multiple times, we can do it with a single call. We thought of avoiding multiple disk calls for reading multiple fields. We have an index with million

Re: Improving Performance by Combining Multiple Fields into Single Field

2017-06-21 Thread aravinth thangasami
e call to > IndexReader.document with the list of fields that you want to retrieve > rather than call this method once for each field, or you will pay the price > for decompression every time. > > Le mer. 21 juin 2017 à 10:44, aravinth thangasami < > aravinththangas...@gmail.com> a

Re: Improving Performance by Combining Multiple Fields into Single Field

2017-06-21 Thread aravinth thangasami
Hi, Reading through the web, I found that Elasticsearch's *_source* field stores the entire document and that *_source* is used for field retrieval. Is that better than *document.get* or loading the entire *indexreader.document*? Thanks, Aravinth On Wed, Jun 21, 2017 at 10:18 PM, aravinth thang

Clarification on Multiple calls to Off heap Memory & on combining storedFields

2017-06-28 Thread aravinth thangasami
Hi all, Our search system is built on top of Lucene 4.10. We have multiple indexes on a single machine, each index holding millions of documents with about 500 to 1000 fields each. We are using Lucene40StoredFieldsFormat to avoid compression of stored fields; it seems to give us better performa

Encryption At Rest - Using CustomAnalyzer

2017-12-04 Thread aravinth thangasami
Hi all, To support encryption at rest, we have written a custom analyzer that encrypts every token in the input string and then proceeds to the default indexing chain. We are using AES/CTR/NoPadding with a unique key per user. This helps that the input string with common prefix, the encrypted strings wi
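A minimal sketch of one property of AES/CTR/NoPadding that may bear on the question above. It assumes the same key and the same IV are reused for every token, which the post does not state; under that assumption, two tokens that share a plaintext prefix also share a ciphertext prefix of the same length. The class and method names (CtrPrefixDemo, encrypt, commonPrefixLength) are hypothetical, the all-zero key and IV are placeholders for the demo only, and nothing here is the poster's analyzer code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: encrypts single tokens with AES/CTR/NoPadding.
final class CtrPrefixDemo {

    // Encrypts one token. Assumption: a fixed key and IV per user, so the
    // keystream is identical for every token encrypted for that user.
    static byte[] encrypt(byte[] key, byte[] iv, String token) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts how many leading bytes two arrays have in common.
    static int commonPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // placeholder 128-bit key, demo only
        byte[] iv = new byte[16];  // placeholder IV, reused on purpose
        byte[] first = encrypt(key, iv, "lucene");
        byte[] second = encrypt(key, iv, "lucent");
        // The two tokens share five leading characters.
        System.out.println(commonPrefixLength(first, second));
    }
}
```

Because CTR mode XORs the plaintext with a keystream that depends only on the key and IV, reusing both keeps prefixes aligned but also reveals the XOR of any two plaintexts to anyone who can read the ciphertexts.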

Re: Encryption At Rest - Using CustomAnalyzer

2018-02-05 Thread aravinth thangasami
Kindly post your suggestions. On Mon, Dec 4, 2017 at 11:27 PM, aravinth thangasami < aravinththangas...@gmail.com> wrote: > Hi all, > > To support Encryption at Rest, We have written a custom analyzer, that > encrypts every token in the Input string and proceeds to the defaul

Getting Matched Text and Field for a Document

2019-09-20 Thread aravinth thangasami
Dear all, Along with the search results, we are trying to find which text and field produced each result. For that, we have written a custom collector that collects the query from the scorer along with the ScoreDoc. Please let us know whether there is anything wrong with, or any limitation in, our approach. Are there any