that!
>
> Patrick
>
> Ravikumar Govindarajan wrote on Mon, May 24, 2021 at 11:49 AM:
>
> > Thanks Patrick for the help!
> >
> > May I know what lucene version you're using?
> >
> > We are using
> The current default directory implementation is
> MMapDirectory, which delegates the caching to the system and should have
> already optimized for this situation. Here's a great blog explaining the
> MMapDirectory in lucene:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
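What MMapDirectory relies on is plain java.nio memory mapping: the file's bytes live in the OS page cache rather than on the Java heap. A stdlib-only sketch of that mechanism (class name and temp file are illustrative, not Lucene code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Memory-map a file read-only; the OS page cache, not the JVM heap,
    // holds the bytes -- the behaviour MMapDirectory delegates to.
    static MappedByteBuffer map(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // the mapping stays valid after the channel is closed
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Write a tiny "segment file", map it, and read a byte from the mapping.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("seg", ".dat");
            Files.write(tmp, new byte[]{1, 2, 3, 4});
            MappedByteBuffer buf = map(tmp);
            int value = buf.get(2); // served straight from the mapped pages
            Files.deleteIfExists(tmp);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```

Lucene's ByteBufferIndexInput reads from buffers obtained this way, which is why no extra JVM-side cache is needed on top.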
, including mocking LiveDocs to get
> the right documents into the right segments.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, May 22, 2021 at 3:50 PM Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
>> Hello
Hello,
We have a use-case for index-rewrite on a "frozen index" where no new
documents are added. It goes like this..
1. Get all segments for the index (base-segment-list)
2. Create a new segment from base-segment-list with unique set of docs
(LiveDocs)
3. Repeat step 2, for a fixed c
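The LiveDocs selection in step 2 boils down to keeping each document live only in the newest segment that contains its key. A stdlib-only sketch of that dedup logic (lists of string keys stand in for real segments; this is not the Lucene LiveDocs API):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LiveDocsSketch {
    // Walk segments newest-first: a doc key stays "live" only in the newest
    // segment containing it; older duplicates are masked out, which is the
    // effect step 2 gets from its LiveDocs filtering.
    static List<Set<String>> liveDocs(List<List<String>> segments) {
        Set<String> seen = new HashSet<>();
        List<Set<String>> live = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) live.add(new LinkedHashSet<>());
        for (int i = segments.size() - 1; i >= 0; i--)   // newest segment is last
            for (String key : segments.get(i))
                if (seen.add(key)) live.get(i).add(key);
        return live;
    }

    public static void main(String[] args) {
        // "b" appears in both segments, so only the newer copy stays live
        System.out.println(liveDocs(List.of(List.of("a", "b"), List.of("b", "c"))));
        // [[a], [b, c]]
    }
}
```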
> write a similarity wrapper that will read the needed information from
> a hash map.
>
> Regards
> Ameer
>
>
>
> On Wed, 4 Dec 2019 at 00:55, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > >
> > > it is enough to giv
>
> it is enough to give each its own field.
>
I kind of over-simplified the problem at hand. Apologies.
DOC_TYPE is just one aspect of the problem. The other is that it is
actually a shared index with multiple users (100-3000 users per
index). There are many hundreds of such shared
Hello,
We are using TF-IDF for scoring (yet to migrate to BM25). Different
entities (DOC_TYPES) are crunched & stored together in a single index.
When it comes to IDF, I find that there is a single value computed across
documents & stored as part of TermStats, whereas our documents are not
homogeneous
t than
> FeatureField but would allow sorting in either ascending or descending
> order.
>
>
>
> On Tue, Jul 2, 2019 at 3:01 PM Ravikumar Govindarajan
> wrote:
> >
> > Our Sort Fields utilize DocValues..
> >
> > Lets say I collect min-max ords of a Sort F
Our Sort Fields utilize DocValues..
Let's say I collect min-max ords of a Sort Field for a block of documents
(128, 256 etc..) at index-time via Codec & store it as part of DocValues at
a Segment level..
During query time, could we take advantage of this Stats when Top-N query
with Sort Field is r
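The pruning this enables can be sketched in plain Java (the Block record is a hypothetical stand-in for the per-block stats stored in DocValues; this is not the Lucene API): for a descending Top-N sort, any block whose recorded maxOrd cannot beat the current N-th best value is skipped without visiting its documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BlockStatsSketch {
    // Per-block [minOrd, maxOrd] recorded at index time, plus the doc ords.
    record Block(int minOrd, int maxOrd, int[] ords) {}

    // Descending Top-N by sort ord: skip a block whose maxOrd is not above
    // the current N-th best -- none of its docs can enter the results.
    static List<Integer> topN(List<Block> blocks, int n) {
        PriorityQueue<Integer> best = new PriorityQueue<>(); // min-heap, size <= n
        for (Block b : blocks) {
            if (best.size() == n && b.maxOrd() <= best.peek()) continue; // prune
            for (int ord : b.ords()) {
                if (best.size() < n) best.add(ord);
                else if (ord > best.peek()) { best.poll(); best.add(ord); }
            }
        }
        List<Integer> out = new ArrayList<>(best);
        out.sort(Comparator.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(1, 9, new int[]{9, 1, 5}),
            new Block(2, 4, new int[]{2, 4, 3})); // pruned once the heap holds {9, 5}
        System.out.println(topN(blocks, 2)); // [9, 5]
    }
}
```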
You can know exactly which ops made it into your
> commit and which didn't.
>
> TrackingIndexWriter is replaced by the sequence numbers.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Aug 10, 2017 at 9:37 AM, Ravikumar Govindarajan <
> ravikum
> On Thu, Aug 10, 2017 at 6:57 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > Every mutation (Add/Update/Delete) has a transaction-id (incremental
> long)
> > assigned by our Messaging Queue (Kafka)
> >
> > To index these mutati
Every mutation (Add/Update/Delete) has a transaction-id (incremental long)
assigned by our Messaging Queue (Kafka)
To index these mutations, an indexer thread pulls data from the queue, adds
& commits to IndexWriter, then updates the latest transaction-id in an
external system (ZooKeeper). During
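The recovery logic this setup enables: after a crash, re-read the last committed transaction-id from the external system and skip every mutation at or below it, keeping replay idempotent. A stdlib-only sketch (names are illustrative, not Kafka or ZooKeeper API):

```java
import java.util.ArrayList;
import java.util.List;

public class ReplaySketch {
    // The externally recorded transaction-id marks what is already in the
    // committed index; only ids above it still need (re-)indexing.
    static List<Long> toReplay(List<Long> queue, long lastCommitted) {
        List<Long> pending = new ArrayList<>();
        for (long txnId : queue)
            if (txnId > lastCommitted) pending.add(txnId);
        return pending;
    }

    public static void main(String[] args) {
        // ids 1..5 were pulled from the queue; the commit record says 3
        System.out.println(toReplay(List.of(1L, 2L, 3L, 4L, 5L), 3L)); // [4, 5]
    }
}
```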
>
> Let’s say I have a user info index and user id is the ‘primary key’. So
> when I do a userid term search, will lucene traverse all segments to search
> a 'primary key' term or will it stop as soon as it gets one?
Lucene in general will search all segments for a primary key. But in case you
want a
> something like this:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Kind regards,
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
When we use NIOFSDirectory, lucene internally uses buffering via
BufferedIndexInput (1KB etc...) while reading from the file..
However, for MmapDirectory (ByteBufferIndexInput) there is no such
buffering & data is read from the mapped bytes directly...
Will it be too much of a performance drag if
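What the 1KB buffer saves can be seen with a stdlib-only model of BufferedIndexInput's behaviour (the class is illustrative, not Lucene code): thousands of one-byte reads collapse into a handful of large reads against the underlying file.

```java
public class BufferedReadSketch {
    // Mimics what BufferedIndexInput adds on top of a raw file: byte-at-a-time
    // reads are served from a 1KB buffer, so the underlying source only sees
    // a few large reads instead of thousands of tiny ones.
    static int[] readAllBytes(byte[] file) {
        int[] underlyingReads = {0};            // how often the "file" is hit
        byte[] buffer = new byte[1024];
        int buffered = 0, bufPos = 0, filePos = 0, sum = 0;
        for (int i = 0; i < file.length; i++) { // one-byte reads, as a consumer would
            if (bufPos == buffered) {           // buffer exhausted: one big refill
                buffered = Math.min(buffer.length, file.length - filePos);
                System.arraycopy(file, filePos, buffer, 0, buffered);
                filePos += buffered;
                bufPos = 0;
                underlyingReads[0]++;
            }
            sum += buffer[bufPos++];
        }
        return new int[]{sum, underlyingReads[0]};
    }

    public static void main(String[] args) {
        byte[] file = new byte[4096];
        java.util.Arrays.fill(file, (byte) 1);
        int[] r = readAllBytes(file);
        System.out.println(r[0] + " bytes read, " + r[1] + " underlying reads");
        // 4096 bytes read, 4 underlying reads
    }
}
```

Whether the missing buffer hurts MMapDirectory in practice is a different question, since small reads there hit already-mapped memory rather than going through a syscall each time.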
8, 2016 at 1:41 PM, Robert Muir wrote:
> Can you run checkindex and include the output?
>
> On Mon, Aug 8, 2016 at 2:36 AM, Ravikumar Govindarajan
> wrote:
> > For some of the segments we received the following exception during merge
> > as well as search. They look to be
For some of the segments we received the following exception during merge
as well as search. They look to be corrupt [Lucene 4.6.1 & Sun JDK
1.7.0_55]
Is this a known bug? Any help is much appreciated
The offending line of code is in ForUtil.readBlock() method...
final int encodedSize = encode
Came across a JIRA filed for pooling IndexReaders
https://issues.apache.org/jira/browse/LUCENE-2297
For every commit/delete/update cycle, IndexWriter opens a bunch of
SegmentReaders, does the job & closes them.
Does the JIRA aim to re-use the SegmentReaders across commit-cycles till
they are fina
385)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1374)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
at
org.apache.blur.store.hdfs.HdfsIndexInput.readInternal(HdfsIndexInput.java:62)
On Tue, May 10, 2016 at 11:32 AM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:
Sometimes during an ongoing search we receive an IndexReaderClosedException
& found that it happens when a merge is completed.
We are on an older version of lucene (4.6.1).
IndexFileDeleter (KeepOnlyLastCommitDeletionPolicy) deletes the file after
the merge completes but we have an open IndexSear
On lucene-4.6.1, is there a way to specify during search that only docs need
to be iterated/searched and frequencies skipped…
I saw DocsEnum.FLAG_NONE meant for this, but could not find out how to pass
this via a search query…
My assumption is that skipping frequencies could speed up search
; and it would be interesting to see how it would affect multi-term
> queries compared to lz4 blocks.
>
> [1] https://en.wikipedia.org/wiki/Byte_pair_encoding
>
> On Fri, Jul 3, 2015 at 12:09 PM, Ravikumar Govindarajan
> wrote:
> > An unrelated question…
> >
behave well?
Currently we don't have plans to provide queries like Fuzzy/Re-spell
etc., but thought we could benefit from it
On Thu, Jul 2, 2015 at 6:02 PM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:
> Thanks Adrien…
>
> Works like a charm!!!
>
>
Thanks Adrien…
Works like a charm!!!
On Wed, Jul 1, 2015 at 10:22 PM, Adrien Grand wrote:
> Hi Ravikumar,
>
> You need to run a BooleanQuery with two clauses:
> - a must clause that matches all parent documents
> - a must_not clause that matches all parents that have children
>
> Building thi
We have organised our segments in parent-child blocks and wish to
periodically delete parent-documents that don't have any children to
reclaim space via IndexWriter.deleteDocuments(Query)…
Is it possible to draft a Query that identifies such parents? Any help is
much appreciated…
--
Ravi
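Over plain bitsets, the two clauses Adrien describes reduce to: a parent is deletable when no doc IDs sit between it and the previous parent (in a block index, children occupy the IDs immediately before their parent). A stdlib-only sketch of that selection, not a Lucene Query:

```java
import java.util.BitSet;

public class ChildlessParentsSketch {
    // MUST match all parents, MUST_NOT match parents that have children:
    // with children-before-parent blocks, a parent is childless exactly
    // when the previous parent sits at the immediately preceding doc ID.
    static BitSet childlessParents(BitSet parents, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (int p = parents.nextSetBit(0); p >= 0; p = parents.nextSetBit(p + 1)) {
            int prevParent = p == 0 ? -1 : parents.previousSetBit(p - 1);
            if (p == 0 || prevParent == p - 1) result.set(p); // no docs in between
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        // docs 0,1 are children of parent 2; parent 3 has no children
        parents.set(2);
        parents.set(3);
        System.out.println(childlessParents(parents, 4)); // {3}
    }
}
```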
Apr 28, 2015 at 6:03 PM, Adrien Grand wrote:
> On Tue, Apr 21, 2015 at 10:00 AM, Ravikumar Govindarajan
> wrote:
> > Thanks for the comments…
> >
> > My only
> >> concern about using the FixedBitSet is that it would make sorting each
> >> postin
Thanks. Glad that it has been pro-actively identified and fixed
--
Ravi
On Thu, Apr 23, 2015 at 10:34 AM, Robert Muir wrote:
> On Tue, Apr 21, 2015 at 4:00 AM, Ravikumar Govindarajan
> wrote:
>
> > b) CompressingStoredFieldsReader did not store the last decoded 32KB
> chunk
> assume you are still on 4.x)?
>
> I'm curious if you already performed any kind of benchmarking of this
> approach?
>
>
> On Tue, Apr 14, 2015 at 2:07 PM, Ravikumar Govindarajan
> wrote:
> > We were experimenting with SortingMergePolicy and came across an
>
We were experimenting with SortingMergePolicy and came across an alternate
solution to TimSort of postings-list using FBS & GrowableWriter.
I have attached the relevant code snippet. It would be nice if someone could
clarify whether it is a good idea to implement...
public class SortingAtomicReader {
…
wrote:
> Sounds like a job for
> org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
>
>
> --
> Ian.
>
>
> On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
> wrote:
> > We have a requirement in that E-mail addresses need to be added in a
>
We have a requirement in which E-mail addresses need to be added in
tokenized form to one field, while the untokenized form is added to another field.
Ex:
"I have mailed a...@xyz.com". It should tokenize as below:
body = {"I", "have", "mailed", "abc", "xyz", "com"};
I also have a body-addr field. To
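The split the example expects can be sketched with a plain regex (a real setup would use a per-field analyzer, as in the PerFieldAnalyzerWrapper reply above; this is not an Analyzer implementation):

```java
import java.util.Arrays;
import java.util.List;

public class EmailTokenizeSketch {
    // What a letter/digit tokenizer does to the example sentence: the
    // address splits into its parts for the tokenized field, while the
    // untokenized field would keep the whole address as a single term.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.split("[^A-Za-z0-9]+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I have mailed abc@xyz.com"));
        // [I, have, mailed, abc, xyz, com]
    }
}
```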
s...
We switched it to write using ForUtil even if block-size<128 and perf was
much better and predictable.
Are there any particular reasons for taking the VInt approach?
Any help on this issue is appreciated
--
Ravi
On Tue, Nov 18, 2014 at 12:49 PM, Ravikumar Govindarajan <
ravikumar.govindar
Hi,
I am finding that lucene slows down a lot when bigger and bigger
doc/pos files are merged... While that is normally expected, the worrying part
is that all my data is in RAM. Version is 4.6.1
Some sample statistics taken after instrumenting the SortingAtomicReader
code, as we use a SortingMergePo
Sometimes TCBJQ returns parent-doc itself as a child-doc. I traced it down
to the following code...
public int advance(int childTarget) throws IOException {
...
final int firstChild = parentBits.prevSetBit(parentDoc-1);
//System.out.println(" firstChild=" + firstChild);
childTarget = Math
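The suspicious arithmetic can be reproduced over java.util.BitSet (assuming the usual children-before-parent block layout; this is a sketch, not the TCBJQ source): for a parent with no children, prevSetBit(parentDoc-1)+1 lands on the parent itself, which matches the reported behaviour.

```java
import java.util.BitSet;

public class FirstChildSketch {
    // The computation from the snippet above: the first child of a parent
    // is the doc immediately after the previous parent bit.
    static int firstChild(BitSet parentBits, int parentDoc) {
        return parentBits.previousSetBit(parentDoc - 1) + 1;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(7); // children of parent 7 are docs 4..6
        System.out.println(firstChild(parents, 7)); // 4

        // a parent with no children: firstChild == parentDoc itself,
        // which is how the parent can leak out as a "child"
        parents.set(8);
        System.out.println(firstChild(parents, 8)); // 8
    }
}
```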
FYI, there is SirenDB on top of lucene that addresses such concerns...
It supports multi-level parent-child relationships and provides nice
querying capabilities...
--
Ravi
On Thu, Jul 31, 2014 at 12:59 PM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:
> We are pl
We are planning to use block-indexing and ToChildBlockJoin queries...
Each parent-doc can contain anywhere between 1-2000 children-docs and is
highly variable.
Sample user-stats are as follows:
1. No. of parent-docs = 500K
2. Children per parent = 50
3. Total docs = 25 Million
4. Size occupied
>
>
>
> On Thu, Jul 3, 2014 at 3:22 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com > wrote:
>
> > In case of sorting, updatable DocValues may be what you are looking for.
> >
> > But updatable fields for searching is a different beast.
> >
>
In case of sorting, updatable DocValues may be what you are looking for.
But updatable fields for searching is a different beast.
A sample approach is documented at
http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/
The general problems with upd
d values for a few documents,
> - doc values when loading a few field values for many documents.
Thanks for this clarification. Shall surely move towards doc-values...
--
Ravi
On Mon, Jun 23, 2014 at 5:36 PM, Adrien Grand wrote:
> On Sun, Jun 22, 2014 at 6:44 PM, Ravikumar Govindarajan
>
m seek. [
http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1
]
If so, then what could make DocValues still a winner?
--
Ravi
On Sat, Jun 21, 2014 at 6:41 PM, Adrien Grand wrote:
> Hi Ravikumar,
>
> On Fri, Jun 20, 2014 at 12:14 PM, Ravikumar Govindarajan
>
I was planning to use ETSC in conjunction with SortingMergePolicy and got
stuck.
In ETSC, we have:
@Override
public void collect(int doc) throws IOException {
  in.collect(doc);
  if (++numCollected >= numDocsToCollect) {
    throw new CollectionTerminatedException();
  }
}
I und
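The termination pattern in that snippet, isolated in plain Java (names are hypothetical; a local exception stands in for CollectionTerminatedException, which the search loop treats as normal completion):

```java
public class EarlyTerminationSketch {
    // Thrown by collect() to stop the scan early, as ETSC does.
    static class Terminated extends RuntimeException {}

    static int collected(int[] docs, int numDocsToCollect) {
        int numCollected = 0;
        try {
            for (int doc : docs) {
                // the wrapped collector's in.collect(doc) would run here
                if (++numCollected >= numDocsToCollect) throw new Terminated();
            }
        } catch (Terminated e) {
            // caller treats early termination as normal completion
        }
        return numCollected;
    }

    public static void main(String[] args) {
        System.out.println(collected(new int[]{1, 2, 3, 4, 5}, 3)); // 3
    }
}
```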
t to map them to 4 GLOBALLY SORTED documents.
> If
> > you make a local decision based on these 4 documents, you will end up w/
> a
> > completely messed up segment.
> >
> > I think the global DocMap is really required. Forget about that that
> other
> > co
ap, like Lucene code does ...
>
> If I miss your point, I'd appreciate if you can point me to a code example,
> preferably in Lucene source, which demonstrates the problem.
>
> Shai
>
>
> On Tue, Jun 17, 2014 at 3:03 PM, Ravikumar Govindarajan <
> ravikumar.govin
.8, and now
> you don't really need to implement a Sorter, but rather pass a SortField,
> if that works for you.
>
> Shai
>
>
> On Tue, Jun 17, 2014 at 9:41 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > Shai,
>
.
>
> Shai
>
>
> On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > I am planning to use SortingMergePolicy where all the merge-participating
> > segments are already sorted... I understand that I need to d
I am planning to use SortingMergePolicy where all the merge-participating
segments are already sorted... I understand that I need to define a DocMap
with old-new doc-id mappings.
Is it possible to avoid the eager loading of DocMap and make it
lazy-load on demand?
Ex: Pass List to the c
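One way to read "lazy" here: build the forward old-to-new map eagerly (the merge needs it anyway) and materialize the inverse only on first use. A stdlib-only, array-backed sketch (illustrative, not the Lucene DocMap class):

```java
public class DocMapSketch {
    // Forward old->new mapping, built once up front.
    final int[] oldToNew;
    // Inverse new->old mapping, filled in lazily on first access.
    int[] newToOld;

    DocMapSketch(int[] oldToNew) {
        this.oldToNew = oldToNew;
    }

    int oldDoc(int newDoc) {
        if (newToOld == null) {                 // build the inverse on demand
            newToOld = new int[oldToNew.length];
            for (int old = 0; old < oldToNew.length; old++)
                newToOld[oldToNew[old]] = old;
        }
        return newToOld[newDoc];
    }

    public static void main(String[] args) {
        // old 0 -> new 2, old 1 -> new 0, old 2 -> new 1
        DocMapSketch map = new DocMapSketch(new int[]{2, 0, 1});
        System.out.println(map.oldDoc(2)); // 0
    }
}
```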
McCandless <
luc...@mikemccandless.com> wrote:
> On Wed, May 21, 2014 at 10:50 AM, Ravikumar Govindarajan
> wrote:
> >>
> >> But does that mean SEQUENTIAL will evict the
> >> page once we're done reading it?
> >
> >
> > Yes, looks like it do
..
There are also sneaky ways to
> invoke some of these OS-level APIs without using JNI
This is cool stuff... Saves an amazing amount of effort for most of the
things...
--
Ravi
On Wed, May 21, 2014 at 7:13 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Wed, May
ecause then the OS knows to read
> ahead and aggressively free the page once we are done using it.
>
> There is also O_DIRECT (e.g., using NativeUnixDirectory) for direct IO
> to bypass the buffer cache entirely.
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
Is it a good idea to use FADVISE_DONTNEED/MADVISE_DONTNEED flags during
segment merge reads?
The Buffer-Cache contains critical data belonging to searches. A segment-merge
has the potential to disturb that cache, no?
--
Ravi
Hi,
I have a few questions related to the updatable DocValues API... It would be
great if I could get help.
1. Is it possible to provide updateNumericDocValue(Term term,
Map), in case I wish to update multiple fields and their
doc-values?
2. Instead of a "Term" based update, is it possible to extend it to
I was just trying to implement a StoredFieldsWriter [4.6.1] and found that
the finishDocument() method has an empty impl. Any reason for not declaring it
abstract? We could easily miss overriding it
--
Ravi
eIndexMergePolicy.html You just have to (anonymously) subclass
> > > UpgradeIndexMergePolicy and return true from "protected boolean
> > shouldUpgradeSegment(SegmentCommitInfo si)" only for the segment to
> > be merged. By default this returns true for segments t
Hi,
Is it possible to merge a single segment all by itself, maybe just
accounting for deletes alone?
This is needed so as to solve certain data-locality issues we face in a
custom implementation of Directory API.
--
Ravi
Thanks Mike for your time and help
On Monday, February 17, 2014, Michael McCandless
wrote:
> On Mon, Feb 17, 2014 at 8:33 AM, Ravikumar Govindarajan
> > wrote:
> >>
> >> Well, this will change your scores? MultiReader will sum up all term
> >> statistics
>
> Well, this will change your scores? MultiReader will sum up all term
> statistics across all SegmentReaders "up front", and then scoring per
> segment will use those top-level weights.
Our app needs to do only matching and sorting. In fact, it would be fully
OK to bypass scoring. But I feel
balanced and your indexing
> performance will degrade because of unbalanced amount of IO that happens
> during the merge.
>
> Shai
>
>
> On Thu, Feb 13, 2014 at 7:25 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > @Mike,
> >
> adjacent
> segments and SortingMP ensures the merged segment is also sorted.
>
> Shai
>
>
> On Wed, Feb 12, 2014 at 3:16 PM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > Yes exactly as you have described.
> >
> > Ex: Consider S
then searched
> in "reverse segment order"?
>
> I think you should be able to do this w/ SortingMergePolicy? And then
> use a custom collector that stops after you've gone back enough in
> time for a given search.
>
> Mike McCandless
>
> http://blog.mikemc
why you need to encourage merging of the more
> recent (by your "time" field) segments...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Feb 7, 2014 at 8:18 AM, Ravikumar Govindarajan
> wrote:
> > Mike,
> >
> IndexWriter's infoStream
> and do a long running test to convince yourself the merging is being
> sane.
>
> Mike
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Feb 6, 2014 at 11:24 PM, Ravikumar Govindarajan
> wrote:
> > Thanks Mike,
> >
t has improved, so that
> you can e.g. pull your own TermsEnum and iterate the terms yourself.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Feb 6, 2014 at 5:16 AM, Ravikumar Govindarajan
> wrote:
> > I use a Codec to flush data. All methods delegat
I use a Codec to flush data. All methods delegate to the actual Lucene42Codec,
except for intercepting a single field. This field is indexed as an
IntField [Numeric-Trie...], with precisionStep=4.
The purpose of the Codec is as follows
1. Note the first BytesRef for this field
2. During finish() ca
..@mikemccandless.com> wrote:
> On Wed, Dec 18, 2013 at 3:15 AM, Ravikumar Govindarajan
> wrote:
> > Thanks Mike for a great explanation on Flush IOException
>
> You're welcome!
>
> > I was thinking on the perspective of a HDFSDirectory. In addition to the
>
d for handling
momentary IOExceptions
--
Ravi
On Tue, Dec 17, 2013 at 9:14 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Mon, Dec 16, 2013 at 7:33 AM, Ravikumar Govindarajan
> wrote:
> > I am trying to model a transaction-log for lucene, which creates a
> >
I am trying to model a transaction-log for lucene, which creates a
transaction-log per commit.
Things work fine during normal operations, but I cannot fathom the effect
during:
a. IOException during Index-Commit
Will the index be restored to the previous commit-point? Can I blindly re-try
operations f
I am trying to find an optimal way of merging two already sorted segments
and need some help here...
My use-case is this:
Segment1 [Segment already sorted by Field "F1"]
Fields: F1, F2, F3
List
Fields: C1,
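Since both segments are already sorted by F1, the merge itself can be a single linear two-way pass instead of a re-sort of the concatenation. A stdlib-only sketch over integer sort keys (illustrative; a real merge carries the documents' fields along):

```java
import java.util.ArrayList;
import java.util.List;

public class SortedMergeSketch {
    // Classic two-way merge: repeatedly take the smaller head of the two
    // already-sorted runs, then drain whichever run is left.
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size())
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(1, 4, 7), List.of(2, 3, 9)));
        // [1, 2, 3, 4, 7, 9]
    }
}
```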
signing terms to blocks, but to build the trie terms
> index it builds a separate FST, by adding in each block's prefix (it
> doesn't use the FST's builder pruning to create the trie).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri,
this to build a prefix trie instead of the full FST.
>
> Creating a custom tail freezer is very expert: it lets you implement
> arbitrary logic on which nodes are pruned or not.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 15, 2013 at 12:16
I was trying to understand some logic in Builder class of FST.
The method freezeTail() looks quite hairy. I gather that there is some
logic for pruning a node or compiling it.
What exactly is pruning a node? An example of it would be really
helpful
--
Ravi
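As a rough intuition for the pruning question above (a stdlib-only illustration, not the Builder algorithm): if a node is kept only when enough terms pass through it, the surviving nodes form a prefix trie rather than the full FST — rare suffixes get cut off.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PruneSketch {
    // Count how many terms pass through each prefix (trie node), then keep
    // only prefixes whose count reaches the threshold -- the pruned result
    // is a prefix trie of the "popular" paths.
    static Set<String> keptPrefixes(List<String> terms, int minCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : terms)
            for (int i = 1; i <= t.length(); i++)
                counts.merge(t.substring(0, i), 1, Integer::sum);
        Set<String> kept = new TreeSet<>();
        counts.forEach((prefix, c) -> { if (c >= minCount) kept.add(prefix); });
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keptPrefixes(List.of("car", "cat", "dog"), 2));
        // only "c" and "ca" are shared by at least 2 terms
    }
}
```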
Thanks Mike. Explicit type-cast to SegmentReader will do the trick for the
moment.
--
Ravi
On Fri, Nov 8, 2013 at 6:17 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Fri, Nov 8, 2013 at 12:22 AM, Ravikumar Govindarajan
> wrote:
> >> So, in your code, "
> wrote:
>
> On Thu, Nov 7, 2013 at 12:18 PM, Ravikumar Govindarajan
> wrote:
> > Thanks Mike.
> >
> > If you look at my impl, I am using the getCoreCacheKey() only, but keyed
> > on a ReaderClosedListener and purging it onClose(). When NRT does
reopens,
> >
> look at liveDocs "live" and fold them in, instead of
> regenerating the whole cache entry.
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Nov 7, 2013 at 8:04 AM, Ravikumar Govindarajan
> > wrote:
> > Thanks Mike. Can you hel
(returned by IndexReader.leaves()), to play well with NRT.
>
> Typically you'd do so in a context that already sees each leaf, like a
> custom Filter or a Collector.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Nov 7, 2013 at 1:33 AM, Ravikumar
I am trying to cache a BitSet by attaching to IndexReader.addCloseListener,
using the getCoreCacheKey()
But, I find that getCoreCacheKey() returns the IndexReader object itself as
the key.
Whenever the IndexReader re-opens via NRT because of deletes, will it mean
that my cache will be purged, bec
Hi,
Currently we merge 2 indexes using iw.addIndexes(idxReaders), where the
same call will be made in batches of 10 readers
Our requirement is to make this addIndexes call consistent. That is, during
merge-time, searches using a MultiReader should not return duplicate
documents [docs currently
TermFirstPassGroupingCollector loads all terms for a given group-by field,
through FieldCache.
Is it possible to instruct the class to group only pruned terms of a field,
based on a user-supplied query [RangeQuery, TermQuery etc...]
This way, only pruned terms are grouped and all others are ignor
We have a system where N users are tied to a particular lucene
index. Sort of a "Shared Index".
But each of the N users can have their own personalized fields.
Ex: Every E-mail to lucene mailing-list is a document
Every user part of this mailing-list has his own set of labels for
tha
tests
--
Ravi
On Tue, May 14, 2013 at 3:31 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Tue, May 14, 2013 at 3:03 AM, Ravikumar Govindarajan
> wrote:
> > We ran the checkIndex and a simple test case. It passes. Actually, I had
> > assumed problem wit
CheckIndex on the index produced by the code below,
> how many terms/freqs/positions does it report?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, May 13, 2013 at 9:25 AM, Ravikumar Govindarajan
> wrote:
> >
9 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> It should not be 0, as long as TermsEnum.next() does not return null
> ... can you make a small test case? Thanks.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, May 10, 2013 at 8:26 AM, Ra
Fri, May 10, 2013 at 5:54 PM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:
> We have the following code
>
> SegmentInfos segments = new SegmentInfos();
> segments.read(luceneDir);
> for(SegmentInfoPerCommit sipc: segments)
> {
> String name = sipc
We have the following code
SegmentInfos segments = new SegmentInfos();
segments.read(luceneDir);
for(SegmentInfoPerCommit sipc: segments)
{
String name = sipc.info.name;
SegmentReader reader = new SegmentReader(sipc, 1, new IOContext());
Terms terms = reader.terms("content");
TermsEnum tEnum = t
The stacked-updates issue in the link mentioned
(https://issues.apache.org/jira/browse/LUCENE-4258) handles FieldUpdates only
for "new incoming values".
In our case, all fields that are updated are, by default StoredFields.
Currently StackedTermsEnum looks too costly on computing Term Stats. Is
Thanks Robert for the quick response. Saved my day!!!
--
Ravi
On Fri, Apr 19, 2013 at 10:45 PM, Robert Muir wrote:
> Its a bug: its already fixed for 4.3 (coming soon):
>
> https://issues.apache.org/jira/browse/LUCENE-4888
>
> On Fri, Apr 19, 2013 at 1:09 PM, Ravikum
When writing a custom codec, I encountered an issue in SloppyPhraseScorer.
I am using lucene-4.2 GA.
public int nextDoc() {
  return advance(max.doc);
}
This in-turn calls my DocsAndPositionEnum.advance(int target).
Initially this seems to call with advance(-1). It's kind of unsettling to
see an i
Most of us writing a custom codec use the segment-name as a handle and push
data to a different storage.
Would it be possible to get a hook in the codec APIs for when obsolete segment
files are cleaned up after merges?
Currently, this is always implemented as a hack.
--
Ravi
't use the segments doc count.
>
> hope that helps
>
> simon
>
> On Wed, Mar 20, 2013 at 1:12 PM, Ravikumar Govindarajan
> wrote:
> > This is internal code I came across in lucene today and am unable to
> > decipher it.
> >
> > FreqProxTermsWriterPer
ion sorting, then it should be easy
> to reverse the doc orders in each segment, using something like
> IndexSorter.
>
> Shai
>
> On Wed, Nov 21, 2012 at 8:03 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > Hi Shai,
> >
> > I wou
waste a
> > lot of storage
> >
> > The default merge policy will merge adjacent segments no? Is it going to
> > disturb the ordering?
> >
> > --
> > Ravi
> >
> > On Tue, Nov 20, 2012 at 5:19 PM, Michael McCandless <
> > luc...@mikemccandless.com> w
could waste a
lot of storage
The default merge policy will merge adjacent segments, no? Is it going to
disturb the ordering?
--
Ravi
On Tue, Nov 20, 2012 at 5:19 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan
> wr
> requests out to other shards, gather the results, call the merge, etc.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Nov 16, 2012 at 9:43 AM, Ravikumar Govindarajan
> wrote:
> > The formatter has wrecked the table... Reposting it
> >
> >
discussions that have happened
previously on sort-docID-before-flush/sparse-doc-handling?
--
Ravi
On Tue, Nov 6, 2012 at 4:53 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Tue, Nov 6, 2012 at 1:04 AM, Ravikumar Govindarajan
> wrote:
> > Looks far more complex than
, Nov 5, 2012 at 4:37 AM, Ravikumar Govindarajan
> wrote:
> > Thanks Mike,
> >
> > Joins could be slower than docID based approach, no?
>
> Yes: slower at search time but faster at update time (generally not a
> good tradeoff... but it seems like in your case slow updates are
>
> http://blog.mikemccandless.com
>
> On Thu, Oct 25, 2012 at 6:10 AM, Ravikumar Govindarajan
> wrote:
> > We have the need to re-index some fields in our application frequently.
> >
> > Our typical document consists of
> >
> > a) Many single-valued {l
cument rather than the low-level Lucene document id.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ravikumar Govindarajan
> Sent: Thursday, October 25, 2012 6:10 AM
> To: java-user@lucene.apache.org
> Subject: App supplied docID in lucene possible?
>
>
> We have th