I will test it with my big production indexes first. If it works, I think
I will port it to Java and add it to contrib.
On Wed, Feb 15, 2012 at 10:03 PM, Li Li wrote:
Great. I think you could make it a public tool. Others may also need such
functionality.
On Thu, Feb 16, 2012 at 5:31 AM, Robert Stewart wrote:
I implemented an index shrinker and it works. I reduced my test index
from 6.6 GB to 3.6 GB by removing a single shingled field I did not
need anymore. I'm actually using Lucene.Net for this project, so the code
is C# using the Lucene.Net 2.9.2 API. But the basic idea is:
Create an IndexReader wrapper that
I have roughly read the code of the 4.0 trunk. Maybe it's feasible.
SegmentMerger.add(IndexReader) adds the readers to be merged.
merge() will call
mergeTerms(segmentWriteState);
mergePerDoc(segmentWriteState);
mergeTerms() will construct the fields from the IndexReaders
for(int
rea
I was thinking that if I make a wrapper class that aggregates another IndexReader
and filters out the terms I don't want anymore, it might work. I would then pass that
wrapper into SegmentMerger. I think if I filter out terms in
GetFieldNames(...) and Terms(...) it might work.
Something like:
HashSet igno
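
The wrapper idea above can be illustrated with a minimal, self-contained
sketch. It does not use the Lucene or Lucene.Net API: TermSource,
MapTermSource, FieldFilteringSource, ToyMerger and the field names are all
hypothetical stand-ins, and the toy merger only copies whatever the wrapped
source exposes. It filters whole fields rather than individual terms, to show
why hiding a field from the two listing methods keeps it out of the merged
result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the small part of a reader that a merger consults.
// This is NOT the Lucene IndexReader API.
interface TermSource {
    List<String> fieldNames();
    List<String> terms(String field);
}

// Toy in-memory "index": field name -> list of terms.
class MapTermSource implements TermSource {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    MapTermSource add(String field, String... terms) {
        fields.put(field, Arrays.asList(terms));
        return this;
    }

    public List<String> fieldNames() {
        return new ArrayList<>(fields.keySet());
    }

    public List<String> terms(String field) {
        List<String> t = fields.get(field);
        return t == null ? new ArrayList<>() : new ArrayList<>(t);
    }
}

// Wrapper that delegates to another source but hides the ignored fields.
class FieldFilteringSource implements TermSource {
    private final TermSource in;
    private final Set<String> ignoredFields;

    FieldFilteringSource(TermSource in, Set<String> ignoredFields) {
        this.in = in;
        this.ignoredFields = new HashSet<>(ignoredFields);
    }

    public List<String> fieldNames() {
        List<String> kept = new ArrayList<>();
        for (String f : in.fieldNames()) {
            if (!ignoredFields.contains(f)) {
                kept.add(f);
            }
        }
        return kept;
    }

    public List<String> terms(String field) {
        // An ignored field exposes no terms, even if asked for directly.
        if (ignoredFields.contains(field)) {
            return new ArrayList<>();
        }
        return in.terms(field);
    }
}

// Toy "merger": copies only what the source exposes into a new structure.
class ToyMerger {
    static Map<String, List<String>> merge(TermSource src) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String f : src.fieldNames()) {
            out.put(f, src.terms(f));
        }
        return out;
    }
}
```

Passing a FieldFilteringSource to ToyMerger.merge(...) produces a copy without
the ignored fields, while the original source is left untouched. Whether the
real SegmentMerger consults only these two methods is an assumption that has
to be checked against the actual code.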
For method 2, delete is wrong. We can't delete terms.
You would also have to hack the tii and tis files.
On Tue, Feb 14, 2012 at 2:46 PM, Li Li wrote:
Method 1: dumping data.
For stored fields, you can traverse the whole index and save them
somewhere else.
For indexed but not stored fields, it may be more difficult.
If the indexed, not stored field is not analyzed (fields such as id),
it's easy to get the values from FieldCache.StringIndex.
But for
Let's say I have a large index (100M docs, 1TB, split up between 10 indexes),
and a bunch of the "stored" and "indexed" fields are not used in search at all.
To save memory and disk, I'd like to rebuild that index *without*
those fields, but I don't have original documents to rebuild e