Re: Can I rebuild an index and remove some fields?

2012-02-16 Thread Robert Stewart
I will test it with my big production indexes first; if it works, I think I will port it to Java and add it to contrib. On Wed, Feb 15, 2012 at 10:03 PM, Li Li wrote: > great. I think you could make it a public tool. maybe others also need such > functionality. > > On Thu, Feb 16, 2012 at 5:31 AM, Rober

Re: Can I rebuild an index and remove some fields?

2012-02-15 Thread Li Li
Great. I think you could make it a public tool. Maybe others also need such functionality. On Thu, Feb 16, 2012 at 5:31 AM, Robert Stewart wrote: > I implemented an index shrinker and it works. I reduced my test index > from 6.6 GB to 3.6 GB by removing a single shingled field I did not > need a

Re: Can I rebuild an index and remove some fields?

2012-02-15 Thread Robert Stewart
I implemented an index shrinker and it works. I reduced my test index from 6.6 GB to 3.6 GB by removing a single shingled field I did not need anymore. I'm actually using Lucene.Net for this project, so the code is C# using the Lucene.Net 2.9.2 API. But the basic idea is: Create an IndexReader wrapper that

Re: Can I rebuild an index and remove some fields?

2012-02-14 Thread Li Li
I have skimmed the code of the 4.0 trunk. Maybe it's feasible. SegmentMerger.add(IndexReader) adds the readers to be merged. merge() will call mergeTerms(segmentWriteState); mergePerDoc(segmentWriteState); mergeTerms() will construct fields from the IndexReaders: for(int rea

Re: Can I rebuild an index and remove some fields?

2012-02-14 Thread Robert Stewart
I was thinking that if I make a wrapper class that aggregates another IndexReader and filters out the terms I don't want anymore, it might work. I would then pass that wrapper into SegmentMerger. I think if I filter out terms in GetFieldNames(...) and Terms(...) it might work. Something like: HashSet igno
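
The wrapper idea described above can be illustrated with a small toy model. This is only a sketch and makes no Lucene or Lucene.Net calls: ToyReader, FieldFilteringReader, and merge are hypothetical names invented for this illustration, and the real SegmentMerger works on segments and postings rather than on in-memory dictionaries. It shows only the principle that a merge which reads everything through a filtering wrapper never sees the ignored fields, so they do not end up in the output.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


class ToyReader:
    """Hypothetical stand-in for an index reader: field name -> list of terms."""

    def __init__(self, postings: Dict[str, List[str]]) -> None:
        self._postings = postings

    def field_names(self) -> List[str]:
        return sorted(self._postings)

    def terms(self) -> Iterator[Tuple[str, str]]:
        # Yield (field, term) pairs in sorted order, field by field.
        for field in sorted(self._postings):
            for term in sorted(self._postings[field]):
                yield field, term


class FieldFilteringReader:
    """Wraps another reader and hides every field listed in ignored_fields."""

    def __init__(self, inner: ToyReader, ignored_fields: Iterable[str]) -> None:
        self._inner = inner
        self._ignored = frozenset(ignored_fields)

    def field_names(self) -> List[str]:
        return [f for f in self._inner.field_names() if f not in self._ignored]

    def terms(self) -> Iterator[Tuple[str, str]]:
        return ((f, t) for f, t in self._inner.terms() if f not in self._ignored)


def merge(readers: Iterable) -> Dict[str, List[str]]:
    """Toy 'merge': rebuild a field -> terms map from whatever the readers expose."""
    merged: Dict[str, Set[str]] = {}
    for reader in readers:
        for field, term in reader.terms():
            merged.setdefault(field, set()).add(term)
    return {field: sorted(terms) for field, terms in merged.items()}
```

For example, wrapping a reader that holds the fields title, title_shingles, and id with FieldFilteringReader(reader, {"title_shingles"}) and passing the wrapper to merge yields a result containing only id and title.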

Re: Can I rebuild an index and remove some fields?

2012-02-13 Thread Li Li
For method 2, delete is wrong: we can't delete terms. You would also have to hack the tii and tis files. On Tue, Feb 14, 2012 at 2:46 PM, Li Li wrote: > method1, dumping data > for stored fields, you can traverse the whole index and save it to > somewhere else. > for indexed but not stored field

Re: Can I rebuild an index and remove some fields?

2012-02-13 Thread Li Li
Method 1, dumping data: for stored fields, you can traverse the whole index and save the data somewhere else. For indexed but not stored fields, it may be more difficult. If the indexed, non-stored field is not analyzed (fields such as id), it's easy to get from FieldCache.StringIndex. But for

Can I rebuild an index and remove some fields?

2012-02-13 Thread Robert Stewart
Let's say I have a large index (100M docs, 1TB, split up between 10 indexes). And a bunch of the "stored" and "indexed" fields are not used in search at all. In order to save memory and disk, I'd like to rebuild that index *without* those fields, but I don't have original documents to rebuild e