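For method 2, delete is wrong: a delete by query removes whole documents, not terms, so it can't strip a field out of the index. You would also have to hack the .tii and .tis files (the term dictionary).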
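Here is a toy model in plain Java that shows why the term dictionary has to be regenerated rather than patched. It uses no Lucene classes, and TermEntry and TermDictModel are hypothetical names. It assumes a simplified picture of the 3.x layout: .tis holds every term sorted by field name and then by text, and .tii holds every interval-th term, so a lookup binary-searches the sample and then scans one block. List ordinals stand in for the file pointers the real files use, and the per-term data (doc freq, postings pointers, prefix compression) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one dictionary entry: a (field, text) pair. */
final class TermEntry implements Comparable<TermEntry> {
    final String field;
    final String text;

    TermEntry(String field, String text) {
        this.field = field;
        this.text = text;
    }

    /** Orders by field name first, then by term text. */
    @Override
    public int compareTo(TermEntry other) {
        int c = field.compareTo(other.field);
        return c != 0 ? c : text.compareTo(other.text);
    }
}

/** Toy stand-ins for the .tis role (all terms) and the .tii role (sampled terms). */
final class TermDictModel {
    /** Copies the sorted term list, leaving out every term of one field. */
    static List<TermEntry> dropField(List<TermEntry> sortedTerms, String field) {
        List<TermEntry> kept = new ArrayList<>();
        for (TermEntry t : sortedTerms) {
            if (!t.field.equals(field)) {
                kept.add(t);
            }
        }
        return kept;
    }

    /** Samples every interval-th term; sample entry k sits at ordinal k * interval. */
    static List<TermEntry> sampleIndex(List<TermEntry> sortedTerms, int interval) {
        List<TermEntry> sample = new ArrayList<>();
        for (int i = 0; i < sortedTerms.size(); i += interval) {
            sample.add(sortedTerms.get(i));
        }
        return sample;
    }

    /** Binary-searches the sample for the last entry <= target, then scans that block. */
    static int seek(List<TermEntry> terms, List<TermEntry> sample, int interval, TermEntry target) {
        int lo = 0;
        int hi = sample.size() - 1;
        int block = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sample.get(mid).compareTo(target) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        int end = Math.min(terms.size(), (block + 1) * interval);
        for (int i = block * interval; i < end; i++) {
            if (terms.get(i).compareTo(target) == 0) {
                return i; // ordinal of the term in the full list
            }
        }
        return -1; // not found, or the sample no longer matches the term list
    }
}
```

The interesting case is the stale sample: after a field's terms are removed, a sample built from the old list sends the lookup to the wrong block, so in this model both "files" must be rebuilt together.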
On Tue, Feb 14, 2012 at 2:46 PM, Li Li <fancye...@gmail.com> wrote:
> Method 1: dumping data
> For stored fields, you can traverse the whole index and save them
> somewhere else.
> For fields that are indexed but not stored, it is more difficult.
> If such a field is not analyzed (fields such as id), its values are
> easy to get from FieldCache.StringIndex.
> But for analyzed fields, although in theory the text can be restored
> from term vectors and term positions, it is hard to recover from the
> index.
>
> Method 2: hacking the metadata
> 1. Indexed fields
> Delete by query, e.g. field:*
> 2. Stored fields
> Because all fields are stored sequentially, it is not easy to delete
> some of them. Leaving them in place does not affect search speed, but
> if you load stored fields and the useless ones are very long, loading
> will slow down.
> It is also possible to hack this, but it takes more effort to
> understand the index file format and traverse the .fdt/.fdx files.
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
>
> This will give you some insight.
>
> On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart <bstewart...@gmail.com> wrote:
>
>> Let's say I have a large index (100M docs, 1TB, split up between 10
>> indexes), and a bunch of the "stored" and "indexed" fields are not used in
>> search at all. In order to save memory and disk, I'd like to rebuild that
>> index *without* those fields, but I don't have the original documents to
>> rebuild the entire index with (I don't have the full text anymore, etc.).
>> Is there some way to rebuild or optimize an existing index with only a
>> subset of the existing indexed fields? Or alternatively, is there a way to
>> avoid loading some indexed fields at all (to avoid loading the term infos
>> and the terms index)?
>>
>> Thanks
>> Bob
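For method 1 with an indexed but untokenized field such as id, the idea behind FieldCache.StringIndex is to invert the postings: walk the field's terms in sorted order and record, for every document, which term it contains. The sketch below models that with plain Java collections instead of a real IndexReader. StringIndexModel is a hypothetical class whose order/lookup arrays mirror the shape of StringIndex, with lookup[0] reserved for "no value". It only makes sense for single-valued fields; if a document has several terms, this model simply keeps the last one it sees.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model shaped like FieldCache.StringIndex: order[] per doc, lookup[] per term. */
final class StringIndexModel {
    final int[] order;     // for each document, an index into lookup (0 = no value)
    final String[] lookup; // lookup[0] is null; then the field's terms in sorted order

    private StringIndexModel(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
    }

    /**
     * Inverts postings for one untokenized field. The TreeMap stands in for walking the
     * field's terms in sorted order; each value lists the documents containing that term.
     */
    static StringIndexModel build(int maxDoc, TreeMap<String, int[]> postings) {
        int[] order = new int[maxDoc]; // all zeros: "no value" by default
        String[] lookup = new String[postings.size() + 1];
        int ord = 1;
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            lookup[ord] = e.getKey();
            for (int doc : e.getValue()) {
                order[doc] = ord; // a later term overwrites an earlier one
            }
            ord++;
        }
        return new StringIndexModel(order, lookup);
    }

    /** Recovered value of the field for one document, or null if it had none. */
    String valueOf(int doc) {
        return lookup[order[doc]];
    }
}
```

With the doc-to-value mapping in hand, the recovered values can be written into a new index alongside the stored fields that are worth keeping.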
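For analyzed fields, the thread notes that text could in theory be restored from term vectors and term positions. That only works if the field was indexed with term vectors that include positions. A minimal sketch of the reconstruction step, assuming one document's vector is already available as a map from term to positions (the map is a stand-in; no Lucene term-vector API call is shown, and TermVectorText is a hypothetical name): order the terms by position and emit them. What comes back is the analyzed token stream, not the original text. Case, punctuation and stemming are not undone, and a position with no term (for example a removed stop word, when position increments are preserved) is shown as "_".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Rebuilds an analyzed field's token stream from a term -> positions map (hypothetical helper). */
final class TermVectorText {
    static String reconstruct(Map<String, int[]> termPositions) {
        // Order tokens by position; if two terms share a position (e.g. synonyms),
        // only one of them survives here.
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            for (int pos : e.getValue()) {
                byPosition.put(pos, e.getKey());
            }
        }
        List<String> tokens = new ArrayList<>();
        int next = 0;
        for (Map.Entry<Integer, String> e : byPosition.entrySet()) {
            // Positions with no term become "_".
            while (next < e.getKey()) {
                tokens.add("_");
                next++;
            }
            tokens.add(e.getValue());
            next = e.getKey() + 1;
        }
        return String.join(" ", tokens);
    }
}
```

For example, a vector of {quick: [1], brown: [2], fox: [3, 5]} comes back as "_ quick brown fox _ fox", which is usable for re-indexing the same tokens but is not the original document text.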
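On the stored-fields side, the difficulty is that each document's fields are written back to back, with .fdx holding one pointer per document into .fdt. The toy model below illustrates the two consequences mentioned in the thread: loading a document means stepping over every field, and dropping a field means copying everything into new files and recomputing every offset. StoredFieldsModel is hypothetical and uses plain Java streams; its byte layout is NOT the real 3.5 format, which uses VInts and per-field flag bits.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy stand-in for a .fdt/.fdx pair; NOT the real Lucene 3.5 byte layout. */
final class StoredFieldsModel {
    final byte[] data;    // ".fdt" role: every document's fields written back to back
    final long[] offsets; // ".fdx" role: where each document starts in data

    private StoredFieldsModel(byte[] data, long[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    /** Writes documents (field number -> value) sequentially and records their offsets. */
    static StoredFieldsModel write(List<Map<Integer, String>> docs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        long[] offsets = new long[docs.size()];
        try {
            for (int d = 0; d < docs.size(); d++) {
                offsets[d] = bytes.size();
                Map<Integer, String> doc = docs.get(d);
                out.writeInt(doc.size());
                for (Map.Entry<Integer, String> f : doc.entrySet()) {
                    out.writeInt(f.getKey());
                    out.writeUTF(f.getValue()); // writeUTF limits a value to 65535 encoded bytes
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StoredFieldsModel(bytes.toByteArray(), offsets);
    }

    /** Loads one document; every field is decoded, including ones the caller never uses. */
    Map<Integer, String> read(int doc) {
        int start = (int) offsets[doc];
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data, start, data.length - start));
        Map<Integer, String> fields = new TreeMap<>();
        try {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int fieldNum = in.readInt();
                fields.put(fieldNum, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    /** "Deleting" fields means copying every document into a new data/offset pair. */
    StoredFieldsModel keepOnly(Set<Integer> fieldsToKeep) {
        List<Map<Integer, String>> docs = new ArrayList<>();
        for (int d = 0; d < offsets.length; d++) {
            Map<Integer, String> fields = read(d);
            fields.keySet().retainAll(fieldsToKeep);
            docs.add(fields);
        }
        return write(docs);
    }
}
```

Note that keepOnly never edits the old arrays in place: every document after the first dropped field moves, so every offset changes as well.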