Method 1: dump the data.
For stored fields, you can traverse the whole index and save the values somewhere else. For fields that are indexed but not stored, it is more difficult. If such a field is not analyzed (fields such as id), it's easy to get the values from FieldCache.StringIndex. For analyzed fields, although the text can theoretically be restored from term vectors and term positions, it's hard to recover from the index.
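As a rough illustration of the stored-field part of method 1, here is a minimal sketch of the filtering step only. It uses plain Java maps as a stand-in for Lucene documents, so StoredFieldFilter and keepOnly are hypothetical names and not Lucene API. In a real rebuild I would expect the loop to read each non-deleted document with IndexReader.document(i) and add the filtered copy to a new index with IndexWriter.addDocument, but treat that wiring as an assumption to check against your Lucene version. Fields that were indexed but not stored will not come across this way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: each document's stored fields
// are modelled as a field-name -> value map.
final class StoredFieldFilter {

    // Returns copies of the documents that keep only the fields named in
    // `keep`. Document order and field order are preserved, and the input
    // documents are left unmodified.
    static List<Map<String, String>> keepOnly(
            List<Map<String, String>> docs, Set<String> keep) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            Map<String, String> filtered = new LinkedHashMap<>();
            for (Map.Entry<String, String> field : doc.entrySet()) {
                if (keep.contains(field.getKey())) {
                    filtered.put(field.getKey(), field.getValue());
                }
            }
            out.add(filtered);
        }
        return out;
    }
}
```

Calling StoredFieldFilter.keepOnly(docs, keep) with keep = {"id", "title"} would drop every other stored field from the copies while leaving the source documents alone.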
Method 2: hack the metadata.
1. Indexed fields: delete by query, e.g. field:*
2. Stored fields: because all fields are stored sequentially, it's not easy to delete just some of them. Leaving them in place will not affect search speed, but if you load stored fields and the useless fields are very long, retrieval will slow down. It's also possible to hack this, but it takes more effort to understand the index file format and traverse the fdt/fdx files. This will give you some insight:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html

On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart <bstewart...@gmail.com> wrote:

> Lets say I have a large index (100M docs, 1TB, split up between 10
> indexes). And a bunch of the "stored" and "indexed" fields are not used in
> search at all. In order to save memory and disk, I'd like to rebuild that
> index *without* those fields, but I don't have original documents to
> rebuild entire index with (don't have the full-text anymore, etc.). Is
> there some way to rebuild or optimize an existing index with only a sub-set
> of the existing indexed fields? Or alternatively is there a way to avoid
> loading some indexed fields at all ( to avoid loading term infos and terms
> index ) ?
>
> Thanks
> Bob