Method 1: dump the data
For stored fields, you can traverse the whole index and save them
somewhere else.
For fields that are indexed but not stored, it is more difficult.
    If the field is indexed but not analyzed (an id field, for example),
its values are easy to get from FieldCache.StringIndex.
    For analyzed fields, the text could in theory be restored from term
vectors and term positions, but in practice it is hard to recover from
the index.

Method 2: hack the index metadata
1. Indexed fields
      Delete by query, e.g. field:*
2. Stored fields
       All stored fields of a document are written sequentially, so it is
not easy to delete only some of them. Leaving the unused fields in place
does not affect search speed, but if you load stored fields and the
unused ones are very long, retrieval will be slower.
       It is also possible to hack the files directly, but that takes
more effort: you need to understand the index file format and traverse
the .fdt/.fdx files. The file format documentation will give you some
insight:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
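
To make the .fdx/.fdt traversal a little more concrete, here is a minimal
sketch in plain Java (no Lucene classes). It assumes the layout described on
the file formats page: the .fdx file holds one 8-byte pointer per document,
and each document's entry in the .fdt file starts with a VInt field count.
The class name, the method names and the headerBytes parameter are my own
inventions for illustration. headerBytes is there because a real file may
begin with a version header that has to be skipped. Check this against your
own segment files before relying on it.

```java
import java.nio.ByteBuffer;

public class StoredFieldsSketch {

    /** Width of one pointer entry in the .fdx file (a Uint64). */
    static final int POINTER_BYTES = 8;

    /**
     * Returns the offset in the .fdt data where docId's stored fields begin.
     * headerBytes is a hypothetical parameter: the documented layout puts
     * entry n at n*8, but a file may start with a header to skip.
     */
    static long fieldDataPointer(byte[] fdx, int docId, int headerBytes) {
        int offset = headerBytes + docId * POINTER_BYTES;
        if (docId < 0 || offset + POINTER_BYTES > fdx.length) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        // ByteBuffer is big-endian by default, i.e. high-order byte first.
        return ByteBuffer.wrap(fdx, offset, POINTER_BYTES).getLong();
    }

    /** Decodes a VInt: 7 data bits per byte, low-order group first,
     *  high bit set on every byte except the last. */
    static int readVInt(ByteBuffer buf) {
        byte b = buf.get();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf.get();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** Number of stored fields recorded for docId (the leading VInt). */
    static int storedFieldCount(byte[] fdx, byte[] fdt, int docId, int headerBytes) {
        long pointer = fieldDataPointer(fdx, docId, headerBytes);
        ByteBuffer data = ByteBuffer.wrap(fdt);
        // Sketch only: assumes the whole .fdt fits in one byte array.
        data.position((int) pointer);
        return readVInt(data);
    }
}
```

Rewriting the stored fields would mean walking every document this way,
copying only the field entries you want into new .fdt/.fdx files, and
recomputing each pointer, which is why it takes real effort.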

On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart <bstewart...@gmail.com> wrote:

> Lets say I have a large index (100M docs, 1TB, split up between 10
> indexes).  And a bunch of the "stored" and "indexed" fields are not used in
> search at all.  In order to save memory and disk, I'd like to rebuild that
> index *without* those fields, but I don't have original documents to
> rebuild entire index with (don't have the full-text anymore, etc.).  Is
> there some way to rebuild or optimize an existing index with only a sub-set
> of the existing indexed fields?  Or alternatively is there a way to avoid
> loading some indexed fields at all ( to avoid loading term infos and terms
> index ) ?
>
> Thanks
> Bob
