For method 2, the delete-by-query step is wrong: deleting documents does not remove terms from the index. To really drop an indexed field you would also have to hack the .tii and .tis files (the term dictionary and its index).

On Tue, Feb 14, 2012 at 2:46 PM, Li Li <fancye...@gmail.com> wrote:

> Method 1: dumping the data.
> For stored fields, you can traverse the whole index and save the values
> somewhere else.
> For indexed-but-not-stored fields it is harder.
>     If the field is indexed but not analyzed (fields such as an id), the
> values are easy to get back from FieldCache.StringIndex.
>     But for analyzed fields, although the text can in theory be
> reconstructed from term vectors and term positions, in practice it is
> hard to recover from the index.
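Inline note: the dump-and-rebuild route above can be sketched against the Lucene 3.x API roughly as below. This is an untested sketch, not a drop-in tool: the directory paths and the field names "id" and "title" are made up for illustration, and only stored field values survive the round trip; indexed-but-not-stored fields are lost, as described above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.SetBasedFieldSelector;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RebuildWithoutFields {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("old-index")));
        IndexWriter writer = new IndexWriter(
            FSDirectory.open(new File("new-index")),
            new IndexWriterConfig(Version.LUCENE_35,
                new StandardAnalyzer(Version.LUCENE_35)));
        // Load only the stored fields we want to keep.
        FieldSelector keep = new SetBasedFieldSelector(
            new HashSet<String>(Arrays.asList("id", "title")),
            Collections.<String>emptySet());
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) continue;            // skip deleted docs
            writer.addDocument(reader.document(i, keep)); // re-analyzes kept fields
        }
        writer.close();
        reader.close();
    }
}
```

Note that index-time details (analyzer choice, term vectors, norms/boosts) are not carried over automatically by this round trip and would have to be reapplied explicitly.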
>
> Method 2: hacking the index metadata.
> 1. indexed fields
>       delete by query, e.g. field:*
> 2. stored fields
>        Because all stored fields are written out sequentially, it is not
> easy to delete some of them. Leaving them in place will not affect search
> speed, but if you retrieve stored fields and the unused fields are very
> long, retrieval will slow down.
>        It is also possible to hack these files, but that takes more
> effort: you need to understand the index file format and rewrite the
> fdt/fdx files.
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
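The "stored sequentially" point is why stored fields cannot be cheaply deleted in place. A self-contained toy model of the idea (my own simplification; the real .fdt/.fdx format also records field numbers, flags, and lengths): treat "fdt" as all field values concatenated back to back, and "fdx" as each document's start offset into it. Dropping one field from every document shifts every later offset, so both files have to be rewritten end to end:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of Lucene's stored-field files (a deliberate simplification):
// "fdt" = all field values written back to back, "fdx" = the byte offset
// at which each document's values start.
public class FdtFdxModel {

    // Compute the per-document start offsets ("fdx") for the given docs,
    // where each doc is just a list of its stored field values.
    static long[] buildFdx(List<List<String>> docs) {
        long[] fdx = new long[docs.size()];
        long offset = 0;
        for (int i = 0; i < docs.size(); i++) {
            fdx[i] = offset;
            for (String value : docs.get(i)) offset += value.length();
        }
        return fdx;
    }

    // Remove the field at position fieldIndex from every document.
    static List<List<String>> dropField(List<List<String>> docs, int fieldIndex) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> doc : docs) {
            List<String> copy = new ArrayList<>(doc);
            copy.remove(fieldIndex);
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("abc", "XXXX"),  // doc 0: keep "abc", drop "XXXX"
            Arrays.asList("de", "YY"));    // doc 1: keep "de",  drop "YY"
        // Every offset after doc 0 changes once a field is removed,
        // so the whole fdx (and the fdt behind it) must be rewritten.
        System.out.println(Arrays.toString(buildFdx(docs)));               // [0, 7]
        System.out.println(Arrays.toString(buildFdx(dropField(docs, 1)))); // [0, 3]
    }
}
```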
>
> this will give you some insight.
>
>
> On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart <bstewart...@gmail.com> wrote:
>
>> Let's say I have a large index (100M docs, 1TB, split across 10
>> indexes), and a bunch of the "stored" and "indexed" fields are not used in
>> search at all.  In order to save memory and disk, I'd like to rebuild that
>> index *without* those fields, but I don't have the original documents to
>> rebuild the entire index with (don't have the full text anymore, etc.).  Is
>> there some way to rebuild or optimize an existing index with only a sub-set
>> of the existing indexed fields?  Or alternatively, is there a way to avoid
>> loading some indexed fields at all (to avoid loading the term infos and the
>> terms index)?
>>
>> Thanks
>> Bob
>
>
>
