The admin UI (schema browser) will give you the counts of unique terms
in your fields, which is where I'd start.

I suspect you've already seen this page, but if not:
http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
The .fdt and .fdx files are where data goes when you set
stored="true". These files don't affect search speed; they
just hold the verbatim copy of the data.
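Storage is toggled per field in schema.xml. A minimal sketch (the field names and types here are just examples, not from your schema):

```xml
<!-- indexed and stored: searchable, and the raw value is kept
     in .fdt/.fdx for retrieval/highlighting -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- indexed only: searchable, but contributes nothing to the
     stored-fields files -->
<field name="body" type="text_general" indexed="true" stored="false"/>
```

Turning off stored="true" on big fields you never return in results is usually the cheapest win.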

The relative sizes of the various files above should give
you a hint as to what's using the most space, but it'll take
a bit of digging to pinpoint what's actually going on. Term
vectors and norms are common culprits for eating up space.
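To see those relative sizes at a glance, you can total up the bytes per file extension in the index directory. A quick sketch (the index path is an assumption; point it at your core's data/index):

```shell
# size_by_ext: sum file sizes per Lucene extension in an index
# directory, largest first. Usage: size_by_ext /path/to/data/index
size_by_ext() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    base=${f##*/}
    # files like segments_2 have no extension; keep the name as-is
    case "$base" in
      *.*) ext=".${base##*.}" ;;
      *)   ext="$base" ;;
    esac
    printf '%s %s\n' "$(wc -c < "$f")" "$ext"
  done | awk '{ sz[$2] += $1 }
              END { for (e in sz) printf "%d %s\n", sz[e], e }' \
       | sort -rn
}
```

A big .tvx/.tvd/.tvf total points at term vectors; big .fdt means lots of stored data.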

Best
Erick

On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
<v.kisselm...@googlemail.com> wrote:
> Hello folks,
>
> I work with Solr 4.0 r1292064 from trunk.
> My index grows fast: with 10 million docs I get an index size of
> 150GB (25% stored, 75% indexed).
> I want to find out which fields (content) are too large, so I can
> take countermeasures.
>
> How can I locate the largest fields in my index?
> Luke (latest from trunk) doesn't work with my Solr version. I built
> the Lucene/Solr .jars and tried to feed Luke these, but I get many
> errors and can't build it.
>
> What other options do i have?
>
> Thanks and best regards
> Vadim
