The admin UI (schema browser) will give you the counts of unique terms in your fields, which is where I'd start.
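If you'd rather script this than click through the UI, the schema browser is backed by the LukeRequestHandler (usually reachable at /admin/luke). A minimal sketch of ranking fields by distinct term count from such a response; note the "fields" and "distinct" key names are assumptions here and may differ between Solr versions, so check them against your instance's actual output:

```python
# Sketch: rank fields by their distinct term counts, given a parsed
# LukeRequestHandler response, e.g. fetched from
#   http://localhost:8983/solr/admin/luke?wt=json
# (URL and the "fields"/"distinct" key names are assumptions; verify
# against your Solr version's actual response.)
def fields_by_distinct_terms(luke_response):
    """Return (field_name, distinct_term_count) pairs, largest first."""
    fields = luke_response.get("fields", {})
    counts = [(name, info.get("distinct", 0)) for name, info in fields.items()]
    return sorted(counts, key=lambda kv: -kv[1])
```

Fields with an unexpectedly huge distinct term count (e.g. a tokenized field fed with large documents) are the first candidates to investigate.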
I suspect you've already seen this page, but if not:
http://lucene.apache.org/java/3_5_0/fileformats.html#file-names

The .fdt and .fdx file extensions are where data goes when you set
stored="true". These files don't affect search speed; they just contain
the verbatim copy of the data. The relative sizes of the various files
above should give you a hint as to what's using the most space, but it
will take a bit of a hunt for you to pinpoint what's actually going on.
Term vectors and norms are often sources of space usage.

Best,
Erick

On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
<v.kisselm...@googlemail.com> wrote:
> Hello folks,
>
> I work with Solr 4.0 r1292064 from trunk.
> My index grows fast: with 10 million docs I get an index size of 150 GB
> (25% stored, 75% indexed).
> I want to find out which fields (content) are too large, so I can take
> appropriate measures.
>
> How can I localize/discover the largest fields in my index?
> Luke (latest from trunk) doesn't work with my Solr version. I built the
> Lucene/Solr .jars and tried to feed Luke with these, but I get many
> errors and can't build it.
>
> What other options do I have?
>
> Thanks and best regards,
> Vadim
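To act on the "relative sizes" hint above, you can total the index directory's files by extension and match the biggest extensions against the file-formats page. A minimal sketch (the index path is hypothetical; point it at your core's data/index directory, and note the extension-to-meaning hints below only cover a few common Lucene 3.x/4.x files):

```python
# Sketch: summarize a Lucene/Solr index directory by file extension,
# so the dominant structures (stored fields, term vectors, norms, ...)
# stand out. Pass your core's data/index path, e.g.
# sizes_by_extension("/var/solr/data/index")  -- path is an example.
import os
from collections import defaultdict

# Hints for a few common Lucene 3.x/4.x extensions; see the
# file-formats page linked above for the full list.
EXTENSION_HINTS = {
    ".fdt": "stored field data",
    ".fdx": "stored field index",
    ".tvd": "term vector documents",
    ".tvf": "term vector fields",
    ".tvx": "term vector index",
    ".nrm": "norms",
    ".frq": "term frequencies",
    ".prx": "term positions",
}

def sizes_by_extension(index_dir):
    """Return {extension: total bytes} over all files in index_dir."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            _, ext = os.path.splitext(name)
            totals[ext] += os.path.getsize(path)
    return dict(totals)
```

If .fdt dominates, your stored data is the culprit; large .tv* or .nrm totals point at term vectors or norms instead.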