Yeah, it's worth a try. The term vectors aren't entirely necessary for highlighting, although they do make things more efficient.
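If you want to test without them, it's just a schema change plus a full reindex. A minimal sketch of a field definition with term vectors switched off (the field name and type here are placeholders, adjust to your schema):

    <field name="text" type="text_general" indexed="true" stored="true"
           termVectors="false" termPositions="false" termOffsets="false"/>

Omitting the three termVector* attributes entirely has the same effect, since they default to false. The standard highlighter then re-analyzes the stored field content at query time, so highlighting keeps working, just with more CPU per request; only the FastVectorHighlighter actually requires term vectors (with positions and offsets).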
As far as MLT goes: does MLT really need such a big field? There's an example request restricted to smaller fields sketched below the quoted thread. But if you remove this info and testing shows problems, you may be on your way to sharding your index....

Best
Erick

On Thu, Mar 29, 2012 at 9:32 AM, Vadim Kisselmann
<v.kisselm...@googlemail.com> wrote:
> Hi Erick,
> thanks :)
> The admin UI gives me the counts, so I can identify fields with big
> bulks of unique terms.
> I know this wiki page, but I read it one more time.
> List of my file extensions with size in GB (index size ~150GB):
> tvf  90GB
> fdt  30GB
> tim  18GB
> prx  15GB
> frq  12GB
> tip  200MB
> tvx  150MB
>
> tvf is my biggest file extension.
> Wiki: "This file contains, for each field that has a term vector
> stored, a list of the terms, their frequencies and, optionally,
> position and offset information."
>
> Hmm, I use termVectors on my biggest fields because of MLT and highlighting.
> But I think I should test my performance without termVectors. Good idea? :)
>
> What do you think about my file extension sizes?
>
> Best regards
> Vadim
>
>
> 2012/3/29 Erick Erickson <erickerick...@gmail.com>:
>> The admin UI (schema browser) will give you the counts of unique terms
>> in your fields, which is where I'd start.
>>
>> I suspect you've already seen this page, but if not:
>> http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
>> The .fdt and .fdx file extensions are where data goes when
>> you set 'stored="true"'. These files don't affect search speed;
>> they just contain the verbatim copy of the data.
>>
>> The relative sizes of the various files above should give
>> you a hint as to what's using the most space, but it'll be a bit
>> of a hunt for you to pinpoint what's actually going on. Term vectors
>> and norms are often sources of using up space.
>>
>> Best
>> Erick
>>
>> On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
>> <v.kisselm...@googlemail.com> wrote:
>>> Hello folks,
>>>
>>> I work with Solr 4.0 r1292064 from trunk.
>>> My index grows fast; with 10 million docs I get an index size of 150GB
>>> (25% stored, 75% indexed).
>>> I want to find out which fields (content) are too large, so I can
>>> consider countermeasures.
>>>
>>> How can I localize/discover the largest fields in my index?
>>> Luke (latest from trunk) doesn't work with my Solr version.
>>> I built the Lucene/Solr .jars and tried to feed Luke these,
>>> but I get many errors and can't build it.
>>>
>>> What other options do I have?
>>>
>>> Thanks and best regards
>>> Vadim
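PS: a rough sketch of what an MLT request restricted to smaller fields could look like (host, port, and the field names title/keywords are made up, not from your schema):

    http://localhost:8983/solr/select?q=id:1234&mlt=true&mlt.fl=title,keywords&mlt.mintf=2&mlt.mindf=5

mlt.fl controls which fields MLT mines for "interesting" terms. If I remember right, when a field in mlt.fl has no term vectors, MLT falls back to re-analyzing its stored content (the field has to be stored for that), so it still works without the .tvf data, at some extra query-time cost.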