I don't think there's any real reason SolrCloud won't work with Tomcat; the setup is
probably just tricky. See:
http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html
It's about a year old, but might prove helpful.
Best,
Erick

On Thu, Mar 29, 2012 at 3:41 PM, Vadim Kisselmann
<v.kisselm...@googlemail.com> wrote:
> Yes, I think so too :)
> MLT doesn't really need termVectors, but it's faster with them. I found
> out that MLT works better on the title field in my case than on big
> text fields.
>
> Sharding is planned, but my setup with SolrCloud, ZK and Tomcat doesn't
> work, see here:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3CCA+GXEZE3LCTtgXFzn9uEdRxMymGF=z0ujb9s8b0qkipafn6...@mail.gmail.com%3E
> I split my huge index (the 150GB index in this case is my test index)
> and want to use SolrCloud, but it isn't runnable with Tomcat at this
> time.
>
> Best regards,
> Vadim
>
>
> 2012/3/29 Erick Erickson <erickerick...@gmail.com>:
>> Yeah, it's worth a try. The term vectors aren't strictly necessary for
>> highlighting, although they do make things more efficient.
>>
>> As for MLT: does MLT really need such a big field?
>>
>> But you may be on your way to sharding your index if you remove this
>> info and testing shows problems...
>>
>> Best,
>> Erick
>>
>> On Thu, Mar 29, 2012 at 9:32 AM, Vadim Kisselmann
>> <v.kisselm...@googlemail.com> wrote:
>>> Hi Erick,
>>> thanks :)
>>> The admin UI gives me the counts, so I can identify fields with big
>>> numbers of unique terms.
>>> I knew this wiki page, but I read it one more time.
>>> List of my file extensions with size in GB (index size ~150GB):
>>> tvf  90GB
>>> fdt  30GB
>>> tim  18GB
>>> prx  15GB
>>> frq  12GB
>>> tip  200MB
>>> tvx  150MB
>>>
>>> tvf is my biggest file extension.
>>> Wiki: "This file contains, for each field that has a term vector
>>> stored, a list of the terms, their frequencies and, optionally,
>>> position and offset information."
>>>
>>> Hmm, I use termVectors on my biggest fields because of MLT and
>>> highlighting. But I think I should test my performance without
>>> termVectors. Good idea? :)
>>>
>>> What do you think about my file extension sizes?
>>>
>>> Best regards,
>>> Vadim
>>>
>>>
>>> 2012/3/29 Erick Erickson <erickerick...@gmail.com>:
>>>> The admin UI (schema browser) will give you the counts of unique
>>>> terms in your fields, which is where I'd start.
>>>>
>>>> I suspect you've already seen this page, but if not:
>>>> http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
>>>> The .fdt and .fdx file extensions are where data goes when you set
>>>> 'stored="true"'. These files don't affect search speed; they just
>>>> contain the verbatim copy of the data.
>>>>
>>>> The relative sizes of the various files above should give you a hint
>>>> as to what's using the most space, but it'll be a bit of a hunt for
>>>> you to pinpoint what's actually going on. Term vectors and norms are
>>>> often big consumers of space.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
>>>> <v.kisselm...@googlemail.com> wrote:
>>>>> Hello folks,
>>>>>
>>>>> I work with Solr 4.0 r1292064 from trunk.
>>>>> My index grows fast: with 10 million docs I get an index size of
>>>>> 150GB (25% stored, 75% indexed).
>>>>> I want to find out which fields (content) are too large, so I can
>>>>> consider countermeasures.
>>>>>
>>>>> How can I localize/discover the largest fields in my index?
>>>>> Luke (latest from trunk) doesn't work with my Solr version. I built
>>>>> the Lucene/Solr .jars and tried to feed Luke with these, but I get
>>>>> many errors and can't build it.
>>>>>
>>>>> What other options do I have?
>>>>>
>>>>> Thanks and best regards,
>>>>> Vadim
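
On dropping termVectors from the big fields while keeping MLT on the
title: a sketch of what the schema.xml change might look like. Field
names and types here are illustrative, not the actual schema from this
thread; when term vectors are absent, Solr's standard highlighter falls
back to re-analyzing the stored text, which works but is slower on large
fields.

<!-- keep term vectors only where MLT/highlighting really benefits -->
<field name="title" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<!-- large body field: no term vectors, so nothing lands in .tvf/.tvx -->
<field name="text" type="text_general" indexed="true" stored="true"/>

A change like this only takes effect for newly indexed documents, so the
index has to be rebuilt before the .tvf size will actually shrink.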
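
Since Luke wouldn't build, the per-extension size breakdown posted above
can be reproduced without any Solr tooling by summing the segment file
sizes in the index data directory. A minimal sketch in plain Python (the
directory path is whatever your core's index dir is, e.g.
solr/data/index; the script itself is illustrative, not part of Solr):

```python
import os
import sys
from collections import defaultdict

def sizes_by_extension(index_dir):
    """Sum file sizes in a Lucene index directory, grouped by extension."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # "_0.tvf" -> "tvf"; extensionless files (e.g. segments.gen's
            # sibling "segments_N") fall into a "(none)" bucket
            ext = os.path.splitext(name)[1].lstrip(".") or "(none)"
            totals[ext] += os.path.getsize(path)
    return dict(totals)

if __name__ == "__main__" and len(sys.argv) > 1:
    # print largest extensions first, in GB
    for ext, size in sorted(sizes_by_extension(sys.argv[1]).items(),
                            key=lambda kv: kv[1], reverse=True):
        print("%6s  %8.2f GB" % (ext, size / 2.0 ** 30))
```

This only tells you which file type dominates (tvf = term vectors, fdt =
stored fields, and so on per the file-formats page linked above), not
which field; for the per-field hunt you still need the schema browser
counts or selective reindexing tests.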