I think the question is strange... Maybe you are wondering about possible OOM exceptions? I believe we can pass Lucene a single document containing a comma-separated list of "term, term, ..." (repeated a few billion times), as long as the field is not "stored" and term vectors (the TermVectorComponent) are not enabled...
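Something like the following minimal sketch shows what I mean (written against the Lucene 3.x API; the index path, field name, and input file are made up for illustration). A Reader-based field is tokenized as a stream, is never stored, and builds no term vectors, so the huge value never has to sit in the heap as a single String:

    import java.io.File;
    import java.io.FileReader;
    import java.io.Reader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MonsterFieldDemo {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33));
            IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/tmp/monster-index")), cfg);

            Document doc = new Document();
            // Reader-based field: tokens are consumed as a stream during
            // indexing. Such fields are tokenized but NOT stored and carry
            // no term vectors -- exactly the safe setup for monster fields.
            Reader body = new FileReader("/path/to/huge.txt");
            doc.add(new Field("body", body));

            writer.addDocument(doc);
            writer.close();
        }
    }

(The point is only the Field(String, Reader) constructor; everything else is ordinary IndexWriter setup.)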
I believe thousands of companies have already indexed millions of documents averaging a few hundred megabytes each... There should not be any hard limits (except the choice of InputSource vs. ByteArray). Note the difference between 100,000 _unique_ terms and a single document containing 100,000,000,000,000 non-unique terms (while also trying to store offsets).

What about the "Spell Checker" feature? Has anyone tried to index a single terabyte-scale document?

Personally, I have indexed only small (up to 1000-byte) document fields, but I believe 500 MB is a very common use case with PDFs. (Which vendors already use Lucene? Eclipse, to index the Eclipse Help files? Even Microsoft uses Lucene...)

Fuad

On 11-06-07 7:02 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

>From older (2.4) Lucene days, I once indexed the 23-volume "Encyclopedia
>of Michigan Civil War Volunteers" in a single document/field, so it's
>probably within the realm of possibility at least <G>...
>
>Erick
>
>On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic
><otis_gospodne...@yahoo.com> wrote:
>> Hello,
>>
>> What are the biggest document fields that you've ever indexed in Solr
>> or that you've heard of? Ah, it must be Tom's HathiTrust. :)
>>
>> I'm asking because I just heard of a case of an index where some
>> documents have a field that can be around 400 MB in size! I'm curious
>> if anyone has any experience with such monster fields?
>> Crazy? Yes, sure.
>> Doable?
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/