Hi Otis, 

Our OCR fields average around 800 KB. My guess is that the largest documents we
index (in a single OCR field) are somewhere between 2 and 10 MB. We have had
issues where the in-memory representation of a document (the in-memory index
structures being built) is several times the size of the text, so I suspect
that even with the largest ramBufferSizeMB you might run into problems with a
400 MB field. (This is with the 3.x branch. Trunk might not have this problem,
since it is much more memory-efficient when indexing.)
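Here is a rough back-of-envelope sketch in Python that makes the "several times the size of the text" point concrete. The multipliers are hypothetical stand-ins for "several times", not measured values. The 2048 MB ceiling on ramBufferSizeMB is an assumption about the 3.x code. The RAM buffer is a flush threshold rather than a hard cap, so read the estimate as roughly the minimum heap one document needs while it is being indexed.

```python
# Back-of-envelope check: does one field's in-memory indexing footprint
# fit under a memory budget?
# ASSUMPTIONS (hypothetical): the multiplier (in-memory size / raw text size)
# stands in for "several times"; 2048 MB is an assumed 3.x ceiling for
# ramBufferSizeMB.

RAM_BUFFER_CAP_MB = 2048  # assumed upper limit for ramBufferSizeMB in 3.x


def estimated_footprint_mb(field_mb: float, multiplier: float) -> float:
    """Estimate the in-memory size of one field while it is being indexed."""
    return field_mb * multiplier


def fits_in_buffer(field_mb: float, multiplier: float,
                   buffer_mb: float = RAM_BUFFER_CAP_MB) -> bool:
    """Return True if the estimated footprint is below the memory budget."""
    return estimated_footprint_mb(field_mb, multiplier) < buffer_mb


if __name__ == "__main__":
    # Average OCR field, a large OCR field, and the 400 MB case from the thread.
    for field_mb in (0.8, 10, 400):
        for multiplier in (3, 5, 8):  # hypothetical "several times"
            est = estimated_footprint_mb(field_mb, multiplier)
            print(f"{field_mb:>6} MB field x{multiplier}: ~{est:,.0f} MB, "
                  f"fits={fits_in_buffer(field_mb, multiplier)}")
```

Even at the low end of the assumed range, a 400 MB field needs well over a gigabyte of heap for a single document. The OCR fields described above stay far below that.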

Tom Burton-West
www.hathitrust.org/blogs
________________________________________
From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, June 07, 2011 6:59 PM
To: solr-user@lucene.apache.org
Subject: 400 MB Fields

Hello,

What are the biggest document fields that you've ever indexed in Solr, or that
you've heard of? Ah, it must be Tom's HathiTrust. :)

I'm asking because I just heard of an index where some documents have a field
that can be around 400 MB in size! I'm curious whether anyone has experience
with such monster fields.
Crazy? Yes, sure.
Doable?

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/