Travis -
Whether the index is bigger than the original content depends on what you need
to do with it in Solr. One of the primary deciding factors is whether you need
highlighting, which currently requires that the highlighted fields be stored.
Stored fields will take up about the same space as the text of the original
documents (likely a bit smaller than, say, the actual Word doc itself). If you
don't need highlighting or the contents stored for other purposes, then you'll
have a dramatically smaller index than the original (roughly 35% of the size,
generally).
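
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
It is only an illustration under stated assumptions: the function name, the
100 GB of extracted text, and the idea that stored fields add roughly one full
copy of the text on top of the ~35% searchable index are my reading of the
rules of thumb above, not measured Solr behavior. Index a sample of your own
documents to get real figures.

```python
def estimate_index_gb(extracted_text_gb, store_fields, unstored_ratio=0.35):
    """Rough Solr index size estimate in GB (hypothetical helper).

    extracted_text_gb: size of the plain text pulled out of the documents,
        which is usually well below the size of the PDF/Word/Excel files.
    store_fields: True if the text is stored (e.g. for highlighting).
    unstored_ratio: rule-of-thumb size of the index without stored content,
        as a fraction of the text size (assumed 35% here).
    """
    # Searchable (inverted) part of the index, per the ~35% rule of thumb.
    searchable_gb = extracted_text_gb * unstored_ratio
    # Assumption: stored fields cost about one extra copy of the text.
    stored_gb = extracted_text_gb if store_fields else 0.0
    return searchable_gb + stored_gb


if __name__ == "__main__":
    text_gb = 100.0  # assumed figure, purely illustrative
    print("without stored fields:", round(estimate_index_gb(text_gb, False), 1), "GB")
    print("with stored fields:   ", round(estimate_index_gb(text_gb, True), 1), "GB")
```

Under these assumptions, storing the text for highlighting is what pushes the
index past the size of the extracted text; leaving it unstored keeps the index
at a fraction of that size.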
Erik
On Oct 11, 2011, at 08:36, Travis Low wrote:
> Greetings. I have a paltry 23,000 database records that point to a
> voluminous 300GB worth of PDF, Word, Excel, and other documents. We are
> planning on indexing the records and the documents they point to. I have no
> clue on how we can calculate what kind of server we need for this. I
> imagine the index isn't going to be bigger than the documents (is it?) so I
> suppose 1TB is a starting point for disk space. But what kind of processing
> power and memory might we need? Can anyone please point me in the right
> direction?