On 6/5/2014 10:55 AM, bbi123 wrote:
> We have a requirement for a large data set, billing data for example.
> The business wants sorting and type-ahead functions for it.  For
> example, when I start typing “8164…” they want ALL the unique numbers
> listed, with the associated attributes displayed (name, description, etc.).
>  
> We have about 50TB of files that need to be indexed. I haven't indexed this
> much data before hence thought of getting your valuable inputs. I am
> thinking of using SOLR cloud and use SSD for faster IO. I might need your
> inputs on hardware requirements too.

It's nearly impossible to give you a hardware requirement projection. 
There are simply too many variables.  One variable is that we cannot
know how much of that 50TB of data will actually end up in the Solr
index.  The archive for my data is getting close to 300TB, but because
that is mostly photos and video, the total size of the resulting Solr
index is about 100GB.  My actual data source is a MySQL database that's
probably about 250GB.

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The one thing that I can say is that RAM is king with Solr.  Once you
know how big the Solr index contained on each server will actually be,
you'll have some idea of how much RAM you might need.  Add up the Solr
heap size and the total index size on disk for each server.  That is the
ideal total memory size for each server.  You might not actually need
that much RAM, but if you have it, we can *almost guarantee* good
performance.
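
As a rough sketch of that arithmetic, here is the sum written out in
Python.  The function name and the example numbers (an 8GB heap and a
100GB index on one server) are made up for illustration; they are not a
sizing recommendation for your data.

```python
def ideal_ram_gb(heap_gb: float, index_on_disk_gb: float) -> float:
    """Ideal total memory for one server: Solr heap plus on-disk index size.

    Both inputs are per-server figures that you have to measure yourself
    once the index has actually been built.
    """
    if heap_gb < 0 or index_on_disk_gb < 0:
        raise ValueError("sizes must be non-negative")
    return heap_gb + index_on_disk_gb


if __name__ == "__main__":
    # Hypothetical server: 8GB Solr heap, 100GB of index on disk.
    total = ideal_ram_gb(heap_gb=8, index_on_disk_gb=100)
    print(f"Ideal RAM for this server: {total:g} GB")
```

This is the ideal figure, not a minimum.  A server with less memory than
this may still perform well, depending on how much of the index your
queries actually touch.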

http://wiki.apache.org/solr/SolrPerformanceProblems

SSD will help performance, but it is not a complete substitute for RAM. 
If you have the ideal RAM size, SSD is not required, because all the
important data will be in RAM, which is much faster than SSD.

> I assume there is no limitations in terms of the maximum number of documents
> that can be indexed in latest version of SOLR (4.8). Am I right?

Each shard has a limit of just over two billion documents.  The actual
number is 2147483647, the maximum value a signed 32-bit Java integer can
hold.  This count includes deleted documents, so we recommend not going
over 1 billion per shard.  A SolrCloud collection has no such overall
limit, because it can be split across many shards.
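
To turn that into a shard count, a minimal sketch in Python follows.  The
function name and the 5 billion document total are hypothetical; the only
numbers taken from above are the 2147483647 hard limit and the 1 billion
recommendation.

```python
# Hard per-shard limit: the largest signed 32-bit integer, 2**31 - 1.
SHARD_HARD_LIMIT = 2**31 - 1          # 2147483647
# Recommended ceiling, leaving room for deleted documents.
SHARD_RECOMMENDED_MAX = 1_000_000_000


def min_shards(total_docs: int, docs_per_shard: int = SHARD_RECOMMENDED_MAX) -> int:
    """Smallest number of shards that keeps every shard at or under docs_per_shard."""
    if total_docs < 0:
        raise ValueError("total_docs must be non-negative")
    if not 0 < docs_per_shard <= SHARD_HARD_LIMIT:
        raise ValueError("docs_per_shard must be between 1 and the hard limit")
    # Integer ceiling division; always report at least one shard.
    return max(1, -(-total_docs // docs_per_shard))


if __name__ == "__main__":
    # Hypothetical collection of 5 billion documents.
    print(min_shards(5_000_000_000))
```

This only covers the document-count limit.  In practice the shard count
is usually driven by index size and query load well before you reach
1 billion documents per shard.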

Thanks,
Shawn
