yura last <y_ura_2...@yahoo.com.INVALID> wrote:
> I have one machine (6 cores, 148 GB RAM, 2.5 TB HDD) and I index 
> around 60 million documents per day - the index size is around 26GB.

So 1 billion documents would be approximately 500GB.

...and 10 billion/day over 90 days would be around 450TB.
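
Just to show the arithmetic behind those estimates, here is a back-of-the-envelope 
sketch. It assumes index size scales roughly linearly with document count - the 
only hard numbers in it are your 60 million docs / 26GB:

  // Back-of-the-envelope extrapolation from the 60M-docs/26GB figure above.
  // Assumes index size grows linearly with document count, which is only roughly true.
  public class IndexSizeEstimate {
      public static void main(String[] args) {
          double bytesPerDoc  = 26e9 / 60e6;                      // ~433 bytes per indexed document
          double oneBillionGB = 1e9 * bytesPerDoc / 1e9;          // ~433GB, rounded up to ~500GB above
          double ninetyDaysTB = 10e9 * 90 * bytesPerDoc / 1e12;   // ~390TB raw; ~450TB with the 500GB/billion rounding
          System.out.printf("bytes/doc: %.0f, 1 billion docs: %.0fGB, 90 days at 10 billion/day: %.0fTB%n",
                  bytesPerDoc, oneBillionGB, ninetyDaysTB);
      }
  }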

> I do have customer-ID today and I use it for the queries. I don't split
> the customers but I get bad performance. If I make a small collection
> for each customer then I know to query only those collections and I
> get better performance - the indexes are smaller and Solr doesn't
> need to keep the other customers' data in memory. I checked it
> and the performance is much better.

True when the number of concurrently active customers is low. How many customers 
do you expect to be actively using the index at a time? If the answer is "most 
of them", you should make sure that your tests reflect that. 

If the answer is "relatively few", then your setup might scale well (if you 
create independent clouds to handle the many-collections problem). The first 
search for a customer will of course take a while.

> I do have 1 billion documents today but I can't index them

Why? Does it break down, take too long to index, result in too slow searches? 
Knowing your current problems helps a lot when talking about future scale.

> - so it is a real requirement for today to be able to index 1 billion and
> keep the data for 90 days.

To be clear: Would that be 1 billion documents indexed every 90 days, 1 billion 
each day for 90 days (= 90 billion at any given time), or something else entirely?

> What is better - one powerful machine or a few smaller ones? For example
> - one machine with 12 cores, 256GB RAM and 2.5 TB, or 5 machines
> each with 4 cores, 32 GB RAM and 0.5 TB?

Depends on what you do with your data. Most of the time, IO is the bottleneck 
for Solr, and for those cases it is probably more bang for the buck to buy 
machines with 256GB of RAM (or maybe the 148GB you have currently), as that 
minimizes the overhead per box.

- Toke Eskildsen
