yura last <y_ura_2...@yahoo.com.INVALID> wrote:
> I expect that the amount of concurrent customers will be low.
> Today I have 1 machine so I don't have the capacity for all
> the data.

You aim for 90 billion documents in the first go and want to prepare for 10 
times that. Your current test setup is 60M documents, which means you are off 
by a factor of 1,500. You really need to test on a larger subset.
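
Just to put the gap in numbers, here is the napkin math (plain Python, using 
only the figures from your own mails):

  target_docs = 90_000_000_000    # 90 billion documents in the first go
  growth_docs = 10 * target_docs  # the 10x you want to prepare for
  test_docs   = 60_000_000        # your current test setup

  print(target_docs / test_docs)  # 1500.0  -> the test index is 1/1500 of the target
  print(growth_docs / test_docs)  # 15000.0 -> and 1/15000 of the growth target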

> Because of that I am thinking of a new "cluster" solution. Today it is 1 
> billion each day for 90 days = 90 billion (around 45TB of data).

> I should prefer a lot of machines with plenty of RAM and not so much HDD - right?

We seem to be looking at non-trivial machines, so I think you should run more 
tests at a larger scale, taking care to emulate the number of requests and the 
number of concurrent customer requests you expect. If you are lucky, swapping 
in the data for the active customer will work well and you will be able to get 
by with relatively modest hardware.
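
If you want a quick way to emulate concurrent customers, something like the 
sketch below is usually enough for first numbers. The URL and the query list 
are placeholders - swap in your real handler and the kind of queries your 
customers actually send:

  # Sketch only: fire a fixed query list from N simulated concurrent customers
  # against a search endpoint and report latencies.
  import time
  import urllib.parse
  import urllib.request
  from concurrent.futures import ThreadPoolExecutor

  SEARCH_URL = "http://localhost:8983/solr/yourcollection/select"  # placeholder
  QUERIES = ["foo", "bar", "foo AND bar"]                          # placeholders
  CONCURRENT_CUSTOMERS = 5

  def timed_query(q):
      params = urllib.parse.urlencode({"q": q, "rows": 10})
      start = time.time()
      with urllib.request.urlopen(SEARCH_URL + "?" + params) as response:
          response.read()
      return time.time() - start

  with ThreadPoolExecutor(max_workers=CONCURRENT_CUSTOMERS) as pool:
      latencies = sorted(pool.map(timed_query, QUERIES * CONCURRENT_CUSTOMERS))

  print("median %.3fs, worst %.3fs" % (latencies[len(latencies) // 2], latencies[-1]))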

We have had great success with buying relatively cheap (bang-for-the-buck) 
machines with low memory (compared to index size) and local SSDs. With static 
indexes (89 out of your 90 days would be static data, if I understand 
correctly), one of our 256GB machines holds 6 billion documents in 20TB of 
index data. You might want to investigate that option. Some details at 
https://sbdevel.wordpress.com/net-archive-search/
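
Purely as a data point, and with the big caveat that your documents and 
queries will differ from ours, the napkin math against that setup is:

  docs_per_box = 6_000_000_000       # one of our 256GB machines, 20TB of index
  target_docs  = 90_000_000_000      # your first-go target

  print(target_docs / docs_per_box)  # 15.0 -> roughly 15 such machines by document count

Whether your documents pack anywhere near as densely depends entirely on the 
data and the access pattern, so test before buying.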

- Toke Eskildsen
