yura last <y_ura_2...@yahoo.com.INVALID> wrote:

> I expect that the amount of concurrent customers will be low.
> Today I have 1 machine so I don't have the capacity for all
> the data.
You aim for 90 billion documents in the first go and want to prepare for
10 times that. Your current test setup is 60M documents, which means you
are off by more than a factor of 1000. You really need to test on a
larger subset.

> Because of that I am thinking on a new "cluster" solution. Today is
> 1 billion each day for 90 days = 90 billion (around 45TB data).
> I should prefer a lot of machines with many RAM and not so many HDD - right?

We seem to be looking at non-trivial machines, so I think you should run
more tests at a larger scale, taking care to emulate the number of
requests and the number of concurrent customer requests you expect. If
you are lucky, it works well to swap in the data for the active customer,
and you will be able to get by with relatively modest hardware.

We have had great success with buying relatively cheap (bang-for-the-buck)
machines with low memory (compared to index size) and local SSDs. With
static indexes (89 out of your 90 days would be static data, if I
understand correctly), one of our 256GB machines holds 6 billion
documents in 20TB of index data. You might want to investigate that
option. Some details at https://sbdevel.wordpress.com/net-archive-search/

- Toke Eskildsen
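P.S. As a back-of-the-envelope check, the figures from this thread can be
plugged into a quick sketch. The per-machine numbers are taken from our
static-index setup and are only assumptions for your hardware and your
document sizes:

```python
# Back-of-the-envelope cluster sizing from the numbers in this thread.
# Assumptions: 1 billion docs/day for 90 days, ~45TB of index data in
# total, and one reference machine (256GB RAM, local SSDs) holding
# ~6 billion docs in ~20TB of index. Your documents are clearly smaller
# than ours, so treat both per-machine figures as rough placeholders.

DOCS_PER_DAY = 1_000_000_000
RETENTION_DAYS = 90
TOTAL_DOCS = DOCS_PER_DAY * RETENTION_DAYS   # 90 billion
TOTAL_INDEX_TB = 45                          # rough estimate from the thread

DOCS_PER_MACHINE = 6_000_000_000             # our static-index reference box
TB_PER_MACHINE = 20

machines_by_docs = -(-TOTAL_DOCS // DOCS_PER_MACHINE)        # ceiling division
machines_by_size = -(-TOTAL_INDEX_TB // TB_PER_MACHINE)

print(f"Total documents:              {TOTAL_DOCS:,}")
print(f"Machines needed by doc count: {machines_by_docs}")
print(f"Machines needed by index size: {machines_by_size}")
```

Note that the two estimates disagree (15 machines by document count, 3 by
index size), because your documents average around 0.5KB of index data
against our 3.3KB. The larger number is the safer planning figure, and
only a test at realistic scale will tell you which constraint actually
binds.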