Re: 30 billion unique documents (and counting)

2015-04-23 Thread Alexandre Heimburger
Regarding 1 index per month, what about search performance when searching across 2 or 3 years? On Thursday, 23 April 2015 at 03:20:33 UTC+2, Kimbro Staken wrote: Running ES at scale is all about balance and sizing right. Like the 3 bears, not too big and not too small, just right. Big boxes will

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Kimbro Staken
Hello Fred, I have clusters as large as 200 billion documents/130TB. Sharing experiences on that would require a book, but a couple of quick things jumped out at me. 1. Do not go the huge server route. Elasticsearch works best when you scale it horizontally. The 64GB route is a much better

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Jack Park
I would certainly like to see that book, or at least a draft of it ;-) On Wed, Apr 22, 2015 at 10:12 AM, Kimbro Staken ksta...@kstaken.com wrote: Hello Fred, I have clusters as large as 200 billion documents/130TB. Sharing experiences on that would require a book, but a couple quick things

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Jack Park
That starts to argue for lots of smaller servers, maybe even with smaller SSDs. Say, a low power i3 with 16 or 23gb ram, and a 128gb SSD. Is that right? On Wed, Apr 22, 2015 at 2:56 PM, Mark Walkom markwal...@gmail.com wrote: If you are using time series data then you should be using time

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Kimbro Staken
Running ES at scale is all about balance and sizing right. Like the 3 bears, not too big and not too small, just right. Big boxes will just be wasted, and boxes that are too small will have you hitting limits too soon. Given the way Java works with heaps above 30GB-ish, the best size for a node right now

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Mark Walkom
Not really, the smaller server you mentioned before would be suitable. On 23 April 2015 at 11:04, Jack Park jackp...@topicquests.org wrote: That starts to argue for lots of smaller servers maybe even with smaller SSD's. Say, a low power i3 with 16 or 23gb ram, and a 128gb SSD. Is that

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Mark Walkom
If you are using time series data then you should be using time series indices. As Fred pointed out, routing an entire month's worth of data to a single shard is not going to scale. Also, we recommend that you keep shard size below 50GB; this helps with recovery and distribution. There is also a
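To make the sizing advice above concrete, here is a minimal Python sketch, not taken from the thread, of how monthly index names and a primary shard count could be derived from the 50GB-per-shard ceiling. The index prefix `docs`, the `prefix-YYYY.MM` naming scheme, and the function names are assumptions for illustration only; the only figure taken from the message is the 50GB limit.

```python
import math
from datetime import date

# Ceiling quoted in the message above; everything else here is illustrative.
MAX_SHARD_GB = 50


def monthly_index(prefix: str, day: date) -> str:
    """Return a hypothetical monthly index name such as 'docs-2015.04'."""
    return f"{prefix}-{day.year:04d}.{day.month:02d}"


def shards_needed(expected_index_gb: float, max_shard_gb: float = MAX_SHARD_GB) -> int:
    """Smallest primary shard count that keeps each shard at or under the ceiling."""
    return max(1, math.ceil(expected_index_gb / max_shard_gb))


def indices_for_range(prefix: str, start: date, end: date) -> list[str]:
    """List every monthly index that a search from start to end would touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(monthly_index(prefix, date(year, month, 1)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

Under these assumptions, a month expected to hold 420GB of primary data would need `shards_needed(420)` primaries, and a search spanning two years would fan out over the 24 names returned by `indices_for_range`. That fan-out is the cost raised elsewhere in the thread about searching 2 or 3 years of monthly indices.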

30 billion unique documents (and counting)

2015-04-22 Thread fdevillamil
Hi list, I've been using ES in production since 0.17.6 with clusters up to 64 virtual machines and 20TB of data (including 3 replicas). We're now thinking about pushing things a bit further and I wondered if people here had similar experience / needs to ours. Our current index is 1.1 billion