Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Drew Kutcharian
Hey Guys, I'm working on an analytics dashboard project where we collect events into Elasticsearch for clients. Each client could have millions of events per month. We are thinking of using one index with one shard and one replica per client. Looking at Logstash, it seems like Logstash creates

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Andrew Selden
Drew, The Elasticsearch default is to create 5 shards for each index. I would start with this. Typically it is best to actually over-shard, which is to say have more than 1 shard per node per index. There is not really any measurable cost to this and it gives you flexibility in your design as

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Drew Kutcharian
Hi Andrew, Not sure if you read my original question. The question is about having a separate index per customer since we are going to have 1000 customers but each would have a lot of data. Each shard comes with its own overhead since it's an instance of Lucene. I was going with the 1 shard
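
To put rough numbers on the overhead being discussed: every replica is a full copy of a primary shard, so the total number of shards (Lucene instances) in the cluster is indices × primaries × (1 + replicas). A minimal sketch using the 1000-customer figure from this message; the function name is illustrative only, and one replica is assumed in both layouts.

```python
def total_shards(indices: int, primaries_per_index: int, replicas_per_primary: int) -> int:
    """Total shard count (primaries plus replicas) across all indices."""
    return indices * primaries_per_index * (1 + replicas_per_primary)


customers = 1000  # figure quoted in the thread, one index per customer

# Layout proposed in the thread: 1 primary shard + 1 replica per index.
one_shard_layout = total_shards(customers, primaries_per_index=1, replicas_per_primary=1)

# Default of 5 primary shards per index, still with 1 replica (assumed).
default_layout = total_shards(customers, primaries_per_index=5, replicas_per_primary=1)

print(one_shard_layout)  # 2000
print(default_layout)    # 10000
```

How many shards a cluster can comfortably hold depends on hardware and heap, so these totals are only a basis for comparison, not a limit.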

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Mark Walkom
Pretty sure he read it, as I'd have offered the same advice :) You cannot change the sharding of an index after creation; you need to completely reindex the data to do so. This may not be a major issue for you but it's something to take into account when you have hundreds or thousands of customers,
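
For illustration, the shard count is part of the settings sent when the index is created, which is why it has to be decided up front. The sketch below only builds the JSON body for a create-index request (`PUT /<index-name>`); it does not talk to a cluster. The helper name and the example index name `events-acme` are hypothetical and not from the thread.

```python
import json


def create_index_body(number_of_shards: int = 1, number_of_replicas: int = 1) -> str:
    """Build the JSON body for a create-index request.

    number_of_shards is fixed once the index exists, as noted in the thread;
    number_of_replicas can still be adjusted afterwards.
    """
    if number_of_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if number_of_replicas < 0:
        raise ValueError("replica count cannot be negative")
    body = {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }
    return json.dumps(body, sort_keys=True)


# Body for a hypothetical per-client index, e.g. PUT /events-acme
print(create_index_body())
# {"settings": {"number_of_replicas": 1, "number_of_shards": 1}}
```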

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Drew Kutcharian
Hi Mark, The problem that we have is that each customer could generate 60-80 million docs/month on average. In addition, when a customer leaves, we would need to delete all their data. Hence it makes sense to have an index per customer (or even multiple indexes per customer). Another issue
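
One way the "multiple indexes per customer" idea could look is a dated index per customer per month, similar in spirit to Logstash's dated index names. The naming scheme `events-<customer>-<YYYY.MM>` below is an assumption for illustration, not something proposed in the thread. With such a scheme, removing a departing customer's data comes down to deleting every index that matches that customer's pattern; whether a wildcard is accepted in a delete request depends on the cluster's configuration.

```python
import re
from datetime import date

# Restricting IDs to lowercase letters and digits keeps index names valid and
# stops one customer's pattern from matching another's (e.g. "a" vs "a-b").
_CUSTOMER_ID = re.compile(r"[a-z0-9]+")


def _validate(customer_id: str) -> str:
    if not _CUSTOMER_ID.fullmatch(customer_id):
        raise ValueError(f"unsupported customer id: {customer_id!r}")
    return customer_id


def monthly_index(customer_id: str, day: date) -> str:
    """Name of the (hypothetical) index holding this customer's events for that month."""
    return f"events-{_validate(customer_id)}-{day:%Y.%m}"


def customer_pattern(customer_id: str) -> str:
    """Wildcard pattern covering all of one customer's monthly indexes."""
    return f"events-{_validate(customer_id)}-*"


print(monthly_index("acme", date(2014, 6, 26)))  # events-acme-2014.06
print(customer_pattern("acme"))                  # events-acme-*
```

Monthly indexes also keep each index bounded at roughly one month of events (60-80 million docs at the volumes quoted above) instead of growing without limit.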

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Mark Walkom
Ahh ok, knowing this extra info is good as it helps us help you :) Logstash doesn't define how many shards to use, at least not that I can see here - https://github.com/elasticsearch/logstash/blob/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json - or through some quick tests.