Re: Architecting a NodeJS backend that logs in ElasticSearch which is then used by Apache Spark/MLlib

2015-04-27 Thread Kimbro Staken
pretty sure that's not even necessary to use ES with Spark; it's just a convenience if you already have a YARN cluster and want to provision and control elasticsearch within that environment. Kimbro Staken On Mon, Apr 27, 2015 at 11:03 AM, Garrett Gottlieb garrett.gottl...@gmail.com wrote: Hi
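
A minimal sketch of the plain-connector route the reply alludes to: reading an ES index from Spark with elasticsearch-hadoop, no YARN-managed ES required. The host, index, and app names are assumptions for illustration, not from the thread:

    # read an Elasticsearch index into a Spark RDD via the es-hadoop
    # connector (ship the connector jar with --jars); names are made up
    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("es-logs-sketch"))

    es_rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={
            "es.nodes": "localhost:9200",         # hypothetical ES host
            "es.resource": "logs-2015.04/entry",  # hypothetical index/type
        })

    print(es_rdd.take(1))  # (doc id, field map) pairs, ready for MLlib prep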

Re: Elasticsearch ingest performance

2015-04-23 Thread Kimbro Staken
Kimbro Staken kst...@kstaken.com wrote: Hello Brian, Many things will affect the rate of ingest; the biggest one is making sure the load gets spread around. But are you sure ES is what's bottlenecking here? With only 5 shards you're only using half your cluster but I'm willing to bet your
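
One way to confirm the "only using half your cluster" point, sketched with the Python client (the thread itself gives no code; host is assumed):

    # list where each shard of each index lives; with 5 primaries on a
    # 10-node cluster, half the nodes do no indexing for that index
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # hypothetical host
    print(es.cat.shards(v=True))       # index, shard, prirep, node
    print(es.cat.allocation(v=True))   # shards and disk used per node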

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Kimbro Staken
and fielddata can eat heap in ways that will make your head spin. Kimbro Staken On Wed, Apr 22, 2015 at 1:14 AM, fdevilla...@synthesio.com wrote: Hi list, I've been using ES in production since 0.17.6 with clusters up to 64 virtual machines and 20T of data (including 3 replicas). We're now thinking
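
For the fielddata point, a sketch of the usual 1.x-era mitigation: keep fielddata on disk via doc_values. The index and field names are invented for illustration:

    # ES 1.x mapping: a not_analyzed string with doc_values stores its
    # fielddata on disk instead of the JVM heap
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.indices.create(index="events", body={
        "mappings": {
            "event": {
                "properties": {
                    "status": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    }
                }
            }
        }
    })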

Re: Elasticsearch ingest performance

2015-04-22 Thread Kimbro Staken
current numbers don't show that. Kimbro Staken On Wed, Apr 22, 2015 at 4:16 PM, bparki...@maprtech.com wrote: We are running a 10-node Elasticsearch 1.4.2 cluster, and getting cluster-wide throughput of 18161 docs/sec, or about 18MB/sec. We'd like to improve this as much as we can
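
Batching is usually the first lever for numbers like these; a minimal sketch with the elasticsearch-py bulk helper (index name and payload are made up):

    # index documents in bulk rather than one request per document
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch()

    def actions():
        for i in range(100000):
            yield {"_index": "ingest-test", "_type": "doc",
                   "_source": {"n": i, "body": "example payload"}}

    ok, errors = bulk(es, actions(), chunk_size=1000)
    print(ok, errors)  # documents indexed, per-item failures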

Re: 30 billion unique documents (and counting)

2015-04-22 Thread Kimbro Staken
An individual node can (and should) hold dozens of shards. Larger shard sizes will work too, but when a node crashes, recovery of a larger number of 50GB shards will be much faster than recovery of a smaller number of 200GB shards, especially in a large cluster. Kimbro Staken On Wed, Apr 22, 2015 at 6:04 PM, Jack Park
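
Back-of-envelope arithmetic for that trade-off, using the 20T figure quoted earlier in the thread (pure illustration):

    # shard counts at the two sizes mentioned above, for ~20TB of data
    total_gb = 20 * 1024
    print(total_gb / 50)    # ~410 shards of 50GB  -> many small parallel recoveries
    print(total_gb / 200)   # ~102 shards of 200GB -> fewer, slower recoveries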

Re: marvel.agent.exporter: error connecting to [[0:0:0:0:0:0:0:0]:9200] [No route to host]

2015-04-17 Thread Kimbro Staken
Yes, I saw the same issue yesterday on a test system. For me it started after the node crashed and rebooted. It looks like it's trying to use the IPv6 address to connect. I didn't really dig too far for a real fix since this was a test system, but setting network.bind_host to 0.0.0.0 got marvel
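
A quick way to check whether a node is publishing an IPv6 address, sketched with the Python client (not from the thread):

    # print the addresses each node actually publishes; an IPv6
    # transport/http address here would explain the marvel error above
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    for node in es.nodes.info()["nodes"].values():
        print(node["name"], node.get("http_address"),
              node.get("transport_address"))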

Re: Cluster misconfigured or elastic is just SLOW!!!!?

2015-01-23 Thread Kimbro Staken
How many shards to use is a complicated question and depends on the specific use case. For testing in this scenario, though, it's likely that just matching the number of nodes you have would be a good choice. Then you will have 1 primary shard for each index on each node. That said, it also looks
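
What "one primary per node" looks like in practice, as a sketch (index name assumed; the client calls are standard elasticsearch-py):

    # size the test index so every node holds exactly one primary
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    nodes = len(es.nodes.info()["nodes"])
    es.indices.create(index="bench", body={
        "settings": {"number_of_shards": nodes,
                     "number_of_replicas": 1}})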

Re: 2 Servers with 2 primary shards... optimization questions

2015-01-22 Thread Kimbro Staken
Your second example is what elasticsearch will do by default. It will never allocate a primary and replica for the same shard on the same node. In that example if one of the nodes were to go down both primaries would move to the remaining node and the replicas would be unallocated and the cluster
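
A sketch of how to see that default placement for yourself on a 2-node cluster (index name invented):

    # two primaries plus replicas; the prirep column shows p/r, and no
    # shard number ever appears twice on the same node
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.indices.create(index="twoshard", body={
        "settings": {"number_of_shards": 2, "number_of_replicas": 1}})
    print(es.cat.shards(index="twoshard", v=True))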

Re: Cluster misconfigured or elastic is just SLOW!!!!?

2015-01-22 Thread Kimbro Staken
Yes, you have something very wrong. That is showing you have a huge number of shards and the cluster is obviously struggling to allocate all of them. You said you have 9 nodes and 1 replica, but you didn't specify how many shards per index. Kimbro On Thu, Jan 22, 2015 at 11:45 AM, Sam Flint
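
A quick tally of what the cluster is trying to allocate, sketched with the Python client; total shards = indices x shards per index x (1 + replicas):

    # how many shards the cluster is juggling, and how many are stuck
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    h = es.cluster.health()
    print(h["active_shards"], h["initializing_shards"],
          h["unassigned_shards"])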

Re: Just initialize shards when problems but no rebalance

2015-01-15 Thread Kimbro Staken
I've experienced what you're describing. I called it a shard relocation storm and it's really tough to get under control. I opened a ticket on the issue and a fix was supposedly included in 1.4.2. What version are you running? If you want to truly manually manage this situation, you could set
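
The sentence is cut off in the archive; one plausible candidate for the setting it refers to (a guess, not confirmed by the snippet) is disabling automatic rebalancing:

    # GUESS at the truncated advice: stop automatic shard rebalancing so
    # relocations only happen when you move shards yourself
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.cluster.put_settings(body={
        "transient": {"cluster.routing.rebalance.enable": "none"}})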

Re: Just initialize shards when problems but no rebalance

2015-01-15 Thread Kimbro Staken
with more disk space to avoid these situations. By any chance do you have the link to the issue? 2015-01-15 13:26 GMT-03:00 Kimbro Staken ksta...@kstaken.com: I've experienced what you're describing. I called it a shard relocation storm and it's really tough to get under control. I opened

Re: Mapping question

2014-12-09 Thread Kimbro Staken
Looks like you have an extra opening brace under properties. On Tue, Dec 9, 2014 at 1:36 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to change the mapping but I think I am not doing it right. The docs are not clear either. My document looks like this:
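
For reference, a well-formed mapping body with a 1.x-era Python client; the field names are placeholders, since the original document isn't shown in the snippet:

    # correctly nested mapping: one brace opens "properties", then one
    # object per field
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.indices.put_mapping(index="myindex", doc_type="mytype", body={
        "mytype": {
            "properties": {
                "title": {"type": "string"},
                "created": {"type": "date"},
            }
        }
    })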

Re: Many indices.fielddata.breaker errors in logs and cluster slow...

2014-10-17 Thread Kimbro Staken
This is caused by elasticsearch trying to load fielddata. Fielddata is used for sorting and faceting/aggregations. When a query has a sort parameter the node will try to load the fielddata for that field for all documents in the shard, not just those included in the query result. The breaker is
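
A sketch of the mechanics described above: the sorted search below loads fielddata for the whole shard, not just the hits returned, and the fielddata breaker limit (whose setting name moved around across 1.x releases) can be adjusted dynamically. Index, field, and percentage are assumptions:

    # a sorted search forces fielddata for "timestamp" across the shard,
    # not just the 10 documents returned
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.search(index="logs", body={
        "query": {"match_all": {}},
        "sort": [{"timestamp": {"order": "desc"}}],
        "size": 10,
    })

    # raise the fielddata breaker limit (1.4-era name shown) as a stopgap
    es.cluster.put_settings(body={
        "transient": {"indices.breaker.fielddata.limit": "70%"}})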