Re: [hadoop] newbie question

2015-05-03 Thread Christian Dahlqvist
Hi, I am sure Hadoop can help you calculate this, but you may also be able to go about this more efficiently in Elasticsearch. If you, as you mentioned, were to create a user centric index in addition to the event centric one that you have got, you could store a list of all the events

Re: possible networking problem?

2015-04-30 Thread Christian Dahlqvist
Hi, I think there is some confusion about the port number used. Kibana 4 by default listens on port 5601, which based on the output sample you provided does not seem to have been changed. In all your examples you are however looking for port 5061, not 5601. Can you check if you are able to connect

Re: In which case ElasticSearch will return 429 ?

2015-04-30 Thread Christian Dahlqvist
Hi, As explained in the blog post, increasing the queue size will not improve performance, just make you store more data in memory on the cluster awaiting processing. It could actually instead end up reducing performance. It looks like you are hitting the limit of your cluster and that the

Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread christian . dahlqvist
Hi Eran, Which version of Elasticsearch are you using? Are you assigning your own document IDs or letting Elasticsearch assign them automatically? Best regards, Christian On Friday, April 24, 2015 at 7:49:56 AM UTC+1, Eran wrote: Hello, I've created an index I use for logging. This

Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread christian . dahlqvist
Hi Eran, If you are assigning your own ID, Elasticsearch needs to search and check if the document already exists before writing it. This could explain why the bulk insert performance goes down as the size of the index grows. If you are not going to update the documents, I would therefore

Re: Users data flow

2015-04-23 Thread christian . dahlqvist
Hi, If I have calculated correctly, that corresponds to about 238TB of raw data. If this is the size of JSON documents being indexed in Elasticsearch, you will definitely need more than 2 nodes. The good thing about using aliases the way David describes is that you will not need to put all

Re: maxDocs different between primary and replica shards

2015-04-22 Thread christian . dahlqvist
Hi, Merging of segments and the resulting removal of deleted documents is not coordinated across nodes in Elasticsearch, meaning that the number of deleted documents can differ between primary and replica shards. Optimising an index down to a single segment does resolve this, but can as noted

Re: creation_date in index setteing

2015-04-20 Thread christian . dahlqvist
The creation date is given with millisecond precision. Take away the last 3 digits and your converter gives Fri, 06 Mar 2015 08:44:57 GMT for 1425631497. Christian On Monday, April 20, 2015 at 5:06:40 AM UTC+1, tao hiko wrote: I query setting information of index and found that have
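A minimal sketch of the conversion described above, assuming the setting holds milliseconds since the Unix epoch. The function name is hypothetical, and because the original millisecond digits are not shown in the message, the example value simply appends three zero digits to the seconds value quoted there.

```python
from datetime import datetime, timezone


def creation_date_to_utc(creation_date_ms):
    """Convert a millisecond epoch value to a UTC datetime.

    Dropping the last three digits (integer division by 1000) turns
    milliseconds into the seconds most converters expect.
    """
    seconds = int(creation_date_ms) // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Assumed example value: the seconds figure from the message with
# three placeholder digits appended (the real digits are not known).
example_ms = 1425631497 * 1000

print(creation_date_to_utc(example_ms).isoformat())
```

Feeding the full 13-digit value to a converter that expects seconds is what produces a date far in the future.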

Re: Elasticseach issue with some indicies not populating data

2015-04-20 Thread christian . dahlqvist
Hi, That sounds like a very large number of shards for a node that size, and this is most likely the source of your problems. Each shard in Elasticsearch corresponds to a Lucene instance and carries with it a certain amount of overhead. You therefore do not want your shards to be too small.

Re: Elasticseach issue with some indicies not populating data

2015-04-20 Thread christian . dahlqvist
Hi, Having read through the thread it sounds like your configuration has been working in the past. Is that correct? If this is the case I would reiterate David's initial questions about your node's RAM and heap size as the number of shards looks quite large for a single node. Could you please

Re: Very sluggish Elasticsearch node; not sure why

2015-04-19 Thread christian . dahlqvist
Hi, You seem to have quite a large number of shards (1180) for a single node with only 7GB heap. As the total data volume is a bit over 600GB, the average shard size is only a bit over 500MB, which is not very large. As each shard is a separate Lucene index and carries some overhead, you would
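The average shard size quoted above can be checked with a quick calculation. This is only a rough sketch: it uses the rounded figures from the message (600GB, 1180 shards), assumes 1GB = 1024MB, and the function name is made up for illustration.

```python
def average_shard_size_mb(total_data_gb, shard_count):
    """Return the mean shard size in MB, assuming 1 GB = 1024 MB."""
    return total_data_gb * 1024 / shard_count


# Figures taken from the message: a bit over 600GB across 1180 shards.
size_mb = average_shard_size_mb(600, 1180)
print(f"average shard size: about {size_mb:.0f} MB")
```

With these inputs the result is a little over 500MB per shard, which matches the estimate in the message.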

Re: Compression in Elasticsearch documents

2015-04-15 Thread christian . dahlqvist
Hi, How much space the data takes up on disk in Elasticsearch depends a lot on your mappings. In addition to storing the source in the _source field, all fields are by default also copied over to the _all field to allow free text search across all fields. In addition to this Elasticsearch also

Re: Multiple indices vs. multiple shards approach

2015-03-20 Thread christian . dahlqvist
Hi, You could get around this by using routing based on customer ID when indexing and searching. This will ensure that all documents belonging to a single customer will be located in the same shard, which means that each search for a specific customer can hit a single shard instead of all 9,
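To illustrate why routing on customer ID keeps a customer's documents together, here is a small sketch of hash-based shard selection. It is not Elasticsearch's implementation: `zlib.crc32` stands in for whatever hash Elasticsearch uses internally, and the customer IDs and function name are invented for the example.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 9  # shard count mentioned in the message


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Pick a shard from a routing value.

    crc32 is a stand-in hash used only for illustration; the point is
    that the same routing value always maps to the same shard.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical documents as (customer_id, payload) pairs.
docs = [
    ("customer-17", "order A"),
    ("customer-42", "order B"),
    ("customer-17", "order C"),
    ("customer-42", "order D"),
]

# Record which shards each customer's documents end up on.
shards_per_customer = defaultdict(set)
for customer_id, _payload in docs:
    shards_per_customer[customer_id].add(shard_for(customer_id))

for customer_id, shards in sorted(shards_per_customer.items()):
    print(customer_id, "-> shards", sorted(shards))
```

Each customer ends up on exactly one shard, so a search that supplies the same routing value only needs to query that shard.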

Re: Logstash/Elasticsearch Slow CSV Import

2015-03-08 Thread christian . dahlqvist
Hi, Can you please share your Logstash configuration, some sample data as well as your mappings? Best regards, Christian On Friday, March 6, 2015 at 11:30:45 AM UTC-8, Econgineer wrote: I'm testing out the ELK stack on my desktop (ie 1 node) and thought I'd start by pulling a flat file,

Re: Help with 4 node cluster

2015-02-18 Thread christian . dahlqvist
Hi, You always want an odd number of master nodes (often 3), so I would therefore recommend setting three of the four nodes to be master eligible and leaving the fourth as a pure data node. This will prevent the cluster getting partitioned into two with an equal number of master nodes on both
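The reasoning behind the odd number can be sketched with simple majority arithmetic. This assumes a master can only be elected by a strict majority of the master-eligible nodes; the function names are hypothetical and nothing here calls Elasticsearch itself.

```python
def majority(master_eligible):
    """Smallest number of master-eligible nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size, master_eligible):
    """True if one side of a network partition holds a majority."""
    return partition_size >= majority(master_eligible)


# Four master-eligible nodes split 2/2: neither side has a majority.
print(can_elect_master(2, 4), can_elect_master(2, 4))

# Three master-eligible nodes split 2/1: exactly one side has a majority.
print(can_elect_master(2, 3), can_elect_master(1, 3))
```

With three master-eligible nodes any split leaves at most one side able to elect a master, whereas an even split of four leaves no side with a majority.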

Re: Document ordering in index

2015-02-16 Thread christian . dahlqvist
As Elasticsearch requires indexed documents to be in JSON format, you will need to base64 encode any binary blobs in order to store them. This will increase the size on disk significantly and have an impact on performance. Unless you plan to utilise the search features in Elasticsearch at a
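A small sketch of the size increase from base64 encoding, using only the Python standard library. The blob is random bytes and the field name `payload` is made up for the example; the overhead shown is for the encoding alone, before any storage overhead in Elasticsearch.

```python
import base64
import json
import os

# Hypothetical binary blob: 3000 random bytes.
blob = os.urandom(3000)

# base64 maps every 3 input bytes to 4 ASCII characters.
encoded = base64.b64encode(blob).decode("ascii")

# The encoded string can then be carried in a JSON document.
document = json.dumps({"payload": encoded})

print("raw bytes:    ", len(blob))
print("base64 chars: ", len(encoded))
print("growth factor:", len(encoded) / len(blob))
```

The encoded form is about a third larger than the raw bytes, which is the minimum overhead to expect from storing blobs this way.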

Re: ElasticSearch search performance question

2015-02-13 Thread christian . dahlqvist
How many replicas do you have configured for the index? Christian On Thursday, February 12, 2015 at 8:32:28 PM UTC, Jay Danielian wrote: I know this is difficult to answer, the real answer is always It Depends :) But I am going to go ahead and hope I get some feedback here. We are mainly

Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-26 Thread Christian Dahlqvist
Hi, A common approach for replicating changes across multiple geographically distributed clusters is to put a message queue in front of Elasticsearch and feed all data modifications through this so that they can be applied to the clusters independently. This allows issues with unreliable

Re: Search not working unless type specified

2015-01-12 Thread christian . dahlqvist
What does your mapping for the index look like? Is there any possibility there could be a mapping conflict? Christian On Friday, January 9, 2015 at 10:48:52 PM UTC, Stefanie wrote: I am having an issue with searching results if the type is not specified. The following search request works