Re: concerns on possible load of aggregation

2015-02-25 Thread Jilles van Gurp
You need to look into using an index template that uses an optimal mapping for your data. For logstash, it really helps to use doc_values on all fields you aggregate on and to turn off norms on those fields as well. Doc_values means elasticsearch uses memory mapped files instead of heap memory for
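
As a rough illustration of the advice above, the following sketch builds an index template body that enables doc_values and disables norms on the fields used in aggregations. The mapping keys follow ES 1.x-era syntax as an assumption, and the field names (`host`, `status`) and the `logstash-*` pattern are hypothetical; check the mapping reference for the version you run before using anything like this.

```python
import json


def logstash_template(aggregated_fields):
    """Build a template body with doc_values on and norms off for the given fields.

    Assumption: ES 1.x-style mapping keys; the field names are made up for illustration.
    """
    properties = {
        field: {
            "type": "string",
            "index": "not_analyzed",      # aggregate on the raw, unanalyzed value
            "doc_values": True,           # keep field data on disk instead of on the heap
            "norms": {"enabled": False},  # scoring norms are not needed for aggregations
        }
        for field in aggregated_fields
    }
    return {
        "template": "logstash-*",  # hypothetical index name pattern
        "mappings": {"_default_": {"properties": properties}},
    }


body = logstash_template(["host", "status"])
print(json.dumps(body, indent=2))
```

The printed JSON is what you would send when registering the template; nothing here talks to a cluster.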

Re: Read past EOF exception on .tis and .fdt file

2015-02-18 Thread Jilles van Gurp
Plus 1 for a less invasive way to recover data. I had a similar issue today on one of our test servers, where I eventually managed to recover my index by running CheckIndex on one of my shards. In my case, I also had to remove the translog recovery file to actually get the cluster green. This is

exist filter also matching on nested fields

2015-02-10 Thread Jilles van Gurp
I'm trying to filter documents that have a particular field at the top level. Like for example: { group_id:xxx } so I wrote the following query: GET inbot_users/usercontact/_search { query: { filtered: { query: { match_all: {} }, filter: { exists: {

Re: exist filter also matching on nested fields

2015-02-10 Thread Jilles van Gurp
://gist.github.com/hkorte/ca5f91e2f4838213d956 I tried it using ES 1.4.1. Which version are you using? Does my gist work for you? I noticed that it doesn't matter whether or not it is a nested document. Best regards, Hannes On 10.02.2015 15:52, Jilles van Gurp wrote: I'm trying

context suggester large number of categories

2015-02-06 Thread Jilles van Gurp
I'm considering using the context suggester to autocomplete some user specific categories. The idea here is that users can create their own tags and should see only suggestions for their own tags. One solution is to use the completion suggester with a context and use the userId as the

Re: geo_shape and NullPointerException

2015-01-28 Thread Jilles van Gurp
Yeah, looks like that is the problem indeed. Probably, es needs a friendlier error here. This is a common mistake with geojson. Jilles On Wednesday, January 28, 2015 at 2:16:25 PM UTC+1, Roman Drogolov wrote: Yep, format is wrong. Wrap coordinates in one more array. Like this: {

Re: optimize elasticsearch / JVM

2015-01-28 Thread Jilles van Gurp
How much heap are you giving to ES? With this many requests, if your setup is not falling over it is probably not garbage collection related because that would result in very noticeable delays/unavailability of es. 32GB should be a good value given how much memory you have. Also, you probably want

Re: stats aggregation on list length

2015-01-23 Thread Jilles van Gurp
work. Masaru On January 22, 2015 at 19:10:25, Jilles van Gurp (jilles...@gmail.com) wrote: I'm trying to do a stats aggregation on the list length using a script but I'm getting errors. For this data, PUT test_groups/group/1 { name:1, members

stats aggregation on list length

2015-01-22 Thread Jilles van Gurp
I'm trying to do a stats aggregation on the list length using a script but I'm getting errors. For this data, PUT test_groups/group/1 { name:1, members:[ { name:m1 } ] } PUT test_groups/group/2 { name:2, members:[ { name:m1 }, { name:m2 } ]

Re: Moving to Java 8?

2015-01-22 Thread Jilles van Gurp
You should be able to access es from java 8 source code. Most collection classes already support streams, and you can use lambdas in many places where you would otherwise use inner classes. I suspect a full switch to java 8 might take some time for big projects such as

Re: Looking for a suggestion to better organize our indices for performance

2014-12-09 Thread Jilles van Gurp
Indeed, increase your shard count. Also, you may want to consider using a routing parameter based on e.g. a tenant_id to ensure all queries related to a tenant only hit shards that actually have data for that tenant. Those two measures would reduce the size of each shard and the number of shards
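
To make the routing idea concrete, here is a minimal sketch of how a routing value such as a tenant_id can pin all of a tenant's documents to one shard. The hash used here (`zlib.crc32`) is only a stand-in and is not the function Elasticsearch uses internally; the tenant ids and shard count are hypothetical.

```python
import zlib


def shard_for(routing_value, number_of_shards):
    """Map a routing value to a shard number.

    Stand-in hash for illustration only; Elasticsearch applies its own
    hash function, but the principle (hash modulo shard count) is the same.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


number_of_shards = 10  # hypothetical shard count
for tenant_id in ["tenant-1", "tenant-2", "tenant-3"]:
    # Every document and query routed with the same tenant_id ends up here.
    print(tenant_id, "->", shard_for(tenant_id, number_of_shards))
```

Because the mapping is deterministic, a query that passes the same routing value only needs to visit that one shard instead of all of them.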

Re: ES enterprise search engine or log analytics ?

2014-12-09 Thread Jilles van Gurp
The best way to think of elasticsearch is as an ever-evolving Swiss army knife for search. It supports both traditional search features and some advanced features for structured search. For example, elasticsearch aggregations have very little to do with text search but are highly

elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Jilles van Gurp
Our production cluster went yellow last night after our logstash index rolled over to the next version. I've seen this happen before but this time I decided to properly diagnose and seek some feedback on what might be going on. So, I'd love some feedback on what is going on. I'm happy to keep

Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Jilles van Gurp
'? Unfortunately the field name didn't get included with your errors. On 27 November 2014 at 11:19, Jilles van Gurp jilles...@gmail.com wrote: Our production cluster went yellow last night after our logstash index rolled over to the next version. I've seen this happen before

Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Jilles van Gurp
of things where it shouldn't get in this state. Apparently a fix for that part is coming. Best, Jilles On Thursday, November 27, 2014 11:19:20 AM UTC+1, Jilles van Gurp wrote: Our production cluster went yellow last night after our logstash index rolled over to the next version. I've seen

Java 8 recommended version?

2014-10-16 Thread Jilles van Gurp
I know JDK 7u55 was labeled as OK some time ago and this is still listed as the official requirement: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/requirements.html However, time has moved on and I was wondering what the testing status and advice is for more recent JDKs.

Re: more accurate date based scoring

2014-10-07 Thread Jilles van Gurp
, Jilles van Gurp jilles...@gmail.com wrote: PUT /test/test/1 { date:2013-04-01T00:00:00Z } PUT /test/test/2 { date:2013-04-01T00:00:01Z } PUT /test/test/3 { date:2013-04-01T00:00:03Z } PUT /test/test/4 { date:2013-04-01T00:01:03Z } Given these documents

Re: more accurate date based scoring

2014-10-07 Thread Jilles van Gurp
I found another workable solution. sort: [ { _score: { order: desc } }, { date: { order: desc } } ] This sorts first by score and then by date. So this has the effect of ranking by score and then ranking those items with the same
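
The effect of that two-level sort can be checked locally. The sketch below writes the sort clause from the post as a Python structure and then emulates the same ordering on a handful of made-up hits; the scores and ids are hypothetical, and the dates reuse the test documents from this thread.

```python
# The sort clause from the post, written out as a Python structure.
sort_clause = [
    {"_score": {"order": "desc"}},
    {"date": {"order": "desc"}},
]

# Hypothetical hits: the scores are invented to show the tie-breaking.
hits = [
    {"id": 1, "score": 1.0, "date": "2013-04-01T00:00:00Z"},
    {"id": 2, "score": 2.0, "date": "2013-04-01T00:00:01Z"},
    {"id": 3, "score": 1.0, "date": "2013-04-01T00:00:03Z"},
    {"id": 4, "score": 2.0, "date": "2013-04-01T00:01:03Z"},
]

# Emulate "score desc, then date desc". ISO 8601 timestamps in the same
# format and time zone compare correctly as plain strings.
ranked = sorted(hits, key=lambda hit: (hit["score"], hit["date"]), reverse=True)
ranked_ids = [hit["id"] for hit in ranked]
print(ranked_ids)
```

Hits with equal scores are ordered newest first, which only helps when many documents actually share a score.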

'out of float scope for function score deviation' error using function_score and a gauss decay

2014-10-06 Thread Jilles van Gurp
Using elasticsearch 1.3.4 I have an index with user events and I'm trying to use function_score to get a sensible order by date (without using sort). The query below works, but only for relatively small result sets. While trying to test whether this orders things correctly (following up on

more accurate date based scoring

2014-10-06 Thread Jilles van Gurp
PUT /test/test/1 { date:2013-04-01T00:00:00Z } PUT /test/test/2 { date:2013-04-01T00:00:01Z } PUT /test/test/3 { date:2013-04-01T00:00:03Z } PUT /test/test/4 { date:2013-04-01T00:01:03Z } Given these documents, I'm trying to come up with a query that scores them such that they come

term filter matching on nested property if top level field is missing

2014-09-22 Thread Jilles van Gurp
We have a json structure that may have a deleted field in several places. With no explicit mapping, if I post this: POST /test/fooo/1 { properties:{ deleted:true } } and then query for GET /test/fooo/_search { query: { term: { deleted: { value: true } } } }

Re: cluster can't recover after upgrade from 1.1.1 to 1.3.2 due to MaxBytesLengthExceededException

2014-09-11 Thread Jilles van Gurp
You are running into this problem: http://elasticsearch-users.115913.n3.nabble.com/encoding-is-longer-than-the-max-length-32766-td4056738.html You need to change the mapping and define a maximum token length in your analyzer. Unfortunately, you would need to do that before you migrate and I

Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Jilles van Gurp
This is the generally accepted dogma and it has some merit. However, having two storage systems is more than a bit annoying. If you are aware of the limitations and caveats, elasticsearch is actually a perfectly good document store that happens to have a deeply integrated querying engine. This

Re: Using elasticsearch as a realtime fire hose

2014-08-26 Thread Jilles van Gurp
You might want to look at developing a plugin for this or maybe using an existing one. This one for example might do partly what you need: https://github.com/derryx/elasticsearch-changes-plugin If you develop your own plugin, you should be able to tap into what is happening in the cluster at a

Re: Logstash stop communicating with Elasticsearch

2014-08-26 Thread Jilles van Gurp
I had some issues with logstash as well and ended up modifying the elasticsearch_http plugin to tell me what was going on. Turned out my cluster was red because my index template required more replicas than was possible:-). The problem was that logstash does not fail very gracefully and

Re: Java API or REST API for client development ?

2014-08-26 Thread Jilles van Gurp
I use an in-house developed Java REST client for elasticsearch. Unfortunately it's not in any shape to untangle from our code base and put on Github yet but I might consider that if there's more interest. Basically I use Apache HttpClient, I implemented a simple round robin strategy so I can
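
A round robin strategy of the kind mentioned above can be very small. This is only a sketch of the general technique, not the client described in the post: the class name and host URLs are hypothetical, and it does no failure detection and is not thread-safe.

```python
import itertools


class RoundRobinHosts:
    """Hand out hosts in rotation so requests are spread across nodes."""

    def __init__(self, hosts):
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one host is required")
        self._cycle = itertools.cycle(hosts)

    def next_host(self):
        # Each call returns the next host, wrapping around at the end.
        return next(self._cycle)


# Hypothetical node addresses.
pool = RoundRobinHosts(["http://es1:9200", "http://es2:9200", "http://es3:9200"])
picked = [pool.next_host() for _ in range(4)]
print(picked)
```

A real client would also want to skip nodes that stop responding and guard `next_host` with a lock when used from several threads.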

Re: Searching for geo_circles as geo_shapes

2014-07-10 Thread Jilles van Gurp
I have a bit of code in my geogeometry project that converts circles to polygons with whatever number of segments you wish: https://github.com/jillesvangurp/geogeometry/blob/master/src/main/java/com/jillesvangurp/geo/GeoGeometry.java. A few dozen gives you a really nice approximation of a
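
For readers who want to see the idea without opening the linked project, here is an independent sketch of approximating a circle with a polygon. It is not the geogeometry implementation; it uses the standard great-circle destination formula on a spherical earth, the function name and example coordinates are hypothetical, and it ignores the poles and the antimeridian.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean earth radius, spherical approximation


def circle_to_polygon(lat, lon, radius_m, segments=36):
    """Return a closed ring of [lon, lat] points approximating a circle."""
    if segments < 3:
        raise ValueError("a polygon needs at least 3 segments")
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    d = radius_m / EARTH_RADIUS_M  # angular distance in radians
    ring = []
    for i in range(segments):
        bearing = 2 * math.pi * i / segments
        lat2 = math.asin(
            math.sin(lat1) * math.cos(d)
            + math.cos(lat1) * math.sin(d) * math.cos(bearing)
        )
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2),
        )
        ring.append([math.degrees(lon2), math.degrees(lat2)])
    ring.append(ring[0])  # GeoJSON-style rings repeat the first point at the end
    return ring


ring = circle_to_polygon(52.0, 13.0, 1000.0, segments=36)
print(len(ring), ring[0])
```

The ring can then be used as the outer ring of a polygon shape; more segments give a smoother approximation at the cost of a larger shape.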

refresh API and parent child routing

2014-07-01 Thread Jilles van Gurp
I have a bit of functionality where I occasionally need to refresh documents so I can guarantee queries actually include recent modifications. Disclaimer, I'm of course aware that this is not a great practice. I'm using parent child relations and was wondering if I can restrict the refresh

Re: Elasticsearch Memory issue

2014-07-01 Thread Jilles van Gurp
You should tweak cache sizes. At least the field data cache needs to be restricted (unbounded by default). Also, ensuring the various circuit breakers are turned on will help. Another tip is to disable the _all field if you don't need it. All this should reduce the amount of memory ES uses

Re: does document database means denormalize

2014-06-13 Thread Jilles van Gurp
Yes, definitely think in terms of denormalizing. Joins are hard/expensive in elasticsearch so you need to avoid needing to join by prejoining. But you have other options as well, see http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ So, say you had a person table and

deduplicating using nested query

2014-06-11 Thread Jilles van Gurp
I have a simple problem where it would be useful to do a query like: get me everything that matches query1 except if field foo is in results of query2. The simple solution is to first do query2, fetch the foo field for all the results (potentially thousands), stuff it in some hash and generate a

cluster failure

2014-05-16 Thread Jilles van Gurp
I just had an incident where my entire cluster (all nodes) ended up using 100% cpu on each node at the same time and became completely unresponsive even to /_cluster/health. This happened while I was using Kibana, which was working fine up to that point. I was running a few simple queries

context suggester

2014-05-13 Thread Jilles van Gurp
Hi, I could really use the new context suggester, which I understand is coming with 1.2. I'm planning to use it for user specific suggestions on e.g. tags and names. Is there any ETA on when 1.2 is going to happen (roughly)? I saw that Lucene 4.8 landed on master but not yet on the 1.1

embedded es test server hangs on startup

2014-05-08 Thread Jilles van Gurp
I'm trying to run elasticsearch as part of my jruby tests. Here's some of the code I use to do that: Settings settings = ImmutableSettings.settingsBuilder() .put(name, nodeName) .put(cluster.name, linko-dev-cluster)

elasticsearch rpm and configuring garbage collection

2014-04-25 Thread Jilles van Gurp
I've been using the elasticsearch rpms (1.1.1) on our centos 6.5 setup and I've been wondering about the recommended way to configure it given that it deploys an init.d script with defaults. I figured out that I can use /etc/sysconfig/elasticsearch for things like heap size. However,

Re: 1 large index vs several smaller indexes

2014-04-16 Thread Jilles van Gurp
I would separate the performance issue from the logical structure of your domain. You really need to think in terms of numbers of documents and shards (and not indices). You may want to look into using index aliases, which can take a filter. That way you can have one index and several branch

Re: upgrading from elasticsearch 0.90.5 to 1.0.1

2014-02-28 Thread Jilles van Gurp
You should take note of the compatibility breaking changes in the release notes of course and do some functional tests with the new version to ensure that you are not affected by those changes. Additionally, I would carefully plan and test the update procedure with a smaller cluster first. In

Re: Newbie question, installed everything, but getting blank page

2014-02-18 Thread Jilles van Gurp
The default should be fine if you have port 9200 open. I've been messing around with kibana as well and it isn't the most friendly to set up. I'd prefer the way e.g. elasticsearch-head works, which simply allows you to paste a url in the UI. I can run elasticsearch-head straight from the git

Re: Elastic search 1.0.0 RC1 and Logstash 1.3.3?

2014-02-17 Thread Jilles van Gurp
For what it is worth, I performed the upgrade this morning. In the end I waited until the 1.0 release. I basically upgraded as follows: 1) shut down logstash agents reading from redis so write traffic to es cluster stopped 2) shut down the es cluster entirely 3) one by one, upgrade the machines

Re: Indexing large number of documents

2014-02-17 Thread Jilles van Gurp
You'll want to use the batch API instead of indexing one document at a time. That scales a lot better. I've done tens of millions of documents like that in minutes. Basically, you can use multithreading with batch as well but you may want to not outnumber the number of cpus you can dedicate
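
The batching advice presumably refers to the `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one source line per document. The sketch below only builds such request bodies in fixed-size batches and does not send them; the index name, type name, documents, and batch size are hypothetical.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk request body from (id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical documents, sent in batches of 2 instead of one by one.
docs = [(str(i), {"name": "doc-%d" % i}) for i in range(5)]
bodies = [bulk_body("test", "doc", batch) for batch in chunks(docs, 2)]
print(len(bodies))
print(bodies[0])
```

Each body would be posted as one request; batch size and the number of concurrent senders are things to tune against the cluster rather than fixed values.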

Elastic search 1.0.0 RC1 and Logstash 1.3.3?

2014-01-28 Thread Jilles van Gurp
I'm considering using elastic search 1.0.0RC1 in a new project. However, I also need to deploy logstash (and probably also Kibana). Given the API changes in the new release candidate, can I expect Logstash to play nice with elastic search, should I wait, or should I use development