You need to look into using an index template that applies an optimal mapping for
your data. For logstash, it really helps to enable doc_values on all fields
you aggregate on and to turn off norms on those fields as well. Doc_values
means elasticsearch uses memory mapped files instead of heap memory for
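A minimal 1.x index template along those lines (the template and field names here are just placeholders, not a recommendation) could look something like:
PUT /_template/logstash_doc_values
{
template: "logstash-*",
mappings: {
_default_: {
properties: {
level: {
type: "string",
index: "not_analyzed",
doc_values: true,
norms: { enabled: false }
}
}
}
}
}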
+1 for a less invasive way to recover data
I had a similar issue today on one of our test servers where I eventually
managed to recover my index by running CheckIndex on one of my shards. In
my case, I also had to remove the translog recovery file to actually get
the cluster green. This is
I'm trying to filter documents that have a particular field at the top
level. Like for example:
{
group_id:xxx
}
so I wrote the following query:
GET inbot_users/usercontact/_search
{
query: {
filtered: {
query: {
match_all: {}
},
filter: {
exists: {
field: group_id
}
}
}
}
}
https://gist.github.com/hkorte/ca5f91e2f4838213d956
I tried it using ES 1.4.1. Which version are you using? Does my gist
work for you? I noticed that it doesn't matter whether or not it is a
nested document.
Best regards,
Hannes
On 10.02.2015 15:52, Jilles van Gurp wrote:
I'm trying
I'm considering using the context suggester to autocomplete some user-specific
categories. The idea here is that users can create their own tags
and should see only suggestions for their own tags.
One solution is to use the completion suggester with a context and use the
userId as the
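As a sketch of that approach (the index, type, and field names are invented), the mapping, an indexed entry, and a query could look like:
PUT /tags
{
mappings: {
tag: {
properties: {
suggest: {
type: "completion",
context: {
user: { type: "category" }
}
}
}
}
}
}
PUT /tags/tag/1
{
suggest: {
input: ["my-tag"],
context: { user: "user42" }
}
}
POST /tags/_suggest
{
tag_suggest: {
text: "my",
completion: {
field: "suggest",
context: { user: "user42" }
}
}
}
This way each user only gets suggestions from entries indexed with their own userId as context.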
Yeah, looks like that is the problem indeed. ES probably needs a bit more
friendly error here. This is a common mistake with geojson.
Jilles
On Wednesday, January 28, 2015 at 2:16:25 PM UTC+1, Roman Drogolov wrote:
Yep, format is wrong. Wrap coordinates to one more array. Like this:
{
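A sketch of what that wrapping looks like for a polygon (coordinates invented): GeoJSON polygons are a list of rings, with the outer boundary as the first ring, so the list of points needs one more level of nesting:
{
type: "Polygon",
coordinates: [
[
[4.88, 52.38],
[4.92, 52.38],
[4.92, 52.35],
[4.88, 52.38]
]
]
}
Without the extra array around the ring, the parser reads each coordinate pair as a ring and fails.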
How much heap are you giving to ES? With this many requests, if your setup
is not falling over, it is probably not garbage-collection related, because that
would result in very noticeable delays/unavailability of ES. 32GB should be
a good value given how much memory you have. Also, you probably want
work.
Masaru
On January 22, 2015 at 19:10:25, Jilles van Gurp (jilles...@gmail.com) wrote:
I'm trying to do a stats aggregation on the list length using a script but
I'm getting errors. For this data,
PUT test_groups/group/1
{
name:1,
members:[
{
name:m1
}
]
}
PUT test_groups/group/2
{
name:2,
members:[
{
name:m1
},
{
name:m2
}
]
}
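For documents shaped like the above, a script-based stats aggregation over the array length might look something like this (1.x script syntax accessing _source; treat this as an untested sketch):
POST /test_groups/group/_search
{
size: 0,
aggs: {
member_count: {
stats: {
script: "_source.members.size()"
}
}
}
}
Over the two documents above this should yield min 1, max 2, and avg 1.5.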
You should be able to access ES from Java 8 source code. Most collection
classes already support streams, and you can use lambdas in
many places where you would otherwise use inner classes. I suspect a full
switch to java 8 might take some time for big projects such as
Indeed, increase your shard count. Also, you may want to consider using a
routing parameter based on e.g. a tenant_id to ensure all queries related
to a tenant only hit shards that actually have data for that tenant. Those
two measures would reduce the size of each shard and the number of shards
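A sketch of what that looks like (the field and values are invented), passing the same routing value at index and search time:
PUT /tenants/doc/1?routing=tenant42
{
tenant_id: "tenant42",
message: "hello"
}
POST /tenants/doc/_search?routing=tenant42
{
query: {
filtered: {
query: { match_all: {} },
filter: { term: { tenant_id: "tenant42" } }
}
}
}
Note that the term filter is still needed: routing only limits which shards are queried, and other tenants may hash to the same shard.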
The best way to think of elasticsearch is as an ever evolving swiss army
knife for search. It has support for both traditional search features as
well as some advanced features for structured search. For example,
elasticsearch aggregations have very little to do with text search but are
highly
Our production cluster went yellow last night after our logstash index
rolled over to the next version. I've seen this happen before but this time
I decided to properly diagnose and seek some feedback on what might be
going on.
So, I'd love some feedback on what is going on. I'm happy to keep
'? Unfortunately the field name
didn't get included with your errors.
On 27 November 2014 at 11:19, Jilles van Gurp (jilles...@gmail.com) wrote:
Our production cluster went yellow last night after our logstash index
rolled over to the next version. I've seen this happen before
of things where it shouldn't get in this state. Apparently a fix for
that part is coming.
Best,
Jilles
On Thursday, November 27, 2014 11:19:20 AM UTC+1, Jilles van Gurp wrote:
Our production cluster went yellow last night after our logstash index
rolled over to the next version. I've seen
I know JDK 7u55 was labeled as OK some time ago and this is still listed as
the official
requirement:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/requirements.html
However, time has moved on and I was wondering what the testing status and
advice is for more recent JDKs.
, Jilles van Gurp (jilles...@gmail.com) wrote:
PUT /test/test/1
{
date:2013-04-01T00:00:00Z
}
PUT /test/test/2
{
date:2013-04-01T00:00:01Z
}
PUT /test/test/3
{
date:2013-04-01T00:00:03Z
}
PUT /test/test/4
{
date:2013-04-01T00:01:03Z
}
Given these documents
I found another workable solution.
sort: [
{
_score: {
order: desc
}
},
{
date: {
order: desc
}
}
]
This sorts first by score and then by date. So this has the effect of
ranking by score and then ranking those items with the same
Using elasticsearch 1.3.4
I have an index with user events and I'm trying to use function_score to
get a sensible order by date (without using sort). The query below works,
but only for relatively small result sets. While trying to test whether
this orders things correctly (following up on
PUT /test/test/1
{
date:2013-04-01T00:00:00Z
}
PUT /test/test/2
{
date:2013-04-01T00:00:01Z
}
PUT /test/test/3
{
date:2013-04-01T00:00:03Z
}
PUT /test/test/4
{
date:2013-04-01T00:01:03Z
}
Given these documents, I'm trying to come up with a query that scores them
such that they come
We have a json structure that may have a deleted field in several places. With no
explicit mapping, if I post this:
POST /test/fooo/1
{
properties:{
deleted:true
}
}
and then query for
GET /test/fooo/_search
{
query: {
term: {
deleted: {
value: true
}
}
}
}
You are running into this
problem:
http://elasticsearch-users.115913.n3.nabble.com/encoding-is-longer-than-the-max-length-32766-td4056738.html
You need to change the mapping and define a maximum token length in your
analyzer. Unfortunately, you would need to do that before you migrate and I
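One way to set that up (a sketch; the index, field, and analyzer names are invented) is a custom analyzer using the truncate token filter:
PUT /myindex
{
settings: {
analysis: {
filter: {
max_length: { type: "truncate", length: 8000 }
},
analyzer: {
truncating_keyword: {
type: "custom",
tokenizer: "keyword",
filter: ["lowercase", "max_length"]
}
}
}
},
mappings: {
doc: {
properties: {
payload: { type: "string", analyzer: "truncating_keyword" }
}
}
}
}
Alternatively, for not_analyzed string fields, ignore_above in the mapping skips values longer than the given character count.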
This is the generally accepted dogma and it has some merit. However, having
two storage systems is more than a bit annoying. If you are aware of the
limitations and caveats, elasticsearch is actually a perfectly good
document store that happens to have a deeply integrated querying engine.
This
You might want to look at developing a plugin for this or maybe using an
existing one. This one for example might do partly what you
need: https://github.com/derryx/elasticsearch-changes-plugin
If you develop your own plugin, you should be able to tap into what is
happening in the cluster at a
I had some issues with logstash as well and ended up modifying the
elasticsearch_http plugin to tell me what was going on. It turned out my
cluster was red because my index template required more replicas than were
possible :-). The problem was that logstash does not fail very gracefully
and
I use an in-house developed Java REST client for elasticsearch.
Unfortunately it's not in any shape to untangle from our code base and put
on Github yet, but I might consider that if there's more interest.
Basically, I use Apache HttpClient; I implemented a simple round-robin
strategy so I can
I have a bit of code in my geogeometry project that converts circles to
polygons with whatever number of segments you wish:
https://github.com/jillesvangurp/geogeometry/blob/master/src/main/java/com/jillesvangurp/geo/GeoGeometry.java.
A few dozen gives you a really nice approximation of a
I have a bit of functionality where I occasionally need to refresh
documents so I can guarantee queries actually include recent modifications.
Disclaimer, I'm of course aware that this is not a great practice.
I'm using parent child relations and was wondering if I can restrict the
refresh
You should tweak cache sizes. At least the field data cache needs to be
restricted (unbounded by default). Also, ensuring the various circuit
breakers are turned on will help. Another tip is to disable the _all field
if you don't need it.
All this should reduce the amount of memory ES uses
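For example, in elasticsearch.yml (the values are just a starting point, not a recommendation):
indices.fielddata.cache.size: 30%
indices.breaker.fielddata.limit: 60%
indices.breaker.request.limit: 40%
And _all can be disabled per type in the mapping:
PUT /myindex/_mapping/mytype
{
mytype: {
_all: { enabled: false }
}
}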
Yes, definitely think in terms of denormalizing. Joins are hard/expensive
in elasticsearch, so you need to avoid the need to join by prejoining. But
you have other options as well, see
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
So, say you had a person table and
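Continuing that hypothetical person example, a prejoined document simply embeds the related rows:
PUT /people/person/1
{
name: "Anna",
addresses: [
{ city: "Berlin", street: "Example Str. 1" }
],
employer: { name: "Acme" }
}
The duplication costs some storage, but queries stay single-index and fast.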
I have a simple problem where it would be useful to have a query like: get me
everything that matches query1, except if field foo is in the results of
query2.
The simple solution is to first do query2, fetch the foo field for all the
results (potentially thousands), stuff it in some hash and generate a
I just had an incident where my entire cluster (all nodes) ended up using
100% cpu on each node at the same time and became completely unresponsive,
even to /_cluster/health. This happened while I was using Kibana, which was
working fine up to that point. I was running a few simple queries
Hi,
I could really use the new context suggester, which I understand is coming
with 1.2. I'm planning to use it for user specific suggestions on e.g. tags
and names.
Is there any ETA on when 1.2 is going to happen (roughly)? I saw that
Lucene 4.8 landed on master but not yet on the 1.1
I'm trying to run elasticsearch as part of my jruby tests. Here's some of the
code I use to do that:
Settings settings = ImmutableSettings.settingsBuilder()
.put("name", nodeName)
.put("cluster.name", "linko-dev-cluster")
I've been using the elasticsearch rpms (1.1.1) on our centos 6.5 setup and
I've been wondering about the recommended way to configure it given that it
deploys an init.d script with defaults.
I figured out that I can use /etc/sysconfig/elasticsearch for things like
heap size. However,
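For reference, that sysconfig file takes plain environment variables; a sketch of /etc/sysconfig/elasticsearch with example values:
ES_HEAP_SIZE=8g
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited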
I would separate the performance issue from the logical structure of your
domain. You really need to think in terms of numbers of documents and
shards (and not indices).
You may want to look into using index aliases, which can take a filter.
That way you can have one index and several branch
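A sketch of a filtered alias (the branch_id field is invented):
POST /_aliases
{
actions: [
{
add: {
index: "branches",
alias: "branch_1",
filter: { term: { branch_id: "1" } }
}
}
]
}
Searches against branch_1 then behave like a per-branch index while sharing one physical index.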
You should take note of the compatibility breaking changes in the release
notes of course and do some functional tests with the new version to ensure
that you are not affected by those changes. Additionally, I would carefully
plan and test the update procedure with a smaller cluster first. In
The default should be fine if you have port 9200 open. I've been messing
around with kibana as well and it isn't the most friendly to set up. I'd
prefer the way e.g. elasticsearch-head works, which simply allows you to
paste a url in the UI. I can run elasticsearch-head straight from the git
For what it is worth, I performed the upgrade this morning. In the end I
waited until the 1.0 release.
I basically upgraded as follows:
1) shutdown logstash agents reading from redis so write traffic to es
cluster stopped
2) shutdown the es cluster entirely
3) one by one, upgrade the machines
You'll want to use the bulk API instead of indexing one document at a
time. That scales a lot better; I've done tens of millions of documents
like that in minutes. Basically, you can use multithreading with bulk as
well, but you may want to not outnumber the number of cpus you can dedicate
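A sketch of a bulk request (the index and type names are placeholders; the body is newline-delimited, so here each line must be strict JSON):
POST /_bulk
{ "index": { "_index": "myindex", "_type": "doc", "_id": "1" } }
{ "message": "first document" }
{ "index": { "_index": "myindex", "_type": "doc", "_id": "2" } }
{ "message": "second document" }
Each action line is followed by its source line, and the request must end with a newline.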
I'm considering starting to use elasticsearch 1.0.0RC1 in a new project.
However, I also need to deploy logstash (and probably also Kibana). Given
the API changes in the new release candidate, can I expect Logstash to play
nice with elasticsearch, should I wait, or should I use development