Hi,
We've noticed a strange behavior in elasticsearch during paging.
In one case we use a paging size of 60 and we have 63 documents. So the
first page is using size 60 and offset 0. The second page is using size 60
and offset 60. What we see is that the result is inconsistent. Meaning, on
the
You need to use scroll if you have that requirement.
See:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#search-request-scroll
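As a sketch of why scroll fixes this: a scroll keeps a consistent cursor over a snapshot of the results, so every document is returned exactly once. The in-memory "snapshot" below is a stand-in for a real Elasticsearch call (something like POST /_search?scroll=1m); it is illustrative only, not the actual API.

```python
# Scroll-style iteration, simulated in plain Python.
# 63 matching documents and a page size of 60, as in the question.
snapshot = list(range(63))

def scroll(batch_size=60):
    # Each iteration yields the next batch, like following the
    # _scroll_id that Elasticsearch hands back on each response.
    cursor = 0
    while cursor < len(snapshot):
        yield snapshot[cursor:cursor + batch_size]
        cursor += batch_size

pages = list(scroll())
assert [len(p) for p in pages] == [60, 3]
# Every document seen exactly once, no overlap between pages:
assert [doc for page in pages for doc in page] == snapshot
```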
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 18 August 2014 at 08:02, Ron Sher
Hi Ron,
The cause of this issue is that Elasticsearch uses Lucene's internal doc
IDs as tie-breakers. Internal doc IDs might be completely different across
replicas of the same data, so this explains why documents that have the
same sort values are not consistently ordered.
There are 2 potential
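One commonly suggested workaround (a sketch with invented data, not quoted from the truncated reply above) is to add an explicit, deterministic tie-breaker such as the document `_id` to the sort, so that the replica-local internal doc ID never decides the order:

```python
# Why replica-dependent tie-breaking reorders docs with equal sort
# values, and how an explicit tie-breaker fixes it. All data invented.
docs = [
    {"_id": "a", "price": 10},
    {"_id": "b", "price": 10},  # same sort value as "a"
    {"_id": "c", "price": 5},
]

# Two replicas may assign completely different internal Lucene doc IDs:
replica1_internal = {"a": 0, "b": 1, "c": 2}
replica2_internal = {"a": 1, "b": 0, "c": 2}

def sort_like_es(docs, internal_ids):
    # Sort by price, breaking ties on the replica-local internal ID.
    return [d["_id"] for d in
            sorted(docs, key=lambda d: (d["price"], internal_ids[d["_id"]]))]

# The two replicas disagree on the order of the tied documents:
assert sort_like_es(docs, replica1_internal) != sort_like_es(docs, replica2_internal)

def sort_with_tiebreaker(docs):
    # Tie-break on _id instead: the same order on every replica.
    return [d["_id"] for d in sorted(docs, key=lambda d: (d["price"], d["_id"]))]

assert sort_with_tiebreaker(docs) == ["c", "a", "b"]
```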
Hi John,
You should be able to do something like:
{
  "aggs": {
    "verb": {
      "terms": {
        "field": "verb"
      },
      "aggs": {
        "load_time_outliers": {
          "percentiles": {
            "field": "responsetime"
          }
        }
      }
    }
  }
}
This will first break down your
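A rough simulation of what the aggregation above computes (sample documents invented; Elasticsearch's percentiles use an approximate t-digest, and the nearest-rank method here is just for illustration):

```python
# terms -> percentiles: bucket documents by "verb", then compute
# response-time percentiles per bucket. Sample docs are invented.
from collections import defaultdict

docs = [
    {"verb": "GET", "responsetime": 10},
    {"verb": "GET", "responsetime": 20},
    {"verb": "GET", "responsetime": 30},
    {"verb": "POST", "responsetime": 100},
    {"verb": "POST", "responsetime": 300},
]

def percentile(values, pct):
    # Nearest-rank percentile; real numbers will differ slightly
    # since Elasticsearch's implementation is approximate.
    ordered = sorted(values)
    rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[rank]

buckets = defaultdict(list)
for doc in docs:
    buckets[doc["verb"]].append(doc["responsetime"])

result = {verb: {"50": percentile(v, 50), "99": percentile(v, 99)}
          for verb, v in buckets.items()}
```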
Hi Ashish,
On Thu, Aug 14, 2014 at 12:35 AM, Ashish Mishra laughingbud...@gmail.com
wrote:
That sounds possible. We are using spindle disks. I have ~36Gb free for
the filesystem cache, and the previous data size (without the added field)
was 60-65Gb per node. So it's likely that 50% of
Script filters are inherently slow due to the fact that they cannot
leverage the inverted index in order to skip efficiently over non-matching
documents. Even if they were written in assembly, this would likely still
be slow.
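A toy illustration of the point (invented data, not the poster's actual filter): a script must evaluate its predicate against every document, while an index-backed range filter can binary-search a sorted structure and skip everything that cannot match.

```python
# Script-style linear scan vs index-style skipping, same results.
import bisect

docs = [{"id": i, "ts": i * 10} for i in range(1000)]

def script_filter(docs, lo, hi):
    # Script-style: the predicate runs on all 1000 documents.
    return [d["id"] for d in docs if lo <= d["ts"] <= hi]

# Index-style: a sorted (value, id) list lets us jump straight to
# the matching range and skip all non-matching documents.
index = sorted((d["ts"], d["id"]) for d in docs)
ts_values = [ts for ts, _ in index]

def range_filter(lo, hi):
    start = bisect.bisect_left(ts_values, lo)
    end = bisect.bisect_right(ts_values, hi)
    return [doc_id for _, doc_id in index[start:end]]

assert script_filter(docs, 100, 200) == range_filter(100, 200)
```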
What kind of filtering are you trying to do with scripts?
On Thu, Aug
Can you elaborate more on what you are after?
On Wed, Aug 13, 2014 at 5:16 PM, project2501 darreng5...@gmail.com wrote:
The old facet DSL was very nice and easy to understand. I could declare
only which fields I wanted returned.
How is this done with aggregations? The docs do not say.
I
We've noticed a strange behavior in elasticsearch during paging.
In one case we use a paging size of 60 and we have 63 documents. So the
first page is using size 60 and offset 0. The second page is using size 60
and offset 60.
What we see is that the result is inconsistent. Meaning, on the 2nd
Hi Phil,
We would indeed consider a PR for that change if it makes things easier for
you. Feel free to ping me when you open it so that I don't miss it.
On Wed, Aug 13, 2014 at 3:55 PM, Phil Wills otherp...@gmail.com wrote:
Hello,
In the Java API AbstractAggregationBuilder's name property is
You have asked the same question from another Gmail ID.
Please refer to the answers over there.
Thanks
Vineeth
On Mon, Aug 18, 2014 at 10:08 AM, ronsher rons...@gmail.com wrote:
We've noticed a strange behavior in elasticsearch during paging.
In one case we use a paging size of
Thanks Adrien for the reply.
My script filter was:
===
{
  "script": {
    "script": "xyz",
    "params": {
      "startRange": 1407939675, // Timestamp in milliseconds ... keeps changing on all queries
Hi,
We're using Elasticsearch with an Analyzer to map the `y` character to
`ij`, (*char_filter* named char_mapper) since in Dutch these two are
somewhat interchangeable. We're also using a *lowercase filter*.
This is the configuration:
{
  "analysis": {
    "analyzer": {
      "index": {
Your filter would be faster if you used range filters on the start/end
dates instead of using a script.
On Mon, Aug 18, 2014 at 10:52 AM, avacados kotadia.ak...@gmail.com wrote:
"_cache": true // I removed this caching and I
found a significant performance improvement...
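A sketch of what the suggested replacement might look like, built as a Python dict (the field names follow the snippet in this thread, the bounds are placeholder values, and the pre-2.x `filtered` query syntax is assumed because of the era of this thread):

```python
# Range filters on start/end timestamps instead of a script filter.
import json

query = {
    "query": {
        "filtered": {  # pre-2.x query syntax
            "filter": {
                "bool": {
                    "must": [
                        {"range": {"startRange": {"lte": 1407939675}}},
                        {"range": {"endRange": {"gte": 1407939675}}},
                    ]
                }
            }
        }
    }
}

body = json.dumps(query, indent=2)
```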
Hi!
I tried to test the Elasticsearch suggester, but I got a strange error.
user@user:/user/esconfig # curl -X POST
'localhost:9200/dwh_direct/_suggest?pretty' -d @suggester
{
  "_shards" : {
    "total" : 5,
    "successful" : 0,
    "failed" : 5,
    "failures" : [ {
      "index" : "dwh_direct",
      "shard" : 0,
Hi everyone!
I'm currently working on a tool with *ES and the Twitter Streaming API*, in
which I try to find interesting profiles on Twitter, based on what they
tweet and retweet, and which of their interactions get shared/retweeted.
Anyway, I use ES to index and search among tweets. To do that, I get
Twitter
I followed this link to create a two-node Elasticsearch cluster on Azure:
http://thomasardal.com/running-elasticsearch-in-a-cluster-on-azure/
The installation and configuration went well.
When I started to check the cluster I found strange behaviour from the
PHP client.
I declared
Hello. Does anyone use NEST for .NET?
Please help me.
Some time ago I asked how to get part of a text field. I wanted to do it with
the Highlight param no_match_size, but it's only supported since NEST version
1.0RC1. After updating nest.dll from 0.12 to 1.0 I got a problem: nothing
works. Looking at GitHub
Hello again Mark,
Thanks for your response. Your answers really are very helpful.
As with our previous conversation
https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I
am confused about how to make a client node also be master eligible. This
is what I posted there, I
Hi,
This is for elasticsearch : elasticsearch-1.3.2-1.noarch
There are 2 nodes in the cluster.
I have installed the river-csv plugin.
When loading a file with 5 million rows, loading stops after 477,400 rows.
I load with :
curl -XPUT localhost:9200/_river/my_csv_river/_meta -d '
{
  "type" :
Hello Maxim ,
Can you show the schema and some sample data that you have indexed?
Thanks
Vineeth
On Mon, Aug 18, 2014 at 3:31 PM, m...@ciklum.com wrote:
Hi!
I tried to test the Elasticsearch suggester, but I got a strange error.
user@user:/user/esconfig # curl -X POST
You can put *threadpool.search.type: cached* in elasticsearch.yml to get an
unbounded queue for reads.
2014-08-10 9:52 GMT-03:00 James digital...@gmail.com:
On Sat, 2014-08-09 at 23:53 -0700, Deep wrote:
Hi,
Elasticsearch internally has a thread pool, and a queue size is associated
with
David hi,
How can I configure the mapping so that the default analyzer will be the
whitespace one?
On Wed, Aug 13, 2014 at 2:46 PM, David Pilato da...@pilato.fr wrote:
Having no answer is not good. I think something is going wrong here. Maybe
you should see something in the logs.
That
How does the threadpool work when using the *caller* reject_policy?
Can I catch the EsRejectedExecutionException exception (using the Java API)
during heavy writes?
--
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da
I think this could help you:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 18 August 2014 at 15:39:36, Luc Evers (lucev...@gmail.com) wrote:
David hi,
How
First of all kudos on the awesome job everyone here is doing!
I was wondering if you guys can help me solve this puzzle:
Also available on stack
overflow:
http://stackoverflow.com/questions/25361795/elasticsearch-how-to-normalize-score-when-combining-regular-query-and-function
Ideally what I
Could someone help me write a grok filter for this log real quick? Here is
what the log looks like:
Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username
*[18/Aug/2014:09:40:39
-0400]* GET /right.cgi?open=system&open=status HTTP/1.1 200 3228
here is what I have so far:
match => [
That's spot on. Thanks!
On 18 Aug 2014 09:08, Adrien Grand adrien.gr...@elasticsearch.com wrote:
Hi John,
You should be able to do something like:
{
  "aggs": {
    "verb": {
      "terms": {
        "field": "verb"
      },
      "aggs": {
        "load_time_outliers": {
          "percentiles": {
It seems to me that ES ignores the index.query.bool.max_clause_count
argument in elasticsearch.yml
Setting index.query.bool.max_clause_count: 5000 results in the following
error:
Caused by: org.apache.lucene.search.BooleanQuery$TooManyClauses:
maxClauseCount is set to 1024
Any solution? What's
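For what it's worth, I believe this is a node-level setting that is read at startup, so it has to go in elasticsearch.yml on every node and only takes effect after a restart; a sketch (the value is the one from the post above):

```yaml
# elasticsearch.yml -- set on every node, then restart
index.query.bool.max_clause_count: 5000
```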
Slight follow on - do you know if returning this sort of stuff via Kibana
is on the cards?
Just looking for an easy way to graph the results.
Thanks.
On Friday, 15 August 2014 10:23:16 UTC+1, John Ogden wrote:
Hi,
I am trying to run a single command which calculates percentiles for
I've been given a requirement to produce a single kibana dashboard showing
app response times for multiple date ranges, and am stumped at how to
proceed.
The user wants to see today's graph, along with the previous working day,
day -7, day -28 and day -364 on the same screen - ideally, all 4
Support for aggregations is indeed something that is on the roadmap for the
next version of Kibana (Kibana 4), see this message from Rashid:
https://groups.google.com/forum/?utm_medium=emailutm_source=footer#!msg/elasticsearch/I7um1mX4GSk/aUsT2EmyxysJ
On Mon, Aug 18, 2014 at 4:33 PM, John Ogden
Hi,
From looking at the docs, it didn't seem overly clear. Is it possible to
include the data in an aggregation, or is it counts only?
John
--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails
Aggregations only report counts or various metrics (see the metrics
aggregations: stats, min, max, sum, percentiles, cardinality, top_hits,
...). Maybe top_hits is what you are looking for?
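A rough simulation of what a terms + top_hits combination returns (sample documents and field names invented): for each bucket, keep the top-N documents by score alongside the count.

```python
# terms -> top_hits: bucket by a field, keep the best-scoring docs
# per bucket. All data below is invented for illustration.
from collections import defaultdict

docs = [
    {"group": "a", "title": "a1", "score": 0.9},
    {"group": "a", "title": "a2", "score": 0.5},
    {"group": "b", "title": "b1", "score": 0.7},
    {"group": "a", "title": "a3", "score": 0.8},
]

def top_hits(docs, size=2):
    buckets = defaultdict(list)
    for d in docs:
        buckets[d["group"]].append(d)
    # Within each bucket, sort by score (descending) and truncate.
    return {
        g: [d["title"] for d in sorted(hits, key=lambda d: -d["score"])[:size]]
        for g, hits in buckets.items()
    }

assert top_hits(docs) == {"a": ["a1", "a3"], "b": ["b1"]}
```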
Hi Greg,
I believe max_num_segments is technically a hint that can be overridden by the
merge algorithm if it decides to. You might try simply re-running the optimize
again to get from ~25 down closer to 1. Sorry but I don't know of any way to
see when the optimize is finished - it's really
Hey guys,
Finally I changed all my queries to constant_score queries. It's way better,
but still, certain pages take a lot of time to run... I don't understand
why, and I don't have anything in my ES logs...
Now the average time to search 20 users and their mentions/timeline +
scoring them
I am using the following config file
filter{
grok{
match => [
message,
(?:\?|\)C\=%{DATA:kw}\%{DATA}\sT\s%{DATA:town}\sS\s%{WORD:state}\s%{DATA}%{IP:ip}
]
}
grok{
match => [
On Monday, August 18, 2014 9:57:41 AM UTC-4, Kevin M wrote:
Could someone help me write a grok filter for this log real quick here is
what the log looks like:
Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username
*[18/Aug/2014:09:40:39
-0400]* GET
I don't see your post. What I am stuck with is: whenever the date changes in
that log, for example:
*[18/Aug/2014:09:40:39 -0400]*
*[20/Aug/2014:11:40:39 -0104]*
*[19/Aug/2014:08:40:39 -0500]*
the filter will not match it
On Monday, August 18, 2014 1:53:37 PM UTC-4, vitaly wrote:
On Monday,
I released version 0.0.11 of the Experimental Highlighter
https://github.com/wikimedia/search-highlighter we've been using. It's
compatible with Elasticsearch 1.3.x and has a few new features:
1. Conditional highlighting - skip highlighting fields you aren't going to
use! Save time and IO
Heya,
We are pleased to announce the release of the Elasticsearch Mapper Attachment
plugin, version 2.2.1
The mapper attachments plugin adds the attachment type to Elasticsearch using
Apache Tika.
Release Notes - Version 2.2.1
Earlier today there was an Apache POI release to address a
Heya,
We are pleased to announce the release of the Elasticsearch Mapper Attachment
plugin, version 2.3.1
The mapper attachments plugin adds the attachment type to Elasticsearch using
Apache Tika.
Release Notes - Version 2.3.1
Earlier today there was an Apache POI release to address a
I saw this problem twice now.
I start with a Green two-node cluster, default 5 shards/node, I index about
50,000 docs, shards/replicas look great and well balanced across the 2
nodes.
I try the same test with 8 million docs. I come back when it's done, and I
see all primary shards on node1 and
Hi all,
Just released to Central the v0.5 of the swift-repository plugin.
Mainly contains documentation updates but also built against
1.3.2 instead of 1.1.0.
https://github.com/wikimedia/search-repository-swift
-Chad
What version of ES do you use?
Jörg
On Mon, Aug 18, 2014 at 9:42 PM, rookie7799 pavelbara...@gmail.com wrote:
Hello there,
We are having the same exact problem with a really resource hungry query:
5 nodes with 16GB ES_HEAP_SIZE
1.2 Billion records inside 1 index with 5 shards
Whenever
I'm using the top_hits aggregation with a has_child query. In the top_hits
aggregation documentation it says '*By default the hits are sorted by the
score of the main query*', but I'm not seeing that in the results for my
query
{
  "from": 0,
  "size": 3,
  "query": {
    "has_child": {
Hi, it's 1.3.2
On Monday, August 18, 2014 5:49:03 PM UTC-4, Jörg Prante wrote:
What version of ES do you use?
Jörg
On Mon, Aug 18, 2014 at 9:42 PM, rookie7799 pavelb...@gmail.com
javascript: wrote:
Hello there,
We are having the same exact problem with a really resource hungry query:
Hi,
I have an Elasticsearch cluster of 2 nodes. I have configured them to store
data at the location which is /auto/share. I want to point one of the two
nodes in the cluster to some other location to store the data say /auto/foo.
What would be the best way of achieving the above task without
Do you want to copy the existing data in /auto/share to /auto/foo, or start
with no data?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 19 August 2014 08:23, shriyansh jain shriyanshaj...@gmail.com wrote:
Hi,
If you want no data in /auto/foo then just create the directory, give it
the right permissions and then update the config to point to it.
It's the same process you did for /auto/share.
Do you have replicas set on your indexes?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
Yes, I have set *index.number_of_replicas: 1*. If I just point one of the 2
nodes to some other location, won't it lose the data stored by that node?
Thank you,
Shriyansh
On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:
If you want no data in /auto/foo then just create the
If you point the instance to a new data location then yes, it will start up
with no data, but it won't lose the data completely as it will still be
located in your original /auto/share directory.
However given you have replicas set what will happen is when the node
starts up pointing to the new
Sorry if I have not replied sooner, but I was on vacation.
I would use the two fields solution, especially since you simply cannot
store a stripped version. The source field is compressed, so the additional
index size is content dependent. Never used highlighting, so I cannot
recommend
Why do you want to do this if you are worried about data loss?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 19 August 2014 11:50, shriyansh jain shriyanshaj...@gmail.com wrote:
As you mentioned the node will
Just to make sure if /auto/share goes down I have data in /auto/foo.
Thanks,
Shriyansh
On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:
Why do you want to do this if you are worried about data loss?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email:
To make sure that if /auto/share goes down, I have data in /auto/foo. And I am
short of space on /auto/share. Mainly because of these two reasons.
Thanks,
Shriyansh
On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:
Why do you want to do this if you are worried about data loss?
Regards,
This is why you have replicas; they give you redundancy at a higher level
than the filesystem.
If you are still concerned then you should add another node and increase
your replicas.
Playing around on the FS to create replicas is only extra management
overhead and likely to end up causing more
Apart from replicas, that's really outside the scope of what ES provides.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 19 August 2014 12:12, shriyansh jain shriyanshaj...@gmail.com wrote:
I got your point sir,
Thank you for helping me out. I really appreciate it.
Regards,
Shriyansh
On Monday, August 18, 2014 7:23:50 PM UTC-7, Mark Walkom wrote:
Apart from replicas, that's really outside the scope of what ES provides.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email:
I would like to know one more thing: what would be the steps if I want to
copy the data from /auto/share to /auto/foo for a particular node?
Thanks,
Shriyansh
On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:
Do you want to copy the existing data in /auto/share to /auto/foo, or
Char filters are applied before the text is tokenized, and therefore they
are applied before the normal filters are used, which is why they are a
separate class of filter. With Lucene, the order is:
char filters -> tokenizer -> filters
Have you looked into the ICU analyzer?
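That order can be sketched in plain Python (the y -> ij mapping mirrors the char_mapper in this thread; the exact mapping rules here are illustrative, not the poster's config):

```python
# Lucene analysis order: char filters -> tokenizer -> token filters.

def char_mapper(text):
    # Char filter: runs on the raw text, BEFORE tokenization,
    # so the mapping sees the original casing.
    return text.replace("y", "ij").replace("Y", "IJ")

def whitespace_tokenizer(text):
    return text.split()

def lowercase_filter(tokens):
    # Token filter: runs on tokens, AFTER the char filter.
    return [t.lower() for t in tokens]

def analyze(text):
    return lowercase_filter(whitespace_tokenizer(char_mapper(text)))

assert analyze("Ysselmeer bij nacht") == ["ijsselmeer", "bij", "nacht"]
```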
Master, data and client are really just abstractions of different
combinations of node.data and node.master values.
A node with node.master=true and node.data=false can handle both cluster management and
queries.
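As a sketch, the three common flavours come down to these two settings in each node's elasticsearch.yml (pick one pair per node):

```yaml
# "client" node: not master-eligible, holds no data, routes requests
node.master: false
node.data: false

# dedicated master-eligible node
# node.master: true
# node.data: false

# data-only node
# node.master: false
# node.data: true
```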
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com