adding a new node: how to prime the data

2014-11-20 Thread Yves Dorfsman
We upgrade our clusters by adding new nodes, increasing the number of replicas on the indices, letting the new node catch up, then excluding the old node and reducing the number of replicas on the indices. One cluster has a large index for which this operation takes hours. We tried to copy data from an
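The replace-by-replication procedure described above can be sketched as an ordered list of REST calls. This is a minimal sketch that only builds the requests and sends nothing. It assumes the 1.x index settings API (`index.number_of_replicas`), the cluster-level `cluster.routing.allocation.exclude._ip` filter and the cluster health API; the function name `upgrade_plan`, the index name and the IP address are hypothetical placeholders.

```python
import json


def upgrade_plan(index, old_replicas, old_node_ip):
    """Return (method, path, body) tuples for the add-node / retire-node
    procedure. Nothing is sent; `index` and `old_node_ip` are placeholders."""
    return [
        # 1. One extra replica, so the new node receives a full copy.
        ("PUT", f"/{index}/_settings",
         {"index": {"number_of_replicas": old_replicas + 1}}),
        # 2. Let the new node catch up (block until the cluster is green).
        ("GET", "/_cluster/health?wait_for_status=green", None),
        # 3. Move every shard off the old node.
        ("PUT", "/_cluster/settings",
         {"transient": {"cluster.routing.allocation.exclude._ip": old_node_ip}}),
        # 4. Wait until no shards are relocating any more.
        ("GET", "/_cluster/health?wait_for_relocating_shards=0", None),
        # 5. Drop back to the original replica count.
        ("PUT", f"/{index}/_settings",
         {"index": {"number_of_replicas": old_replicas}}),
    ]


for method, path, body in upgrade_plan("big_index", 1, "10.0.0.5"):
    print(method, path, json.dumps(body) if body is not None else "")
```

Keeping the plan as data makes the order of operations easy to review before anything touches the cluster; each tuple maps directly onto one curl call.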

Re: Changing Analyzer behavior for hyphens - suggestions?

2014-11-20 Thread horst knete
Hi, thanks for the response and for this awesome plugin bundle (especially for me as a German). Unfortunately, the hyphen analyzer plugin didn't do the job the way I wanted it to. The hyphen analyzer does something similar to the whitespace analyzer: it just doesn't split on hyphens and instead see

Re: problem with heap space overusage

2014-11-20 Thread tetlika
Anyone? On Wednesday, 19 November 2014 13:32:37 UTC+1, Serg Fillipenko wrote: We have contact profiles (20+ fields, containing nested documents) indexed and their social profiles (10+ fields) indexed as child documents of the contact profile. We run complex bool match queries, delete

Double entries in Kibana?

2014-11-20 Thread Siddharth Trikha
I am using logstash 1.4.1 and elasticsearch 1.1.1. My setup is showing an issue: for every new line (log) added to the log file I am getting two entries in Kibana, i.e. every log entry is showing twice in Kibana. However, when I check my logstash console, the log line is showing only once. Any

Re: Double entries in Kibana?

2014-11-20 Thread Siddharth Trikha
My elasticsearch console: [2014-11-20 14:14:42,229][INFO ][cluster.metadata ] [Brothers Grimm] [logstash-2014.11.20] creating index, cause [auto(bulk api)], shards [5]/[1], mappings [_default_] [2014-11-20 14:14:42,672][INFO ][cluster.metadata ] [Brothers Grimm]

Issue with highlighting and analyzed tokens

2014-11-20 Thread felix
Hi, I am experiencing an unexpected result with highlighting when using an _analyzer path in the mapping and custom analyzers. The highlighting returns no result for some query terms, even though the term matches and the document is returned. For other query terms it works fine. Somehow it

Best way to check a document has been indexed

2014-11-20 Thread asanchez
Hello, I'm developing a piece of code that inserts a document into an elasticsearch server. The code uses libcurl to set up an HTTP request and capture the response. So, in order to check whether a document has been properly indexed, what is the official or proper way to do it? This is an

How is the idf calculated for an alias that maps to multiple indexes?

2014-11-20 Thread Dan Tuffery
If I have mapped an alias to more than one index and I execute a search using the alias name, will the idf be calculated for each individual index or will the idf calculation take into consideration all of the indexes that are mapped to the alias? -- You received this message because you are

Search template in bulk

2014-11-20 Thread Viranch Mehta
Hey, I was wondering if there is a way to execute search template queries in bulk. For example, I have a couple of search templates registered in the .scripts index. I want to run a bulk search using these templates, with a different set of parameters for each search in the bulk. Example query could

MLT query delivering strange results

2014-11-20 Thread Daniel Kummer
I have been trying to figure out how exactly the more_like_this query behaves. The docs say: "Under the hood, more_like_this simply creates multiple should clauses in a bool query of interesting terms extracted from some provided text." But I found several examples that I could not explain. This

Custom Aggregation / Access to documents

2014-11-20 Thread AndyP
When implementing a custom aggregation, can I access the result documents in my aggregator so that I can skip result documents based on their properties? To make it clearer: I have an index, products, that contains product documents. A product contains a nested collection of variant

Re: What is the best practice for periodic snapshotting with aws-cloud+s3

2014-11-20 Thread João Costa
Hello, Sorry for hijacking this thread, but I'm currently also pondering the best way to perform periodic snapshots in AWS. My main concern is that we are using blue-green deployment with ephemeral storage on EC2, so if for some reason there is a problem with the cluster, we might lose a lot

Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread Jason Wee
I would be interested too, we are using the same 0.90.7 version. Jason On Thu, Nov 20, 2014 at 2:03 PM, Yves Dorfsman y...@zioup.com wrote: Are there any precautions to take before upgrading from 0.9 to 1.4? Different data types? Different API calls? etc... And, what is the best way to

Re: Best way to check a document has been indexed

2014-11-20 Thread vineeth mohan
Hi, just check whether it's 200 (indexed) or 201 (created). The HTTP status code alone should be sufficient. Thanks, Vineeth On Thu, Nov 20, 2014 at 4:01 PM, asanchez asanchez1...@gmail.com wrote: Hello, I'm developing a piece of code that inserts a document into an elasticsearch
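Vineeth's status-code check can be written as a small helper. This is a minimal sketch assuming the status code and response body have already been read (for example with libcurl, as in the question); the function name `check_index_response` and the sample body are hypothetical. It also cross-checks the boolean `created` flag that 1.x index responses carry, when that flag is present.

```python
import json


def check_index_response(status, body):
    """Classify the reply to an index request.

    status: HTTP status code (e.g. read via CURLINFO_RESPONSE_CODE in libcurl)
    body:   raw JSON response body as a string
    """
    # 201 = a new document was created, 200 = an existing id was overwritten.
    if status not in (200, 201):
        return "failed"
    reply = json.loads(body)
    # If the "created" flag is present it should agree with the status code.
    created = reply.get("created")
    if created is not None and created != (status == 201):
        return "inconsistent"
    return "created" if status == 201 else "updated"


sample = '{"_index": "test", "_type": "doc", "_id": "1", "_version": 1, "created": true}'
print(check_index_response(201, sample))
```

A 200/201 only confirms the write succeeded. The document becomes visible to searches after the next refresh (every second by default), although a GET by id is real-time.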

Re: Custom Aggregation / Access to documents

2014-11-20 Thread Colin Goodheart-Smithe
Hi, I think you should be able to achieve the functionality you need without writing a custom aggregation. If you use a combination of the filter aggregation wrapped in a nested aggregation then you should be able to filter the child documents (variant) before they are returned. Then if you

Re: how to migrate lucene index into elasticsearch

2014-11-20 Thread Gaurav gupta
Thanks, Jörg, for the guidance. I am trying the suggested approach #1 and have a further question about it. As you mentioned: - a custom-written tool could traverse the segments and extract field information and build a rudimentary mapping (without analyzer, without info about _all and

Re: Issue with highlighting and analyzed tokens

2014-11-20 Thread Nikolas Everett
I remember there was a github issue about path specified analyzers and highlighting but I can't find it. Reading it may be your best bet. On Thu, Nov 20, 2014 at 5:14 AM, fe...@squirro.com wrote: Hi, I am experiencing an unexpected result with highlighting when using an _analyzer path in

Is Elasticsearch also supported on AIX and HP Itanium 11.31

2014-11-20 Thread Gaurav gupta
Is Elasticsearch also supported on AIX and HP Itanium 11.31. I didn't find this information in release notes or installation instructions. Thanks Gaurav

Re: analyzing wildcard queries ...

2014-11-20 Thread mkamm78
Hi Jörg, just wanted to tell you that I will not (or cannot) fork/commit my improvement to wildcard analysis, because I'm no longer 100% convinced that it really is an improvement, or rather that it can be used in general... After rethinking, I must admit that I was probably too focused on my concrete issues with

Re: Does nested query with operator honor the operator or does it always display some default behavior

2014-11-20 Thread Ramdev Wudali
Hi Ivan: I tried using the _explain API (the endpoint for getting an explanation); it returned this: {"_index": "news", "_type": "swift", "_id": "_explain", "_version": 5, "created": false} I tried adding "explain": true as part of my query, which resulted in this: "_explanation": {

Re: If I use EC2 Discovery Plugin do I necessarily give internet access to my instances?

2014-11-20 Thread wellszhane
I had the same problem yesterday. What I did was create an Elastic IP and associate it with your EC2 instance. In the security group you need to open both the private IP and the Elastic IP. Try it. On Wednesday, November 19, 2014 8:01:48 AM UTC-5, David Vasquez wrote: Hi everyone! I'm trying to

Deleted indices keep coming back w/ 1.4.0

2014-11-20 Thread David Smith
Hi, Since we upgraded to 1.4.0, deleted indices in our time-series index set keep coming back right after deletion. So whenever we drop an expired index (usually as midnight rolls), it gets deleted and removed from the alias it was under. But about half the time it comes back as an empty

ES seems to be aliasing the byte type to the short type

2014-11-20 Thread Damien Montigny
Hi everyone, I was experimenting with mappings for index-size optimization and I have an issue that seems like a bug to me; I cannot find any documentation about it. When I declare a field of type byte, ES seems to treat it as short; for proof, see the error message of the last

Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread David Smith
I can't remember what 0.90.x was like, as that was long ago for us, but we recently upgraded from 1.1.0 to 1.4.0. Look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/breaking-changes.html and additionally pay attention to: - scripting: - replacement of mvel w/

Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread 熊贻青
The most surprising part of my upgrade from 0.90 to 1.0.1 was the drop in indexing performance. So, yes, I’m also interested to know any gotchas. On 20 November 2014 at 8:47 PM, Jason Wee peich...@gmail.com wrote: I would be interested too, we are using the same 0.90.7 version. Jason On Thu, Nov 20, 2014 at

Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread David Smith
Also, forgot to mention... if you have native scripts, they will mysteriously throw an UnsupportedOperationException whenever invoked. It looks like they made a mistake in 1.4.0 (now reverted on master) that requires you to override setScorer in native scripts. It's ok, I just wish

ElasticSearch 1.3.4 - Duplicate data sometimes

2014-11-20 Thread John D. Ament
Hi, I was wondering how I might be able to troubleshoot issues with duplicate data coming back from queries. In my query, I perform an aggregate query, something like this: final SearchResponse searchResponse = client() .prepareSearch(indexName) .setTypes(OBJ_TYPE) .setFetchSource(true)

Re: ES backups without using snapshots?

2014-11-20 Thread Ivan Brusic
I have never used plugins, but there is also Jorg's tool: https://github.com/jprante/elasticsearch-knapsack -- Ivan On Wed, Nov 19, 2014 at 11:27 PM, Mathew D mathew.degerh...@gmail.com wrote: Hi Ivan, Thanks for the quick response. We've got 5 shards per index, so with 2 replicas each

min_document_doc in nested aggregations

2014-11-20 Thread kazoompa
Hi, I have several aggregations, each of which has its own inner aggregations. It seems that 'min_document_doc' does not apply when the containing aggregation is itself empty. I presumed that because both levels of aggregations use 'min_document_doc' there would be buckets for the

Re: If I use EC2 Discovery Plugin do I necessarily give internet access to my instances?

2014-11-20 Thread Norberto Meijome
Yes, but this might not be an option if your instance is in a private subnet. It also means handling all your IPs like this (though in theory you don't need internal IPs; the security group ID/name would do as well), and there are limits to how many rules you can add to a security group. At the same

Re: Changing Analyzer behavior for hyphens - suggestions?

2014-11-20 Thread joergpra...@gmail.com
The whitespace tokenizer has the problem that punctuation is not ignored. I find the word_delimiter filter not working at all with the whitespace tokenizer, only with the keyword tokenizer plus massive pattern matching, which is complex and expensive :( Therefore I took the classic tokenizer and generalized the

Re: Getting file text content from mapper?

2014-11-20 Thread Raymond Giorgi
Also, this is the first line of what's posted along the river { index: {_index:resumes,_type:resume,_id:2158912}} Things can get truncated when they're as big as a Base64 encoded file :) On Wednesday, November 19, 2014 6:01:29 PM UTC-5, Raymond Giorgi wrote: Hey all, I'm hoping someone can

Marvel / ES query document count major discrepancy

2014-11-20 Thread Mike Seid
Howdy, I have been hitting my ES cluster pretty hard recently and I think it is holding up great. In the last few days, I have noticed a major discrepancy between the document count that Marvel shows and the result of a _count query against the actual ES cluster. Marvel is reporting about 43.9M

Native script unable to get values, perhaps because it's a child doc? ES v1.1.1

2014-11-20 Thread Jonathan Foy
Hello I have a native script that I'm using to score/sort queries and it is not working properly for one of my three types. All three types have the same nested field, and I'm using the script to check values and score/sort by an externally defined order. However, for one of the three types

Re: Uncertain field types when extracting fields from getSource() (java api)

2014-11-20 Thread Simon Brandhof
A workaround is to cast the value into Number and then to call Number#longValue().

Re: Deleted indices keep coming back w/ 1.4.0

2014-11-20 Thread Mark Walkom
That's unlikely to be a bug, the only time ES will recreate an index is if it finds dangling data. Are your indexes created automatically? How is your data sent to ES? Is it possible that there is some data that is slower reaching ES than others and so the time difference causes this to happen?

Re: Is Elasticsearch also supported on AIX and HP Itanium 11.31

2014-11-20 Thread Mark Walkom
Depends what you mean by supported. I have seen comments of people running it on AIX, but I don't think it is officially supported. On 21 November 2014 00:37, Gaurav gupta gupta.gaurav0...@gmail.com wrote: Is Elasticsearch also supported on AIX and HP Itanium 11.31. I didn't find this

[ANN] it’s {on}: announcing our first user conference – elastic{on}15

2014-11-20 Thread Mark Walkom
http://www.elasticsearch.org/blog/its-on-announcing-our-first-user-conference-elasticon15 Shay Banon November 20, 2014 It’s been a little over two years since we formed a company around Elasticsearch, and the engagement with our community, users, and customers has taken on a life of its own.

query timing out

2014-11-20 Thread Warner Onstine
Hi all, hoping to get some help with this. I am trying to retrieve the latest tweet by a person. I'm using the javascript library. Using the elastic.js library to help build a query. Here is the query generated: {query: {match: {talent_id:{query:546e50b989fe347230c4}}},

Re: Native script unable to get values, perhaps because it's a child doc? ES v1.1.1

2014-11-20 Thread Shiwen Cheng
Hi, did you index the field you want to use in the native script? Shiwen On Thursday, 20 November 2014 11:38:30 UTC-8, Jonathan Foy wrote: Hello I have a native script that I'm using to score/sort queries and it is not working properly for one of my three types. All three types have the

Odd behavior of bulk loading speed - good riddle?

2014-11-20 Thread Christopher Ambler
So this has me perplexed. I have a bulk data loading job that creates an upsert statement and batches 500 of them in a bulk operation using the _bulk interface. I send the bulk insert via HTTP (on 9200) and wait for the response before sending the next one, which I do immediately. I do not

Re: Odd behavior of bulk loading speed - good riddle?

2014-11-20 Thread Christopher Ambler
The statement, if that helps (this is a line of PHP, hence the $ variables): {\"script\" : \"ctx._source.auctionid=$auctionID; ctx._source.auctiontype=$auctionType; ctx._source.auctionstatus=$auctionStatus; ctx._source.auctionprice=$auctionPrice; ctx._source.auctionendtime='$auctionEndTime';
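The batching described in this thread (500 upserts per _bulk request, each request sent only after the previous response arrives) comes down to building newline-delimited bodies. This is a minimal sketch that builds the bodies without sending them; the index `auctions`, type `auction` and function name `bulk_bodies` are hypothetical. Instead of the poster's script-based upsert it swaps in the update API's partial `doc` with `doc_as_upsert`, which avoids interpolating PHP variables into a script string.

```python
import json


def bulk_bodies(docs, index, doc_type, batch_size=500):
    """Yield _bulk request bodies with `batch_size` update actions each.

    Every doc must carry an "id" key; the remaining keys become the
    partial document that is merged in (or inserted if missing).
    """
    lines = []
    for doc in docs:
        fields = {k: v for k, v in doc.items() if k != "id"}
        action = {"update": {"_index": index, "_type": doc_type, "_id": str(doc["id"])}}
        payload = {"doc": fields, "doc_as_upsert": True}
        lines.append(json.dumps(action))
        lines.append(json.dumps(payload))
        if len(lines) == 2 * batch_size:
            # The bulk format requires a newline after the final line too.
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


docs = [{"id": i, "auctionprice": i * 10} for i in range(1200)]
bodies = list(bulk_bodies(docs, "auctions", "auction"))
print(len(bodies))
```

Because it is a generator, each body can be sent and its response checked before the next batch is built, matching the one-request-at-a-time pattern from the first message.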

Increased query count after moving to nested documents

2014-11-20 Thread Ivan Brusic
We have always indexed nested documents, but never fully used them since issue 3022 is still outstanding. Finally made the move to actually filtering documents at the nested level. Tracking metrics with graphite/grafana, I noticed immediately that the active/current query count is much higher

Re: Native script unable to get values, perhaps because it's a child doc? ES v1.1.1

2014-11-20 Thread Jonathan Foy
Yep, and I can search on it in other queries. After testing most of the afternoon, I finally seem to have gotten it to work by pulling the field using the full name, including the nested path: Long value = docFieldLongs("nestedPath.propertyName").getValue(); This seems to work in all three

Re: Documentation for internals and architecture of Elasticsearch

2014-11-20 Thread joergpra...@gmail.com
Look at the videos from Berlin Buzzwords 2011 and 2012: http://www.elasticsearch.org/videos/page/3/ They are a great intro. Jörg On Thu, Nov 20, 2014 at 6:13 AM, Rahul Khengare rahulk1...@gmail.com wrote: Hi All, When we provide documents or data objects to Elasticsearch using REST APIs.

Re: Getting file text content from mapper?

2014-11-20 Thread David Pilato
So that’s the expected behavior. Mapper attachment only indexes the content and never modifies the _source document. If you want to see the extracted text, you need to store the field and explicitly ask for it at query time using the fields option. Have a look here:

elasticsearch JAVA version and JDK version?

2014-11-20 Thread Thong Bui
Hi all, I am new to the elasticsearch Java API and have some questions: 1) What is the minimum JDK to be used with elasticsearch Java API version 1.4.0? 2) Is there a version of the elasticsearch Java API that works with JDK 1.6.0? Thank you! Thong

Re: elasticsearch JAVA version and JDK version?

2014-11-20 Thread Mark Walkom
1.4.X is 1.7u55 or 1.8u20 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html#jvm-version You'd have to dig back through the older versions of the docs to find what is supported with Java 1.6.0, but I know 0.90.X was. On 21 November 2014 11:17, Thong Bui
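Mark's version answer can be turned into a quick pre-flight check for a node or client host. This is a minimal sketch assuming the legacy `1.x.0_NN` format that `java -version` and the `java.version` property report for Java 7 and 8; treating 7u55 and 8u20 as "this update or later" is an assumption, and the function name `java_ok_for_es14` is hypothetical.

```python
import re

# Minimum update per Java major version for ES 1.4.x, per the answer above.
# Reading these as "this update or later" is an assumption.
MIN_UPDATE = {7: 55, 8: 20}


def java_ok_for_es14(version):
    """Check a legacy Java version string such as "1.7.0_55"."""
    match = re.fullmatch(r"1\.(\d+)\.\d+_(\d+)", version)
    if not match:
        return False  # unrecognized format: do not claim support
    major, update = int(match.group(1)), int(match.group(2))
    return major in MIN_UPDATE and update >= MIN_UPDATE[major]


print(java_ok_for_es14("1.7.0_55"), java_ok_for_es14("1.6.0_45"))
```

Java 6 strings such as `1.6.0_45` are rejected outright, consistent with the answer that Java 1.6 support ends with older releases.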

Bool and And filter, which is faster?

2014-11-20 Thread Fei Xie
In this article http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/, it says bool is faster than and/or filters. But at that time it was elasticsearch 0.9. Is this still true? Thanks!

RE: 1.4.0 data node can't join existing 1.3.4 cluster

2014-11-20 Thread Christian Hedegaard
FYI, I have found a solution that works (at least for me). I’ve got a small cluster for testing, only 4 v1.3.5 nodes. What I’ve done is bring up 4X new v1.4.0 nodes as data-only machines. In the yaml I added a line to point the nodes via unicast explicitly to the current master:

Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Konstantin Erman
I work on an experimental cluster of ES nodes running on Windows Server machines. Once in a while we have a need to reboot machines. The initial state - cluster is green and well balanced. One machine is gracefully taken offline and then after necessary service is performed it comes back

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Mark Walkom
You should disable allocation before you reboot, that will save a lot of shard shuffling - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote: I work on an experimental
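The disable/re-enable step from the rolling-upgrade page linked above boils down to two cluster-settings updates around the reboot. This is a minimal sketch that only builds the request bodies; it assumes the 1.x `cluster.routing.allocation.enable` setting, and using a `transient` (rather than `persistent`) setting is a choice made here so it does not survive a full cluster restart. The helper name `allocation_body` is hypothetical.

```python
import json


def allocation_body(enable):
    """Body for PUT /_cluster/settings that toggles shard allocation.

    "none" stops the cluster from reassigning the rebooting node's shards;
    "all" restores normal allocation once the node has rejoined.
    """
    value = "all" if enable else "none"
    return json.dumps({"transient": {"cluster.routing.allocation.enable": value}})


# Before taking the node down:
print("PUT /_cluster/settings", allocation_body(False))
# ... reboot, wait for the node to rejoin the cluster ...
print("PUT /_cluster/settings", allocation_body(True))
```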

Root type mapping not empty after parsing

2014-11-20 Thread samatha kankipati
Hi I am trying to upgrade from ES 0.90.2 to 1.4.0 I am using java api to set the settings of index, this was working fine with 0.90.2 client.admin().indices().prepareCreate(indexName).setSettings(_).execute().actionGet() here are my settings: { index: { analysis: { analyzer: {

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Yves Dorfsman
If you do disable allocation before you reboot a node and a client writes to a shard that had a replica on that node, does the entire replica get copied when the node comes up? Or does it just get updated? On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote: You should disable

Re: priming data for a new node

2014-11-20 Thread Yves Dorfsman
So if a shard has been updated since the data copy, will it copy the entire shard, or just update it? On Wednesday, 19 November 2014 23:34:01 UTC-7, Mark Walkom wrote: It doesn't copy everything, only what it needs to balance the shards. On 20 November 2014 17:20, Yves Dorfsman

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Mark Walkom
It will enter recovery where it syncs at the segment level from the current primary, then the translog gets shipped over and (re)played, which brings it all up to date. On 21 November 2014 14:51, Yves Dorfsman y...@zioup.com wrote: If you do disable allocation before you reboot a node and a

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Nikolas Everett
The thing is that this is a disk level operation. It pretty much rsyncs the files from the current master shard to the node when it comes back online. This would be OK if the replica shards matched the master but that is only normally the case if the shard was moved to the node after it was mostly
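The file-level sync described in the last two replies can be illustrated with a toy model. This is a simplified sketch, not Elasticsearch's recovery code: it assumes each copy is summarized as a map from segment file name to checksum (the file names and checksums below are made up) and shows only the file-diff phase, not translog replay. It illustrates why a replica whose segments were merged independently of the primary ends up copying almost everything.

```python
def files_to_copy(primary, replica):
    """Return the files the recovering replica must fetch from the primary.

    Files whose name and checksum already match on the replica are reused;
    everything else is transferred.
    """
    return sorted(name for name, checksum in primary.items()
                  if replica.get(name) != checksum)


primary = {"_0.cfs": "a1", "_1.cfs": "b2", "_2.cfs": "c3", "segments_5": "s5"}
# Replica that was offline briefly: it still shares the older segments.
stale = {"_0.cfs": "a1", "_1.cfs": "b2", "segments_4": "s4"}
# Replica whose segments were merged on their own: nothing matches.
diverged = {"_3.cfs": "d4", "segments_4": "s9"}

print(files_to_copy(primary, stale))
print(files_to_copy(primary, diverged))
```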

understanding terms syntax

2014-11-20 Thread GX
Hi all, I'm having the following scenario (elasticsearch 1.0): the query query: { term: { ac: 3A822F3B-3ECF-4463-98F86DF6DE28EC5C } } yields no results but this works query: { query_string : { default_field : ac, query :
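One common cause of this pattern is that a `term` query is not analyzed, while `query_string` analyzes its input. Assuming the `ac` field uses the default `standard` analyzer (the thread does not say), the indexed terms differ from the raw ID. The sketch approximates the standard analyzer as "split on non-alphanumeric characters, then lowercase". That is a simplification of its Unicode word-break rules, but it should give the same tokens for this ID. The helper name is hypothetical.

```python
import re


def approx_standard_analyzer(text):
    """Rough stand-in for the standard analyzer: split on runs of
    non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


value = "3A822F3B-3ECF-4463-98F86DF6DE28EC5C"
indexed_terms = approx_standard_analyzer(value)
print(indexed_terms)
# A term query looks up the raw string unchanged, so it matches nothing:
print(value in indexed_terms)
```

If that assumption holds, mapping the field with `"index": "not_analyzed"` (the 1.x string-field option) would keep the whole ID as a single term that an exact `term` query can match.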