Re: Problem with keeping in sync Elasticsearch across two data centers

2014-02-25 Thread Amit Soni
Thanks so much everyone for sharing your thoughts! -Amit. On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu harii...@gmail.com wrote: I think with current ES version you have 3 options. - Use the great snapshot and restore feature to snapshot from a DC and restore in the other one -

Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread Alexander Reelsen
Hey, You don't have two Lucene versions in your project by accident, right? I would like to know more about that ClassCastException, but I fear this is the most verbose output you get? --Alex On Tue, Feb 25, 2014 at 8:03 AM, Kevin J. Smith ke...@rootsmith.ca wrote: Hi, I am using

When a Java process with ES Client terminate, does it automatically close the connection?

2014-02-25 Thread Arinto Murdopo
Hi all, A question on the Java API for Elasticsearch. When a Java process with an ES Client terminates, does it automatically close the connection? Or should we explicitly close the connection to save resources? Best regards, Arinto -- You received this message because you are subscribed to

Re: engine failure, message [OutOfMemoryError[unable to create new native thread]]

2014-02-25 Thread T Vinod Gupta
Thanks for your response Jörg, somehow I missed replying earlier. For some strange reason, the max threads setting was reset when I did a reboot, so I had to set it back to a high number. On Tue, Feb 11, 2014 at 12:10 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: Your user ran out of

long GC pauses but only one 1 host in the cluster

2014-02-25 Thread T Vinod Gupta
I'm seeing this consistently happen on only 1 host in my cluster. The other hosts don't have this problem. What could be the reason and what's the remedy? I'm running ES on an EC2 m1.xlarge host: 16GB RAM on the machine, of which I allocate 8GB to ES. E.g. [2014-02-25 09:14:38,726][WARN ][monitor.jvm

Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread joergpra...@gmail.com
Maybe there are two Elasticsearch jar versions in the class path. Jörg

Re: long GC pauses but only one 1 host in the cluster

2014-02-25 Thread Mark Walkom
Depends on a lot of things; java version, ES version, doc size and count, index size and count, number of nodes. What are you monitoring the cluster with as well? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 25

Re: Is there a difference between indexing envelopes or polygons.

2014-02-25 Thread Nicolas THOMASSON
Hey, thanks a lot! Now it works just fine. I didn't see that coming; I thought ES would complain if the envelope's coordinates were reversed. My bad... Nicolas On Monday, 24 February 2014 15:58:24 UTC+1, Alexander Reelsen wrote: Hey, if there is an error, can you please open a github issue?

fragment_size not used for simple queries

2014-02-25 Thread Neamar Tucote
Hello, Using the highlight API for a simple query like this: curl localhost:9200/company_52fb7b90c8318c4dc86b/_search -d'{ fields: [], query: { filtered: { query: { match: { _all: i do not } } } }, highlight: { fields: {

Re: long GC pauses but only one 1 host in the cluster

2014-02-25 Thread joergpra...@gmail.com
Is this node showing more activity than others? What kind of workload is this, indexing/search? Are caches used, for filter/facets? Full GC runs caused by CMS Old Gen may be a sign that you are close to your memory limits and need to add nodes, but it could also mean a lot of other things.

Re: Problem with keeping in sync Elasticsearch across two data centers

2014-02-25 Thread Dario Rossi
I will try the tribe node feature, even if I don't understand it completely... but I think it deserves some experimentation. On Tuesday, 25 February 2014 08:05:05 UTC, amit.soni wrote: Thanks so much everyone for sharing your thoughts! -Amit. On Sun, Feb 23, 2014 at 10:24 AM,

Expanding terms

2014-02-25 Thread Petr Janský
Hello, I'm trying to find a way to: 1. expand a term - get all words, with counts, that are relevant for a term(s) 2. get relevant words for a query - list of all words that are highlighted 3. get phrases by word - e.g. for word war = world war, second world war, the second

Is consistent scoring across 2 documents that match either 1 of 2 properties possible?

2014-02-25 Thread Michelle May
Hi, We've been struggling with this for a few days now so I think it is time to pick the expert brains, although probably best to explain by delving straight into an example: 1) Assuming we have the following document: { id: /people/person1, dob: 1980-04-12, fullname: Mickey

Re: ES doesn't take into account field level boost in prefix query over catch-all field?

2014-02-25 Thread Maxim Vorobyov
Hi All. I have the same issue and would highly appreciate an answer. Many Thanks! Maxim

Re: Is there a difference between indexing envelopes or polygons.

2014-02-25 Thread Alexander Reelsen
Hey, the problem here is that elasticsearch can't tell by itself if the envelope borders need to be reversed or not... maybe you want/need such an envelope in your calculations. Hard to tell from a machine perspective :-) --Alex On Tue, Feb 25, 2014 at 10:38 AM, Nicolas THOMASSON
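
As a rough illustration of the point above, here is a minimal Python sketch of a client-side normalizer. It assumes the Elasticsearch envelope convention of upper-left then lower-right corners in [lon, lat] order, and it assumes the envelope never crosses the date line, which is exactly the case a machine cannot decide on its own. `normalize_envelope` is a hypothetical helper, not part of any Elasticsearch client.

```python
def normalize_envelope(corner_a, corner_b):
    """Return [[min_lon, max_lat], [max_lon, min_lat]] (upper-left, lower-right).

    Assumption: the envelope does not cross the antimeridian. If it does,
    swapping the longitudes like this would silently change its meaning.
    """
    (lon1, lat1), (lon2, lat2) = corner_a, corner_b
    upper_left = [min(lon1, lon2), max(lat1, lat2)]
    lower_right = [max(lon1, lon2), min(lat1, lat2)]
    return [upper_left, lower_right]


# Corners supplied in the "wrong" order are reordered before indexing.
envelope = normalize_envelope([45.0, -45.0], [-45.0, 45.0])
```

Doing this in the application keeps the decision with whoever knows what the data means, rather than having the server guess.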

Re: [Hadoop] Any goos tut to start with ?

2014-02-25 Thread Yann Barraud
Hi Costin, I did not see the video. It's a good starting point. I'm not a big fan of videos though. I might reproduce it using Hortonworks sandbox. Regards, Yann Barraud 2014-02-24 13:35 GMT+01:00 Costin Leau costin.l...@gmail.com: Have you looked at the video? It does exactly that.

Re: Problem with keeping in sync Elasticsearch across two data centers

2014-02-25 Thread Dario Rossi
From the docs it is not clear whether, with two clusters holding the same indexes, an indexing operation takes effect on both... There is a line that leaves me a bit doubtful: However, there are a few exceptions: - The merged view cannot handle indices with the same name in multiple clusters.

dumping index is slow as hell

2014-02-25 Thread Attila Bukor
Hey guys, I needed to migrate an index to a new cluster and after a lot of hesitation I decided to give taskrabbit's elasticsearch-dump a try: https://github.com/taskrabbit/elasticsearch-dump I tested it with 10k documents, which worked fine, so I decided to migrate the real data to the

Re: Aggregation on parent/child documents

2014-02-25 Thread Augusto Uehara
We run 4 instances of ES 1.0.0 using 30G for JVM. We run 64-bit OpenJDK 1.7.0_25 on ubuntu servers. $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending

Re: fragment_size not used for simple queries

2014-02-25 Thread Luca Cavanna
It would be useful if you can post a complete recreation, mappings included. Which highlighter are you using? On Tuesday, February 25, 2014 10:39:10 AM UTC+1, Neamar Tucote wrote: Hello, Using the highlight API for a simple query like this: curl

How can I do date-calculation/conversion in an MVEL script in ES 1.0.0?

2014-02-25 Thread h . b . wassenaar
Hello, I'm considering upgrading from 0.90.3 to 1.0.0, but I've hit a snag with one of the MVEL scripts I use to update documents through the update api. My update-script uses Joda to parse/format/manipulate dates, but it appears that Joda is no longer available to MVEL scripts in version

[new project using es] Elasticboard - tracking github data

2014-02-25 Thread Mihnea Dobrescu-Balaur
Hello again, Using the recently released github river[1], I'm working on an open source dashboard for keeping track of github projects. It's in the working prototype state right now and I'm trying to figure out what kind of information is desired and relevant. The idea is that people/orgs who

Re: the document payload of the Delete api

2014-02-25 Thread Binh Ly
Unfortunately, you'll have to GET it first.

Re: DateRange aggregation semantics - include_lower/include_upper?

2014-02-25 Thread Binh Ly
Yes, you are correct. The from is inclusive, and the to is exclusive.
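
To make the half-open semantics concrete, here is a small Python sketch. It is not Elasticsearch code; `in_range` is a hypothetical helper that mimics a range whose from is inclusive and whose to is exclusive, so two adjacent ranges never count a boundary value twice.

```python
from datetime import datetime


def in_range(value, start, end):
    """Return True if value lies in the half-open interval [start, end)."""
    return start <= value < end


january = (datetime(2014, 1, 1), datetime(2014, 2, 1))
february = (datetime(2014, 2, 1), datetime(2014, 3, 1))
boundary = datetime(2014, 2, 1)

# The boundary instant belongs to February only.
in_january = in_range(boundary, *january)
in_february = in_range(boundary, *february)
```

With these values `in_january` is False and `in_february` is True, which is why consecutive date ranges can share an endpoint safely.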

Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Tony Su
In principle, I agree with everything you describe about best practice but those practices become more important only when you're managing larger numbers of nodes. For those who manage only 5 nodes, the balance may swing in favor of just editing each machine's config directly instead of a more

Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Tony Su
One other issue. I have never been able to deploy an elasticsearch.yml which names the cluster node the same as the machine hostname despite the suggestions in another thread. It just won't work, and based on another thread I strongly suspect the underlying Java code implements single quotes

Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Binh Ly
This is a known issue and will be fixed shortly. For now, what you can do is run _optimize on all your indexes and set max_num_segments to 1, like below. Note that this may take a while depending on the size of your indexes. http://localhost:9200/_optimize?max_num_segments=1

Relation Between Heap Size and Total Data Size

2014-02-25 Thread Umutcan
Hi, I created an Elasticsearch cluster with 4 instances. Elasticsearch 0.90.10 is running on all of them. Heap size is 6 GB for all the instances, so total heap size is 24 GB. I have 5 shards for each index and each shard has 1 replica. A new index is created for every day, so all indices have

Re: Relation Between Heap Size and Total Data Size

2014-02-25 Thread Randy
Probably low on disc on at least one machine. Monitor disc usage. Also look in the logs and find out what error you are getting. Report back. Sent from my iPhone On Feb 25, 2014, at 7:25 AM, Umutcan umut...@gamegos.com wrote: Hi, I created a Elasticsearch cluster with 4 instance.

Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Benoît
I forgot to say that one consequence is that the 'head' plugin interface remains empty. The following requests time out: * _status * stats?all=true * _nodes How can I get some information on the cluster in these conditions? Benoît

Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread InquiringMind
I always start Elasticsearch from within my own wrapper script, es.sh. Inside this wrapper script is the following incantation: NODE_OPT=-Des.node.name=$(uname -n | cut -d'.' -f1) This is verified to work on Linux, Mac OS X, and Solaris (at least). I then pass $NODE_OPT as a command-line
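
For readers whose wrapper is not a shell script, the same idea can be sketched in Python. This is an illustration only: `short_hostname` and `node_option` are hypothetical helper names, and `socket.gethostname()` is assumed to return the same value as `uname -n` on the platforms mentioned.

```python
import socket


def short_hostname(fqdn=None):
    """Return the host name up to the first dot, like `cut -d'.' -f1`."""
    name = fqdn if fqdn is not None else socket.gethostname()
    return name.split(".")[0]


def node_option(fqdn=None):
    """Build the JVM system property that sets the Elasticsearch node name."""
    return "-Des.node.name=" + short_hostname(fqdn)


# Example with an explicit, made-up host name.
option = node_option("es01.example.com")
```

The resulting string would then be passed to the Elasticsearch start command in the same way `$NODE_OPT` is passed in the wrapper script.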

Re: the document payload of the Delete api

2014-02-25 Thread InquiringMind
And note that if you GET it and save the version number, and then pass the version number into the DELETE, you can be sure it will be deleted only if nobody else updated it in the meantime. This all works so much better in Java than in scripts + curl. Brian On Tuesday, February 25, 2014
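
The GET-then-versioned-DELETE pattern described above is a form of optimistic concurrency control. The following Python sketch simulates it with an in-memory store; `TinyStore` and `VersionConflict` are hypothetical names invented for the illustration and do not correspond to any Elasticsearch client class.

```python
class VersionConflict(Exception):
    """Raised when the caller's version no longer matches the stored one."""


class TinyStore:
    """In-memory stand-in for a versioned document store."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        # Every write bumps the version, starting at 1.
        version = self._docs.get(doc_id, (0, None))[0] + 1
        self._docs[doc_id] = (version, source)
        return version

    def get(self, doc_id):
        return self._docs[doc_id]

    def delete(self, doc_id, version):
        current, _ = self._docs[doc_id]
        if current != version:
            raise VersionConflict(f"expected {version}, found {current}")
        del self._docs[doc_id]


store = TinyStore()
store.index("1", {"user": "brian"})
seen_version, _ = store.get("1")       # GET and remember the version
store.index("1", {"user": "someone"})  # somebody else updates meanwhile
try:
    store.delete("1", seen_version)    # stale version: delete is refused
    deleted = True
except VersionConflict:
    deleted = False
```

Here `deleted` ends up False because the document changed between the read and the delete, which is the guarantee the post describes.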

Re: Default analyzer when the given analyzer not found?

2014-02-25 Thread InquiringMind
Based on posts to this newsgroup early on in my usage of ES (over a year now!), I used to put the following in my elasticsearch.yml file. Any field that was not explicitly assigned an analyzer and that was deemed by ES to be a string would pick up English snowball analyzer with no stop words

Re: Default analyzer when the given analyzer not found?

2014-02-25 Thread Frederic Meyer
Ah yes, via the default in the yaml configuration file, of course. I'll give that a try, thanks! It is a pity though that the default analyzer doesn't seem to do its job of processing all unmatched documents as far as the _analyze field is concerned. Thanks Fred P.S. : I do understand your

Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Benoît
Thank you Binh Ly, On Tuesday, February 25, 2014 4:25:59 PM UTC+1, Binh Ly wrote: This is a known issue and will be fixed shortly. For now, what you can do is run _optimize on all your indexes and set max_num_segments to 1, like below. Note that this may take a while depending on the size

Re: [Book] Mastering ElasticSearch Review

2014-02-25 Thread Ivan Brusic
I purchased the book when Packt was having a $5 ebook sale a couple of months ago. Did not really need the book, but it was cheap and I wanted to support the author who has posted on the mailing list in the past. Overall a decent book, recommended for anyone getting started with Elasticsearch. My

Re: dumping index is slow as hell

2014-02-25 Thread joergpra...@gmail.com
Have you benchmarked your cluster? How many docs can you index per second with bulk indexing? Jörg

Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread joergpra...@gmail.com
Not sure, but maybe you have jars with ES classes in the plugins folder that went astray? IIRC I saw this kind of error and it was a plugin with dependencies that were not compatible. If that is your code you can hack on, last resort is printing the current classpath in the log file... Jörg

Compute TF/IDF across indexes

2014-02-25 Thread Luiz Guilherme Pais dos Santos
Hi, I'm trying to search across multiple indexes and I couldn't understand the result of the TF/IDF function. I didn't expect the indexes where the term is more frequent to get penalized. Here follows an example: https://gist.github.com/luizgpsantos/9216108 When searching for the term alice

Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
I have never tried or looked at the code, but off the top of my head perhaps the DFS query type would work: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch Since the DFS query type calculates the TF/IDF values based on the
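
To see why local term statistics can penalize the index where a term is common, here is a small Python sketch. It uses the classic Lucene inverse document frequency formula, 1 + ln(numDocs / (docFreq + 1)); the document counts are made up for illustration, and real scoring happens per shard and involves more factors than IDF alone.

```python
import math


def idf(num_docs, doc_freq):
    """Classic Lucene-style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one term in two indexes of equal size.
index_a = {"num_docs": 1000, "doc_freq": 500}  # term is common here
index_b = {"num_docs": 1000, "doc_freq": 5}    # term is rare here

# Without DFS, each side scores with its own local statistics.
idf_a = idf(**index_a)
idf_b = idf(**index_b)

# With a DFS-style pre-phase, statistics are summed before scoring.
idf_global = idf(
    index_a["num_docs"] + index_b["num_docs"],
    index_a["doc_freq"] + index_b["doc_freq"],
)
```

With local statistics the hit from the index where the term is rare gets a much higher IDF than the hit from the index where it is common; with the summed statistics both hits are scored against the same value, so they become comparable.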

Re: Removing elasticsearch logs

2014-02-25 Thread Binh Ly
There is currently discussion around this: https://github.com/elasticsearch/elasticsearch-marvel/issues/95 But in the meantime, try this to see if it helps: https://github.com/elasticsearch/curator

Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-25 Thread Daniel Winterstein
Dear Hariharan, Alex, Luke, My apologies. You're quite right. The information is there -- I just didn't read far enough down. Thank you for your help and persistence. Best regards, - Daniel

Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Dan
Thanks for your response. I can't see the method 'setFetchSource' in the Client class. Are you sure that is in 1.0.0? On Tuesday, February 25, 2014 8:41:37 PM UTC, Binh Ly wrote: Yes you can use the client.setFetchSource() method: SearchResponse response = client.prepareSearch(index)

Sorting date fields

2014-02-25 Thread Adrian
Hi all, I have a question on how sorting during queries works in elasticsearch. I have an index with a custom date format field, on which the sort is applied. When querying the index for a given keyword, results are provided with the given sort. However, I've observed that some documents are

Re: scalability and creating 1 index per user

2014-02-25 Thread Nikolas Everett
On Tue, Feb 25, 2014 at 4:46 PM, ESUser neerav...@gmail.com wrote: Hi All, I am exploring elastic search to create one index per user instead of one big index for all the users. Each index would be about 6G. I am wondering if anyone has tried it and how would it scale? I couldn't find that

Re: scalability and creating 1 index per user

2014-02-25 Thread Mark Walkom
20K is a lot of indexes, probably too many as ES will need to maintain state about each of those in memory which could mean you have nothing left for caching indexed data! You might want to look at http://www.elasticsearch.org/blog/customizing-your-document-routing/ instead, that way you can reduce
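
The routing idea mentioned above can be sketched in a few lines of Python: all documents that share a routing key (for example a user id) land on the same shard of one shared index, so a per-user search only has to touch that shard. The hash below is `zlib.crc32`, chosen purely as a stand-in; it is not the hash function Elasticsearch uses, and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(routing_key, num_shards):
    """Map a routing key to a shard number deterministically.

    Assumption: crc32 stands in for whatever hash the server really uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


NUM_SHARDS = 20  # made-up shard count for one shared index

# Every document for the same user resolves to the same shard.
shard_for_first_doc = pick_shard("user-42", NUM_SHARDS)
shard_for_second_doc = pick_shard("user-42", NUM_SHARDS)
```

The trade-off is that one very large user can make a shard lopsided, which is something to measure before committing to this layout.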

Re: Sorting date fields

2014-02-25 Thread joergpra...@gmail.com
ES loads the values of the fields to sort on into memory cache. You should update to 1.0.0, maybe you hit a bug that has been fixed. Jörg

Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Ivan Brusic
I do not use quotes at all. Simply: node.name: ${HOSTNAME} -- Ivan On Tue, Feb 25, 2014 at 7:56 AM, InquiringMind brian.from...@gmail.com wrote: I always start Elasticsearch from within my own wrapper script, es.sh. Inside this wrapper script is the following incantation:

Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Binh Ly
Hmmm, can you please double-check? I can see it from the tests here: https://github.com/elasticsearch/elasticsearch/blob/v1.0.0/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java

Re: Sorting date fields

2014-02-25 Thread Adrian
On Tue, Feb 25, 2014 at 11:11:13PM +0100, joergpra...@gmail.com wrote: Jörg, ES loads the values of the fields to sort on into memory cache. Yes, I've read that - is it known when these caches are flushed? You should update to 1.0.0, maybe you hit a bug that has been fixed. I'll do that. I

Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Dan
Yes, I can see it. Thanks. On 25 Feb 2014, at 22:23, Binh Ly binhly...@yahoo.com wrote: Hmmm, can please double-check. I can see it from the tests here: https://github.com/elasticsearch/elasticsearch/blob/v1.0.0/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java

Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread Kevin J. Smith
Many, many, way too many, hours later it came down to what everyone was suggesting was the problem in the first place: an old elasticsearch jar sitting in an abandoned directory but still scanned by tomcat's class loader. Thanks for your help.

Re: Sorting date fields

2014-02-25 Thread joergpra...@gmail.com
For the cache, see http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/index-modules-fielddata.html By default, the field cache size is unbounded, and does not expire. For sort, it means that each field to sort is examined, all values of the field are loaded, so the in-memory

copy_to objects?

2014-02-25 Thread asanderson
Does copy_to work with objects?

EsRejectedExecutionException when searching date based indices.

2014-02-25 Thread Alex Clark
Hello all, I’m getting failed nodes when running searches and I’m hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straightforward: the index for January 1 is messages_20140101, for January 2 is messages_20140102
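
The daily naming pattern described above lends itself to computing exactly which indices a time-bounded search needs, rather than searching all of them. A minimal Python sketch follows; `daily_indices` is a hypothetical helper and the `messages_` prefix is taken from the post.

```python
from datetime import date, timedelta


def daily_indices(start, end, prefix="messages_"):
    """List one index name per day from start to end, both inclusive."""
    days = (end - start).days
    return [
        prefix + (start + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(days + 1)
    ]


# Index names for the first two days of January 2014.
names = daily_indices(date(2014, 1, 1), date(2014, 1, 2))
```

Joining such a list with commas gives a multi-index search target that only touches the days of interest, which keeps the number of shards involved in each search small.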

Need help with a large cluster restart.

2014-02-25 Thread Search User
I have 20 ES data nodes and 10 master nodes in my cluster. I have 6 minimum master nodes for the cluster to function. I wanted to know if anyone knows of a correct way to restart a large cluster. I see different results on each cluster restart. Sometimes, some of the shards are in Unassigned
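
The figure of 6 minimum master nodes for 10 master-eligible nodes matches the usual majority rule, (N / 2) + 1 with integer division. A short Python sketch of that arithmetic, with `minimum_master_nodes` as a hypothetical helper name:

```python
def minimum_master_nodes(master_eligible):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


# The cluster in the post: 10 master-eligible nodes.
quorum = minimum_master_nodes(10)
```

A strict majority means two halves of a partitioned cluster can never both elect a master, which is the point of the setting.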

Lost index metadata and overwriting pre-existing index files

2014-02-25 Thread Danny Berger
Hi - I recently experienced some surprising elasticsearch behavior and I'd appreciate some verification on the whys behind what we saw. Basically, during a cluster restart we lost some index metadata causing those indices to not be realized and loaded from the data nodes (raw index files still

Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-25 Thread Ivan Brusic
Luke? :) On Tue, Feb 25, 2014 at 1:09 PM, Daniel Winterstein daniel.winterst...@gmail.com wrote: Dear Hariharan, Alex, Luke, My apologies. You're quite right. The information is there -- I just didn't read far enough down. Thank you for your help and persistence. Best regards, - Daniel

Re: Compute TF/IDF across indexes

2014-02-25 Thread Luiz Guilherme Pais dos Santos
Hi Ivan, The DFS query then fetch worked very well! Thank you! Cheers, Luiz Guilherme On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic i...@brusic.com wrote: I have never tried or looked at the code, but off the top of my head perhaps the DFS query type would work:

Re: Need help with a large cluster restart.

2014-02-25 Thread Mark Walkom
Some of these will help - http://gibrown.wordpress.com/2013/12/05/managing-elasticsearch-cluster-restart-time/ Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 26 February 2014 11:57, Search User feedwo...@gmail.com

Interesting question on Transaction Log record mutability

2014-02-25 Thread Yuri Panchenko
Hi guys, If I turn off automatic indexing and refreshing, and continually execute partial updates on the same document (say 100 times), do the updates change the same record in the transaction log or will it create 100 changes? The reason I'm curious is because when I ask ES to index (or

Re: Kibana: showing a ratio

2014-02-25 Thread Andrew Vine
Ok, I'll check it out. On Tuesday, 25 February 2014 00:17:20 UTC+2, Binh Ly wrote: Unfortunately not at the moment. But if you're up to it, you can probably easily write a custom panel that will do this for you.

Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
Great, I am glad that it worked. I do not use multi-index searches, so I was not sure if it would. Good to know that shards from different indices can be aggregated with DFS queries. -- Ivan On Tue, Feb 25, 2014 at 6:04 PM, Luiz Guilherme Pais dos Santos luizgpsan...@gmail.com wrote: Hi

Re: Relation Between Heap Size and Total Data Size

2014-02-25 Thread Umutcan
There is enough space on every machine. I looked in the logs and found out that org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/ebs/elasticsearch/elasticsearch-0.90.10/data/elasticsearch/nodes/0/indices/logstash-2014.02.26/0/index/write.lock is what

Histogram of high-cardinality aggregate

2014-02-25 Thread Mike Kaplinskiy
Hey folks, Playing around with the aggregation API, I was wondering whether this is possible. Taking the example at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html , how would I get the histogram of the minimum price

Re: Text Categorization in ES

2014-02-25 Thread prashy
Hi All, To be specific, I want a query like this: searching for Laptop will automatically give results for Dell, Sony, HP, Lenovo, Samsung... as well. As lingo3g is used for clustering the documents, it will store the reference for the above terms as well. For that I have installed Carrot2 and Lingo3g

Re: EsRejectedExecutionException when searching date based indices.

2014-02-25 Thread David Pilato
You are mixing nodes and shards, right? How many elasticsearch nodes do you have to manage your 7300 shards? Why did you set 20 shards per index? You can increase the queue size in elasticsearch.yml but I'm not sure it's the right thing to do here. My 2 cents -- David ;-) Twitter : @dadoonet /
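
The numbers in this reply can be turned into a quick back-of-the-envelope check. The Python sketch below assumes the 7300 figure counts only primary shards and that one search addresses every daily index; the queue size of 1000 is an assumed value for illustration, and in a real cluster the shard-level requests are spread over all nodes rather than landing on one queue. Both helper names are hypothetical.

```python
def index_count(total_shards, shards_per_index):
    """How many indices a given shard total corresponds to."""
    return total_shards // shards_per_index


def shard_requests(num_indices, shards_per_index):
    """Shard-level requests created by one search across all indices."""
    return num_indices * shards_per_index


TOTAL_SHARDS = 7300        # from the post
SHARDS_PER_INDEX = 20      # from the post
ASSUMED_QUEUE_SIZE = 1000  # assumption, not taken from the thread

indices = index_count(TOTAL_SHARDS, SHARDS_PER_INDEX)
fan_out = shard_requests(indices, SHARDS_PER_INDEX)
over_queue = fan_out > ASSUMED_QUEUE_SIZE
```

With these inputs the shard total corresponds to 365 daily indices, and a single search over all of them fans out to 7300 shard-level requests. That is why reducing shards per index, or searching fewer days at once, is likely a better first step than enlarging the queue.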