Re: ElasticSearch Version problem
There seems to be a problem when indexing MySQL data into ES. These are the ES logs:

[][DEBUG][NodeClient] after bulk [18650] [succeeded=93255] [failed=0] [5ms]
[][DEBUG][NodeClient] before bulk [18654] of 5 items, 2407 bytes, 1 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18651] [succeeded=93260] [failed=0] [3ms]
[][DEBUG][NodeClient] after bulk [18653] [succeeded=93265] [failed=0] [3ms]
[][DEBUG][NodeClient] before bulk [18655] of 5 items, 2381 bytes, 1 outstanding bulk requests
[][DEBUG][NodeClient] before bulk [18656] of 5 items, 2357 bytes, 2 outstanding bulk requests
[][DEBUG][NodeClient] before bulk [18657] of 5 items, 2399 bytes, 3 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18656] [succeeded=93270] [failed=0] [3ms]
[][DEBUG][NodeClient] before bulk [18658] of 5 items, 2242 bytes, 3 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18655] [succeeded=93275] [failed=0] [5ms]
[][DEBUG][NodeClient] before bulk [18659] of 5 items, 2316 bytes, 3 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18658] [succeeded=93280] [failed=0] [2ms]
[][DEBUG][NodeClient] after bulk [18657] [succeeded=93285] [failed=0] [4ms]
[][DEBUG][NodeClient] before bulk [18660] of 5 items, 2336 bytes, 2 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18654] [succeeded=93290] [failed=0] [7ms]
[][DEBUG][NodeClient] before bulk [18661] of 5 items, 2202 bytes, 2 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18659] [succeeded=93300] [failed=0] [3ms]
[][DEBUG][NodeClient] before bulk [18662] of 5 items, 2234 bytes, 1 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18660] [succeeded=93295] [failed=0] [2ms]
[][DEBUG][NodeClient] before bulk [18663] of 5 items, 2176 bytes, 2 outstanding bulk requests
[][DEBUG][NodeClient] before bulk [18664] of 5 items, 2332 bytes, 3 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18661] [succeeded=93305] [failed=0] [7ms]
[][DEBUG][NodeClient] before bulk [18665] of 5 items, 2275 bytes, 3 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18662] [succeeded=93310] [failed=0] [8ms]
[][DEBUG][NodeClient] before bulk [18666] of 5 items, 2358 bytes, 3 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18664] [succeeded=93315] [failed=0] [4ms]
[][DEBUG][NodeClient] after bulk [18665] [succeeded=93320] [failed=0] [2ms]
[][DEBUG][NodeClient] before bulk [18667] of 5 items, 2307 bytes, 2 outstanding bulk requests
[][DEBUG][NodeClient] after bulk [18663] [succeeded=93325] [failed=0] [6ms]
[][DEBUG][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource] merged 93338 rows

The total row count looks fine, but the document count I see in my index is exactly half of what is reported (93338). Any idea what is wrong here? Also, I can see some outstanding bulk requests, which I think are causing the problem. Here are the params I'm using in my jdbc config: fetchsize: 100, maxconcurrentbulkactions: 5, maxbulkactions: 5, max_retries: 15, autocommit: true. I tried changing these params to avoid the outstanding bulk requests above, but it didn't help. Thanks,

On Tue, May 20, 2014 at 11:32 PM, joergpra...@gmail.com wrote:
1. I'd be thankful if you could post an issue on the JDBC river GitHub page with a (small) example demonstrating that the JDBC river is not able to index all data. In my own projects I index millions of rows without skipping any data. There are many possible reasons why this can happen, but I can try to track down such issues. A start would be to increase the log level to DEBUG and follow the messages the JDBC river writes about the number of indexed documents. This can easily be compared to the database rows (assuming you transfer the primary key of a table to the _id field).
2. MySQL functions that return binary data (non-UTF-8) can not be indexed into ES without base64 encoding. Maybe you should switch the character set of the JDBC MySQL connection URL if you are unsure whether you really get UTF-8 from the database. Jörg

On Tue, May 20, 2014 at 7:00 PM, Mukul Gupta mukulnit...@gmail.com wrote: Yeah, I switched to JDBC 1.1.0.2 and it is now updating indexed docs, but a few issues remain. 1. Though the JDBC river can now index docs, not all MySQL rows are getting indexed. I tried playing with maxbulkactions (the length of each bulk index request submitted) and maxconcurrentbulkactions (the maximum number of concurrent bulk requests) but was not able to solve the issue. I thought
Index parameters
Hello, I need to index 100,000 documents of 1 MB each. This is my configuration of the Elasticsearch index: index: { type: doc, bulk_size: 100, number_of_shards: 5, number_of_replicas: 2 } I need to know what effect each parameter has. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bdf82941-c406-4f91-94cd-adddf35676ec%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
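For reference, a minimal sketch of where those settings go at index-creation time (index name `docs` is hypothetical). Note that `number_of_shards` is fixed when the index is created, `number_of_replicas` can be changed on a live index, and `bulk_size` is a client/river option rather than an index setting, so it does not belong in this request:

```shell
# create the index with 5 primary shards and 2 replicas per shard
curl -XPUT 'http://localhost:9200/docs' -d '{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  }
}'

# replicas can be raised or lowered later without reindexing:
curl -XPUT 'http://localhost:9200/docs/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```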
Creating an index keyed by my own field/column name, not by _id
Hi Guys, can someone guide me on how to create the index with my own field name as the document ID, not the _id column? Looking for some help from your end. Regards, Dharmendra
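In Elasticsearch 1.x you can point `_id` at a document field via the `_id.path` mapping option (this was later deprecated, so treat it as a 1.x-era sketch; index, type, and field names below are hypothetical):

```shell
# make _id come from the "employee_id" field of each document
curl -XPUT 'http://localhost:9200/myindex' -d '{
  "mappings": {
    "mytype": {
      "_id": { "path": "employee_id" }
    }
  }
}'

# the more future-proof alternative: set the ID explicitly in the URL
curl -XPUT 'http://localhost:9200/myindex/mytype/E123' -d '{
  "employee_id": "E123",
  "name": "Dharmendra"
}'
```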
Number of characters in field
Hi. I am trying to find a way to express a character-count filter in a query string, for instance: I need to find all documents whose subject field holds fewer than 20 characters. How would I do that in a query string? /David
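A query_string alone cannot count characters, but a script filter can. A sketch for ES 1.x, assuming dynamic scripting is enabled and `subject` is a single-token (not_analyzed) field; for an analyzed field, `doc[...]` holds individual terms rather than the whole string, so the length check would apply per term:

```shell
curl -XPOST 'http://localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "doc[\"subject\"].value.length() < 20"
        }
      }
    }
  }
}'
```

Index name `myindex` is hypothetical. Script filters run per document, so on a large index this is noticeably slower than an indexed length field (indexing a `subject_length` integer at write time and range-filtering on it avoids the scripting cost entirely).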
Re: Aliases and percolators
Yes, that is correct. Martijn On 21 May 2014 02:34, Mark Dodwell m...@mkdynamic.co.uk wrote: Many thanks, that is a super clear answer. So, until that issue is addressed, am I correct in thinking I should do this when percolating an existing document: ``` curl http://0.0.0.0:9200/idx-1/docs/123/_percolate -d '{ "filter": { "term": { "account_id": 1 } } }' ``` Thanks again. On Tuesday, May 20, 2014 2:18:02 AM UTC-7, Martijn v Groningen wrote: 1. Yes, the routing will be taken into account when adding the percolator query. 2. At the moment only the routing will be taken into account; the filter will not. I opened the issue "percolator against an alias with a filter": https://github.com/elasticsearch/elasticsearch/issues/6241 On 17 May 2014 03:54, Mark Dodwell ma...@mkdynamic.co.uk wrote: Consider an index `idx` with a mapping for a single type `docs`, and aliases of the format `idx-{ACCOUNT-ID}` with a term filter and a routing value set to the account id, like so: ``` $ curl http://0.0.0.0:9200/idx/_aliases?pretty=1 { "idx" : { "aliases" : { "idx-1" : { "filter" : { "term" : { "account_id" : 1 } }, "index_routing" : "1", "search_routing" : "1" }, ... } } } ``` ### Questions 1. When indexing a percolator query via an alias, will it respect the routing? ``` # will the alias routing from idx-1 apply to this operation? curl -XPUT http://0.0.0.0:9200/idx-1/.percolator/1 -d '{"query":..., "account_id":1}' ``` 2. When percolating an existing document via one of those aliases, will the routing and term filter from the alias be used when retrieving/checking the matching percolator documents? ``` # will the alias routing and term filter from idx-1 apply to the percolators? curl http://0.0.0.0:9200/idx-1/docs/123/_percolate ``` Any insight much appreciated.
-- Met vriendelijke groet, Martijn van Groningen
Store Elasticsearch Indices in a revision/audit-proof way
Hey guys, in order to meet the German laws for logging, I have been asked to store the Elasticsearch indices in a revision/audit-proof way (indices cannot be edited/changed after storage). Are there any best practices or tips for doing such a thing (maybe any plugins)? Thanks for your feedback.
Re: Store Elasticsearch Indices in a revision/audit-proof way
Yeah, it looks like that would do the job, thanks for the response. On Thursday, 22 May 2014 10:40:19 UTC+2, Mark Walkom wrote: You can set indexes to read-only - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html Is that what you're after? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com
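The setting from that reference page can be applied per index once it is no longer being written to (a sketch; the index name below is hypothetical). Caveat for the audit-proof requirement: anyone with access to the settings API can flip the block back off, so on its own this is write protection, not tamper proofing:

```shell
# block writes and metadata changes on a closed-out daily index
curl -XPUT 'http://localhost:9200/logstash-2014.05.21/_settings' -d '{
  "index.blocks.read_only": true
}'
```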
Elasticsearch Facets + limit results
I'm trying to construct the following SQL query in Elasticsearch:

SELECT companyId, COUNT(*) c FROM visits GROUP BY companyId ORDER BY c DESC LIMIT 2

I came up with the following JSON body for the query:

```
{
  "facets": {
    "company": {
      "filter": { "term": { "entityType": "companypage" } },
      "terms": { "field": "entityId", "size": 2 }
    }
  }
}
```

When I use size: 2, I get the following result:

```
"facets": {
  "company": {
    "_type": "terms",
    "missing": 0,
    "total": 4,
    "other": 0,
    "terms": [
      { "term": 2, "count": 3 },
      { "term": 20, "count": 1 }
    ]
  }
}
```

When I use size: 1, I get the following result:

```
"facets": {
  "company": {
    "_type": "terms",
    "missing": 0,
    "total": 4,
    "other": 2,
    "terms": [ { "term": 2, "count": 2 } ]
  }
}
```

How is it possible that the count for term 2 is 3 in the first response, but 2 in the second response?
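A likely explanation (hedged, since the thread has no answer): terms facet counts are approximate on multi-shard indices, because each shard returns only its own top `size` terms before the node merges them, so a small `size` can truncate counts per shard. Requesting a larger per-shard candidate list via `shard_size` is the usual remedy; a sketch against the `visits` index from the question:

```shell
curl -XPOST 'http://localhost:9200/visits/_search?search_type=count' -d '{
  "facets": {
    "company": {
      "filter": { "term": { "entityType": "companypage" } },
      "terms": { "field": "entityId", "size": 2, "shard_size": 50 }
    }
  }
}'
```

`shard_size` only changes how many candidates each shard reports; the response still contains at most `size` terms, but with more accurate counts.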
Re: Store Elasticsearch Indices in a revision/audit-proof way
Keep us up to date with your project, I'm sure there would be interest from others in a similar setup. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com
Re: Store Elasticsearch Indices in a revision/audit-proof way
You have to add a facility to your middleware that can trace all authorized operations on your index (access, read, write, modify, delete), and you must write these to an append-only logfile with timestamps. If there is interest I could write such a plugin (assuming it can run in a trusted environment regarding authorization tokens), but I think the best place is in a middleware, where an ES client runs in a broader application context (e.g. with transaction awareness). Jörg
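A minimal sketch of the append-only logfile idea on Linux (assumptions: an ext* filesystem, and `chattr +a` needs root, so it is guarded here; the log path and the logged fields are illustrative only):

```shell
# append-only audit trail: every authorized operation gets a timestamped line
LOG=./es-audit.log
touch "$LOG"
# make the file append-only so existing lines cannot be edited or truncated
# (requires root; best-effort here so the sketch also runs unprivileged)
chattr +a "$LOG" 2>/dev/null || true
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) user=alice op=write index=logs-2014.05.22 doc=42" >> "$LOG"
cat "$LOG"
```

For real audit-proofing the log should live on a host the ES users cannot administer, since root can always remove the append-only attribute again.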
How to speed up indexing by using Python API
Hi all: I am trying to index my logs using the Elasticsearch Python API, but I only get about 600 records/s indexing speed. On the same ES cluster, with the same data, Logstash (redis - logstash - elasticsearch) can index at about 3000 records/s. Any advice on how to speed up indexing with the Python API? Thanks very much.
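Whatever the client, the usual difference at this scale is batching: Logstash ships documents through the `_bulk` endpoint, while naive client code sends one request per document. In Python, `elasticsearch.helpers.bulk` does the batching for you; under the hood it builds the NDJSON payload `_bulk` expects, sketched here (index/type names hypothetical):

```shell
# build a bulk body: one action line + one source line per document
for i in 1 2 3; do
  printf '{"index":{"_index":"logs","_type":"log","_id":"%s"}}\n' "$i"
  printf '{"message":"log line %s"}\n' "$i"
done > bulk.ndjson

wc -l < bulk.ndjson   # 6 lines: 3 action lines + 3 source lines
# then ship the whole batch in one request:
# curl -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.ndjson
```

Batches of a few hundred to a few thousand documents per request are a common starting point; the right size depends on document size and cluster capacity.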
Re: Store Elasticsearch Indices in a revision/audit-proof way
Hi Jörg, thanks for your offer. I will contact you if there's a need for such a plugin in our company. I will also keep you up to date if there are breaking changes in our project.
Re: How to manage INSERT and UPDATE SQL (river)?
If you use the column name _id, you can control the ID of the ES document created from your SQL. If you do not use _id, a random doc ID is generated. See the README at https://github.com/jprante/elasticsearch-river-jdbc Jörg On Thu, May 22, 2014 at 11:43 AM, Tanguy Bernard bernardtanguy1...@gmail.com wrote: Hello, I would like to know a way to manage INSERT and UPDATE. Am I forced to delete and then re-index my data? Maybe there is a way to index again without duplicating my data (INSERT)? Can you help with my problem? I use this: PUT /_river/user/_meta { "type": "jdbc", "jdbc": { "url": "jdbc:mysql://my_adress/my_index", "user": "my_user", "password": "my_password", "sql": "select name_user, firstname_user, id_user from user", "index": "my_index", "type": "user", "max_bulk_requests": 5 } } Thanks in advance.
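Concretely, with the river config from the question, aliasing the primary key to `_id` in the SQL statement makes re-runs overwrite existing documents instead of duplicating them (a sketch; connection details are taken verbatim from the original post):

```shell
curl -XPUT 'http://localhost:9200/_river/user/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:mysql://my_adress/my_index",
    "user": "my_user",
    "password": "my_password",
    "sql": "select id_user as _id, name_user, firstname_user from user",
    "index": "my_index",
    "type": "user",
    "max_bulk_requests": 5
  }
}'
```

With a stable `_id`, an UPDATE in MySQL becomes a reindex of the same document in ES; only DELETEs still require separate handling.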
issue with client.admin().cluster().prepareGetSnapshots(...)
The call to prepareGetSnapshots(...) for a snapshot which does not exist throws SnapshotMissingException. I would expect it instead to return a response with a list of zero snapshots (getSnapshots()), or at least isExist=false. Is there any other way to check for the existence of a snapshot? Thanks
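One workaround (a sketch; repository name `my_backup` is hypothetical): list all snapshots in the repository and check for the name on the client side, which avoids requesting a specific snapshot and therefore avoids SnapshotMissingException:

```shell
# returns {"snapshots": [...]} - an empty list rather than an error
# when the repository exists but holds no snapshots
curl -XGET 'http://localhost:9200/_snapshot/my_backup/_all?pretty'
```

The Java equivalent is `prepareGetSnapshots("my_backup").get()` without naming a snapshot, then scanning `getSnapshots()` for the one you want; alternatively, catch SnapshotMissingException and treat it as "does not exist".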
File Descriptors
Hi guys, kind of stuck with a fresh installation of an Elasticsearch cluster. Everything is installed and the file descriptor limits are set, yet when I run curl -XGET 'http://10.0.8.62:9200/_nodes?os=true&process=true&pretty=true' > stats.txt I get process : { refresh_interval : 1000, id : 1200, max_file_descriptors : 4096, mlockall : true }, yet when I run ulimit -n I get 65535. Elasticsearch is installed as a service and running as root. Any idea why this is happening?
Re: File Descriptors
What OS and how did you install it? (Running as root is a really bad idea by the way!) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com
Re: File Descriptors
So the issue only occurs after a server reboot. If I then restart the elasticsearch service, it loads the correct number of file descriptors. Regards, Shawn
Re: File Descriptors
CentOS 6.5 and Java 1.7u55 -- Thanks, Shawn Ritchie
Re: File Descriptors
Did you use the RPMs? Where are you setting the ulimit? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com
Re: File Descriptors
No, I did not use the RPM; I used the .tar for the installation process, and my ulimit settings are:

/etc/security/limits.conf:
    * - nofile 65535

/etc/sysctl.conf:
    fs.file-max = 512000

On Thu, May 22, 2014 at 12:37 PM, Mark Walkom ma...@campaignmonitor.com wrote:

Did you use the RPMs? Where are you setting the ulimit?
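Since the node reports max_file_descriptors : 4096 while the shell reports 65535, one generic Linux sanity check (my suggestion, not something from the thread) is to read the limits the running process actually inherited from /proc. Settings in limits.conf are applied by pam_limits only to sessions that go through PAM, so a daemon started from a boot-time init script may never see them.

```shell
# Read the "open files" limit a process actually inherited. Shown here
# against the current shell (/proc/self); for the elasticsearch daemon
# you would substitute its PID, e.g. $(pgrep -f elasticsearch | head -1).
grep 'Max open files' /proc/self/limits
```

If the daemon's value disagrees with "ulimit -n" in your shell, the limit is being set in a context the service startup path never passes through.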
--
Thanks
Shawn Ritchie
how to get only aggregation values from elasticsearch
Hi,

I want to get the average value of the MEMORY field from my ES document. Below is the query I'm using. Here I'm getting the aggregation along with the hits JSON as well. Is there any way we can get the aggregation result only? Please suggest.

POST /virtualmachines/_search
{
  "query" : {
    "filtered" : {
      "query" : { "match" : { "CLOUD_TYPE" : "CLOUDSTACK" } },
      "filter" : {
        "range" : {
          "NODE_CREATE_TIME" : {
            "from" : "2014-05-22 14:11:35",
            "to" : "2014-05-22 14:33:35"
          }
        }
      }
    }
  },
  "aggs" : {
    "memory_avg" : { "avg" : { "field" : "MEMORY" } }
  }
}

*response*:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "virtualmachines",
        "_type": "nodes",
        "_id": "102",
        "_score": 1,
        "_source": {
          "NODE_ID": 12235,
          "CLOUD_TYPE": "CLOUDSTACK",
          "NODE_GROUP_NAME": "JBOSS",
          "NODE_CPU": "4GHZ",
          "NODE_HOSTNAME": "cloud.aricent.com",
          "NODE_NAME": "cloudstack-node1",
          "NODE_PRIVATE_IP_ADDRESS": "10.123.124.125",
          "NODE_PUBLIC_IP_ADDRESS": "125.31.108.72",
          "NODE_INSTANCE_ID": "cloudstack112",
          "NODE_STATUS": "ACTIVE",
          "NODE_CATEGORY_ID": 13,
          "NODE_CREATE_TIME": "2014-05-22 14:23:04",
          "CPU_SPEED": 500,
          "MEMORY": 512,
          "CPU_USED": "0.03%"
        }
      }
    ]
  },
  "aggregations": {
    "memory_avg": { "value": 512 }
  }
}
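One common way to suppress the hits and keep only the aggregation (a general technique for this ES generation, not confirmed against the poster's exact setup) is to set "size": 0 in the request body, so the hits array comes back empty while aggregations are still computed; running the search with ?search_type=count had a similar effect at the time. A minimal sketch using the poster's index and aggregation:

```json
POST /virtualmachines/_search
{
  "size": 0,
  "query" : {
    "filtered" : {
      "query" : { "match" : { "CLOUD_TYPE" : "CLOUDSTACK" } },
      "filter" : {
        "range" : {
          "NODE_CREATE_TIME" : {
            "from" : "2014-05-22 14:11:35",
            "to" : "2014-05-22 14:33:35"
          }
        }
      }
    }
  },
  "aggs" : {
    "memory_avg" : { "avg" : { "field" : "MEMORY" } }
  }
}
```

The response still includes the "hits" envelope (with an empty list), but no document sources are fetched, so only the aggregations section carries data.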
Nodes restarting automatically
Hello,

We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and elasticsearch v1.1.1. It had been running flawlessly, but since last week some of the nodes restart randomly and the cluster goes to red state, then yellow, then green, and then it happens again in a loop (sometimes it doesn't even reach green). I've tried to look at the logs but I can't find any obvious reason for what might be going on. I've found entries like these, but I don't know if they are in some way related to the crash:

[2014-05-22 13:55:16,150][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end.raw] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start.raw] returning default postings format

For instance, right now it was in yellow state, really close to green, and suddenly node 3 restarted itself; now the cluster is red with 2000 shards initializing. The log on that node shows this:

[2014-05-22 13:59:48,498][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s], total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young] [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [6gb]->[6gb]/[19.3gb]}
[2014-05-22 14:03:44,825][INFO ][node ] [elastic ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
[2014-05-22 14:03:44,826][INFO ][node ] [elastic ASIC nodo 3] initializing ...
[2014-05-22 14:03:44,839][INFO ][plugins ] [elastic ASIC nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] initialized
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] starting ...

The crash happened exactly at 14:02. Any idea what can be going on, or how I can trace what's happening?

After rebooting there are also DEBUG errors like this:

[2014-05-22 14:06:16,621][DEBUG][action.search.type ] [elastic ASIC nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
    at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
    at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
Re: Nodes restarting automatically
How are you running the service: upstart, init, or something else? ES shouldn't just restart on its own; this could be something else, like the kernel's OOM killer.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
Re: Nodes restarting automatically
The elasticsearch nodes are launched through /etc/init.d/elasticsearch.

On Thu, May 22, 2014 at 2:13 PM, Mark Walkom ma...@campaignmonitor.com wrote:

How are you running the service: upstart, init, or something else? ES shouldn't just restart on its own; this could be something else, like the kernel's OOM killer.
Re: Nodes restarting automatically
Like Mark said, check the OOM killer. It should log to syslog. It is evil.

Nik

On Thu, May 22, 2014 at 2:14 PM, Jorge Ferrando jorfe...@gmail.com wrote:

The elasticsearch nodes are launched through /etc/init.d/elasticsearch.
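As a concrete sketch of what checking syslog for the OOM killer could look like: the sample log line below is illustrative only (not taken from the poster's system), and the real invocation against system logs is shown as a comment.

```shell
# Illustrative example of the kind of line the kernel OOM killer writes
# to syslog when it kills a JVM (fabricated for demonstration):
sample='May 22 14:02:01 nodo3 kernel: Out of memory: Kill process 7511 (java) score 905 or sacrifice child'

# Count matching lines in the sample; on a real host you would instead run:
#   grep -Ei 'out of memory|oom-killer|killed process' /var/log/syslog /var/log/kern.log
echo "$sample" | grep -Eic 'out of memory|oom-killer'
```

A match right at the time of the crash (14:02 here) would point straight at the kernel killing the node rather than ES restarting itself.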
Re: Number of characters in field
You could use a script filter:

"filtered" : {
  "query" : { ... },
  "filter" : {
    "script" : {
      "script" : "doc['subject'].value.length() < 20"
    }
  }
}

Dan

On Thursday, May 22, 2014 8:45:41 AM UTC+1, David Nielsen wrote:

Hi. I am trying to find a way to express a character count filter in a query string, for instance: I need to find all documents whose subject field holds fewer than 20 chars. How would I do that in a query string?

/David
Re: Nodes restarting automatically
I've been checking syslog on all of the nodes and I found no mention of oom, process killed, out of memory or anything similar. Just in case, I ran these commands on the 3 nodes, and the problem persists:

    echo 0 > /proc/sys/vm/oom-kill
    echo 1 > /proc/sys/vm/overcommit_memory
    echo 100 > /proc/sys/vm/overcommit_ratio

On Thu, May 22, 2014 at 2:16 PM, Nikolas Everett nik9...@gmail.com wrote:

Like Mark said, check the OOM killer. It should log to syslog. It is evil.
Re: index paramerters
Hello,

I found some information which is not complete:

There is no "correct" number of actions to perform in a single bulk call. You should experiment with different settings to find the optimum size for your particular workload.

Every time you index a document, elasticsearch decides which primary shard is supposed to hold that document and indexes it there. Primary shards are not copies of the data, they are the data! Having multiple shards does help to take advantage of parallel processing on a single machine.

Another type of shard is the replica. The default is 1, meaning that every primary shard will be copied to another shard that will contain the same data. Replicas are used to increase search performance and for fail-over.

Regards,
Anass BENJELLOUN

On Thursday, May 22, 2014 at 09:26:35 UTC+2, anass benjelloun wrote:

Hello,

I need to index 100,000 documents of 1 MB each. This is my configuration of the ElasticSearch index:

    index: {
        type: doc,
        bulk_size: 100,
        number_of_shards : 5,
        number_of_replicas : 2
    }

I need to know the effect of each parameter.
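The "experiment with different bulk sizes" advice above can be sketched as a small chunking helper: split the documents into batches of a configurable size so that different bulk_size values can be timed against the cluster. This is generic Python written for illustration, not part of any elasticsearch client; the function name is made up.

```python
def chunked(docs, bulk_size):
    """Yield successive lists of at most bulk_size documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == bulk_size:
            yield batch
            batch = []
    if batch:  # flush any remainder smaller than bulk_size
        yield batch

# 10 documents with bulk_size=4 -> batches of 4, 4 and 2
batches = list(chunked(range(10), 4))
```

Each batch would then become one bulk request; timing the total run at bulk_size values like 100, 500 and 1000 is a straightforward way to find the optimum the first paragraph describes.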
Re: Number of characters in field
Well yes, I know that one; is this really the only/best way to do it? My application forwards an input field directly to a query string, and the user needs to be able to query something like this: tags:h1 AND subject:length<20 On Thursday, May 22, 2014 2:30:30 PM UTC+2, Dan Tuffery wrote: You could use a script filter: filtered : { query : { ... }, filter : { script : { script : doc['subject'].value.length() < 20 } } } Dan On Thursday, May 22, 2014 8:45:41 AM UTC+1, David Nielsen wrote: Hi. I am trying to find a way to express a character count filter in a query string, for instance: I need to find all documents whose subject field holds less than 20 chars. How would I do that in a query string? /David
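Dan's script-filter suggestion can be sketched as a Python request body (assuming the MVEL-style script syntax quoted above; the helper name is hypothetical):

```python
# Sketch: build the filtered query with a script filter that keeps
# only documents whose field holds fewer than `max_len` characters.
def length_filter_query(field, max_len):
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "script": {
                        "script": "doc['%s'].value.length() < %d" % (field, max_len)
                    }
                }
            }
        }
    }

body = length_filter_query("subject", 20)
```

Note this runs the script against every candidate document, so it is a post-filter rather than something a query-string mini-language can express directly.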
Re: How to speed up indexing by using Python API
Hi, what method are you using in your Python script? Have you looked at the bulk and streaming_bulk helpers in elasticsearch-py? http://elasticsearch-py.readthedocs.org/en/master/helpers.html Hope this helps, Honza On Thu, May 22, 2014 at 11:09 AM, 潘飞 cnwe...@gmail.com wrote: Hi all: I am trying to index my logs using the Elasticsearch Python API, but I only get about 600 records/s indexing speed. On the same ES cluster, with the same data, logstash (redis - logstash - elasticsearch) can index data at about 3000 records/s. Any advice on how to speed up indexing with the Python API? Thanks very much.
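The usual reason a Python indexer is slow is one HTTP round-trip per document. A sketch of the action format the elasticsearch-py bulk helpers consume (index name, type, and fields below are illustrative):

```python
# Sketch: generate bulk actions lazily. A generator like this can be
# passed to elasticsearch-py's helpers.bulk(client, actions) or
# helpers.streaming_bulk(client, actions), which batch many documents
# into a single bulk request instead of indexing one at a time.
def log_actions(lines, index="logs", doc_type="line"):
    for i, line in enumerate(lines):
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": i,
            "_source": {"message": line},
        }

actions = list(log_actions(["error: disk full", "info: started"]))
```

Because it is a generator, the whole log file never needs to be held in memory while streaming_bulk works through it.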
Terms lookup mechanism with multiple lookup docs
Is it possible to use this feature with a lookup on multiple documents (multiple IDs) to supply the terms? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism I tried this: terms: { user: { index: users, type: user, id: [1,2,3,4], path: followers } } which was accepted as legitimate syntax, but only ONE of the ids was actually used within the terms filter (the last one in the list). So far the only way I can come up with is using a bool filter and stuffing a bunch of these terms lookups into the should clause, to accomplish the equivalent of a multiple-document lookup. Is there a more efficient way?
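The bool/should workaround described above can be sketched as a request-body builder (a hedged sketch; the index/type/path names come from the example in the question):

```python
# Sketch: one terms-lookup clause per lookup document id, OR-ed
# together in a bool filter's should clause.
def multi_lookup_filter(ids):
    return {"bool": {"should": [
        {"terms": {"user": {"index": "users",
                            "type": "user",
                            "id": i,
                            "path": "followers"}}}
        for i in ids
    ]}}

f = multi_lookup_filter([1, 2, 3, 4])
```

Each clause triggers its own GET of the lookup document, which is why this feels inefficient compared to a hypothetical single multi-id lookup.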
Is it wise to use ES for saving shopping Carts?
Hi guys, I'm working on an online shop. Currently we store the cart contents in a MySQL database, so we can very easily access the amount of a certain product and determine the reserved quantity. This is very important, as the amount in users' carts is reserved so that other users may not buy it. What do you think: is it wise to implement such a system in Elasticsearch? I'm mostly worried about the time between the add to cart (inserting a document) and being able to access the total value, due to the flushing delay. Thanks for your advice. Kind regards, Matthias
Re: Elasticsearch Facets + limit results
How is it possible that the count for term 2 is 3 in the first response, but 2 in the second response? From the docs: The size parameter defines how many top terms should be returned out of the overall terms list. By default, the node coordinating the search process will ask each shard to provide its own top size terms and, once all shards respond, it will reduce the results to the final list that is then sent back to the client. This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off, and it could even be that a term that should have been in the top size entries was not returned).
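The quoted shard-reduce behavior can be simulated in a few lines, and it reproduces exactly the symptom in the question (a term counted 3 globally but reported as 2). This is a toy model, not Elasticsearch's actual code:

```python
# Toy simulation: each shard reports only its own top-`size` terms,
# so a term that just misses one shard's top list loses those counts
# in the merged (reduced) result.
from collections import Counter

def facet(shards, size):
    merged = Counter()
    for shard in shards:
        merged.update(dict(Counter(shard).most_common(size)))
    return merged

shard1 = ["a", "a", "a", "b", "b", "c"]   # top-2 here: a, b (c dropped)
shard2 = ["c", "c", "a"]                  # top-2 here: c, a
approx = facet([shard1, shard2], size=2)
true = Counter(shard1 + shard2)
```

Here `true["c"]` is 3 but the reduced `approx["c"]` is 2, because shard1's single occurrence of c never made it to the coordinating node. Raising size (or, in later versions, shard_size) narrows the gap.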
Re: Setting up indices (mappings, settings etc.)
Hi, if anyone could comment on my code I would be very grateful. I'd like to know whether my way of setting up the index is as it is intended to be. Thanks!
Re: [HADOOP] Elasticsearch and hive
Hi, It looks like you have two tables: one that uses the JSON SerDe from Cloudera and another one using es-hadoop. You configured your es-hadoop table to consider the input as JSON, however it does not receive the proper format (as the exception indicates). See this [1] section of the documentation for more information. Let me restate: the data you send to Elasticsearch through es-hadoop does not need to be in JSON format. Unless it actually is JSON (as in, JSON files stored in HDFS), simply send whatever data you have in Hive to es-hadoop and it will do the JSON translation automatically, and typically much more efficiently. Again, the docs [2] discuss this in detail. Frankly, I'm not sure why you want to use the JSON SerDe. To conclude: if the data is in raw JSON format in HDFS, just feed it to es-hadoop with the input configured as JSON; if it's in a different format, load it into Hive as you typically would and then insert it into the es-hadoop table. Hope this helps, [1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html#writing-json-hive [2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html#hive On 5/22/14 12:13 PM, hanine haninne wrote: Hello again, I'm trying to get some tweets from HDFS and put them into Elasticsearch (head) in order to search them. I did what you said and I'm very thankful, but it starts reading and does not continue, and I don't know why!
here is what I did : hive -hiveconf hive.aux.jars.path=/ES_Hadoop/dist/elasticsearch-hadoop-2.0.0.RC1.jar Logging initialized using configuration in jar:file:/home/hduser/hadoop/hive-0.11.0/lib/hive-common-0.11.0.jar!/hive-log4j.properties Hive history file=/tmp/hduser/hive_job_log_hduser_8166@mouna-pc_201405220958_2036150447.txt hive ADD JAR /home/hduser/hadoop/hive-0.11.0/lib/hive-serdes-1.0-SNAPSHOT.jar; Added /home/hduser/hadoop/hive-0.11.0/lib/hive-serdes-1.0-SNAPSHOT.jar to class path Added resource: /home/hduser/hadoop/hive-0.11.0/lib/hive-serdes-1.0-SNAPSHOT.jar hive INSERT OVERWRITE TABLE tweetsES SELECT * FROM tweets; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201405220937_0005, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201405220937_0005 Kill Command = /home/hduser/hadoop/libexec/../bin/hadoop job -kill job_201405220937_0005 Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0 2014-05-22 10:01:39,184 Stage-0 map = 0%, reduce = 0% 2014-05-22 10:01:59,320 Stage-0 map = 100%, reduce = 100% Ended Job = job_201405220937_0005 with errors Error during job, obtaining debugging information... 
Job Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201405220937_0005 Examining task ID: task_201405220937_0005_m_02 (and more) from job job_201405220937_0005 Task with the most failures(4): - Task ID: task_201405220937_0005_m_00 URL: http://master:50030/taskdetails.jsp?jobid=job_201405220937_0005tipid=task_201405220937_0005_m_00 - Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:467649114269028352,created_at:Sat May 17 12:53:32 + 2014,source:a href=\http://www.apple.com\; rel=\nofollow\iOS/a,favorited:false,retweet_count:0,retweeted_status:null,entities:{urls:[{expanded_url:https://www.google.co.uk/search?q=viera+lifting+fa+cup+for+arsenalclient=safarihl=en-gbsource=lnmstbm=ischsa=Xei=alp3U4LrK82UOufRgaAKved=0CAYQ_AUoAQbiw=320bih=460#facrc=_imgrc=-QK5IhwQT8AS8M%253A%3B2yfUWG7-cngS-M%3Bhttp%253A%252F%252Ffarm4.static.flickr.com%252F3777%252F10838930556_618b87d7cc_m.jpg%3Bhttp%253A%252F%252Fwhotalking.com%252Fflickr%252F2-0%252BChelsea%3B1024%3B669}],user_mentions:[{screen_name:jameswilliams46,name:James Williams}],hashtags:[]},text:@jameswilliams46 https://t.co/CAmxdagAym,user:{screen_name:SSimpson7379,name:Stephen Simpson,friends_count:258,followers_count:266,statuses_count:2609,verified:false,utc_offset:3600,time_zone:London},in_reply_to_screen_name:jameswilliams46} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:467649114269028352,created_at:Sat May 17 12:53:32 + 2014,source:a href=\http://www.apple.com\;
Analyzers and char_filters o_0 creepy outputs
Hi! This is a sample setup, close to what I am working with: https://gist.github.com/anonymous/6e1457321a8ad78c6af8 As you can see, I am trying to remove the hyphens from all words, so that words like hand-made are indexed as handmade. The goal is to make a search for handmade find all documents containing hand-made, and vice versa. For some reason it doesn't work, though :( I have also attached 3 sample queries. The expected result would be for all of them to return the same result set. 1) Astonishingly, a search for Chemie-injenieur finds 2 results, but a search for Chemieingenieur finds none. This is pretty creepy to me, since the char_filter is supposed to strip the hyphens prior to tokenizing during indexing. 2) Another creepy fact is that if I specify the searchAnalyzer explicitly, I find no results from this document set (see query 3). 3) Moreover, the Analyze API shows that the search term Chemie-ingenieur gets translated to Chemieingenieur using this analyzer. 4) And the creepiest fact is that when I run these queries against the actual index data (800+ documents), I get 17 results for Chemie-ingenieur and 22 for Chemieingenieur, where NONE OF THEM OVERLAP. I.e. I have a total of 39 documents that should match either of the queries. Some of the documents that match Chemie-ingenieur actually don't contain the word with the hyphen, so I would expect those documents to be contained in both result sets, maybe with a different relevancy score. This is, however, not the case. Please help me get past this; I have been struggling with it for a full week already. I would be very grateful for an explanation as well as a solution, since the output is much different than what I expect from my understanding, which means that I don't really understand the system. P.S. Please focus on the actual problem and let's not discuss the mapping in detail.
The version I have pasted is quite different from what I started with initially, due to the trial-and-error approach I have been using for almost a week. Thanks sincerely, Georgi
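The symptoms above are consistent with the hyphen-stripping char_filter being applied at index time but not at search time (or vice versa). A toy model of a mapping char filter ("-" => "") makes the failure mode concrete; this is a sketch of the concept, not of the actual gist:

```python
# Toy model: a mapping char_filter must run in BOTH the index analyzer
# and the search analyzer. If it runs in only one, the produced terms
# no longer line up and the two result sets diverge.
def analyze(text, strip_hyphens):
    if strip_hyphens:
        text = text.replace("-", "")   # the char_filter step
    return text.lower().split()        # stand-in for the tokenizer

indexed = analyze("Hand-made goods", strip_hyphens=True)       # what's in the index
good_query = analyze("hand-made", strip_hyphens=True)          # filter applied
bad_query = analyze("hand-made", strip_hyphens=False)          # filter skipped
```

With the filter applied on both sides, hand-made and handmade both become the term handmade and match; skip it on one side and the term hand-made finds nothing. The Analyze API output mentioned in point 3 only proves what one named analyzer does, not which analyzer the query actually used.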
Wait for yellow status
While doing some tests, I thought I had uncovered a bug in the cluster-health/wait-for-yellow request. No matter what settings I tried, the request would always return immediately with no timeout. I then realized that the request is actually something like wait for AT LEAST yellow state. In other words, a GREEN state satisfies a wait for YELLOW state. The code in question: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/admin/cluster/health/TransportClusterHealthAction.java#L132 response.getStatus().value() <= request.waitForStatus().value() Obviously few people are interested in waiting for a cluster to go from GREEN to YELLOW, but I am one of those few. I would love to propose a true wait-for-yellow (and even red), but it would be difficult to come up with an appropriately named call, since waitForYellow is taken and would have to be renamed (breaking code in the process). I am assuming the current code is a feature and not a bug, in which case the documentation should reflect that thinking (willing to fix). But in the meantime, how can one be notified when a cluster changes its state from GREEN without constant polling? Cheers, Ivan
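The "at least yellow" semantics fall out of how the statuses are ordered. A sketch of the comparison in the linked code (the 0/1/2 byte values match Elasticsearch's ClusterHealthStatus ordering):

```python
# Sketch of the wait-for-status check: statuses are ordered
# GREEN(0) < YELLOW(1) < RED(2), and the wait is satisfied whenever
# the current status value is <= the requested one.
GREEN, YELLOW, RED = 0, 1, 2

def wait_satisfied(current, requested):
    return current <= requested

# waitForYellow on a GREEN cluster returns immediately:
green_satisfies_yellow = wait_satisfied(GREEN, YELLOW)
# ...while a RED cluster keeps the same wait blocked:
red_satisfies_yellow = wait_satisfied(RED, YELLOW)
```

So the call means "wait until the cluster is at least this healthy", which is why an exact-state wait (GREEN to YELLOW) cannot be expressed with it.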
Unassigned Shards Problem
I have five nodes: two master nodes, one balancer node, one workhorse node, and one coordinator node. I am shipping events from logstash, through redis, to Elasticsearch. At the moment, my cluster is RED. The shards are created but no index is created. I used to get an index like logstash-2014.05.22, but not anymore. When I delete all my data, cluster health goes GREEN. However, as soon as data is sent from logstash - redis - elasticsearch, my cluster health goes RED and I end up with unassigned shards. In /var/log/elasticsearch/logstash.log on my master, I see this in the log: [2014-05-22 12:03:20,599][INFO ][cluster.metadata ] [Bora] [logstash-2014.05.22] creating index, cause [auto(bulk api)], shards [5]/[1], mappings [_default_] On my master, this is the configuration:
cluster:
  name: logstash
  routing:
    allocation:
      awareness:
        attributes: rack
node:
  data: true
  master: true
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "logstash",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10
}
Is there an incorrect setting? I also installed ElasticHQ; it tells me the same information. Brian
Re: how to get only aggregation values from elasticsearch
You can set the size to 0. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html You will still get back the search metadata though. -- Ivan On Thu, May 22, 2014 at 4:46 AM, Subhadip Bagui i.ba...@gmail.com wrote: Hi, I want to get the average value of the MEMORY field from my ES documents. Below is the query I'm using for that. Here I'm getting the aggregation along with the hits JSON as well. Is there any way to get the aggregation result only? Please suggest.
POST /virtualmachines/_search
{
  "query" : {
    "filtered" : {
      "query" : { "match" : { "CLOUD_TYPE" : "CLOUDSTACK" } },
      "filter" : {
        "range" : {
          "NODE_CREATE_TIME" : {
            "from" : "2014-05-22 14:11:35",
            "to" : "2014-05-22 14:33:35"
          }
        }
      }
    }
  },
  "aggs" : {
    "memory_avg" : { "avg" : { "field" : "MEMORY" } }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "virtualmachines",
        "_type": "nodes",
        "_id": "102",
        "_score": 1,
        "_source": {
          "NODE_ID": 12235,
          "CLOUD_TYPE": "CLOUDSTACK",
          "NODE_GROUP_NAME": "JBOSS",
          "NODE_CPU": "4GHZ",
          "NODE_HOSTNAME": "cloud.aricent.com",
          "NODE_NAME": "cloudstack-node1",
          "NODE_PRIVATE_IP_ADDRESS": "10.123.124.125",
          "NODE_PUBLIC_IP_ADDRESS": "125.31.108.72",
          "NODE_INSTANCE_ID": "cloudstack112",
          "NODE_STATUS": "ACTIVE",
          "NODE_CATEGORY_ID": 13,
          "NODE_CREATE_TIME": "2014-05-22 14:23:04",
          "CPU_SPEED": 500,
          "MEMORY": 512,
          "CPU_USED": "0.03%"
        }
      }
    ]
  },
  "aggregations": {
    "memory_avg": { "value": 512 }
  }
}
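Ivan's suggestion amounts to adding "size": 0 at the top level of the request body, so the hits array comes back empty and only the metadata and aggregations remain. A sketch (the helper name is illustrative):

```python
# Sketch: wrap any query + aggs into an aggregations-only request
# body by setting "size": 0, so no hit documents are returned.
def aggs_only(query, aggs):
    return {"size": 0, "query": query, "aggs": aggs}

body = aggs_only(
    {"match": {"CLOUD_TYPE": "CLOUDSTACK"}},
    {"memory_avg": {"avg": {"field": "MEMORY"}}},
)
```

The response still includes took, _shards, and hits.total, but hits.hits is empty, which also saves the cost of fetching _source for each match.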
Re: Trigram-accelerated regex searches
Martijn took a swing at it just now. He eliminated any scoring-based slowdown, like so (constant_score_filter)…
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "query": { "match_phrase": { "content_trg": "Children" } } },
          { "query": { "match_phrase": { "content_trg": "Next" } } },
          { "query": { "wildcard": { "content": {
              "wildcard": "*Children*Next*",
              "rewrite": "constant_score_filter" } } } }
        ]
      }
    }
  }
}'
…but it didn't make any difference. Somehow, the `and` pipeline isn't behaving as we expect. Since ES can't provide any more detailed timing output, I guess the next step is to go look at the source code for the `and` filter and the wildcard query and see what's what. I think we'd both be fascinated to know what's going on, if anyone has anything to add.
Re: Trigram-accelerated regex searches
On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote: I'm trying to move Mozilla's source code search engine (dxr.mozilla.org) from a custom-written SQLite trigram index to ES. In the current production incarnation, we support fast regex (and, by extension, wildcard) searches by extracting trigrams from the search pattern and paring down the documents to those containing said trigrams. This is definitely a great approach for a database, but it won't work exactly the same way for an inverted index because the data structure is totally different. In an inverted index, queries like wildcards are slow: they must iterate over and match all terms in the document collection, then intersect those postings with the rest of your query. Because it's inverted, it works backwards from what you expect, and that's why adding additional intersections like 'AND' doesn't speed anything up: they haven't happened yet. N-grams can speed up partial matching in general, but the methods to accomplish this are different: usually the best way is to think about analyzing the data in such a way that the queries you need are as basic as possible. The first question is whether you really need partial matching at all. I don't have much knowledge of your use case, but just going from your example, I would look at wildcards like *Children*Next* and ask if instead I'd want to ensure my analyzer splits on case changes, and see if I could get what I need with a sloppy phrase query.
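Robert's point about wildcard cost can be sketched with a toy term dictionary: a pattern with a leading * gives the query nothing to seek to in the sorted dictionary, so every term must be tested before any postings are intersected (this is a conceptual model, not Lucene's implementation):

```python
# Toy model: a leading-wildcard query must scan the entire sorted
# term dictionary, because there is no prefix to seek to.
import fnmatch

term_dict = sorted(["children", "childrennext", "next", "node", "zebra"])

def wildcard_terms(pattern):
    checked = 0
    hits = []
    for term in term_dict:      # full scan of the dictionary
        checked += 1
        if fnmatch.fnmatch(term, pattern):
            hits.append(term)
    return hits, checked

hits, checked = wildcard_terms("*children*next*")
```

With a real index holding millions of unique terms, that full dictionary scan dominates, and no amount of AND-ing cheap clauses around it helps, because the scan happens before the intersection.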
Elasticsearch 1.2.0 and 1.1.2
Releases for some reason never get promoted on the mailing list, so here goes: http://www.elasticsearch.org/blog/elasticsearch-1-2-0-released/ The main reason I posted about the release is that I tested cross-version cluster compatibility with 1.1.1 and 1.2.0 nodes and everything seems to be working. As someone who had to endure full cluster restarts in the past, this news is welcome. I do know that rolling upgrades have been available since 1.0, but 1.1 was released so shortly after 1.0 that they seemed like the same minor version. Cheers, Ivan
Re: Trigram-accelerated regex searches
Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using the reverse token filter [1]. By doing this you turn those expensive leading wildcards into trailing wildcards, which should give you better performance. I think your query would look something like this:
{
  "query": {
    "constant_score": {
      "query": {
        "bool": {
          "should": [
            { "wildcard": { "content": "Children*Next*" } },
            { "wildcard": { "content_rev": "txeN*nerdlihC*" } }
          ]
        }
      }
    }
  }
}
Note that you will need to reverse your query string, as the wildcard query is not analyzed. [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-reverse-tokenfilter.html#analysis-reverse-tokenfilter Thanks, Matt Weber On Thu, May 22, 2014 at 11:09 AM, Erik Rose grinche...@gmail.com wrote: Martijn took a swing at it just now. He eliminated any scoring-based slowdown, like so (constant_score_filter)… curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{ query: { filtered: { query: { match_all: {} }, filter: { and: [ { query: { match_phrase: { content_trg: Children } } }, { query: { match_phrase: { content_trg: Next } } }, { query: { wildcard: { content: { wildcard: *Children*Next*, rewrite: constant_score_filter } } } } ] } } } }' …but it didn't make any difference. Somehow, the `and` pipeline isn't behaving as we expect. Since ES can't provide any more detailed timing output, I guess the next step is to go look at the source code for the `and` filter and the wildcard query and see what's what. I think we'd both be fascinated to know what's going on, if anyone has anything to add.
-- Thanks, Matt Weber
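Since the wildcard query bypasses analysis, the query string itself has to be reversed before it is run against the reverse-token-filtered field. A sketch of that reversal:

```python
# Sketch: reverse a wildcard pattern so it can target a field that
# was indexed through the reverse token filter. '*' and '?' are
# symmetric, so plain string reversal is enough for simple patterns.
def reverse_wildcard(pattern):
    return pattern[::-1]

rev = reverse_wildcard("*Children*Next*")
```

The leading wildcard of the original pattern becomes a trailing wildcard in the reversed one, which is exactly what makes the content_rev clause cheap: the reversed pattern now has a concrete prefix the term dictionary can seek to.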
Re: Trigram-accelerated regex searches
Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using the reverse token filter [1]. Good advice, typically, but notice I have wildcards on either side; reversing just makes the trailing wildcard expensive. :-)
Re: Trigram-accelerated regex searches
Aye, and then you can use edit distance on single words (fuzzy query) to cope with fast typers. On May 22, 2014 8:22 PM, Robert Muir robert.m...@elasticsearch.com wrote: On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote: I'm trying to move Mozilla's source code search engine (dxr.mozilla.org) from a custom-written SQLite trigram index to ES. In the current production incarnation, we support fast regex (and, by extension, wildcard) searches by extracting trigrams from the search pattern and paring down the documents to those containing said trigrams. This is definitely a great approach for a database, but it won't work exactly the same way for an inverted index because the data structure is totally different. In an inverted index, queries like wildcards are slow: they must iterate over and match all terms in the document collection, then intersect those postings with the rest of your query. Because it's inverted, it works backwards from what you expect, and that's why adding additional intersections like 'AND' doesn't speed anything up: they haven't happened yet. N-grams can speed up partial matching in general, but the methods to accomplish this are different: usually the best way is to think about analyzing the data in such a way that the queries you need are as basic as possible. The first question is whether you really need partial matching at all. I don't have much knowledge of your use case, but just going from your example, I would look at wildcards like *Children*Next* and ask if instead I'd want to ensure my analyzer splits on case changes, and see if I could get what I need with a sloppy phrase query.
Re: Trigram-accelerated regex searches
This is definitely a great approach for a database, but it won't work exactly the same way for an inverted index because the data structure is totally different. Ah, I was afraid of that. I had hoped, since the field is unanalyzed (and given the documentation's noted restriction that wildcard queries work only on unanalyzed fields), that the wildcard query would apply to the fetched field contents rather than crawling the whole index. But your theory certainly fits the timings I've been getting. Thanks!
Re: Trigram-accelerated regex searches
Alright, try this on for size. :-) Since the built-in regex-ish filters want to be all clever and index-based, why not use the JS script plugin, which is happy to run as a post-processing phase?
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "query": { "match_phrase": { "content_trg": "Children" } } },
          { "query": { "match_phrase": { "content_trg": "Next" } } },
          { "script": {
              "lang": "js",
              "script": "(new RegExp(pattern)).test(doc[\"content\"].value)",
              "params": { "pattern": "Children.*Next" }
          } }
        ]
      }
    }
  }
}'
That gets me through the whole 16M-doc corpus in 117ms. (Without the match_phrase queries it takes forever, 12s, so you can see the trigram acceleration working.) I am ecstatic. Some of you might note that the pattern doesn't begin or end with a wildcard; that's because RegExp.test() serves as a search rather than a match, so wildcards are effectively assumed. Cheers!
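The same two-phase idea can be sketched in Python: cheap phrase/trigram clauses narrow the candidate set, then a regex confirms each survivor. Like JavaScript's RegExp.test(), Python's re.search scans anywhere in the string, so no anchoring wildcards are needed (candidate strings below are made up for illustration):

```python
# Sketch of the accelerate-then-confirm pattern: trigram/phrase
# matching has already narrowed the corpus to `candidates`; the regex
# is only the final, per-document confirmation step.
import re

candidates = [
    "if (node.Children) return node.Next;",
    "NextChildren",                # both phrases present, wrong order
]
pattern = re.compile("Children.*Next")
survivors = [line for line in candidates if pattern.search(line)]
```

The regex is cheap here precisely because it only ever runs over the handful of trigram-approved candidates, never the whole corpus.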
Nested cardinality values way off with filter?
Hello, I'm trying to produce the distribution of documents that match vs don't match a query, and get the cardinality of a field for both sets. The idea is Users who did vs Users who did not. In reality I'm actually running another aggregation under did not (otherwise I'd just subtract one count from the total), but the query here illustrates the issue I'm having:

*Query*
aggs: {
  "total_distinct_count": { "cardinality": { "field": "UserId" } },
  "has_thing": {
    "filter": { "term": { "State": "thing" } },
    "aggs": { "distinct_count": { "cardinality": { "field": "UserId" } } }
  },
  "does_not_have_thing": {
    "filter": { "not": { "term": { "State": "thing" } } },
    "aggs": { "distinct_count": { "cardinality": { "field": "UserId" } } }
  }
}

*Response*
hits: { "total": 3309709, "max_score": 0, "hits": [] },
aggregations: {
  "total_distinct_count": { "value": 654556 },
  "does_not_have_thing": { "doc_count": 2575512, "distinct_count": { "value": 563371 } },
  "has_thing": { "doc_count": 734197, "distinct_count": { "value": 223128 } }
}

I would expect (aggregations.has_thing.distinct_count.value + aggregations.does_not_have_thing.distinct_count.value) to be close to aggregations.total_distinct_count.value, but in reality it's pretty far off (~+20%). Note that the summation of doc_count adds up exactly to hits.total, so I don't think this is an issue with the query, but I could be wrong. Any ideas what's up? Have I structured the query incorrectly, is this a bug, or is this just expected behavior? Some notes: - UserId's data type is a *long*, but the values only fill up integer space (510,539 to 418,346,844). - I'm running elasticsearch 1.1.0. - I've tried playing around with the precision threshold, but it doesn't appear to make a difference. Thanks in advance, Cheers Phil
Re: Unassigned Shards Problem
It does create an index, it says so in the log - [logstash-2014.05.22] creating index - it's just not assigning things. You've set routing.allocation.awareness.attributes, but have you set the node value, i.e. node.rack? See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 23 May 2014 02:22, Brian Wilkins bwilk...@gmail.com wrote: I have five nodes: two master nodes, one balancer node, one workhorse node, and one coordinator node. I am shipping events from logstash - redis - elasticsearch. At the moment, my cluster is RED. The shards are created but no index is created. I used to get an index like logstash-2014.05.22, but not anymore. If I delete all my data, cluster health goes GREEN. However, as soon as data is sent from logstash - redis - elasticsearch, my cluster health goes RED and I end up with unassigned shards. In my /var/log/elasticsearch/logstash.log on my master, I see this:

[2014-05-22 12:03:20,599][INFO ][cluster.metadata ] [Bora] [logstash-2014.05.22] creating index, cause [auto(bulk api)], shards [5]/[1], mappings [_default_]

On my master, this is the configuration:

cluster:
  name: logstash
  routing:
    allocation:
      awareness:
        attributes: rack
node:
  data: true
  master: true

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "logstash",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10
}

Is there an incorrect setting? I also installed ElasticHQ; it tells me the same information. Brian
Re: Is it wise to use ES for saving shopping Carts?
ES is eventually consistent, so it may not make sense if your latency requirements are very strict. If you can introduce a delay then it should work. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 22 May 2014 23:54, Matthias Feist matf...@gmail.com wrote: Hi Guys, I'm working on an online shop. Currently we are storing the cart's content in a MySQL database so we can very easily access the amount of a certain product and determine the reserved quantity. This is very important, as the amount in users' carts is reserved so that other users may not buy those items. What do you think: is it wise to implement such a system in elasticsearch? I'm mostly worried about the time between the add to cart (inserting a document) and being able to access the total value, due to the flushing delay. Thanks for your advice. Kind regards Matthias
Re: Elasticsearch 1.2.0 and 1.1.2
Plugin developers should watch out for changes in classes, e.g. XContentRestResponse (useful for REST actions) has gone, and there are some internal API changes in IndexShard methods, plus new deprecations (IndicesStatusAction is now RecoveryAction) - maybe more I have not recognized yet in my compile error logs... Beside this, I just managed to switch our developer stage from 1.1.0 to 1.2.0 with green status :) tomorrow, heavy load test. Jörg

On Thu, May 22, 2014 at 8:24 PM, Ivan Brusic i...@brusic.com wrote: Releases for some reason never get promoted on the mailing list, so here goes: http://www.elasticsearch.org/blog/elasticsearch-1-2-0-released/ The main reason I posted about the release was that I tested out cross-version cluster compatibility with 1.1.1 and 1.2.0 nodes and everything seems to be working. As someone who had to endure full cluster restarts in the past, this news is welcome. I do know that rolling upgrades have been available since 1.0, but 1.1 was released so shortly after 1.0 that they seemed like the same minor version. Cheers, Ivan
Re: Unassigned Shards Problem
Thanks for your reply. I set node.rack to rack_one on all the nodes as a test. In ElasticHQ, the indices panel on the right is empty. In my master's log, I see that the nodes are all identifying with rack_one. Any other clues? Thanks Brian

On Thursday, May 22, 2014 5:10:25 PM UTC-4, Mark Walkom wrote: It does create an index, it says so in the log - it's just not assigning things. You've set routing.allocation.awareness.attributes, but have you set the node value, i.e. node.rack? [...]
Re: Elasticsearch 1.2.0 and 1.1.2
Hurray! However they are still using the new per-version repository path scheme, so if you want 1.2 you will need to update your sources to http://packages.elasticsearch.org/elasticsearch/1.2/$OS Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 23 May 2014 04:24, Ivan Brusic i...@brusic.com wrote: Releases for some reason never get promoted on the mailing list, so here goes: http://www.elasticsearch.org/blog/elasticsearch-1-2-0-released/ [...]
Re: Unassigned Shards Problem
Went back and read the page again. So, for testing, I gave one master, the workhorse, and the balancer a rackid of rack_two; the other master shows a rackid of rack_one. All nodes were restarted. The shards are still unassigned, and the indices list in ElasticHQ is empty.
Re: Nested cardinality values way off with filter?
On Thu, May 22, 2014 at 10:34 PM, Phil Price philpr...@gmail.com wrote: I would expect (aggregations.has_thing.distinct_count.value + aggregations.does_not_have_thing.distinct_count.value) to be close to aggregations.total_distinct_count.value, but in reality it's pretty far off

I think this result is to be expected if you have some user IDs that match both criteria. E.g. if your index has these two documents:

{ "UserId": 42, "State": "thing" }
{ "UserId": 42, "State": "anything" }

then your aggregations would look like:

aggregations: {
  "total_distinct_count": { "value": 1 },
  "does_not_have_thing": { "doc_count": 1, "distinct_count": { "value": 1 } },
  "has_thing": { "doc_count": 1, "distinct_count": { "value": 1 } }
}

and the sum of the values of distinct_count per bucket is larger than the global value of distinct_count. -- Adrien Grand
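Adrien's point is easy to reproduce outside Elasticsearch with plain Python sets. This toy sketch (made-up user IDs, nothing ES-specific) shows why per-bucket distinct counts can sum to well over the global distinct count whenever users appear in both buckets:

```python
# Toy data: each record is (user_id, state). Users 1 and 4 appear with both states.
records = [
    (1, "thing"), (1, "other"),
    (2, "thing"),
    (3, "other"),
    (4, "thing"), (4, "other"),
]

# Global distinct count across all records.
total_distinct = len({u for u, _ in records})

# Distinct users per bucket, mirroring the has_thing / does_not_have_thing filters.
has_thing = {u for u, s in records if s == "thing"}
does_not = {u for u, s in records if s != "thing"}

print(total_distinct)                  # 4 distinct users overall
print(len(has_thing) + len(does_not))  # 6: users 1 and 4 are counted in both buckets
print(len(has_thing & does_not))       # 2: the overlap is exactly the gap
```

The bucket sum exceeds the global count by precisely the number of users in the intersection, which is why the ~+20% gap in the original query is expected behavior rather than a cardinality-precision bug.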
Re: time taken by each stage for a query
That is not easy, and the reason is that Elasticsearch and Solr work in quite different ways, e.g. when it comes to computing facets/aggregations: Solr first computes top hits and, if facets are required, loads the doc IDs of matching documents into a bit set that is used in a subsequent step to compute facets. On the other hand, Elasticsearch computes both top hits and facets/aggregations at the same time (in the same Collector, if you are familiar with Lucene terminology), which makes timings harder to track.

On Thu, May 22, 2014 at 6:25 AM, Srinivasan Ramaswamy ursva...@gmail.com wrote: In the past when I used solr, I could look at the time taken by each component to understand where most of the time was spent for a particular query. Similarly, I am trying to understand the breakdown of time spent for one particular query. Can anyone tell me how I can investigate the performance of specific queries in elasticsearch? Thanks Srini

-- Adrien Grand
Re: Is it wise to use ES for saving shopping Carts?
On Thu, May 22, 2014 at 3:54 PM, Matthias Feist matf...@gmail.com wrote: What do you think: is it wise to implement such a system in elasticsearch? I'm mostly worried about the time between the add to cart (inserting a document) and being able to access the total value due to the flushing delay.

For your information, this flushing delay only exists for search operations. We call it a near realtime operation because of this delay that you need to wait after indexing a document before being able to search for it (1 second by default). However, Elasticsearch doesn't only have a search API: it also has a GET API that is realtime[1] and basically allows you to use Elasticsearch as a key-value store.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-get.html#realtime

-- Adrien Grand
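The search-vs-GET visibility difference Adrien describes can be modeled with a toy in-memory store. This is purely an illustration of the near-realtime semantics, not real Elasticsearch client code (the class and method names are invented):

```python
class ToyNearRealtimeStore:
    """Toy model: GET by ID is realtime, while search only sees
    documents that were present at the last refresh."""

    def __init__(self):
        self._docs = {}        # visible to realtime GET immediately
        self._searchable = {}  # visible to search only after refresh()

    def index(self, doc_id, doc):
        self._docs[doc_id] = doc

    def get(self, doc_id):
        # Realtime: sees documents that have not been refreshed yet.
        return self._docs.get(doc_id)

    def refresh(self):
        # Models the periodic refresh (1 second by default in ES).
        self._searchable = dict(self._docs)

    def search(self, predicate):
        return [d for d in self._searchable.values() if predicate(d)]


store = ToyNearRealtimeStore()
store.index("cart-1", {"items": 3})
print(store.get("cart-1"))                     # found immediately via GET
print(store.search(lambda d: d["items"] > 0))  # empty until a refresh happens
store.refresh()
print(store.search(lambda d: d["items"] > 0))  # now visible to search
```

For a shopping cart, the key-value access pattern (fetch cart by ID) falls entirely on the realtime GET side, which is why the refresh delay need not be a blocker.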
Re: Nested cardinality values way off with filter?
Doh! You are correct, my bad. I assumed the filter was an exclusive per-user property, but in fact it is not. Thanks for getting back to me. Cheers Phil

On Thursday, May 22, 2014 4:36:02 PM UTC-7, Adrien Grand wrote: I think this result is to be expected if you have some user IDs that match both criteria. [...]
Re: why the special nested aggregation and query?
Although I would agree that being able to detect it automatically could make things simpler, I think that the fact that it is explicit is more flexible. For example, it can make sense to copy field values into the root document[1]. This can help speed up queries that don't need to know about the tree structure of your document. In that case you have two ways to search the same field name: - through the root document: faster but less flexible - through the nested document: more flexible but slower. The fact that nested queries are explicit allows you to choose the way that you want the field to be queried. For aggregations, I think it is also nice to make it explicit so that counts are not surprising: imagine that you have a document with properties stored as nested documents, each property having a name. If you run a terms aggregation on the property name from the root document, buckets will count how many root documents have this property name. On the other hand, if you run this terms aggregation through the nested field, it will count the number of _properties_ that have this name. Since each document can have several properties, counts might be much higher.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html

On Wed, May 21, 2014 at 9:55 PM, Jad jna...@gmail.com wrote: Hi All, I've been thinking about why nested fields need to be handled with a special nested query and aggregation type. Is it to handle the case where there are multiple nested levels, to be able to control whether a query involving two nested fields is within the same nested instance or across two nested instances? Thanks Jad.

-- Adrien Grand
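The root-vs-nested counting difference Adrien describes is easy to demonstrate with plain Python. This is a toy model of the two terms-aggregation behaviors (the documents and field names are invented; no Elasticsearch is involved):

```python
from collections import Counter

# Toy corpus: each root document has several nested "properties".
docs = [
    {"id": 1, "properties": [{"name": "color"}, {"name": "color"}, {"name": "size"}]},
    {"id": 2, "properties": [{"name": "color"}]},
]

# Terms aggregation from the root document: counts how many ROOT docs
# contain each property name at least once.
root_counts = Counter()
for doc in docs:
    for name in {p["name"] for p in doc["properties"]}:
        root_counts[name] += 1

# Terms aggregation through the nested path: counts the nested
# PROPERTY documents themselves, so repeats within one doc add up.
nested_counts = Counter(p["name"] for doc in docs for p in doc["properties"])

print(root_counts)    # color appears in 2 root docs, size in 1
print(nested_counts)  # 3 nested property docs are named color, 1 named size
```

Because doc 1 holds two "color" properties, the nested count for "color" (3) exceeds the root-document count (2) - exactly the surprise that explicit nested aggregations avoid.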
Re: _search/scroll?search_type=scan bugs/inconsistencies
scan is mainly useful as a way to export data from the index. In the context of a user interface, I think scroll would make more sense[1]. On a side note, paging improved significantly for scroll requests in Elasticsearch 1.2 (in terms of both speed and memory usage).

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

On Wed, May 21, 2014 at 8:28 PM, schmichael mich...@lytics.io wrote: On Wednesday, May 21, 2014 11:20:53 AM UTC-7, schmichael wrote: ... 2. search_type=scan doesn't seem to honor size=N It seems I missed this in the guide: When scanning, the size is applied to each shard, so you will get back a maximum of size * number_of_primary_shards documents in each batch. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html ...but that only seems to be the case with the scan search_type. Do I just have to divide the user's requested page size by my number of shards (5 at this point)?

-- Adrien Grand
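The per-shard size rule quoted from the guide, and the divide-by-shard-count workaround schmichael asks about, come down to simple arithmetic. A small sketch (the function names are my own, not an Elasticsearch API):

```python
def scan_batch_max(size, primary_shards):
    """Under search_type=scan, 'size' applies to each shard, so one
    batch can return up to size * primary_shards hits."""
    return size * primary_shards

def per_shard_size(page_size, primary_shards):
    """The workaround from the thread: divide the requested page size
    by the shard count, rounding up so a full batch can still fill
    the page (shards with fewer matches may return less)."""
    return -(-page_size // primary_shards)  # ceiling division

print(scan_batch_max(10, 5))   # a size=10 scan over 5 shards may return up to 50 hits
print(per_shard_size(100, 5))  # ask each of 5 shards for 20 to target ~100 per batch
print(per_shard_size(7, 5))    # 2: rounding up rather than truncating to 1
```

Note the batch is a maximum, not a guarantee: shards holding fewer matching documents contribute fewer hits, so batches can come back smaller than page_size.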
Re: indices.cache.filter.size limit not enforced?
For those who come to this thread through a search engine: Dan found the root cause of this issue - https://github.com/elasticsearch/elasticsearch/issues/6268

On Wed, May 21, 2014 at 8:03 PM, Daniel Low dang...@gmail.com wrote: Hello, Have there been any updates to this? We are using nodes with 256GB of RAM and heap sizes of 96GB and are also seeing this exact same issue where filter cache sizes grow above the limit. What I also discovered is that when I set the filter cache size to 31.9GB or lower the limit works fine, but anything above and it does not. Thanks, Daniel

On Friday, October 25, 2013 5:55:37 AM UTC-7, Benoît wrote: Hi! On Friday, October 25, 2013 2:06:58 PM UTC+2, Clinton Gormley wrote: I've never seen the filter cache limit not being enforced. If you can provide supporting data, i.e. the filter cache size from the nodes_stats plus the settings you had in place at the time, that would be helpful.

Output of _cluster/settings and _nodes/stats?all=true in the following gist: https://gist.github.com/benoit-intrw/7154161 The value is not really high right now, but 44.5gb is over 30% of the committed heap (127.8gb):

filter_cache: { "memory_size": "44.5gb", "memory_size_in_bytes": 47819287444, "evictions": 0 }

I support Ivan's comment about heap size: the bigger the heap, the longer GC takes. And using a heap above 32GB means the JVM can't use compressed pointers. So it is better to run multiple nodes on one machine, using shard awareness to ensure that you don't have copies of the same data on the same machine. OK, I will think about it, but the machines are in production... Regards Benoît

-- Adrien Grand
Re: _search/scroll?search_type=scan bugs/inconsistencies
Thanks for the response Adrien. I'm excited to upgrade to 1.2, but it seems strange to me that people refer to scan vs. scroll (you're not the first), since scan is simply a search_type that - AFAIK - can be used for any type of search (scroll or otherwise). It just seems strange that setting search_type=scan changes the behavior of _search/scroll so significantly (the size=N semantics differ, and no first page is returned).

On Thu, May 22, 2014 at 5:09 PM, Adrien Grand adrien.gr...@elasticsearch.com wrote: scan is mainly useful as a way to export data from the index. In the context of a user interface, I think scroll would make more sense. [...]
IncompatibleClassChangeError[Implementing class]
Hey! I'm running Elasticsearch 1.1.1 on Ubuntu with Java 7:

java version 1.7.0_55
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

It works perfectly. But when I try to upgrade to 1.2.0, Elasticsearch won't start:

[2014-05-23 02:32:53,812][INFO ][node ] [Larry Bodine] version[1.2.0], pid[43], build[c82387f/2014-05-22T12:49:13Z]
[2014-05-23 02:32:53,813][INFO ][node ] [Larry Bodine] initializing ...
[2014-05-23 02:32:53,853][INFO ][plugins ] [Larry Bodine] loaded [cloud-aws, jetty, river-couchdb, lang-javascript], sites [bigdesk]
[2014-05-23 02:32:56,655][INFO ][node ] [Larry Bodine] initialized
[2014-05-23 02:32:56,655][INFO ][node ] [Larry Bodine] starting ...
[2014-05-23 02:32:56,800][INFO ][transport] [Larry Bodine] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.17.0.2:9300]}
[2014-05-23 02:32:59,846][INFO ][cluster.service ] [Larry Bodine] new_master [Larry Bodine][QBC3No2hQYKosA-cz9Pfog][7c414f6cd187][inet[/172.17.0.2:9300]], reason: zen-disco-join (elected_as_master)
[2014-05-23 02:32:59,895][INFO ][discovery] [Larry Bodine] es-cluster/QBC3No2hQYKosA-cz9Pfog
[2014-05-23 02:32:59,939][INFO ][gateway ] [Larry Bodine] recovered [0] indices into cluster_state

{1.2.0}: Startup Failed ... - IncompatibleClassChangeError[Implementing class]

Any idea what's happening?
Reverse River?
I would like to have a river in reverse. Every time a document is inserted or modified, I would like to push it into another destination, like a database. Ideally this would be async, or maybe even in batches. Has anybody done anything like this before?
Re: Reverse River?
Some relevant comments: https://github.com/elasticsearch/elasticsearch/issues/1242

-- Ivan

On Thu, May 22, 2014 at 8:45 PM, Tim Uckun timuc...@gmail.com wrote:

I would like to have a river in reverse. Every time a document is inserted or modified I would like to push that into another destination like a database. Ideally this would be async or maybe even in batches. Has anybody done anything like this before?
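Since Elasticsearch has no built-in "output river", a common workaround is to do the fan-out in the application layer: alongside every index/update into Elasticsearch, also queue the document for the secondary store and flush in batches. A minimal sketch of such a batcher, assuming the application controls all writes (every name here is hypothetical, not an Elasticsearch API):

```python
import threading

class BatchForwarder:
    """Collects changed documents and flushes them to a secondary
    destination (e.g. a database) in batches, off the indexing path."""

    def __init__(self, sink, batch_size=100):
        self.sink = sink            # callable that persists a list of docs
        self.batch_size = batch_size
        self._buf = []
        self._lock = threading.Lock()

    def on_index(self, doc):
        """Call this alongside every index/update into Elasticsearch."""
        with self._lock:
            self._buf.append(doc)
            if len(self._buf) >= self.batch_size:
                self._flush_locked()

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        if self._buf:
            self.sink(self._buf)
            self._buf = []

# Usage: a plain list stands in for the database write.
batches = []
fwd = BatchForwarder(batches.append, batch_size=2)
for doc in ({"id": 1}, {"id": 2}, {"id": 3}):
    fwd.on_index(doc)
fwd.flush()
print(batches)   # [[{'id': 1}, {'id': 2}], [{'id': 3}]]
```

The catch, as the linked issue discusses, is that this only sees writes that go through your application; changes made directly against the cluster by other clients would be missed.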
Elasticsearch: high thread count, 100% utilization of non-heap memory
Hi Team,

We are experiencing an issue with high usage of non-heap memory and a high thread count. Mostly we see the GC process running. We have been watching threads with BigDesk for the past two days; thread counts keep peaking and we are not sure why. Two days ago we ran the optimizer manually and restarted the servers, but threads still show a peak. Can you please suggest what the issue might be, and whether there is a way to debug this kind of problem? Most of the time we get "node not reachable" and timeout errors.

[7.9gb], all_pools {[young] [3.4mb]-[64.4mb]/[266.2mb]}{[survivor] [33.2mb]-[33.2mb]/[33.2mb]}{[old] [2.1gb]-[2.2gb]/[7.6gb]}
[2014-05-22 20:42:53,966][WARN ][monitor.jvm ] [Echo] [gc][young][2864][37] duration [2s], collections [1]/[2.7s], total [2s]/[23.1s], memory [2.4gb]-[2.4gb]/[7.9gb], all_pools {[young] [104.9mb]-[3.6mb]/[266.2mb]}{[survivor] [33.2mb]-[33.2mb]/[33.2mb]}{[old] [2.2gb]-[2.4gb]/[7.6gb]}
[2014-05-22 20:43:09,921][INFO ][monitor.jvm ] [Echo] [gc][young][2865][46] duration [6.6s], collections [9]/[15.9s], total [6.6s]/[29.7s], memory [2.4gb]-[2.6gb]/[7.9gb], all_pools {[young] [3.6mb]-[98mb]/[266.2mb]}{[survivor] [33.2mb]-[23.9kb]/[33.2mb]}{[old] [2.4gb]-[2.5gb]/[7.6gb]}

Below is our configuration:
- 4-node cluster, each node with 16 GB RAM
- Elasticsearch heap set to 10 GB on each node
- Total size of data indexed: 450 GB
- Each node has 350 GB of disk space

Thanks,
Srikanth.
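One thing worth checking in the configuration above: common Elasticsearch guidance is to give the JVM heap no more than about half of physical RAM (leaving the rest for the OS filesystem cache, which Lucene relies on heavily) and to stay below ~32 GB so compressed object pointers remain enabled. By that rule of thumb, a 10 GB heap on a 16 GB node is oversized. A quick sketch of the guideline (the function is illustrative, not an Elasticsearch setting):

```python
def suggested_heap_gb(ram_gb):
    """Rule of thumb: heap = min(ram / 2, 31 GB).
    Half the RAM stays free for the OS filesystem cache, and
    staying under ~32 GB keeps compressed oops enabled."""
    return min(ram_gb // 2, 31)

print(suggested_heap_gb(16))   # 8  (vs. the 10 GB in use above)
print(suggested_heap_gb(128))  # 31
```

This alone may not explain the thread spikes, but long young-GC pauses like the ones logged above are a typical symptom of heap pressure.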
Connect through proxy
Is it possible to connect with the TransportClient to an Elasticsearch cluster via a SOCKS proxy? If yes, how?