Re: Write a plugin to query and aggregate results from multiple shards

2014-09-16 Thread Sandeep Ramesh Khanzode
Appreciate the response as always. Please bear with my technical understanding of ES :) In the TransportSearchAction, the doExecute() delegates to one of the six different search types. It is inside the execute methods of those individual six actions, that they will look at the shards. Correct

HELP : using multiple but not all interfaces for elasticsearch

2014-09-16 Thread HansPeterSloot
Hi, I have 2 nodes, each with 2 network interfaces. One of the networks is public and the other is private. I want to use elasticsearch only on the private network and, for convenience, also on the loopback devices. I have tried multiple ways in the yml file: network.bind_host: [ 192.168.1.213 ,

Re: Is there any way to prevent ES from disclosing exception details in REST response?

2014-09-16 Thread joergpra...@gmail.com
You could just check for the response code 500, and you're done, no need to capture streams. Jörg On Tue, Sep 16, 2014 at 12:53 AM, Alex Roytman roytm...@gmail.com wrote: I guess I could but it would mean passing a response wrapper to capture output stream and then copy it to real request or

Re: HELP : using multiple but not all interfaces for elasticsearch

2014-09-16 Thread David Pilato
You cannot bind the same port to 2 IPs. This should work: network.host: 192.168.1.213 See details at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-network.html#modules-network HTH -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet |

Re: Highly variable query performance with ES 1.3.2 (filter + aggregations)

2014-09-16 Thread joergpra...@gmail.com
If you are sure the spikes are caused by the JVM, I recommend attaching a profiler to the JVM, then you can monitor the code. On JVM level, it is hard to trace queries, so maybe you want to test out bleeding edge? Here is a query profiler: https://github.com/elasticsearch/elasticsearch/pull/6699

Re: Highly variable query performance with ES 1.3.2 (filter + aggregations)

2014-09-16 Thread joergpra...@gmail.com
Just saw that the query profiler can not show what the shard execution times are, so maybe this is not a big help. Jörg On Tue, Sep 16, 2014 at 9:24 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: If you are sure the spikes are caused by the JVM, I recommend to attach a profiler to

configuring puppet-elasticsearch with hiera yaml

2014-09-16 Thread RM
Hello, I've picked up a great little utility called wirbelsturm (https://github.com/miguno/wirbelsturm). With it I've managed to automate the creation of Vagrant backed VMs for a large chunk of my infrastructure without much pain. Then I got to elasticsearch. I've tried a few variations of the

Re: Write a plugin to query and aggregate results from multiple shards

2014-09-16 Thread joergpra...@gmail.com
If you want to use the filter parser plugin - I think you mean https://github.com/lmenezes/elasticsearch-terms-fetch-filter-plugin - then why don't you simply extend the plugin and build a new plugin from that codebase? From what I understand, you somehow want to modify the search action core

ReceiveTimeoutTransportException in logs

2014-09-16 Thread Abhishek Aggarwal
I am connecting to a single instance of Elasticsearch server remotely via Transport client. In my web application, which makes use of Transport client, I am seeing the following messages in the logs: I have checked, my network connection is proper and ES server is up. But still getting these messages

Re: ReceiveTimeoutTransportException in logs

2014-09-16 Thread joergpra...@gmail.com
Maybe you use a network filter / firewall which is misconfigured - no connection is possible, everything seems to time out. You must open TCP and UDP on port 9300 on all the hosts of the cluster nodes if you use TransportClient. Also check if your network can operate regarding other nodes, if

Re: ReceiveTimeoutTransportException in logs

2014-09-16 Thread Abhishek Aggarwal
Thanks for the reply. I am facing this error intermittently. Transport Client works fine sometimes - so it rules out firewall or port related issues. I have only one ES node (version 1.1.1) - Firewall is not configured - TCP and UDP on port 9300 are open - sniff is disabled (I'm using default

Re: ReceiveTimeoutTransportException in logs

2014-09-16 Thread Mark Walkom
Can you manually test all of that using telnet? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 16 September 2014 20:09, Abhishek Aggarwal boyobo...@gmail.com wrote: Thanks for the reply. I am facing this error

Failed to running hive job with CDH 5.1.2 and ES-Hadoop 2.0.0

2014-09-16 Thread Joe,Yu
Hello list, I have 4 node ES cluster and 6 node CDH running in the lab. The Hive job is as below: hive job=== CREATE TABLE logs (type STRING, time STRING, ext STRING, ip STRING, req STRING, res INT, bytes INT, phpmem INT, agent STRING) ROW FORMAT DELIMITED FIELDS TERMINATED

Re: Failed to running hive job with CDH 5.1.2 and ES-Hadoop 2.0.0

2014-09-16 Thread Costin Leau
Hi, Upgrade to es-hadoop 2.0.1. The error is caused by the fact that you have nodes within the ES cluster without a HTTP/REST point. These are now properly excluded though note, it means they will not be used by es-hadoop. As an alternative, consider enabling HTTP on all your data nodes. On

Re: Filtered not working with has_parent in elasticsearch 1.3.0

2014-09-16 Thread Roopendra Vishwakarma
Any suggestions? On Monday, 15 September 2014 23:31:26 UTC+5:30, Roopendra Vishwakarma wrote: In elasticsearch 1.3.0 *filtered* is not working with *has_parent*. In *elasticsearch 0.90.5* it's working fine. I am using the query below. In this query I need to add filtered inside

Re: Filtered not working with has_parent in elasticsearch 1.3.0

2014-09-16 Thread Martijn v Groningen
You need to wrap the has_parent query in the query part of the filtered query: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#query-dsl-filtered-query I don't see how this query could have worked in 0.90.5, since the format is incorrect, but if
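A minimal sketch of the wrapping described in the reply above, assuming hypothetical type and field names (`blog`, `tag`, `status`) that do not come from the thread. It only builds the request body as a Python dict, in the 1.x query DSL shape: the `has_parent` query sits under the `query` key of `filtered`, next to the `filter`.

```python
import json


def filtered_has_parent(parent_type, parent_query, filter_clause):
    """Build a 1.x-style filtered query whose query part is a has_parent query."""
    return {
        "query": {
            "filtered": {
                # The has_parent query goes in the "query" part of "filtered".
                "query": {
                    "has_parent": {
                        "parent_type": parent_type,
                        "query": parent_query,
                    }
                },
                # The filter stays a sibling of "query" inside "filtered".
                "filter": filter_clause,
            }
        }
    }


# Hypothetical names, for illustration only.
body = filtered_has_parent(
    "blog",
    {"term": {"tag": "something"}},
    {"term": {"status": "published"}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into a search request against a test index to check the shape before adapting it to the real mapping.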

Annotating results?

2014-09-16 Thread James Addison
I have two types stored in an index: locations and activities. An activity has a 'relation' to a location - i.e. an activity takes place at a location. Is it possible to get a location search result set that includes the count of activities at each location? Sort of like annotating each

Sorting search results

2014-09-16 Thread Matej Zerovnik
Hello! I'm trying to create a query that would return the last (sorted by timestamp) 10 hits. I'm using logstash to parse and index my log files... I tried 2 different queries: { query : { filtered : { query: {match : {user : abc}}, query: {match :

Fast upserts, inmemory, fast expiring data aggregations ?

2014-09-16 Thread ddorian43
Hi List, From the looks of it everything is possible but I still have some questions. My application consists of events being upserted that expire after 30 seconds and doing aggregations on those. I always filter on user_id which is also the routing_value. event_fields =

Generate keyword from multiple string

2014-09-16 Thread manish kumar
I have the following scenario: SHOP1 sells : apple laptop apple ipad apple phone SHOP2 sells : apple laptop SHOP3 sells : HP laptop I want to generate keywords for what each shop sells, such that apple ipad ipad apple should show only SHOP1 not SHOP2. How can I generate searchable keyword by

Strange issue with 2 seperate ELK servers

2014-09-16 Thread Kevin M
So I have 1 ELK server set up and working just fine; IP is 172.16.40.28. We wanted to build a second one to log different servers and for several reasons keep the data separate. So I built the new server and set up ELK again, all seems fine. The IP of the new server is 172.16.40.29. When I go to

Re: Failed to running hive job with CDH 5.1.2 and ES-Hadoop 2.0.0

2014-09-16 Thread Joe,Yu
On Tue, Sep 16, 2014 at 7:01 PM, Costin Leau costin.l...@gmail.com wrote: Hi, Upgrade to es-hadoop 2.0.1. The error is caused by the fact that you have nodes within the ES cluster without a HTTP/REST point. These are now properly excluded though note, it means they will not be used by

perform aggregation on the result of a subquery aggregation

2014-09-16 Thread Guillaume
Hi, I'm a newbie with Elasticsearch. I'm validating Elasticsearch regarding our needs. Let's say I want to monitor disk usage of my VMs. - vm1 and vm2 are in Platform PF_A, vm3 is in platform PF_B The mapping I declared (can be pasted in sense) PUT /example_201408/vm/_mapping { _timestamp

Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Hi, building some sort of internal tool to export data from Elasticsearch and I would like to offer csv or XML. Just wondering what options there are... Basically a user can login to a front end (No I cannot use what is out there, it's only a small portion of a larger tool within the

Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Also it has to be done on the back end so Java it is... On Tuesday, 16 September 2014 10:04:44 UTC-4, John Smith wrote: Hi, building some sort of internal tool to export data from Elasticsearch and I would like to offer csv or XML. Just wondering what options there are... Basically a

Re: Failed to running hive job with CDH 5.1.2 and ES-Hadoop 2.0.0

2014-09-16 Thread Costin Leau
In GitHub under issues [1] or in the release notes for the 2.0.1 release. Most likely, you are facing issue #210. [1] https://github.com/elasticsearch/elasticsearch-hadoop/issues?q=is%3Aissue+label%3Av2.0.1+is%3Aclosed On 9/16/14 4:52 PM, Joe,Yu wrote: On Tue, Sep 16, 2014 at 7:01 PM,

Re: ReceiveTimeoutTransportException in logs

2014-09-16 Thread Pawan Sharma
We are also facing this kind of issue in ES version 1.1.1. Some nodes get disconnected, and while analyzing the logs on a disconnected node we found a lot of connection timeout errors. Sometimes this issue gets solved by restarting the master node, but sometimes we may need to restart the whole

Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread David Pilato
You need to use the scan and scroll API for that. See  http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan This class could help you in Java: 

Re: Field Data Cache Size and Eviction

2014-09-16 Thread Philippe Laflamme
Sorry for bumping this, but I'm a little stumped here. We have some nodes that are evicting fielddata cache entries for seemingly no reason: 1) we've set indices.fielddata.cache.size to 10gb 2) the metrics from the node stats endpoint show that the indices.fielddata.memory_size_in_bytes never

Re: Received response for a request that has timed out Error

2014-09-16 Thread Pawan Sharma
So you need to restart that node. On Tue, Sep 16, 2014 at 12:46 AM, shriyansh jain shriyanshaj...@gmail.com wrote: Hi, I am getting the following error in the elasticsearch log file. I have a cluster of 2 elasticsearch nodes, and have a setup of ELK stack with redis as a buffer. Everything

Elasticsearch.Net, strange deserialization after .Index

2014-09-16 Thread Lasse Schou
Hi, (let me know if this is not the right place to post ElasticSearch.Net questions). I'm indexing a document of type User through ElasticSearch.Net with this command (key is a string guid): client.IndexUser(index, user, key, user); This invokes the serializer and stores the json in my ES

Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Yep, already doing that part actually... Was just wondering I guess the best way to deserialize from json to xml for instance. I suppose it's slightly off topic but what are some good json to xml converters. On Tuesday, 16 September 2014 10:23:05 UTC-4, David Pilato wrote: You need to use

Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread Costin Leau
When it comes to JSON, Jackson should be at the top of your list. It's an excellent library and it has plenty of support for XML [1] [1] https://github.com/FasterXML/jackson-dataformat-xml On 9/16/14 5:48 PM, John Smith wrote: Yep, already doing that part actually... Was just wondering I

[scripting score] quick way to see if a value is or is not in a field of a type array

2014-09-16 Thread NM
Hi guys, I have objects with 3 fields of type array containing a large amount of integers. These integers are mutually exclusive between fields = if an integer is in field1, it can't be in field2 or field3, and vice versa. For instance object_1: { field1: [1,4,5,8], field2:

Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Hadn't looked at Jackson for a while but it seems to do both XML and CSV (limited to json that represents tabular data) On Tuesday, 16 September 2014 10:48:58 UTC-4, John Smith wrote: Yep, already doing that part actually... Was just wondering I guess the best way to deserialize from json to

Re: [Version 1.3.2] Root type mapping not empty after parsing! Remaining fields:

2014-09-16 Thread Jack Park
A solution was found here http://stackoverflow.com/questions/22071198/adding-mapping-to-a-type-from-java-how-do-i-do-it On Mon, Sep 15, 2014 at 4:16 PM, Jack Park jackp...@topicquests.org wrote: I got this on 1.2.2 and found on the web that it was a bug. So, I upgraded to 1.3.2 and got the

Re: suggest without stemming

2014-09-16 Thread Julien Ricard
Hello, I have the exact same issue. I wonder how to get full strings instead of their stems which is not what I expect from a suggest query. Don't have any solution yet.

Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-09-16 Thread SAURAV PAUL
Oh. Sorry :-) On Mon, Sep 15, 2014 at 3:27 AM, Mark Walkom ma...@campaignmonitor.com wrote: You probably want to put this in your own thread :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 15 September

data from hive table

2014-09-16 Thread ibmuser1
Hi, my hadoop version is 1.1.1 and hive version is 0.9.0 (biginsights installation). I am trying to push data from existing hive table(s) into elasticsearch. My job fails with the following error. I copied hive script as well below the error. Not sure what I am doing wrong. Can you

OLAP analytics in Elasticsearch

2014-09-16 Thread Maaz
I am working on event analytics. I use hadoop to process the logs and store some results in MySQL. This no longer works due to scalability issues, as logs keep coming in daily. We need to show stats per year, month, week, day, hour along with filtering capability. Our samples can grow for

Re: Received response for a request that has timed out Error

2014-09-16 Thread shriyansh jain
Thank you! I got that working. On Tuesday, September 16, 2014 7:25:39 AM UTC-7, pawansharma2045 wrote: So you need to restart that node. On Tue, Sep 16, 2014 at 12:46 AM, shriyansh jain shriyan...@gmail.com wrote: Hi, I am getting the following error in the elasticsearch

Cluster troubles, Azure related?

2014-09-16 Thread Tim Heikell
We are prepping to launch our app into production and seem to be having some stability issues. We have a cluster of 4 VMs on Azure that all use the Azure plugin for discovery. Most of the time it works as expected, but sometimes it loses its mind. This morning for example, I made adjustments

Re: Cluster troubles, Azure related?

2014-09-16 Thread joergpra...@gmail.com
It looks like you did not configure minimum_master_nodes Jörg On Tue, Sep 16, 2014 at 8:00 PM, Tim Heikell tim.heik...@heapsylon.com wrote: We are prepping to launch our app into production and seem to be having some stability issues. We have a cluster of 4 VMs on Azure that all use the

Re: Cluster troubles, Azure related?

2014-09-16 Thread Tim Heikell
Thanks for the reply Jörg. I have discovery.zen.minimum_master_nodes=2. Should it be something different? On Tuesday, September 16, 2014 11:21:16 AM UTC-7, Jörg Prante wrote: It looks like you did not configure minimum_master_nodes Jörg On Tue, Sep 16, 2014 at 8:00 PM, Tim Heikell

Re: Cluster troubles, Azure related?

2014-09-16 Thread Tim Heikell
Ah, I just found the n/2+1 recommendation, so I expect I need to set it to 3. On Tuesday, September 16, 2014 11:30:38 AM UTC-7, Tim Heikell wrote: Thanks for the reply Jörg. I have discovery.zen.minimum_master_nodes=2. Should it be something different? On Tuesday, September 16, 2014
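The n/2+1 rule mentioned above can be sketched as a small calculation. This assumes that n counts master-eligible nodes and that the division is integer division, and that all 4 VMs in the thread are master-eligible; the function name is illustrative, not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size from the n/2+1 rule, using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Print the quorum for a few small cluster sizes.
for n in (1, 2, 3, 4, 5):
    print(n, minimum_master_nodes(n))
```

For the 4-node cluster discussed here this gives 3, which matches the value Tim arrives at; with a value of 2, two halves of a split cluster could each elect their own master.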

2 Exact Same Documents Being Ranked Differently

2014-09-16 Thread Randy Jensen
I'm trying to track down an issue where 2 simple documents I'm testing are being ranked quite a bit differently. For testing purposes, I'm only searching against one field, keywords. The only word in that field for both documents is jefferson. However, when I search for the word jefferson, one

Accuracy issue of aggregation results

2014-09-16 Thread Yifan Wang
It seems to be a common problem that the top N results returned from an aggregation query are inaccurate due to uneven distribution of matching documents on different shards, because ES will collect top N buckets from each shard no matter how many hits are actually on each shard. It is very

Re: Strange issue with 2 seperate ELK servers

2014-09-16 Thread Mark Walkom
By default ES uses a discovery method that allows any node with the same cluster name to join an existing node with the same cluster name, thereby forming one cluster. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html and you want to look at unicast

Just cannot seem to make progress with rsyslog and Logstash

2014-09-16 Thread Marty Hillman
I even bought the book and rebuilt my test environment servers from scratch, but I still have the same issues. On the central server, I have redis, logstash 1.4 and elasticsearch 1.3 installed - all from apt repositories. I verified that all services are started and I can curl results from

Re: Just cannot seem to make progress with rsyslog and Logstash

2014-09-16 Thread Mark Walkom
You should ask this over on the logstash list - https://groups.google.com/forum/?hl=en-GB#!forum/logstash-users :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 17 September 2014 06:04, Marty Hillman

Re: Just cannot seem to make progress with rsyslog and Logstash

2014-09-16 Thread Marty Hillman
Thanks Mark. Thought this was that list. :-) On Tuesday, September 16, 2014 3:08:45 PM UTC-5, Mark Walkom wrote: You should ask this over on the logstash list - https://groups.google.com/forum/?hl=en-GB#!forum/logstash-users :) Regards, Mark Walkom Infrastructure Engineer Campaign

Re: Accuracy issue of aggregation results

2014-09-16 Thread Matt Weber
Hi Yifan, Nothing dynamic, but you can increase the number of terms collected on each shard to increase the accuracy [1]. Might also want to play with the shard_min_doc_count value if you know certain shards have a low hit count and are throwing off the aggregations [2]. [1]
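To illustrate why collecting more terms per shard improves accuracy, here is a toy model in Python. It is not how Elasticsearch is implemented; it only mimics the idea that each shard reports its own top `shard_size` terms and the coordinating node merges them. The shard contents and the function name `top_terms` are made up for the example.

```python
from collections import Counter


def top_terms(shards, size, shard_size):
    """Merge each shard's top `shard_size` term counts, return the global top `size`."""
    merged = Counter()
    for shard in shards:
        # A shard only reports its own top `shard_size` terms; the rest are dropped.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Made-up per-shard term counts: "y" is second on both shards but first overall.
shards = [
    {"x": 5, "y": 4},
    {"z": 5, "y": 4},
]

print(top_terms(shards, size=1, shard_size=1))  # "y" is never reported by any shard
print(top_terms(shards, size=1, shard_size=2))  # "y" is now counted on both shards
```

In this toy data the true top term is "y" with 8 hits, but with one term per shard it is never seen by the merge step; raising the per-shard count recovers it, at the cost of more data sent from each shard.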

Sorting and Pagination

2014-09-16 Thread Matt Hughes
I have logstash indices that go back thirty days. I have logs in those indices from today. If I do a search with: size: 500, sort: [ { @timestamp: { order: desc, ignore_unmapped: true } } ] I don't get any logs from today. If I limit the search

slow execution of nested boolean filter

2014-09-16 Thread Kireet Reddy
I have a query with a nested boolean (boolean within a boolean) filter with a should clause that performs really terribly. But if I move the nested query up to top level, it performs as much as 50x faster. I am struggling to understand why this is the case. Here are the 2 forms:

Re: better places to store es.nodes and es.port in ES Hive integration?

2014-09-16 Thread Jinyuan Zhou
I have confirmed with both elasticsearch hive and elasticsearch mr: if both of the situations below occur, EsOutFormat produces an invalid header for bulk indexing. 1. es.resource contains data to be extracted from the document 2. es.mapping.id is set to be one of the fields in the document I looked at the code

Search performance - Scaling options Horizontally vs Vertically

2014-09-16 Thread Venkat Pavan Bellapu Konda
I have observed that Elasticsearch defaults the search thread pool to 3 x # of CPUs, and even if you increase this to a fixed # it does not really help as the threads start sharing the CPU cycles. Does this mean that to get the same performance results for more concurrent searches I either have to

Re: better places to store es.nodes and es.port in ES Hive integration?

2014-09-16 Thread Costin Leau
Please upgrade to version 2.0.1 On 9/17/14 1:18 AM, Jinyuan Zhou wrote: I have confirmed with both elasticsearch hive and elasticsearch mr: if both of the situations below occur, EsOutFormat produces an invalid header for bulk indexing. 1. es.resource contains data to be extracted from the document 2.