[HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-29 Thread Neil Andrassy
Hi list, I have an RDD with a field included that contains an ID that I'd like to become the parent document when I execute saveToEs (all authored in scala). Something like this... { units_sold: 100, unit_price: 8.99, revenue: 899, parentId: maplin\\staging(L28AF) //i.e. it has
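A note on the escaping involved, as a minimal Python sketch rather than anything es-hadoop does internally: a single literal backslash in a value has to appear doubled in JSON text, and it should round-trip unchanged. The sample ID is taken from the message above, on the assumption that it contains exactly one backslash; the helper name `to_json_line` is made up for illustration.

```python
import json

def to_json_line(doc):
    """Serialize a document to compact JSON text."""
    return json.dumps(doc, separators=(",", ":"))

# Assumed: the real ID holds ONE backslash (written as two in Python source).
doc = {"units_sold": 100, "parentId": "maplin\\staging(L28AF)"}
encoded = to_json_line(doc)

# The JSON text shows the backslash doubled.
print(encoded)
# Decoding restores the original single-backslash value.
print(json.loads(encoded)["parentId"] == doc["parentId"])
```

If a value arrives in the index with two backslashes, one possible cause is that an already-escaped string was escaped a second time somewhere along the way.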

Re: ES with Hadoop

2015-01-29 Thread Costin Leau
I'm not sure whether you have one or multiple questions but it's perfectly fine to use ES for both storage and search. You can use HDFS as a snapshot/backup store to further improve the resilience of your system. Millions of documents is not an issue for ES. On 1/29/15 4:29 PM, Manoj Singh

any chance to get rid of this query_string?

2015-01-29 Thread menschguenther
Hi all, I need to write a search query which collects documents from 3 types in my index. Basically I would use a multi_match like { multi_match : { query:SearchQuery, fields: [ account.name, group.title, post.content ] } } but the result of this query needs to be filtered

Aggregation of count of terms (possibly...)

2015-01-29 Thread 'Clive Lawrence' via elasticsearch
Hi all, This is my first post as I'm relatively new to ElasticSearch, Logstash, Kibana etc. and I'm really enjoying the challenge of learning it all and applying it! I'm reasonably familiar with basic aggregations now, but I'm trying to produce a particular report from an index and I would

Elasticsearch on Yarn

2015-01-29 Thread Ramdev Wudali
Hi: The online documentation for Elasticsearch on YARN (version 2.1.0-Beta) indicates that ... Elasticsearch on YARN is a separate, stand-alone, self-contained CLI (command-line interface)... Does this mean that this instance of Elasticsearch will only be accessible via CLI ? (curl commands

Connect to standalone node using TransportClient

2015-01-29 Thread Abid Hussain
Hi all, I'm running a standalone node (by using *node.local: true* in elasticsearch.yml) and want to connect to this node via TransportClient which fails. Connecting to the node via Sense succeeds. I didn't change the cluster.name property in elasticsearch.yml. My Code is: Client client =

Re: Importing Large Amounts of Data to Production Indices

2015-01-29 Thread webish
Hi Mark, Right now. 28 GB across two indices 5 shards 1 replica per index on 3 AWS large servers. Frequently 1-10 million records or more get imported. During this time all ES nodes hit a CPU usage of over 75%. We want to break the index down and add routing at some point. Refresh is

Re: Shield: node-to-node communication performance

2015-01-29 Thread David Pilato
Shay tweeted about this: https://twitter.com/kimchy/status/560124652472008704 Shay Banon (@kimchy): @m_hughes yes, it affects performance,

Re: Shield: node-to-node communication performance

2015-01-29 Thread Jin Huang
Well... this is hardly a satisfactory answer. Of course I expect a slowdown because encryption takes time. But how much, and what data does Shield encrypt (e.g. only the initial authentication step or every bit of communication)? For example, I would not be surprised if Shield does the simplest

Shield: node-to-node communication performance

2015-01-29 Thread Jin Huang
Hi, Can anyone shed some light on the impact of Shield on performance, assuming that secured communication is enabled for node to node communication? When Elasticsearch team says that node-to-node encryption is enabled, does it mean that every bit of data transported on port 9300 is encrypted?

config tokenizer for each type in config file

2015-01-29 Thread weiyiju1992
Hi there, I have two different types of data. For one type, I don't want it to be tokenized, so I wrote the config file elasticsearch.yml like this: index.analysis.analyzer.default: type: custom tokenizer: keyword filter: standard But for the other type of data, I want it to be tokenized by

Re: optimize elasticsearch / JVM

2015-01-29 Thread Arie
Just an idea. You could try running two ES instances as a cluster on one machine if there is no other option. On Wednesday, January 28, 2015 at 2:09:22 PM UTC+1, Oto Iashvili wrote: Hi I have a website for classified. For this I'm using elasticsearch, postgres and rails on a same ubuntu

Re: Importing Large Amounts of Data to Production Indices

2015-01-29 Thread Mark Walkom
You should be using the bulk API, that's what it exists for! On 29 January 2015 at 19:13, webish greg...@yoursports.com wrote: Hi Mark, Right now. 28 GB across two indices 5 shards 1 replica per index on 3 AWS large servers. Frequently 1-10 million records or more get imported. During
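For readers unfamiliar with the bulk format, here is a rough Python sketch of how a bulk request body is laid out: one action line followed by one source line per document, newline-delimited. The index name, type name and sample records are hypothetical, and `build_bulk_body` is an illustrative helper, not part of any client library.

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Return a newline-delimited bulk payload: an action line, then a source line, per doc."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical index, type and records, for illustration only.
body = build_bulk_body("records", "record", [("1", {"name": "a"}), ("2", {"name": "b"})])
print(body, end="")
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint. How many documents to put in each request is something to tune by measurement rather than a fixed rule.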

Re: not able to refine from o/p of query in logstash

2015-01-29 Thread raj@
Can anyone help me with this problem, please?

Re: optimize elasticsearch / JVM

2015-01-29 Thread Oto Iashvili
Why not? Could you tell me how to do that, and also explain why it would be better? Thanks a lot for your help. On Thursday, January 29, 2015 at 10:02:00 AM UTC+1, Arie wrote: Just an idea. You could try running two ES instances as a cluster on one machine if there is no other option. On

Re: Need review for my REST query (template modification)

2015-01-29 Thread Aldian
Don't tell me nobody here ever made such a simple request? On Thursday, January 22, 2015 at 11:57:26 AM UTC+1, Aldian wrote: Hi! I am using the usual ELK stack with the default template ( http://pastebin.com/DtYiazVr

Re: Need review for my REST query (template modification)

2015-01-29 Thread Magnus Bäck
On Thursday, January 22, 2015 at 11:57 CET, Aldian aldian...@gmail.com wrote: I am using the usual ELK stack with the default template (http://pastebin.com/DtYiazVr). In every log message, the date is stored in a field named log_date, which the date filter converts into a @timestamp. I

Re: logstash / kibana can't connect to instance

2015-01-29 Thread Magnus Bäck
On Thursday, January 29, 2015 at 06:51 CET, ma...@venusgeo.com wrote: Can anyone please look into this. This is a volunteer-based mailing list. If you want a 24-hour SLA there are paid options for that. On Wednesday, January 28, 2015 at 5:43:23 AM UTC-8, ma...@venusgeo.com wrote: I don't

Re: Connect to standalone node using TransportClient

2015-01-29 Thread David Pilato
What about not setting node.local: true? -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 29 Jan 2015 at 14:18, Abid Hussain huss...@novacom.mygbiz.com wrote: After doing some research, it seems to me that it is not possible to connect to a node configured as

Re: Connect to standalone node using TransportClient

2015-01-29 Thread Abid Hussain
Thanks for the help. As you can see in the original question above, I already tried setting node.local: true. This works on the server side, but I'm not able to connect to the node via TransportClient using the Java API. My requirements are: * run elasticsearch as single node * Use Java API to perform

multi_match query and match _all returns different set of results

2015-01-29 Thread Jan Prokeš
Hi there, I need to search in multiple fields whose names I do not know in advance, so I can't use the multi_match syntax. I found that the _all field aggregates all fields set to be included in _all in the mapping. Unfortunately it returns a different result set than multi_match. Here is complete

Issue removing index with index.blocks.read_only set to true

2015-01-29 Thread 'Nicolas Fraison' via elasticsearch
Hi, I was trying the settings to block data writes and also metadata writes to an index by applying this: curl -XPUT 'http://testserver:9200/test_index/_settings' -d '{ index.blocks.read_only : true }' which works fine, but now I would like to remove this index and I'm facing this issue:

Kibana 4 Not searching all fields

2015-01-29 Thread rrspyder
I am having an issue with queries in Kibana. It seems it is not searching all the fields. I have to specify id:1 or something similar to actually get any results. I am trying to figure out what configuration would cause this to happen? Would it have anything to do with the Elasticsearch

Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-29 Thread Konstantin Erman
Thank you for the good news! I'm a little swamped currently, but I will definitely give it a try when I get a minute. Just to make sure - disable Output cache for the website - where is it in IIS Management Console? On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote: Its

Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-29 Thread Cijo Thomas
I have been fighting with this for quite some time, Finally found the workaround. Let me know if it helps you! On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com wrote: Thank you for the good news! I'm a little swamped currently, but I will definitely give it a try when I get

ES with Hadoop

2015-01-29 Thread Manoj Singh
Hi, I have one question related to the performance of ES with Hadoop. Our architecture: 1) Use Hadoop for big data storage, as we have millions of documents. 2) Feed ES from Hadoop via the API. 3) Search will work through ES. Will this architecture have performance issues? Or do we simply use ES for

Re: pagination with range queries giving duplicate results

2015-01-29 Thread Amish Asthana
Hi David, we are aware of the scroll API, and are not using it as it will not scale. That is the very reason I was stressing the fact that there is no update/delete/create; with multiple queries all bets are off if any of these things happen. However with steady state (no change in data) I would

Re: Connect to standalone node using TransportClient

2015-01-29 Thread Abid Hussain
Thanks to both David and Jürgen. I used David's solution, which works well for now, and will keep Jürgen's proposal in mind for a production installation. Best regards, Abid On Thursday, January 29, 2015 at 09:05:15 UTC+1, Abid Hussain wrote: Hi all, I'm running a standalone node (by using

Re: Connect to standalone node using TransportClient

2015-01-29 Thread David Pilato
So disable multicast and you are done. See elasticsearch.yml file comments. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 29 Jan 2015 at 14:56, Abid Hussain huss...@novacom.mygbiz.com wrote: Sorry, I overlooked the "not" in your post ;-) Removing node.local:true

Re: Connect to standalone node using TransportClient

2015-01-29 Thread Abid Hussain
Sorry, I overlooked the "not" in your post ;-) Removing *node.local: true* works, in that I am then able to connect to the node via TransportClient. The reason for using *node.local: true* is that *I want to run several independent nodes in my network that do not communicate with each other.* ...?

Re: Connect to standalone node using TransportClient

2015-01-29 Thread Jürgen Wagner (DVT)
Hello Abid, you may bind the Elasticsearch network/transport interface to 127.0.0.1, prohibiting any connections from outside the local machine. This will effectively give you a fully-functional local node with transport connections enabled locally - not over the network from other machines.

Es-Hadoop Doc Value Access

2015-01-29 Thread Elliott Bradshaw
I'm curious about reaching deeper into the lucene internals with es-hadoop, in a similar way that the aggregations module works. While aggregations are amazing, there are cases where they aren't an ideal solution, mainly due to the inability to shuffle/repartition the data as it moves through

Re: [HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-29 Thread Costin Leau
What es-hadoop/spark are you using? Can you post snippet/gist on how you are calling saveToEs and what the Es-spark configuration looks like (does the RDD contain JSON or rich objects, etc..)? There are multiple ways to specify the parentId and in master (dev build) this should work no problem.

Changing TTL of all documents in an index?

2015-01-29 Thread Kevin Burton
What's the best way to change the TTL of all documents already written in an index? Can I just update the TTL or do I have to re-index everything? I was thinking that if I have to update the TTL often maybe I just write a manual garbage collector and do my own cleanup.

Re: partial update elaticsearch-perl

2015-01-29 Thread Clinton Gormley
Hi Jorge, the `doc` should be passed in the `body` parameter: $e->update( index => 'myindex', type => 'mytype', id => "mykey", body => { doc => { link => "http://www.nw-kicoso.com", sortierung => 5 } } ); On 8 December 2014 at 10:29, Jorge von Rudno

How does ES do parallel dispatch for searches?

2015-01-29 Thread Kevin Burton
I can't RTFM on this because I can't find the documentation. It looks like some of our queries are taking about 1 second per index shard per index. However, the drives still have low utilization. Around 10% ... so I'm trying to figure out how to improve performance. My hunch is that I

Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-29 Thread Konstantin Erman
Unfortunately I could not replicate your success :-( Let me show you what I did in case you notice any difference from your case. https://lh6.googleusercontent.com/-HzQRKhGl9ag/VMqfkWnSF8I/Ah0/SsXrJlQ2vW8/s1600/Output_Caching.png

Re: [HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-29 Thread Neil Andrassy
I'm using ES-Hadoop 1.2.0.Beta3 Spark variant with Scala 2.10.4 and Spark 1.1.0 Hadoop 2.4 (but without an actual Hadoop installation - I'm running on Windows). I'm working with a Map-based RDD rather than json. https://gist.github.com/andrassy/273179ed7cb01a38973d is a short example that

Re: How does ES do parallel dispatch for searches?

2015-01-29 Thread Mark Walkom
Each shard is queried in parallel. But if you don't have enough threads to query multiple shards at once, then it's not the strict definition of parallel as it has to context switch. On 30 January 2015 at 11:05, Kevin Burton burtona...@gmail.com wrote: Ha. I appreciate the feedback but this
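To make the point about threads concrete, here is a toy Python sketch of scatter-gather over shards using a bounded thread pool. It is only an illustration of the general pattern, not Elasticsearch's actual implementation; the "shards" are plain lists and `search_shard` is a stand-in for a real per-shard search.

```python
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, term):
    """Stand-in for a per-shard search: return the docs in one shard containing the term."""
    return [doc for doc in shard if term in doc]

def scatter_gather(shards, term, max_workers):
    """Submit one search per shard to a thread pool, then merge results in shard order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), shards))
    return [doc for part in partials for doc in part]

# Three toy shards; with max_workers below len(shards), some shards wait for a free thread.
shards = [["red fox", "blue jay"], ["red panda"], ["grey wolf", "red deer"]]
print(scatter_gather(shards, "red", max_workers=2))
```

With `max_workers` at least equal to the number of shards, every shard search can be in flight at once; with fewer workers, the remaining shards queue until a thread frees up, which is the situation described above.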

Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-29 Thread Cijo Thomas
Can you show your URL rewrite rules? Also, are you using Kibana 4 beta 3? On Thu, Jan 29, 2015 at 1:09 PM, Konstantin Erman kon...@gmail.com wrote: Unfortunately I could not replicate your success :-( Let me show you what I did in case you notice any difference from your case.

Re: How does ES do parallel dispatch for searches?

2015-01-29 Thread Kevin Burton
Ha. I appreciate the feedback but this doesn't answer my question. Does it query them sequentially or in parallel? Using parallel dispatch can dramatically improve performance so I'm trying to track down how this works. And I'm aware that the documentation was there, but I couldn't find

Re: How does ES do parallel dispatch for searches?

2015-01-29 Thread Kevin Burton
I assume you mean hardware threads? What I want to avoid is a configuration setting. I want all the shards to execute in parallel. Not totally concerned about the physical hardware mapping as in practice this will be a few hundred nanoseconds :-P On Thursday, January 29, 2015 at 4:09:15 PM

Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-29 Thread Konstantin Erman
Yes, Kibana 4 beta 3. And I have just one URL rewrite rule (pictured). Were you getting the same error when it was not working for you? https://lh3.googleusercontent.com/-oDiu_ncjJlA/VMrEJL-Qj_I/Aic/so2IvrgTQbY/s1600/RewriteRule.png On Thursday, January 29, 2015 at 3:31:56 PM UTC-8,

Re: [HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-29 Thread Costin Leau
I suggest trying master (the dev build - see the docs for more information[1]). You should not have to use the JSON format. By the way, one addition in master is that you can use case classes instead of Maps and es-spark will know how to serialize them. That plus having the metadata separated from

Index size and no of docs is unknown

2015-01-29 Thread bvnrwork
Hi, I am ingesting 6 million docs into Elasticsearch. After 2.8 million docs were ingested, head shows unknown for the size and number of docs for the index. Any idea? Is there any way I can use this index?

Re: How does ES do parallel dispatch for searches?

2015-01-29 Thread Mark Walkom
Then each is queried in parallel. On 30 January 2015 at 11:18, Kevin Burton burtona...@gmail.com wrote: I assume you mean hardware threads? What I want to avoid is a configuration setting. I want all the shards to execute in parallel. Not totally concerned about the physical hardware mapping

Re: Building a Cluster / Adding a node.

2015-01-29 Thread GWired
Got it going as a service... ugh.. the user I was using didn't have rights to run a service. Had to do it in the services.msc instead of service manager. On Thursday, January 29, 2015 at 10:09:46 PM UTC-5, GWired wrote: I've been messing with things on Host2 and it will no longer Start as a

Re: any chance to get rid of this query_string?

2015-01-29 Thread vineeth mohan
Hello , You need to use a bool query http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html or a filtered query http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html for this purpose. In bool , you can mix and
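As a rough sketch of the filtered-query option, the Python below builds a request body that wraps the multi_match from the original question in a `filtered` query (the Elasticsearch 1.x query DSL current at the time of this thread). The field names come from that question; the `term` filter on `status` is purely hypothetical, since the actual filter condition is not shown, and `filtered_multi_match` is an illustrative helper name.

```python
import json

def filtered_multi_match(text, fields, filter_clause):
    """Wrap a multi_match query in a filtered query (Elasticsearch 1.x query DSL)."""
    return {
        "query": {
            "filtered": {
                "query": {"multi_match": {"query": text, "fields": fields}},
                "filter": filter_clause,
            }
        }
    }

# Field names are from the original question; the term filter is a made-up example.
request = filtered_multi_match(
    "SearchQuery",
    ["account.name", "group.title", "post.content"],
    {"term": {"status": "active"}},
)
print(json.dumps(request, indent=2))
```

The printed JSON is what would be sent as the search request body; the filter part narrows the result set without affecting scoring.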

Best Bets implementation in Elastic Search

2015-01-29 Thread Nitesh Earkara
Hi All, I need to implement best bets using Elasticsearch, wherein a few results will be ranked and displayed at the top depending on the keyword searched by the user. Please let me know if such an implementation is possible using Elasticsearch. If yes, any link/white paper/information on this would

Re: top_hits and post_filter

2015-01-29 Thread Radim Novotny
Thanks, that did the trick :) Radim On Monday, January 26, 2015 at 10:02:24 UTC+1, David Pilato wrote: Can you use a filter agg? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html David On 26 Jan 2015 at 09:46, Radim

Re: how to measure the performance of ELK system?

2015-01-29 Thread Yiming Li
Hi Mark, Thanks for the reply. I will definitely try adding timestamp in the mapping, as discussed here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html. It seems that logstash will also generate a default @timestamp, if there is no

Re: Building a Cluster / Adding a node.

2015-01-29 Thread Mark Walkom
Just set discovery.zen.ping.unicast.hosts: [host1.mydomain.com, host2.mydomain.com] on both hosts, unless you are changing the port it will use the default. Also, cluster.name needs to be exactly the same on both hosts. On 30 January 2015 at 14:35, GWired garrettcjohn...@gmail.com wrote: Got