Re: ElasticSearch Version problem

2014-05-22 Thread Mukul Gupta
There seems to be a problem when indexing MySQL data in ES. These are the ES logs: [][DEBUG][NodeClient ] after bulk [18650] [succeeded=93255] [failed=0] [5ms] [][DEBUG][NodeClient ] before bulk [18654] of 5 items, 2407 bytes, 1 outstanding bulk requests

index parameters

2014-05-22 Thread anass benjelloun
Hello, I need to index 100,000 documents of 1 MB. This is my Elasticsearch index configuration: index: { type: doc, bulk_size: 100, number_of_shards: 5, number_of_replicas: 2 } I need to know what effect each parameter has. -- You received this message because you
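As a rough illustration of what the numbers in that configuration imply, here is a minimal sketch. It assumes that bulk_size means "documents per bulk request" and that each primary shard gets number_of_replicas extra copies; the helper names are made up for this example.

```python
import math


def shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies in the cluster: every primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


def bulk_requests(total_docs, bulk_size):
    """Number of bulk requests needed if each one carries bulk_size documents.

    Assumption: bulk_size counts documents, not bytes.
    """
    return math.ceil(total_docs / bulk_size)


if __name__ == "__main__":
    print(shard_copies(5, 2))
    print(bulk_requests(100000, 100))
```

With the values from the post, this gives 15 shard copies to place across the nodes and 1000 bulk requests for the whole data set.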

Creating Index not with field _id, but with my field/column name.

2014-05-22 Thread dharmendra pratap singh
Hi guys, can someone guide me on how to create the index with my own field name rather than with the _id column? Looking for some help from your end. Regards, Dharmendra

Number of characters in field

2014-05-22 Thread David Nielsen
Hi. I am trying to find a way to express a character-count filter in a query string. For instance, I need to find all documents whose subject field holds fewer than 20 characters. How would I do that in a query string? /David

Re: Aliases and percolators

2014-05-22 Thread Martijn v Groningen
Yes, that is correct. Martijn On 21 May 2014 02:34, Mark Dodwell m...@mkdynamic.co.uk wrote: Many thanks, that is a super clear answer. So, until that issue is addressed, am I correct in thinking I should do this when percolating an existing document: ``` curl

Store Elasticsearch Indices in a revision/audit-proof way

2014-05-22 Thread horst knete
Hey guys, in order to comply with German laws on logging, I have been asked to store the Elasticsearch indices in a revision/audit-proof way (indices cannot be edited or changed once stored). Are there any best practices or tips for doing this (maybe any plugins)? Thanks for your

Re: Store Elasticsearch Indices in a revision/audit-proof way

2014-05-22 Thread horst knete
Yeah, it looks like this would do the job, thanks for the response. On Thursday, 22 May 2014 10:40:19 UTC+2, Mark Walkom wrote: You can set indexes to readonly - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html Is that what you're after?
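A minimal sketch of the read-only suggestion, building the request for the update-settings API without sending it. The host and index name are placeholders invented for this example; index.blocks.read_only is the setting meant here, and whether a read-only block alone satisfies an audit requirement is not something this sketch can answer.

```python
import json


def read_only_settings(read_only=True):
    """Settings body that turns the index read-only block on or off."""
    return {"index": {"blocks.read_only": read_only}}


def settings_request(host, index, read_only=True):
    """Return (method, url, body) for the index update-settings call.

    host and index are hypothetical values supplied by the caller.
    """
    url = "%s/%s/_settings" % (host.rstrip("/"), index)
    return "PUT", url, json.dumps(read_only_settings(read_only))


if __name__ == "__main__":
    method, url, body = settings_request("http://localhost:9200", "logs-2014.05.22")
    print(method, url)
    print(body)
```

The same request with read_only=False would lift the block again, which is one reason a block on its own may not count as tamper-proof.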

Elasticsearch Facets + limit results

2014-05-22 Thread Martijn Dwars
I'm trying to construct the following SQL query in Elasticsearch: SELECT companyId, COUNT(*) c FROM visits GROUP BY companyId ORDER BY c DESC LIMIT 2 I came up with the following JSON body for the query: { facets: { company: { filter: { term: { entityType:

Re: Store Elasticsearch Indices in a revision/audit-proof way

2014-05-22 Thread Mark Walkom
Keep us up to date with your project, I'm sure there would be interest from others in a similar setup. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 22 May 2014 18:46, horst knete baduncl...@hotmail.de wrote:

Re: Store Elasticsearch Indices in a revision/audit-proof way

2014-05-22 Thread joergpra...@gmail.com
You have to add a facility to your middleware that can trace all authorized operations to your index (access, read, write, modify, delete) and you must write this to an append-only logfile with timestamps. If there is interest I could write such a plugin (assuming it can run in a trusted

How to speed up indexing by using Python API

2014-05-22 Thread 潘飞
Hi all, I am trying to index my logs using the Elasticsearch Python API, but I only get an indexing speed of about 600 records/s. On the same ES cluster, with the same data, Logstash (redis -> logstash -> elasticsearch) can index at about 3000 records/s. Any advice on how

Re: Store Elasticsearch Indices in a revision/audit-proof way

2014-05-22 Thread horst knete
Hi Jörg, thanks for your offer. I will contact you if there's a need for such a plugin in our company. I will also keep you up to date if there are any breaking changes in our project. On Thursday, 22 May 2014 10:55:44 UTC+2, Jörg Prante wrote: You have to add a facility to your middleware

Re: how manage insert and update sql (river) ?

2014-05-22 Thread joergpra...@gmail.com
If you use the column name _id, you can control the ID of the ES document you created by SQL. If you do not use _id, a random doc ID is generated. See the README at https://github.com/jprante/elasticsearch-river-jdbc Jörg On Thu, May 22, 2014 at 11:43 AM, Tanguy Bernard

issue with client.admin().cluster().prepareGetSnapshots(...)

2014-05-22 Thread Chetana
The call to prepareGetSnapshots(...) for a snapshot that does not exist throws SnapshotMissingException. I would instead expect it to return a response with an empty list of snapshots (getSnapshots()) or at least isExist=false. Is there any other way one can check the existence of a

File Descriptors

2014-05-22 Thread Shawn Ritchie
Hi guys, I'm kind of stuck with a fresh installation of an Elasticsearch cluster. Everything is installed and file descriptor limits are set, yet when I run curl -XGET 'http://10.0.8.62:9200/_nodes?os=true&process=true&pretty=true' > stats.txt I get process : { refresh_interval : 1000,

Re: File Descriptors

2014-05-22 Thread Mark Walkom
What OS and how did you install it? (Running as root is a really bad idea by the way!) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 22 May 2014 20:19, Shawn Ritchie xritc...@gmail.com wrote: Hi guys, Kind of

Re: File Descriptors

2014-05-22 Thread Shawn Ritchie
So this issue only occurs on server restart. If I restart the elasticsearch service, it loads the correct number of file descriptors. Regards Shawn On Thu, May 22, 2014 at 12:19 PM, Shawn Ritchie xritc...@gmail.com wrote: Hi guys, Kind of stuck with a fresh installation of an

Re: File Descriptors

2014-05-22 Thread Shawn Ritchie
CentOS 6.5 and Java 1.7u55 On Thu, May 22, 2014 at 12:28 PM, Shawn Ritchie xritc...@gmail.com wrote: So this issue only occurs on server restart. If I restart the elasticsearch service, it loads the correct number of file descriptors. Regards Shawn On Thu, May 22, 2014 at 12:19

Re: File Descriptors

2014-05-22 Thread Mark Walkom
Did you use the RPMs? Where are you setting the ulimit? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 22 May 2014 20:30, Shawn Ritchie xritc...@gmail.com wrote: CentOS 6.5 and Java 1.7u55 On Thu, May 22, 2014

Re: File Descriptors

2014-05-22 Thread Shawn Ritchie
No, I did not use the RPM; I used the .tar for the installation. My ulimit settings are, in /etc/security/limits.conf: * - nofile 65535 and in /etc/sysctl.conf: fs.file-max = 512000 On Thu, May 22, 2014 at 12:37 PM, Mark Walkom ma...@campaignmonitor.com wrote: Did you use the RPMs? Where are you
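One way to see which open-file limit a process actually inherited, as opposed to what the configuration files say, is to ask the operating system from inside that process. The sketch below uses Python's standard resource module (Unix only) and reports the limits of the Python process running it, not those of the Elasticsearch JVM; it is meant to be run from the same user and launch context (for example the same init script or login shell) for comparison.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # A value equal to resource.RLIM_INFINITY means "unlimited".
    print("soft nofile limit:", soft)
    print("hard nofile limit:", hard)
```

If the value differs between a login shell and a boot-time start, that may point at limits.conf being applied only to login sessions, which would fit the "only after a server restart" symptom. This is a possible explanation, not a confirmed diagnosis.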

how to get only aggregation values from elasticsearch

2014-05-22 Thread Subhadip Bagui
Hi, I want to get the average value of the MEMORY field from my ES documents. Below is the query I'm using for that. It returns the aggregation along with the hits JSON. Is there any way to get the aggregation result only? Please suggest. POST /virtualmachines/_search { query : {

Nodes restarting automatically

2014-05-22 Thread Jorge Ferrando
Hello, we have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and Elasticsearch v1.1.1. It has been running flawlessly, but since last week some of the nodes restart randomly and the cluster goes to red state, then yellow, then green, and it happens again in a loop (sometimes it even

Re: Nodes restarting automatically

2014-05-22 Thread Mark Walkom
How are you running the service, upstart, init or something else? ES shouldn't just restart on its own, this could be something else like the kernel's OOM killer. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 22

Re: Nodes restarting automatically

2014-05-22 Thread Jorge Ferrando
Elasticsearch nodes are launched through /etc/init.d/elasticsearch On Thu, May 22, 2014 at 2:13 PM, Mark Walkom ma...@campaignmonitor.com wrote: How are you running the service, upstart, init or something else? ES shouldn't just restart on its own, this could be something else like the

Re: Nodes restarting automatically

2014-05-22 Thread Nikolas Everett
Like Mark said, check the OOM killer. It should log to syslog. It is evil. Nik On Thu, May 22, 2014 at 2:14 PM, Jorge Ferrando jorfe...@gmail.com wrote: elasticsearch nodes are launched through /etc/init.d/elasticsearch On Thu, May 22, 2014 at 2:13 PM, Mark Walkom

Re: Number of characters in field

2014-05-22 Thread Dan Tuffery
You could use a script filter: filtered : { query : { ... }, filter : { script : { script : doc['subject'].value.length() < 20 } } } Dan On Thursday, May 22, 2014 8:45:41 AM UTC+1, David Nielsen wrote: Hi. I am trying to find a way to
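The script-filter idea can be written out as a complete request body. This is a sketch only: it assumes the comparison is "fewer than 20 characters" as asked in the original question, uses a match_all query where the reply has "...", and the function name is invented for the example.

```python
import json


def length_filter_query(field, max_chars, inner_query=None):
    """Build a filtered query that keeps documents whose field value is
    shorter than max_chars characters, using a script filter."""
    script = "doc['%s'].value.length() < %d" % (field, max_chars)
    return {
        "query": {
            "filtered": {
                # Placeholder for the "..." in the reply: match everything.
                "query": inner_query or {"match_all": {}},
                "filter": {"script": {"script": script}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(length_filter_query("subject", 20), indent=2))
```

Note that the script runs against the indexed value of the field, so on an analyzed field the length is that of a term rather than of the original text.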

Re: Nodes restarting automatically

2014-05-22 Thread Jorge Ferrando
I've checked syslog on all of the nodes and found no mention of OOM, a killed process, out of memory or anything similar... Just in case, I ran these commands on the 3 nodes and the problem persists: echo 0 > /proc/sys/vm/oom-kill echo 1 > /proc/sys/vm/overcommit_memory echo 100

Re: index parameters

2014-05-22 Thread anass benjelloun
Hello, I found some information, which is not complete: There is no “correct” number of actions to perform in a single bulk call. You should experiment with different settings to find the optimum size for your particular workload. Every time you index a document elasticsearch will decide

Re: Number of characters in field

2014-05-22 Thread David Nielsen
Well yes, I know that one, but is this really the only/best way to do it? My application forwards an input field directly to a query string, and the user needs to be able to query something like this: tags:h1 AND subject:length<20 On Thursday, May 22, 2014 2:30:30 PM UTC+2, Dan Tuffery wrote:

Re: How to speed up indexing by using Python API

2014-05-22 Thread Honza Král
Hi, what method are you using in your Python script? Have you looked at the bulk and streaming_bulk helpers in elasticsearch-py? http://elasticsearch-py.readthedocs.org/en/master/helpers.html Hope this helps, Honza On Thu, May 22, 2014 at 11:09 AM, 潘飞 cnwe...@gmail.com wrote: Hi all: Now ,
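A minimal sketch of preparing log lines for the bulk helpers mentioned above. It only builds the action dictionaries and groups them into batches, so it runs without a cluster; the index name, type name, and the "message" field are assumptions made up for this example.

```python
from itertools import islice


def make_actions(log_lines, index, doc_type):
    """Yield one bulk action per log line, in the dict shape the
    elasticsearch-py bulk helpers accept."""
    for line in log_lines:
        yield {
            "_index": index,
            "_type": doc_type,
            "_source": {"message": line},  # hypothetical document layout
        }


def chunks(iterable, size):
    """Group any iterable into lists of at most size items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    lines = ["first log line", "second log line", "third log line"]
    for batch in chunks(make_actions(lines, "logs", "line"), 2):
        print(len(batch), "actions in this batch")
```

With elasticsearch-py installed, the generator could be handed to helpers.bulk(client, make_actions(...)), which sends many documents per request instead of one request per document. That call is not exercised here.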

Terms lookup mechanism with multiple lookup docs

2014-05-22 Thread Valery Ayala
Is it possible to use this feature with a lookup on multiple documents (multiple IDs) to supply the terms? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism I tried this terms: { user: {

Is it wise to use ES for saving shopping Carts?

2014-05-22 Thread Matthias Feist
Hi guys, I'm working on an online shop. Currently we store the cart's content in a MySQL database so we can very easily access the amount of a certain product and determine the reserved quantity. This is very important, as the amount in the users' carts is reserved so other users may not buy

Re: Elasticsearch Facets + limit results

2014-05-22 Thread emeschitc
How is it possible that the count for term 2 is 3 in the first response, but 2 in the second response? From the docs: The size parameter defines how many top terms should be returned out of the overall terms list. By default, the node coordinating the search process will ask each shard to
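The effect described in the quoted documentation can be reproduced with a small simulation: each shard reports only its own top terms, and the coordinating node adds up what it receives. The shard contents below are invented numbers chosen to show a count of 3 versus 2 for term "2"; this is a toy model, not Elasticsearch's actual code.

```python
from collections import Counter

# Hypothetical per-shard term counts.
SHARDS = [
    {"1": 5, "2": 2, "3": 1},
    {"1": 4, "3": 3, "2": 1},
]


def shard_top_terms(shard_counts, shard_size):
    """What one shard reports: only its shard_size most frequent terms."""
    return Counter(shard_counts).most_common(shard_size)


def merged_counts(shards, shard_size):
    """What the coordinating node sees after summing the shard reports."""
    total = Counter()
    for shard in shards:
        for term, count in shard_top_terms(shard, shard_size):
            total[term] += count
    return total


if __name__ == "__main__":
    print(merged_counts(SHARDS, 3))
    print(merged_counts(SHARDS, 2))
```

When each shard reports three terms, term "2" is counted on both shards (2 + 1). When each shard reports only two, the second shard drops term "2" from its list, so the merged count misses that shard's contribution.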

Re: Setting up indices (mappings, settings etc.)

2014-05-22 Thread Peter Webber
Hi, if anyone could comment on my code I would be very grateful. I'd like to know whether my way of setting up the index is as intended. Thanks!

Re: [HADOOP] Elasticsearch and hive

2014-05-22 Thread Costin Leau
Hi, It looks like you have two tables - one that uses the JSONSerDe from cloudera and another one using es-hadoop. You configured your es-hadoop table to consider the input as json however it does not receive the proper format (as the exception indicates). See this [1] section of the

Analyzers and char_filters o_0 creepy outputs

2014-05-22 Thread georgi . mateev
Hi! This is a sample setup, close to what I am working with https://gist.github.com/anonymous/6e1457321a8ad78c6af8 As you can see, I am trying to remove the hyphens from all words, so that words like hand-made are indexed as handmade. The goal is to make a search for handmade find all

Wait for yellow status

2014-05-22 Thread Ivan Brusic
While doing some tests, I thought I uncovered a bug in the cluster-health/wait-for-yellow request. No matter what settings I tried, the request would always return immediately with no timeout. I then realized that the request is actually something like wait for AT LEAST yellow state. In other
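The "at least yellow" reading can be stated as a simple ordering of the three health states. This sketch only models the comparison described in the message; the function name is invented here.

```python
# Health states ordered from worst to best.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def satisfies(current, wait_for):
    """True if the current cluster status is at least as good as the
    status being waited for, so a wait would return immediately."""
    return STATUS_RANK[current] >= STATUS_RANK[wait_for]


if __name__ == "__main__":
    for current in ("red", "yellow", "green"):
        print(current, "satisfies wait-for-yellow:", satisfies(current, "yellow"))
```

Under this reading, a green cluster already satisfies a wait for yellow, which matches the observation that the request returned immediately.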

Unassigned Shards Problem

2014-05-22 Thread Brian Wilkins
I have five nodes : Two Master Nodes, One Balancer Node, One Workhorse Node, and One Coordinator Node. I am shipping events from logstash, redis, to elasticsearch. At the moment, my cluster is RED. The shards are created but no index is created. I used to get an index like logstash.2014-05-22,

Re: how to get only aggregation values from elasticsearch

2014-05-22 Thread Ivan Brusic
You can set the size to 0. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html You will still get back the search metadata though. -- Ivan On Thu, May 22, 2014 at 4:46 AM, Subhadip Bagui i.ba...@gmail.com wrote: Hi, I want to get the
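A sketch of the size-0 suggestion as a full search body: the hits list comes back empty while the aggregation is still computed. The aggregation name and the match_all query are placeholders chosen for this example; MEMORY is the field named in the original question.

```python
import json


def avg_only_body(field, agg_name="avg_value"):
    """Search body that asks for an average aggregation and no hits."""
    return {
        "size": 0,  # return no documents, only metadata and aggregations
        "query": {"match_all": {}},
        "aggs": {agg_name: {"avg": {"field": field}}},
    }


if __name__ == "__main__":
    print(json.dumps(avg_only_body("MEMORY", "avg_memory"), indent=2))
```

The response would still carry the search metadata (took, _shards, total hit count), as the reply points out.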

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
Martijn took a swing at it just now. He eliminated any scoring-based slowdown, like so (constant_score_filter)… curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{ query: { filtered: { query: { match_all: {}

Re: Trigram-accelerated regex searches

2014-05-22 Thread Robert Muir
On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote: I'm trying to move Mozilla's source code search engine (dxr.mozilla.org) from a custom-written SQLite trigram index to ES. In the current production incarnation, we support fast regex (and, by extension, wildcard) searches
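For readers unfamiliar with the trigram technique mentioned in the quoted message, here is a generic toy version: index every three-character substring of each line, use the trigrams of a literal that any match must contain to narrow the candidates, then run the real regex only on those candidates. This is a general illustration, not DXR's or Elasticsearch's implementation, and the caller has to supply the required literal by hand.

```python
import re
from collections import defaultdict


def trigrams(text):
    """All three-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(lines):
    """Map each trigram to the set of line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for gram in trigrams(line):
            index[gram].add(line_no)
    return index


def search(lines, index, literal, pattern):
    """Find lines matching pattern, assuming every match contains literal."""
    grams = trigrams(literal)
    if grams:
        # Only lines that contain every trigram of the literal can match.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Literal shorter than three characters: no narrowing possible.
        candidates = set(range(len(lines)))
    regex = re.compile(pattern)
    return sorted(n for n in candidates if regex.search(lines[n]))


if __name__ == "__main__":
    source = ["int main(void)", "return main_loop();", "void helper()"]
    idx = build_index(source)
    print(search(source, idx, "main", r"main\w*\("))
```

The trigram step can only produce false positives, never false negatives, as long as the literal really is required by the pattern; the final regex pass removes the false positives.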

Elasticsearch 1.2.0 and 1.1.2

2014-05-22 Thread Ivan Brusic
Releases for some reason never get promoted on the mailing list, so here goes: http://www.elasticsearch.org/blog/elasticsearch-1-2-0-released/ The main reason why I posted about the release was because I tested out cross-version cluster compatibility with 1.1.1 and 1.2.0 nodes and everything

Re: Trigram-accelerated regex searches

2014-05-22 Thread Matt Weber
Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using reverse token filter [1]. By doing this you turn those expensive leading wildcards into trailing wildcards which should give you better performance. I think your query
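The reversal trick can be shown on plain strings: if a copy of each token is stored reversed, a pattern with a leading wildcard becomes a prefix pattern on the reversed copy. The functions below are an illustration on Python strings only; the names are invented, and patterns with wildcards on both sides are rejected because reversal does not help there.

```python
def reversed_field(value):
    """What a reversed copy of a token would hold."""
    return value[::-1]


def leading_to_trailing(pattern):
    """Rewrite '*suffix' as the equivalent prefix pattern on reversed text.

    Only handles a single leading '*'.
    """
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("expected exactly one leading wildcard")
    return pattern[1:][::-1] + "*"


if __name__ == "__main__":
    rewritten = leading_to_trailing("*made")
    print(rewritten)
    print(reversed_field("handmade").startswith(rewritten[:-1]))
```

A prefix lookup is cheap because matching terms sit next to each other in a sorted term dictionary, whereas a leading wildcard forces a scan over all terms.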

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using reverse token filter [1]. Good advice, typically, but notice I have wildcards on either side. Reversing just makes the trailing wildcard expensive. :-)

Re: Trigram-accelerated regex searches

2014-05-22 Thread Itamar Syn-Hershko
Aye, and then you can use edit distance on single words (fuzzy query) to cope with fast typers On May 22, 2014 8:22 PM, Robert Muir robert.m...@elasticsearch.com wrote: On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote: I'm trying to move Mozilla's source code search engine

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
This is definitely a great approach for a database, but it won't work exactly the same way for an inverted index because the datastructure is totally different. Ah, I was afraid of that. I hoped, due to the field being unanalyzed (and the documentation's noted restriction that wildcard

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
Alright, try this on for size. :-) Since the built-in regex-ish filters want to be all clever and index-based, why not use the JS script plugin, which is happy to run as a post-processing phase? curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{ query: {

Nested cardinality values way off with filter?

2014-05-22 Thread Phil Price
Hello, I'm trying to produce the distribution of documents that match vs. don't match a query, and get the cardinality of a field for both sets. The idea is Users who did vs Users who did not. In reality I'm actually running another aggregation under did not (otherwise I'd just subtract

Re: Unassigned Shards Problem

2014-05-22 Thread Mark Walkom
It does create an index, it says so in the log - [logstash-2014.05.22] creating index - it's just not assigning things. You've set routing.allocation.awareness.attribute, but have you set the node value, i.e. node.rack? See

Re: Is it wise to use ES for saving shopping Carts?

2014-05-22 Thread Mark Walkom
ES is eventually consistent, so it may not make sense if your latency requirements are very strict. If you can introduce a delay then it should work. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 22 May 2014

Re: Elasticsearch 1.2.0 and 1.1.2

2014-05-22 Thread joergpra...@gmail.com
Plugin developers should watch out for changes in classes, e.g. XContentRestResponse (useful for REST actions) has gone, and there are some internal API changes in IndexShard methods, also new deprecations (IndicesStatusAction is now RecoveryAction) - maybe more I did not recognize yet in my

Re: Unassigned Shards Problem

2014-05-22 Thread Brian Wilkins
Thanks for your reply. I set the node.rack to rack_one on all the nodes as a test. In ElasticHQ, on the right it shows no indices. It is empty. In my master, I see that the nodes are identifying with rack_one (all of them). Any other clues? Thanks Brian On Thursday, May 22, 2014 5:10:25 PM

Re: Elasticsearch 1.2.0 and 1.1.2

2014-05-22 Thread Mark Walkom
Hurray! However they are still using the new version, new path release method, so if you want 1.2 you will need to update your sources to http://packages.elasticsearch.org/elasticsearch/1.2/$OS Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web:

Re: Unassigned Shards Problem

2014-05-22 Thread Brian Wilkins
Went back and read the page again. So I made one master, workhorse, and balancer with rackid of rack_two for testing. One master shows rackid of rack_one. All nodes were restarted. The shards are still unassigned. Also, the indices in ElasticHQ are empty.

Re: Nested cardinality values way off with filter?

2014-05-22 Thread Adrien Grand
distinct_count On Thu, May 22, 2014 at 10:34 PM, Phil Price philpr...@gmail.com wrote: I would expect (aggregations.has_thing.distinct_count.value + aggregations.does_not_have_thing.distinct_count.value) to be close to aggregations.total_distinct_count.value, but in reality it's pretty far off

Re: time taken by each stage for a query

2014-05-22 Thread Adrien Grand
That is not easy, and the reason is that Elasticsearch and Solr work in quite a different way eg. when it comes to compute facets/aggregations: Solr first computes top hits, and if facets are required, it will load the doc IDs of document matches into a bit set that will be used in a subsequent

Re: Is it wise to use ES for saving shopping Carts?

2014-05-22 Thread Adrien Grand
On Thu, May 22, 2014 at 3:54 PM, Matthias Feist matf...@gmail.com wrote: What do you think: Is it wise to implement such a system in elasticsearch? I'm mostly worried about the time between the add to cart (inserting a document) and being able to access the total value due to the flushing

Re: Nested cardinality values way off with filter?

2014-05-22 Thread Phil Price
Doh! You are correct, my bad. I assumed the filter was an exclusive per-user property, but in fact it is not. Thanks for getting back to me. Cheers, Phil On Thursday, May 22, 2014 4:36:02 PM UTC-7, Adrien Grand wrote: distinct_count On Thu, May 22, 2014 at 10:34 PM, Phil Price

Re: why the special nested aggregation and query?

2014-05-22 Thread Adrien Grand
Although I would agree that being able to detect it automatically could make things simpler, I think that the fact that it is explicit is more flexible. For example, it can make sense to copy field values into the root document[1]. This can help speed up some queries that don't need to know about

Re: _search/scroll?search_type=scan bugs/inconsistencies

2014-05-22 Thread Adrien Grand
scan is mainly useful as a way to export data from the index. In the context of a user interface, I think scroll would make more sense[1]. On a side note, paging improved significantly for scroll requests in Elasticsearch 1.2 (in terms of both speed and memory usage). [1]

Re: indices.cache.filter.size limit not enforce ?

2014-05-22 Thread Adrien Grand
For those who would come to this thread through a search engine, Dan found the root cause of this issue https://github.com/elasticsearch/elasticsearch/issues/6268 On Wed, May 21, 2014 at 8:03 PM, Daniel Low dang...@gmail.com wrote: Hello, Has there been any updates to this? We are using

Re: _search/scroll?search_type=scan bugs/inconsistencies

2014-05-22 Thread Michael Schurter
Thanks for the response Adrien. I'm excited to upgrade to 1.2, but it seems strange to me that people refer to scan vs. scroll (you're not the first) as scan is simply a search_type that - AFAIK - can be used for any type of search (scroll or otherwise). It just seems strange that setting the

IncompatibleClassChangeError[Implementing class]

2014-05-22 Thread Olivier B
Hey! I'm using Elasticsearch 1.1.1 on ubuntu on java 7: java version 1.7.0_55 OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1) OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode) It's working perfectly. But, when I try to upgrade to 1.2.0, elasticsearch won't start:

Reverse River?

2014-05-22 Thread Tim Uckun
I would like to have a river in reverse. Every time a document is inserted or modified I would like to push that into another destination like a database. Ideally this would be async or maybe even in batches. Has anybody done anything like this before?

Re: Reverse River?

2014-05-22 Thread Ivan Brusic
Some relevant comments: https://github.com/elasticsearch/elasticsearch/issues/1242 -- Ivan On Thu, May 22, 2014 at 8:45 PM, Tim Uckun timuc...@gmail.com wrote: I would like to have a river in reverse. Every time a document is inserted or modified I would like to push that into another

Elastic search High threads , 100% utilization of non heap memory.

2014-05-22 Thread srikanth ramineni
Hi team, we are experiencing an issue with high non-heap memory usage and a high thread count. Mostly we see the GC process running. We have been watching threads using BigDesk for the past two days. The thread count is reaching a peak, and we are not sure why it gets this high. Two days back we ran

Connect through proxy

2014-05-22 Thread nilsga
Is it possible to connect with the TransportClient to an Elasticsearch cluster via a SOCKS proxy? If yes, how?