Re: Multiple Types within an Index

2014-03-04 Thread David Pilato
That said, I'm wondering whether, if you set your id as the document id, it would be more efficient to use the multi-get API in that case. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 5 March 2014 at 06:49, Roland Pirklbauer wrote: On Thursday, 20 February 2014 04:37:34 UTC+1,

Re: Elasticsearch Performance Analysis

2014-03-04 Thread Itamar Syn-Hershko
Writing 500 documents per second is pretty easy to achieve, given a decent machine. Your code should just work and achieve that. Multithreading on the client side and splitting the index into shards that reside on different servers are usually the way to achieve higher write throughput. But as

Re: Elasticsearch Performance Analysis

2014-03-04 Thread David Pilato
When Elasticsearch gives you the answer (actionGet()), Elasticsearch has your doc, whatever happens after that. It does not mean that your doc is searchable yet; that happens about 1 second later. So querying immediately won't give you the result you are expecting. Bulk API is definitel
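Since the reply points toward the Bulk API, here is a minimal Python sketch of how a `_bulk` request body is laid out: one action/metadata line followed by one source line per document, newline-delimited, with a trailing newline. The index and type names are hypothetical, and the sketch only builds the body; sending it (for example with POST /_bulk) is left out so it runs without a cluster.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def build_bulk_body(index: str, doc_type: str,
                    docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build a newline-delimited _bulk request body.

    Each document contributes two lines: an "index" action line carrying
    the metadata, then the document source. The body ends with a newline.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical index and type names.
    body = build_bulk_body("events", "event",
                           [("1", {"msg": "first"}), ("2", {"msg": "second"})])
    print(body, end="")
```

If a test needs to search right after indexing, an explicit `POST /<index>/_refresh` makes new documents visible without waiting for the next scheduled refresh; calling it after every request would undo much of the benefit of batching.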

Re: upgrade elasticsearch server

2014-03-04 Thread David Pilato
I think you have to reindex. On 5 March 2014 at 07:36, Michael Huang wrote: I have Elasticsearch 1.0beta1. I want to upgrade to Elasticsearch 1.0.1. What are the steps to upgrade it? Is it simply a matter of shutting down the beta1 version and

Re: node not allowed to join cluster?

2014-03-04 Thread Lukáš Vlček
You can provide a list of nodes (IP addresses or DNS names) that are allowed to join the cluster. Other nodes will not be allowed. Lukáš On 5.3.2014 7:03, "rene z" wrote: > For security reasons, is it possible to prevent a node from > joining a cluster? > With the use of some key.

upgrade elasticsearch server

2014-03-04 Thread Michael Huang
I have Elasticsearch 1.0beta1. I want to upgrade to Elasticsearch 1.0.1. What are the steps to upgrade it? Is it simply a matter of shutting down the beta1 version and starting up the 1.0.1 version with the same config? Thanks, Michael -- You received this message because you are subscribed to the Google Groups "

Elasticsearch Performance Analysis

2014-03-04 Thread Isaac Hazan
We are currently evaluating Elasticsearch as our solution for analytics. The main driver is the fact that once the data is populated into Elasticsearch, the reporting comes for free with Kibana. Before adopting it, I am tasked with doing a performance analysis of the tool. The main requirement is

node not allowed to join cluster?

2014-03-04 Thread rene z
For security reasons, is it possible to prevent a node from joining a cluster, for example with the use of some key? Bye, René

Re: Multiple Types within an Index

2014-03-04 Thread Roland Pirklbauer
On Thursday, 20 February 2014 04:37:34 UTC+1, Roland Pirklbauer wrote: > On Saturday, 5 January 2013 12:16:39 UTC+1, Jörg Prante wrote: >> Later while searching, you can direct your search client to the index >> "library", and all searches to the "identifier" field will be mapped >

Re: Configure elasticsearch to query files on file system

2014-03-04 Thread Ivan Brusic
The zip download available on GitHub is not what you want. The format required for plugins is different from the source download found on GitHub. Since it appears that you do not have download access, as Roland mentioned, your last option is to clone the project with git and build it yourself with ma

Re: Error "array index out of bounds java.lang.OutOfMemoryError: Java heap space"

2014-03-04 Thread prashy
Hi gkwelding, I am using the bulk API to index the data, and the refresh parameter is not set. So what could be causing that exception? Let me know if you need any other input. -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Error-ar

Re: unable to write data to elasticsearch using hadoop PIG

2014-03-04 Thread siva mannem
On Tuesday, March 4, 2014 9:32:55 PM UTC-8, siva mannem wrote: > I installed ES (at the location /usr/lib/elasticsearch/) on our gateway > server and I am able to run some basic curl commands like XPUT and XGET to > create some indices and retrieve the data in them. > I am able to pass a single

unable to write data to elasticsearch using hadoop PIG

2014-03-04 Thread siva mannem
I installed ES (at the location /usr/lib/elasticsearch/) on our gateway server and I am able to run some basic curl commands like XPUT and XGET to create some indices and retrieve the data in them. I am able to pass a single-line JSON record, but I am unable to pass a JSON file as input to curl XPUT .

Re: Configure elasticsearch to query files on file system

2014-03-04 Thread Nitesh Earkara
Hi Roland, thanks for the suggestion. I am getting an error while trying to install it the way you suggested. Below is the error message: C:\Users\ner\Downloads\elasticsearch-1.0.1(2)\elasticsearch-1.0.1\bin>plugin -install fr.pilato.elasticsearch.river/fsriver/0.4.0 -url C:\Users\ner\Download

Re: Configure elasticsearch to query files on file system

2014-03-04 Thread Roland Pirklbauer
On Tuesday, 4 March 2014 12:27:33 UTC+1, Nitesh Earkara wrote: > Hi, > I have been trying to configure Elasticsearch to query/search files in the > file system on a Windows 7 operating system. I have installed Elasticsearch > and it's up and running. I have been trying to install the fsriver plugi

Re: ELK source tar balls

2014-03-04 Thread Tomasz Kloczko
On Wednesday, 5 March 2014 03:58:41 UTC, Mark Walkom wrote: > There are rpm/deb repos > http://www.elasticsearch.org/blog/apt-and-yum-repositories/ > Otherwise you can find the sources on github RPM packages cannot be built from the GitHub repo, which is why I'm asking where the source tar ball

Re: Facet to find possible keys for querying

2014-03-04 Thread Corey Nolet
I forgot to mention: I need the ability for the user to specify that they only care about keys where entity.type === 'person' (or any type, for that matter). On Tuesday, March 4, 2014 11:13:27 PM UTC-5, Corey Nolet wrote: > Hello, > I've got an "entity" document which looks like this: > { >

Facet to find possible keys for querying

2014-03-04 Thread Corey Nolet
Hello, I've got an "entity" document which looks like this: { id: 'id', type: 'person', tuples: [ { key: 'nameFirst', value: 'john', type: 'string' }, { key: 'age', value: '38', type: 'int' },

Re: ELK source tar balls

2014-03-04 Thread Mark Walkom
There are rpm/deb repos http://www.elasticsearch.org/blog/apt-and-yum-repositories/ Otherwise you can find the sources on github. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 5 March 2014 14:53, Tomasz Kloczko wr

ELK source tar balls

2014-03-04 Thread Tomasz Kloczko
Hi, on http://www.elasticsearch.org/overview/elkdownloads/ I see only binary packages and tarballs with jar files. Q: Where can I find ELK source tarballs and, for example, the RPM spec files used to build the noarch.rpm packages? Tomasz

has_child query does not support post_filter?

2014-03-04 Thread Yuri Panchenko
Guys, I'm running the following query to test out one of the use cases: curl -X GET '0:9200/segmentation/animal/_search?pretty' -d '{ "query" : { "has_child" : { "type" : "visit", "query" : { "bool" : { "must" : [ {"term" : { "_parent" : "119000148-5661691" }},

Re: JDK 7 Issues Question

2014-03-04 Thread Ivan Brusic
The vectorization issue is not limited to OpenJDK, and is still present in 7u45: https://twitter.com/thetaph1/status/423523708708208640 On Tue, Mar 4, 2014 at 3:52 PM, InquiringMind wrote: > Jörg, > Just to clarify: The links below point to OpenJDK, not to the Oracle JDK? > I only ask beca

Re: Strange percolator behavior with extra field named "type"

2014-03-04 Thread James Bathgate
Brian, I'm using 1.0.0. You're probably right, but I don't want to go back and change code in 1000 places if I can avoid it. On Tuesday, March 4, 2014 4:07:26 PM UTC-8, InquiringMind wrote: > > What version of ES are you using? I seem to recall reading about ambiguity > surrounding the use of "type"

Re: Strange percolator behavior with extra field named "type"

2014-03-04 Thread InquiringMind
What version of ES are you using? I seem to recall reading about ambiguity surrounding the use of "type" as a field name in one of the version's release notes, but I cannot find it quickly now. However, ES seems to overuse the word "type", referring to both the document type (roughly analogous

Re: Slow Percolator Indexing

2014-03-04 Thread James Bathgate
Thanks. On Tuesday, March 4, 2014 1:45:29 PM UTC-8, Martijn v Groningen wrote: > OK, I thought maybe an old JVM version was causing this, but this one is > pretty recent. > I took a closer look at indexing percolator queries and there is indeed a > substantial difference in execution time c

Re: JDK 7 Issues Question

2014-03-04 Thread InquiringMind
Jörg, Just to clarify: The links below point to OpenJDK, not to the Oracle JDK? I only ask because the version and build numbers seem to track those in Oracle's JDK. For what it's worth, I am currently running ES 1.0.0 GA with the latest Oracle JDK 7u51 on Mac Mavericks and on Linux, and I hav

Re: What kind of encoding does the ES support?

2014-03-04 Thread InquiringMind
Ivan, yes, ES stores all strings in UTF-8 encoding. Referring to your 3 POST commands, the first two succeeded because in the first one you presented the data in UTF-8 encoding and it was accepted. In the second one, you presented the same name but using the \u notation, which is valid,
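To illustrate why the first two POSTs are equivalent, this Python sketch (assuming a hypothetical `name` field) parses the same JSON body once as raw UTF-8 bytes and once with the non-ASCII character written as a `\u00f6` escape; both decode to the same string.

```python
import json

# The same (hypothetical) document sent two ways: as raw UTF-8 bytes,
# and with the non-ASCII character written as a JSON \u escape.
RAW_UTF8 = '{"name": "Jörg"}'.encode("utf-8")
ESCAPED = b'{"name": "J\\u00f6rg"}'


def decode_name(payload: bytes) -> str:
    """Decode a UTF-8 JSON body and return its "name" field."""
    return json.loads(payload.decode("utf-8"))["name"]


if __name__ == "__main__":
    print(decode_name(RAW_UTF8) == decode_name(ESCAPED))  # prints True
```

The byte sequences differ on the wire, but once parsed they are the same string, which is what gets stored.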

Strange percolator behavior with extra field named "type"

2014-03-04 Thread James Bathgate
I've found a strange behavior where if I have an extra field in the percolator named "type" then range filters on my percolator don't work properly. See the following test case: https://gist.github.com/julesbravo/9357887 If I change the type field on lines 26 and 74 to "atype" OR change the typ

Re: multi_match boolean across fields

2014-03-04 Thread Thibaut
Thank you! I haven't tried it yet, but from reading the documentation I understand that it will meet my needs. Is it possible to keep the boosting applied to the individual fields when computing the score? Should I keep the original query and add the match/AND you're talking about, as a que

Re: Java bulk API slows down if client is not closed and reopened

2014-03-04 Thread InquiringMind
Are you using the BulkRequestBuilder? If so, create a new one for each bulk operation (and let the dereferenced old one be garbage collected); otherwise you'll keep filling up the same builder and performance will drop, as you've seen. At least, that's what I do, and it runs like the blazes for the entire 97M document load
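The advice above concerns the Java BulkRequestBuilder; what follows is only a language-neutral Python analogue of the same idea, assuming each batch is sent and then discarded: every batch gets a fresh container rather than one that keeps growing. The batch size is a parameter; 5,000 is just an example value.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield the items in fresh lists of at most `size` elements.

    A new list is started after each full batch instead of reusing one
    container, so a caller that sends and drops each batch never keeps
    earlier documents alive.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []  # new container; the previous one can be garbage collected
    if batch:
        yield batch


if __name__ == "__main__":
    sizes = [len(b) for b in batches(range(12_000), 5_000)]
    print(sizes)  # [5000, 5000, 2000]
```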

Re: How to search data inside _source

2014-03-04 Thread Binh Ly
You'd have to set dynamic back to true. "dynamic": false means new fields are ignored and not indexed. If a field is not indexed, then you cannot search on it.

Re: multi_match boolean across fields

2014-03-04 Thread Binh Ly
I'd probably just aggregate all the fields you are interested in into one field and then run a match query with the AND operator on that single field. You can probably use copy_to to accomplish the aggregation into a single field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#copy-to
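A minimal sketch of the copy_to approach as Python dicts that can be serialized into JSON request bodies. It assumes ES 1.x `string` fields, hypothetical field names (`title`, `body`) and a hypothetical combined field `all_text`. The match query uses `"operator": "and"`, so every query term must appear somewhere in the combined field.

```python
import json
from typing import Dict, List


def copy_to_mapping(doc_type: str, fields: List[str],
                    target: str = "all_text") -> Dict:
    """Mapping (ES 1.x syntax) where each listed string field is also
    copied into one combined `target` field."""
    properties = {f: {"type": "string", "copy_to": target} for f in fields}
    properties[target] = {"type": "string"}
    return {doc_type: {"properties": properties}}


def all_terms_query(text: str, target: str = "all_text") -> Dict:
    """match query that requires every term of `text` in `target`."""
    return {"query": {"match": {target: {"query": text, "operator": "and"}}}}


if __name__ == "__main__":
    # Hypothetical type and field names.
    print(json.dumps(copy_to_mapping("doc", ["title", "body"]), indent=2))
    print(json.dumps(all_terms_query("red wool hat"), indent=2))
```

Because the terms are matched against one combined field, boosts configured on the individual source fields do not carry over to this clause.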

Re: get elasticsearch settings

2014-03-04 Thread Binh Ly
Unfortunately, there is no single place to get all settings. However, you can usually get them as follows: 1) Cluster settings: curl "localhost:9200/_cluster/settings?pretty" 2) Index settings: curl "localhost:9200/foo/_settings?pretty" 3) Node settings: curl "localhost:9200/_nodes?pretty" So for e
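The three endpoints can be wrapped in a small helper. This Python sketch uses only the standard library; the host and the index name `foo` are assumptions, and `fetch_settings` needs a reachable node, so only the URL builder runs standalone.

```python
import json
import urllib.request
from typing import Dict, Optional
from urllib.parse import quote


def settings_urls(host: str = "localhost:9200",
                  index: Optional[str] = None) -> Dict[str, str]:
    """Cluster, node and (optionally) index settings endpoints."""
    urls = {
        "cluster": f"http://{host}/_cluster/settings?pretty",
        "nodes": f"http://{host}/_nodes?pretty",
    }
    if index is not None:
        urls["index"] = f"http://{host}/{quote(index)}/_settings?pretty"
    return urls


def fetch_settings(urls: Dict[str, str], timeout: float = 5.0) -> Dict[str, dict]:
    """GET each endpoint and parse the JSON response (needs a running node)."""
    result = {}
    for name, url in urls.items():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            result[name] = json.loads(resp.read().decode("utf-8"))
    return result


if __name__ == "__main__":
    for name, url in settings_urls(index="foo").items():
        print(name, url)
```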

Re: Sorting date fields

2014-03-04 Thread Adrian
On Wed, Feb 26, 2014 at 12:21:26AM +0100, joergpra...@gmail.com wrote: Jörg, sorry for the late answer. > Maybe you can set up an example of your sort as a demo, so that the error > can be reproduced? It turned out that this behaviour was caused by me, since documents contained the wrong times

Re: Slow Percolator Indexing

2014-03-04 Thread Martijn v Groningen
OK, I thought maybe an old JVM version was causing this, but this one is pretty recent. I took a closer look at indexing percolator queries and there is indeed a substantial difference in execution time compared to indexing a regular document. When I disabled the size calculation (in the code) f

"Failed to derive xcontent" when submitting queries in smile format over port 9200

2014-03-04 Thread John Ohno
My response looks like: {"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[pb-DdzK0S9uOU6yh9kg3zQ][trial][6]: RemoteTransportException[[IP_POC_NODE1][inet[/10.226.22.13:9700]][search/phase/query]]; nested: SearchParseException[[trial][6]

get elasticsearch settings

2014-03-04 Thread eunever32
I have what I think is an obvious question. If I tweak some settings such as: index.translog.flush_threshold_period or index.merge.policy.use_compound_file or index.refresh_interval or indices.memory.index_buffer_size or index.cache.field.type or index.gateway.snapshot_interval Is it poss

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread joergpra...@gmail.com
You should probably check why your cluster is yellow: is it a single node only? Bulk indexing with a green cluster should work flawlessly. Jörg On Tue, Mar 4, 2014 at 8:02 PM, Jose Gargallo wrote: > Ok, I'm gonna play with that, but it seems too complicated just for bulk > indexing taking

Re: 3,000 events/sec Architecture

2014-03-04 Thread Mark Walkom
You also definitely want an odd number of nodes to prevent potential split-brain situations. On 5 March 2014 07:08, Zachary Lammers wrote: > My initial suggestion would

Re: failed to recover shard (after insertion full disk error)

2014-03-04 Thread Mark Walkom
It looks like your index mapping is invalid; you might have to delete the index and then reload your data. On 4 March 2014 23:01, Andrés wrote: > Hello, > > I'm having prob

Re: 3,000 events/sec Architecture

2014-03-04 Thread Zachary Lammers
My initial suggestion would be to set your templates to 3 shards, 1 replica. With three data nodes, each index then has 6 shard copies (3 primaries + 3 replicas), so you'd have two shards per index per node; at 5 indexes/day, that's 10 shards per node per day. 3 nodes × 10 shards per day × 30 days is 900 shards. I don't know of any 'cutoff' per se, but 900 may b
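The shard arithmetic in the reply, written out as small Python functions so it can be re-run with other numbers. The inputs are the ones from the reply (3 primaries, 1 replica, 3 data nodes, 5 indexes/day, 30 days), and the per-node figure assumes shards spread evenly across nodes.

```python
def shard_copies_per_index(primaries: int, replicas: int) -> int:
    """Primary shards plus their replica copies for one index."""
    return primaries * (1 + replicas)


def shards_per_node_per_day(primaries: int, replicas: int,
                            nodes: int, indexes_per_day: int) -> float:
    """Average shard copies added to each node per day (even spread assumed)."""
    return shard_copies_per_index(primaries, replicas) * indexes_per_day / nodes


def total_shards(primaries: int, replicas: int,
                 indexes_per_day: int, days: int) -> int:
    """All shard copies in the cluster over the retention period."""
    return shard_copies_per_index(primaries, replicas) * indexes_per_day * days


if __name__ == "__main__":
    print(shards_per_node_per_day(3, 1, 3, 5))  # 10.0
    print(total_shards(3, 1, 5, 30))            # 900
```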

Re: Too Many Open Files

2014-03-04 Thread smonasco
Sorry to have taken so long to reply. I went ahead and followed your link. I'd been there before, but decided to take a deeper look. What I actually found, however, was that bigdesk told me the max open files value for the process, and from there I was able to determine that my settings in

Re: Issue with facets and not_analyzed in mapping.

2014-03-04 Thread Yury Kisliak
Thanks Ivan, I've fixed the issue. It was just a wrong mapping. On Tuesday, March 4, 2014 8:29:00 PM UTC+1, Ivan Brusic wrote: > > Judging by the output, the genre field is analyzed using the default > analyzer. Others can help debug if you provide your mapping. It is best to > use the get mapping

Re: Issue with facets and not_analyzed in mapping.

2014-03-04 Thread Ivan Brusic
Judging by the output, the genre field is analyzed using the default analyzer. Others can help debug if you provide your mapping. It is best to use the get mapping API [1] since it shows what you actually have instead of what you supplied at index creation. Depending on your use case, you might be

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread Jose Gargallo
OK, I'm going to play with that, but it seems too complicated just for bulk indexing, given the low number of documents. Thanks. On 04/03/2014 19:53, "joergpra...@gmail.com" wrote: > Logging is not enough, you should care for the number of active requests > sent and the bulk request r

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread joergpra...@gmail.com
Logging is not enough; you should keep track of the number of active requests sent and the bulk responses that have come back. That way you can control how many concurrent bulk requests are active at a time, and cap that number before it exceeds the bulk queue size of 50.
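A minimal Python sketch of that idea, assuming a thread-based client: a bounded semaphore caps how many bulk requests are in flight, and each slot is released only when its response has come back. `send_fn` is a hypothetical callable that performs one POST /_bulk and returns the response; the default limit of 4 is an arbitrary example kept well below the bulk queue size of 50.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class BoundedBulkSender:
    """Cap the number of bulk requests that are in flight at once.

    `send_fn` is a hypothetical callable that performs one bulk request
    and returns the response. submit() blocks once `max_in_flight`
    requests are outstanding and resumes as soon as one completes.
    """

    def __init__(self, send_fn: Callable[[Any], Any], max_in_flight: int = 4):
        self._send_fn = send_fn
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._pool = ThreadPoolExecutor(max_workers=max_in_flight)

    def submit(self, body: Any) -> Future:
        self._slots.acquire()  # wait for a free slot
        try:
            future = self._pool.submit(self._send_fn, body)
        except Exception:
            self._slots.release()
            raise
        # Free the slot when the response (or an error) has come back.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

Each returned future yields the response, which the caller should still inspect for per-item failures rather than only logging it.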

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread Jose Gargallo
I'm using Python; I just sent the log so you could see what I'm doing. I'm not evaluating the bulk responses, just logging them, so I can see the 'rejected execution' error in most of them. On 4 March 2014 19:13, joergpra...@gmail.com wrote: > So you use plain HTTP API? Do you evaluate the

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread joergpra...@gmail.com
So you use the plain HTTP API? Do you evaluate the responses from POST /_bulk requests before sending the next request? Jörg On Tue, Mar 4, 2014 at 6:59 PM, Jose Gargallo wrote: > This is what I'm doing: > > for each 24 locales: > POST /current_contenidos_[LOCALE] # creates index > PUT /curren

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread Jose Gargallo
This is what I'm doing:
for each of the 24 locales:
    POST /current_contenidos_[LOCALE]   # creates index
    PUT /current_contenidos_es_es/_settings   # sets refresh_interval = -1
POST /_bulk HTTP/1.1" 200 2386284   (12 times [mixed requests for 24 indices])
for each of the 24 locales:
    PUT /contenidos_es_es/

Re: I am confused by the postFilter and filter of elasticsearch‘s java client

2014-03-04 Thread Luca Cavanna
If you only need the filter part, you can use a constant_score query that contains only a filter; in that case you wouldn't need the match_all query anymore. On Friday, February 28, 2014 10:04:35 AM UTC+1, xzer LR wrote: > AFAIK, there are three types of search I can perform on elasticsearch: >

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread joergpra...@gmail.com
Without being able to look at the source code, it is difficult if not impossible to find issues. "index.refresh_interval: -1" must be set on the respective index, preferably using the index update settings API (the config file or the index creation settings also work, but are not a good place for temporary settings)
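A sketch of the per-index settings calls as (method, path, body) tuples in Python. It assumes the PUT /<index>/_settings endpoint and that the interval is restored to the default `1s` after loading; the index name is hypothetical and nothing is sent over the network.

```python
import json
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[str]]


def refresh_interval_request(index: str, interval: str) -> Request:
    """(method, path, body) for PUT /<index>/_settings changing refresh_interval."""
    body = {"index": {"refresh_interval": interval}}
    return ("PUT", f"/{index}/_settings", json.dumps(body))


def bulk_load_plan(index: str, restore_interval: str = "1s") -> List[Request]:
    """Disable refresh, bulk load, then restore the interval."""
    return [
        refresh_interval_request(index, "-1"),
        ("POST", "/_bulk", None),  # bulk body omitted in this sketch
        refresh_interval_request(index, restore_interval),
    ]


if __name__ == "__main__":
    for method, path, body in bulk_load_plan("contenidos_es_es"):
        print(method, path, body or "")
```

Restoring the interval at the end matters: with `-1` left in place, newly indexed documents would not become searchable through periodic refreshes.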

Re: How to get the json for a searchrequest?

2014-03-04 Thread Tebring Daly
Great! Thanks David! On Tuesday, March 4, 2014 10:53:24 AM UTC-6, David Pilato wrote: > > I think you can print a QueryBuilder or a SearchRequestBuilder but not a > SearchRequest. > > QueryBuilder qb = QueryBuilders.queryString("my text"); > logger.info("Your query is : {}", qb); > > // this wor

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread Jose Gargallo
I understand, but I must be missing something. 5k documents × 24 indices = 120k index requests. How am I supposed to bulk index them? I've tried setting "index.refresh_interval" to "-1" to speed up the process, but I still get the same result. Splitting the bulk into different sizes didn't work either. On 4 March 2

Re: Apply synonyms that include confidence weights

2014-03-04 Thread Ivan Brusic
Hopefully you can find a way to make things work with less code. It would be great if payloads were more of a first class citizen in Elasticsearch, but it is up to the Lucene layer to handle analysis. I really need to play around with the "new" text scoring abilities. -- Ivan On Mon, Mar 3, 201

Re: rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread joergpra...@gmail.com
Threads are pooled; they are not used per index. The queue length of 50 works in almost every case, and 50 is also a safe value that protects a node from being overwhelmed by too many documents. If it doesn't work for you, think about the bulk request size, and whether your cluster is powerful enough to process the transmitted docum

Re: How to get the json for a searchrequest?

2014-03-04 Thread Tebring Daly
Calling toString(), I get the typical class reference from the unimplemented toString(): "org.elasticsearch.action.search.SearchRequest@142daa2e". I am using Elasticsearch 1.0.0; I wonder if something has changed since 0.90 or in 1.0.1? Thanks for the response On Monday, March 3, 2014 8:46:25 PM UTC-6, amit.soni w

Re: How to get the json for a searchrequest?

2014-03-04 Thread David Pilato
I think you can print a QueryBuilder or a SearchRequestBuilder but not a SearchRequest. QueryBuilder qb = QueryBuilders.queryString("my text"); logger.info("Your query is : {}", qb); // this works as well node.client().prepareSearch

Re: Java bulk API slows down if client is not closed and reopened

2014-03-04 Thread joergpra...@gmail.com
Without seeing the code, it is impossible to make helpful statements. 1 GB is generally a small heap for bulk indexing. 275k documents will work anyway; they should be done in ~30 seconds. Maybe you are seeing GC kicking in. To make guesses about ES, you should run bulk indexing for at least 30-

Re: using specific analyzer for each nested type?

2014-03-04 Thread Mohamed Hedi ABIDI
Well, I found the answer : using dynamic template. "person" : { "dynamic_templates" : [ { "nested_template": { "match" : "fr|en", "match_pattern" : "regex", "mapping" : { "type" : "nested" } } }, { "template_fr" : { "path_match" : "fr.*", "match_mapping_type" : "string", "mapping" : { "type" : "

Re: What is the meaning of throttle_time_in_millis?

2014-03-04 Thread Adrien Grand
On Tue, Mar 4, 2014 at 4:01 PM, isaac hazan wrote: > Thx. > > Is a segment a single file with multiple documents? Or is it multiple > files that together form a segment? In other terms I don't fully understand > why the notion of segment exists? > The simple answer is that a segment is made of se

multi_match boolean across fields

2014-03-04 Thread Thibaut
Hello, I'd like to build a specific query in ES that I currently can't find a way to express. Here is what I would like to do: the query would be a *multi_match* that keeps only the documents that contain all the terms in the query (the terms can be spread across several fields). The default *multi_match* gives me

Re: Slow Percolator Indexing

2014-03-04 Thread James Bathgate
Martijn, I'm using Oracle Java 7u45. On Tuesday, March 4, 2014 1:05:13 AM UTC-8, Martijn v Groningen wrote: > I see that a lot of time is spent on just measuring how much memory the > query takes, not on parsing the query. I think this slowness > might be JVM version dependent, what j

Re: 3,000 events/sec Architecture

2014-03-04 Thread Eric Luellen
Zach, thanks for the information. With my POC, I have two 10 GB VMs and I'm keeping 7 days of logs with no issues, but that is a fairly large jump and I can see where it may pose an issue. As for the 150 indexes, I'm not sure about the shards per index/replicas. That is the part that I'm the

rejected execution (queue capacity 50) in bulk process

2014-03-04 Thread jgargallo
Hello, I'm facing a problem bulk indexing 5k documents into 24 different indices (i18n). I'm using Elasticsearch 1.0.1 with all default settings. I've read that a thread per index is used, which would mean I'm using 24 bulk threads at one time. Am I right? If so, why am I getting this rejection sin

Re: 3,000 events/sec Architecture

2014-03-04 Thread Zachary Lammers
Based on my experience, I think you may run into OOM issues trying to keep a month of logs with ~10 GB RAM per server. Say, for instance, 5 indexes a day for 30 days = 150 indexes. How many shards per index/replicas? I ran some tests with 8 GB assigned to each of my 20 ES data nodes, and after a ~7 d

Re: What is the meaning of throttle_time_in_millis?

2014-03-04 Thread Garry Welding
Isaac, this (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html) gives a good explanation of what segments actually are. It also lists the merge-related settings, and maybe you can find some optimisations in there. On Tuesda

3,000 events/sec Architecture

2014-03-04 Thread Eric Luellen
Hello, I've been working on a POC for Logstash/Elasticsearch/Kibana for about 2 months now; everything has worked out pretty well and we are ready to move it to production. Before building out the infrastructure, I want to make sure my shard/node/index setup is correct, as that is the main pa

RE: What is the meaning of throttle_time_in_millis?

2014-03-04 Thread isaac hazan
Thanks. Is a segment a single file with multiple documents? Or is it multiple files that together form a segment? In other words, I don't fully understand why the notion of a segment exists. Does the fact that I have a high number in the throttling KPI mean that I have a performance problem, and if

elasticsearch with hadoop and pig

2014-03-04 Thread srivibalu
I am doing a POC project using elasticsearch-hadoop, with Pig to create and read the index. Is there detailed documentation on how to use Elasticsearch with Pig and Hadoop? I want to know whether the index can be saved in HDFS permanently or whether HDFS is just a backup store. Also, load

How to search data inside _source

2014-03-04 Thread Bhupali Kalmegh
*Tweet Index definition : * "tweet": { "tweet": { "index": { "number_of_shards": 2, "number_of_replicas": 1 } }, "mappings": { "blog": { "_all": { "enabled"

elasticsearch suggest by middle words by making preserve_position_increments: false

2014-03-04 Thread Shams Haque
Hi, I am trying to implement middle-word search. As the Completion Suggester docs say about preserve_position_increments: if disabled and using a stopword analyzer, you could get a string starting wit

Re: What is the meaning of throttle_time_in_millis?

2014-03-04 Thread Adrien Grand
When you add documents to Elasticsearch, this creates new files on disk that form what is called a segment. Having several segments is fine, but when you start having too many of them, search gets slower. This is why Elasticsearch has a background process that takes care of merging these s

Java bulk API slows down if client is not closed and reopened

2014-03-04 Thread Ondřej Spilka
Hi all, I'm using the Java API on ES 1.0.1 to bulk index medium-sized docs. The documents come from a 150 MB XML file. The average JSON document is about 500 bytes with 10 properties; I'm currently testing with 275,000 documents. Only some key properties are indexed; the rest are only stored in _source. Bulk indexing is done in 5000 do

Re: Error "array index out of bounds java.lang.OutOfMemoryError: Java heap space"

2014-03-04 Thread prashy
I just wanted to point out that with a 1 GB heap I was getting the error, so I increased the heap to 2 GB (the system has 4 GB), and in that scenario I also got the error at the same point. If I increased the memory from 1 GB to 2 GB, it should at least process one more record compared to the previou

Re: Error "array index out of bounds java.lang.OutOfMemoryError: Java heap space"

2014-03-04 Thread Garry Welding
You also don't give us much information on how you're trying to index these 3 GB of data. Are you using the bulk API? Are you refreshing after every index action? etc... On Tuesday, March 4, 2014 1:40:58 PM UTC, Prashy wrote: > > I tried increasing the heap value by 2GB as well by ES_MAX_ME

Re: Error "array index out of bounds java.lang.OutOfMemoryError: Java heap space"

2014-03-04 Thread Garry Welding
Adding a bit more to my rather short answer. Both exceptions essentially mean the same thing. I would follow the basic heap allocation advice: allocate 50% of your system RAM to ES, since catastrophic things happen when ES runs out of memory, and leave the other 50% to the system. So if you have a server
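The 50% rule as a tiny Python helper. It assumes the heap is set through the `ES_HEAP_SIZE` environment variable used by the 1.x startup scripts, and it only computes the value; it does not apply it.

```python
def suggested_heap_size(system_ram_gb: float, fraction: float = 0.5) -> str:
    """Return a heap size string (in MB) covering `fraction` of system RAM.

    With the default fraction of 0.5, half of the RAM goes to the ES heap
    and the other half is left to the operating system.
    """
    if system_ram_gb <= 0:
        raise ValueError("system RAM must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    heap_mb = int(system_ram_gb * 1024 * fraction)
    return f"{heap_mb}m"


if __name__ == "__main__":
    # e.g. ES_HEAP_SIZE=2048m on a 4 GB machine
    print("ES_HEAP_SIZE=" + suggested_heap_size(4))
```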

Re: Error "array index out of bounds java.lang.OutOfMemoryError: Java heap space"

2014-03-04 Thread prashy
I also tried increasing the heap to 2 GB with ES_MAX_MEM: 2g, but it gave the same error.

Re: Error "array index out of bounds java.lang.OutOfMemoryError: Java heap space"

2014-03-04 Thread Garry Welding
The message is pretty obvious. Your node is running out of heap memory... Increase it. On Tuesday, March 4, 2014 1:36:51 PM UTC, Prashy wrote: > > Hi ES users, > > I am getting the following exception while indexing huge amount of > data(say > ~5GB) to ES node. > > Exception: > 1) /*array in

Error "array index out of bounds java.lang.OutOfMemoryError: Java heap space"

2014-03-04 Thread prashy
Hi ES users, I am getting the following exceptions while indexing a huge amount of data (say ~5GB) into an ES node. Exceptions: 1) array index out of bounds java.lang.OutOfMemoryError: Java heap space Dumping heap to java_pid6721.hprof ... 2) java.lang.OutOfMemoryError: PermGen space Dumping heap t

How to do sum aggregations in Java API?

2014-03-04 Thread Daniel Guo
I have a query similar to the following SQL: select sum(count+displayCount) as total from day_inc_count group by video_id. My problem is how to implement this with aggregations in the REST or Java API. I don't understand aggregations in Elasticsearch 1.0 very well. I hope somebody can help.
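One way to express that SQL as a 1.x aggregation request body, sketched as a Python dict: a `terms` aggregation on `video_id` (the GROUP BY) with a `sum` sub-aggregation driven by a script that adds the two fields. This sketch rests on assumptions: `count` and `displayCount` are numeric, single-valued fields, and scripting is allowed on the cluster.

```python
import json
from typing import Dict


def sum_by_video_body(terms_size: int = 10) -> Dict:
    """Aggregation body roughly equivalent to:
    SELECT sum(count + displayCount) AS total FROM day_inc_count GROUP BY video_id
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "by_video": {  # GROUP BY video_id
                "terms": {"field": "video_id", "size": terms_size},
                "aggs": {
                    "total": {  # sum(count + displayCount) per bucket
                        "sum": {"script": "doc['count'].value + doc['displayCount'].value"}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # POST this to the _search endpoint of whatever holds day_inc_count.
    print(json.dumps(sum_by_video_body(), indent=2))
```

The terms aggregation returns only the top 10 buckets by default, so `terms_size` may need raising to cover every video.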

failed to recover shard (after insertion full disk error)

2014-03-04 Thread Andrés
Hello, I'm having problems starting Elasticsearch after an error. I was bulk indexing a new type when the disk filled up and the server returned this error: PHP Fatal error: Uncaught exception 'Elastica\Exception\Bulk\ResponseException' with message 'Error in one or more bulk request actions: ind

Re: Does Scan API support doc key sort order

2014-03-04 Thread Boaz Leskes
Hi Alex, the Scan API indeed works based on the order of documents in the Lucene segments. This is the most efficient way to get bulks of data, which is the intended use case of this API. Once 1.1 is released you'll be able to efficiently scroll while maintaining order (see this issue: https://

Configure elasticsearch to query files on file system

2014-03-04 Thread Nitesh Earkara
Hi, I have been trying to configure Elasticsearch to query/search files in the file system on a Windows 7 operating system. I have installed Elasticsearch and it's up and running. I have been trying to install the fsriver plugin by following the instructions in this link https://github.com/dadoonet/fsr

Boosting documents based on the query provided

2014-03-04 Thread Garry Welding
This is a bit of an odd one, although not really for those of us who work in the eCommerce world I suppose. I work for one of the largest children's retailers in the UK. We're currently knocking together a demonstration of our eCommerce platform with everything running off Elasticsearch rather

Re: Calculating a "computed" value based on index statistics / term frequencies

2014-03-04 Thread Kevin B
Quick correction. I remembered that precomputing prior to populating the index wouldn't work for me in this case, because the term frequency data for the full corpus wouldn't exist yet. On Tuesday, March 4, 2014 11:56:04 AM UTC+2, Kevin B wrote: > > As background I have some Lucene based code which

Issue with facets and not_analyzed in mapping.

2014-03-04 Thread Yury Kisliak
Hello, could you please explain what is wrong? http://pastebin.com/dMZudCsj
My config:
...
analysis:
  analyzer:
    default:
      type: custom
      tokenizer: standard
      filter: [mam_ngram, lowercase]
  filter:
    mam_ngram:

Calculating a "computed" value based on index statistics / term frequencies

2014-03-04 Thread Kevin Blaisdell
As background I have some Lucene based code which is used to manipulate index statistics to generate numeric document vectors. This code sits between systems that need document vectors for input and Lucene indexes that are the store of the source data & statistics (term & document frequencies)

Will there be a performance issue when using "*" in index templates?

2014-03-04 Thread Mohit Golchha
Hi all, As there is currently no regex support in index templates, if I apply the simple matching pattern "*", which matches every index, the mapping will be created even for indices that might never have that field. Will there be any performance issue in th
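Why "*" reaches every index can be seen with a small model of wildcard-only matching. The helper below is hypothetical Python, not Elasticsearch's own implementation; it assumes template patterns are matched with simple `*` wildcards and no regex, as the message describes. The index names and the narrower `logs-*` pattern are illustrative only.

```python
def simple_match(pattern: str, name: str) -> bool:
    """Match `name` against `pattern`, where '*' matches any run of characters.

    Hypothetical helper modelling wildcard-only matching (no regex, no '?').
    """
    parts = pattern.split("*")
    if len(parts) == 1:  # no wildcard at all: exact match only
        return pattern == name
    # The text before the first '*' must be a prefix, the text after the
    # last '*' must be a suffix.
    if not name.startswith(parts[0]) or not name.endswith(parts[-1]):
        return False
    pos = len(parts[0])
    end = len(name) - len(parts[-1])
    # Middle pieces must appear in order between the prefix and suffix.
    for part in parts[1:-1]:
        idx = name.find(part, pos, end)
        if idx == -1:
            return False
        pos = idx + len(part)
    return pos <= end  # prefix and suffix must not overlap


indices = ["logs-2014.03.04", "products", "users"]
print([i for i in indices if simple_match("*", i)])       # every index
print([i for i in indices if simple_match("logs-*", i)])  # ['logs-2014.03.04']
```

With `*`, every current and future index receives the template's mapping; a prefix pattern confines it to the indices that actually carry the field.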

Re: River documentation vanished?

2014-03-04 Thread Mark Walkom
This has been a problem with other documentation in the past, not that it explains what the problem is. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 4 March 2014 20:36, Lukáš Vlček wrote: > OK, I can see the doc

Re: River documentation vanished?

2014-03-04 Thread Lukáš Vlček
OK, I can see the doc page is still in sources: https://github.com/elasticsearch/elasticsearch/blob/master/docs/river/index.asciidoc and available at http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/index.html But for some reason it is not reachable via guide search. On Tue, Ma

scripting in java

2014-03-04 Thread Bernard CHAMBON
Hi, I'm trying to play with scripting (in Java) and it's not obvious to me. 1/ The smallest example doesn't compile because doc().field is not found: public class Ex2ScriptFactory implements NativeScriptFactory { @Override public ExecutableScript newScript (@Nullable Map params) { return new

River documentation vanished?

2014-03-04 Thread Lukáš Vlček
Hi, we try to keep the documentation for our rivers updated, but I noticed that the general river documentation page [1] is probably no longer available on the Elasticsearch.org site. Is this intentional? I can see the twitter river is still pointing to it as well [2] [1] http://web.archive.org/web/20130819045

Using a specific analyzer for each nested type?

2014-03-04 Thread Mohamed Hedi ABIDI
Hi, I'm using nested types to manage multilingual content: the root object contains values for the default language, and I have several nested objects that represent translations in other languages: fr, gr, ar ... Is there any way to define a specific analyzer for each nested type? Thanks in advan

Re: Slow Percolator Indexing

2014-03-04 Thread Martijn v Groningen
I see that a lot of time is spent just measuring how much memory the query takes, not parsing the query. I think this slowness might be JVM version dependent; what JVM version are you using? On 4 March 2014 01:33, James Bathgate wrote: > Martijn, > > 1. Not running low at all. >

Re: Low indices.ttl.interval is NOT working

2014-03-04 Thread Alexander Reelsen
Hey, wow. I only read what I wanted to read and skipped the first paragraph. Sorry for that. The cleanup thread basically sleeps for the specified interval; when the interval is updated, the new setting is only applied once the old sleep interval has finished. This means you might need to wait
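The timing described above can be sketched as a small simulation. This is a simplified, hypothetical model, not the Elasticsearch purge code: the thread reads the interval, sleeps for it, purges, and only then re-reads the setting. The 60s and 1s values are illustrative.

```python
from typing import List


def purge_times(old_interval: int, new_interval: int,
                update_at: int, runs: int) -> List[int]:
    """Return the times (in seconds) at which purges run.

    Hypothetical model: the thread sleeps for the interval it read before
    sleeping, purges, then re-reads the setting. A change made at
    `update_at` therefore only takes effect after the current sleep ends.
    """
    times = []
    t = 0
    interval = old_interval
    for _ in range(runs):
        t += interval          # sleep with the interval read earlier
        times.append(t)        # purge runs on waking
        if t >= update_at:     # setting is re-read only after waking
            interval = new_interval
    return times


# Lowering the interval from 60s to 1s at t=10s: the first purge still
# happens at t=60s, and only then does the 1s interval apply.
print(purge_times(60, 1, 10, 4))  # [60, 61, 62, 63]
```

So after lowering the interval, the worst-case wait before the new value is honoured is one full old interval.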

[ANN] Elasticsearch Thrift transport plugin 2.0.0 released

2014-03-04 Thread Elasticsearch Team
Heya, We are pleased to announce the release of the Elasticsearch Thrift transport plugin, version 2.0.0. The Thrift transport plugin allows you to use the REST interface over Thrift on top of HTTP. https://github.com/elasticsearch/elasticsearch-transport-thrift/ Release Notes - elasticsearch-

[ANN] Elasticsearch Memcached transport plugin 2.0.0 released

2014-03-04 Thread Elasticsearch Team
Heya, We are pleased to announce the release of the Elasticsearch Memcached transport plugin, version 2.0.0. The memcached transport plugin allows you to use the REST interface over memcached. https://github.com/elasticsearch/elasticsearch-transport-memcached/ Release Notes - elasticsearch-tran

Re: Error "ReduceSearchPhaseException"

2014-03-04 Thread prashy
One more observation in the above scenario. 1) The number of records returned for "mobile" is 287, so if I set "size" (in the query) to < 287 it works fine, but with >= 287 it throws the exception. 2) The number of records returned for "samsung" is 191, so if I set "size" to < 191 it is working fin

elasticsearch-py connection

2014-03-04 Thread eunever32
Hi, Is there an example of how to construct an elasticsearch-py ConnectionPool? I.e., do you create a list of connections and pass that to the ConnectionPool? The docstring says: :arg connections: list of tuples containing the :class:`~elasticsearch.Connection` instance and it's options