Re: doc_count on aggregations doesn't match total hits

2014-09-05 Thread David Pilato
It sounds like you are using nested documents. I guess that is where the difference comes from. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 6 Sep 2014 at 07:49, Ron Sher wrote: Hi, I've started to use aggregations and it works very fast and cool. But it seems to get the wrong doc_co

doc_count on aggregations doesn't match total hits

2014-09-05 Thread Ron Sher
Hi, I've started to use aggregations and it works very fast and cool. But it seems to get the wrong doc_count as opposed to the total hits. For example, here's a typical search I use (it returned 111 total hits but a doc_count of 1348): { "query": { "filtered": { "query": { "m

setting es_heap_size requires re-install in Windows?

2014-09-05 Thread kti_sk
This is what I observed. I changed es_heap_size from 2000m to 3000m (on a 6 GB machine) and restarted the ES service, and Marvel still showed the machine as having 2 GB. I restarted the machine itself and it was still the same. I finally got Marvel to show the right data when I uninstalled and reinstalled the service.

Snapshot compress not compressing?

2014-09-05 Thread ppearcy
I am playing around with snapshot/restore and have a local 1.3.2 cluster running on Mac OS X with 894MB of index data. I have registered a backup repository like so (straight from the docs): curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{ "type": "fs", "settings": {

Re: How to filter out duplicate documents across multiple types?

2014-09-05 Thread Anand
Thanks Vineet, Well, I wanted to search across the types, i.e. US and ES, but only return one document, not two. The problem with the approach you suggested is that search is then limited to documents with isDuplicate=true/false On Friday, September 5, 2014 2:46:20 AM UTC-5, vineeth mohan wrote: >

Re: stuck thread problem?

2014-09-05 Thread Patrick Proniewski
Hi, Epic fail: [2014-09-05 22:14:00,043][DEBUG][action.admin.cluster.node.hotthreads] [Bloodsport] failed to execute on node [fLwUGA_eSfmLApJ7SyIncA] org.elasticsearch.ElasticsearchException: failed to detect hot threads at org.elasticsearch.action.admin.cluster.node.hotthreads.Transpor

What happened to multifield from 0.9 -> 1.0?

2014-09-05 Thread Justin Treher
Apparently I have left a serious bug in my searches. We upgraded from 0.9 to 1.0.1 some time ago. I changed all the multi_fields to the new format, where I put "type" : "string" instead of "multi_field." As documented here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_

Re: Significant terms - avoiding out of memory errors

2014-09-05 Thread Kevin B
Christoffer, How much JVM heap are you giving ES and what is the size of the sets? According to http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html it looks like in 1.4 you will be able to control the circuit breaker more via config. How

Re: elasticsearch processing pipeline capability?

2014-09-05 Thread Kevin B
Jorg, Thanks. I actually have used the term list plugin (thanks) for some quick prototype / experiments. I actually meant I am not familiar with SOLR. Lucene I do have some familiarity with. In this case I was wanting to really be able to send the analysed text on to some post processing ei

Getting ElasticsearchIntegrationTest teardown failures :: "Delete Index failed - not acked"

2014-09-05 Thread mooky
I am getting the following intermittent failure on random different tests (I presume during the teardown) when the build is running on TeamCity. I can't seem to reproduce it locally. I get a failure about 1 in 10-20 test runs. It's not clear to me why I am getting the failure. Anyone have any sugges

Performing manual joins where one child has many parents and possibly on different childs

2014-09-05 Thread kazoompa
Hi, I would like to find the most efficient approach to performing manual joins. A little background: Our documents are updated quite frequently and they are rather large (which is why we don't nest them). In addition, one document may be related to two or more other documents (may be on differe

Re: Exists filter does not respect must_not bool filter

2014-09-05 Thread Ayush Sangani
> > > > Appreciate your explanation, and as per your suggestion range filter gives > correct results. > I am still confused with the usage of exists filter. > > As per my understanding the implementation of exists filter is changed in > v1.3 to increase the speed but why it deviates from it's ex

Re: IP geolocation without Logstash

2014-09-05 Thread Andreas Lehr
Hi Alex, how exactly could this work? For example, we are using the pattern "Quotedstring" to extract the up to 4 IPs in the X-Forwarded-For header of our Apache logs. When we then try using this one in the geoip filter, the filter seems to miss the IP. Example: grok { type => http_log pa

Re: Issue creating S3 repository on ElasticSearch 1.3

2014-09-05 Thread Dylan Lingelbach
Ah, that makes total sense. Thanks David! On Fri, Sep 5, 2014 at 10:15 AM, David Pilato wrote: > Got it… > > You must not repeat the repo name in the JSON doc. It should be: > > { > "type":"s3", > "settings":{ > "region":"us-east", > "bucket":"my-bucket" > }

Re: Issue creating S3 repository on ElasticSearch 1.3

2014-09-05 Thread David Pilato
Got it… You must not repeat the repo name in the JSON doc. It should be: { "type":"s3", "settings":{ "region":"us-east", "bucket":"my-bucket" } } -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 5 September 2014 at 14:

Re: Total number of documents be included in each query

2014-09-05 Thread David Pilato
Cool! A lot of things have changed since, though! :) -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 5 Sep 2014 at 15:41, kazoompa wrote: Fantastic, Thanks David. BTW, I have to thank you for your video presentation in French, it really helped me a lot to understand t

Why are children of the and-filter cached?

2014-09-05 Thread André Hänsel
According to http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/ filters inside an and-filter don't use a BitSet but only check the conditions inside the iterated documents. When I look at the explanation of GET megacorp/employee/_search?explain=1 { "query": { "filt

Re: Total number of documents be included in each query

2014-09-05 Thread kazoompa
Fantastic, Thanks David. BTW, I have to thank you for your video presentation in French, it really helped me a lot to understand the basics of ES two years ago. Cheers. On Thursday, September 4, 2014 5:09:35 PM UTC-4, David Pilato wrote: > > You could try > http://www.elasticsearch.org/guide/e

Re: Hit Counts within a Document

2014-09-05 Thread vineeth mohan
Hello Darren, If it's the term frequency of a word that you are looking for, you can use script fields - { "fields": [ "text" ], "query": { "term": { "text": "god" } }, "script_fields": { "tf": { "script": "_index['text']['god'].tf()" } } } SCRIPTING - ht

Does the pitfall of the and filter apply to the terms filter, too?

2014-09-05 Thread André Hänsel
This question is about querying documents from the whole index using a "filtered" query (and not about filtering further down some query results). According to http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/ the "and" filter is a very bad choice to query from all docum

Re: Hit Counts within a Document

2014-09-05 Thread vineeth mohan
Hello Darren, What do you mean by number of hits? Is it the number of occurrences of a term in a document? Thanks Vineeth On Fri, Sep 5, 2014 at 6:32 PM, Darren Trzynka wrote: > In our current application, it is important to know the number of times > hits were found within a doc

Re: High CPU load during search(elasticsearch 1.2.1)

2014-09-05 Thread PShah
Hi Jorg, Anton is right: we removed the plugin and double-checked that ES is taking up the bulk of the time. We do see that the number of evictions is high filter_cache: { memory_size_in_bytes: 10508060 evictions: 0 } id_cache: { memory_size_in_bytes: 276840500 } fielddata: { memory_size_in_bytes: 41618

Hit Counts within a Document

2014-09-05 Thread Darren Trzynka
In our current application, it is important to know the number of times hits were found within a document for a given search. We are considering using elasticsearch but this is one area I have yet to find a solution for with elasticsearch. The only thing I have found remotely possible is gett

Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Got it thanks On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote: > > Hi, > > I have been performing indexing operations in my elasticsearch cluster for > some time now. Suddenly, I have been facing some latency while indexing and > I'm trying to find the reason for it. > > Details: > > I

Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Nikolas Everett
Active in this context means currently indexing documents. On Sep 5, 2014 8:17 AM, "Thomas" wrote: > Hi, > > I wanted to clarify something from the blog post you mentioned. You > specify that based on calculations we should "give at most ~512 MB > indexing buffer per active shard...". What i

Re: ElasticSearch-Hadoop:Loading data into Elasticsearch through hive querl showing DDLTask error.

2014-09-05 Thread Costin Leau
It looks like there's a classpath issue (notice the HiveUtils error there). Most likely because you have two versions of es-hadoop in your classpath (2.1.0.Beta1 and 1.3.0.M1). Use only one - I suggest 2.1.0.Beta1. Cheers, On 9/5/14 3:39 PM, Mohit Kumar Yadav wrote: hi folks, I facing followi

ElasticSearch-Hadoop:Loading data into Elasticsearch through hive querl showing DDLTask error.

2014-09-05 Thread Mohit Kumar Yadav
Hi folks, I am facing the following error while loading data into Elasticsearch using a Hive query. ERROR:- 14/08/30 02:05:04 INFO log.PerfLogger: 14/08/30 02:05:04 INFO ql.Driver: Starting command: CREATE EXTERNAL TABLE eslogs (time STRING, extension STRING, clientip STRING, request STRING, response INT, age

Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Hi, I wanted to clarify something from the blog post you mentioned. You specify that based on calculations we should "give at most ~512 MB indexing buffer per active shard...". What I wanted to ask is what we mean by the term active. Do you mean the primary only or not? Thank you agai

Re: Issue creating S3 repository on ElasticSearch 1.3

2014-09-05 Thread Dylan Lingelbach
Thanks David. I typed the command I was running to try to create the repository incorrectly. I am running curl *-XPUT* 'localhost:9200/_snapshot/products-v0.2.1' -d '{ "products-v0.2.1": { "type": "s3", "settings": { "region": "us-east", "bucket": "my-bucket" } } }' So I am using the PUT meth

Re: java.lang.OutOfMemoryError causing cluster to fail

2014-09-05 Thread joergpra...@gmail.com
> Why is Elasticsearch allowed to get into this state? Is it poor configuration on our part or a bug in the software? It is the JVM with low memory condition. No Java code can execute when stack and heap is full and free memory is below a few bytes. ES works hard to overcome these situations. >

java.lang.OutOfMemoryError causing cluster to fail

2014-09-05 Thread Ollie
*Background* We have a three node cluster comprised of *prd-elastic-x*, *prd-elastic-y* and *prd-elastic-z*. Each box is an EC2 m2.xlarge, with 17.1 GB of RAM. Elasticsearch is run with the following java memory configuration: java -server -Djava.net.preferIPv4Stack=true -Des.config=/usr/local/e

Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Thx Michael, I will read the post in detail and let you know of any findings. Thomas. On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote: > > Hi, > > I have been performing indexing operations in my elasticsearch cluster for > some time now. Suddenly, I have been facing some latency while

Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Michael McCandless
Maybe index throttling is happening (ES would say so in the logs) because your merging is falling behind? Do you throttle IO for merges (it's throttled at a paltry 20 MB/sec by default)? What does hot threads report? How about top/iostat? We just got a blog post out about improving indexing thr

Re: writing a custom scoring plugin

2014-09-05 Thread joergpra...@gmail.com
You must handle exceptions very carefully in plugins. You should log errors to the log, skip/disable the plugin operation, and that's it. Jörg On Fri, Sep 5, 2014 at 3:14 AM, Srinivasan Ramaswamy wrote: > Hi Joerg, > > I tried the data loading part as a separate module and it works, but i > ha

Re: aggregations

2014-09-05 Thread Thomas
Which version of ES have you been using? AFAIK, in later versions you can control the percentage of heap space to use with the update settings API. Try increasing it a bit and see what happens; the default is 60%, so increase it to, for example, 70%: http://www.elasticsearch.org/guide/en/elasticsearch/re

Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Hi, I have been performing indexing operations in my elasticsearch cluster for some time now. Suddenly, I have been facing some latency while indexing and I'm trying to find the reason for it. Details: I have a custom process which uploads a number of logs at every interval with the bulk API.

creating nested fields from a plugin

2014-09-05 Thread Jakub Kotowski
Hi all, is there a way to create a nested field from a plugin such as the attachment mapper? The parseContext.externalValue(...), and fieldMapper.parse(parseContext) technique creates only normal Lucene fields as far as I can see. Thanks, Jakub -- You received this message because you ar

Re: High CPU load during search(elasticsearch 1.2.1)

2014-09-05 Thread Anton A
Hi, Jörg. Thanks for the reply. As I said before, we are using this on multiple instances and have met this problem on only one particular instance. I removed this plugin and checked again; it didn't help. Here are the hot threads from this run. I would really appreciate it if you could suggest next steps for what we can look at. Thanks. четв

Re: shard allocated for local recovery (post api), should exists, but doesn't (org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException)

2014-09-05 Thread Mathias Gawlista
I manually deleted the indices through the following command and now it works: curl -XDELETE 'http://localhost:9200/index_name'

Significant terms - avoiding out of memory errors

2014-09-05 Thread Christoffer Vig
The significant terms aggregation is a really great feature that allows for some really interesting data analysis. We quite often experience out of memory errors, "CircuitBreakingException: Data too large, data would be larger than limit" Which is not hard to understand, due to the amount of dat

Re: How to filter out duplicate documents across multiple types?

2014-09-05 Thread vineeth mohan
Hello Anand, I don't see any direct way to do this from the query. The way I have in mind goes like this: 1. Identify duplicates while indexing, and mark the duplicate feed as duplicate. A field named "isDuplicate" : "true/false" would be the best. 2. While doing search filter out al