Re: ES OOMing and not triggering cache circuit breakers, using LocalManualCache

2015-02-11 Thread Zachary Tong
LocalManualCache is a component of Guava's LRU cache https://code.google.com/p/guava-libraries/source/browse/guava-gwt/src-super/com/google/common/cache/super/com/google/common/cache/CacheBuilder.java, which is used by Elasticsearch for both the filter and field data cache. Based on your node

Re: php api and self signed certs

2015-02-11 Thread Zachary Tong
Glad you got it working! Just a note, I don't monitor the ES mailing list very closely...but if you have any more problems/questions with the PHP client in the future, feel free to just open a ticket at the repo and I would be happy to help :) -Z On Wednesday, February 11, 2015 at 7:09:57 AM

Re: PHP API : How to change the size of records returned

2014-07-01 Thread Zachary Tong
You can also set it as a parameter in the body, just like a regular query: $searchParams['index'] = 'my_index'; $searchParams['type'] = 'my_type'; $searchParams['body'] = [ 'query' => [ 'match' => [ 'test_field' => 'abc' ] ], 'size' => 20 ]; $queryResponse =

Re: Optimal number of Shards per node

2014-03-24 Thread Zachary Tong
and your time. Btw, which tool do you use for monitoring the ES cluster, and what do you monitor? Thanks Rajan On Thursday, March 20, 2014 2:05:52 PM UTC-7, Zachary Tong wrote: Unfortunately, there is no way that we can tell you an optimal number. But there is a way that you can perform some capacity

Re: Complete cluster failure

2014-03-20 Thread Zachary Tong
at a point! Hmm...that's interesting. I would have recommended those two exact methods. I'll do some digging and see why they didn't work... -Z On Thursday, March 20, 2014 1:23:48 AM UTC-5, Ivan Brusic wrote: Responses inline. On Wed, Mar 19, 2014 at 7:25 PM, Zachary Tong zachar

Re: fuzziness score computation

2014-03-20 Thread Zachary Tong
You are correct in your analysis of the fuzzy scoring. Fuzzy variants are scored (relatively) the same as the exact match, because they are treated the same when executed internally. If you want to score exact matches higher, I would use a boolean combination of an exact match and a fuzzy
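A minimal Python sketch of that boolean combination, written as a plain request body. The field name, the boost of 2.0 and `"fuzziness": "AUTO"` are assumptions for illustration; check that your ES version accepts `fuzziness` on the `match` query. A document that matches exactly satisfies both `should` clauses and collects both scores. A misspelled variant only satisfies the fuzzy clause.

```python
import json


def exact_plus_fuzzy(field, text, exact_boost=2.0):
    """Bool query: an exact match clause plus a fuzzy match clause on the same field."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact match, boosted so exact hits outrank fuzzy-only hits.
                    {"match": {field: {"query": text, "boost": exact_boost}}},
                    # Fuzzy match, which also matches (and scores) misspelled variants.
                    {"match": {field: {"query": text, "fuzziness": "AUTO"}}},
                ]
            }
        }
    }


print(json.dumps(exact_plus_fuzzy("title", "elasticsearch"), indent=2))
```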

Re: Optimal number of Shards per node

2014-03-20 Thread Zachary Tong
Unfortunately, there is no way that we can tell you an optimal number. But there is a way that you can perform some capacity tests, and arrive at usable numbers that you can extrapolate from. The process is very simple: - Create a single index, with a single shard, on a single
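The first step of that capacity test could look roughly like this. The sketch only builds the create-index body for a one-shard index. The index name `capacity_test` is hypothetical. Setting zero replicas is an assumption, chosen because a single test node could not allocate replicas anyway.

```python
import json

# Hypothetical index name; any throwaway name works.
TEST_INDEX = "capacity_test"


def capacity_test_settings():
    """Create-index body for a single-shard test index."""
    return {
        "settings": {
            # One primary shard, so the limit you reach is a per-shard limit.
            "number_of_shards": 1,
            # Assumption: no replicas, since a single node cannot host them anyway.
            "number_of_replicas": 0,
        }
    }


# Send as the body of: PUT /capacity_test
print(f"PUT /{TEST_INDEX}")
print(json.dumps(capacity_test_settings()))
```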

Re: Sort before filter?

2014-03-19 Thread Zachary Tong
Sorry for the delay between my Twitter response and my reply here. Basically, sorting first and then performing query/filter matches is not really a tenable solution, due to memory constraints. If you were to sort first, you would need to sort the documents (which may be very expensive over
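The toy, in-memory illustration below shows why the order matters. It is not how Lucene sorts internally. Both orders return the same hits, but sorting first has to order every document, while filtering first only orders the matches. The documents and predicate are made up.

```python
def filter_then_sort(docs, predicate, key):
    """Filter first, then sort only the matches. Returns (results, items_sorted)."""
    matches = [d for d in docs if predicate(d)]
    return sorted(matches, key=key), len(matches)


def sort_then_filter(docs, predicate, key):
    """Sort every document first, then filter. Returns (results, items_sorted)."""
    ordered = sorted(docs, key=key)
    return [d for d in ordered if predicate(d)], len(ordered)


def wanted(doc):
    # Made-up filter: keeps 1 document in 100.
    return doc["id"] % 100 == 0


def by_price(doc):
    return doc["price"]


docs = [{"id": i, "price": (i * 37) % 1000} for i in range(1000)]
a, sorted_a = filter_then_sort(docs, wanted, by_price)
b, sorted_b = sort_then_filter(docs, wanted, by_price)
print(a == b, sorted_a, sorted_b)  # True 10 1000
```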

Re: 2 clusters versus 1 big cluster?

2014-03-19 Thread Zachary Tong
Why limit shards to 5gb? Have you capacity tested your hardware + data and determined the max size is around 5gb? If the answer is no, I would encourage you to perform some capacity planning first. It may be that your system can handle 15 or 20gb per shard, or put a different way, 50m
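The arithmetic behind that point, as a quick sketch: the number of primary shards you need is the total data size divided by the per-shard ceiling your capacity test finds, rounded up. The 300 GB total is a hypothetical figure.

```python
import math


def shards_needed(total_data_gb, max_shard_gb):
    """Primary shards required so that no shard exceeds max_shard_gb."""
    if max_shard_gb <= 0:
        raise ValueError("max_shard_gb must be positive")
    return math.ceil(total_data_gb / max_shard_gb)


# Hypothetical 300 GB of data: a 5 GB ceiling vs. a 20 GB ceiling.
print(shards_needed(300, 5))   # 60
print(shards_needed(300, 20))  # 15
```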

Re: Sorting results with Elastic Search, Mongo Db and PHP

2014-03-19 Thread Zachary Tong
Could you provide a small recreation of the problem in a gist? Just a few sample documents and two searches with different sorts that don't work? It could be your mapping that is incorrect, or a syntax issue with the query/doc/mapping. It may be a typo, but you are sorting on title.value,

Re: OutOfMemoryError OOM while indexing Documents

2014-03-18 Thread Zachary Tong
es_log file the warnings of monitor.jvm are still present. On Monday, March 17, 2014 14:32:29 UTC+1, Zachary Tong wrote: Ah, sorry, I misread your JVM stats dump (I thought it was one long list, instead of multiple calls to the same API). With a single node cluster, 20 concurrent bulks may

Re: OutOfMemoryError OOM while indexing Documents

2014-03-18 Thread Zachary Tong
have 12 available processors... Should we slow down the bulk loader by adding a wait of a few seconds? On Tuesday, March 18, 2014 13:22:57 UTC+1, Zachary Tong wrote: My observations from your Node Stats: - Your node tends to have around 20-25 merges happening at any given time

Re: OutOfMemoryError OOM while indexing Documents

2014-03-14 Thread Zachary Tong
Are you running searches at the same time, or only indexing? Are you bulk indexing? How big (in physical kb/mb) are your bulk requests? Can you attach the output of these APIs (preferably during memory buildup but before the OOM): - curl -XGET 'localhost:9200/_nodes/' - curl -XGET
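To answer the "how big are your bulk requests" question, you can measure the body before sending it. This is a minimal sketch: it builds a newline-delimited bulk body (one action line plus one source line per document, with a trailing newline) and reports its size in UTF-8 bytes. The index, type and documents are hypothetical.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body: an action line plus a source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def size_kb(body):
    # Physical size on the wire, measured in UTF-8 bytes rather than characters.
    return len(body.encode("utf-8")) / 1024


docs = [{"id": i, "text": "hello world"} for i in range(1000)]
body = bulk_body("my_index", "my_type", docs)
print(f"{len(docs)} docs -> {size_kb(body):.1f} KB")
```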

Re: function_score and elasticsearch-php

2014-03-13 Thread Zachary Tong
know :) Have a good day all, Erdal. On Wednesday, March 12, 2014 20:05:11 UTC+1, Zachary Tong wrote: For the record, this array syntax should work as well: $qry = array( 'query' => array( 'function_score' => array( 'functions' => array( array

Re: elasticsearch memory usage

2014-03-13 Thread Zachary Tong
Can you gist up the output of these two commands? curl -XGET 'http://localhost:9200/_nodes/stats' and curl -XGET 'http://localhost:9200/_nodes'. Those are my first-stop APIs for determining where memory is being allocated. By the way, these settings don't do anything anymore (they were deprecated
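Once you have the `_nodes/stats` output, a small script can pull out the usual suspects per node. This is a sketch that assumes 1.x-era field names (`jvm.mem.heap_used_percent`, `indices.fielddata.memory_size_in_bytes`, `indices.filter_cache.memory_size_in_bytes`), so verify them against your own output. The sample response is made up.

```python
def heap_summary(nodes_stats):
    """Map node name -> (heap_used_percent, fielddata bytes, filter cache bytes)."""
    summary = {}
    for node_id, node in nodes_stats["nodes"].items():
        jvm_mem = node["jvm"]["mem"]
        indices = node.get("indices", {})
        summary[node.get("name", node_id)] = (
            jvm_mem["heap_used_percent"],
            indices.get("fielddata", {}).get("memory_size_in_bytes", 0),
            indices.get("filter_cache", {}).get("memory_size_in_bytes", 0),
        )
    return summary


# Made-up response shaped like a (trimmed) _nodes/stats body.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 72}},
            "indices": {
                "fielddata": {"memory_size_in_bytes": 1048576},
                "filter_cache": {"memory_size_in_bytes": 2048},
            },
        }
    }
}
print(heap_summary(sample))  # {'node-1': (72, 1048576, 2048)}
```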

Re: Constantly increasing memory outside of Java heap

2014-03-13 Thread Zachary Tong
I believe you are just witnessing the OS caching files in memory. Lucene (and therefore by extension Elasticsearch) uses a large number of files to represent segments. TTL + updates will cause even higher file turnover than usual. The OS manages all of this caching and will reclaim it for
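On Linux, the file cache described above is reported as the `Cached:` line of `/proc/meminfo`. Watching that line is one way to confirm that the growth is page cache rather than process memory. This is a Linux-only sketch, and the sample values are invented.

```python
def page_cache_kb(meminfo_text):
    """Return the 'Cached:' value (kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        # startswith("Cached:") deliberately skips the separate 'SwapCached:' line.
        if line.startswith("Cached:"):
            return int(line.split()[1])
    raise ValueError("no Cached: line found")


# On a live Linux box you would pass open("/proc/meminfo").read() instead.
sample = """MemTotal:       16384000 kB
MemFree:          512000 kB
Cached:          9216000 kB
SwapCached:            0 kB
"""
print(page_cache_kb(sample))  # 9216000
```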

Re: Constantly increasing memory outside of Java heap

2014-03-13 Thread Zachary Tong
that though. On Thursday, March 13, 2014 5:17:20 PM UTC-7, Zachary Tong wrote: I believe you are just witnessing the OS caching files in memory. Lucene (and therefore by extension Elasticsearch) uses a large number of files to represent segments. TTL + updates will cause even higher file

Re: Constantly increasing memory outside of Java heap

2014-03-13 Thread Zachary Tong
Also, are there other processes running which may be causing the problem? Does the behavior only happen when ES is running? On Thursday, March 13, 2014 8:31:18 PM UTC-4, Zachary Tong wrote: Cool, curious to see what happens. As an aside, I would recommend downgrading to Java 1.7.0_u25

Re: function_score and elasticsearch-php

2014-03-12 Thread Zachary Tong
Hey there. It's possible, but it requires explicit object creation via the PHP \stdClass() object. PHP isn't very good at deciding between arrays and objects when using json_encode, unless you use explicit objects. Untested, but try this: $scriptScore = new \stdClass(); $scriptScore->script =

Re: function_score and elasticsearch-php

2014-03-12 Thread Zachary Tong
For the record, this array syntax should work as well: $qry = array( 'query' => array( 'function_score' => array( 'functions' => array( array('script_score' => array('script' => "doc['boostfield'].value")) ), 'query' => array(
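For comparison, here is the same `function_score` body built in Python, sketched under the assumption that the field is called `boostfield` as in the snippet. Python dicts always serialise as JSON objects, so the empty `match_all` comes out as `{}`. An empty PHP array would instead be encoded as `[]`, which is the array/object ambiguity that forces the `\stdClass` workaround in PHP.

```python
import json


def function_score_body(inner_query, boost_field="boostfield"):
    """function_score wrapper whose script_score reads a numeric field from each doc."""
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "functions": [
                    {"script_score": {"script": f"doc['{boost_field}'].value"}}
                ],
            }
        }
    }


body = function_score_body({"match_all": {}})
# The empty dict is emitted as {}, never as [].
print(json.dumps(body))
```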

Re: term lookup filter

2014-03-12 Thread Zachary Tong
1. I agree with your assessment, and there is no mechanism to optimize for this. There will be some overhead to filtering so many terms (although it's difficult to say how much), it will generate more churn on the filter cache, and updates to that single document may become
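For reference, a terms lookup filter points at one document and a path within it instead of listing the terms inline. The sketch below follows the 1.x-era syntax as I recall it (`index`, `type`, `id`, `path`), so check it against the docs for your version. All index, type and field names are hypothetical.

```python
def terms_lookup_filter(field, index, doc_type, doc_id, path):
    """Terms filter whose values are fetched from another document at `path`."""
    return {
        "terms": {
            field: {
                "index": index,   # index holding the lookup document
                "type": doc_type, # its type
                "id": doc_id,     # the single document that stores all the terms
                "path": path,     # field inside that document containing the terms
            }
        }
    }


print(terms_lookup_filter("user", "users", "user", "2", "followers"))
```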

Re: ES Response Time

2014-02-26 Thread Zachary Tong
The `took` field in the response is the number of milliseconds that the query took to execute on the Elasticsearch server. It's basically the time required to parse the query, broadcast it to the shards and collect the results. It doesn't include network time going to and from Elasticsearch itself (since
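One way to see the gap is to time the request on the client and subtract `took`. This is a sketch. `fake_search` is a hypothetical stand-in for a real HTTP call, and the remainder covers everything outside the server-side work, not only the network.

```python
import time


def split_latency(search_fn):
    """Run search_fn (returns a parsed response with 'took') and split the wall time."""
    start = time.perf_counter()
    response = search_fn()
    round_trip_ms = (time.perf_counter() - start) * 1000
    server_ms = response["took"]
    return server_ms, round_trip_ms - server_ms


def fake_search():
    # Stand-in for a real call: ~50 ms of wall time, of which 'took' claims 20 ms.
    time.sleep(0.05)
    return {"took": 20}


server_ms, other_ms = split_latency(fake_search)
print(f"server: {server_ms} ms, network/client overhead: ~{other_ms:.0f} ms")
```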

Re: elasticsearch cache configuration

2014-02-26 Thread Zachary Tong
If you simply want to decrease the amount of memory that Elasticsearch is using, you need to change your heap size (via the ES_HEAP_SIZE environment variable). That controls the total memory allocated to Elasticsearch. Echoing what Binh said...try not to change the field-data settings unless you

Re: Enable Position Increments property not available

2014-02-06 Thread Zachary Tong
Hi Christian, I looked into the issue and it appears there isn't really an alternative at the moment. The property was removed because Lucene removed the underlying functionality, since it can potentially break token streams (usually with regard to synonyms). You can see the issue here:

Re: How to monitor for filter cache churn?

2014-01-28 Thread Zachary Tong
You can monitor the filter cache at three different levels: index, node and cluster. The output is similar at all three levels: you'll see a size in bytes and an eviction count. - Per-index: curl -XGET http://localhost:9200/my_index/_stats - Per-node: curl -XGET
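Churn shows up as the eviction count climbing between polls, so the useful number is a rate rather than the raw counter. This is a minimal sketch that assumes the 1.x index-stats layout (`_all.total.filter_cache.evictions`). The two snapshots are made up.

```python
def filter_cache_evictions(index_stats):
    """Eviction counter from a parsed `GET /my_index/_stats` response."""
    return index_stats["_all"]["total"]["filter_cache"]["evictions"]


def eviction_rate(before, after, interval_s):
    """Evictions per second between two snapshots taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (after - before) / interval_s


# Two made-up snapshots taken 60 seconds apart.
snap1 = {"_all": {"total": {"filter_cache": {"memory_size_in_bytes": 1024, "evictions": 100}}}}
snap2 = {"_all": {"total": {"filter_cache": {"memory_size_in_bytes": 1024, "evictions": 400}}}}
print(eviction_rate(filter_cache_evictions(snap1), filter_cache_evictions(snap2), 60))  # 5.0
```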

Re: order of the elements does matter?

2014-01-28 Thread Zachary Tong
So the root cause is that your query is structured incorrectly. The match_all should be inside of a query element, inside the filtered query: curl -XGET 'http://localhost:9200/test_search/_search?pretty=true' -d '{ "query": { "filtered": { "query": {"match_all": {}},
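The corrected nesting can be sketched as a Python body builder: `match_all` sits under `filtered.query`, and the filter sits beside it under `filtered.filter`. The `term` filter on `status` is a hypothetical placeholder, because the original filter is cut off.

```python
import json


def filtered_match_all(filter_clause):
    """filtered query: match_all goes under 'query', the filter beside it under 'filter'."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": filter_clause,
            }
        }
    }


# Hypothetical filter; substitute your own.
body = filtered_match_all({"term": {"status": "published"}})
print(json.dumps(body, indent=2))
```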

Re: Restarting an active node without needing to recover all data remotely.

2014-01-09 Thread Zachary Tong
Just wanted to add a quick note: long recovery times (due to divergence of shards between primary/replica) are an issue that we will be addressing. No ETA as of yet, but it is on the roadmap. :) -Zach On Wednesday, December 4, 2013 7:48:04 PM UTC-5, Greg Brown wrote: Thanks