Re: Elasticsearch and MongoDB without River

2015-05-24 Thread Michael Sick
I'd use Apache Storm, especially if it was used elsewhere in your organization. --Mike On Sat, May 23, 2015 at 4:28 AM, sriharshakiran < kiran.srihar...@imomentous.com> wrote: > Hi All > > Now that rivers are deprecated, I need to index data into ES from MongoDB. > Can anyone suggest an approach?

Re: Shield and Proxy Users

2015-05-05 Thread Michael Young
ntually migrate to a Kerberos implementation at some point across the entire stack. Is there any intent to enable Kerberos support in Shield? If there is, what sort of time frame are we looking at? -- Michael On Fri, May 1, 2015 at 2:28 PM, Jay Modi wrote: > Thanks Michael. Are you interest

Re: Shield and Proxy Users

2015-04-29 Thread Michael Young
If you would like to get more specific use case details, I'm more than willing to exchange emails or engage in phone calls. Michael On Wednesday, April 29, 2015 at 10:34:25 PM UTC-4, Michael Young wrote: > > I thought that might be the case. > > The problem with Shield f

Re: Shield and Proxy Users

2015-04-29 Thread Michael Young
he user account using a hash of the users group permissions from LDAP/AD. It's not ideal, but it'll probably get the job done until Shield is extended/enhanced. On Wednesday, April 29, 2015 at 5:03:51 PM UTC-4, Jay Modi wrote: > > Hi Michael, > > We don't currently ha

Re: Marvel "No reports" Warning Message

2015-04-29 Thread Michael Young
2:07:47 PM UTC-4, Michael Young wrote: > > I have a 6 node Elasticsearch cluster set up. I was running ES 1.4.4 and > Marvel was working without any issues. > > Today I upgraded to Elasticsearch 1.5.2. Each of my 6 nodes is configured > to use the same marvel exporter (host0

Shield and Proxy Users

2015-04-29 Thread Michael Young
I have Elasticsearch 1.5.2 and Shield 1.2.0 configured and working against Active Directory. This seems to work pretty well. However, I was wondering if there was a way to pass in a "proxy user" from an application to get the appropriate index filtering via access controls without having to p

Marvel "No reports" Warning Message

2015-04-29 Thread Michael Young
I have a 6 node Elasticsearch cluster set up. I was running ES 1.4.4 and Marvel was working without any issues. Today I upgraded to Elasticsearch 1.5.2. Each of my 6 nodes is configured to use the same marvel exporter (host01). It looks like all of the data is getting to the Marvel host.

Re: Elasticsearch ingest performance

2015-04-23 Thread Michael McCandless
You can try the ideas here too: https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing Mike McCandless On Wed, Apr 22, 2015 at 8:00 PM, Kimbro Staken wrote: > Hello Brian, > > Many things will affect the rate of ingest, the biggest one is making sure > the load gets sprea

Re: Filtering on existing term in field returns nothing

2015-04-17 Thread Michael Czerny
;lowercase" token filter. > So the "topic" property of that document was indexed with "topic_4". A > term query/filter does not analyze the search value, so you are searching > for "Topic_4" in the index. > > > On Wednesday, April 15, 2015 at 5:56
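The explanation quoted above (the value is lowercased at index time, but a term query/filter is not analyzed) can be illustrated with a toy in-memory model. This is a sketch only: `analyze` is a hypothetical stand-in that splits on whitespace and lowercases, not Elasticsearch's real standard tokenizer.

```python
def analyze(text):
    # Hypothetical stand-in for an analyzer that ends with the "lowercase" token
    # filter: whitespace split, then lowercase (not ES's real standard tokenizer).
    return [token.lower() for token in text.split()]


def build_index(docs, field):
    # Inverted index: analyzed term -> set of doc ids containing it.
    index = {}
    for doc_id, doc in docs.items():
        for term in analyze(doc[field]):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_lookup(index, value):
    # Like a term query/filter: the search value is used verbatim, not analyzed.
    return index.get(value, set())


def match_lookup(index, value):
    # Like a match query: the search value goes through the same analyzer first.
    hits = set()
    for term in analyze(value):
        hits |= index.get(term, set())
    return hits


docs = {1: {"topic": "Topic_4"}, 2: {"topic": "Topic_7"}}
index = build_index(docs, "topic")
print(term_lookup(index, "Topic_4"))   # set()
print(term_lookup(index, "topic_4"))   # {1}
print(match_lookup(index, "Topic_4"))  # {1}
```

The usual fixes follow directly: search with the already-lowercased value, use a match query, or map the field as not_analyzed if exact-case matching is what you want.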

Filtering on existing term in field returns nothing

2015-04-15 Thread Michael Czerny
Hello, I am using ES-1.5.0. I can't for the life of me figure this out. I am trying to do a simple filter query, like so: { "filter": { "bool" : { "must" : [ {"term": {"sentiment": "negative"}} ] } } } which returns something like: {"took":7,"timed_out":false,"_

Re: refresh_interval:"10s" is better than refresh_interval:"-1"?

2015-04-15 Thread Michael McCandless
On Tue, Apr 14, 2015 at 7:36 AM, Hajime wrote: > Possibly it is IO bound but I don't seem too many io wait on Cpu or write > activity on iostat.By the way,uses ssd and xfs as file system and default > Directory ( I think it becomes MMapDirectory). > Local SSD (not e.g. Amazon's EBS backed by SSD

Re: copy_to not working

2015-04-13 Thread Michael Young
Thank you Nikolas! -- Michael On Mon, Apr 13, 2015 at 12:59 PM, Nikolas Everett wrote: > Yes _but_ its generally better to do those transforms on the source > application. The idea is that you'll often want to return multiple things > from the source so loading the whole th

Re: copy_to not working

2015-04-13 Thread Michael Young
So if I use "stored" : true with the field definition, then it would be possible to fetch that field for display purposes? -- Michael On Mon, Apr 13, 2015 at 11:23 AM, David Pilato wrote: > Nothing. > > Elasticsearch does not modify _source field which contains the docume

Re: copy_to not working

2015-04-13 Thread Michael Young
ble to do with Solr, so I anticipated a similar capability. Thanks again! -- Michael On Mon, Apr 13, 2015 at 11:39 AM, Nikolas Everett wrote: > I want to expand on this a bit - both copy_to and transform only modify > the _indexed_ document, not the source document. The thinking is that you &

Re: refresh_interval:"10s" is better than refresh_interval:"-1"?

2015-04-13 Thread Michael McCandless
; Should I configure something like "*bulk.thread_pool*" size or > "indices.memory.max_shard_index_buffer_size" > ( > https://github.com/elastic/elasticsearch/blob/97559c0614d900a682d01afc241615cf5627fb4c/src/main/java/org/elasticsearch/indices/memory/IndexingMemoryControl

copy_to not working

2015-04-13 Thread Michael Young
t; : "name_phonetic" }, "name_phonetic" : { "type" : "string" } } } } }' curl -XPUT http://esnode:9200/test_index/default/1 -d '{ "name" : "smith" }' When I query against test_

Re: refresh_interval:"10s" is better than refresh_interval:"-1"?

2015-04-13 Thread Michael McCandless
You should see better performance with -1 refresh_interval, because Lucene will flush larger, single segments, causing less merging pressure. Are both of your tests (-1 vs 10s) fully saturating CPU and/or IO on your nodes? If not, then that can explain it: when you have 10s refresh_interval, a se
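A common pattern for a bulk load is to set refresh_interval to -1 before indexing and restore it afterwards through the index settings API (PUT /{index}/_settings). The sketch below only builds the HTTP requests with the standard library and does not send them; the host `localhost:9200` and index name `my_index` are placeholders.

```python
import json
import urllib.request


def refresh_interval_request(host, index, interval):
    """Build (but do not send) a PUT /{index}/_settings request."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Disable periodic refresh before the bulk load, then restore 1s afterwards.
before = refresh_interval_request("localhost:9200", "my_index", "-1")
after = refresh_interval_request("localhost:9200", "my_index", "1s")
# urllib.request.urlopen(before) would actually send the first request.
```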

Re: ES ignores queries through Spark

2015-04-09 Thread Michael Czerny
t's > not recommended. > > The reason this happens is because everything in es-hadoop is parallelized > and the aggregation happens on the > Hadoop/Spark side. > > > On 4/7/15 10:38 PM, Michael Czerny wrote: > > Hi all, > > > > So I managed to get ela

ES ignores queries through Spark

2015-04-07 Thread Michael Czerny
Hi all, So I managed to get elasticsearch-spark_2.10 to work and I can query a database of tweets in Spark. The problem is it seems to ignore my specific queries, for example specifying the size or the fields to return. For example, this is my code: import org.apache.spark._ import org.apache.

Re: [Hadoop] Specifying username/password information for Shield-configured Elasticsearch

2015-04-01 Thread Michael Young
Ok, I found the docs with the referenced settings here: http://www.elastic.co/guide/en/elasticsearch/hadoop/2.1.Beta/configuration.html -- Michael On Wed, Apr 1, 2015 at 4:54 PM, Michael Young wrote: > When I look at: > http://www.elastic.co/guide/en/elasticsearch/hadoop/c

Re: [Hadoop] Specifying username/password information for Shield-configured Elasticsearch

2015-04-01 Thread Michael Young
When I look at: http://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html there are no references to es.net.http.auth.* Am I missing it somewhere? There are references to es.net.proxy.*, but not es.net.http.* that I can see. Thank you again! -- Michael On Wed

Re: [Hadoop] Specifying username/password information for Shield-configured Elasticsearch

2015-04-01 Thread Michael Young
Thank you! I was looking in the ES-Hadoop documentation and didn't see these settings. I didn't realize they were in the Shield documentation. I'll give this a shot. -- Michael On Wed, Apr 1, 2015 at 2:44 PM, Costin Leau wrote: > See this section of the docs: > http://ww

[Hadoop] Specifying username/password information for Shield-configured Elasticsearch

2015-04-01 Thread Michael Young
I have Elasticsearch 1.4.4 and Shield 1.0.2 configured in my environment. I'm able to successfully connect to my cluster without issues using Active Directory as my authentication back-end. I'm trying to use es-hadoop to push data to Elasticsearch using Hive and/or Pig. However, there doesn't

_cat/allocation vs df

2015-03-27 Thread Michael Salmon
ignoring the reserved free space. Does anyone know if that is the case? /Michael -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc

Re: corrupted shard after optimize

2015-03-24 Thread Michael McCandless
Hmm, not good. Which version of ES? Do you have a full stack trace for the exception? To run CheckIndex you need to add all ES jars to the classpath. It's easiest to just use a wildcard for this, e.g.: java -cp "/path/to/es-install/lib/*" org.apache.lucene.index.CheckIndex ... Make sure you

Re: Logstash Geohash Question

2015-03-18 Thread Michael
Ok, it seems I resolved the issue. In short: you have to use the default output index in logstash: output { elasticsearch { host => "localhost" protocol => "http" } } In ES this results in the index logstash-YYYY.MM.DD. Since I try to deal with syslogs, I thought it was a good idea to create a ded

Re: Logstash Geohash Question

2015-03-18 Thread Michael
Do I have to add some extra fields to the coordinates field as described in http://www.tagwith.com/question_345822_kibana-3-geojson-vs-kibana4-geohash/ ? Problem here is that the logstash.conf does not seem to like 3-dim arrays ... add_field => [ "[geoip][coordinates][lat_lon]", true ] does not

Re: Logstash Geohash Question

2015-03-18 Thread Michael
Unfortunately not. I have the same problem that David described with his screenshot. The only aggregation that shows up in the left panel for geo coordinates is of type geohash, and below it there is no field to choose at all. On Tuesday, March 17, 2015 17:44:32 UTC+1, Mark Walkom wrote: > > It'll

Re: Logstash Geohash Question

2015-03-17 Thread Michael
What do you mean exactly? These are the fields I'm able to obtain, whereas geoip.coordinates is built by using add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ] add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ] in my logstash.conf. geoip.city_name Warsaw*t*geoip

Re: Shard copying performance

2015-03-17 Thread Michael Salmon
't what I want but I think that the guide should be more explicit as to when the checking is done. On Tuesday, 29 April 2014 15:50:05 UTC+2, Michael Salmon wrote: > > I am having trouble replicating a shard and I cannot see any possible > reason for it. After 15 minutes I get a timeo

Re: "now throttling indexing"

2015-03-13 Thread Michael McCandless
That is the right setting to disable store throttling, but even without throttling writes MB/sec for merges, the merges can still fall behind, leading to index throttling. ES does this to protect the health of the index because too many segments will cause all sorts of trouble. What IO system is

Re: Clear deleted docs

2015-03-13 Thread Michael McCandless
Note that only_expunge_deletes=true will, by default, only merge a segment away if more than 10% of its docs are deleted; otherwise it leaves the segment as is. See http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html on how to change that 10% default. But it's almost always
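The 10% rule can be modeled as a simple per-segment check. This is a simplified sketch: as far as I know it mirrors the strict "greater than" test Lucene's TieredMergePolicy applies with its forceMergeDeletesPctAllowed setting, but the segment tuples and the function name are hypothetical.

```python
def segments_to_expunge(segments, deletes_pct_allowed=10.0):
    """Return names of segments whose deleted-doc percentage exceeds the threshold.

    segments: iterable of (name, max_doc, deleted_count) tuples (hypothetical shape).
    """
    selected = []
    for name, max_doc, deleted in segments:
        pct = 100.0 * deleted / max_doc if max_doc else 0.0
        if pct > deletes_pct_allowed:  # strictly greater: exactly 10% is left alone
            selected.append(name)
    return selected


segs = [("_0", 1000, 50), ("_1", 1000, 150), ("_2", 500, 51)]
print(segments_to_expunge(segs))  # ['_1', '_2'] (5%, 15%, 10.2%)
```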

Re: OOME When large segments merge

2015-03-12 Thread Michael McCandless
Do you have many fields with norms enabled? Mike McCandless On Thu, Mar 12, 2015 at 1:20 PM, Mark Greene wrote: > I've noticed periodically that data nodes in my cluster will run out of > heap space when large segments start merging. I attached a screenshot of > what ma

Machine Learning / Decision Tree Learning with Elasticsearch

2015-03-12 Thread Michael Sander
decision trees <http://en.wikipedia.org/wiki/Decision_tree_learning>). Any guidance on how to implement advanced machine learning techniques on Elasticsearch would be helpful. Thanks, -Michael

Re: Elasticsearch import configuration files

2015-03-10 Thread Michael Power
Is there any other way of modifying the elasticsearch configuration without modifying the package manager installed elasticsearch.yml file? On Monday, March 9, 2015 at 4:18:19 PM UTC-7, Mark Walkom wrote: > > You cannot use an array in path.conf. > > On 9 March 2015 at 15:02, M

Elasticsearch import configuration files

2015-03-09 Thread Michael Power
Hello, Is there any way to reconfigure elasticsearch without changing the main /etc/elasticsearch/elasticsearch.yml file? We want to set up an elasticsearch.yml file that is common to all our test environments. Then we want an additional file that is specific to the environment. That environm

Re: Missing SegmentInfo files after upgrade question (Issue 7430)

2015-03-05 Thread Michael McCandless
ought. Unfortunately, I did minimal replication and the > other copy was wiped out due to disk failure. Is there a way to run that > index without the bad shard (4 out of 5 still good)? I'm gonna guess no. > > Thanks, > Kris. > > On Thu, Mar 5, 2015 at 11:23 AM, Michael M

Re: Missing SegmentInfo files after upgrade question (Issue 7430)

2015-03-05 Thread Michael McCandless
That one shard is likely hosed. But if you have a good replica of that shard, then you may be able to delete the hosed shard and let ES recover from the good one. Or restore from snapshot... Mike McCandless http://blog.mikemccandless.com On Thu, Mar 5, 2015 at 2:13 PM, krispyjala wrote: > Hey all,

Re: elasticsearch Index throttling info message comes in es 1.3.1 version

2015-03-05 Thread Michael McCandless
This means Lucene's segment merges can't keep up. Try increasing or disabling the store level IO throttling: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html Mike McCandless http://blog.mikemccandless.com On Thu, Mar 5, 2015 at 5:53 AM, shanmuthu83

Saving Kibana Dashboard on Ubuntu

2015-03-03 Thread Michael Clayton
When I try to save a Kibana dashboard, I get "restricted". I created a user called kibana in the kibana group, and set permissions for the kibana user on my NGINX folder: /var/www/kibana3/. What am I doing wrong?

Re: Decreasing Heap Size Results in Better TPS, How can this happen??

2015-02-18 Thread Michael McCandless
Smaller JVM heap means more free RAM for the OS to cache hot pages from your index ... in general you should only give the JVM as much as it needs (will ever need) and a bit more for safety, and give the rest to the OS so it can put hot parts of your index in RAM. Mike McCandless http://blog.mike

Re: Read past EOF exception on .tis and .fdt file

2015-02-18 Thread Michael McCandless
ES has the index.shard.check_on_startup to run CheckIndex on startup of a shard: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html Mike McCandless http://blog.mikemccandless.com On Wed, Feb 18, 2015 at 1:17 PM, Jilles van Gurp wrote: > plus 1 for a less i

Page Load/Response time by Page - How?

2015-02-13 Thread Michael Kimber
We have the standard stack installed (LEK). We have just added IIS logs to this which is giving us some great information. However I am having trouble working out how to get mean Page Load/Response time for each of the pages that are loaded i.e. Page Mean Load Time -- --
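Mean load time per page is a group-by-and-average over the IIS `cs-uri-stem` and `time-taken` fields (IIS logs time-taken in milliseconds). The sketch below computes it client-side and also shows a terms aggregation with an avg sub-aggregation that should express the same thing in Elasticsearch; the indexed field names `cs_uri_stem` and `time_taken` are hypothetical and depend on how Logstash parsed the log, and the terms field would need to be not_analyzed.

```python
from collections import defaultdict


def mean_time_by_page(records):
    """records: dicts with 'cs-uri-stem' and 'time-taken' (milliseconds)."""
    totals = defaultdict(lambda: [0, 0])  # page -> [sum_ms, count]
    for record in records:
        entry = totals[record["cs-uri-stem"]]
        entry[0] += int(record["time-taken"])
        entry[1] += 1
    return {page: total / count for page, (total, count) in totals.items()}


# Roughly the same computation as an Elasticsearch request body
# (field names are hypothetical, see above).
MEAN_BY_PAGE_AGG = {
    "size": 0,
    "aggs": {
        "pages": {
            "terms": {"field": "cs_uri_stem", "size": 50},
            "aggs": {"mean_time_taken": {"avg": {"field": "time_taken"}}},
        }
    },
}

sample = [
    {"cs-uri-stem": "/home", "time-taken": "100"},
    {"cs-uri-stem": "/home", "time-taken": "300"},
    {"cs-uri-stem": "/login", "time-taken": "50"},
]
print(mean_time_by_page(sample))  # {'/home': 200.0, '/login': 50.0}
```

In Kibana the equivalent is typically a terms bucket on the page field with an Average metric on the time field.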

Re: Can't get Documents Deleted below 40% - performance issues - help needed

2015-02-01 Thread Michael McCandless
It's normal to see 40-60% deleted docs if you frequently update existing documents. See this recent blog post I wrote for some details: http://www.elasticsearch.org/blog/lucenes-handling-of-deleted-documents/ Mike McCandless http://blog.mikemccandless.com On Sun, Feb 1, 2015 at 3:50 PM, Mark Wa

RE: Massive perf difference with filter versus filtered query

2015-01-30 Thread Michael Giagnocavo
query planner could improve, if it could realise one way is better than the other but produces equivalent results. Anyways, sticking everything in filtered queries fixed it all, so, hey, win! Maybe the docs should have a small warning note ;). -Michael From: elasticsearch@googlegroups.com
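For reference, the 1.x-era shape of "sticking everything in a filtered query" looks like the sketch below. As I understand it, a top-level "filter" in 1.x acts as a post filter (applied after the query and facets run over every matching doc), whereas the "filter" inside a "filtered" query restricts the doc set up front, which would explain the gap on 80M documents. The helper name is hypothetical.

```python
import json


def filtered_term_query(field, value, size=50):
    # ES 1.x "filtered" query: the term filter narrows the doc set before the
    # (here trivial) query and any facets/aggregations are computed.
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        },
        "from": 0,
        "size": size,
    }


print(json.dumps(filtered_term_query("ProjectId", 4191152), indent=2))
```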

Massive perf difference with filter versus filtered query

2015-01-27 Thread Michael Giagnocavo
{ "term": { "ProjectId": 4191152 } } } }, "from": 0, "size": 50, "sort": [], "facets": {} } What am I misunderstanding? I've got 80M documents, 30 of which match this query, so the only thing I can guess is that somehow

Re: Confusing results from fuzzy query (1 term, 1 field)

2015-01-27 Thread Michael McCandless
Looks like this was answered on StackOverflow? Mike McCandless http://blog.mikemccandless.com On Mon, Jan 26, 2015 at 7:54 PM, Steve Pearlman wrote: > For a well formatted example, please see: > http://stackoverflow.com/questions/28161480/fuzzy-not-functioning-as-expected-one-term-search-see-e

Re: Better understanding Lucene/Shard overheads

2015-01-24 Thread Michael McCandless
On Fri, Jan 23, 2015 at 8:42 PM, Drew Kutcharian wrote: > Thanks Mike. I’m still a bit unclear on these comments: > > IndexReader requires some RAM for each segment to hold structures like > live docs, terms index, index data structures for doc values fields, and > holds open a number of file des

Re: Better understanding Lucene/Shard overheads

2015-01-23 Thread Michael McCandless
There is definitely a non-trivial per-index cost. From Lucene's standpoint, ES holds an IndexReader (for searching) and IndexWriter (for indexing) open. IndexReader requires some RAM for each segment to hold structures like live docs, terms index, index data structures for doc values fields, and

Re: Regexp Filter boost

2015-01-19 Thread Michael Irwin
uot;: { "query": "frank", "max_expansions": 5 } } } ] } } } when using the analyzer as explained in http://www.elasticsearch.org/blog/starts-with-phrase-matching/ On Monday, January 19, 2015 at 1

Regexp Filter boost

2015-01-19 Thread Michael Irwin
Hello, I'm trying to figure out if it's possible to boost hits based on a regexp. For example, when searching through records with users' names, I'd like to boost those that start with the query. I've tried a query like the following, but it doesn't work like I'd like: { "query": { "function_s

Re: scrolling and lucene segments

2015-01-16 Thread Michael McCandless
The segments are effectively ref counted, so once the last scroll still using an old (already merged away) segment is deleted, it will be removed. Mike McCandless http://blog.mikemccandless.com On Fri, Jan 16, 2015 at 4:15 AM, Jason Wee wrote: > > http://www.elasticsearch.org/guide/en/elastics
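A toy model of the reference counting described above: the index holds one reference to a segment, each open scroll (point-in-time reader) holds another, and the segment's files only become deletable when the count reaches zero. This is an illustration, not Lucene's actual IndexFileDeleter code; the class and method names are hypothetical.

```python
class Segment:
    def __init__(self, name):
        self.name = name
        self.refs = 1          # the live index holds one reference
        self.deleted = False

    def inc_ref(self):
        self.refs += 1

    def dec_ref(self):
        self.refs -= 1
        if self.refs == 0:
            self.deleted = True  # files may now be removed from disk


seg = Segment("_a")
seg.inc_ref()       # a scroll opens a reader that uses _a
seg.dec_ref()       # a merge replaces _a; the index drops its reference
print(seg.deleted)  # False: still pinned by the scroll
seg.dec_ref()       # the scroll is cleared or times out
print(seg.deleted)  # True
```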

Re: Migrating lucene drill sideways query to elasticsearch

2015-01-16 Thread Michael McCandless
I think you must do separate filters to compute the sideways facet counts. Mike McCandless http://blog.mikemccandless.com On Fri, Jan 16, 2015 at 10:15 AM, Bo Finnerup Madsen wrote: > Hi, > > I am trying to migrate a project from Lucene to elasticsearch, and for the > most part it is a pleasur
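"Separate filters" for sideways counts means: for each drill-down dimension, count its values using every active filter except that dimension's own. The in-memory sketch below shows the idea; in Elasticsearch this would typically become one filter aggregation per dimension (holding the other dimensions' filters) with a terms sub-aggregation, plus a post filter for the hits. The document shape and function name are hypothetical.

```python
from collections import Counter


def sideways_counts(docs, filters):
    """filters: dimension -> set of selected values (hypothetical shape)."""
    counts = {}
    for dim in filters:
        # Apply every filter except this dimension's own.
        others = {d: vals for d, vals in filters.items() if d != dim}
        matching = [doc for doc in docs
                    if all(doc.get(d) in vals for d, vals in others.items())]
        counts[dim] = Counter(doc[dim] for doc in matching if dim in doc)
    return counts


docs = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "M"},
    {"color": "green", "size": "L"},
]
filters = {"color": {"red"}, "size": {"M"}}
print(sideways_counts(docs, filters))
```

Here the color counts ignore the color filter (red: 1, blue: 1 among size M docs), and the size counts ignore the size filter (S: 1, M: 1 among red docs).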

Re: Postings highlighter throws exception on some queries

2015-01-14 Thread Michael McCandless
Super, thanks for bringing closure. Mike McCandless http://blog.mikemccandless.com On Wed, Jan 14, 2015 at 9:59 AM, Peter Bowyer wrote: > Hi Michael, > > Thanks for your response - it turned out to be user error. I'd set up the > mappings correctly, but a few records in my

Re: Postings highlighter throws exception on some queries

2015-01-13 Thread Michael McCandless
This looks like a bug: clearly from your mappings, field "content" was indexed with offsets, yet the error message (incorrectly) claims otherwise. Does the bug still happen on the last 1.4.x release (1.4.2)? If you search only on the content/content.english field does the error still happen? (i.

Re: Bucket query results | top hits performance

2015-01-07 Thread Michael Irani
effect on performance. The payload for each of our documents is about 5k. Michael. On Tuesday, January 6, 2015 11:20:08 PM UTC-8, Martijn v Groningen wrote: > > Hi Michael, > > In general the more buckets being returned by the parent aggregator the > top_hits is nested in, the m

Re: Bucket query results | top hits performance

2015-01-06 Thread Michael Irani
: { "terms": { "field": "fingerprint", "size": 100 } } } } The result's a bit too big to paste here. Anything specific about it you want me to expose? Michael. On Tuesday

Bucket query results | top hits performance

2015-01-06 Thread Michael Irani
s": { "top_tag_hits": { "top_hits": { "size": 1, "_source": { "include": [ "title" ]

RE: Field Mapping

2015-01-06 Thread Michael Giagnocavo
ginal JSON you put in, regardless of mapping. -Michael

Re: mass update of records - dns resolution

2015-01-02 Thread Michael
Oooo - That's a good idea Lance! On Thu, Jan 1, 2015 at 11:22 PM, Lance A. Brown wrote: > Mike Sheinberg wrote on 1/1/2015 11:00 PM: > > For the background, I'm using logstash as a netflow collector --> ES. I > was > > previously using the dns filter of logstash to reverse lookup IP fiel

Re: Guaranteed upper bound for near real time search

2015-01-02 Thread Michael McCandless
The 1s refresh_interval means that ES will open (takes some time) and warm (takes some more time) a new NRT reader, and after that reader is done opening, 1s later it will open again. So it's possible in your case it takes 2s to open + warm a new NRT reader (check the node's logs). But 2s is quit
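The timing described above can be turned into a rough worst-case estimate. This is a simplified model of my own, not an ES guarantee: assume each reopen snapshots the index when it starts, takes `open_warm_s` to open and warm, and the next reopen starts `refresh_interval_s` after the previous one finishes. A doc indexed just after a snapshot misses that reopen, waits out the interval, then waits for the next reopen to finish.

```python
def worst_case_visibility(open_warm_s, refresh_interval_s):
    """Rough upper bound (seconds) until a newly indexed doc is searchable,
    under the simplified reopen model described in the lead-in."""
    # missed reopen finishing + interval + next reopen finishing
    return 2 * open_warm_s + refresh_interval_s


print(worst_case_visibility(2.0, 1.0))  # 5.0
```

So with a 2s open+warm time and the 1s default interval, the bound is closer to 5s than 1s, which is why the open and warm times in the node logs matter.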

Re: Data is not saved equally in each datanode

2014-12-25 Thread Michael deMan (ES)
I don’t know. > On Dec 25, 2014, at 10:19 PM, Xiaoliang Tian wrote: > > Thanks,And M using 0.9.13.can it enable auto-balancing manually? > > 2014-12-26 14:17 GMT+08:00 Michael deMan (ES) <mailto:elasticsea...@deman.com>>: > http://www.elasticsearch.org/guide/en/elas

Re: Data is not saved equally in each datanode

2014-12-25 Thread Michael deMan (ES)
s the API url exactly > > 2014-12-26 12:44 GMT+08:00 Michael deMan (ES) <mailto:elasticsea...@deman.com>>: > Also, higher shards will help with the new indexes but not the existing ones. > You can use the API to force ES to move shards off your ‘full’ disk over to > the new one

Re: Data is not saved equally in each datanode

2014-12-25 Thread Michael deMan (ES)
PM, Michael deMan (ES) > wrote: > > Try increasing the number of shards - maybe to 20 or 40. > >> On Dec 25, 2014, at 8:10 PM, Xiaoliang Tian > <mailto:xiaoliang.t...@gmail.com>> wrote: >> >> index number depend on how many days past,the index name is t

Re: Data is not saved equally in each datanode

2014-12-25 Thread Michael deMan (ES)
Try increasing the number of shards - maybe to 20 or 40. > On Dec 25, 2014, at 8:10 PM, Xiaoliang Tian wrote: > > index number depend on how many days past,the index name is the day epoch. > because we use elasticseach for log storage > shard number is 8 > replica is 1 > > 2014-12-25 15:49 GM

Multilevel aggregation of terms when using nested objects.

2014-12-19 Thread Michael
We have a document format with a lot of nested objects, and we want to look at, e.g., the location/language distribution. When performing an aggregation over 2 levels of nested objects, we get data only from the first level and not from the second level; its count is always 0. If

Elasticsearch Indexing slows down after having indexed 1000 Documents

2014-12-17 Thread Michael Hoppe
Hi all, I am testing indexing with elasticsearch 1.3.2 on my Ubuntu PC with 8GB of RAM and an SSD. I set export ES_HEAP_SIZE=5g, and in elasticsearch.yml I set bootstrap.mlockall: true; all other attributes are commented out. First I am creating an index with curl -XPUT 'http://localhost:9200/mih' th

Re: Write amplification and SSD

2014-12-16 Thread Michael McCandless
It means that ES works well with SSDs since Lucene is write-once under the hood, so it is "easy" on the SSDs, vs other approaches which do random writes to different places causing the higher write amplification. But, this is balanced with the fact that Lucene must also periodically merge the segm

ANN Elastisch 2.1.0 is released

2014-12-07 Thread Michael Klishin
Elastisch [1] is a small, feature-complete Clojure client for ElasticSearch that provides both REST and native clients. Release notes: http://blog.clojurewerkz.org/blog/2014/12/07/elastisch-2-dot-1-0-is-released/ 1. http://clojureelasticsearch.info -- @michaelklishin, github.com/michaelklis

Re: Completion suggester troubles

2014-12-05 Thread Michael Irwin
(i.e. have them always in the format artist - > songname). This is optional. Note: *The result is de-duplicated if > several documents have the same output*, i.e. only one is returned as > part of the suggest result. > > Michele > > On Friday, 5 December 2014 15:42:54 UTC+1, Mi

Completion suggester troubles

2014-12-05 Thread Michael Irwin
I'm using the Completion suggester with v1.4.1, and I'm noticing an issue regarding case sensitivity. For example, I have this mapping: { "mappings": { "artist": { "properties": { "suggest": { "type": "completion", "payloads": true }, "name"

Re: slow performance on phrase queries in should clause

2014-12-05 Thread Michael McCandless
It's likely the should is (stupidly) being fully expanded before being AND'd with the must ... but there are improvements here (XBooleanFilter.java) to this in master, are you able to test and see if it's still slow? Mike McCandless http://blog.mikemccandless.com 2014-12-04 19:21 GMT-05:00 Kiree

Re: Good merge settings for interactively maintained index

2014-12-04 Thread Michael McCandless
5:30 AM, Michael McCandless wrote: > 25-40% is definitely "normal" for an index where many docs are being > replaced; I've seen this go up to ~65% before large merges bring it back > down. > > On 2) there may be some improvements we can make to Lucene default > Tiere

Re: Good merge settings for interactively maintained index

2014-12-04 Thread Michael McCandless
25-40% is definitely "normal" for an index where many docs are being replaced; I've seen this go up to ~65% before large merges bring it back down. On 2) there may be some improvements we can make to Lucene default TieredMergePolicy here, to reclaim deletes for the "too large" segments ... I'll ha

Re: Periodic high load after upgrading from 0.90 to 1.3.4

2014-12-02 Thread Michael McGuinness
er 0.90 Lucene segments with new segments. By > doing that, the segments are upgraded in the background to a new Lucene > codec version. > > Jörg > > On Tue, Dec 2, 2014 at 5:15 PM, Michael McGuinness > wrote: > >> I've done some poking around with hot_threads dur

Re: Periodic high load after upgrading from 0.90 to 1.3.4

2014-12-02 Thread Michael McGuinness
n/elasticsearch/reference/current/cluster-nodes-hot-threads.html#cluster-nodes-hot-threads > > On 2 December 2014 at 08:23, Michael McGuinness > wrote: > >> I'm running a set of very simple elasticsearch instances as part of a >> graylog2 deployment. Each set has a single e

Periodic high load after upgrading from 0.90 to 1.3.4

2014-12-01 Thread Michael McGuinness
I'm running a set of very simple elasticsearch instances as part of a graylog2 deployment. Each set has a single elasticsearch node (its own cluster). I recently upgraded these sets to 1.3.4 as part of an upgrade to the graylog2 server, and I immediately noticed some periodic high loads on the m

Re: Odd behavior of bulk loading speed - good riddle?

2014-11-24 Thread Michael McCandless
Which version of ES? This is probably not related to the slowdown, but when using scripts for updating docs, it's best to keep the script constant, and use params for the changing values (all the $vars in your PHP script). This means ES will compile the script once and reuse that, vs paying compi
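The advice to keep the script constant and pass the changing values as params can be sketched as a _bulk body in the 1.x update format, where every update action carries the same script string and only `params` varies. The field name `counter`, the param name `count`, and the index/type names are hypothetical.

```python
import json


def bulk_update_body(index, doc_type, updates):
    """Build an NDJSON _bulk body; updates is a list of (doc_id, count) pairs."""
    script = "ctx._source.counter += count"  # constant, so ES can compile it once
    lines = []
    for doc_id, count in updates:
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps({"script": script, "params": {"count": count}}))
    return "\n".join(lines) + "\n"  # a _bulk body must end with a newline


print(bulk_update_body("my_index", "doc", [("1", 2), ("2", 5)]))
```

Interpolating the PHP $vars into the script text instead would produce a different script string per document, each needing its own compilation.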

Re: Docker configuration to allow multicast across machines

2014-11-18 Thread Michael Delaney
You may not be able to do this, but using CoreOS & etcd I've set up a cluster using multicast, roughly following this: http://mattupstate.com/coreos/devops/2014/06/26/running-an-elasticsearch-cluster-on-coreos.html On Monday, 17 November 2014 12:16:28 UTC, Robin Clarke wrote: > > We are putting

Re: How can we get all ID's which are generated by Elastic Search for each record while using bulk insert ?

2014-11-17 Thread Michael McCandless
The bulk response tells you the id assigned to each indexed doc. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html Mike McCandless http://blog.mikemccandless.com On Mon, Nov 17, 2014 at 1:56 PM, Subbarao Kondragunta < subbu2perso...@gmail.com> wrote: > Ho
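Reading the generated ids out of a bulk response means walking its `items` array, where each item is keyed by its action name and carries the `_id`. The sample response below is made up for illustration (ids, index and type names are hypothetical), and the function name is my own.

```python
import json


def ids_from_bulk_response(response_text):
    """Return the _id of each item in a _bulk response, in request order."""
    response = json.loads(response_text)
    ids = []
    for item in response["items"]:
        # Each item has a single key naming the action, e.g. "index" or "create".
        (result,) = item.values()
        ids.append(result.get("_id"))
    return ids


sample = """{"took": 3, "errors": false, "items": [
  {"index": {"_index": "test", "_type": "doc", "_id": "AUnfUu0m", "_version": 1, "status": 201}},
  {"index": {"_index": "test", "_type": "doc", "_id": "AUnfUu0n", "_version": 1, "status": 201}}
]}"""
print(ids_from_bulk_response(sample))  # ['AUnfUu0m', 'AUnfUu0n']
```

Because items come back in the same order as the request lines, the n-th id corresponds to the n-th record you sent.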

Re: Delete all of type across all indices

2014-11-13 Thread Michael Irwin
I finally figured it out with the help of someone on SO. The query that worked was DELETE logstash*/_query?q=_type:error On Monday, November 10, 2014 2:31:51 PM UTC-5, Michael Irwin wrote: > > I'm using logstash to store logs. I'd like to delete all logstash entries > of type

Re: hardware recommendation for dedicated client node

2014-11-11 Thread Michael Hart
I have dedicated client nodes for some really intense queries and aggregations. Clients typically have 2GB of heap. Our experience is that 2GB of Heap is sufficient, the client node doesn't do a whole lot. The bulk of the work is done on the data nodes. cheers mike On Monday, November 10, 201

Re: Marvel creating disk usage imbalance

2014-11-11 Thread Michael Hart
I think it's related to this: https://github.com/elasticsearch/elasticsearch/pull/8270 which I believe was released with 1.4. We see the same thing, with hot spots on some nodes. You can poke the cluster to rebalance itself, which that #8270 fixes permanently, using "curl -XPOST localhost:9200

NRT Get API - VS - IDS Filters

2014-11-10 Thread michael
(via some local testing) that the IDs query/filter does not support realtime... Is there a way to make this custom filter work with NRT? If not, what different approach should I be considering? Here's the filter plugin so far: https://gist.github.com/schonfeld/5a487b34e786f0c52244 Th

Delete all of type across all indices

2014-11-10 Thread Michael Irwin
I'm using logstash to store logs. I'd like to delete all logstash entries of type 'error'. I checked out the Delete by Query API, but I can't seem to figure out how to do what I want in this situation. Help!

Re: _all field getting corrupted, no mapping changes possible anymore

2014-11-07 Thread Michael McCandless
On Fri, Nov 7, 2014 at 6:41 AM, wrote: > thank you for your fast reply. > > this actually happened with 1.4.0Beta1 so I am not sure if it's the same > issue. > Sorry, what I mean is that this issue, which adds checking for mapping conflicts in the _all field and was fixed in 1.4.0Beta1, causes t

Re: TokenStream contract violation: close() call missing

2014-11-07 Thread Michael McCandless
On Thu, Nov 6, 2014 at 3:05 PM, Richard Tier wrote: > Thanks for reply. > > The autocomplete analyzer: > > { > 'analysis': { > 'analyzer': { > 'autocomplete': { >'type': 'custom', >'tokenizer': 'standard', >'filter'

Re: _all field getting corrupted, no mapping changes possible anymore

2014-11-07 Thread Michael McCandless
Hmm this is likely due to https://github.com/elasticsearch/elasticsearch/pull/7377 (fixed in 1.4.0Beta1) which was done to prevent conflicting mapping changes to the _all field. What change are you trying to make, that hits this error? Is there a stack trace? Mike McCandless http://blog.mikem

Re: TokenStream contract violation: close() call missing

2014-11-06 Thread Michael McCandless
Hmm, not good. What does your "autocomplete" analyzer look like? Can you post the full stack trace? Mike McCandless http://blog.mikemccandless.com On Wed, Nov 5, 2014 at 7:05 PM, Richard Tier wrote: > An internal error happens when I do a "suggest" query. I get "TokenStream > contract violat

Re: Find the 100 closest neighbors to a point (lng, lat)

2014-11-03 Thread Michael Lumbroso
Does anyone have an idea which tool is best for this task? This would be a major help :-) Thanks for your answers! On Friday, 24 October 2014 11:11:24 UTC+2, Michael Lumbroso wrote: > > Hello, > > sorry if this question has already been asked, but I didn't find

query, but only in records having a specific category

2014-11-01 Thread Michael Hnat
Hi, I'm new to Elasticsearch, but already very impressed. However, I'm stuck on the following problem: This is part of my mapping: "mappings" : { "content" : { "properties" : { "contentid" : { "type" : "string", "index" : "not_analyzed", "store" : "yes" },

Re: Find the 100 closest neighbors to a point (lng, lat)

2014-10-28 Thread Michael Lumbroso
Hi Adrien, thanks for your answer, but actually, I need something really optimized, so I guess ES is not the way to go. Can you think of better ways to actually do that? Thanks 2014-10-27 18:27 GMT+01:00 Adrien Grand : > Hi Michael, > > You can do that using geo-distance sortin
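Adrien's suggestion, geo-distance sorting, could look roughly like the request body below. This is a sketch under assumptions: the coordinates live in a field called `location` (hypothetical) mapped as `geo_point`, and `nearestBody` is an illustrative helper, not part of any client library.

```javascript
// Hypothetical request-body builder for "the k closest documents to a point",
// using geo-distance sorting. The field name "location" is an assumption;
// it must be mapped as geo_point.
function nearestBody(lng, lat, k) {
  return {
    size: k,                      // keep only the k closest hits
    query: { match_all: {} },
    sort: [
      {
        _geo_distance: {
          // Array form is [lon, lat], matching the (lng, lat) order above.
          location: [lng, lat],
          order: "asc",           // nearest first
          unit: "km"              // unit of the distance reported per hit
        }
      }
    ]
  };
}

const body = nearestBody(2.3522, 48.8566, 100);
console.log(JSON.stringify(body));
```

Each returned hit carries its distance in its `sort` values, in the requested unit. Adding a `geo_distance` filter to bound the candidate set is a common way to reduce the cost of sorting every document, though whether it helps enough here would need measuring.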

Sorting weirdness

2014-10-25 Thread Michael Irwin
I have a mapping like this: "venue": { "type": "nested", "include_in_parent": true, "properties": { "name": { "type": "string" } } If I'm sorting by 'venue.name' ascending, why would a name like 'Terminal 5' be
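A likely explanation, offered as an assumption since the rest of the question is cut off: `venue.name` is an analyzed `string`, so each name is indexed as several terms, and an ascending sort on a multi-valued field compares each document's smallest term. The snippet imitates that with a deliberately simplified tokenizer (lowercase, split on non-alphanumerics), not Elasticsearch's actual standard analyzer; the venue names are made up for illustration.

```javascript
// Simplified stand-in for an analyzer: lowercase and split on anything that
// is not a letter or digit. Real analyzers differ; this only shows the effect.
function tokens(name) {
  return name.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Ascending sort on a multi-valued field uses each document's smallest
// term (sort mode "min"), not the original full string.
function sortKeyAsc(name) {
  return tokens(name).sort()[0];
}

function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

const names = ["Bowery Ballroom", "Terminal 5", "Apollo Theater"];
const byAnalyzedTerms = [...names].sort((a, b) => compare(sortKeyAsc(a), sortKeyAsc(b)));
const byWholeString = [...names].sort();

// "5" sorts before any letter, so "Terminal 5" comes first here.
console.log(byAnalyzedTerms);
console.log(byWholeString);
```

If this is the cause, the usual remedy is to sort on a `not_analyzed` copy of the name (for example a multi-field sub-field), so each name is stored as a single term.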

Find the 100 closest neighbors to a point (lng, lat)

2014-10-24 Thread Michael Lumbroso
re an efficient way to do that? (performance is the most important parameter here) Are there plugins/libraries to help me do so? Are there better options than Elasticsearch for this very problem? Thanks for your help, and keep up the good work on this wonderful tool Michael

DELETE broken on _query

2014-10-21 Thread Michael Giagnocavo
eld to _id where the field contains a / literal) then want to go update it, that's a problem. -Michael

Re: Paging With Completion Suggester

2014-10-21 Thread Michael Delaney
So "suggest" does not support "from"? I guess this feature is not meant to be used with any reasonably sized data set? On Tuesday, 21 October 2014 10:28:40 UTC+1, Michael Delaney wrote: > > Hi, > > I want to page my completion suggester results. > >

Re: Paging With Completion Suggester

2014-10-21 Thread Michael Delaney
So "suggest" does not support "from"? I guess this feature is not meant to be used with any reasonably sized data set? On Tuesday, 21 October 2014 13:19:37 UTC+1, Michael Delaney wrote: > > For now i'm just going to go with <<< Previous N

Re: Paging With Completion Suggester

2014-10-21 Thread Michael Delaney
For now I'm just going to go with <<< Previous Next >>> buttons: if options.length >= pagesize, there may be more results; if page > 1, there are earlier results. On Tuesday, 21 October 2014 10:28:40 UTC+1, Michael Delaney wrote: > > Hi, > > I want to page
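The Previous/Next heuristic in this message can be sketched as a small function. `pagerState` is a hypothetical name; `options`, `page`, and the page size follow the post, with `page` assumed to be 1-based.

```javascript
// Sketch of the Previous/Next heuristic: if a page came back full, assume
// there may be more; any page after the first has a previous one.
function pagerState(options, page, pageSize) {
  return {
    // May be a false positive when the final page is exactly full.
    hasNext: options.length >= pageSize,
    hasPrevious: page > 1
  };
}

console.log(pagerState(["a", "b", "c"], 1, 3)); // { hasNext: true, hasPrevious: false }
console.log(pagerState(["d"], 2, 3));           // { hasNext: false, hasPrevious: true }
```

When the last page happens to be exactly full, `hasNext` is true but the next page is empty; requesting one more suggestion than the page size and displaying only `pageSize` of them avoids that.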

Paging With Completion Suggester

2014-10-21 Thread Michael Delaney
Hi, I want to page my completion suggester results. At present I have a request like the following: var skip = (pageNumber - 1) * this.pageSize; this.client.suggest({ index: 'books', type: 'book', body: { search_books : { text : te
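Since the replies in this thread indicate the suggester has no `from`, one workaround (an assumption, not something confirmed in the thread) is to ask the completion suggester for `skip + pageSize` suggestions through its `size` setting and drop the first `skip` on the client. The helpers below, both hypothetical, do only the arithmetic and the slicing, reusing the `skip` formula from the original request.

```javascript
// Hypothetical helpers for paging completion suggestions client-side.
// pageNumber is 1-based, as in the original snippet.
function suggestSizeFor(pageNumber, pageSize) {
  const skip = (pageNumber - 1) * pageSize;
  // Fetch everything up to the end of the requested page.
  return skip + pageSize;
}

// Given the options array from the suggest response, keep one page.
function pageOfOptions(options, pageNumber, pageSize) {
  const skip = (pageNumber - 1) * pageSize;
  return options.slice(skip, skip + pageSize);
}

const opts = ["a", "b", "c", "d", "e", "f", "g"];
console.log(suggestSizeFor(3, 3));      // 9
console.log(pageOfOptions(opts, 3, 3)); // [ 'g' ]
```

The cost grows with the page number, since every earlier page is fetched again, so this only suits shallow paging.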

Re: How does the date_histogram aggregation choose its buckets? Is this tunable?

2014-10-20 Thread Michael Herold
14, is 545 30-day buckets from the UNIX epoch. Huzzah! I think you're right about the pre_offset and post_offset. I should be able to calculate the needed offset(s) to get the effect that I want. Thank you for taking the time to explain this to me. I appreciate it! Best, Michael On Sun, Oc
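The epoch arithmetic in this message can be checked with a short sketch. It assumes a fixed 30-day interval measured in UTC with no `pre_offset`, `post_offset`, or time zone applied, and uses an example date of 2014-10-20 chosen for illustration (the date in the message is cut off).

```javascript
// Sketch of fixed-interval bucketing: with a 30-day interval and no offsets
// or time zone, a timestamp's bucket starts at floor(t / interval) * interval,
// counted from the UNIX epoch.
const DAY_MS = 24 * 60 * 60 * 1000;
const INTERVAL_MS = 30 * DAY_MS;

function bucketOf(date) {
  const index = Math.floor(date.getTime() / INTERVAL_MS);
  return { index, start: new Date(index * INTERVAL_MS) };
}

const b = bucketOf(new Date(Date.UTC(2014, 9, 20))); // 2014-10-20 UTC (month is 0-based)
console.log(b.index);                                 // 545
console.log(b.start.toISOString().slice(0, 10));      // 2014-10-07
```

Shifting where the buckets begin then amounts to adding or subtracting an offset before and after the `floor`, which is the role the thread attributes to `pre_offset` and `post_offset`.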
