Re: term suggester : strange results.

2014-12-04 Thread Nikolas Everett
On Thu, Dec 4, 2014 at 7:27 AM, DH ciddp...@gmail.com wrote: Hi, everyone, I'm trying to figure out some discrepancies (I think) in the results of my suggesters, with ES V0.90.5. My indices are big and can contain a wide array of languages. When I do this (NB: tomate is the French for

Re: Sustainable way to regularly purge deleted docs

2014-12-03 Thread Nikolas Everett
On Wed, Dec 3, 2014 at 8:32 AM, Jonathan Foy the...@gmail.com wrote: Interesting...does the very large max_merged_segment not result in memory issues when the largest segments are merged? When I run the cleanup command (_optimize?only_expunge_deletes) I see a steep spike in memory as each

Re: ES seems to be aliasing the byte type to the short type

2014-12-03 Thread Nikolas Everett
Sounds like a bug. If I had to guess I'd say Elasticsearch is rounding the type up to support unsigned bytes and not doing the range check but I haven't looked. Nik On Wed, Dec 3, 2014 at 9:34 AM, Damien Montigny damien.monti...@gmail.com wrote: Anyone? On Thursday, 20 November 2014 16:27:44

Re: highlight content from crawl data from manifoldcf to ES

2014-12-02 Thread Nikolas Everett
On Mon, Dec 1, 2014 at 10:42 PM, N Bijalwan ahcir...@gmail.com wrote: We are using ManifoldCF to crawl web pages and then index them through Elasticsearch. Is there a way to get only the few lines that contain the searched keyword in the response to an Elasticsearch query instead of the whole content? Like

Re: highlight content from crawl data from manifoldcf to ES

2014-12-02 Thread Nikolas Everett
for performance. naveen On Tuesday, 2 December 2014 19:05:03 UTC+5:30, Nikolas Everett wrote: On Mon, Dec 1, 2014 at 10:42 PM, N Bijalwan ahci...@gmail.com wrote: We are using ManifoldCF to crawl web pages and then index them through Elasticsearch. Is there a way to get only the few lines

Re: Wrapping query text into results into match all

2014-12-02 Thread Nikolas Everett
It's a range query on all terms less than products. You'll want to use match instead of query_string and you won't see weird stuff like that. On Tue, Dec 2, 2014 at 11:02 AM, Anthony Andrushchenko amrma...@gmail.com wrote: Hi everybody, I have encountered very strange behaviour of the search

Re: Decommission of multiple nodes

2014-12-02 Thread Nikolas Everett
If you mean allocation filtering (like cluster.routing.allocation.exclude._ip) then you just need to specify all three ip addresses with commas between them. Nik On Tue, Dec 2, 2014 at 4:00 PM, Mark Walkom markwal...@gmail.com wrote: What do you mean by decommission, there is no API call for
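
A minimal sketch of the comma-separated exclude setting described in this reply, built as a request body for the cluster update settings API (PUT /_cluster/settings). The three IP addresses and the helper name are hypothetical, and using "transient" rather than "persistent" is an assumption.

```python
import json


def exclude_ips_body(ips):
    """Build a cluster-settings body that excludes the given IPs from shard allocation."""
    return {
        "transient": {
            # All addresses go into one setting, separated by commas.
            "cluster.routing.allocation.exclude._ip": ",".join(ips)
        }
    }


# Hypothetical addresses of the three nodes being drained.
body = exclude_ips_body(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(json.dumps(body))  # body to send with PUT /_cluster/settings
```

Once the shards have moved off the excluded nodes, the nodes can be shut down and the setting cleared.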

Re: Sustainable way to regularly purge deleted docs

2014-12-02 Thread Nikolas Everett
I've had some issues with high IO exacerbated by lots of deleted docs as well. I'd get deleted docs in the 30%-40% range on some indexes. We attacked the problem in two ways: 1. Hardware. More RAM and better SSDs really really helped. No consumer grade SSDs for me. 2. Tweak some merge

Re: heap size/filter cache

2014-12-01 Thread Nikolas Everett
Cache keys are pretty much the input filter by default. Specifying your own keys can certainly help. On Dec 1, 2014 7:05 AM, Audrius Bugas bugas.audr...@gmail.com wrote: Thank you for the answer. We had lots of cache evictions so we increased cache size. Should I use custom cache keys and try to

Re: Disk Watermark issues with 1.4.0

2014-12-01 Thread Nikolas Everett
On Mon, Dec 1, 2014 at 11:28 AM, Chris Neal chris.n...@derbysoft.net wrote: Hi all, I'm running 1.4.0. and using the default settings for: cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high I hit an OOME which caused me to need to cycle a

Re: Reliability of ElasticSearch

2014-12-01 Thread Nikolas Everett
99.9% uptime allows almost nine hours a year. You could totally manage that if you have someone on call 24/7. Elasticsearch itself is reasonably stable especially if you don't expose very complex queries to everyone. We do and we only have one person on call (me) and we've probably just made
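
The "almost nine hours" figure can be checked with a little arithmetic. This is only a sketch; it assumes a 365-day year, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent):
    """Hours per year a service may be down while still meeting the uptime target."""
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


print(round(allowed_downtime_hours(99.9), 2))  # 8.76
```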

Re: Java client - setTimeout vs actionGet(timeout)

2014-11-30 Thread Nikolas Everett
Timeouts are server side and best effort. I believe actionGet(timeout) is client side. I use the HTTP client but use both and set the server side timeout to lower than the client side timeout. The server side timeout should return partial results if possible. On Nov 30, 2014 10:41 AM, Ron Sher
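
A rough sketch of the "server side lower than client side" arrangement from this reply. The search body's "timeout" field is the server-side, best-effort timeout; the one-second margin and the helper name are assumptions, and the client-side wait would be passed to whatever HTTP client is in use.

```python
def search_with_timeouts(client_timeout_s, margin_s=1.0):
    """Return a search body whose server-side timeout fires before the client gives up."""
    server_timeout_ms = int((client_timeout_s - margin_s) * 1000)
    if server_timeout_ms <= 0:
        raise ValueError("client timeout must be larger than the margin")
    body = {
        "timeout": "%dms" % server_timeout_ms,  # server side, best effort
        "query": {"match_all": {}},
    }
    return body, client_timeout_s


body, client_wait_s = search_with_timeouts(5.0)
print(body["timeout"], client_wait_s)  # 4000ms 5.0
```

Keeping the server-side value lower gives the server a chance to return partial results before the client stops waiting.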

Re: Java client - setTimeout vs actionGet(timeout)

2014-11-30 Thread Nikolas Everett
Default for server side timeout is none and I don't know client side timeout. I imagine it is a long time. On Nov 30, 2014 1:46 PM, Ron Sher ron.s...@gmail.com wrote: Thanks for the info. Do you know what are the defaults? On Sunday, November 30, 2014 5:53:49 PM UTC+2, Nikolas Everett wrote

Re: is it possible to use transform scripts in mappings to alter document _id?

2014-11-28 Thread Nikolas Everett
On Nov 28, 2014 10:20 PM, Reason rea...@bazaarvoice.com wrote: The Elasticsearch documentation is always frustratingly silent on the things I seem to need to accomplish to make life easier. Sorry you feel that way. If you are willing to fix the documentation those pull requests are typically

Re: elasticsearch deployment advise

2014-11-27 Thread Nikolas Everett
We have 128GB on some nodes and run 30GB heaps. Lucene memory maps files so the extra memory would be put to good use. The 32GB memory limit comes from the JVM compressing pointers. It can't compress after 32GB and so you see everything expand in size. On Nov 27, 2014 4:18 PM, Denis J. Cirulis
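
Where the 32GB figure comes from, as a small worked calculation. It assumes the HotSpot default of 8-byte object alignment, so that a 32-bit compressed reference counts 8-byte units rather than bytes; in practice the exact cut-off sits slightly below this.

```python
REFERENCE_BITS = 32         # size of a compressed object pointer
OBJECT_ALIGNMENT_BYTES = 8  # assumed HotSpot default object alignment

# A 32-bit reference can name 2**32 distinct 8-byte-aligned positions.
max_addressable_bytes = (2 ** REFERENCE_BITS) * OBJECT_ALIGNMENT_BYTES

print(max_addressable_bytes // 2 ** 30)  # 32 (GiB)
```

Past that size references go back to 64 bits, which is why a 30GB heap can hold about as much as a noticeably larger uncompressed one.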

Re: highliting the field

2014-11-26 Thread Nikolas Everett
Highlighting extracts terms from the query and your query contains the words apple app store. You can fix this by providing a highlight_query or by setting another setting whose name I've forgotten. I believe it is require_field_match or something. On Nov 26, 2014 6:27 AM, pavan.530530

Re: Highlighting is not working

2014-11-26 Thread Nikolas Everett
Can you post an example document? Can you post your mapping? Mapping is important because depending on how your mapping is set up you'll get a totally different highlighter implementation. Nik On Wed, Nov 26, 2014 at 11:26 AM, Deepak Mehta hopeligh...@gmail.com wrote: I am using Elastic

Re: Main database

2014-11-26 Thread Nikolas Everett
There isn't such a thing. There are rivers that try to sync other sources with Elasticsearch but I'm not a big fan. I'd let your application keep the index up to date. Nik On Wed, Nov 26, 2014 at 11:26 AM, Lior Goldemberg lio...@gmail.com wrote: hi all, which is the most common main db,

Re: ES vs. Lucene memory

2014-11-26 Thread Nikolas Everett
Lucene runs in the same JVM as Elasticsearch but (by default) it mmaps files and then iterates over their content intelligently. That means most of its actual storage is off heap (it's a Java buzz-phrase). Anyway, Linux will serve reads from mmapped files from its page cache. That is why you want

Re: ES vs. Lucene memory

2014-11-26 Thread Nikolas Everett
, 2014 1:01:02 PM UTC-6, Nikolas Everett wrote: Lucene runs in the same JVM as Elasticsearch but (by default) it mmaps files and then iterates over their content intelligently. That means most of its actual storage is off heap (it's a Java buzz-phrase). Anyway, Linux will serve reads from

Re: Is re-election/assignment of the master node possible?

2014-11-26 Thread Nikolas Everett
On Wed, Nov 26, 2014 at 3:47 PM, Erik theRed j.e.redd...@gmail.com wrote: Is there any notion of triggering a re-election of the master node? I'm currently running 1.2.4, and I have an instance that is scheduled for retirement (my favorite!) and it just so happens that it's my master node. What

Re: what data are cached exactly in filter cache?

2014-11-24 Thread Nikolas Everett
It's the set of documents that match the filter. It's cached per segment, and segments are immutable, so cached entries can be kept until the segment is removed or you run out of space in the cache. I believe the filter cache stores instances of Accountable

Re: cost of automatics refresh

2014-11-22 Thread Nikolas Everett
The cost of automatic refresh if you haven't written anything is pretty close to 0. I believe elasticsearch keeps a list of all indexes that have been written to rather than checking each one. On Nov 19, 2014 3:04 PM, Jinyuan Zhou zhou.jiny...@gmail.com wrote: I am curious about how much cost

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-22 Thread Nikolas Everett
Tiny shards have more overhead and aren't going to score results as accurately. On Nov 22, 2014 2:04 PM, Yves Dorfsman y...@zioup.com wrote: On 2014-11-22 09:35, Otis Gospodnetic wrote: Hi Konstantin, Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/ Good writing!

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-21 Thread Nikolas Everett
everything! On 2014-11-20 22:02, Nikolas Everett wrote: The thing is that this is a disk level operation. It pretty much rsyncs the files from the current master shard to the node when it comes back online. This would be OK if the replica shards matched the master but that is only normally

Re: Issue with higlighting and analyzed tokens

2014-11-20 Thread Nikolas Everett
I remember there was a github issue about path specified analyzers and highlighting but I can't find it. Reading it may be your best bet. On Thu, Nov 20, 2014 at 5:14 AM, fe...@squirro.com wrote: Hi, I am experiencing an unexpected result with highlighting when using an _analyzer path in

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Nikolas Everett
The thing is that this is a disk level operation. It pretty much rsyncs the files from the current master shard to the node when it comes back online. This would be OK if the replica shards matched the master but that is only normally the case if the shard was moved to the node after it was mostly

Re: I can't find anything after hypens or underscores

2014-11-12 Thread Nikolas Everett
On Wed, Nov 12, 2014 at 8:15 AM, Alessandro Bonfanti bnf@gmail.com wrote: Hi, I'm very new to Elasticsearch. I'm trying to index a set of biological data. There are some fields like 'gene_id' or 'gene_shortname' that should be processed as literal strings. When I try to search for

Re: I can't find anything after hypens or underscores

2014-11-12 Thread Nikolas Everett
On Wed, Nov 12, 2014 at 11:13 AM, Alessandro Bonfanti bnf@gmail.com wrote: On 12/11/2014 15:25, Nikolas Everett wrote: On Wed, Nov 12, 2014 at 8:15 AM, Alessandro Bonfanti bnf@gmail.com wrote: Hi, I'm very new to Elasticsearch. I'm trying to index a set of biological

Re: Different hardware capacity

2014-11-11 Thread Nikolas Everett
Elasticsearch doesn't let you weight nodes for balance and the disk space allocation decider really just puts soft limits on the amount of space elasticsearch can take up per machine. There really isn't anything to do it automatically. You could use a combination of allocation awareness,

Re: Filter cache - based on full set or result of previous filters?

2014-11-11 Thread Nikolas Everett
Term filters already use lucene's term dictionary as an index. Almost everything Elasticsearch does uses it. In fact term queries are so fast that Elasticsearch switched them from being cached by default to uncached by default (don't have version number handy). For the most part I wouldn't worry

Re: how to search non indexed field in elasticsearch

2014-11-10 Thread Nikolas Everett
]]; } Please help. Thanks in advance. Regards Ramky On Friday, November 7, 2014 5:49:04 PM UTC+5:30, Nikolas Everett wrote: The first example on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html#query-dsl-script-filter should just work if you replace

[ANN] Experimental Highlighter 0.0.13 released

2014-11-10 Thread Nikolas Everett
I released version 0.0.13 of the experimental highlighter. This version fixes a problem with Lucene flavored Regular Expressions that can cause compiling them to consume amazing amounts of memory. It still targets Elasticsearch 1.3.X. Cheers, Nik -- You received this message because you are

Re: [ANN] Experimental Highlighter 0.0.13 released

2014-11-10 Thread Nikolas Everett
On Mon, Nov 10, 2014 at 10:54 AM, Nikolas Everett nik9...@gmail.com wrote: I released version 0.0.13 of the experimental highlighter. This version fixes a problem with Lucene flavored Regular Expressions that can cause compiling them to consume amazing amounts of memory. It still targets

[ANN] Trigram accelerated regex queries for Elasticsearch version 0.0.2 released

2014-11-10 Thread Nikolas Everett
On Friday I released version 0.0.2 of an Elasticsearch plugin to perform accelerated regular expression search against source documents. This version has stability and speed improvements for complex queries. It: 1. Prevents the compilation step from consuming tons and tons of memory. Now it'll

Re: hardware recommendation for dedicated client node

2014-11-10 Thread Nikolas Everett
I don't use client nodes so I can't speak from experience here. Most of the gathering steps I can think of amount to merging sorted lists which isn't particularly intense. I think aggregations (another thing I don't use) can be more intense at the client node but I'm not sure. My recommendation

Re: Elasticsearch rolling restart problem

2014-11-10 Thread Nikolas Everett
You've followed the right procedure. The problem is that Elasticsearch doesn't always restore the shards back on the node that they came from. If the restarted shard and the current master shard have diverged at all it'll have to sync files _somewhere_ to make sure that the restarted shard gets

Re: Case sensitive/insensitive search combination in phrase/proximity query

2014-11-10 Thread Nikolas Everett
I don't believe there is a way to do that now. On Mon, Nov 10, 2014 at 12:22 PM, Zdeněk zdenek.s...@gmail.com wrote: Hi, is there any way to search part of a phrase as case-sensitive and part as case-insensitive? The only solution I found for case sensitive/insensitive querying is to

Re: Elasticsearch rolling restart problem

2014-11-10 Thread Nikolas Everett
cluster (100 nodes) then the cluster is permanently rebalancing (consuming network and performance) as nodes crash frequently. Is it the same if I put the index in read only mode? On Monday, 10 November 2014 17:58:19 UTC+1, Nikolas Everett wrote: You've followed the right procedure

Re: Infinite scroll best practices with ES

2014-11-09 Thread Nikolas Everett
Scan/scroll queries use too much memory to serve all clients. They also keep files around on disk after they would normally be deleted. On Nov 9, 2014 12:12 PM, pulkitsinghal pulkitsing...@gmail.com wrote: In this discussion, I will rely on this page for reference:

Re: how to search non indexed field in elasticsearch

2014-11-07 Thread Nikolas Everett
The first example on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html#query-dsl-script-filter should just work if you replace with .equals On Fri, Nov 7, 2014 at 2:11 AM, ramky panguluri.ramakris...@gmail.com wrote: Thanks Nikolas Everett for your

Re: I have a few million users, and I want to index for per user, but.....

2014-11-07 Thread Nikolas Everett
Just add the user's id or name to each document and add a term filter to all of their queries. Then use routing http://www.elasticsearch.org/blog/customizing-your-document-routing/. At least that is the canonical way to solve it. On Fri, Nov 7, 2014 at 3:09 AM, David shi fack...@gmail.com
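
A sketch of what that looks like as a search request, assuming the 1.x-era "filtered" query syntax. The field names user_id and body and the helper name are hypothetical; the routing value goes in the request parameters so only the shard holding that user's documents is searched.

```python
def user_search(user_id, text):
    """Build routing parameters and a search body restricted to one user's documents."""
    params = {"routing": user_id}  # sent as ?routing=<user_id>
    body = {
        "query": {
            "filtered": {
                "query": {"match": {"body": text}},           # hypothetical text field
                "filter": {"term": {"user_id": user_id}},     # hypothetical id field
            }
        }
    }
    return params, body


params, body = user_search("user-42", "quarterly report")
print(params, body)
```

The same routing value has to be supplied when indexing each document, otherwise the routed search will look on the wrong shard.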

Re: how to search non indexed field in elasticsearch

2014-11-06 Thread Nikolas Everett
You can totally use a script filter checking the field against _source. It's super duper duper slow but you can do it if you need it rarely. On Thu, Nov 6, 2014 at 11:13 AM, Ivan Brusic i...@brusic.com wrote: You cannot search/filter on a non-indexed field. -- Ivan On Wed, Nov 5, 2014 at
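
Roughly what such a request body might look like, as a sketch only. It assumes the 1.x-era script filter syntax with dynamic scripting enabled; the field name status, the parameter name wanted, and the helper name are hypothetical, and the exact script expression should be checked against the script filter documentation for the version in use.

```python
def source_script_filter(field, value):
    """Build a filtered query that compares a non-indexed field read from _source."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "script": {
                        # Reads the value out of _source for every candidate document,
                        # which is why this is slow.
                        "script": "_source.%s.equals(wanted)" % field,
                        "params": {"wanted": value},
                    }
                }
            }
        }
    }


body = source_script_filter("status", "active")
print(body)
```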

Re: Elasticsearch Development: Subsets of Documents

2014-11-04 Thread Nikolas Everett
If you implement your tweets that mention apple as a filter then it can be cached. Elasticsearch's cache is per segment so it should stay sane as you add more documents. That might be enough to make that fast. The other option is to walk those 1,000,000 million documents with a scan/scroll

Re: Avoid loading plugin in tests

2014-11-03 Thread Nikolas Everett
This has been in flux a bit lately but the last time I checked you had to intentionally load plugins by adding something like this to your test: /** * Enable plugin loading. */ @Override protected Settings nodeSettings(int nodeOrdinal) { return

Re: elasticsearch 1.3+ and secure scripts workflow

2014-10-29 Thread Nikolas Everett
I just use dynamic groovy scripts which are acceptable because they are sandboxed. If the sandbox is too restrictive you can carefully loosen it using configuration. Nik On Wed, Oct 29, 2014 at 10:08 AM, Deryk Wenaus de...@bluemandala.com wrote: I've upgraded to Elasticsearch 1.3.x and by

Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Nikolas Everett
I'll hop on irc and help from there. Depending on the version of cirrus you use it requires groovy or MVEL support. On Oct 28, 2014 4:12 AM, Isabel Drost-Fromm isabel.drostfr...@elasticsearch.com wrote: This looks like a configuration issue. The clue is in the following line:

Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Nikolas Everett
, Oct 28, 2014 at 8:18 AM, Nikolas Everett nik9...@gmail.com wrote: I'll hop on irc and help from there. Depending on the version of cirrus you use it requires groovy or MVEL support. On Oct 28, 2014 4:12 AM, Isabel Drost-Fromm isabel.drostfr...@elasticsearch.com wrote: This looks like

Re: analyzer settings for breaking up words on hyphens

2014-10-27 Thread Nikolas Everett
Or you could cheat and use a character filter to turn the hyphen into spaces. Lots of ways to skin a cat. On Mon, Oct 27, 2014 at 7:07 PM, Mike Topper top...@gmail.com wrote: Thanks! i'll go ahead and try the pattern tokenizer route. On Mon, Oct 27, 2014 at 1:22 PM, Ivan Brusic
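
To illustrate the effect of that trick, here is a plain-Python simulation rather than Elasticsearch configuration: the hyphen is rewritten to a space before tokenizing, so each half becomes its own token. The whitespace tokenizer and lowercasing stand in for a real analyzer and are assumptions.

```python
def hyphens_to_spaces(text):
    """Stand-in for a character filter that maps '-' to a space."""
    return text.replace("-", " ")


def tokenize(text):
    """Apply the character filter, then lowercase and split on whitespace."""
    return hyphens_to_spaces(text).lower().split()


print(tokenize("Wi-Fi enabled e-reader"))  # ['wi', 'fi', 'enabled', 'e', 'reader']
```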

Re: Can i elastic search as my primary store?

2014-10-24 Thread Nikolas Everett
I'd wait for 1.4 before considering it. There are lots of stability improvements there. One thing to consider is that updates are quite costly compared to Mongo/MySQL whatever. Nik On Fri, Oct 24, 2014 at 6:34 PM, Zennet Wheatcroft zwheatcr...@atypon.com wrote: I have heard from the source,

Re: Default index shard allocation

2014-10-23 Thread Nikolas Everett
Primary shards in elasticsearch don't really do much more work than replicas. Elasticsearch doesn't worry too much about them but tries to balance the total number of shards (including replicas) across all nodes. You can fiddle with that using the allocation settings if you need to. On Oct 23,

Re: update mapping

2014-10-19 Thread Nikolas Everett
Scan/scroll is indeed the way to go. The only thing I can add is that you should take the opportunity to run sanity checks on the documents like filtering out fields that were in the old index but shouldn't be copied to the new one. On Oct 19, 2014 5:24 PM, Adrien Grand

Re: Understanding HEAP usage

2014-10-17 Thread Nikolas Everett
Measuring heap usage in Java applications is very different than measuring memory usage for other stuff. 1. Usually Java allocates all the heap it's going to need up front at startup. At least, we do that in server applications. 2. Java's garbage collection is very lazy so heap usage will go up

Re: Scaling strategies without shard splitting

2014-10-13 Thread Nikolas Everett
On Mon, Oct 13, 2014 at 11:12 AM, Ian Rose ianr...@fullstory.com wrote: Hi - My team has used Solr in its single-node configuration (without SolrCloud) for a few years now. In our current product we are now looking at transitioning to SolrCloud, but before we made that leap I wanted to

Re: Update merge settings pre-1.4 without downtime

2014-10-10 Thread Nikolas Everett
You could try switching merge policies and then switching back. I never tried that but I think it might work. Nik On Oct 10, 2014 12:33 AM, Jonathan Foy the...@gmail.com wrote: Hello Is there any way of changing the merge settings of a live index without downtime in ES versions prior to 1.4

Re: Turn on logging in live production

2014-10-10 Thread Nikolas Everett
Not without restarting it. On Oct 10, 2014 7:13 AM, Anantha Govindarajan ananthagovindara...@gmail.com wrote: Hi Nikolas, Is it possible to change the log level for a specific node? This will be useful in case of heavy indexing clusters.

Re: Turn on logging in live production

2014-10-10 Thread Nikolas Everett
I don't believe there are plans for changing logging level on a specific node using the API. I'd file an issue with pretty much what you said. You might be able to limit the verbosity by just setting TRACE or DEBUG on the logger you need. That's not as good as per node, but it's something. On

Re: How many shards is to many shards per server on SSD?

2014-10-10 Thread Nikolas Everett
On Thu, Oct 9, 2014 at 6:34 PM, Kevin Burton burtona...@gmail.com wrote: On Wednesday, October 8, 2014 12:07:30 AM UTC-7, Jörg Prante wrote: With ES, you can go up to the bandwidth limit the OS allows for writing I/O (if you disable throttling etc.) This means, if you write to one shard,

Re: Turn on logging in live production

2014-10-09 Thread Nikolas Everett
You can do so with the logger prefix in the cluster update settings api: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#logger You'll have to know the names of the loggers you want to change though. You can figure that out by looking around the
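
A small sketch of the request body this reply refers to, for PUT /_cluster/settings. The logger name "discovery" is only an example, the helper name is made up, and using "transient" (so the change is lost on a full cluster restart) is an assumption.

```python
import json


def logger_level_body(logger_name, level):
    """Build a cluster-settings body that changes one logger's level at runtime."""
    return {"transient": {"logger.%s" % logger_name: level}}


# Example logger name; substitute the logger you actually want to adjust.
body = logger_level_body("discovery", "DEBUG")
print(json.dumps(body))  # {"transient": {"logger.discovery": "DEBUG"}}
```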

Re: Turn on logging in live production

2014-10-09 Thread Nikolas Everett
AM UTC-4, Nikolas Everett wrote: You can do so with the logger prefix in the cluster update settings api: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#logger You'll have to know the names of the loggers you want to change though. You can

Re: Difficulties in searching in strings where words are separated by dots, underscores and hyphens

2014-10-07 Thread Nikolas Everett
If you still need the standard analyzer's behavior for words but want to force separation on stuff containing dots and underscores you can use the mapping character filter ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html) to convert those

Re: How many shards is to many shards per server on SSD?

2014-10-07 Thread Nikolas Everett
We have hundreds of shards on machines with two SSDs each. Some are large shards (20GB) but most are small (a couple MB). It works fine except for some trouble with how Elasticsearch picks what shards go where (hint: it doesn't take shard size into account beyond not filling up the disks). Nik On

Re: Architecture to prevent slow queries

2014-10-06 Thread Nikolas Everett
You could run less intense queries. Get more RAM. Finally if IO wait is a problem then you could switch to/add more solid state disks. Or you can add more nodes. We've done all of those for our Elasticsearch (no Logstash/Kibana in front though). Nik On Mon, Oct 6, 2014 at 4:43 AM,

[ANN] Released experimental highlighter version 0.0.12

2014-10-06 Thread Nikolas Everett
I just finished releasing the experimental highlighter https://github.com/wikimedia/search-highlighter Elasticsearch plugin version 0.0.12. It fixes one bug: * Regex highlighting fails on strings containing multi-byte characters https://github.com/wikimedia/search-highlighter/issues/6 If you

[ANN] Released trigram accelerated regex queries for Elasticsearch version 0.0.1

2014-10-06 Thread Nikolas Everett
I just finished releasing the wikimedia extra https://github.com/wikimedia/search-extra Elasticsearch plugin which contains support for trigram accelerated regular expressions similar to PostgreSQL's implementation

Re: Trigram-accelerated regex searches

2014-09-29 Thread Nikolas Everett
On Thu, May 22, 2014 at 4:31 PM, Erik Rose grinche...@gmail.com wrote: Alright, try this on for size. :-) Since the built-in regex-ish filters want to be all clever and index-based, why not use the JS script plugin, which is happy to run as a post-processing phase? curl -s -XGET

Re: Which RAID config for ES?

2014-09-25 Thread Nikolas Everett
On Thu, Sep 25, 2014 at 9:58 AM, John Smith java.dev@gmail.com wrote: So given the built in fault tolerance of Elasticsearch across the cluster are people adventurous enough to use RAID0? Absolutely. We only do it with pairs of disks though because RAID0 on any more than two disks just

Re: list append mvel

2014-09-24 Thread Nikolas Everett
Elasticsearch will perform the write even if the document hasn't changed unless you set ctx.op to noop. Beside the point: you should try using groovy scripting! It's a less funky language and it seems to be more stable - like there aren't unexplained scripting errors. On Wed, Sep 24, 2014 at

Re: What does it mean to store a field?

2014-09-22 Thread Nikolas Everett
My understanding is that it is mostly more efficient to not store any fields and just let Elasticsearch load them from the source when needed. Nik

Re: How to do case insensitive search on terms?

2014-09-15 Thread Nikolas Everett
Or if you want case insensitive search use a match query http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html . On Mon, Sep 15, 2014 at 11:47 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: I assume you use the standard analyzer which uses by
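
A minimal sketch of the suggestion. A match query runs the query text through the field's analyzer (which lowercases by default with the standard analyzer), whereas a term query looks up the text exactly as given. The field name title and the helper names are hypothetical.

```python
def match_query(field, text):
    """Analyzed at search time, so 'Quick' matches the indexed token 'quick'."""
    return {"query": {"match": {field: text}}}


def term_query(field, text):
    """Not analyzed: 'Quick' is looked up verbatim and misses a lowercased token."""
    return {"query": {"term": {field: text}}}


print(match_query("title", "Quick Brown"))
```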

Re: How to do case insensitive search on terms?

2014-09-15 Thread Nikolas Everett
or anything like that) does it make a difference in performance? So far it seems like no at least when testing through Sense on the same amount of data. On Monday, 15 September 2014 11:49:33 UTC-4, Nikolas Everett wrote: Or if you want case insensitive search use a match query http

Re: How do I help the users understand some unexpected search hits (Or how can I do highlighting on _all)

2014-09-09 Thread Nikolas Everett
I imagine it's caused by your analysis configuration. Use the analyze API and check what is output for all those terms. On Sep 9, 2014 5:15 AM, mooky nick.minute...@gmail.com wrote: Is (2) expected? Is there a buggette? Anyone familiar with highlighting have any insight? On Monday, 8

Re: ElasticSearch multi-threading and the Java EE specifications

2014-09-08 Thread Nikolas Everett
I've heard of this rule but never seen anyone follow it. They'd mostly make threads/thread pools and just make sure they were shut down when the application was shut down. So my advice is to follow the intent of the law (shutdown your threads) rather than follow the letter of the law. Nik On

Re: Faster sloppy phrase queries

2014-09-08 Thread Nikolas Everett
On Mon, Sep 8, 2014 at 4:42 PM, Robert Muir robert.m...@elasticsearch.com wrote: On Mon, Sep 8, 2014 at 4:24 PM, Nikolas Everett nik9...@gmail.com wrote: One thing on my side is that I don't really _need_ phrase queries. I can play around with the specification a bit so long as I stay

Good merge settings for interactively maintained index

2014-09-08 Thread Nikolas Everett
My indexes change somewhat frequently. If I leave the merge settings as the default I end up with 25%-40% deleted documents (some indexes higher, some lower). I'm looking for some generic advice on: 1. Is that 25%-40% ok? 2. What kind of settings should I set to keep that in an acceptable

Re: Faster sloppy phrase queries

2014-09-08 Thread Nikolas Everett
On Mon, Sep 8, 2014 at 5:08 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Is shingling for proximity boosting on multi term phrases an alternative, like in http://www.romseysoftware.co.uk/2012/09/27/proximity-boosting-in-elasticsearch/ ? I'm not sure if it'll be good enough though

Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Nikolas Everett
Active in this context means currently indexing documents. On Sep 5, 2014 8:17 AM, Thomas thomas.bo...@gmail.com wrote: Hi, I wanted to clarify something from the blog post you mentioned. You specify that based on calculations we should give at most ~512 MB indexing buffer per active

Re: How do I help the users understand some unexpected search hits (Or how can I do highlighting on _all)

2014-09-04 Thread Nikolas Everett
On Thu, Sep 4, 2014 at 1:41 PM, mooky nick.minute...@gmail.com wrote: I am indexing some entities that have up to 140 fields in the resultant document - ie lots. I am providing a simple/powerful google-style search of such entities using the _all field - however, to make the user's life

Re: Looking for Elasticsearch projects

2014-09-02 Thread Nikolas Everett
We could always use help with CirrusSearch. It is the open source project that links MediaWiki to Elasticsearch. We have it installed on all the wikis at the Wikimedia Foundation but it isn't the default search backend on the largest ones yet. Selling points: Huge user community Basic queries

Re: phrase suggester's sort mode

2014-09-01 Thread Nikolas Everett
is quite basic, i think, that I thought maybe i am missing something and there is an easy way to do that. Thanks for the answer. On Sun, Aug 31, 2014 at 2:18 AM, Nikolas Everett nik9...@gmail.com wrote: You'd have to write a plugin or patch to elasticsearch. Plugin would be easier in the short

Re: phrase suggester's sort mode

2014-08-30 Thread Nikolas Everett
You'd have to write a plugin or patch to elasticsearch. Plugin would be easier in the short run but a patch in elasticsearch is more likely to be of higher quality because of the code review process. On Aug 30, 2014 4:13 PM, Heval Azizoğlu azizogluhe...@gmail.com wrote: Hi, Is there any way to

Re: How big can/should you scale Elasticsearch

2014-08-29 Thread Nikolas Everett
On Fri, Aug 29, 2014 at 1:27 PM, Rob Blackin robblac...@gmail.com wrote: We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster. The key is an integer and the item data is fairly small. We're running around 5.5TB right now without a problem. The biggest annoyance is that

Re: Duplicate function MVEL script

2014-08-27 Thread Nikolas Everett
I'd port to groovy and try again. MVEL is on its way out and has some stability issues anyway. Like, I really think it has some problems compiling multiple MVEL scripts concurrently. Nik On Wed, Aug 27, 2014 at 10:50 AM, k...@stylelabs.com wrote: Hello We are executing some concurrent

Re: indices.memory.index_buffer_size

2014-08-26 Thread Nikolas Everett
I just looked at this code! It's a setting that you set globally at the cluster level. It takes effect per node. What that means is that every active shard on the node gets an equal share of that much space. Active means has been written to in the past six minutes or so. When a node
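
The equal-share rule described here, as a tiny worked example. It is a simplification: the numbers are made up, the function name is hypothetical, and any per-shard minimum or maximum the real implementation applies is ignored.

```python
def per_shard_buffer_mb(node_buffer_mb, active_shards):
    """Split a node's indexing buffer evenly across its active shards."""
    if active_shards <= 0:
        raise ValueError("need at least one active shard")
    return node_buffer_mb / active_shards


# Hypothetical node: 512 MB of indexing buffer, 8 shards written to recently.
print(per_shard_buffer_mb(512, 8))  # 64.0
```

Shards that stop receiving writes drop out of the count, so the remaining active shards each get a larger slice.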

Re: DOS attack Elasticsearch with Mappings

2014-08-24 Thread Nikolas Everett
If the cluster is that open to users I don't think it'd be easy to prevent a malicious user from intentionally DOSing it. But in this case I think you could make the default for all fields be non-dynamic. That way users have to intentionally send all mapping updates. It'd prevent this short of

Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
I started a rolling restart yesterday but had to stop because the disks were filling up oddly. It looks like when the node comes up it no longer deletes shards it can't use. Elasticsearch reports that the disk is nearly full but that it isn't using most of the space. When I look myself the

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
, Nikolas Everett wrote: I started a rolling restart yesterday but had to stop because the disks were filling up oddly. It looks like when the node comes up it no longer deletes shards it can't use. Elasticsearch reports that the disk is nearly full but that it isn't using most

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
:/var/lib/elasticsearch/production-search-eqiad/nodes/0/indices$ du -h | tail -n1 447G. It's like when we did the upgrade some files weren't deleted when they were no longer in use. On Thu, Aug 21, 2014 at 10:24 AM, Nikolas Everett nik9...@gmail.com wrote: Hi Lee! Thanks for responding. Ok

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
Whatson shows this very well: https://wikitech.wikimedia.org/wiki/File:Whatson_out_of_disk.png Other points of interest: 1. We're using auto_expand_replicas. 2. The logs are totally clean. Nik On Thu, Aug 21, 2014 at 10:35 AM, Nikolas Everett nik9...@gmail.com wrote: This gist shows

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
Moving this to https://github.com/elasticsearch/elasticsearch/issues/7386 . It's a bug, but I have no idea what caused it. Side note: after digging through the code for two hours I can't find anything that sweeps up files/directories/local shard storage that is unused. I see lots of deletes done

Re: Files not deleted on upgrade

2014-08-21 Thread Nikolas Everett
the more shards you'll be able to delete anyway. https://github.com/s1monw On Thu, Aug 21, 2014 at 2:37 PM, Nikolas Everett nik9...@gmail.com wrote: Moving this to https://github.com/elasticsearch/elasticsearch/issues/7386 . It's a bug, but I have no idea what caused it. Side note: after digging

[ANN] Experimental Highlighter 0.0.11 Released

2014-08-18 Thread Nikolas Everett
I released version 0.0.11 of the Experimental Highlighter https://github.com/wikimedia/search-highlighter we've been using. It's compatible with Elasticsearch 1.3.x and has a few new features: 1. Conditional highlighting - skip highlighting fields you aren't going to use! Save time and IO

Re: Writing an article - topics to cover?

2014-08-14 Thread Nikolas Everett
On Thu, Aug 14, 2014 at 11:40 AM, Christopher Ambler const.dogbe...@gmail.com wrote: I've been tasked to write an article (that will be public-facing) on my experience setting up ElasticSearch as part of the project I'm working on at GoDaddy. I'd like to solicit input on any topics that I

Re: How to Not index a field

2014-08-13 Thread Nikolas Everett
I'm not sure of the right way to do it, but if you set dynamic to false and then just send the field, it'll be stored but not indexed. On Wed, Aug 13, 2014 at 9:35 AM, Sam2014 sabdall...@gmail.com wrote: Is it possible in ElasticSearch? Assume I have a doc { field1:value1, field2:value2 ...}
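A minimal sketch of the "set dynamic to false" suggestion, assuming the Elasticsearch 1.x-era mapping layout with a type name under `mappings`. The helper name, the type name `doc`, and the choice to map only `field1` are hypothetical; the idea is that a field left out of `properties` (here `field2`) would be kept with the document but not made searchable.

```python
import json


def non_dynamic_mapping(type_name, indexed_fields):
    """Build index-creation settings whose mapping ignores unmapped fields."""
    return {
        "mappings": {
            type_name: {
                "dynamic": False,  # do not add new fields to the mapping automatically
                "properties": dict(indexed_fields),
            }
        }
    }


# Only field1 is mapped; field2 from the question is deliberately left out.
mapping = non_dynamic_mapping("doc", {"field1": {"type": "string"}})
print(json.dumps(mapping, indent=2))
```

The resulting JSON would be sent as the body when creating the index; whether this is the best way to skip indexing a field is left open, as in the reply itself.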

Re: github issues search query DSL

2014-08-12 Thread Nikolas Everett
I'd implement the query parser in your application and then build the queries and send them to Elasticsearch. The advantage of that is that you don't have to bounce all the Elasticsearch nodes when you upgrade your query language. It's what we did. Our code isn't elegant or pretty or anything

Re: How to receive part of the text field?

2014-08-12 Thread Nikolas Everett
I imagine a script field can do this. On Aug 12, 2014 6:38 PM, Ivan Brusic i...@brusic.com wrote: If the 200kb number is fixed, then the simplest solution would be to store that content separately in a new field. It does not need to be analyzed, just stored. Perhaps highlighters might work.

Re: Shard rebalancing

2014-08-04 Thread Nikolas Everett
On Sun, Aug 3, 2014 at 7:43 PM, Mark Walkom ma...@campaignmonitor.com wrote: Shard size will depend entirely on how many shards you've set and how big the index is. Allocation of data to shards happens in a round-robin manner, so balancing isn't needed. What do you mean by shards changing

Re: Help needed understanding analyzer behavior

2014-07-30 Thread Nikolas Everett
It's probably easier to use a char filter to remove all non-digits. On the other hand, if you want to normalize numbers that sometimes contain area and country codes to numbers, you'll probably want to do that outside of Elasticsearch or with a plugin. That gets difficult when you need to handle non
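The char filter idea could look roughly like the analysis settings below, using a `pattern_replace` char filter. The filter name `digits_only`, the analyzer name `phone_digits`, and the choice of the `keyword` tokenizer are assumptions for illustration; the small helper only imitates the filter locally with Python's `re` module so the effect can be checked without a cluster.

```python
import re

# Hypothetical index settings: strip every non-digit before tokenizing.
analysis_settings = {
    "analysis": {
        "char_filter": {
            "digits_only": {
                "type": "pattern_replace",
                "pattern": "[^0-9]",
                "replacement": "",
            }
        },
        "analyzer": {
            "phone_digits": {
                "type": "custom",
                "char_filter": ["digits_only"],
                "tokenizer": "keyword",  # keep the cleaned number as one token
            }
        },
    }
}


def strip_non_digits(text):
    """Local stand-in for the char filter: apply the same pattern with re.sub."""
    cf = analysis_settings["analysis"]["char_filter"]["digits_only"]
    return re.sub(cf["pattern"], cf["replacement"], text)


print(strip_non_digits("+1 (555) 010-9999"))
```

This covers only the digit-stripping half of the reply. Normalizing away area and country codes is the part the reply suggests handling outside Elasticsearch or in a plugin.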

Re: Atomically create index with single alias

2014-07-29 Thread Nikolas Everett
I do an atomic alias swap all the time, but I do it after the index is created: curl -XPOST 'http://localhost:9200/_aliases' -d ' { "actions" : [ { "remove" : { "index" : "index1", "alias" : "alias" } }, { "add" : { "index" : "index2", "alias" : "alias" } } ] }' Nik On Tue, Jul 29, 2014
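The same remove-and-add request body can be built programmatically, which helps when the index names change on every reindex. This is a sketch: the function name is hypothetical, and the index and alias names are the placeholders from the message.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the _aliases body that moves an alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = json.dumps(build_alias_swap("alias", "index1", "index2"))
print(body)
```

The printed JSON is what would be POSTed to the `_aliases` endpoint. Both actions travel in one request, which is what makes the swap atomic from a searcher's point of view.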

Re: Even Shard Distribution?

2014-07-24 Thread Nikolas Everett
, Nikolas Everett wrote: On Wed, Jul 23, 2014 at 9:21 AM, mic...@modernmast.com wrote: Thanks for that, Nik. I'm okay with evenly spreading all the indices, rather than just the one I'm having issues with. I'll give your config a try! Def no special configurations on that one. We didn't
