Re: Unable to delete child documents by query that uses has parent (ES 1.4.1)

2015-08-27 Thread Nikolas Everett
You should try posting this on https://discuss.elastic.co/. This email list has been deprecated in favor of using that. There are settings that make it function almost the same way the mailing list functioned. On Thu, Aug 27, 2015 at 3:39 AM, Ron Sher ron.s...@gmail.com wrote: Hi, I'm using

Re: How can we add new nodes in cluster at runtime?

2015-08-24 Thread Nikolas Everett
Try asking this at discuss.elastic.co. On Mon, Aug 24, 2015 at 12:47 AM, shoebalig shoeba...@gmail.com wrote: Hi Members, I have a cluster using N nodes, to scale up my cluster I want to add few more nodes at runtime without any downtime. Is there anyway to add nodes in cluster using Cluster

Re: If I have multiple doctypes for an index, how does it impact searching and indexing performance?

2015-06-01 Thread Nikolas Everett
From a Lucene perspective, doc types are indexed together and filtered, so it is just like having one big index. If two doc types share the same field name, then that field's IDF will be computed across both. You should test it, but it's often OK. On Jun 1, 2015 7:06 AM, Avinash Pandey avinashpandey.i...@gmail.com

Re: Can someone point me to great live websites using ElasticSearch?

2015-05-30 Thread Nikolas Everett
GitHub. Stack Overflow, but their search wasn't that nice the last time I checked. On May 30, 2015 2:53 PM, Flavio dep...@gmail.com wrote: Can someone point me to great live websites using ElasticSearch? Preferably with complex search scenarios using aggregations and many advanced features.

Re: How to migrate index without disturbing client?

2015-05-29 Thread Nikolas Everett
What you just described should work fine. exclude._ip will move the shards off of the nodes you exclude, but queries and updates can proceed while this is happening because the data is still on the old nodes. The updates will make their way to the new copies via a transaction log replay mechanism.
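For reference, the exclusion above is a cluster-level setting whose value is a comma-separated list of IPs. A minimal sketch of the request body, assuming you send it with whatever HTTP client you use as a PUT to `_cluster/settings`; the addresses are placeholders:

```python
import json

def exclude_ips_body(ips):
    """Body for PUT _cluster/settings that drains shards off the listed nodes."""
    # A transient setting is cleared on a full cluster restart; use
    # "persistent" instead if the exclusion should survive one.
    return {"transient": {"cluster.routing.allocation.exclude._ip": ",".join(ips)}}

# Placeholder addresses for the old machines being retired.
body = exclude_ips_body(["10.0.0.1", "10.0.0.2"])
print(json.dumps(body))
```

Once the excluded nodes hold no shards they can be shut down; clearing the setting (an empty string) lets allocation use them again.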

Re: How to migrate index without disturbing client?

2015-05-29 Thread Nikolas Everett
to that old machines and new machines will not know each other (I mean the unicast setting in elasticsearch.yml; the old machines will only know the old ones). Do you think it can cause a problem? Thanks. On Friday, 29 May 2015 14:45:40 UTC+2, Nikolas Everett wrote: What you just described

Re: Questions about dedicated master client node

2015-05-29 Thread Nikolas Everett
Dedicated master nodes are super convenient if you have the IT infrastructure to host them on shared machines, because they are very low load and it's useful to be able to restart the master nodes quickly. We don't have that kind of infrastructure, and our cluster is pretty large and not having it

Re: Are there any comprehensive documents which describe the detailed process of document indexing and searching?

2015-05-25 Thread Nikolas Everett
I get the sense that this is a good start though I haven't watched it myself: https://www.elastic.co/elasticon/2015/sf/elasticsearch-data-journey-life-of-a-document-in-elasticsearch On May 25, 2015 5:45 AM, Jason Wee peich...@gmail.com wrote: If you have that basic knowledge, perhaps the next

Re: When multiple nodes need to communicate to handle a query, what protocol do they use?

2015-05-21 Thread Nikolas Everett
On Thu, May 21, 2015 at 12:49 PM, Swaraj Banerjee swa...@expectlabs.com wrote: When multiple nodes need to communicate to handle a query, what protocol do they use? If I issue a search request to an index that lives on multiple shards (that are on separate nodes), I send the request to ES

Re: Lots of segments per index

2015-05-20 Thread Nikolas Everett
Elasticsearch merges segments in response to indexing and updates, so an index that doesn't change will not have merges. You can manually optimize the index once, when it is mostly done with updates. Once the index is optimized, further calls to optimize with the same parameters are no-ops. You can't really ads
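A small sketch of the one-off optimize described above, assuming the ES 1.x `_optimize` endpoint and its `max_num_segments` parameter; the host and index name are placeholders. It only builds the URL, which you would then POST to:

```python
from urllib.parse import urlencode

def optimize_url(base_url, index, max_num_segments=1):
    """URL for an ES 1.x optimize (force merge) request on one index.

    Send it as a POST once the index has stopped receiving writes.
    """
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{base_url.rstrip('/')}/{index}/_optimize?{query}"

print(optimize_url("http://localhost:9200/", "logs-2015.05.19"))
```

Merging down to one segment is the usual choice for an index that will never be written to again.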

Re: using RAID 0 vs multiple data paths after commit #10461

2015-05-18 Thread Nikolas Everett
I'm RAID 0 all the way. The striping is much more complete than ES's path.data, and operations is more used to the tooling around it. Software RAID in Linux is fine for this. We only do two disks in RAID 0 though, because we don't like the increased failure chance. So 10 in RAID 0 is a bit much. 10

Re: scripting contains integer versus long

2015-05-11 Thread Nikolas Everett
I suspect at that point they'll pop out as Longs. It's just my suspicion; I haven't read that bit of the code. On May 11, 2015 10:08 AM, euneve...@gmail.com wrote: I have a mvel script (groovy looks the same) as follows: if (!ctx._source.list.contains(document)) {ctx._source.list +=

Re: Forums Are Now Live at http://discuss.elastic.co

2015-05-04 Thread Nikolas Everett
On Mon, May 4, 2015 at 12:12 PM, leslie.hawthorn leslie.hawth...@elastic.co wrote: Hello everyone, We took in feedback on moving to a Discourse based forum for about a month, and it sounds like most of the folks who thought it might not be optimal were people who preferred to interact with

Re: Forums Are Now Live at http://discuss.elastic.co

2015-05-04 Thread Nikolas Everett
I suspect it's read-only while they sort out resourcing issues. Cache hit rate is likely quite high while read-only. On May 4, 2015 12:38 PM, Jürgen Wagner (DVT) juergen.wag...@devoteam.com wrote: The site is read-only. No signups possible. Hmm... Good luck! --Jürgen

Re: Index-per-user required for common terms query and cutoff_frequency?

2015-04-29 Thread Nikolas Everett
On Wed, Apr 29, 2015 at 2:53 PM, Loren lo...@siebert.org wrote: The docs http://www.elastic.co/guide/en/elasticsearch/guide/current/common-terms.html mention that One of the benefits of cutoff_frequency is that you get domain-specific stopwords for free. It seems like the index-per-user

Re: Script not executing _update_by_query

2015-04-29 Thread Nikolas Everett
Yup, still looks like a bug to me. I think the right thing to do is file it on GitHub. On Wed, Apr 29, 2015 at 3:20 AM, Zaid Amir redserpe...@gmail.com wrote: Sorry for the delay was a bit occupied making sure everything worked as expected. So here, I created a gist of the issue and hope it

Re: Cluster with Different Node Sizes

2015-04-28 Thread Nikolas Everett
On Tue, Apr 28, 2015 at 12:43 PM, Ji ZHANG zhangj...@gmail.com wrote: Hi, I'm deploying ElasticSearch on a cluster with different node sizes, some have 32GB memory, and some have 16GB. I hope more shards will be allocated on nodes with bigger memory. I googled a bit, there're some settings

Re: inner_hits and highlighting

2015-04-28 Thread Nikolas Everett
If it's not in the issues, it's unlikely that it's planned. If it isn't planned, I think filing an issue is a good thing; just be super clear about what you want to do, with examples in curl/gist form. If it is planned, maybe add your proposed usage to the issue. Nik On Tue, Apr 28, 2015 at 11:26 AM, Ian

Re: Query boost values available in script_score?

2015-04-22 Thread Nikolas Everett
You may want to write your question in json form. Like with a little arrow saying this value is the one I want. On Wed, Apr 22, 2015 at 9:04 AM, Kevin Reilly kmreilly...@gmail.com wrote: Bump. On Monday, April 20, 2015 at 2:48:51 PM UTC-4, Kevin Reilly wrote: Hi. Are query boost values

Re: enabling filter cache

2015-04-22 Thread Nikolas Everett
On Wed, Apr 22, 2015 at 2:41 PM, Ed Kim edki...@gmail.com wrote: Hi, I have a dynamic query built via java api that assembles a filtered query depending on the parameter input. I have about a dozen filters (mostly term filters) that may or may not be used, and had a couple questions: 1. Is

Re: enabling filter cache

2015-04-22 Thread Nikolas Everett
at the individual filter level, as they will be bundled differently depending on the params. Thanks for the clarification! On Wed, Apr 22, 2015 at 11:53 AM, Nikolas Everett nik9...@gmail.com wrote: On Wed, Apr 22, 2015 at 2:41 PM, Ed Kim edki...@gmail.com wrote: Hi, I have a dynamic query

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Nikolas Everett
Have you profiled it and seen that reading the source is actually the slow part? hot_threads can lie here so I'd go with a profiler or just sigquit or something. I've got some reasonably big documents and generally don't see that as a problem even under decent load. I could see an argument for a

Re: How many fields is too many?

2015-04-16 Thread Nikolas Everett
On Thu, Apr 16, 2015 at 10:21 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: The time required for update depends on the peculiarities of the update operations, the massive scripting overhead, the refresh operation, and the segment merge activities that are related. The number of

Re: How many fields is too many?

2015-04-16 Thread Nikolas Everett
On Thu, Apr 16, 2015 at 10:54 AM, Mitch Kuchenberg mi...@getambassador.com wrote: Hey Nik, you'll have to forgive me if any of my answers don't make sense. I've only been familiar with Elasticsearch for about a week. 1. Here's a template for my documents:

Re: How many fields is too many?

2015-04-16 Thread Nikolas Everett
On Thu, Apr 16, 2015 at 9:40 AM, Mitch Kuchenberg mi...@getambassador.com wrote: I'm currently working on implementing ElasticSearch on a Django-based REST API. I hope to be able to search through roughly 5 million documents, but I've struggled to find an answer to a question I've had from

Re: copy_to not working

2015-04-13 Thread Nikolas Everett
Yes, _but_ it's generally better to do those transforms in the source application. The idea is that you'll often want to return multiple things from the source, so loading the whole thing is usually better than loading a bunch of stored fields. If you're looking for the minimal possible amount of

Re: copy_to not working

2015-04-13 Thread Nikolas Everett
I want to expand on this a bit - both copy_to and transform only modify the _indexed_ document, not the source document. The thinking is that you can modify the source document yourself in the source application but the source application _can't_ modify the indexed document without modifying the

Re: Does english analyzer prevent fields from highlighting?

2015-03-31 Thread Nikolas Everett
Using inline highlighters doesn't help highlighting. No. For the most part you should stay away from inline analyzers and use a mapping instead. On Tue, Mar 31, 2015 at 12:02 PM, Viacheslav Shalamov sslavian...@gmail.com wrote: Hi all, could you help me with little problem regarding

Re: [Java] Stream large file while indexing

2015-03-27 Thread Nikolas Everett
I believe Elasticsearch loads the whole indexed document into RAM before indexing. It certainly loads the whole document into RAM for things like source filtering. Lucene doesn't require this, but Elasticsearch does it because for the typical use case it's fine. On Mar 27, 2015 2:59 PM, Hao

Re: How much Big json elasticsearch can store?

2015-03-25 Thread Nikolas Everett
My documents range from a couple of kilobytes to tens of megabytes and most things work fine. Beware the plain highlighter on long string fields, but otherwise you are probably OK. It's certainly less efficient to store huge documents because when you want to return portions of them (other than

Re: Which kind of query style is recommended, JSON style or query_string style? Does performance differ?

2015-03-25 Thread Nikolas Everett
query_string is a bit of a trap: if you write an invalid query it just crashes, so you find yourself working around it with tons of escaping. It's also really, really powerful and shouldn't be exposed directly to end users unless you want them to be sneaky. For the most part I'd suggest using the

Re: RegEx Filter Not Matching on Hash tag (#)

2015-03-19 Thread Nikolas Everett
Try escaping the hash tag. It has a special meaning in Lucene's dialect of regular expressions (https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/util/automaton/RegExp.html?is-external=true). On Thu, Mar 19, 2015 at 11:44 AM, Mahesh Kommareddi mahesh.kommare...@gmail.com wrote: Hi,
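A sketch of escaping literal text before embedding it in a Lucene regexp, assuming the reserved-character list from the Lucene RegExp syntax (the always-reserved set plus the characters that become operators when optional features are on, with `#` meaning the empty language). It is conservative by design; double-check the list against the RegExp javadoc linked above:

```python
# Characters Lucene's RegExp syntax reserves by default, plus the ones that
# become special when optional operators (such as # for the empty language)
# are enabled. Escaping all of them is the conservative choice.
LUCENE_REGEXP_RESERVED = set('.?+*|{}[]()"\\#@&<>~')

def escape_lucene_regexp(literal):
    """Backslash-escape reserved characters so `literal` matches itself."""
    return "".join("\\" + ch if ch in LUCENE_REGEXP_RESERVED else ch for ch in literal)

# Escape only the literal part, then append the regex operators you want.
pattern = escape_lucene_regexp("#elastic") + ".*"
print(pattern)
```

When the pattern goes into a JSON request body the backslash itself must be escaped again; serializing with `json.dumps` takes care of that.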

Re: Operator and in highlighting

2015-03-17 Thread Nikolas Everett
On Tue, Mar 17, 2015 at 8:56 AM, Vlad Zaitsev vest...@gmail.com wrote: But it seems that highlighter ignore operator: “and” and highlight any term from queries. It's much more than that. For the most part highlighters reduce the query to a list of terms blindly. Some do phrases. They don't

Re: PayloadTermQuery in ElasticSearch

2015-03-17 Thread Nikolas Everett
I imagine the right way to do this is with a plugin but I'm not 100% sure. On Tue, Mar 17, 2015 at 11:47 AM, Devaraja Swami devarajasw...@gmail.com wrote: I plan to store floats in the payload and boost the score (multiplicatively) based on the average value of the payloads over the

Re: Optimizing Readonly Indices

2015-03-07 Thread Nikolas Everett
Have a look at what curator does. I believe it optimizes but I'm not sure how. On Mar 6, 2015 10:22 PM, Kadaan jbaran...@gmail.com wrote: Is there a recommended process for optimizing indices which have transitioned to a readonly state? For instance should we optimize indices to a single

[ANN] Released Experimental Highlighter v 1.4.1

2015-03-03 Thread Nikolas Everett
I just released version 1.4.1 of the experimental highlighter. It fixes a single issue that made the highlighter not work when highlighting *:* (https://github.com/wikimedia/search-highlighter/issues/9). It might take Sonatype an hour or so to sync it to Central. Nik

Re: Custom analyzer without a tokenizer

2015-03-03 Thread Nikolas Everett
On Tue, Mar 3, 2015 at 1:02 PM, Sagar Shah sagarshah1...@gmail.com wrote: Hello everyone, I am working on a defining a mapping in elastic search, which can have few fields on the fly. I can define the types index using dynamic templates, but I would like to know the difference between

Re: Disk awareness on indexing.

2015-02-20 Thread Nikolas Everett
I have 30GB shards and the biggest problem I have is that they take a long time to replicate to other machines. I believe there are memory issues for very large shards as well but I don't know them that well. Nik On Feb 20, 2015 7:31 PM, Prasanth R prasanth.sunr...@gmail.com wrote: Could you

Re: ElasticSearch search performance question

2015-02-12 Thread Nikolas Everett
You might want to try hitting hot threads while putting your load on it and seeing what you see. Or posting it. Nik On Thu, Feb 12, 2015 at 4:44 PM, Jay Danielian jay.daniel...@circleback.com wrote: Mark, Thanks for the initial reply. Yes, your assumption about these things being very

Re: Can ElasticSearch support IBM JVM?

2015-01-28 Thread Nikolas Everett
There are known big installations using OpenJDK and Oracle JDK. I don't know of any using IBM. I imagine you're more likely to find something on that JDK than on the others, but you'll probably do OK. Certainly be sure to add the config parameter mentioned on that page and expect to have to fiddle with the

Re: Beginning Question: Memory consumption while idle

2015-01-24 Thread Nikolas Everett
You are likely observing how the Java heap works. Use a tool like jstat to check how much of the heap is in use to see real usage. In a nutshell: Java never returns memory to the OS. You tell it a min it can use and it allocates that on startup. You tell it a max and it won't allocate more. Memory mapping

Re: Update an Elasticsearch document's array field without using scripting

2015-01-21 Thread Nikolas Everett
The current default scripting language, Groovy, is sandboxed. If you still don't want to use it, your only option is the get-update-put sequence. On Jan 19, 2015 1:29 PM, Jason Lee pump.min...@gmail.com wrote: I'm trying to add new values to an existing array field in a document. I've noticed
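The client-side middle step of that get-update-put sequence can be sketched as a pure function: take the `_source` from the get, append the value if it is missing, and hand back the full document to re-index. The document and field names below are hypothetical, and the HTTP calls themselves are left out:

```python
import copy

def add_to_array(source, field, value):
    """Client-side step of a get-update-put cycle for an array field.

    `source` is the _source returned by the get; the result is the full
    document to re-index, or None when `value` is already present (so the
    put can be skipped entirely).
    """
    values = source.get(field, [])
    if not isinstance(values, list):
        values = [values]  # a field may hold a single value instead of an array
    if value in values:
        return None
    updated = copy.deepcopy(source)  # don't mutate the fetched document
    updated[field] = values + [value]
    return updated

doc = {"title": "example", "tags": ["search"]}  # hypothetical document
print(add_to_array(doc, "tags", "lucene"))
```

Because another writer can change the document between the get and the put, passing the version you read back on the put (the index API's `version` parameter) turns a lost update into a conflict you can retry.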

Re: How highlighting actually works?

2015-01-18 Thread Nikolas Everett
Highlighting is complex and more hacky than you'd imagine at first glance. Each highlighter is different and we can't tell which one you are using without seeing your mapping. For the plain highlighter the cost is roughly proportional to the length of the highlighted field. So in your case it's the

Re: Is it possible to install plugin into a directory other than ${ES_HOME}/plugins?

2015-01-17 Thread Nikolas Everett
Yes. You can change the directory scanned for plugins; look at the init script for the name of the parameter. Or symlinks. Always your friend. On Jan 16, 2015 7:11 PM, Jinyuan Zhou zhou.jiny...@gmail.com wrote: Thanks,

Re: real time match analysis

2015-01-14 Thread Nikolas Everett
What about explain? On Wed, Jan 14, 2015 at 3:24 PM, Ed Kim edki...@gmail.com wrote: Just a friendly bump to see if anyone has any feedback. :) On Saturday, January 10, 2015 at 10:38:34 PM UTC-8, Ed Kim wrote: Hello all, I was wondering if anyone could offer some feedback on whether there

Re: es rolling upgrade 1.3.2-1.4.2

2015-01-13 Thread Nikolas Everett
On Tue, Jan 13, 2015 at 7:32 AM, Daniel Jansson daniel.jans...@dn.se wrote: Hi We are performing a rolling upgrade from 1.3.2 to 1.4.2. We have turned off reallocation. After upgrading 2 of 3 nodes we are receiving lots of warnings/errors in the log file: in node running 1.4.2:

Re: Elasticsearch cluster ip address

2015-01-13 Thread Nikolas Everett
Most clients will take a list and retry on connection failure. That is what you want. Nik On Tue, Jan 13, 2015 at 9:52 AM, Vasu Thota vasu@gmail.com wrote: Thanks David. Now, which HTTP URL of elastic-search i need to configure from my client application which is communicating with ES
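A minimal sketch of that retry behaviour, assuming a hypothetical `send(host)` callable that performs the actual HTTP request and raises `ConnectionError` when a node is unreachable. Real clients add niceties like dead-node tracking on top of this:

```python
def send_with_failover(hosts, send):
    """Call `send(host)` on each host in order until one succeeds.

    `send` is a hypothetical stand-in for your HTTP call; it is expected to
    raise ConnectionError when a node is unreachable.
    """
    if not hosts:
        raise ValueError("need at least one host")
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except ConnectionError as err:
            last_error = err  # node down: fall through to the next host
    raise last_error  # every host failed; surface the last failure
```

Rotating the starting position of `hosts` between requests spreads load across the nodes instead of always hitting the first one.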

Re: What charts library does Kibana use?

2015-01-12 Thread Nikolas Everett
Here are the JavaScript dependencies: https://github.com/elasticsearch/kibana/blob/master/bower.json I assume it's one of those. On Mon, Jan 12, 2015 at 11:20 AM, Mauro Julián Fernández mauroj.fernan...@gmail.com wrote: I used Kibana for a couple of tasks in works and I like the charts it

Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Nikolas Everett
On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz jeffrey.steinm...@gmail.com wrote: Is there a better way to do this? Please see this gist (or even better yet, run the script locally see the issue). https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae You must have scripting enabled in

Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Nikolas Everett
, Nikolas Everett wrote: On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz jeffrey@gmail.com wrote: Is there a better way to do this? Please see this gist (or even better yet, run the script locally see the issue). https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae You must have

Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

2015-01-08 Thread Nikolas Everett
} } }' On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote: Source is going to be pretty slow, yeah. If it's a one-off then it's probably fine, but if you do it a lot it's probably best to index the count. On Jan 9, 2015 12:04 AM, Jeff Steinmetz jeffrey@gmail.com wrote: Thank you
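Indexing the count means computing it once in the application before the document is sent, so queries, sorts, and aggregations read a plain numeric field instead of running a script over `_source`. A sketch with hypothetical field names:

```python
def with_item_count(doc, list_field, count_field):
    """Return a copy of `doc` with the length of `list_field` stored alongside it."""
    enriched = dict(doc)
    enriched[count_field] = len(doc.get(list_field, []))
    return enriched

# Field names are hypothetical; index the enriched document instead of the original.
print(with_item_count({"items": ["a", "b", "b"]}, "items", "items_count"))
```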

Re: regex + fixed string match needed

2015-01-06 Thread Nikolas Everett
There are two ways to perform regex matching with Elasticsearch and both require multi-fields http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html . The first way is to create a not_analyzed subfield like on the link above and query it like

Re: Balance Between Heavy Indexing and Searching

2015-01-06 Thread Nikolas Everett
That is a ton of data to keep open. Can you squish it somehow? On Tue, Jan 6, 2015 at 3:24 PM, Mark Walkom markwal...@gmail.com wrote: The best way is to add more nodes. There isn't much you can do with that amount of data! On 7 January 2015 at 06:09, David Mavashev crypti...@gmail.com

Re: encoding is longer than the max length 32766

2015-01-02 Thread Nikolas Everett
The max length restriction is per token, so it's unlikely you'll see it unless you use not_analyzed fields. You can work around it by setting the ignore_above option on the string type. That'll just throw away the token. Nik How does this MAX_LENGTH restriction impact on a custom_all field where we may
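One subtlety with the ignore_above workaround: Lucene's 32766 limit is measured in UTF-8 bytes, while ignore_above counts characters. A sketch of both checks, assuming a hypothetical not_analyzed field `raw_text` in ES 1.x `string` mapping syntax; dividing by 4 (the maximum UTF-8 bytes per character) gives a value that can never overflow the byte limit:

```python
MAX_TERM_BYTES = 32766  # Lucene's per-term limit, measured in UTF-8 bytes

def term_too_long(term):
    """True if Lucene would reject `term` as a single indexed token."""
    return len(term.encode("utf-8")) > MAX_TERM_BYTES

# ignore_above counts characters, and a UTF-8 character can take up to 4 bytes,
# so this is a conservative ignore_above that can never exceed the byte limit.
SAFE_IGNORE_ABOVE = MAX_TERM_BYTES // 4

mapping = {
    "properties": {
        "raw_text": {  # hypothetical field name
            "type": "string",
            "index": "not_analyzed",
            "ignore_above": SAFE_IGNORE_ABOVE,
        }
    }
}
```

If your values are mostly ASCII, a larger ignore_above is fine; the conservative value only matters for text with many multi-byte characters.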

Re: Elasticsearch logging

2015-01-02 Thread Nikolas Everett
logging.yml is a funky wrapper around log4j.properties-style log4j configuration, so that is why you don't see as much documentation on it. Do you see log lines smashed together and cut apart randomly? That'd be a bug. It's customary for logs to be single lines except for stack traces, which

Re: Question about highlight query.

2015-01-01 Thread Nikolas Everett
://manning.com/synhershko/ On Wed, Dec 31, 2014 at 5:38 PM, Nikolas Everett nik9...@gmail.com wrote: Highlighting isn't a nice pretty thing; it's kind of hacky. There are three highlighters built in http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request

Re: Deduplication filter?

2015-01-01 Thread Nikolas Everett
Simplest way might be to push an update to the old versions of the documents to mark them as old and do aggregations filtering those out. There isn't a great way to deduplicate, really. On Thu, Jan 1, 2015 at 11:50 PM, Kshitij Gupta kshi...@vnera.com wrote: Hi, I am working on a system where

Re: incremental update document with id containing special characters using api to capture page hit count.

2014-12-31 Thread Nikolas Everett
On Wed, Dec 31, 2014 at 8:37 AM, N Bijalwan ahcir...@gmail.com wrote: I am trying to update a document to capture page visit or hitcount which has id containing http:// say http://shashankp254.wordpress.com/about/feed/ That is probably a bad idea. Partial updates don't exist at the level of

Re: Question about highlight query.

2014-12-31 Thread Nikolas Everett
Highlighting isn't a nice pretty thing; it's kind of hacky. There are three highlighters built into Elasticsearch (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html), and they all work differently. You should try all of them and see if they do

Re: incremental update document with id containing special characters using api to capture page hit count.

2014-12-31 Thread Nikolas Everett
, Nikolas Everett wrote: On Wed, Dec 31, 2014 at 8:37 AM, N Bijalwan ahci...@gmail.com wrote: I am trying to update a document to capture page visit or hitcount which has id containing http:// say http://shashankp254. wordpress.com/about/feed/ That is probably a bad idea. Partial updates don't

Re: How is data stored

2014-12-31 Thread Nikolas Everett
Use the analyze API to get a view into how your analysis chain (tokenizer and filters) affects text. The index itself is all jumbled together with all the documents, and there isn't a good way to dig the data for a single document out of it. On Dec 31, 2014 10:36 PM, Bruno Kamiche

Re: ElasticSearch roadmap?

2014-12-29 Thread Nikolas Everett
Your best bet is to look at GitHub issues and pull requests tagged for the next release. Elasticsearch the company has a roadmap for Elasticsearch the open source project, but it isn't public. Nik On Dec 29, 2014 6:57 AM, PrasathRajan prasanth.sunr...@gmail.com wrote: Hi All, Does

Re: Does elasticsearch support minimum score in highlight query?

2014-12-29 Thread Nikolas Everett
No, it doesn't. Highlighting is way weirder to implement than it probably should be, so concepts like score don't carry over too well. They do weigh segments, but that weight isn't the same beast as a document score; it's much more heuristic. None of them support a minimum weight cutoff. You could

Re: Update to ES 1.4.2 gone terribly wrong - nodes won't start

2014-12-27 Thread Nikolas Everett
IcedTea isn't a JVM version. Give us `java -version`. It looks like that version of IcedTea could be OpenJDK 7u71, which is generally fine (we use it under plenty of load). It could also be JamVM, Cacao, or Zero/Shark. Those probably won't work. Lots of folks suggest Oracle JDK, so you may as well

Re: Update to ES 1.4.2 gone terribly wrong - nodes won't start

2014-12-27 Thread Nikolas Everett
Setting index.load_fixed_bitset_filters_eagerly to false fixed everything for now. I could argue that not running Gentoo in production is crazy, but it really depends on your personal preferences :) On Saturday, December 27, 2014 4:34:02 PM UTC+3, Nikolas Everett wrote: IcedTea isn't a JVM

Re: Convert unix timestamp (seconds) to java (milliseconds)

2014-12-27 Thread Nikolas Everett
Transform doesn't change the source, just how it is indexed. I made it that way because I figured if you want to change the source you can do it in the application feeding Elasticsearch. Transform is a way to index stuff but leave it out of the source. It's copy_to on steroids. Another reason
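Doing the conversion in the application feeding Elasticsearch, as suggested above, is a one-liner per document. A sketch assuming the epoch-seconds value lives in a hypothetical `timestamp` field:

```python
def with_millis_timestamp(doc, field="timestamp"):
    """Return a copy of `doc` with `field` converted from epoch seconds to millis."""
    converted = dict(doc)
    # round() keeps fractional seconds (e.g. 1.5 -> 1500) without float noise.
    converted[field] = int(round(doc[field] * 1000))
    return converted

print(with_millis_timestamp({"timestamp": 1419638400}))
```

Unlike a transform, this changes `_source` too, so documents fetched back from Elasticsearch carry the same millisecond value that was indexed.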

Re: Add elastic search in my php application

2014-12-25 Thread Nikolas Everett
If you need an example CirrusSearch is the name of the plugin that uses elasticsearch for MediaWiki. I can't attest to the code quality but it certainly gets the job done. Nik On Dec 25, 2014 2:54 AM, Jason Zhang moc...@gmail.com wrote: Here's the official Elasticsearch PHP client

Re: Accuracy of Elastic Search Aggregates and Filters

2014-12-24 Thread Nikolas Everett
I think the key part of the question here is about filters? Filters are always up to date modulo the refresh interval. It's pretty efficient because Lucene's segments are immutable, so once a filter has been applied to a segment you can cache its results and merge it with the deletes list to have for

Re: Cascading cluster failure

2014-12-24 Thread Nikolas Everett
On Wed, Dec 24, 2014 at 2:03 PM, Mark Walkom markwal...@gmail.com wrote: You should drop your heap to 31GB, over that and you lose some performance and actual heap stack due to uncompressed pointers. I believe the magic number is 32GB:

Re: When to use fields and when to use source filtering

2014-12-22 Thread Nikolas Everett
General rule: - Use source filtering unless you can't. Source filtering works if the field is in the document you indexed. Fields is required if you want to load a stored field. You only _need_ to store fields if they are synthetic like from word count or from transform. Advanced thing I've never

Re: When to use fields and when to use source filtering

2014-12-22 Thread Nikolas Everett
Does source fall back? I remember trying and getting nothing. On Dec 22, 2014 7:33 AM, Itamar Syn-Hershko ita...@code972.com wrote: Fields are used to pull data from stored fields whereas source filtering is targeting _source. At the moment both fallback on each other, so the differences is in

Re: transform script when indexing data

2014-12-22 Thread Nikolas Everett
I'd add a new field and check for it. Or do a search that won't find anything unless it took effect. The document is stored untransformed so just fetching the document won't show you anything. On Dec 22, 2014 12:35 PM, Nick Wood nwood...@gmail.com wrote: Hello, I'm trying to implement a

Re: Russian search does not work for me

2014-12-21 Thread Nikolas Everett
I think you need type: custom inside analyzer: {default:{}}. On Dec 21, 2014 5:08 PM, Ilya Kantor ilia...@gmail.com wrote: Please let me know what I'm doing wrong or where to look/debug. 1. I git cloned https://github.com/asyncee/elasticsearch-russian-config/ 2. Downloaded elasticsearch-1.4.2

Re: Default shard allocation (where new shards are created)

2014-12-19 Thread Nikolas Everett
Check what curator is doing with your index. It's probably fiddling with index.routing.allocation.include and index.routing.allocation.exclude. When you create the new index, just set it to pick up the ssd tag. You'll have to make sure that curator knows how to strip that tag when the time comes to

Re: ElasticSearch as a Search Engine for our Intranet Site

2014-12-19 Thread Nikolas Everett
On Fri, Dec 19, 2014 at 12:51 PM, Gill Singh parmvirgil...@gmail.com wrote: Hi, I am new here, just joined this group! We are looking for a new Search Engine for our Intranet site. Can ElasticSearch be used for Crawling, Indexing and Searching Intranet type sites? We will need to crawl/index

Re: Rolling restart

2014-12-19 Thread Nikolas Everett
You have to re-enable allocation after the node comes back and wait for the shards to initialize there. On Fri, Dec 19, 2014 at 3:23 PM, iskren.cher...@gmail.com wrote: I'm maintaining a small cluster of 9 nodes, and was trying to perform rolling restart as outlined here:
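The per-node cycle described above toggles the ES 1.x `cluster.routing.allocation.enable` setting. A sketch of the two request bodies, assuming they are sent as PUTs to `_cluster/settings` by your own HTTP client:

```python
def allocation_body(mode):
    """Body for PUT _cluster/settings that toggles shard allocation (ES 1.x)."""
    if mode not in ("all", "primaries", "new_primaries", "none"):
        raise ValueError(f"unknown allocation mode: {mode}")
    return {"transient": {"cluster.routing.allocation.enable": mode}}

# Per node: disable, restart the node, wait for it to rejoin,
# re-enable, then wait for the cluster to go green before the next node.
before_restart = allocation_body("none")
after_rejoin = allocation_body("all")
```

Waiting for green between nodes (for example with `GET _cluster/health?wait_for_status=green`) is the "wait for the shards to initialize" step; skipping it is what makes the next restart risky.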

Re: Rolling restart

2014-12-19 Thread Nikolas Everett
I believe so. On Fri, Dec 19, 2014 at 3:39 PM, iskren.cher...@gmail.com wrote: On Friday, December 19, 2014 12:31:33 PM UTC-8, Nikolas Everett wrote: You have to reenable allocation after the node comes back and wait for the shards to initialize there. So this means the tutorial

Re: Is ElasticSearch truly scalable for analytics?

2014-12-18 Thread Nikolas Everett
I think aggregating 32 shards on one node is a bit degenerate. I imagine it's more typical to aggregate across one or two shards per node. Don't get me wrong, you can totally have nodes store and query ~100 shards each without much trouble. If aggregating across a bunch of shards per node were a

Re: Decommission of multiple nodes

2014-12-17 Thread Nikolas Everett
On Wed, Dec 17, 2014 at 6:03 PM, Ye D y...@volarvideo.com wrote: cluster.routing.allocation.exclude._ip: ip1, ip2 I use this one and I'm pretty sure it's worked for me in the past. Nik

Re: Elasticsearch index creation / deletion incredibly slow

2014-12-17 Thread Nikolas Everett
On Dec 17, 2014 11:20 PM, Swaraj Banerjee swaraj...@gmail.com wrote: Hi all, I have a an ES cluster hosted on amazon with ~ 7000 indexes (most of which are sparsely populated 100 docs). Up till today, creating or deleting an index in the cluster took ~3 seconds. All of a sudden, creating or

Re: Looking for a best practice to get all data according to some filters

2014-12-14 Thread Nikolas Everett
Search consumes O(offset + size) memory and O((offset + size) * ln(offset + size)) CPU. Scan/scroll has higher overhead but is O(size) the whole time. I don't know the break-even point. The other thing is that scroll provides a consistent snapshot. That means it consumes resources you shouldn't let
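The O(offset + size) term comes from every shard returning its own top (offset + size) hits for the coordinating node to merge. A back-of-the-envelope sketch, assuming ES 1.x semantics where `size` in scan mode applies per shard; the shard and page numbers are arbitrary examples:

```python
def hits_per_page(num_shards, offset, size):
    """Hits the coordinating node collects and sorts for one from/size page.

    Every shard returns its own top (offset + size) hits, so the work grows
    with how deep you page, not just with the page size.
    """
    return num_shards * (offset + size)

def hits_per_scan_round(num_shards, size):
    """In ES 1.x scan mode, `size` applies per shard on each scroll round trip."""
    return num_shards * size

print(hits_per_page(5, 10000, 10))   # deep page: 50050 hits to merge
print(hits_per_scan_round(5, 10))    # any scroll round trip: 50 hits
```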

Re: Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread Nikolas Everett
Look at multi-fields. They let you send the field once and analyze it multiple times. You also might want to use a keyword tokenizer with a lowercase filter rather than not_analyzed; folks are used to case insensitivity. Nik Is there a way to do exact and full text searches without having to create
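A sketch of that setup in ES 1.x syntax, assuming a hypothetical `title` field and type name `doc`: the main field is analyzed for full-text search, and an `exact` sub-field runs the whole value through a custom analyzer (keyword tokenizer plus lowercase filter) for case-insensitive exact matching:

```python
import json

settings = {
    "analysis": {
        "analyzer": {
            "lowercase_keyword": {  # hypothetical analyzer name
                "type": "custom",
                "tokenizer": "keyword",  # the whole value becomes one token
                "filter": ["lowercase"],
            }
        }
    }
}

mapping = {
    "properties": {
        "title": {  # hypothetical field, analyzed with the default analyzer
            "type": "string",
            "fields": {
                "exact": {
                    "type": "string",
                    "analyzer": "lowercase_keyword",
                }
            },
        }
    }
}

# Index-creation body; query "title" for full text and "title.exact" for exact values.
print(json.dumps({"settings": settings, "mappings": {"doc": mapping}}, indent=2))
```

The document is sent once with a single `title` value; Elasticsearch fills both fields from it.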

Re: Is there a way to completely drop incoming documents from indexing based on some criteria?

2014-12-13 Thread Nikolas Everett
We solve problems like this in two ways: Adding queueing or concurrent request limits. Queueing buys retries for free and can absorb temporary shocks. You can also get things like priority, backlog monitoring, and manual backlog grooming. I think logstash already supports this, but I don't know

Re: To Raid or not to Raid

2014-12-12 Thread Nikolas Everett
Striping RAID is viable for 2 or 3 disks because of the redundancy. Software RAID works fine for me. Hardware RAID enables battery-backed write-behind, but I don't know how important that is with SSDs. Either way, we go 2x SSDs per server with the OS in mirrored RAID and data striped. Depending on your

Re: Is there a way to completely drop incoming documents from indexing based on some criteria?

2014-12-12 Thread Nikolas Everett
Best way to do it is on the client side I believe. You could probably abuse transforms to just blow up when you see something you don't like. I don't _think_ they have the ability to manipulate the operation (to make it noop) though. If they do there certainly aren't any tests to make sure that

Re: Cluster clients

2014-12-11 Thread Nikolas Everett
The only thing to keep in mind is that if a node is down you should just retry on another one. The client might handle that for you, I dunno. It's important, though, because you don't want to lose 1/4 of your traffic when you restart a node. Nik On Thu, Dec 11, 2014 at 3:11 PM, Nick Canzoneri
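A minimal sketch of "retry on another node" in Python, assuming a hypothetical `send(node, request)` callable that raises `NodeDownError` (also hypothetical) when a node can't be reached. Whether your actual client already does this depends on the client. Varying the `start` index per request spreads traffic across nodes, and a restarting node then costs a retry rather than a quarter of your requests.

```python
class NodeDownError(Exception):
    """Raised by the hypothetical `send` callable when a node is unreachable."""


def send_with_failover(nodes, request, send, start=0):
    """Try `request` on each node in turn, beginning at index `start`,
    and move on to the next node whenever one is down."""
    errors = []
    for i in range(len(nodes)):
        node = nodes[(start + i) % len(nodes)]
        try:
            return send(node, request)
        except NodeDownError as exc:
            errors.append((node, exc))  # this node is down; try the next one
    raise RuntimeError(f"all {len(nodes)} nodes failed: {errors}")
```

Only connection-level failures should trigger failover; an error the cluster returns for a bad request would fail the same way on every node.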

Re: Behavior of detect_noop in script updates versus doc updates

2014-12-11 Thread Nikolas Everett
Yes. If you want noop script updates you have to do something else. There are docs on the script page. On Dec 11, 2014 3:45 PM, Loren lo...@siebert.org wrote: The documentation http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html#docs-update for detect_noop

Re: Is shard splitting supported in Elastic search, any alternate

2014-12-11 Thread Nikolas Everett
It's never been a problem for me. Normally for time-series data you handle this by creating a new index every day. For non-time-series data I basically do this: http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/ It has the advantage of letting me change the mapping and
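The alias-swap step of that approach, sketched in Python as the body of a `POST /_aliases` request. The alias and index names are hypothetical. Both actions go in a single request, which Elasticsearch applies atomically, so searches against the alias never see it pointing at neither index.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases that repoints `alias` from old_index to new_index.

    Both actions travel in one request and are applied atomically.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: articles_v2 was created with the new mapping
    # and the data was already copied into it.
    print(json.dumps(alias_swap_body("articles", "articles_v1", "articles_v2"), indent=2))
```

Clients always query `articles`, so they never notice the reindex; once the swap is done, the old index can be deleted.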

[ANN] Released wikimedia-extra plugin (1.3.0 and 1.4.0)

2014-12-11 Thread Nikolas Everett
I just finished releasing the wikimedia-extra Elasticsearch plugin (https://github.com/wikimedia/search-extra), versions 1.3.0 and 1.4.0. This release adds two things: 1. Elasticsearch 1.4.0 support (in the 1.4.0 version) 2. A new `safer` query (in 1.4.0 and 1.3.0 versions). This query

Re: Frequent updates to documents

2014-12-11 Thread Nikolas Everett
A small change costs as much as a large one. Your best bet is to batch multiple updates for the same document together if possible. Also make sure that your updates actually change something. Sending the exact same document with the same ID still does an update. On Dec 12, 2014 12:24 AM, Jinal
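A minimal Python sketch of both tips, assuming updates arrive as (document ID, partial document) pairs and that you keep some record of each document's last-known fields (the `current` mapping is a hypothetical stand-in for that bookkeeping). Partial updates for the same ID are merged with later values winning, fields that already hold the new value are dropped, and a document whose update changes nothing is never sent.

```python
_MISSING = object()  # distinguishes "field absent" from "field is None"


def coalesce_updates(updates, current=None):
    """Merge partial updates per document ID and drop ones that change nothing.

    updates: list of (doc_id, partial_doc) pairs in arrival order.
    current: optional {doc_id: last-known fields} (hypothetical bookkeeping).
    Returns {doc_id: fields that actually differ from what is known}.
    """
    current = current or {}
    merged = {}
    for doc_id, partial in updates:
        merged.setdefault(doc_id, {}).update(partial)  # later values win
    result = {}
    for doc_id, fields in merged.items():
        known = current.get(doc_id, {})
        changed = {k: v for k, v in fields.items() if known.get(k, _MISSING) != v}
        if changed:  # skip documents with nothing new
            result[doc_id] = changed
    return result
```

Each surviving entry can then become a single partial-document update in a bulk request instead of several.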

Re: Query document size

2014-12-10 Thread Nikolas Everett
What are you looking to measure? The indexes don't really have a per-document size because they, well, are indexes. The documents do take up some space on disk, but they are compressed. On Dec 10, 2014 6:02 AM, Jojo Juju tv.in.con...@gmail.com wrote: Hi, I'm fairly new to ES and I wonder if

Re: Query document size

2014-12-10 Thread Nikolas Everett
Compressed size of a document on disk would be enough. We use store-level compression, not the per-document one. Would this actually be possible, then? Thanks On Wednesday, December 10, 2014 1:27:49 PM UTC+1, Nikolas Everett wrote: What are you looking to measure? The indexes don't really have

Re: Query Millions of records in Elasticsearch

2014-12-08 Thread Nikolas Everett
On Mon, Dec 8, 2014 at 9:11 AM, Sushmitha Chakka sushmi...@sigmoidanalytics.com wrote: Hi, I have an index with 6 crore (60 million) records. My use case is to read the entire index and check whether each record is present in the new index or not. If not, I have to index it into the new index. I used scan and scroll

Re: Upsert and Script on large index cause the cluster to timeout.

2014-12-08 Thread Nikolas Everett
I'm not sure what is up, but remember that post_ids in the script is a list, not a set. You might be growing it without bound. On Dec 8, 2014 2:49 PM, Christophe Verbinnen djp...@gmail.com wrote: Hello, We have a small cluster with 3 nodes running 1.3.6. I have an index setup with only two
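The original script isn't shown, so here is a language-neutral illustration in Python (not the scripting language the update script itself would run in) of why appending to a list on every upsert grows without bound when the same ID is replayed, and how a membership check keeps the list set-like. The function names are hypothetical.

```python
def append_post_id(post_ids, post_id):
    """Unbounded: every replayed update appends another copy."""
    post_ids.append(post_id)


def add_post_id_once(post_ids, post_id):
    """Set-like: only append IDs that aren't already present."""
    if post_id not in post_ids:
        post_ids.append(post_id)


if __name__ == "__main__":
    grows, bounded = [], []
    for _ in range(1000):  # the same upsert replayed 1,000 times
        append_post_id(grows, 42)
        add_post_id_once(bounded, 42)
    print(len(grows), len(bounded))
```

The unchecked list ends up with 1,000 entries and the checked one with 1. Since each update reindexes the whole document, a bloated list makes every later update more expensive.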

Re: mapping definition help (copy_to, dates)

2014-12-06 Thread Nikolas Everett
Or you can always transform in your client application. The advantage of transform is that it is done _post_ source, like copy_to. Meaning if you like the original format for disk space and highlighting purposes, you should use transform. If you don't, transform in your app. Nik On Sat, Dec 6,

Re: understaning terms syntax

2014-12-06 Thread Nikolas Everett
Also, it's usually better to use a match query rather than query_string if you want to analyze the query. Query string exposes a huge array of syntax which is both useful and terribly dangerous. Users can write regexes and huge range queries and fuzzy queries that use much, much more CPU and RAM
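A minimal sketch of the safer option in Python: a `match` query body carrying raw user input. The helper name and the `title` field are hypothetical. Unlike `query_string`, a `match` query analyzes the text like the field but does not parse Lucene query syntax, so characters like `/`, `~`, or `[` are just text to the analyzer rather than regex, fuzzy, or range operators.

```python
import json


def safe_user_query(field, user_text):
    """Search body for a match query: the input is analyzed like the field,
    but no query_string syntax (regexes, ranges, fuzzy operators) is parsed."""
    return {"query": {"match": {field: user_text}}}


if __name__ == "__main__":
    # With query_string, /.*/ would be a regex; here it is just analyzed text.
    print(json.dumps(safe_user_query("title", "foo~ /.*/")))
```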

Re: Does Elastic Search 2.0 have shard splitting?

2014-12-05 Thread Nikolas Everett
I've never found myself wanting shard splitting. I always have an analysis update I want to apply when I want to reshard data anyway, so I just scan from one index into one with new settings. I do find the FAQ a bit odd, though. Elasticsearch allows you to do lots of inefficient things and that

Re: Does Elastic Search 2.0 have shard splitting?

2014-12-05 Thread Nikolas Everett
That works OK if you are inserting, but updates and deletes become more complex. Scoring can get a bit funky too because your shards don't have roughly equal frequencies. All in all, I'd argue that adding more indices behind an alias is only sometimes a solution to the problem. Nik On Fri,

Re: Spellchecking with term and phrase suggesters

2014-12-05 Thread Nikolas Everett
On Fri, Dec 5, 2014 at 11:49 AM, Michele Palmia micpal...@gmail.com wrote: Hi all, I need to set up a system that provides spellchecking functionality on user searches, similar to what Google does with its well known *did-you-mean* suggestions. The *term suggester* works very well for

Re: Spellchecking with term and phrase suggesters

2014-12-05 Thread Nikolas Everett
On Fri, Dec 5, 2014 at 12:43 PM, Nikolas Everett nik9...@gmail.com wrote: On Fri, Dec 5, 2014 at 11:49 AM, Michele Palmia micpal...@gmail.com wrote: Hi all, I need to set up a system that provides spellchecking functionality on user searches, similar to what Google does with its well
