Looking for Elasticsearch projects

2014-09-02 Thread Ivan Brusic
For those that are not regulars on the mailing list, I am a fairly active member that has used Elasticsearch for years. I am leaving my full-time job to focus on other (techie and non-techie) goals and would love to work on some interesting projects part-time. It can be either paid assignments or

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-29 Thread Ivan Brusic
to the fantastic Elasticsearch team who did not hesitate to test the fix immediately and replaced it with a better working solution, since the lzf-compress software is having weaknesses regarding threadsafety. Jörg On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic iv...@brusic.com wrote: Amazing job

Re: EL setup for fulltext search

2014-08-29 Thread Ivan Brusic
, start_offset : 103, end_offset : 112, type : word, position : 18 } ] } So it seems the template is not used?! Any obvious reason/mistakes? Thx, Marc On Thursday, August 28, 2014 6:17:08 PM UTC+2, Ivan Brusic wrote: Use the Analyze API to view what tokens are being

Re: Explicitly Copying Replica Shards That Fail to Start

2014-08-29 Thread Ivan Brusic
I used to apply that trick all the time with older versions of Elasticsearch! Thankfully it has not occurred to me in years. -- Ivan On Thu, Aug 28, 2014 at 3:53 PM, Mark Walkom ma...@campaignmonitor.com wrote: Yep, the easiest way is to drop the replica and then add it back and see how

Re: Replica assignment on the same host

2014-08-29 Thread Ivan Brusic
. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 30 August 2014 05:16, Ivan Brusic i...@brusic.com wrote: The replica of a shard should never be on the same node as the primary. Where did you notice this anomaly

Re: EL setup for fulltext search

2014-08-28 Thread Ivan Brusic
appId=cs Times=Me:22/Total:22 (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ ) I cannot search for MyMDB or onMessage; only MyMDB.onMessage will work. Any more ideas? Cheers, Marc On Wednesday, August 27, 2014 9:20:49 AM UTC+2, Ivan Brusic wrote: Off the top of my head, I

Re: Is there an opposite of dismaxquery? E.g. take the minimum score of two queries?

2014-08-28 Thread Ivan Brusic
A dismax query will basically rewrite the query as a boolean query. Can you create your own query where one of the clauses has a negative boost? Still tricky, since you basically need the inverse of cross_fields/best_fields, where the field remains the same but the query changes. -- Ivan On Thu, Aug

Re: Stop words and Keyword tokenizer

2014-08-28 Thread Ivan Brusic
Also note that the content returned will still contain the stop words. Only the inverted index will contain the stopword-less content. -- Ivan On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko ita...@code972.com wrote: What would be the use case for such a process (removing stop words

Re: Stop words and Keyword tokenizer

2014-08-28 Thread Ivan Brusic
:03 GMT-05:00 Ivan Brusic i...@brusic.com: Also note that the content returned will still contain the stop words. Only the inverted index will contain the stopword-less content. -- Ivan On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko ita...@code972.com wrote: What would

Re: Stop words and Keyword tokenizer

2014-08-28 Thread Ivan Brusic
the only way to remove stop words from tokens obtained from a keyword tokenizer? Are those regular expressions not very performant? 2014-08-28 15:49 GMT-05:00 Ivan Brusic i...@brusic.com: You mentioned in your original post I'd like to obtain the original text without stop words

Re: EL setup for fulltext search

2014-08-27 Thread Ivan Brusic
Off the top of my head, I would use a custom analyzer with a whitespace tokenizer and a word delimiter filter (preserving the original tokens as well). Perhaps add a shingle filter to create bigrams. Or, better yet, a pattern tokenizer that splits on spaces and parentheses. Cheers, Ivan On Tue, Aug 26, 2014
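A minimal sketch of the custom analyzer described above, written as a Python dict of index settings in the 1.x analysis syntax. The names `mdb_analyzer`, `split_words` and `bigrams` are hypothetical, and the exact filter options are assumptions to adapt and verify with the Analyze API:

```python
import json

# Hypothetical index settings; analyzer and filter names are illustrative only.
settings = {
    "analysis": {
        "filter": {
            # Splits "MyMDB.onMessage" on punctuation and case changes,
            # while also keeping the original token.
            "split_words": {"type": "word_delimiter", "preserve_original": True},
            # Optional: adds two-word shingles alongside the single tokens.
            "bigrams": {"type": "shingle", "max_shingle_size": 2, "output_unigrams": True},
        },
        "analyzer": {
            "mdb_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                # word_delimiter runs before lowercase so it can still see case changes.
                "filter": ["split_words", "lowercase", "bigrams"],
            }
        },
    }
}

print(json.dumps({"settings": settings}, indent=2))
```

The same analyzer has to be used at query time, so that searching for MyMDB or onMessage produces the same sub-tokens that were indexed.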

Composing function scores

2014-08-27 Thread Ivan Brusic
I have a couple of questions regarding function scores. I probably already know the answers, but just wanted to double-check with the community. I am still on version 1.1.1, so perhaps things have changed since then. First question is regarding ordering and efficiency. Currently, my query is a

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-27 Thread Ivan Brusic
Amazing job. Great work. -- Ivan On Tue, Aug 26, 2014 at 12:41 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: I fixed the issue by setting the safe LZF encoder in LZFCompressor and opened a pull request https://github.com/elasticsearch/elasticsearch/pull/7466 Jörg On Tue,

Re: Getting different results while using bool query vs bool query with function score query

2014-08-27 Thread Ivan Brusic
The function score should not affect relevancy, only the scoring, so the number of results should not differ. Strange. Perhaps you do not need to use a function score. With the simple query string, you can append the boost parameter to the field name: simple_query_string: { query: 128, fields:
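A small sketch of the field-boost alternative mentioned above: a `simple_query_string` whose `fields` entries carry a `^boost` suffix. The field names `title` and `description` and the boost values are hypothetical:

```python
import json

def boosted_simple_query(text, field_boosts):
    """Build a simple_query_string clause with per-field boosts (field^boost)."""
    fields = [f"{name}^{boost}" for name, boost in field_boosts.items()]
    return {"simple_query_string": {"query": text, "fields": fields}}

# "title" and "description" are hypothetical field names.
query = boosted_simple_query("128", {"title": 3, "description": 1})
print(json.dumps({"query": query}))
```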

Re: When to multi_field

2014-08-27 Thread Ivan Brusic
The standard use case for a multi-field is when a field needs to be both analyzed (for searching) and not analyzed (for aggregating/sorting). In this case, there really is no workaround, so a multi-field is essential. In the different analyzer case, it gets more complicated. How much can you get
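A sketch of that standard case in 1.x mapping syntax: one analyzed `string` field for searching plus a `not_analyzed` sub-field for aggregating and sorting. The type `product`, the field `name` and the sub-field `raw` are hypothetical names:

```python
import json

# Hypothetical "product" type with a "name" field indexed two ways (1.x syntax).
mapping = {
    "product": {
        "properties": {
            "name": {
                "type": "string",          # analyzed: used for full-text search
                "analyzer": "standard",
                "fields": {
                    # name.raw keeps the whole value as a single term
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }
}

# Search on the analyzed field, aggregate on the raw sub-field.
search = {
    "query": {"match": {"name": "red shoes"}},
    "aggs": {"names": {"terms": {"field": "name.raw"}}},
}
print(json.dumps(mapping))
print(json.dumps(search))
```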

Re: Getting different results while using bool query vs bool query with function score query

2014-08-27 Thread Ivan Brusic
is not working. If you post some example documents and your mapping, others might be able to figure it out. -- Ivan On Wed, Aug 27, 2014 at 11:07 AM, Ivan Brusic i...@brusic.com wrote: The function score should not affect relevancy, only the scoring, so the number of results should not differ

Re: Determine if search term is a noun?

2014-08-27 Thread Ivan Brusic
This process is easier (but still not easy) if you pre-process your data on the client side at indexing time. You can mark your terms with their respective Parts of Speech using a payload filter:
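A sketch of the client-side part of that approach. It assumes your Elasticsearch version ships the `delimited_payload_filter` token filter, and that some external part-of-speech tagger (not shown, hypothetical) has already produced `(word, tag)` pairs. Reading the payloads back at query time is left out:

```python
import json

def tag_with_pos(tagged_tokens, delimiter="|"):
    """Join (word, tag) pairs into 'word|TAG' tokens for a payload filter."""
    return " ".join(f"{word}{delimiter}{tag}" for word, tag in tagged_tokens)

# Hypothetical analysis settings: the whitespace tokenizer keeps "word|TAG"
# together, then the payload filter strips "|TAG" and stores it as a payload.
settings = {
    "analysis": {
        "filter": {
            "pos_payload": {
                "type": "delimited_payload_filter",
                "delimiter": "|",
                "encoding": "identity",  # store the tag bytes as-is
            }
        },
        "analyzer": {
            "pos_text": {"type": "custom", "tokenizer": "whitespace", "filter": ["pos_payload"]}
        },
    }
}

doc = {"body": tag_with_pos([("dogs", "NNS"), ("bark", "VBP")])}
print(json.dumps(doc))  # {"body": "dogs|NNS bark|VBP"}
```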

Re: Can't open file to read checksums

2014-08-26 Thread Ivan Brusic
A few questions: What version of Elasticsearch are you using? Are you using the Java client and is it the same version of the cluster? Did you upgrade recently and was the index built with an older version of Elasticsearch? Elasticsearch recently added checksum verification (1.3?), so perhaps

Re: Elastic search dynamic number of replicas from Java API

2014-08-23 Thread Ivan Brusic
, and with the current transport request/response cycle, they must poll for new events ... Jörg On Thu, Jul 10, 2014 at 6:38 PM, Ivan Brusic iv...@brusic.com wrote: Jörg, have you actually implemented your own ClusterStateListener? I never had much success. Tried using that interface or even

Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)

2014-08-23 Thread Ivan Brusic
allocated 5*10G RAM to the cluster. Things are looking ok as of now, except that the aggregations (on strings) are quite slow. Maybe I would run these aggregations as a batch and cache the outputs in a different type and move on for now. Thanks NY On Fri, Aug 22, 2014 at 10:34 PM, Ivan Brusic

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
I suspect the issue is the way the query parser works. The query phrase exampleof bug will be parsed into a query for the tokens exampleof and bug that are adjacent to each other. The issue is that you do not have two such tokens, instead you have a token with the value exampleof bug, which is a

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
documentation. Thanks! On Thursday, August 21, 2014 9:52:16 AM UTC-7, Ivan Brusic wrote: I suspect the issue is the way the query parser works. The query phrase exampleof bug will be parsed into a query for the tokens exampleof and bug that are adjacent to each other. The issue is that you do

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
Here is the Lucene issue: https://issues.apache.org/jira/browse/LUCENE-2605 -- Ivan On Thu, Aug 21, 2014 at 10:09 AM, Ivan Brusic i...@brusic.com wrote: The query string query is a phrase query "exampleof bug". The term query is looking for a single token, exampleof bug. The query parser

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
, August 21, 2014 10:09:29 AM UTC-7, Ivan Brusic wrote: The query string query is a phrase query "exampleof bug". The term query is looking for a single token, exampleof bug. The query parser will not use your tokenizer to parse the phrase. It will tokenize based on whitespace and then apply

Re: query_string can't find token that _analyze shows is generated, but term query can

2014-08-21 Thread Ivan Brusic
. Thanks for your help. On Thursday, August 21, 2014 10:42:32 AM UTC-7, Ivan Brusic wrote: In general, if you are using the keyword tokenizer or non analyzed fields, then query string queries should probably not be used. Phrase queries and the keyword tokenizer also do not mix well. Your
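Following the advice above to avoid query string queries on keyword-tokenized or `not_analyzed` fields, a sketch of building exact-value queries directly instead. The field name `tags` is hypothetical:

```python
import json

def exact_value_query(field, *values):
    """Exact match on a keyword-tokenized or not_analyzed field; no query parsing."""
    if len(values) == 1:
        # term query: the value is looked up as one token, spaces included
        return {"query": {"term": {field: values[0]}}}
    # terms query: any of several exact tokens
    return {"query": {"terms": {field: list(values)}}}

print(json.dumps(exact_value_query("tags", "exampleof bug")))
print(json.dumps(exact_value_query("tags", "exampleof bug", "another bug")))
```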

Re: Call when shard reallocation occurs

2014-08-21 Thread Ivan Brusic
AFAIK, there is no way to achieve such functionality. The only way I have found to get similar functionality is to write a plugin with a cluster state listener and have the plugin reach out to some external service. Cheers, Ivan On Thu, Aug 21, 2014 at 10:02 AM, 'Sandeep Ramesh Khanzode'

Re: What the heck is this search?? :)

2014-08-20 Thread Ivan Brusic
Very strange query indeed. Wildcard search filtered by a match_all. What?!? It is not Elasticsearch, but perhaps some plugin. Itamar mentioned Kibana, although you did not mention it in your post. Any other plugins? Marvel? -- Ivan On Wed, Aug 20, 2014 at 12:43 PM, Itamar Syn-Hershko

Re: Ideal setup for EC2 cluster Config

2014-08-20 Thread Ivan Brusic
If you only have 3 nodes, I would just stick to the defaults, where every node is both master and data. Dedicated master (no data) nodes help eliminate OOM pressure since the actual data lives elsewhere. With so few nodes, every machine should hold a portion of the data. Dedicated

Re: Using a char_filter in combination with a lowercase filter

2014-08-19 Thread Ivan Brusic
letters? (note that we actually regard 'ij' to be a single character.) It's not like removing the accents from 'ä', or transcribing a Cyrillic number into its Roman equivalent, or am I wrong in that regard? Regards, Matthias On Tuesday, August 19, 2014 6:37:29 AM UTC+2, Ivan Brusic wrote

Re: Any reason for this package org.elasticsearch.common.netty.*?

2014-08-19 Thread Ivan Brusic
At one point Elasticsearch shaded several different libraries for various reasons, but thankfully this is no longer the case. From what I understand, the Jetty classes you are referring to are custom classes built for Elasticsearch that are not packaged with Jetty. Cheers, Ivan On Tue, Aug 19,

Re: how to get char_filter to work?

2014-08-18 Thread Ivan Brusic
at the index with non html tags if you will? On Friday, August 8, 2014 12:52:37 PM UTC-4, Ivan Brusic wrote: The field is derived from the source and not generated from the tokens. If we indexed the sentence The quick brown foxes jumped over the lazy dogs with the english analyzer, the tokens

Re: Using a char_filter in combination with a lowercase filter

2014-08-18 Thread Ivan Brusic
Char filters are applied before the text is tokenized, and therefore they are applied before the normal filters are used, which is why they are a separate class of filter. With Lucene, the order is: char filters -> tokenizer -> filters. Have you looked into the ICU analyzer?
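A toy model of that ordering, to show why a char filter never sees lowercased text. The Dutch ligature characters (Ĳ, ĳ) are an assumed example in the spirit of the 'ij' discussion, the whitespace split stands in for a real tokenizer, and the `ij_ligatures` / `dutch_text` names in the settings dict are hypothetical:

```python
import json

def analyze(text, char_mappings):
    """Toy model of Lucene's analysis order: char filters -> tokenizer -> filters."""
    # 1. Char filters see the raw, not yet lowercased text.
    for src, dst in char_mappings:
        text = text.replace(src, dst)
    # 2. Tokenizer (a whitespace split, as a simplification).
    tokens = text.split()
    # 3. Token filters, here lowercase, run last.
    return [t.lower() for t in tokens]

# Mapping only the lowercase ligature misses the uppercase one,
# because lowercasing has not happened yet when the char filter runs.
print(analyze("Ĳssel ĳs", [("ĳ", "ij")]))              # ['ĳssel', 'ijs']
print(analyze("Ĳssel ĳs", [("Ĳ", "IJ"), ("ĳ", "ij")]))  # ['ijssel', 'ijs']

# Roughly equivalent (hypothetical) index settings.
settings = {"analysis": {
    "char_filter": {"ij_ligatures": {"type": "mapping", "mappings": ["Ĳ=>IJ", "ĳ=>ij"]}},
    "analyzer": {"dutch_text": {
        "type": "custom",
        "char_filter": ["ij_ligatures"],
        "tokenizer": "standard",
        "filter": ["lowercase"],
    }},
}}
print(json.dumps(settings, ensure_ascii=False))
```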

Re: Creating filters per aggregation similar to Facets

2014-08-12 Thread Ivan Brusic
Try using a filter aggregation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html The idea is that the filter is the outermost aggregation and the aggregation you actually want to filter is the sub-aggregation. Cheers,
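A sketch of that nesting: the `filter` aggregation is the outer aggregation and the aggregation you actually want restricted sits under its `aggs`. The fields `status` and `brand` are hypothetical:

```python
import json

def filtered_agg(name, filter_clause, sub_aggs):
    """Wrap sub-aggregations in a filter aggregation (the filter is the outer agg)."""
    return {name: {"filter": filter_clause, "aggs": sub_aggs}}

# Hypothetical fields "status" and "brand".
body = {
    "query": {"match_all": {}},
    "aggs": filtered_agg(
        "in_stock",
        {"term": {"status": "in_stock"}},
        {"brands": {"terms": {"field": "brand"}}},
    ),
}
print(json.dumps(body, indent=2))
```

As with a facet filter, the main query still applies to everything; only the aggregations nested under `in_stock` see the extra restriction.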

Re: github issues search query DSL

2014-08-12 Thread Ivan Brusic
Look into the Lucene query parser, which is the syntax that the query string query uses. After that, look into the various Lucene contrib modules that extend the query syntax (span near is one). I do not think that anyone has implemented a new query parser as an Elasticsearch plugin yet, but I

Re: How to receive part of the text field?

2014-08-12 Thread Ivan Brusic
If the 200kb number is fixed, then the simplest solution would be to store that content separately in a new field. It does not need to be analyzed, just stored. Perhaps highlighters might work. Never used them, so it is just a guess. Cheers, Ivan On Aug 12, 2014 8:17 AM, Dmitriy Bashkalin
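A sketch of the separate-field idea: cut the text to the fixed 200 KB client-side (on UTF-8 byte boundaries), store the prefix in its own stored, non-indexed field, and request only that field. The type `doc` and the field names `content` / `content_preview` are hypothetical, and the mapping uses 1.x syntax:

```python
import json

PREVIEW_BYTES = 200 * 1024  # the fixed 200 KB limit from the question

def utf8_prefix(text, limit=PREVIEW_BYTES):
    """Cut text to at most `limit` UTF-8 bytes without splitting a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")

# Hypothetical mapping: the preview is stored but not indexed.
mapping = {"doc": {"properties": {
    "content_preview": {"type": "string", "index": "no", "store": True}
}}}

def build_doc(text):
    return {"content": text, "content_preview": utf8_prefix(text)}

# Ask for the stored field only, instead of the full _source.
search = {"query": {"match_all": {}}, "fields": ["content_preview"]}
print(json.dumps(mapping))
print(json.dumps(search))
```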

Re: Absolute scoring of fields

2014-08-11 Thread Ivan Brusic
You can wrap each individual match query in a constant score query and place them as clauses in a boolean query. The guide has an example: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/ignoring-tfidf.html#constant-score-query Cheers, Ivan Bernhardt Scherer
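A sketch of that structure: each `match` query wrapped in a `constant_score` query with its own boost, combined as `should` clauses of a `bool` query. The fields `title` and `tags` and the boost values are hypothetical; use the explain API to check how the final scores come out:

```python
import json

def constant_match(field, text, boost=1.0):
    """A match query whose score is a fixed boost, ignoring TF/IDF."""
    return {"constant_score": {"query": {"match": {field: text}}, "boost": boost}}

# Hypothetical fields: a hit on "title" is weighted 2, a hit on "tags" is weighted 1.
body = {"query": {"bool": {"should": [
    constant_match("title", "quick fox", boost=2),
    constant_match("tags", "quick fox", boost=1),
]}}}
print(json.dumps(body))
```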

Re: how to get char_filter to work?

2014-08-08 Thread Ivan Brusic
The analyzers control how text is parsed/tokenized and how terms are indexed in the inverted index. The source document remains untouched. -- Ivan On Fri, Aug 8, 2014 at 9:24 AM, IronMike sabdall...@gmail.com wrote: I also used Clint's example and tried to map it to a document and search

Re: how to get char_filter to work?

2014-08-08 Thread Ivan Brusic
not for the source. I thought I could query the field not the source and look at it with no html while the source was intact. Did I misunderstand? On Friday, August 8, 2014 12:36:16 PM UTC-4, Ivan Brusic wrote: The analyzers control how text is parsed/tokenized and how terms are indexed

Re: Analyzer doesn't seems to be working on special characters

2014-08-07 Thread Ivan Brusic
The problem might be with your encoding, not the analyzer. Your content is in one format and either your output is in another or your viewer (terminal, browser) is in another. Make sure everything is consistent (UTF-8 for most people). Where are you seeing the � character? -- Ivan On Thu, Aug
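To illustrate the kind of mismatch meant here, the same UTF-8 bytes decoded three ways. The word "café" is an arbitrary example; the U+FFFD replacement character is what typically renders as �:

```python
text = "café"
raw = text.encode("utf-8")  # b'caf\xc3\xa9'

print(raw.decode("utf-8"))                    # café   (consistent encoding)
print(raw.decode("latin-1"))                  # cafÃ©  (classic mojibake)
print(raw.decode("ascii", errors="replace"))  # caf��  (U+FFFD replacement chars)
```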

Re: transport client? really?

2014-08-06 Thread Ivan Brusic
Since version 1.0, there should be fewer binary protocol issues between any nodes, including the clients, making rolling upgrades doable. Older clients should be able to interact with newer server nodes, but the inverse is not always the case. -- Ivan On Wed, Aug 6, 2014 at 8:47 AM, Brian

Re: Stripping html for indexing only?

2014-08-06 Thread Ivan Brusic
1. Correct. 2. Also correct. The analysis chain only affects how the terms are indexed and placed in the inverted index. The original document remains as is. 3. Not sure since I have never done highlighting. Highlighting might not depend on the source since the term positions/offsets are used, but
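A sketch of an index-only HTML strip: the built-in `html_strip` char filter in a custom analyzer, so only the indexed terms lose the markup while `_source` keeps the original HTML. The analyzer name `html_text`, the type `page` and the field `body` are hypothetical:

```python
import json

# Hypothetical analyzer: strip tags before tokenizing, then lowercase.
settings = {"analysis": {"analyzer": {"html_text": {
    "type": "custom",
    "char_filter": ["html_strip"],
    "tokenizer": "standard",
    "filter": ["lowercase"],
}}}}

# Only the indexed terms for "body" lose the markup; _source keeps the HTML.
mapping = {"page": {"properties": {"body": {"type": "string", "analyzer": "html_text"}}}}
print(json.dumps({"settings": settings, "mappings": mapping}))
```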

Re: Search result only with unique value of the specific field

2014-08-06 Thread Ivan Brusic
Perhaps the top hits aggregation can help: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html -- Ivan On Wed, Aug 6, 2014 at 11:21 AM, slavag slav...@gmail.com wrote: Hi, Need some advise. I have indexed documents,

Re: Search result only with unique value of the specific field

2014-08-06 Thread Ivan Brusic
: [ * ] }, size : 1 } } } } } } } What could be issue with my aggregation ? I'm using ES 1.2.1 Thanks On Wednesday, August 6, 2014 10:06:40 PM UTC+3, Ivan Brusic wrote
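A sketch of the terms-plus-top_hits shape suggested earlier in this thread, with a hypothetical `not_analyzed` field `group_id`. As far as I know, the `top_hits` aggregation first shipped in Elasticsearch 1.3.0, so a 1.2.1 cluster would not recognize it:

```python
import json

def one_hit_per_value(field, buckets=10):
    """Group by `field` and return the top document (by score) in each group."""
    return {
        "size": 0,  # skip the normal hit list; the hits come from the buckets
        "aggs": {
            "unique_values": {
                "terms": {"field": field, "size": buckets},
                "aggs": {"top_doc": {"top_hits": {"size": 1}}},
            }
        },
    }

print(json.dumps(one_hit_per_value("group_id"), indent=2))
```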

Re: Some observations with Curator

2014-08-05 Thread Ivan Brusic
I am still fully in the nothing but E stack! Is anyone else using Elasticsearch for ... search? :) -- Ivan On Tue, Aug 5, 2014 at 10:50 AM, Brian brian.from...@gmail.com wrote: Using the most recent release (1.2.2) of Curator, I noticed that the documentation says --logfile while curator

Re: How to rebalance shard

2014-08-05 Thread Ivan Brusic
Are you applying a custom routing to your documents? -- Ivan On Tue, Aug 5, 2014 at 2:33 AM, Warat Wongmaneekit canopyb...@gmail.com wrote: Now my cluster is not rebalance the data. How can I rebalance it please see the summary below.

Re: Multiple master problem in elasticsearch 0.90.10

2014-08-05 Thread Ivan Brusic
There should be no need to run a master and a data node on each machine. Only two masters is not enough to reliably form a consensus and you are only taking away processing power from the data node. -- Ivan On Tue, Aug 5, 2014 at 5:30 AM, Ankit Mittal ankit.lnc...@gmail.com wrote: Hi All,
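The reason two masters are not enough is the quorum arithmetic behind `discovery.zen.minimum_master_nodes`: a majority of master-eligible nodes, floor(n / 2) + 1. With 2 master-eligible nodes the quorum is 2, so losing either one prevents a master from being elected; with 3 you can lose one:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1

for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))  # 1 1, 2 2, 3 2, 5 3
```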

Re: Inexplicable wrong results in automated tests

2014-08-04 Thread Ivan Brusic
Are you refreshing the index after inserting the test documents? It could simply be a matter of timing. -- Ivan On Sun, Aug 3, 2014 at 8:22 AM, John D. Ament john.d.am...@gmail.com wrote: Hi So after running a few rounds of local automated tests, I've noticed that sometimes I get the wrong

Re: Little problem

2014-08-04 Thread Ivan Brusic
The only way to achieve the result you are seeking is to use parent/child documents: http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/ Cheers, Ivan On Mon, Aug 4, 2014 at 9:10

Re: How to debug NoNodeAvailableException

2014-08-04 Thread Ivan Brusic
You should switch to using bulk indexing instead of indexing individual documents. Also, consider switching off the refresh interval (set it to -1) for the duration of your bulk indexing. Cheers, Ivan On Mon, Aug 4, 2014 at 3:08 AM, Dennis de Boer datdeb...@gmail.com wrote: Sure,
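A sketch of both suggestions: building a newline-delimited `_bulk` body from (id, source) pairs, plus the index settings bodies that switch refresh off and back on around the load. The index `myindex`, type `doc` and the documents are hypothetical; batch size is something to tune:

```python
import json

def bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk body from (id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline

# Index settings bodies to send before and after the bulk load.
disable_refresh = {"index": {"refresh_interval": "-1"}}
restore_refresh = {"index": {"refresh_interval": "1s"}}  # 1s is the usual default

body = bulk_body("myindex", "doc", [("1", {"title": "a"}), ("2", {"title": "b"})])
print(body)
```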

Re: boosting query howto?

2014-08-04 Thread Ivan Brusic
Javadocs also available at http://jenkins.elasticsearch.org/job/Elasticsearch%20Master%20Branch%20Javadoc/Elasticsearch_API_Documentation/ http://javadoc.kyubu.de/elasticsearch/ (unofficial) -- Ivan On Mon, Aug 4, 2014 at 5:28 AM, Bernd Fehling bernd.fehl...@gmail.com wrote: Thanks a lot.

Re: EsRejectedExecutionException

2014-08-01 Thread Ivan Brusic
are fixed on processor size, is there any chance of automatic updation on these? Thanks again for reporting this. On Thursday, 31 July 2014 22:41:19 UTC+5:30, Ivan Brusic wrote: The thread pool will reject any search requests when there are 1000 actions already queued. http

Re: Help with creating mapping for my products

2014-08-01 Thread Ivan Brusic
With inner documents, you cannot query two different attributes for the same inner object. Internally, Elasticsearch/Lucene will flatten the inner objects so that each inner property becomes a single array of values. You would need to switch to nested documents or even parent/child documents in order to query on a single deeper
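A simplified model of that flattening (not the actual Lucene code), showing the false positive it causes and how keeping each object's fields together avoids it. The `product`/`attributes` data and the nested mapping at the end are hypothetical examples:

```python
def flatten(doc, path):
    """Simplified model of how plain object arrays are indexed: one array per property."""
    flat = {}
    for obj in doc[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat

def matches_flat(flat, path, name, value):
    return name in flat[f"{path}.name"] and value in flat[f"{path}.value"]

def matches_nested(doc, path, name, value):
    # nested documents keep each object's fields together
    return any(o["name"] == name and o["value"] == value for o in doc[path])

product = {"attributes": [{"name": "color", "value": "red"},
                          {"name": "size", "value": "large"}]}
flat = flatten(product, "attributes")
print(flat)  # {'attributes.name': ['color', 'size'], 'attributes.value': ['red', 'large']}
print(matches_flat(flat, "attributes", "color", "large"))       # True (false positive)
print(matches_nested(product, "attributes", "color", "large"))  # False

# Hypothetical 1.x mapping that switches the field to nested documents.
mapping = {"product": {"properties": {"attributes": {"type": "nested", "properties": {
    "name": {"type": "string", "index": "not_analyzed"},
    "value": {"type": "string", "index": "not_analyzed"},
}}}}}
```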

Re: Pull Requests mounting?

2014-08-01 Thread Ivan Brusic
Once I hit 3 open pull requests, I stopped submitting them. :) I am down to only 1 open (from last year), but my confidence is still somewhat low. Most of my changes have been small, mainly because I am hesitant about dedicating more time for something that might be ignored. I need to simply

Re: Dealing with spam in this forum

2014-08-01 Thread Ivan Brusic
I just deleted a couple of non-related job postings and banned the poster. Going forward, is there a consensus among the community about whether or not job postings should be allowed? I do not mind postings that come directly from companies, especially those whose existing developers are already

Re: Failed to configure logging error on start up.

2014-07-31 Thread Ivan Brusic
That scenario should not happen since FileVisitOption.FOLLOW_LINKS is enabled. https://github.com/elasticsearch/elasticsearch/blob/fe86c8bc88a321bf587dd8eb4df52aaed9ed2156/src/main/java/org/elasticsearch/common/logging/log4j/LogConfigurator.java#L107 Seems like a bug somewhere. -- Ivan On

Re: Elasticsearch still scan all types in a index even if I specify a type

2014-07-31 Thread Ivan Brusic
All types eventually belong to the same Lucene index and Lucene cannot handle different types for the same field name. Avoid using the same name across types if the field type is different. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping.html#_avoiding_type_gotchas --

Re: Recommendation for using stop words

2014-07-31 Thread Ivan Brusic
Tuning stop words can be as long of a process as you want it to be. Saving your queries/results and doing some search analytics can help you fine tune the stop words. In general, the default stop words list is very good for English, but Twitterspeak is not really English. :) You can look at all

Re: EsRejectedExecutionException

2014-07-31 Thread Ivan Brusic
The thread pool will reject any search requests when there are 1000 actions already queued. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html Do you have this many search requests at one time? Do you have warmers and/or percolators running since you

Re: Memory Explosion: Heap Dump in less than one minute

2014-07-31 Thread Ivan Brusic
Look into Curator, which should help: https://github.com/elasticsearch/curator If you have just a single development instance, perhaps Marvel is overkill. Do you need historical metrics? If not, just use some other plugin such as head/bigdesk/hq. Cheers, Ivan On Thu, Jul 31, 2014 at

Re: Java transport client, which hosts to add?

2014-07-30 Thread Ivan Brusic
You should add as many nodes as possible. If you enable client.transport.sniff, the transport client will ask the nodes it connects to about the other nodes in the cluster, which means you potentially only need to specify a single node (not ideal in case that node is down). -- Ivan

Re: log index creation API requests

2014-07-30 Thread Ivan Brusic
The logging.xml file only controls which logging statements get output, not the amount of information each statement contains. The log line in question does not include the source IP, which is long gone by the time the service gets the request.

Re: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)

2014-07-30 Thread Ivan Brusic
, Andrew Davidoff david...@qedmf.net wrote: On Tuesday, July 29, 2014 3:27:13 PM UTC-4, Ivan Brusic wrote: Have you changed your gateway settings? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after It still remains a bit of black magic to me

Re: Rolling Upgrading from 1.2.1 to 1.3.0 – java.lang.IllegalArgumentException: No enum constant org.apache.lucene.util.Version.4.3.1

2014-07-29 Thread Ivan Brusic
It will depend on your merge settings and your shard size. Not sure why not, but I do not recall what the default settings are. -- Ivan On Mon, Jul 28, 2014 at 8:52 PM, smonasco smona...@gmail.com wrote: Do you happen to know if optimize will create a segment larger than 5 gigs? -- You

Re: How I can do exact search by not_analyzed fields?

2014-07-29 Thread Ivan Brusic
The _all field has its own analyzer, so the analyzer that is defined on the createdBy field is not applied. I have never tried it, but I believe the best solution is to use copy_to to copy into a custom field:
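An untested sketch of that idea, assuming a version with `copy_to` (1.0 or later): `createdBy` is copied into a custom `not_analyzed` field that plays the role of an exact-match `_all`. The type `item`, the field `exact_all` and the query value are hypothetical:

```python
import json

# Hypothetical mapping: createdBy is copied into a custom not_analyzed field.
mapping = {"item": {"properties": {
    "createdBy": {"type": "string", "index": "not_analyzed", "copy_to": "exact_all"},
    "exact_all": {"type": "string", "index": "not_analyzed"},
}}}

# Exact lookup against the combined field; the value is not analyzed.
query = {"query": {"term": {"exact_all": "John Smith"}}}
print(json.dumps(mapping))
print(json.dumps(query))
```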

Re: How I can do exact search by not_analyzed fields?

2014-07-29 Thread Ivan Brusic
on the client side. My application knows which fields are analyzed and which are not and creates queries accordingly. -- Ivan On Tue, Jul 29, 2014 at 10:52 AM, Ivan Brusic i...@brusic.com wrote: The _all field has its own analyzer, so the analyzer that is defined on the createdBy field

Scripting aggregation order

2014-07-29 Thread Ivan Brusic
Would it be possible to script the sort order of aggregations? Let me explain with a contrived example: { aggs: { agg1: { terms: { field: somefield, order: { avg1: desc }, size : 100 }, aggs: {

Re: Atomically create index with single alias

2014-07-29 Thread Ivan Brusic
You can have an alias point to multiple indices. With time-series data, this should be a problem since you will not have overlap between the different indices. But I think you are correct in that there is no atomic way to accomplish all three. I only search against aliases, so having an index

Re: Atomically create index with single alias

2014-07-29 Thread Ivan Brusic
Typo: With time-series data, this should NOT be a problem On Tue, Jul 29, 2014 at 11:57 AM, Ivan Brusic i...@brusic.com wrote: You can have an alias point to multiple indices. With time-series data, this should be a problem since you will not have overlap between the different indices. But I

Re: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)

2014-07-29 Thread Ivan Brusic
Have you changed your gateway settings? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after It still remains a bit of black magic to me. Sometimes it works, sometimes it does not. Cheers, Ivan On Mon, Jul 28, 2014 at 1:52 PM, Andrew

Re: Scripting aggregation order

2014-07-29 Thread Ivan Brusic
, Adrien Grand adrien.gr...@elasticsearch.com wrote: Hi Ivan, This is not supported but there is an open issue that aims at making this kind of things possible https://github.com/elasticsearch/elasticsearch/issues/6917 On Tue, Jul 29, 2014 at 8:50 PM, Ivan Brusic i...@brusic.com wrote

Re: Use java Api to set a document's field as _id

2014-07-28 Thread Ivan Brusic
The behavior is applied at the creation of the index within the mapping, not during the prepareIndex call. The example you provided is part of the mapping and not part of the document that gets indexed. If you want to override which field will be used as the _id field, you would need to change

Re: Rolling Upgrading from 1.2.1 to 1.3.0 – java.lang.IllegalArgumentException: No enum constant org.apache.lucene.util.Version.4.3.1

2014-07-28 Thread Ivan Brusic
There was a bug in Lucene which caused problems with Elasticsearch 1.3.0. You might already know this, but 1.3.1 was released today to fix this issue: http://www.elasticsearch.org/blog/elasticsearch-1-3-1-released/ The issue should only affect older versions. Your version is newer, but the error

Re: rest api or java client?

2014-07-25 Thread Ivan Brusic
Answers inline. On Fri, Jul 25, 2014 at 3:06 AM, CB chen.be...@gmail.com wrote: thanks for the answers, here are my thoughts: 1. If using pure REST client - Using a Load Balancer will make sure that the endpoint address goes to any of the live nodes (round robin) so that if one of those

Re: prevent 'match_phrase' from evaluating score

2014-07-25 Thread Ivan Brusic
or configuration to completely skip the score evaluation process. Thank you, Ivan! On Friday, July 25, 2014 1:03:53 AM UTC+8, Ivan Brusic wrote: I am not sure if there is a cleaner way to bypass scoring, but if you explicitly sort on another value that is not the score, then by default scoring

Re: prevent 'match_phrase' from evaluating score

2014-07-24 Thread Ivan Brusic
I am not sure if there is a cleaner way to bypass scoring, but if you explicitly sort on another value that is not the score, then by default scoring will not occur. Perhaps if you trace the code for sort, you can find a setting that disables scoring in general. -- Ivan On Wed, Jul 23,
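A sketch of the sort-instead-of-score approach: a `match_phrase` query with an explicit `sort` on a non-score field. The fields `title` and `published_at` are hypothetical; `track_scores` is left at its default, which does not compute scores when sorting on a field:

```python
import json

# Hypothetical fields: match on "title", order by "published_at" instead of _score.
body = {
    "query": {"match_phrase": {"title": "quick brown fox"}},
    "sort": [{"published_at": {"order": "desc"}}],
    # Adding "track_scores": True would compute scores anyway; it is left out here.
}
print(json.dumps(body))
```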

Re: SIREn plugin for nested documents

2014-07-24 Thread Ivan Brusic
Thanks for chiming in Renaud. Hopefully I will have a chance to test out the plugin soon. My use case for nested documents is fairly simple. -- Ivan On Thu, Jul 24, 2014 at 4:00 AM, ren...@sindicetech.com wrote: Hi Brian, Our apologies for the issues with the web site, we had some problems

Re: Hit/Token Properties Advanced Scoring

2014-07-23 Thread Ivan Brusic
1. You can retrieve the term position, offset and payload using function score scripts: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html 2: There are a couple of proposed solutions that would store the data in another index that is joined with

Re: one index has different _type and different _type have same field with different type will disturb?

2014-07-23 Thread Ivan Brusic
In the end, all the documents end up in the same Lucene index, and while Lucene is schema-less, all similarly named fields must be the same type. Types are useful in Elasticsearch to separate different type configurations, but will fail on similarly named fields. There is some work being done to

SIREn plugin for nested documents

2014-07-23 Thread Ivan Brusic
Has anyone else seen this plugin? http://siren.solutions/siren/overview/ There was some discussion between one of the developers and Jorg a while back, so I guess this is the outcome. Have not tried it yet, but I will give it a shot this weekend. I am hoping that it can fix a longstanding issue

Re: Can I filter with exact phrases?

2014-07-23 Thread Ivan Brusic
You can wrap any query with a query filter: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html -- Ivan On Wed, Jul 23, 2014 at 1:52 PM, IronMike sabdall...@gmail.com wrote: How can I exclude exact phrases with a filter? Lets say I want to
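A sketch of excluding an exact phrase that way, in 1.x syntax: a `filtered` query whose `bool` filter has a `must_not` clause containing a `query` filter around `match_phrase`. The field `body` and the phrase are hypothetical:

```python
import json

def exclude_phrase(field, phrase, query=None):
    """Filtered query that drops documents containing an exact phrase (1.x syntax)."""
    return {"query": {"filtered": {
        "query": query or {"match_all": {}},
        "filter": {"bool": {"must_not": {
            # the query filter wraps a normal query so it can be used as a filter
            "query": {"match_phrase": {field: phrase}}
        }}},
    }}}

print(json.dumps(exclude_phrase("body", "lorem ipsum")))
```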

Re: Help with Synonyms

2014-07-22 Thread Ivan Brusic
Your issue is casing. You are only applying the synonym filter, which by default does not lowercase terms. You can either set ignore_case to true for the synonym filter or apply a lower case filter before the synonym. I prefer to use the latter approach since I prefer to have all my analyzed
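A sketch of the lowercase-first approach: a custom analyzer whose filter chain applies `lowercase` before the synonym filter. The names `my_synonyms` / `synonym_text` and the synonym rule are hypothetical; setting `"ignore_case": True` on the synonym filter is the other option:

```python
import json

settings = {"analysis": {
    "filter": {"my_synonyms": {"type": "synonym", "synonyms": ["tv, television"]}},
    "analyzer": {"synonym_text": {
        "type": "custom",
        "tokenizer": "standard",
        # lowercase first, so "TV" and "tv" both reach the synonym filter as "tv"
        "filter": ["lowercase", "my_synonyms"],
    }},
}}
print(json.dumps(settings))
```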

Re: Help with Synonyms

2014-07-22 Thread Ivan Brusic
} } } } }' On Tuesday, July 22, 2014 11:56:40 AM UTC-5, Ivan Brusic wrote: Your issue is casing. You are only applying the synonym filter, which by default does not lowercase terms. You can either set ignore_case to true for the synonym filter or apply a lower case filter before

Re: Help with Synonyms

2014-07-22 Thread Ivan Brusic
-tokenstreams-are-actually.html http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -- Ivan On Tue, Jul 22, 2014 at 11:03 AM, Ivan Brusic i...@brusic.com wrote: A couple of reasons. The biggest issue is multi word synonyms since the query parser will tokenize the query before

Re: Problems with Post filter in 0.90.2; script filter and related question

2014-07-22 Thread Ivan Brusic
The post filter is not an addition in 0.90.8, just a renaming of a field that was ambiguous: https://github.com/elasticsearch/elasticsearch/issues/4119 Pre filters are simply filtered queries. In most cases, you want to use the pre filters. Queries are expensive in Lucene since you have to score

Re: field region was indexed without position data; cannot run PhraseQuery

2014-07-22 Thread Ivan Brusic
Your mapping does not seem correct. Can you post the output of the get mapping API instead? It appears that the region field might be a geo type instead. Analyzed fields should have position data enabled by default. -- Ivan On Tue, Jul 22, 2014 at 8:46 PM, xu piao xupia...@gmail.com wrote: i

Re: Elasticsearch recovering process took a long time.

2014-07-21 Thread Ivan Brusic
Recovery is throttled since version 0.90.1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/index-modules-store.html#store-throttling Increase indices.store.throttle.max_bytes_per_sec to a level that is suitable for your environment. Since IO should be the main bottleneck, the
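A sketch of raising that limit through the cluster update settings API (PUT /_cluster/settings), assuming the setting is dynamically updatable in your version; otherwise it goes in elasticsearch.yml. The 100mb value is an arbitrary example, not a recommendation:

```python
import json

# Body for PUT /_cluster/settings; "100mb" is an arbitrary example value.
throttle_update = {"persistent": {"indices.store.throttle.max_bytes_per_sec": "100mb"}}
print(json.dumps(throttle_update))
```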

Re: Direct buffer memory problem on master Discovery

2014-07-16 Thread Ivan Brusic
Most users do not set the direct memory setting. mlockall is set, but does the server allow it? You would see an error on startup if it didn't. Did you change vm.swappiness on the server? -- Ivan On Wed, Jul 16, 2014 at 2:40 AM, Pedro Jerónimo pedropregue...@gmail.com wrote: Java: java

Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
By default, string fields are analyzed using the standard analyzer, which will tokenize and lowercase the input (I believe stop words are now NOT removed). A term query does not analyze the query, so it only works on non-analyzed fields (or fields that use a keyword tokenizer). A term query for
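
A toy model of why the term query misses. The tokenizer below is a rough stand-in (split on word characters, then lowercase), not Lucene's actual standard analyzer, and the value "New York" is hypothetical:

```python
import re


def toy_standard_analyze(text):
    """Approximation of the standard analyzer: split on non-word
    characters and lowercase. The real analyzer follows Unicode text
    segmentation rules; this is only for illustration."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def term_query_matches(indexed_terms, query_term):
    # A term query is NOT analyzed: the query term must equal an
    # indexed term exactly, including case.
    return query_term in indexed_terms


doc_value = "New York"
analyzed_terms = toy_standard_analyze(doc_value)  # ['new', 'york']
not_analyzed_terms = [doc_value]                  # ['New York']

print(term_query_matches(analyzed_terms, "New York"))      # False
print(term_query_matches(analyzed_terms, "new"))           # True
print(term_query_matches(not_analyzed_terms, "New York"))  # True
```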

Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
what the problem is for that query. On Wed, Jul 16, 2014 at 10:27 AM, Ivan Brusic i...@brusic.com wrote: By default, string fields are analyzed using the standard analyzer, which will tokenize and lowercase the input (I believe stop words are now NOT removed). A term query does not analyze

Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
behaving as if it has been analyzed. On Wed, Jul 16, 2014 at 11:00 AM, Ivan Brusic i...@brusic.com wrote: I would verify that the field is in fact not_analyzed and that your data is indexed in the way you think it is. Use the analyze API to analyze the term. Make sure you use the last
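
For reference, a sketch of what a 1.x mapping with a not_analyzed string field looks like, alongside a term query against it. The type name `my_type` and field name `city` are hypothetical:

```python
import json

# Hypothetical type/field names. A string field with
# "index": "not_analyzed" is indexed as one unchanged term,
# which is what a term query expects.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "city": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

# Term query that matches the exact stored value of "city".
query = {"query": {"term": {"city": "New York"}}}

print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

Comparing this against the output of the get mapping API is a quick way to see whether the index really has the mapping you expect.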

Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
As predicted, your actual mapping does not match your perceived mapping. Something is not matching up. Perhaps the mapping is for a different index or type. Best way is to share your mapping and perhaps how you created your index as indicated at http://www.elasticsearch.org/help -- Ivan On

Re: Upgrade 0.26.6 - 1.2.2 any catches?

2014-07-15 Thread Ivan Brusic
Read more about it here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#store-throttling Previously it was unbounded, but now the default is 20mb, which I found to be extremely low. Also, prior to 1.2, there was a Lucene bug which made throttling

Re: Direct buffer memory problem on master Discovery

2014-07-15 Thread Ivan Brusic
Direct memory is off heap memory. Are elasticsearch and logstash the only processes on those servers? Did you set an explicit direct memory value? -- Ivan On Jul 15, 2014 3:46 PM, Mark Walkom ma...@campaignmonitor.com wrote: How much data do you have in ES, index count and total size of all

Re: Optimizing a query that matches a large number of documents

2014-07-14 Thread Ivan Brusic
Since the script is executed against lots of matched documents, perhaps converting it into a native Java script (not Javascript) would provide a performance boost. Note that using fields in scripts will force their values to be loaded into the cache. -- Ivan On Sun, Jul 13, 2014 at 8:54 AM,

Re: Elasticsearch 1.2 list of settings

2014-07-14 Thread Ivan Brusic
There are a few settings where the full name is not specified in the code, but is relative to the module it is in. Does your grep code account for these settings? A repo with pull requests might be too much for the maintainer, but a wiki would work well. Great job, Ivan On Mon, Jul 14, 2014

Re: How to change similarity settings runtime?

2014-07-14 Thread Ivan Brusic
Jörg is correct. In general, it would be a bad idea to change the similarity during runtime, but there are cases where it would be acceptable and the system should allow for those cases: https://github.com/elasticsearch/elasticsearch/issues/4403 -- Ivan On Mon, Jul 14, 2014 at 12:00 AM, Jörg

Re: Disabling _all-Field but keep Netflow-Events searchable

2014-07-14 Thread Ivan Brusic
This technically sounds like a Kibana question, so you might have better luck with the Logstash mailing list. Can't you simply prepend the field name in the query instead of relying on the default field? You can also change field names in Logstash. Another option is the copy_to field option. Similar to
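
A sketch of the copy_to approach: disable `_all`, copy the fields that should be searchable into one catch-all field, and point the index's default query field at it. The type name `netflow` and the field names `src_host`, `dst_host` and `netflow_text` are hypothetical:

```python
import json

# Hypothetical Netflow field names; "netflow_text" is an illustrative
# catch-all field that takes over the role of _all via copy_to.
index_body = {
    "settings": {
        # Query-string searches without a field prefix use this field
        # instead of _all.
        "index.query.default_field": "netflow_text"
    },
    "mappings": {
        "netflow": {
            "_all": {"enabled": False},
            "properties": {
                "src_host": {"type": "string", "copy_to": "netflow_text"},
                "dst_host": {"type": "string", "copy_to": "netflow_text"},
                "netflow_text": {"type": "string"},
            },
        }
    },
}

print(json.dumps(index_body, indent=2))
```

Unlike `_all`, only the listed fields end up in the catch-all field, so its size stays under control.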

Re: Upgrade 0.26.6 - 1.2.2 any catches?

2014-07-14 Thread Ivan Brusic
First of all, there is no version 0.26. I am assuming you meant 0.20.6. Either way, any upgrade from a pre-1.0 version to 1.x will require a full cluster restart. 1. No clue 2. Many settings like omit_norms were deprecated, but are still supported. I think that omit_tf has been changed. 3. I would

Re: spam

2014-07-12 Thread Ivan Brusic
Read the recent comments regarding the recent spam: https://groups.google.com/d/msg/elasticsearch/byATcjKgdYE/_Neoiof4fKIJ This new spam account has been banned. -- Ivan On Sat, Jul 12, 2014 at 2:50 PM, Warner Onstine warn...@gmail.com wrote: Could we please turn on first post filters

Re: How to add several name fields to an unmatch definition in a mapping definition?

2014-07-11 Thread Ivan Brusic
Besides stop words, you can use a bool query where one clause is a match_all and the other is a must_not clause with the terms in question. Something like: { query: { bool: { must: [ { match_all: {} } ], must_not: [
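
A complete version of that shape as a sketch. The field name `name` and the excluded values are hypothetical, since the original example is cut off:

```python
import json

# Hypothetical field and values to exclude.
excluded = ["foo", "bar"]

query = {
    "query": {
        "bool": {
            # Match every document...
            "must": [{"match_all": {}}],
            # ...except those containing any of the excluded terms.
            "must_not": [{"terms": {"name": excluded}}],
        }
    }
}

print(json.dumps(query, indent=2))
```

Note that `terms` is not analyzed, so the excluded values must match the indexed terms exactly (for an analyzed field, typically lowercase).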
