Re: Dealing with spam in this forum

2015-09-28 Thread Ivan Brusic
Does the new mailing list have moderators to deal with spam? Cheers, Ivan Hi all Recently we've had a few spam emails that have made it through Google's filters, and there have been calls for us to change to a moderate-first-post policy. I am reluctant to adopt this policy for the following

Re: Forums Are Now Live at http://discuss.elastic.co

2015-07-07 Thread Ivan Brusic
sure I'll be using discourse to great effect (and it seems to be being used). So I could be completely wrong about all my points. :) Doug On Sunday, May 10, 2015, Ivan Brusic i...@brusic.com wrote: I should have added something similar to what does expressed in his last paragraph. My feedback

Re: Forums Are Now Live at http://discuss.elastic.co

2015-05-10 Thread Ivan Brusic
folks are very smart. Email truly can become a first class experience and keep some of the great things about discourse. Cheers! Doug On Sunday, May 10, 2015, Ivan Brusic i...@brusic.com wrote: I really do not care for the new mailing list. First of all, I can no longer see real names

Re: Forums Are Now Live at http://discuss.elastic.co

2015-05-05 Thread Ivan Brusic
I am watching a few select categories with email notifications, but I still received notifications for other categories, Logstash in my case. Ivan On May 4, 2015 6:12 PM, leslie.hawthorn leslie.hawth...@elastic.co wrote: Hello everyone, We took in feedback on moving to a Discourse based forum

Re: Forums Are Now Live at http://discuss.elastic.co

2015-05-05 Thread Ivan Brusic
I might have found the conflicting setting. On May 5, 2015 9:43 AM, Ivan Brusic i...@brusic.com wrote: I am watching a few select categories with email notifications, but I still received notifications for other categories, Logstash in my case. Ivan On May 4, 2015 6:12 PM, leslie.hawthorn

Re: Split brain problem in 2 node elasticsearch cluster

2015-05-05 Thread Ivan Brusic
In non big data scenarios, having two servers for a database is simply done to achieve high availability. Most databases use a master client scenario, but Elasticsearch does not support such a setup. It really should because not everyone has tons of data. Ivan, not affiliated with the OP On May

Re: Field-length norm fails on fields with 3 and 4 words

2015-04-30 Thread Ivan Brusic
The field norm is computed at index time and is stored in a single byte, which can lead to a loss in precision. This behavior might have changed with newer versions of Lucene, but probably not. Ivan On Apr 30, 2015 6:42 PM, Fil ES lisowski.fili...@gmail.com wrote: Hello, I am experiencing an
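
To illustrate the precision loss, here is a rough Python sketch of a one-byte float encoding. It is modeled on what I understand Lucene 4's SmallFloat "315" scheme to be, and it assumes the default norm is 1/sqrt(number of terms). The function names are mine, not Lucene's, so treat it as an approximation rather than the actual implementation.

```python
import math
import struct


def float_to_byte315(value: float) -> int:
    """Squeeze a 32-bit float into one byte (assumed '315' layout)."""
    # Reinterpret the float as a signed 32-bit integer.
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> (24 - 3)
    zero = (63 - 15) << 3
    if small <= zero:
        # Underflow: zero/negative maps to 0, tiny positives map to 1.
        return 0 if bits <= 0 else 1
    if small >= zero + 0x100:
        # Overflow: clamp to the largest byte value.
        return 255
    return small - zero


def byte315_to_float(encoded: int) -> float:
    """Expand the byte back into a float; the dropped bits are gone."""
    if encoded == 0:
        return 0.0
    bits = ((encoded & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms: int) -> float:
    """Norm as it would come back after the one-byte round trip."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


for n in (1, 2, 3, 4, 5):
    print(n, round(1.0 / math.sqrt(n), 4), stored_norm(n))
```

Under these assumptions, fields with 3 and 4 terms round-trip to the same stored value, which would explain why they score identically.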

Re: How to replicate this type of search

2015-04-30 Thread Ivan Brusic
Although the syntax is straight Lucene (query string query), I suspect that Github and other sites parse the query term to create a format similar to the one John mentioned. Cheers, Ivan On May 1, 2015 1:22 AM, Peter Sorensen peter.jens.soren...@gmail.com wrote: Sorry for the vague title. If I

Re: Apply word_delimiter token filter on words having 5 chars or more.

2015-04-24 Thread Ivan Brusic
Your best option would be to write your own filter. It should be easy since you have access to the source of the delimiter and length filters. Look at the existing filter plugins for examples on how to deploy. Ivan On Apr 24, 2015 10:39 AM, Nassim nassim.ka...@gmail.com wrote: Hi, Is it

Re: I have got a little Problem with my synonym filter ....

2015-04-21 Thread Ivan Brusic
What kind of query are you executing? Are you querying against a specific field? A match query against the title field should work. When using the analyze API, explicitly state the field and not the analyzer for more accurate behavior of what really goes on. Cheers, Ivan On Apr 21, 2015 11:40 AM,

Re: Evaluating Moving to Discourse - Feedback Wanted

2015-04-20 Thread Ivan Brusic
subforums. Ivan On Apr 15, 2015 7:21 PM, Leslie Hawthorn leslie.hawth...@elastic.co wrote: On Wed, Apr 15, 2015 at 9:02 AM, Ivan Brusic i...@brusic.com wrote: I should clarify that I have no issues moving to Discourse, as long as instantaneous email interaction is preserved, just wanted to point out

Re: Aggregation not limited to filter?

2015-04-15 Thread Ivan Brusic
Which version are you using? The old post filter methods, simply named filter, should have been removed, or at least deprecated. Cheers, Ivan On Apr 13, 2015 1:33 PM, James Green james.mk.gr...@gmail.com wrote: Indeed. I had used postFilter to add my filters. The documentation for filters

Re: Evaluating Moving to Discourse - Feedback Wanted

2015-04-15 Thread Ivan Brusic
that impact your thoughts on moving to Discourse? Folks, please keep the feedback coming! Cheers, LH On Sat, Apr 11, 2015 at 12:09 AM, Ivan Brusic i...@brusic.com wrote: As one of the oldest and most frequent users (before my sabbatical) of the mailing list, I just wanted to say that I never had

Re: BM25 for query itself

2015-04-15 Thread Ivan Brusic
Isn't the point of BM25 to use variable document length normalization? It works when used on the entire index/corpus. It is meant to influence the TF values. Comparing results between Lucene queries is not advisable. Why did you switch to BM25? Do your field lengths vary much? Cheers, Ivan On

Re: Querystring search: Tokens are out of order

2015-04-15 Thread Ivan Brusic
Your understanding is correct. The former will be translated into a Lucene phrase query, which uses the term doc positions to find matches. Both query terms are analyzed, but the latter will simply be a bag-of-words query, which ignores positions. Cheers, Ivan On Apr 14, 2015 10:38 PM, Dave Reed
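
The difference between the two can be sketched with a toy positional index in Python. This is not how Lucene implements it; the analyzer (lowercase plus whitespace split) and the choice to require all terms in the bag-of-words case are assumptions made for the example.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def positions(tokens):
    """Map each term to the list of positions where it occurs."""
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return index


def bag_of_words_match(doc, query):
    """Positions are ignored: every query term just has to be present."""
    doc_terms = set(analyze(doc))
    return all(term in doc_terms for term in analyze(query))


def phrase_match(doc, query):
    """Terms must occur at consecutive positions, in query order."""
    index = positions(analyze(doc))
    terms = analyze(query)
    if not terms:
        return False
    for start in index.get(terms[0], []):
        if all(start + i in index.get(t, []) for i, t in enumerate(terms)):
            return True
    return False


doc = "The quick brown fox"
print(phrase_match(doc, "quick brown"))        # in order
print(phrase_match(doc, "brown quick"))        # out of order
print(bag_of_words_match(doc, "brown quick"))  # order does not matter
```

Out-of-order tokens only hurt the phrase form, since it is the only one that consults positions.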

Re: Elasticsearch Upgrade to Version 1.4.4

2015-04-12 Thread Ivan Brusic
In my experience, the client can be older than the server.* The server side code contains many version checks, so it should know how to handle requests from older clients. The inverse is much harder to support since clients do not change their requests based on the server. * Between minor

Re: Evaluating Moving to Discourse - Feedback Wanted

2015-04-10 Thread Ivan Brusic
As one of the oldest and most frequent users (before my sabbatical) of the mailing list, I just wanted to say that I never had an issue with it. It works. As long as I could continue using only email, I am happy. For realtime communication, there is the IRC channel. If prefer the mailing list

Re: elastic.co blog RSS URL missing

2015-03-26 Thread Ivan Brusic
I noticed the same thing. The link is redirecting for me, but my reader (AOL Reader) appears not to handle redirects. Ivan On Mar 25, 2015 9:11 AM, Magnus Bäck magnus.b...@sonymobile.com wrote: The not too widely announced move from elasticsearch.(com|org) to elastic.co the other week seems to

Re: Manually adjusting document sort based on queries

2015-03-22 Thread Ivan Brusic
Easiest option, in terms of complexity, would probably be to use a bool query, where product x and y are matched by an id query with high boosts. Best option is probably the function score query: http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

Re: Is it possible to write my own filter ?

2015-03-12 Thread Ivan Brusic
Off the top of my head, I cannot think of an existing filter that accomplishes that task. Creating a custom filter is easy. Simply create a Lucene filter and create a plug-in around it. Take a look at existing analysis plug-ins for inspiration.

Re: re-use zen discovery API

2015-03-11 Thread Ivan Brusic
The discovery API is not modularized enough to use it outside of the Elasticsearch context. I would simply use something like Zookeeper, which is built exactly for situations like yours. Cheers, Ivan On Mar 7, 2015 7:03 PM, Pierre de Soyres pdesoy...@gmail.com wrote: Hello, I would like to

Re: Ignore a field in the scoring

2014-12-26 Thread Ivan Brusic
Use the field in a filter and not part of the query. Is this field free text? Ivan On Dec 23, 2014 9:12 PM, Roger de Cordova Farias roger.far...@fontec.inf.br wrote: Hello Our documents have metadata indexed with them, but we don't want the metadata to interfere in the scoring After a

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-24 Thread Ivan Brusic
It used to be 2 concurrent streams. Has the default been upped in recent versions? I agree, that number is awfully low. If you can disable indexing during rolling restarts, those numbers can be much higher. -- Ivan On Sun, Nov 23, 2014 at 5:48 PM, joergpra...@gmail.com joergpra...@gmail.com

Re: Specifying search fields in search request

2014-11-24 Thread Ivan Brusic
It all depends on how many fields and how big they are. Retrieving a few specific fields might be faster in cases, but in general, each field is another seek in Lucene. Values are not retrieved at the same time. If you are going to get all the fields, just use the source. -- Ivan On Sun, Nov

Re: terms filter with the value to match in upercase is not possible?

2014-11-23 Thread Ivan Brusic
A term query will not analyze the search terms, so if your countries field is using the default analyzer, there will be no match since the standard analyzer will lowercase the terms. Either set your field as not_analyzed or use another query such as match. -- Ivan On Sat, Nov 22, 2014 at 4:35
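
A toy Python model of why the uppercase term misses. The analyzers and the inverted index here are simplified stand-ins that I made up for illustration; they are not Elasticsearch code.

```python
def standard_like_analyzer(text):
    """Stand-in for the standard analyzer: split and lowercase."""
    return [tok.lower() for tok in text.split()]


def keyword_analyzer(text):
    """Stand-in for a not_analyzed field: the whole value is one token."""
    return [text]


def build_index(docs, analyzer):
    """Inverted index: token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for tok in analyzer(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def term_lookup(index, term):
    """Like a term query: the search term is used exactly as given."""
    return index.get(term, set())


def match_lookup(index, text, analyzer):
    """Like a match query: the search text is analyzed first."""
    hits = set()
    for tok in analyzer(text):
        hits |= index.get(tok, set())
    return hits


docs = {1: "Germany", 2: "France"}
analyzed = build_index(docs, standard_like_analyzer)
raw = build_index(docs, keyword_analyzer)

print(term_lookup(analyzed, "Germany"))                           # no hit
print(match_lookup(analyzed, "Germany", standard_like_analyzer))  # hit
print(term_lookup(raw, "Germany"))                                # hit
```

Either fix from the message works in this model: keep the field unanalyzed, or analyze the search text the same way the field was analyzed.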

Re: 1.4.0 data node can't join existing 1.3.4 cluster

2014-11-22 Thread Ivan Brusic
Great work everyone. Feel better about upgrading now. On Nov 22, 2014 4:42 PM, Boaz Leskes b.les...@gmail.com wrote: Hi Christian, Daniel, I believe I found the issue - it has to do with the cloud plugins (both AWS and GCE) and the way they create the node list for the unicast based

Re: 1.4.0 data node can't join existing 1.3.4 cluster

2014-11-21 Thread Ivan Brusic
Has an official issue been created? I would like to track the status. So far, every 1.x.0 release has been buggy. :) -- Ivan On Fri, Nov 21, 2014 at 4:06 AM, Mark Walkom markwal...@gmail.com wrote: It's being looked at, but I don't know much beyond that at the moment sorry. On 21 November

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-21 Thread Ivan Brusic
Disabling allocation helps, but it does not solve the problem completely. Just like Nik, one of my complaints (although not my primary one). :) I found that recovery gets easier when doing a rolling restart. First few servers always rebalance, the last ones do not. -- Ivan On Thu, Nov 20, 2014

Re: ES backups without using snapshots?

2014-11-20 Thread Ivan Brusic
node was chosen to minimise data loss following a data centre failure, however because of the risk of split brain, the node actually functions more of a warm DR than any sort of HA... Regards, Mat On Thursday, November 20, 2014 2:32:14 PM UTC+13, Ivan Brusic wrote: How many shards for each

Increased query count after moving to nested documents

2014-11-20 Thread Ivan Brusic
We have always indexed nested documents, but never fully used them since issue 3022 is still outstanding. Finally made the move to actually filtering documents at the nested level. Tracking metrics with graphite/grafana, I noticed immediately that the active/current query count is much higher

Re: Does nested query with operator honor the operator or does it always display some default behavior

2014-11-19 Thread Ivan Brusic
fields cause the operator to be honored ? and Does the operator field within a nested query depend on if the field in the nested field is actually analyzed or not. ? Ramdev On Tuesday, 18 November 2014 14:45:53 UTC-6, Ivan Brusic wrote: I have never seen the array syntax with the match

Re: Please help me to do a filtered query!!!

2014-11-19 Thread Ivan Brusic
Try using Jorg's plugin: https://github.com/jprante/elasticsearch-plugin-arrayformat -- Ivan On Wed, Nov 19, 2014 at 7:15 AM, tch...@360incentives.com wrote: On Wednesday, 19 November 2014 13:07:13 UTC+2, tch...@360incentives.com wrote: Hi Everyone! please advice me how return only

Re: Does nested query with operator honor the operator or does it always display some default behavior

2014-11-18 Thread Ivan Brusic
I have never seen the array syntax with the match query, so I am not sure what the behavior should be. Since your search terms are not analyzed in your example, a terms query with a minimum match of 100% should work. If not, perhaps creating a single search term of your existing terms? -- Ivan

Re: EsStorage - Set field as not_analyzed

2014-11-18 Thread Ivan Brusic
You can set up an index template without creating the index. The template will only be read when the index is created. -- Ivan On Tue, Nov 18, 2014 at 11:10 AM, Kingsley Elmes kingsleyel...@gmail.com wrote: Hi, Does anyone know is it possible to set a field as 'not_analyzed' via the

Re: 1.4.0 data node can't join existing 1.3.4 cluster

2014-11-13 Thread Ivan Brusic
Rolling upgrades should be supported: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades How else can you perform a rolling upgrade without having a mixed cluster? -- Ivan On Thu, Nov 13, 2014 at 1:05 PM, joergpra...@gmail.com

Re: Aggregating on nested fields

2014-11-12 Thread Ivan Brusic
query. I also like the fact that it allows aggregations to not know about the query. On Tue, Nov 11, 2014 at 5:27 PM, Ivan Brusic i...@brusic.com wrote: I suddenly remembered when using facets that I had to apply the same query filter as a facet filter with the join option disabled. Turns out

Re: Exact name match search - can't get it to work

2014-11-12 Thread Ivan Brusic
The terms are copied to the full name and are not analyzed as specified. However, two terms are being copied, not one. The term query expects a single token of Jeremy Smith, while you have two separate non analyzed tokens. Cheers, Ivan On Nov 12, 2014 10:29 AM, Robert Alkire

Re: Custom cluster action

2014-11-12 Thread Ivan Brusic
There is also an ActionModule public void onModule(ActionModule module) { module.registerAction(MyAction.INSTANCE, TransportMyAction.class); } It is always easier to follow existing plugins. Cheers, Ivan On Wed, Nov 12, 2014 at 3:50 PM, Pawel pro...@gmail.com wrote: Hi, I'm thinking

Netflix releases Elasticsearch automation tool

2014-11-12 Thread Ivan Brusic
Interesting in that they use Cassandra for discovery. http://techblog.netflix.com/2014/11/introducing-raigad-elasticsearch-sidecar.html

Re: Aggregating on nested fields

2014-11-11 Thread Ivan Brusic
#3022 and related issues can bring about less ambiguous aggregations. Nested aggregations on pre-filtered nested documents should work as is. If not, the global scope aggregation should be used. -- Ivan On Mon, Nov 10, 2014 at 3:43 PM, Ivan Brusic i...@brusic.com wrote: Reproducible gist

Re: Integrated authentication

2014-11-11 Thread Ivan Brusic
Quite the opposite: the Elasticsearch team and others have said that authentication belongs outside of the application. Or at least, security was not a high priority. It seems like they are working on security and a release should be forthcoming:

Re: Filter cache - based on full set or result of previous filters?

2014-11-11 Thread Ivan Brusic
The status filter cache will indeed contain all entries. And technically, the cache is per segment, and not across all documents, but this should be transparent. Caching is enabled by default for the term filters, but disabled for the bool filter. You can enable it if you think users will be

Re: accidently ran another instance of elasticsearch on a few nodes

2014-11-10 Thread Ivan Brusic
To avoid this situation in the future, besides using a service to start elasticsearch, you can enforce the max nodes setting: node.max_local_storage_nodes: 1 -- Ivan On Sun, Nov 9, 2014 at 5:24 PM, Mark Walkom markwal...@gmail.com wrote: Yellow means unassigned replicas, try removing them

Re: Looking for a sexy solution for Aggregations

2014-11-10 Thread Ivan Brusic
The only solution that I can think of is to execute your query with a post filter, not a filtered query. In this way, your aggregations by default will not be filtered. You can then have two histograms, one with the post filter used as an aggregation filter, and the other one left alone. -- Ivan
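
A sketch of what such a request body might look like, built as a Python dict. The field names (`status`, `price`), the interval, and the aggregation names are invented for the example, and the DSL keys follow the 1.x-era syntax as I remember it, so check them against your version.

```python
import json

# Hypothetical filter; in a real request this is whatever the user selected.
status_filter = {"term": {"status": "active"}}


def price_histogram():
    """Histogram definition shared by both aggregations (assumed field)."""
    return {"histogram": {"field": "price", "interval": 10}}


body = {
    "query": {"match_all": {}},
    # Applied to the hits only, after aggregations have been computed.
    "post_filter": status_filter,
    "aggs": {
        # Left alone: sees every document matched by the query.
        "all_docs": price_histogram(),
        # Same filter repeated as an aggregation filter.
        "filtered_docs": {
            "filter": status_filter,
            "aggs": {"hist": price_histogram()},
        },
    },
}

print(json.dumps(body, indent=2))
```

The filter is repeated on purpose: once to narrow the hits, once to narrow only the second histogram.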

Re: min_score doesn't seem to be working with _count api

2014-11-10 Thread Ivan Brusic
Just a guess, but I would assume that the count API does not score documents, which is why it is faster, leading to a setting such as min_score to be obsolete. -- Ivan On Mon, Nov 10, 2014 at 10:55 AM, Roly Vicaria roly...@gmail.com wrote: Also, I'm trying this on v1.3.2 On Monday,

Re: does there exists an exists query

2014-11-10 Thread Ivan Brusic
Off the top of my head, the easiest option would be to use a constant score query. Wrap the original query and provide a boost to documents that satisfy your exist filter. Cheers, Ivan On Mon, Nov 10, 2014 at 12:54 PM, Volker s...@klest.de wrote: I would like to know whether there is an

Re: Aggregating on nested fields

2014-11-10 Thread Ivan Brusic
Reproducible gist: https://gist.github.com/brusic/81e1552ffd49a1f6a7aa Surely I cannot be the only one to have encountered this issue. -- Ivan On Mon, Nov 10, 2014 at 12:53 PM, Ivan Brusic i...@brusic.com wrote: Is it possible to aggregate only on the nested documents that are returned

Re: SpanNotQuery issues

2014-11-07 Thread Ivan Brusic
Yes, that post explained it a lot better than I wanted to. :) But basically yes, the exclude portion is only as part of an existing span, but a single span term is not really a span. Ultimately, span queries are not very flexible since they do not analyze terms, which is why I suspect there are

Re: Distributed Frequency Search

2014-11-06 Thread Ivan Brusic
We did some performance testing and found that the performance hit from using DFS was minor. -- Ivan On Wed, Nov 5, 2014 at 8:55 AM, Sofiane Cherchalli sofian...@gmail.com wrote: Answering myself: According to ES blog

Re: SpanNotQuery issues

2014-11-06 Thread Ivan Brusic
is there 2 matches? exclude : { span_term : { field1 : dog } } I thought we should exclude match with dog... Could you please point me to proper information to understand what is happening? Thx, Jade On Wednesday, 22 August 2012 02:01:09 UTC-4, Ivan Brusic wrote

Re: ElasticSearch enable Snowball Analyzer and Synonym on Fields

2014-11-06 Thread Ivan Brusic
You would need to create a custom analyzer by basically repeating the configuration of the snowball analyzer, but adding in the synonym filter. You can't modify a stock analyzer, unless this has changed (if so, someone please correct me). -- Ivan On Wed, Nov 5, 2014 at 6:43 PM, Iqbal Ahmed
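
Roughly, the settings could look like the Python dict below. I am assuming the stock snowball analyzer is the standard tokenizer followed by the standard, lowercase, stop and snowball filters. The analyzer and filter names and the synonym list are made up, and whether synonyms should run before or after stemming depends on your data.

```python
import json

settings = {
    "analysis": {
        "filter": {
            # Hypothetical synonym list; replace with your own.
            "my_synonyms": {
                "type": "synonym",
                "synonyms": ["laptop, notebook"],
            },
            "my_snowball": {"type": "snowball", "language": "English"},
        },
        "analyzer": {
            "snowball_with_synonyms": {
                "type": "custom",
                "tokenizer": "standard",
                # Assumed snowball chain, with synonyms placed before stemming.
                "filter": [
                    "standard",
                    "lowercase",
                    "stop",
                    "my_synonyms",
                    "my_snowball",
                ],
            }
        },
    }
}

print(json.dumps(settings, indent=2))
```

The fields would then reference `snowball_with_synonyms` in their mapping instead of the stock analyzer.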

Re: how to search non indexed field in elasticsearch

2014-11-06 Thread Ivan Brusic
You cannot search/filter on a non-indexed field. -- Ivan On Wed, Nov 5, 2014 at 11:45 PM, ramakrishna panguluri panguluri.ramakris...@gmail.com wrote: I have 10 fields inserted into elasticsearch out of which 5 fields are indexed. Is it possible to search on non indexed field? Thanks in

Re: Announce Mailing List

2014-11-06 Thread Ivan Brusic
An announce list would be awesome, but at least something to this list with the [ANN] or [ANNOUNCEMENT] prefix like David has been doing. Elasticsearch 1.4.0 and 1.3.5 were released, but there is no announcement on the list. Elasticsearch also announced a product called Shield, which should

Re: Bool Queries and MUST/SHOULD combinations

2014-11-04 Thread Ivan Brusic
:15 PM UTC-5, Ivan Brusic wrote: Must clauses are queries that must return a document. In the first query, any document returned MUST have a location of Germany. The valueType should clause is optional and actually pointless as a filter since it does not contribute to scoring. Can you explain

Re: Disabling default fields (_index, _type, _id, _score) in result list

2014-11-04 Thread Ivan Brusic
Are you using REST? If so, Jorg wrote a plugin to help with such a task: https://github.com/jprante/elasticsearch-arrayformat -- Ivan On Mon, Nov 3, 2014 at 8:36 AM, Lasse Schou lassesc...@gmail.com wrote: Hi, I want to know if it's possible to disable the _index, _type, _id and _score

Re: Bool Queries and MUST/SHOULD combinations

2014-11-03 Thread Ivan Brusic
Must clauses are queries that must return a document. In the first query, any document returned MUST have a location of Germany. The valueType should clause is optional and actually pointless as a filter since it does not contribute to scoring. Can you explain what your query should be doing in

Re: Using function_score error

2014-10-29 Thread Ivan Brusic
Mvel has been removed in recent versions of Elasticsearch due to security issues. Either change your script to use Groovy (preferred) or install the mvel plugin. Cheers, Ivan On Oct 29, 2014 2:44 PM, Manuel Sciuto msci...@viajeros.com wrote: Hello everyone Do not understand why it does not

Re: How is it calculated _score

2014-10-28 Thread Ivan Brusic
The default scoring algorithm is based on TF-IDF. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/practical-scoring-function.html http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scoring-theory.html You can enable explain to see how documents are scored:

Re: How is it calculated _score

2014-10-28 Thread Ivan Brusic
2014 16:47:56 UTC-3, Ivan Brusic wrote: The default scoring algorithm is based on TF-IDF. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/practical-scoring-function.html http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scoring-theory.html You can enable

Re: plan for river

2014-10-27 Thread Ivan Brusic
There is nothing magical about rivers. With some Java code changes, most rivers can be made to run as standalone Java processes. The only thing the rivers do is (weakly) guarantee that only one river instance is run per cluster. Cheers, Ivan On Mon, Oct 27, 2014 at 4:11 AM,

Re: Aggregation buckets, with an additional key:value inside.

2014-10-25 Thread Ivan Brusic
I maintain a mapping on the client side to do the lookups. Thankfully my taxonomy is static (but somewhat large). There is a PR to do server-side mappings, but I don't think it would apply to aggregations and is quite old. An alternative solution would be to create compound values such as

Re: Migration of 0.90.3 cluster to new cluster running 1.3.4

2014-10-24 Thread Ivan Brusic
Unless you are moving to new hardware, there is no need to rsync your data. Both Elasticsearch 0.90.x and 1.3.x are based on Lucene 4, so the underlying data is compatible. Of course, you should backup your data before such an upgrade. After restarting your new cluster with your old data, I would

Re: OutOfMemory

2014-10-23 Thread Ivan Brusic
There is more to the issue than merely your configuration. What are your queries? Are you doing a lot of aggregations, especially on high-cardinality fields? What kind of hardware are you running now? Using the API, look at your field cache usage. The field cache is held within the Java heap

Re: Shard Recommendation for Elasticsearch

2014-10-19 Thread Ivan Brusic
? And also, is this safe? On Saturday, October 18, 2014 2:41:50 PM UTC-4, Ivan Brusic wrote: The number of shards will help you scale out in case you add more nodes in the future. With your current shard count at 5, you cannot optimally deploy and distribute a 6+ node cluster. However, your

Re: running master nodes on application servers

2014-10-18 Thread Ivan Brusic
Is there some reason why your Elasticsearch nodes cannot serve as both master and data? I believe that dedicated master nodes should only come into play with large clusters, way beyond the 3 you have. If your master nodes are tied to your app nodes, then I believe you will have less resiliency

Re: Filters: odd behavior

2014-10-18 Thread Ivan Brusic
The structure of your query is odd. Either it is some format that I am not aware of or the Elasticsearch parser is not doing a good job at determining it is invalid. Your two filters should be joined via a bool filter. Something like (not tested): { query: { filtered: { query: {

Re: Shard Recommendation for Elasticsearch

2014-10-18 Thread Ivan Brusic
The number of shards will help you scale out in case you add more nodes in the future. With your current shard count at 5, you cannot optimally deploy and distribute a 6+ node cluster. However, your data is time-based, one per day. Are queries on historical data important? I would start off with a
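
A back-of-the-envelope sketch of the shard/node arithmetic. It deliberately ignores replicas and uses plain round-robin placement, which is my simplification and not how the Elasticsearch allocator actually decides.

```python
def distribute(num_shards, num_nodes):
    """Place primary shards on nodes round-robin (replicas ignored)."""
    nodes = [[] for _ in range(num_nodes)]
    for shard in range(num_shards):
        nodes[shard % num_nodes].append(shard)
    return nodes


def idle_nodes(num_shards, num_nodes):
    """Count nodes that end up holding no shard of the index."""
    return sum(1 for held in distribute(num_shards, num_nodes) if not held)


for shards, nodes in ((5, 5), (5, 6), (12, 6)):
    print(shards, "shards on", nodes, "nodes ->", idle_nodes(shards, nodes), "idle")
```

With 5 primaries on 6 nodes, one node holds nothing for that index, which is the sense in which the layout is not optimal.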

Re: Get only ids with no source Java API

2014-10-17 Thread Ivan Brusic
Have you tried setting no fields to be returned or the explicit setNoFields() method? http://jenkins.elasticsearch.org/job/Elasticsearch%20Master%20Branch%20Javadoc/Elasticsearch_API_Documentation/org/elasticsearch/action/search/SearchRequestBuilder.html#setNoFields() -- Ivan On Thu, Oct 16,

Re: Filters: odd behavior

2014-10-17 Thread Ivan Brusic
They are indeed executed in the defined order. Filters that are more specific should be placed early on and those that cannot be cached (geo/timebased) should be placed last. Cheers, Ivan On Thu, Oct 16, 2014 at 5:16 AM, @mromagnoli marce.romagn...@gmail.com wrote: Hi everyone, I have a

Re: Update similarity measure for existing index

2014-10-09 Thread Ivan Brusic
You cannot change the similarity on an existing index. There is no technical reason why it could not occur; it appears to be simply a method in place to prevent users from creating potentially huge errors. I say that developers should have the option to shoot themselves in the foot! Cheers,

Re: Filter by specific value without mapping

2014-10-07 Thread Ivan Brusic
The fields do not need a custom analyzer; they just need to be marked as not_analyzed. You can set up a dynamic template that states any new field should be not analyzed. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates
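
For reference, a dynamic template along those lines might look like this, written as a Python dict. The type name `my_type` and the template name are placeholders, and the keys follow the 1.x mapping syntax as I recall it.

```python
import json

mapping = {
    "mappings": {
        # Hypothetical document type.
        "my_type": {
            "dynamic_templates": [
                {
                    # Hypothetical template name.
                    "strings_not_analyzed": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed",
                        },
                    }
                }
            ]
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Any new string field would then be indexed as a single token, so a filter on the exact value works without listing each field up front.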

Re: Understanding doc_values?

2014-10-07 Thread Ivan Brusic
Perhaps it is easier to talk about the downsides of doc_values. If you have slow disks, common when using low level VMs with shared disks, then retrieving your data will be much slower. Also, you cannot filter on doc_values fields, so it depends on your other use cases. The amount field seems

Re: Pattern replace apostrophes?

2014-10-07 Thread Ivan Brusic
What type of query are you using? Perhaps the query you are using is not using the same analyzer at search time. -- Ivan On Tue, Oct 7, 2014 at 6:06 AM, Lee Gee lee...@gmail.com wrote: My users have issues with apostrophes: I need to index and search aaa's as it is, and without the

Re: Recommendation on reading the heart of the code.

2014-10-06 Thread Ivan Brusic
There is no voting or other gamification, just a plain ol' mailing list. Many of us respond as just another way to contribute to open-source. -- Ivan On Mon, Oct 6, 2014 at 3:37 AM, ahmed jamal maaz jamalahmedm...@gmail.com wrote: Hi all, These are very good advises. I really appreciate it.

Re: Recommendation on reading the heart of the code.

2014-10-05 Thread Ivan Brusic
The code is difficult to debug due to the distributed nature of Elasticsearch. Requests get serialized and are sent via a binary protocol, so you cannot focus on specific classes. Dependency injection adds in additional complexity. You cannot simply determine the construction time values in the

Re: recent elasticsearch vs solrcloud comparison ?

2014-10-03 Thread Ivan Brusic
I have not looked at that video, but most comparisons are with Solr and not SolrCloud. -- Ivan On Oct 3, 2014 12:37 PM, Gaurav gupta gupta.gaurav0...@gmail.com wrote: Kevin, I found the recent comparison from the search experts @

Re: Upgrading from very old version of ES with zero down time

2014-10-02 Thread Ivan Brusic
Your indices should be fine as is. Lucene is guaranteed to be able to read data from 1 major revision prior. Elasticsearch 0.20 is Lucene 3 and the latest Elasticsearch is Lucene 4. Because of various bugs at the Lucene level, you should run an optimize (normally discouraged) to upgrade the

Re: Error starting up Elasticsearch

2014-09-29 Thread Ivan Brusic
Seems like a version mismatch. What versions of elasticsearch/logstash are you using? Are you using the 'elasticsearch' output in logstash or 'elasticsearch_http'? Try using the latter. -- Ivan On Mon, Sep 29, 2014 at 1:24 PM, larrychu...@gmail.com wrote: I get this in the logs when starting

Re: issue with elastic search TransportClient of java API

2014-09-26 Thread Ivan Brusic
In general, newer client libraries should not be used with older clusters. Most of the version checking happens on the server side and the older code does not know about the newer client. -- Ivan On Fri, Sep 26, 2014 at 9:54 AM, David Pilato da...@pilato.fr wrote: I have no idea. Could be an

Re: Unnecessary Cache Eviction Explained

2014-09-23 Thread Ivan Brusic
Otis, from what I understand, the default size for the cache is unbounded, so cache eviction should not occur due to inconsistent range checks in the default case. -- Ivan On Mon, Sep 22, 2014 at 9:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, It sounds like every single ES

Re: Problem with word-separators in bool search with standard tokenizer

2014-09-22 Thread Ivan Brusic
The query string query is working because the ampersand is also being stripped from the query. Your best bet is to use the pattern tokenizer and explicitly define which characters to split the input text on.

Re: Custom Collector using a plugin

2014-09-19 Thread Ivan Brusic
You basically want to create your own aggregation; aggregations are essentially collectors at the Lucene level. Look at existing plugins which provide custom aggregations. Basically, elasticsearch uses a scatter-gather/map-reduce model for distributed collections. -- Ivan On Sep 18, 2014 12:56 AM, tim
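
The scatter-gather idea, reduced to a toy Python example: each shard collects its own partial counts and a coordinating step merges them. The documents and function names are invented; a real aggregation plugin would implement the collect and reduce phases in Java.

```python
from collections import Counter


def shard_collect(docs, field):
    """Map/collect phase: runs on each shard against its local documents."""
    return Counter(doc[field] for doc in docs if field in doc)


def reduce_counts(partials):
    """Reduce phase: the coordinating node merges the per-shard results."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Two hypothetical shards holding different documents.
shards = [
    [{"tag": "a"}, {"tag": "b"}],
    [{"tag": "a"}, {"other": "x"}],
]

partials = [shard_collect(docs, "tag") for docs in shards]
merged = reduce_counts(partials)
print(merged)
```

A custom collector has to fit this shape: whatever it gathers per shard must be mergeable on the coordinating node.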

Re: Boosting a type

2014-09-17 Thread Ivan Brusic
I have yet to switch over to groovy, so I can't comment on where your current script is wrong (it looks good to me as well). However, you can use the standard function score functions, which are easier to understand and do not rely on scripting (technically better performance).

Re: Linking of query/search

2014-09-12 Thread Ivan Brusic
You cannot join documents in Lucene/Elasticsearch (at least not like a RDBMS). You would need to either denormalize your data, join on the client side or execute 2+ queries. -- Ivan On Fri, Sep 12, 2014 at 12:45 AM, matej.zerov...@gmail.com wrote: Hello! Can anyone shine some light on my

Re: Do I need the JDBC driver

2014-09-12 Thread Ivan Brusic
I would strongly prefer to maintain control of the indexing side and not in Elasticsearch. In fact, the Elasticsearch team has talked about deprecating river plugins. I do not have any numbers, but I would suspect that the majority of users do not use a river plugin. And yes, the correct term is

Re: Do I need the JDBC driver

2014-09-12 Thread Ivan Brusic
new to this so I find some of the information hard to understand. So sorry if I am asking stupid questions. On 12 Sep 2014, at 18:26, Ivan Brusic i...@brusic.com wrote: I would strongly prefer to maintain control of the indexing side and not in Elasticsearch. In fact, the Elasticsearch team has

Re: Elasticsearch 1.4.0 release data?

2014-09-10 Thread Ivan Brusic
I think this release might be their biggest one since 1.0. Lots of big changes including a change in the consensus algorithm. It might take time, but that is only a guess. -- Ivan On Wed, Sep 10, 2014 at 2:57 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: I use the Github issue

Re: Aggregation framework, Java API

2014-09-09 Thread Ivan Brusic
filter. It seems strange to me to use the FilteredQuery when the query string is empty, but perhaps that would be the most straight forward way of doing this. thank you, emanuel On Monday, 8 September 2014 17:21:21 UTC+2, Ivan Brusic wrote: Which filter was ignored? I am assuming you meant

Re: Faster sloppy phrase queries

2014-09-09 Thread Ivan Brusic
Hopefully Mike McCandless will get some of the new Lucene features into Elasticsearch: http://blog.mikemccandless.com/2014/08/a-new-proximity-query-for-lucene-using.html I suspect it will come soon. -- Ivan On Mon, Sep 8, 2014 at 2:11 PM, Nikolas Everett nik9...@gmail.com wrote: On Mon,

Re: elasticsearch Java API for function_score query

2014-09-09 Thread Ivan Brusic
Malini, I would suggest starting a new thread instead of adding to an old one. I find the Java API for the boost functions to be confusing, or at least, not as clean as the rest of the Java API. I wonder if the Elasticsearch team would accept a PR. Jörg's example above could be used as a skeleton

Re: Aggregation framework, Java API

2014-09-08 Thread Ivan Brusic
Which filter was ignored? I am assuming you meant the post filter (which might still be called filter in the Java API); in that case the filter is bypassed by design. Post filters allow you to filter the documents returned, but leave the aggregations as is. Sounds like you are looking for
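
To make the difference concrete, the sketch below builds two request bodies as plain Python dicts, assuming the Elasticsearch 1.x query DSL. In the first, the filter is a post filter and only narrows the returned hits; in the second, it sits inside a filtered query and therefore also narrows what the aggregation sees. The field names "color" and "brand" are hypothetical.

```python
def post_filter_body(doc_filter, aggs):
    """Filter applied after aggregations: hits are narrowed, aggs are not."""
    return {
        "query": {"match_all": {}},
        "post_filter": doc_filter,
        "aggs": aggs,
    }


def filtered_query_body(doc_filter, aggs):
    """Filter applied inside the query: hits and aggs are both narrowed."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": doc_filter,
            }
        },
        "aggs": aggs,
    }


# Hypothetical filter and aggregation.
color_filter = {"term": {"color": "red"}}
brand_counts = {"brands": {"terms": {"field": "brand"}}}

hits_only = post_filter_body(color_filter, brand_counts)
hits_and_aggs = filtered_query_body(color_filter, brand_counts)
```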

Re: should ES_HEAP_SIZE be less than 31G?

2014-09-04 Thread Ivan Brusic
On Wed, Sep 3, 2014 at 11:47 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: ES scales best over multiple machines horizontally, not vertically. More RAM does not automatically mean better performance at linear scale at a certain point - it depends on the JVM if it can keep up.

Re: Is it possible to add yet another score value based on similarity (same words) to differentiate between two _scores ?

2014-09-04 Thread Ivan Brusic
Can you simply boost the non analyzed field? If the scores are still too similar, try using a dis_max query with the non analyzed query getting a higher boost: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html -- Ivan On Wed, Sep 3, 2014 at 7:16
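
A small sketch of the dis_max suggestion, built as a plain Python dict. It assumes the same text is indexed twice, once analyzed and once not analyzed; the field names "title" and "title.raw" and the boost value are hypothetical.

```python
def exact_match_first(text, analyzed_field, raw_field, raw_boost=5.0):
    """Build a dis_max query that favors exact matches on the raw field.

    dis_max scores a document by its best-matching clause, so a document
    that matches the boosted term clause on the non analyzed field
    outranks one that only matches the analyzed field.
    """
    return {
        "query": {
            "dis_max": {
                "queries": [
                    {"match": {analyzed_field: {"query": text}}},
                    {"term": {raw_field: {"value": text, "boost": raw_boost}}},
                ]
            }
        }
    }


body = exact_match_first("quick brown fox", "title", "title.raw")
```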

Re: How do I remove _index, _type, _id and _score from output?

2014-09-04 Thread Ivan Brusic
There is a plugin which can help: https://github.com/jprante/elasticsearch-index-termlist -- Ivan On Wed, Sep 3, 2014 at 11:47 AM, David Pilato da...@pilato.fr wrote: I don't think you can as far as I remember the same thread about it some months ago. -- David ;-) Twitter : @dadoonet /

Re: Learning optimal boost weight [ML]

2014-09-04 Thread Ivan Brusic
I have something similar which uses search analytics to determine relevant filters. No plugin or framework since everything works on the client side during the creation of the query. The process is far from ideal and is currently very conservative, providing only a slight boost. It does not work

Re: should ES_HEAP_SIZE be less than 31G?

2014-09-03 Thread Ivan Brusic
The actual limitation in Java is compressed pointers: http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#compressedOop Thankfully Elasticsearch can run multiple nodes on the same server. Just pay attention to the direct size (off heap memory), mlockall

Re: Looking for Elasticsearch projects

2014-09-03 Thread Ivan Brusic
Working on it is my full time job so review would be quick Nik On Sep 2, 2014 6:51 PM, Ivan Brusic i...@brusic.com wrote: For those that are not regulars on the mailing list, I am a fairly active member that has used Elasticsearch for years. I am leaving my full-time job to focus on other (techie

Re: Exists filter does not respect must_not bool filter

2014-09-03 Thread Ivan Brusic
Is giving.assignee a sub-object or a nested document? Can you provide your mapping? Use the mapping API for exact results ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html ) Perhaps enabling explain would provide some hints, -- Ivan On Wed, Sep

Re: Looking for Elasticsearch projects

2014-09-03 Thread Ivan Brusic
ago. Would love to work on something in that regard. Cheers, Ivan On Wed, Sep 3, 2014 at 2:16 PM, Itamar Syn-Hershko ita...@code972.com wrote: On Thu, Sep 4, 2014 at 12:10 AM, Ivan Brusic i...@brusic.com wrote: Thanks Jörg. The incentive for an open-source project is to pad my resume

Re: Dynamic mapping stops at a field called _id

2014-09-03 Thread Ivan Brusic
The _id field is one of the few reserved field names in Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-id-field.html You can set it to whatever you want, as long as it is not an object or (empty) array, as in your case. I have no idea what the proper
