Re: Issue with facets and not_analyzed in mapping.

2014-03-04 Thread Ivan Brusic
Judging by the output, the genre field is analyzed using the default analyzer. Others can help debug if you provide your mapping. It is best to use the get mapping API [1] since it shows what you actually have instead of what you supplied at index creation. Depending on your use case, you might
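Below is a minimal sketch of the mapping and facet the reply points toward. It is written as Python dicts that serialize to 0.90-era JSON. The type name `movie` is an assumption; `genre` comes from the question. An existing analyzed field generally cannot be switched to not_analyzed in place, so a mapping like this would normally go into a new index before reindexing.

```python
import json

# Hypothetical type "movie"; "genre" is indexed as one untokenized term.
mapping = {
    "movie": {
        "properties": {
            "genre": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# 0.90-style terms facet over the unanalyzed field: whole values such as
# "Science Fiction" come back instead of the tokens "science" and "fiction".
search_body = {
    "query": {"match_all": {}},
    "facets": {"genres": {"terms": {"field": "genre", "size": 10}}},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```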

Re: JDK 7 Issues Question

2014-03-04 Thread Ivan Brusic
The vectorization issue is not limited to OpenJDK and is still present in 7u45: https://twitter.com/thetaph1/status/423523708708208640 On Tue, Mar 4, 2014 at 3:52 PM, InquiringMind brian.from...@gmail.com wrote: Jörg, Just to clarify: The links below point to OpenJDK, not to the Oracle

Re: Configure elasticsearch to query files on file system

2014-03-04 Thread Ivan Brusic
The zip download available on github is not what you want. The format required for plugins is different from the source download found on github. Since it appears that you do not have download access, as Roland mentioned, your last option is to clone the project in git and build it yourself with

Re: Apply synonyms that include confidence weights

2014-03-03 Thread Ivan Brusic
You can always disable term frequencies on a field to eliminate the tf-idf issue, but then scoring would be affected, perhaps more detrimentally than the original problem. The standard solution in Lucene is to use payloads, which are metadata associated with a term in the index. The synonym

Re: Apply synonyms that include confidence weights

2014-03-03 Thread Ivan Brusic
? Or can this be achieved with scripts in ES? Thanks for your help, Jake On Monday, March 3, 2014 7:36:26 AM UTC-8, Ivan Brusic wrote: You can always disable term frequencies on a field to eliminate the tf-idf issue, but then scoring would be affected and perhaps be more detrimental than

Re: JDK 7 Issues Question

2014-03-03 Thread Ivan Brusic
I did some stress tests with different versions of the JDK using a test cluster running on VMs (which contains less data than our production cluster), and I was never able to uncover any issues. However, since I do not like getting called in the middle of the night to fix a

Re: [Book] Mastering ElasticSearch Review

2014-02-25 Thread Ivan Brusic
I purchased the book when Packt was having a $5 ebook sale a couple of months ago. Did not really need the book, but it was cheap and I wanted to support the author who has posted on the mailing list in the past. Overall a decent book, recommended for anyone getting started with Elasticsearch. My

Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
I have never tried or looked at the code, but off the top of my head perhaps the DFS query type would work: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch Since the DFS query type calculates the TF/IDF values based on the
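The DFS search type is selected per request with the `search_type` URL parameter. Here is a small sketch that builds such a URL for a search spanning several indices. The host and index names are placeholders.

```python
from urllib.parse import urlencode


def dfs_search_url(host, indices):
    """Build a _search URL that requests the DFS query-then-fetch search type."""
    # A comma-separated index list lets one request span several indices,
    # so the DFS phase gathers term statistics across all of them.
    path = "/" + ",".join(indices) + "/_search"
    return "http://" + host + path + "?" + urlencode({"search_type": "dfs_query_then_fetch"})


print(dfs_search_url("localhost:9200", ["index_a", "index_b"]))
```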

Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Ivan Brusic
I do not use quotes at all. Simply: node.name: ${HOSTNAME} -- Ivan On Tue, Feb 25, 2014 at 7:56 AM, InquiringMind brian.from...@gmail.com wrote: I always start Elasticsearch from within my own wrapper script, es.sh. Inside this wrapper script is the following incantation:

Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-25 Thread Ivan Brusic
Luke? :) On Tue, Feb 25, 2014 at 1:09 PM, Daniel Winterstein daniel.winterst...@gmail.com wrote: Dear Hariharan, Alex, Luke, My apologies. You're quite right. The information is there -- I just didn't read far enough down. Thank you for your help and persistence. Best regards, - Daniel

Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
Ivan, The DFS query then fetch worked very well! Thank you! Cheers, Luiz Guilherme On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic i...@brusic.com wrote: I have never tried or looked at the code, but off the top of my head perhaps the DFS query type would work: http://www.elasticsearch.org

Re: Efficiency of GET by doc id

2014-02-21 Thread Ivan Brusic
David has the ideal solution; however, I just wanted to point out one key difference: the get API looks for a document in the transaction log before it looks in the index. So if you execute a search query after an insert, but before a refresh/flush, you will not see the changes.

Re: How to specify execution order of filter and query?

2014-02-19 Thread Ivan Brusic
string-based queries by filtering to a subset of documents first! On Tuesday, February 18, 2014 3:19:50 PM UTC-6, Ivan Brusic wrote: The documentation suddenly made me doubt whether what I knew was wrong. :) The default strategy for Elasticsearch's filtered query is a custom random access one. For each

Re: can't order by _boost field, even when index:not_analyzed

2014-02-19 Thread Ivan Brusic
The boost field, like other special fields such as timestamp and id, is set at the root level of a type, not as a property: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html Perhaps I should have added this confusion as yet another reason to avoid
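This sketch shows where `_boost` belongs, expressed as a Python dict for the 0.90-era mapping JSON. The type `product`, the source field `my_boost` and the `title` property are hypothetical. The point is that `_boost` is a sibling of `properties`, not an entry inside it.

```python
import json

mapping = {
    "product": {
        # Root-level special field: the document boost is read from the
        # hypothetical source field "my_boost", defaulting to 1.0 if absent.
        "_boost": {"name": "my_boost", "null_value": 1.0},
        "properties": {
            "title": {"type": "string"},
        },
    }
}

print(json.dumps(mapping, indent=2))
```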

Re: How to specify execution order of filter and query?

2014-02-18 Thread Ivan Brusic
Hi Ivan (awesome name BTW), Read my recent reply about filters: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/xZGnyTI6lmo Basically the filter in a filtered query is executed before the query. IMHO that documentation is misleading. Use post filters (simply 'filter' before 1.0)

Re: How to specify execution order of filter and query?

2014-02-18 Thread Ivan Brusic
. The original description of the filtered query comes directly from Lucene: http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/FilteredQuery.html Cheers, Ivan On Tue, Feb 18, 2014 at 1:02 PM, Ivan Brusic i...@brusic.com wrote: Hi Ivan (awesome name BTW), Read my recent reply

Re: Charfilter support in analysis api

2014-02-17 Thread Ivan Brusic
Since your code is running in the same JVM as Elasticsearch, you have a few more options. Depending on your use case, you can bypass the analysis API and use the analyzers directly. Look at the test code in the analysis package, in particular the AnalysisModuleTests class. The key is building the

Re: Charfilter support in analysis api

2014-02-17 Thread Ivan Brusic
The change seemed simple, so I went ahead and implemented it: https://github.com/elasticsearch/elasticsearch/pull/5148 The tricky part is getting the Elasticsearch team to notice. :) -- Ivan On Mon, Feb 17, 2014 at 2:13 PM, Ivan Brusic i...@brusic.com wrote: Since your code is running

Re: Automatic Range Filter. Possible ?

2014-02-14 Thread Ivan Brusic
There is no automatic bucketing in Elasticsearch. I mimic the behavior with an expensive process that uses many smaller fixed ranges which are reduced into the number of buckets needed on the client side. Easily the slowest part of my query. My goal was to wait for the facet refactor (which has
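The client-side reduction could look like the following sketch. It assumes a range facet has already returned many small fixed ranges as `(from, to, count)` tuples. Neighboring ranges are then merged greedily into at most `k` buckets of roughly equal size. The greedy equal-count strategy is an assumption and is not necessarily the reduction used in the reply.

```python
def merge_ranges(fine, k):
    """Greedily merge adjacent (from, to, count) ranges into at most k buckets
    of roughly equal document count."""
    if k < 1:
        raise ValueError("k must be at least 1")
    total = sum(count for _, _, count in fine)
    target = total / k  # ideal number of documents per merged bucket
    merged = []
    start, acc = None, 0
    for lo, hi, count in fine:
        if start is None:
            start = lo
        acc += count
        # Close the bucket once it holds its share; the last bucket stays open
        # so every remaining fine range ends up somewhere.
        if acc >= target and len(merged) < k - 1:
            merged.append((start, hi, acc))
            start, acc = None, 0
    if start is not None:
        merged.append((start, fine[-1][1], acc))
    return merged


# Hypothetical counts for five fixed 10-unit price ranges.
fine_ranges = [(0, 10, 5), (10, 20, 5), (20, 30, 5), (30, 40, 5), (40, 50, 20)]
print(merge_ranges(fine_ranges, 3))
```

Skewed data can yield fewer than `k` buckets, as in this example, because one heavy range can exceed the per-bucket target on its own.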

Re: Automatic Range Filter. Possible ?

2014-02-14 Thread Ivan Brusic
Here is a previous discussion on Rice/Sturges: https://groups.google.com/forum/#!msg/elasticsearch/CAZhIHtB1UI/Exzd2_DanbAJ Never did sit down and finally understand the paper Jörg linked. :) I really should find the time to revisit the issue since my implementation is costly. Ivan On Fri, Feb
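Sturges' rule and the Rice rule are the textbook formulas for choosing the bucket count. Sturges gives k = ceil(log2 n) + 1 and Rice gives k = ceil(2 n^(1/3)), where n is the number of documents. Here is a direct transcription:

```python
import math


def sturges_bins(n):
    """Sturges' rule: k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def rice_bins(n):
    """Rice rule: k = ceil(2 * n^(1/3))."""
    return math.ceil(2 * n ** (1 / 3))


for n in (100, 1000, 100000):
    print(n, sturges_bins(n), rice_bins(n))
```

Rice grows faster than Sturges, so for large result sets it suggests noticeably more buckets.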

Re: NumberFormatException when sorting by numeric document ID

2014-02-13 Thread Ivan Brusic
I doubt this issue will ever be fixed since the limitation exists in Lucene. All types belong to the same index and a field's data needs to be uniform in Lucene's eyes. A document's type is used to indicate different mappings for a document, but not different ways to segment the data types in the

Re: Suggestion: DistanceUnit.NAUTICALMILES is a worthy addition

2014-02-12 Thread Ivan Brusic
I'm glad I was able to steer you in the right direction. I flubbed a PR recently since I have not used git consistently in the past few years, so I am glad someone else can learn from my mistakes. Your PR seemed to have gained some attention! :) Ivan On Tue, Feb 11, 2014 at 1:17 PM,

Re: setFilter in Java API

2014-02-12 Thread Ivan Brusic
The documentation has not been updated for version 1.0 [1]. The method is now called setPostFilter. Better yet, you should look into filtered queries [2]. [1] https://github.com/elasticsearch/elasticsearch/pull/4461 [2]

Re: setFilter in Java API

2014-02-12 Thread Ivan Brusic
(FilterBuilders.termFilter("multi", "test")) 3) setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.termFilter("multi", "test"))) I'm guessing 3? But the matchAllQuery() makes it feel like that is wrong... On Wed, Feb 12, 2014 at 9:25 PM, Ivan Brusic i...@brusic.com wrote
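The sketch below renders option 3 from the question and the 1.0 post filter from the earlier reply as their equivalent REST bodies. The field `multi` and value `test` come from the question. The rest of the shape follows the 1.0 query DSL.

```python
import json

term = {"term": {"multi": "test"}}

# Option 3: the filter restricts the document set inside the query itself;
# match_all gives every remaining document the same score.
filtered = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": term,
        }
    }
}

# 1.0 post_filter (setPostFilter in the Java API): applied to the hits after
# the query runs, so facets are still computed on the unfiltered results.
post_filtered = {
    "query": {"match_all": {}},
    "post_filter": term,
}

print(json.dumps(filtered))
print(json.dumps(post_filtered))
```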

Re: Data Loss

2014-02-12 Thread Ivan Brusic
On Wed, Feb 12, 2014 at 1:58 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: For my requirements, downtime of 15 min is acceptable. I can only wish! I run an ecommerce site, so my requirement is no downtime. Ever. -- Ivan

Re: Odd hot MVEL

2014-02-11 Thread Ivan Brusic
Great catch. Which Elasticsearch version and which JDK? Thankfully my documents are uniform, so I have been able to skip isEmpty checks. -- Ivan On Tue, Feb 11, 2014 at 7:52 AM, Nikolas Everett nik9...@gmail.com wrote: Sorry to resurrect a dead thread, but I figured it out:

Re: One particular value in a field isn't indexed

2014-02-11 Thread Ivan Brusic
Very rookie problem. :) The default (aka standard) analyzer uses a stopword filter, and that value is a stopword. Try configuring your field with a custom analyzer that uses no stopwords or a custom set of stopwords. Cheers, Ivan On Tue, Feb 11, 2014 at 7:57 AM, felix.kof...@gameforge.de wrote:
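Here are index settings for the suggested fix, as a sketch. The analyzer has type `standard`, with its stopword list emptied via `_none_`. The analyzer name `standard_no_stop`, the type `doc` and the field `value` are hypothetical.

```python
import json

settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Same tokenizer and filters as "standard", but with an empty
                # stopword list, so words such as "a" or "the" stay searchable.
                "standard_no_stop": {"type": "standard", "stopwords": "_none_"}
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "value": {"type": "string", "analyzer": "standard_no_stop"}
            }
        }
    },
}

print(json.dumps(settings, indent=2))
```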

Re: One particular value in a field isn't indexed

2014-02-11 Thread Ivan Brusic
are faster, can be cached and do not influence scoring. Cheers, Ivan On Tue, Feb 11, 2014 at 9:00 AM, Ivan Brusic i...@brusic.com wrote: Very rookie problem. :) The default (aka standard) analyzer uses a stopword filter, and that value is a stopword. Try configuring your field with a custom analyzer which

Re: scoring on a multi_field

2014-02-11 Thread Ivan Brusic
Try setting use_dis_max to false in your query. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_multi_field_2 Cheers, Ivan On Tue, Feb 11, 2014 at 1:15 AM, Alexander Ott alexander.ott...@gmail.com wrote: Hello, I have the same
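This sketch shows the suggested change in a query_string body. The field names and query text are placeholders. With `use_dis_max` disabled, the per-field clauses are combined with a bool query instead of a dis_max query.

```python
import json

query = {
    "query": {
        "query_string": {
            "fields": ["title", "title.untouched"],  # hypothetical multi_field parts
            "query": "elasticsearch",
            # false: per-field clauses go into a bool query, so matches in
            # several fields add up, instead of the dis_max default where the
            # best-scoring field dominates.
            "use_dis_max": False,
        }
    }
}

print(json.dumps(query))
```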

Re: Suggestion: DistanceUnit.NAUTICALMILES is a worthy addition

2014-02-11 Thread Ivan Brusic
Create a branch for your changes. Submit a PR from the branch and not master. Make sure to update DistanceUnitTests.java as well. The trickiest part is getting the Elasticsearch team to notice your PR. :) They must be super busy with the 1.0 release. Lots of tutorials online:

Re: get MapperParsingException failed to parse in 0.90.10

2014-02-11 Thread Ivan Brusic
What is your current mapping? Use the GetMapping API. The file field is an inner object, but you do not have one defined in your mapping. Very likely you already have indexed a document with the file field as another type. -- Ivan On Tue, Feb 11, 2014 at 7:12 AM, Stefan Sabolowitsch

Re: get MapperParsingException failed to parse in 0.90.10

2014-02-11 Thread Ivan Brusic
That is your template. Use the Get Mapping API to find out what actually is in effect. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html On Tue, Feb 11, 2014 at 12:17 PM, Stefan Sabolowitsch sabolowitsc...@in-trier.de wrote: Hi Ivan, thanks for

Re: Computing idf in elasticsearch

2014-02-11 Thread Ivan Brusic
Oops, obvious answer. :) I see questions about incorrect TFIDF scores and my mind automatically goes to DFS scoring (which is actually about TF, not IDF). -- Ivan On Tue, Feb 11, 2014 at 10:22 AM, Binh Ly b...@hibalo.com wrote: Also be aware that the log should be a natural log, i.e. the
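Since the question is about idf, here is a direct transcription of the classic Lucene `DefaultSimilarity` idf formula, idf = 1 + ln(numDocs / (docFreq + 1)). `math.log` is the natural log, matching the reminder in the quoted reply. Lucene computes this value in single precision, so explain output may differ in the last digits.

```python
import math


def lucene_idf(num_docs, doc_freq):
    """Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1)), natural log."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# A term found in 9 of 100 documents, and a term found in nearly all of them.
print(lucene_idf(100, 9), lucene_idf(100, 99))
```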

Re: Recommended jre version/update for Elasticsearch 1.0.0.Beta2

2014-02-10 Thread Ivan Brusic
My understanding was that the scope of the JDK/Guava bug was limited and did not affect Lucene/Solr. Elasticsearch uses Guava for collections and caching, not sure about reflection. -- Ivan joergpra...@gmail.com wrote: Do not use Java 7u51, Lucene bug still not fixed

Re: Recommended jre version/update for Elasticsearch 1.0.0.Beta2

2014-02-10 Thread Ivan Brusic
is lucky... Jörg On Mon, Feb 10, 2014 at 4:00 PM, Ivan Brusic i...@brusic.com wrote: My understanding was that the scope of the JDK/Guava bug was limited and did not affect Lucene/Solr. Elasticsearch uses Guava for collections and caching, not sure about reflection.

Re: Suggestion: DistanceUnit.NAUTICALMILES is a worthy addition

2014-02-10 Thread Ivan Brusic
Aircraft use nautical miles? You learn something new every day! -- Ivan On Mon, Feb 10, 2014 at 3:21 PM, InquiringMind brian.from...@gmail.com wrote: Would it be reasonable to create an issue to request nautical miles (nm as the abbreviation) for the DistanceUnit enumeration? This would

Re: Recommended jre version/update for Elasticsearch 1.0.0.Beta2

2014-02-09 Thread Ivan Brusic
The issues that plague 7u40 and 7u45 are related to Lucene and not directly to Elasticsearch. Lucene is continually being built with Java 7u51 and 7u60 and I do not think any major issues have been found yet. There are several Lucene committers on this list, so maybe they will chime in (I

Re: how to find precision

2014-02-05 Thread Ivan Brusic
Relevancy is very context dependent. Only you have the knowledge to determine if a result is relevant for the query or not. Lucene/elasticsearch's default TFIDF algorithm should theoretically give you the relevant results. Do the results change if you tweak your query slightly? Recall is even

Re: Fault Tolerance Fallacy

2014-02-05 Thread Ivan Brusic
implement RAFT or something similar, I was there and heard that. Otis On Wed, Feb 5, 2014 at 2:58 PM, Ivan Brusic i...@brusic.com wrote: Otis, you can listen to the comment here

Re: how to calculate relevancy by the help of precision and recall

2014-02-04 Thread Ivan Brusic
Interesting topic. Not elasticsearch specific, but nevertheless interesting. One method to calculate relevancy given the precision and recall of a query is by using the F1 score: http://en.wikipedia.org/wiki/F1_score F1 would be equal to 2 * (P * R) / (P + R), where P is the precision and R is
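The sketch below implements the F1 formula above, together with the precision and recall computation it depends on. It assumes the hard part, judging which documents are relevant, has already been done by hand. The document ids are made up.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a result list against a hand-judged relevant set."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ids: the query returned 1-4, a human judged 2, 4 and 5 relevant.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p, r, f1_score(p, r))
```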

Re: Excluding multiple documents from query

2014-02-03 Thread Ivan Brusic
Try using an ids filter combined with a not filter instead: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-ids-filter.html Filters are faster, cacheable (if desired) and will not influence scoring. If you do not want the excluded ids to be added to the facet
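Here is the suggested filter inside a filtered query, as a sketch using 0.90/1.0 syntax. The excluded ids, the `match` query and the `title` field are hypothetical.

```python
import json

excluded_ids = ["3", "7", "12"]  # hypothetical document ids to leave out

body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "elasticsearch"}},
            # The not filter wraps an ids filter, so any document whose _id is
            # listed is dropped without touching the scores of the others.
            "filter": {"not": {"filter": {"ids": {"values": excluded_ids}}}},
        }
    }
}

print(json.dumps(body, indent=2))
```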

Re: Elasticsearch gives failed to send join request to master on startup

2014-02-03 Thread Ivan Brusic
So the versions of 192.168.100.210 and 192.168.100.90 are different? Although elasticsearch is supposed to support clusters whose nodes differ by minor version, this no longer holds with later versions of 0.90. -- Ivan On Mon, Feb 3, 2014 at 6:02 AM, Sambhav Sharma

Re: Custom Score Query not working in elasticsearch-0.90.3

2014-01-31 Thread Ivan Brusic
? On Fri, Jan 31, 2014 at 5:53 AM, Ivan Brusic i...@brusic.com wrote: BTW, if you do not want to upgrade, you can also simply maintain two fields. One to use as a document boost (the _boost) and another to use for scripting/sorting/faceting. -- Ivan On Thu, Jan 30, 2014 at 4:00 PM, Ivan

Re: What will be the equivalent query of the following java api ?

2014-01-31 Thread Ivan Brusic
Not sure what behavior you are observing since you did not add any results. One thing however is you are essentially boosting everything twice. Once with a document boost and then again with the custom score. Remove the document boost. -- Ivan On Fri, Jan 31, 2014 at 12:49 PM, Mukul Gupta

Re: Stopping and Staring a big cluster : best practice?

2014-01-31 Thread Ivan Brusic
I would add: flush the transaction log after you have indexed all your content. -- Ivan On Fri, Jan 31, 2014 at 4:57 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Shutdown: curl -XPOST node:9200/_shutdown In the latest versions (1.0.0.RC1) ES shutdown chooses a strategy in

Re: Are query scores ordinal for different documents across queries?

2014-01-30 Thread Ivan Brusic
Query scores are not ordinal. Lucene discourages comparing scores from different queries since the search context is important in scoring. Query norms are supposed to help normalize scores across queries, but as you discovered, that is not normally what happens. Your theory will probably hold true

Re: Custom Score Query not working in elasticsearch-0.90.3

2014-01-30 Thread Ivan Brusic
I fixed this bug a while ago and it should be fixed in versions 0.90.6 and higher. https://github.com/elasticsearch/elasticsearch/issues/3752 https://github.com/elasticsearch/elasticsearch/pull/2913 Ivan On Thu, Jan 30, 2014 at 11:51 AM, Binh Ly b...@hibalo.com wrote: Coder, I've verified

Re: Custom Score Query not working in elasticsearch-0.90.3

2014-01-30 Thread Ivan Brusic
BTW, if you do not want to upgrade, you can also simply maintain two fields. One to use as a document boost (the _boost) and another to use for scripting/sorting/faceting. -- Ivan On Thu, Jan 30, 2014 at 4:00 PM, Ivan Brusic i...@brusic.com wrote: I fixed this bug a while ago and it should

Re: ProcessClusterEventTimeoutException in Elasticsearch. Is this timeout value configurable? If yes how?

2014-01-29 Thread Ivan Brusic
It appears that this value is not configurable. IMHO, it should be, so perhaps you should open an issue on Github and see if the elasticsearch team agrees. Better yet, submit a pull request. :) -- Ivan On Tue, Jan 28, 2014 at 1:08 PM, Ahaduzzaman Munna ahaduzzaman.mu...@gmail.com wrote:

Re: Marvel behind Nginx and https

2014-01-28 Thread Ivan Brusic
Marvel was just announced today (to me at least), and there already is a question/issue? :) Let us know how it is working out. I am assuming that the elasticsearch team has Marvel as a private repo, which means you can't post issues. -- Ivan On Tue, Jan 28, 2014 at 7:45 AM, J. Schulz

Re: Modifying scoring algorithm during search operations

2014-01-28 Thread Ivan Brusic
) .endObject() .startObject("totalexp") .field("store", "yes") .endObject() .endObject() .endObject() .endObject() .endObject(); --- Hiro On Monday, 27 January 2014 23:50:41 UTC+5:30, Ivan Brusic wrote

Re: Add a custom lucene search filter to elasticsearch

2014-01-28 Thread Ivan Brusic
Correct, create a plugin. Take a look at some of the existing analysis plugins as a template: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html#analysis-plugins -- Ivan On Tue, Jan 28, 2014 at 8:48 AM, Guilhem Legal guilhem.le...@geomatys.com wrote:

Re: Complex or impossible query

2014-01-28 Thread Ivan Brusic
What you are probably looking for is field collapsing, which is not yet supported in elasticsearch (it is planned). You can use a term facet to retrieve the count for all terms and then do a separate query for each unique term. In addition to the slowness of having to do multiple queries, you

Re: Add a custom lucene search filter to elasticsearch

2014-01-28 Thread Ivan Brusic
Jörg, what is the purpose of tweaking the IndexQueryParserService? I have a few custom filters, and all I do is add them via tokenFiltersBindings.processTokenFilter() or analysisModule.addTokenFilter(). -- Ivan On Tue, Jan 28, 2014 at 9:40 AM, joergpra...@gmail.com joergpra...@gmail.com

For those wanting to try Marvel

2014-01-28 Thread Ivan Brusic
It is only supported with Elasticsearch 0.90.8 and higher. Yet another push for me to upgrade!

Re: For those wanting to try Marvel

2014-01-28 Thread Ivan Brusic
On 29 January 2014 09:37, Ivan Brusic i...@brusic.com wrote: It is only supported with Elasticsearch 0.90.8 and higher. Yet another push for me to upgrade!

Re: For those wanting to try Marvel

2014-01-28 Thread Ivan Brusic
On 29 January 2014 09:51, Ivan Brusic i...@brusic.com wrote: From my log: [2014-01-28 14:31:01,915][WARN ][marvel.agent ] Elasticsearch version [0.90.2] is too old. Marvel is disabled (requires version 0.90.8 or higher

Re: For those wanting to try Marvel

2014-01-28 Thread Ivan Brusic
On 29 January 2014 09:51, Ivan Brusic i...@brusic.com wrote: From my log: [2014-01-28 14:31:01,915][WARN ][marvel.agent ] Elasticsearch version [0.90.2] is too old. Marvel is disabled

Re: For those wanting to try Marvel

2014-01-28 Thread Ivan Brusic
Marvel doesn't differ much from elasticsearch plugins in that the code now runs in the same JVM instead of a separate process. The event data is pushed rather than pulled. It is great not having to re-invent the wheel, but having monitoring outside of elasticsearch is not an issue. Great

Re: Modifying scoring algorithm during search operations

2014-01-27 Thread Ivan Brusic
For the third rule, you can omit index norms for a field which will prevent length normalization. See [1]. The option is either called omit_norms or norms.enabled depending on your version. For the second rule, it is slightly more complicated. You can define your own custom similarity [2] that
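This sketch shows the two spellings for the third rule, plus the length-normalization factor they switch off. The field name `body` is hypothetical. The `length_norm` helper mirrors the 1/sqrt(numTerms) factor of Lucene's classic similarity. It ignores field boosts and the lossy one-byte encoding Lucene applies to norms.

```python
import json
import math

# 0.90.x: disable norms on a field with omit_norms.
legacy_mapping = {"properties": {"body": {"type": "string", "omit_norms": True}}}

# 1.0: the same setting is spelled norms.enabled.
current_mapping = {"properties": {"body": {"type": "string", "norms": {"enabled": False}}}}


def length_norm(num_terms):
    """Length factor of Lucene's classic similarity: shorter fields score higher.
    Omitting norms removes this factor from scoring."""
    return 1.0 / math.sqrt(num_terms)


print(json.dumps(current_mapping))
print(length_norm(4), length_norm(100))
```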

Re: Any possibility of permalinks to documentation for each version?

2014-01-27 Thread Ivan Brusic
Here are the current branches: http://www.elasticsearch.org/guide/en/elasticsearch/reference/index.html On Mon, Jan 27, 2014 at 11:53 AM, Ivan Brusic i...@brusic.com wrote: The documentation is now versioned. For example (random page) 0.90: http://www.elasticsearch.org/guide/en

Re: Fault Tolerance Fallacy

2014-01-26 Thread Ivan Brusic
://speakerdeck.com/benbjohnson/raft-the-understandable-distributed-consensus-protocol Jörg On Sat, Jan 25, 2014 at 7:26 PM, Ivan Brusic i...@brusic.com wrote: Unfortunately the problem will still not be addressed in the upcoming 1.0 release. Judging by recent comments from the elasticsearch team

Re: Fault Tolerance Fallacy

2014-01-25 Thread Ivan Brusic
Unfortunately the problem will still not be addressed in the upcoming 1.0 release. Judging by recent comments from the elasticsearch team, they are truly looking into the matter. The change will require changing the underlying consensus algorithm to something like Paxos/Raft. That said, the

Re: Different boosts to different fields

2014-01-24 Thread Ivan Brusic
Are you omitting norms by any chance on your fields? -- Ivan On Thu, Jan 23, 2014 at 12:32 PM, georgi.mat...@jobvector.com wrote: What I want to do is do a fuzzy (like this) query across multiple fields where matches in certain fields should result in a higher score than matches in other

Re: Autogeneration of the schema from the data. Plus I know that one field is a parent/child relationship. How to deal with that?

2014-01-24 Thread Ivan Brusic
Can you establish a naming convention for the parent fields? You can probably use index templates: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html -- Ivan On Thu, Jan 23, 2014 at 1:55 PM, Olivier Rossel olivier.ros...@gmail.com wrote: I mostly use

Re: Question on elasticsearch-zookeeper resiliency

2014-01-24 Thread Ivan Brusic
You can run zookeeper as its own cluster and not just a singleton. At this point, you will now have an elasticsearch cluster and a zookeeper cluster! Are you experiencing split brain issues? You might be better off determining how to avoid situations that lead to split brain. Cheers, Ivan On

Re: Too Many Open Files

2014-01-22 Thread Ivan Brusic
The first thing to do is check if your limits are actually being persisted and used. The elasticsearch site has a good writeup: http://www.elasticsearch.org/tutorials/too-many-open-files/ Second, it might be possible that you are reaching the 128k limit. How many shards per node do you have? Do

Re: How to configure elasticsearch to sort the scored documents on a field after score for documents is calculated ?

2014-01-22 Thread Ivan Brusic
for this ? Also, Can I do this sort in my mapping itself ? Thanks On Wed, Jan 22, 2014 at 3:22 AM, Ivan Brusic i...@brusic.com wrote: You can sort on multiple fields for tie-breakers: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html If you are first

Re: Difference about the nested type and object type?

2014-01-22 Thread Ivan Brusic
The elasticsearch site has a short writeup about the benefits of the nested type: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html Basically, with object types, multiple instances of a field are flattened as an unordered array. In your example, if you
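Here is a toy model of the flattening described above. It is a plain-Python sketch, not anything Elasticsearch runs. The `user` objects and names are made up. The simulated query is first = alice AND last = smith.

```python
def flatten_objects(field, objects):
    """Mimic how an object array is indexed: one value list per inner field,
    with no record of which values came from the same object."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


users = [
    {"first": "john", "last": "smith"},
    {"first": "alice", "last": "white"},
]

flat = flatten_objects("user", users)

# Object mapping: each condition is checked against the flattened lists,
# so "alice smith" matches even though no such user exists.
object_match = "alice" in flat["user.first"] and "smith" in flat["user.last"]

# Nested mapping: each object is its own hidden document, so both
# conditions must hold within a single object.
nested_match = any(u["first"] == "alice" and u["last"] == "smith" for u in users)

print(flat, object_match, nested_match)
```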

Re: start, end and gap in Price using Aggregations

2014-01-21 Thread Ivan Brusic
We use a slightly inefficient technique that uses numerous buckets with static ranges based on previously monitored heuristics and then combines the buckets on the client side. The ranges change as you apply facets and drill further down into the documents. Cheers, Ivan On

Re: retrieve localhost:9200/_aliases using the java api

2014-01-21 Thread Ivan Brusic
The easiest way to find out how to use the Java API equivalent of a REST call is to simply look up the RestAction class. In this case:

Re: How to configure elasticsearch to sort the scored documents on a field after score for documents is calculated ?

2014-01-21 Thread Ivan Brusic
You can sort on multiple fields for tie-breakers: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html If you are first sorting on a field, and not the score, you will need to enable track_scores:
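This sketch shows both cases as request bodies. The `title` query and `price` field are hypothetical. Sorting on `_score` first uses the field only as a tie-breaker. Sorting on the field first needs `track_scores` if you still want scores in the hits.

```python
import json

query = {"match": {"title": "elasticsearch"}}

# Relevance first; the hypothetical "price" field only breaks ties.
score_then_field = {
    "query": query,
    "sort": ["_score", {"price": {"order": "asc"}}],
}

# Field first; track_scores asks Elasticsearch to still compute _score
# for each hit even though it is not the primary sort key.
field_first = {
    "query": query,
    "sort": [{"price": {"order": "asc"}}],
    "track_scores": True,
}

print(json.dumps(score_then_field))
print(json.dumps(field_first))
```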

Re: How to use ElasticSearch Custom Similarity provider classes ?

2014-01-16 Thread Ivan Brusic
I am assuming that you packaged the two Java files into a jar file and deployed it to the $ES_PATH lib directory. Elasticsearch will pick up any jar files in that directory. Similarities are either set to be the default for all the indices via the index.similarity.default.type configuration

Re: RC1 - Create Index Exception

2014-01-16 Thread Ivan Brusic
Whichever plugin you are using for the ComboAnalyzer is using the wrong class name. Caused by: java.lang.NoClassDefFoundError: org/elasticsearch/ElasticSearchIllegalArgumentException at org.elasticsearch.index.analysis.ComboAnalyzerProvider.get(ComboAnalyzerProvider.java:50) This class

Re: Question on breaking change (boost) in 1.0.0.RC1 release

2014-01-16 Thread Ivan Brusic
Judging by the commits, the functionality was only deprecated, not removed. https://github.com/elasticsearch/elasticsearch/issues/4664 I believe there are many use cases where it makes sense to boost a document at index time. The process only occurs once instead of every time during queries.

Re: Newbie question about analyzed vs not analyzed

2014-01-16 Thread Ivan Brusic
Correction: meant to say use *term* queries on non-analyzed fields, not *text* queries. On Thu, Jan 16, 2014 at 6:22 PM, Ivan Brusic i...@brusic.com wrote: Correct. A term query does not analyze the terms, while a match query does. Generally, you should use text queries on non-analyzed fields

Re: How can we use elasticsearch custom similarity plugin in mapping ?

2014-01-15 Thread Ivan Brusic
I was focused on how to use a custom similarity and not how to actually tackle the problem. In this respect, I agree with Jörg. Also, I believe today's release (1.0.0.RC1) has additional scripting options:

Re: Terms facet on single field but also return associated id

2014-01-14 Thread Ivan Brusic
Just to clarify, when you say there's 2 fields in the row, do you really mean that the field has two values (an array)? If so, the facet results should look like: alexander 1 great 1 viking 1 vinny 1 There is no easy way to achieve what you want. One way would be to create combined keys during

Re: exists filter

2014-01-14 Thread Ivan Brusic
You would need to combine your two filters with a bool filter: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html The constant score query should be at the top level and the exists/term filters should be combined with a bool filter underneath it. The
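This request-body sketch shows the structure described above: `constant_score` on the outside, with a `bool` filter underneath that combines the `exists` and `term` filters. The field names `email` and `status` and the value `active` are hypothetical.

```python
import json

body = {
    "query": {
        "constant_score": {
            "filter": {
                # Both conditions must hold; every match gets the same score.
                "bool": {
                    "must": [
                        {"exists": {"field": "email"}},
                        {"term": {"status": "active"}},
                    ]
                }
            }
        }
    }
}

print(json.dumps(body, indent=2))
```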

Re: Cluster state yellow

2014-01-14 Thread Ivan Brusic
Don't forget gateway.expected_nodes :) Wouldn't the value increase as you add more nodes? It will, which is precisely why the value is not computed automatically. The value can decrease/increase over time, but the cluster does not know if this is because it is on purpose or because of failures.

Re: Warmer API

2014-01-12 Thread Ivan Brusic
Warmer queries will affect indexing time since each new segment needs to be warmed up. If you are going to do a large bulk load of data, it would be advisable to disable warming. With elasticsearch 0.90.6, it should not be necessary to register different queries for the warmer API:

Building the documentation

2014-01-11 Thread Ivan Brusic
I have been accumulating a few documentation inconsistencies over the past several months and I wanted to finally fix them and contribute them back. Last time I made any changes was in the old markdown repository. I am referring to the documentation under the main project (

Re: Querying unique pageviews in log data

2014-01-11 Thread Ivan Brusic
Sounds like what you are looking for is field collapsing which is not yet supported in elasticsearch. ETA is post 1.0 release. Perhaps there is a way with the new aggregations framework, but I have yet to try it out. Cheers, Ivan On Fri, Jan 10, 2014 at 3:13 PM, Matthew Boynes

Re: Building the documentation

2014-01-11 Thread Ivan Brusic
It seems I have misunderstood the intent of the new documentation repo and it is indeed what I am looking for. I assumed it was the WIP elasticsearch book. Asciidoc is fairly simple, not sure why this level of complexity is needed. Cheers, Ivan On Sat, Jan 11, 2014 at 12:01 PM, Ivan Brusic i

Re: My query returns no facets? Any idea?

2014-01-10 Thread Ivan Brusic
A few things. - I am surprised the response did not return an error. Facets work on the document set returned by the query, so it is incorrect to add a query section to the facet. Try it again without the query. - Did you really use a JSON with a body section? - The fields parameter does not

Re: Upgrades causing Elastic Search downtime

2014-01-09 Thread Ivan Brusic
cluster.routing.allocation.disable_allocation to reduce the need of waiting for things to rebalance. Regards, Mark Walkom On 8 January 2014 04:41, Ivan Brusic iv...@brusic.com wrote: Almost

Re: Upgrades causing Elastic Search downtime

2014-01-09 Thread Ivan Brusic
That is definitely not the behavior I have ever seen with elasticsearch. If you restart a node with allocation disabled, the restarted node will have no shards and the shards that it should contain are marked as unassigned. I have never seen a node reinitialize the shards it has. Cheers, Ivan

Re: How to query custom rest handler in elastic search using Java api

2014-01-08 Thread Ivan Brusic
The CustomRestAction code you posted contains *exactly* the Java code you need to execute the same action as the REST action. If you still want to use the REST URL, you cannot use the elasticsearch libraries. /_mastering/nodes is not a valid search type. The action does not even execute a

Re: incrementally scaling ES from the small data

2014-01-08 Thread Ivan Brusic
BTW, I was very wrong when I mentioned that elasticsearch uses consistent hashing. It uses modulo-based hashing, which is why the number of shards cannot change since the modulo is fixed. Working on too many things at once while replying. :) On Wed, Jan 8, 2014 at 1:10 PM, InquiringMind
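To see why modulo-based routing fixes the shard count, here is a small illustration of `shard = hash(routing) % number_of_shards`, where the routing value defaults to the document `_id`. `zlib.crc32` is only a stand-in; elasticsearch uses its own hash function. Changing the divisor sends most existing documents to a different shard, so they could no longer be found where they were indexed.

```python
# Illustration of modulo-based shard routing. zlib.crc32 stands in for
# elasticsearch's own hash function; only the "% num_shards" step matters.
import zlib
from typing import List


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the routing key (here: the doc id)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def moved_fraction(ids: List[str], old: int, new: int) -> float:
    """Fraction of documents whose shard changes when the count changes."""
    moved = sum(1 for i in ids if shard_for(i, old) != shard_for(i, new))
    return moved / len(ids)


ids = [f"doc-{n}" for n in range(10_000)]
print(moved_fraction(ids, 5, 6))  # most documents would have to move
```

With a roughly uniform hash, going from 5 to 6 shards keeps a document in place only when the hash modulo 30 falls in 0..4, so about five in six documents move.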

Re: Hipchat Elasticsearch

2014-01-07 Thread Ivan Brusic
Here are some related links, including a video of a talk: http://www.meetup.com/Elasticsearch-San-Francisco/events/141698772/ -- Ivan On Tue, Jan 7, 2014 at 1:43 AM, Ümit Seren uemit.se...@gmail.com wrote: Interesting read about elasticsearch in HipChat

Re: score based on term frequency only

2014-01-07 Thread Ivan Brusic
://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html Cheers, Britta On Mon, Jan 6, 2014 at 2:13 AM, Ivan Brusic i...@brusic.com wrote: You could provide your own Similarity class as a plugin. Don't have any sample code in front of me, but it would

Re: incrementally scaling ES from the small data

2014-01-07 Thread Ivan Brusic
Elasticsearch uses consistent hashing, so you cannot change the number of shards for an index. If you can reindex data, then you can create a new index with a different number of shards and simply reindex. If your data is temporal in nature, you can create a new index per day/week/month and these
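For temporal data, the per-period approach mentioned here usually comes down to a naming scheme that maps each timestamp to an index. The prefix `logs` and the date formats below are hypothetical choices, not anything elasticsearch requires. Each new index can be created with whatever shard count suits the current data volume, and searches can span several indices at once.

```python
# Hypothetical naming scheme for time-based indices: one index per
# day, ISO week, or month. The "logs" prefix and formats are assumptions.
from datetime import date


def index_for(day: date, period: str = "day", prefix: str = "logs") -> str:
    """Return the name of the index that should hold data for `day`."""
    if period == "day":
        return f"{prefix}-{day:%Y.%m.%d}"
    if period == "month":
        return f"{prefix}-{day:%Y.%m}"
    if period == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{prefix}-{iso_year}.w{iso_week:02d}"
    raise ValueError(f"unknown period: {period}")


print(index_for(date(2014, 1, 7), "week"))  # logs-2014.w02
```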

Re: incrementally scaling ES from the small data

2014-01-07 Thread Ivan Brusic
An increase of shards will not cause an increase in sockets used. Each node shard action is responsible for gathering the responses from each shard at the file level before sending the response back to the client. Since each shard is actually its own Lucene index, an increase of shards will increase
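A simplified model of the gather step described here: each shard returns its own top-k `(score, doc_id)` pairs, and the coordinating side merges them into a global top-k. This is only the merge idea, not elasticsearch's actual multi-phase search. It does show that the merge handles `shards * k` candidates per request, so per-request work grows with the shard count.

```python
# Simplified scatter/gather merge: every shard contributes its local top-k
# (score, doc_id) pairs and the coordinator keeps the global top-k.
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]


def merge_top_k(shard_results: List[List[Hit]], k: int) -> List[Hit]:
    """Merge per-shard hit lists into the k best hits, highest score first."""
    all_hits = (hit for shard in shard_results for hit in shard)
    return heapq.nlargest(k, all_hits, key=lambda h: h[0])


shards = [
    [(3.0, "a"), (1.0, "b")],
    [(2.5, "c"), (0.5, "d")],
    [(4.0, "e")],
]
print(merge_top_k(shards, 3))  # [(4.0, 'e'), (3.0, 'a'), (2.5, 'c')]
```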

Re: How to index an existing json file

2014-01-07 Thread Ivan Brusic
The JSON file is used by the curl command, so in your example it should be in the same directory in which you executed the command (current directory). -- Ivan On Tue, Jan 7, 2014 at 6:00 PM, ZenMaster80 sabdall...@gmail.com wrote: Hi, I am just starting with ElasticSearch, I would like to

Re: specs for non-data, master-eligible nodes?

2014-01-03 Thread Ivan Brusic
I have never used non-data nodes, but in general they should primarily be CPU-bound since it is their responsibility to gather the various shard responses from different nodes and return a unified response to the client. It all comes down to the number of concurrent requests and the size of your

Re: elastic and language stem (polish)

2014-01-02 Thread Ivan Brusic
Analyzers are associated with fields, so in your mapping you can specify which analyzer to use. When you query a field, elasticsearch will know which analyzer to use (although it can be overridden). For example: "title": {"type": "string", "analyzer": "polish"} If you are using the plugin, there is no
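Here is a sketch of a full put-mapping body that attaches an analyzer to a single field, as this reply describes. The type name `article` is hypothetical. The `polish` analyzer is assumed to come from the Polish (Stempel) analysis plugin mentioned in the thread, so it has to be installed for the mapping to be accepted.

```python
# Sketch of a pre-1.0 put-mapping body that sets a per-field analyzer.
# "article" is a hypothetical type name; the "polish" analyzer is assumed
# to be provided by the Polish (Stempel) analysis plugin.
import json


def mapping_with_analyzer(doc_type: str, field: str, analyzer: str) -> dict:
    """Map `field` as an analyzed string that uses `analyzer`."""
    return {
        doc_type: {
            "properties": {
                field: {"type": "string", "analyzer": analyzer},
            }
        }
    }


body = mapping_with_analyzer("article", "title", "polish")
print(json.dumps(body))
```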

Re: Getting specific Fields

2014-01-02 Thread Ivan Brusic
Not yet supported: https://github.com/elasticsearch/elasticsearch/issues/3022 Cheers, Ivan On Thu, Jan 2, 2014 at 4:27 AM, paul avinashpau...@gmail.com wrote: My DATA --- { "rankingList":[ { "value":9, "key":"Academic" }, {

Re: Need help retrieving field from ES

2014-01-02 Thread Ivan Brusic
Judging by the one sample document, the keepalive field is not there. You can use the missing filter to see if any documents do have that field. For example: curl -XPOST localhost:9200/2014010119/_count/ -d ' { "filtered": { "query": { "match_all": {} }, "filter": {
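As a self-contained sketch of the check suggested here, this builds a count body with a `missing` filter in the pre-1.0 `_count` format, where the query sits at the top level of the body. The field name is taken from the thread. Swapping `missing` for the `exists` filter counts the documents that do have the field.

```python
# Sketch of a pre-1.0 _count body: a filtered match_all whose filter
# matches documents that lack the given field ("missing" filter).
import json


def missing_field_count_body(field: str) -> str:
    """Return a JSON body counting documents without `field`."""
    return json.dumps({
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"missing": {"field": field}},
        }
    })


body = missing_field_count_body("keepalive")
print(body)
```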

Re: Need help retrieving field from ES

2014-01-02 Thread Ivan Brusic
is loaded with data? Excuse my stupid questions, but I thought the field would be created if an index was created that had a mapping for keepalive. Thanks Nick On Thursday, 2 January 2014 20:45:46 UTC, Ivan Brusic wrote: Judging by the one sample document, the keepalive field is not there. You

Re: facets on nested objects, plus facet_filter

2014-01-02 Thread Ivan Brusic
AFAIK, you cannot filter on parent fields when faceting on nested documents. Cheers, Ivan On Thu, Jan 2, 2014 at 2:46 PM, Nathan Moon nathannos...@gmail.com wrote: Hi, I am using nested objects for indexing “ratings” on an object, where a rating contains two properties: the owner and the
