Write a plugin to query and aggregate results from multiple shards

2014-09-14 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, I am looking through the sources, and I am not sure whether this is possible. What I am looking for is a way to manipulate the SearchRequest object when it reaches the SearchShards level, since I need to update the object with some value that is shard-specific. For this, I was che

Re: Question on top hits aggregation

2014-09-14 Thread Martijn v Groningen
Hi, In that case you would need to wrap a terms agg in a nested aggregation (with the desired path) and a top_hits inside that terms aggregation. Unfortunately the top_hits aggregation doesn't yet work inside a nested aggregation. However work is being done that will add this support: https://gith

Re: COUCHBASE + ELASTIC Parent/child mapping

2014-09-14 Thread Itamar Syn-Hershko
Seems like it will in the next version (looking at the couchbase elastic transport plugin commits) -- Itamar Syn-Hershko http://code972.com | @synhershko Freelance Developer & Consultant Author of RavenDB in Action On Tue, Sep 9,

Search query performance issue when index size is growing

2014-09-14 Thread gopi k
Hi All, I'm experiencing search performance degradation as the index size grows. E.g., for a given query, I'm getting *3500* RPS (requests per second) with *1M* documents in the index, whereas with *6M* documents in the index I'm getting *1200* RPS. The average document size in the index is 500 bytes

An exception was thrown when stop service

2014-09-14 Thread 帮某某
Hello, I get an exception when stopping the service. This is the log detail: [2014-09-15 10:05:02,910][WARN ][netty.channel.DefaultChannelPipeline] An exception was thrown by an exception handler. java.util.concurrent.RejectedExecutionException: Worker has already been shutdown at org.elasticsearch.common.n

Re: Purge the deleted documents on disk

2014-09-14 Thread Wei
Hi Vineeth, Thanks a lot for your response. However I've tried setting max_num_segments to 1 and only_expunge_deletes to true, using the command curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true&max_num_segments=1' This only cleared some of the deleted documents. e.g,

Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-09-14 Thread Mark Walkom
You probably want to put this in your own thread :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 15 September 2014 06:55, SAURAV PAUL wrote: > Hi, > > I am trying to use Spark and ElasticSearch. > > Currently, th

Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-09-14 Thread SAURAV PAUL
Hi, I am trying to use Spark and ElasticSearch. Currently, the RDD contains pipe-delimited records. parsedRDD.saveAsNewAPIHadoopFile(outputLocation, NullWritable.class, Text.class, CustomTextOutputFormat.class, job.getConfiguration()); Right now I am storing the output in HDFS. In

Question on top hits aggregation

2014-09-14 Thread AgentV
Hello, I need to use the top hits aggregation for a grouping feature. The aggregation should be applied on nested fields. E.g., my nested field is F with path P. Would

Re: is it possible to reference query/filer data in your aggregations

2014-09-14 Thread Benoit Gagnon
I suppose you can make a bucket for each value of _id that matches your query; you should then have one bucket per document, on which you can define sub-aggregations (only metrics at this granularity) -- but really at that point, script_fields would be equivalent. It sounds like you want to get

Re: Preparing for ElasticSearch in production

2014-09-14 Thread Benoit Gagnon
If you want the ability to do maintenance on your cluster without downtime, it will require at least two nodes. Even if you don't care about replication, and you don't expect your servers to fail, you could consider software/hardware upgrades as a form of failure tolerance. If spawning an occasi

Re: Pagination on unique data

2014-09-14 Thread Benoit Gagnon
There is no support for pagination for terms aggregations. The official reason seems to be that it is "tricky to implement"; see issue #4915 which is now unfortunately closed. So getting paginated terms ordered by count does not seem p

Re: Elasticsearch bad indexing timing

2014-09-14 Thread Itamar Syn-Hershko
Sure thing -- Itamar Syn-Hershko http://code972.com | @synhershko Freelance Developer & Consultant Author of RavenDB in Action On Sun, Sep 14, 2014 at 7:19 PM, Niv Penso wrote: > Amazing answer helped me so much!!! > the load-av

Re: Elasticsearch bad indexing timing

2014-09-14 Thread Niv Penso
Amazing answer, helped me so much!!! The load average decreased to a normal number and the documents indexed per second increased to 85. Thanks, Niv On Sunday, September 14, 2014 1:29:42 PM UTC+3, Itamar Syn-Hershko wrote: > > A couple of suggestions: > > 1. Disable replicas before large amounts of inserts (

Re: Elasticsearch.net client, endpoint strategy?

2014-09-14 Thread Lasse Schou
Thanks for your clarifications. 2014-09-14 15:17 GMT+02:00 Itamar Syn-Hershko : > Yes, all REST clients (that is to exclude the native Java clients) do not > participate in the cluster and therefore don't have and shouldn't have any > knowledge of the cluster. They issue a request to a node in the

Re: Elasticsearch.net client, endpoint strategy?

2014-09-14 Thread Itamar Syn-Hershko
Yes, all REST clients (that is to exclude the native Java clients) do not participate in the cluster and therefore don't have and shouldn't have any knowledge of the cluster. They issue a request to a node in the cluster which then may or may not reroute it to 1 or more nodes in an efficient manner u

Re: Elasticsearch.net client, endpoint strategy?

2014-09-14 Thread Lasse Schou
Alright, so the clients don't have the knowledge to route the request to the right server. This would typically yield an extra request from the receiving node to the node that stores the data, but on the other hand it reduces the complexity of the client. Can anybody confirm that this is the reco

Re: Elasticsearch bad indexing timing

2014-09-14 Thread Itamar Syn-Hershko
A couple of suggestions: 1. Disable replicas before large amounts of inserts (set replica count to 0), and only enable them again afterwards. 2. Use batching; the actual batch size would depend on many factors (doc
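
The two suggestions above can be sketched without any client library. This is a minimal sketch, not the poster's code: the class `BulkBatcher`, the index and type names, and the batch size are all hypothetical, and sending the bodies over HTTP is left out. It assumes the documented request shapes: an index settings body that sets `number_of_replicas` to 0, and the newline-delimited `_bulk` body with one action line followed by one source line per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BulkBatcher {
    // Body for PUT /<index>/_settings to switch replicas off before a large load.
    // Set the value back to the desired replica count once the load has finished.
    public static final String DISABLE_REPLICAS = "{\"index\":{\"number_of_replicas\":0}}";

    /** Splits docs into consecutive batches of at most batchSize items. */
    public static List<List<String>> partition(List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    /** Builds a _bulk body: an "index" action line, then the document source, each newline-terminated. */
    public static String bulkBody(String index, String type, List<String> jsonDocs) {
        StringBuilder sb = new StringBuilder();
        for (String doc : jsonDocs) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type).append("\"}}\n");
            sb.append(doc).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical documents; real ones would come from the source system.
        List<String> docs = Arrays.asList("{\"a\":1}", "{\"a\":2}", "{\"a\":3}");
        for (List<String> batch : partition(docs, 2)) {
            // Each printed body would be the payload of one POST /_bulk request.
            System.out.print(bulkBody("myindex", "mytype", batch));
        }
    }
}
```

The batch size of 2 is only for illustration; as the message says, a sensible size depends on many factors and is best found by measuring.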

Elasticsearch bad indexing timing

2014-09-14 Thread Niv Penso
Hey, I am trying to migrate (copy) 35 million documents (which is a standard amount, not too big) from Couchbase to Elasticsearch. My Elasticsearch cluster is composed of 3 A3 (4 cores, 7 GB memory) CentOS servers on Microsoft Azure (each server is equivalent to a large server on Amazon). I used "ti

Re: how to suppress scientific notation returned for long numbers with Elasticsearch api ?

2014-09-14 Thread joergpra...@gmail.com
ES uses Jackson, and Jackson uses Java Double.toString(), which has peculiarities for numbers < 10^-3 or >= 10^7 since it converts them to scientific notation. There are two options: - patching ES for optional format options of doubles, e.g. NumberFormat nf = NumberFormat.getInstance(); nf.setGr
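
The `Double.toString()` behavior described above can be reproduced with the JDK alone. This is a minimal sketch, assuming a hypothetical helper class `PlainNumbers` that is not part of ES or Jackson. It contrasts `Double.toString()` with a `NumberFormat` that has grouping switched off; the grouping and fraction-digit settings are illustrative choices, not necessarily the ones in the original message.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class PlainNumbers {
    /** Formats a double without grouping separators or scientific notation. */
    public static String plain(double value) {
        // Locale.ROOT keeps the decimal separator as '.' regardless of the JVM default locale.
        NumberFormat nf = NumberFormat.getInstance(Locale.ROOT);
        nf.setGroupingUsed(false);        // no thousands separators
        nf.setMaximumFractionDigits(340); // lift the default limit of 3 fraction digits
        return nf.format(value);
    }

    public static void main(String[] args) {
        double big = 1.0E7;
        System.out.println(Double.toString(big)); // prints 1.0E7
        System.out.println(plain(big));           // prints 10000000
    }
}
```

A formatter like this only changes how a value is rendered as text on the client side; it does not change what ES itself returns.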