Re: Shard count and plugin questions

2014-06-05 Thread Mark Walkom
I haven't heard of a limit to the number of indexes, obviously the more you have the larger the cluster state that needs to be maintained. You might want to look into routing ( http://exploringelasticsearch.com/advanced_techniques.html or

Re: Kibana 3: display the number of items in a Text panel?

2014-06-05 Thread Itamar Syn-Hershko
number of lines where? you can always show a Count facet that will count the number of results of a query

Re: How exactly works max_expansions in match_phrase_prefix query?

2014-06-05 Thread shgeorge
In the above example if there are documents with terms : test,tester,testing,tests and we are querying for test and max_expansions : 2, should it return only first 2 matching docs? I see that it is returning all the matching docs. Could you please explain?

Re: Kibana 3: display the number of items in a Text panel?

2014-06-05 Thread Nitsan Seniak
Here I'm looking for the number of distinct string values for a certain field. Say for instance that the log contains the following records: { ... user_id: joe ...} { ... user_id: mike ...} { ... user_id: joe ...} { ... user_id: sarah ...} { ... user_id: sarah ...} I'd like to be able to display
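What is being asked for here is the number of distinct user_id values rather than the number of records. A minimal Python sketch of that distinction, using the five sample records from the message; the variable names are illustrative, and this is plain client-side counting, not something Kibana or Elasticsearch does for you:

```python
# The five sample log records quoted in the message above.
records = [
    {"user_id": "joe"},
    {"user_id": "mike"},
    {"user_id": "joe"},
    {"user_id": "sarah"},
    {"user_id": "sarah"},
]

# Number of records (what a plain count of query results gives).
total_records = len(records)

# Number of distinct user_id values (what the poster wants to display).
distinct_users = len({record["user_id"] for record in records})

print(total_records, distinct_users)
```

For this sample the two numbers differ, which is why a plain result count does not answer the question.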

date_histogram aggregation and DST

2014-06-05 Thread Dunaeth
Hi, I wonder whether it is possible to make the date histogram aggregation DST-aware. From what I understand of the date histogram algorithm, it's something like: date + offset - (date + offset) % interval. Maybe a scripted terms aggregation would be a better solution if the date data
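The rounding formula quoted in the message can be sketched in Python as below. This is only an illustration of the arithmetic as the poster describes it ("something like"), not the actual Elasticsearch implementation; the function name and the millisecond units are assumptions.

```python
from datetime import datetime, timezone

HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def bucket_key(date_ms: int, interval_ms: int, offset_ms: int = 0) -> int:
    """Round down as in the message: date + offset - (date + offset) % interval."""
    shifted = date_ms + offset_ms
    return shifted - shifted % interval_ms


# 2014-06-05 23:30:00 UTC, expressed in epoch milliseconds.
late_evening = int(datetime(2014, 6, 5, 23, 30, tzinfo=timezone.utc).timestamp() * 1000)

# Daily bucket with no offset, and with a fixed +2 hour offset.
key_utc = bucket_key(late_evening, DAY_MS)
key_plus_two = bucket_key(late_evening, DAY_MS, offset_ms=2 * HOUR_MS)

# With the +2h offset the timestamp falls into the next day's bucket.
print((key_plus_two - key_utc) // DAY_MS)
```

Because offset_ms is a single fixed value applied to every date, a formula of this shape cannot follow a time zone whose offset changes during the year, which is the concern raised in the message.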

Geo Distance Facet - ElasticsearchParseException

2014-06-05 Thread Munjal Dhamecha
Hello All, I've been facing a problem with the geo_distance facet for a few hours. The error is: ElasticsearchParseException[field must be either 'lat', 'lon' or 'geohash'] I am not sure if this is a bug or I am making a silly mistake here. Please guide me in the right direction. Gist:

Trouble configuring Logstash to Squid Logs

2014-06-05 Thread SG Chan
My Logstash (1.4.1) config to read Squid log is shown below: input { file { path => "/var/log/squid3/access.log" } } filter { grok { match => [ "message", "%{NUMBER:timestamp} \s+ %{NUMBER:request_msec:float} %{IPORHOST:src_ip}

Marvel 1.2.0 java.lang.IllegalStateException

2014-06-05 Thread Paweł Krzaczkowski
Hi. After upgrading Marvel to 1.2.0 (running on Elasticsearch 1.2.1) I'm getting errors like [2014-06-05 10:47:25,346][INFO ][node ] [es-m-3] version[1.2.1], pid[68924], build[6c95b75/2014-06-03T15:02:52Z] [2014-06-05 10:47:25,347][INFO ][node ] [es-m-3]

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-05 Thread joergpra...@gmail.com
One more hint, you see org.elasticsearch.common.lucene.search.function.FieldValueFunction This implements the ScoreFunction and fetches boost values from a configured field in the doc, for use by the Java API for FunctionScoreQuery. If you can write a custom ScoreFunction, you could implement

Re: Inter-document Queries

2014-06-05 Thread joergpra...@gmail.com
A suggestion for the path model: - index also the path depth, and name the fields with the depth level - execute a nested aggregation query over the path depth levels Example doc with path info: { path0 : promo/A, path1 : sale/B ... } In this doc you know the user went from promo/A to
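A minimal Python sketch of building such a document from an ordered list of visited paths. The field names path0, path1 follow the example in the message; the helper name path_doc and the extra path_depth field are hypothetical additions for illustration.

```python
def path_doc(steps):
    """Build a document with one field per depth level: path0, path1, ..."""
    doc = {f"path{depth}": step for depth, step in enumerate(steps)}
    # Hypothetical extra field: total number of steps in the path.
    doc["path_depth"] = len(steps)
    return doc


# The user went from promo/A to sale/B, as in the example document.
doc = path_doc(["promo/A", "sale/B"])
print(doc)
```

With the depth encoded in the field name, an aggregation can be nested level by level (path0, then path1, and so on), which is the approach the message suggests.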

Searching Parent and Child Documents together as one document?

2014-06-05 Thread Udit Narayan
Hi, I have this scenario of a discussion board where people create a discussion thread called a post. Others can comment on it; this is called a comment. Now a comment is the same as a post except it has a parentId stored in it. In other words, my database schema for the post table is Post PostId, PostSubjectId, PostTitle,

Object property vs array index

2014-06-05 Thread random35743373
When a field contains an object, in a terms aggregation I can specify a specific object property that contains the terms I want to use, e.g. { terms: { field: fieldName.propertyContainingTerms } } So with an array type field that contains a list of strings [first, second, third] I

Aggregation average value is not coming correct

2014-06-05 Thread Subhadip Bagui
Hi, I'm using the below code to get the average value of cpu_usage using aggregation. When I checked the output of cpu value individually and calculate the avg, it is not matching with the aggregation avg value. I'm using a boolquery along with rangeFilter here to get the data. Please help to

Re: Problem with the river-jdbc sqlite

2014-06-05 Thread Matt Burns
Unfortunately, that version of the sqlite driver does not work on OSX: java.lang.NoClassDefFoundError: org/sqlite/NativeDB See: https://bitbucket.org/xerial/sqlite-jdbc/issue/127 On Thursday, 24 April 2014 07:59:11 UTC+1, Jörg Prante wrote: You must use a JDBC4 driver (jdbc sqlite

Re: Problem with the river-jdbc sqlite

2014-06-05 Thread Matt Burns
Ahh, I just realised that if I solve this, I just bump into the next problem regarding the readonly flag: https://github.com/jprante/elasticsearch-river-jdbc/issues/250 Humph :( On Thursday, 5 June 2014 12:05:24 UTC+1, Matt Burns wrote: Unfortunately, that version of the sqlite driver does

using elasticsearch how to build reports asp.net

2014-06-05 Thread khajavali sk
Please help us. We are trying to build a few reports using your tool through an ASP.NET Web application. We don't know what the process is. Please help us and provide a few sample applications for building reports through an ASP.NET web application.

Java Client - Error Handling

2014-06-05 Thread Nir Dothan
I'm not sure how to handle errors when using the Java client. How do I programmatically know if my connection was successful, or if indexing of a document succeeded? In REST we have the HTTP result code, but in Java, I did not see a documented way to catch checked exceptions or anything like that.

Get by _id doesn't work but search does.

2014-06-05 Thread Luke Wilson-Mawer
Hi, I'm seeing weird behaviours with ids on elasticsearch 1.2.0 (recently upgraded from 1.0.1). A search retrieves my document, showing the correct value for _id: [terminal] curl 'myServer:9200/global/_search?q=someField:something

Re: Get by _id doesn't work but search does.

2014-06-05 Thread Adrien Grand
Hi, This is very likely because of https://github.com/elasticsearch/elasticsearch/pull/6393 See http://www.elasticsearch.org/blog/elasticsearch-1-2-1-released/ for more information, we are currently working on a tool that would help relocate documents to the right shard. On Thu, Jun 5, 2014 at

Re: Best cluster environment for search

2014-06-05 Thread Marcelo Paes Rech
Hi Jörg. Thanks for your reply again. As I said, I had already used the ids filter, but I got the same behaviour. I realized what was wrong. Maybe it could be a bug in ES or not. When I executed the filter I included the from and size attributes. In this case size was 99, but the final result

Re: Java Client - Error Handling

2014-06-05 Thread joergpra...@gmail.com
Do you use TransportClient or NodeClient? On NodeClient, you are tied to the cluster, as the node is being a part of it, on TransportClient, you can count the connected nodes. The discovery mechanism behind the scenes sends ping actions each few seconds for you. If an action fails, you will see

Templates are not updated

2014-06-05 Thread Bernhard Berger
The templates from localhost:9200/_template are not updated with the configured one, even when I create an index. I am not sure: is this a bug? Steps to reproduce: 1. In a fresh Elasticsearch 1.2.1 installation, create the file config/templates/template_1.json like in this example

Re: Hourly Shards Elasticsearch/Kibana

2014-06-05 Thread Antonio Augusto Santos
Hey Mark, What are you calling a lot of resources? And how do you go about detecting it? Currently I'm using TTLs for rolling old logs out of my cluster. It's pretty small currently (about 40GB of data), but as it gets bigger I want to know if it will pose a problem. Thanks On Wednesday, June

Re: Templates are not updated

2014-06-05 Thread Antonio Augusto Santos
AFAIK the templates that live on the filesystem are not put on _template. Also, you can update the template on the FS without restarting ES and it will get the new info there. On Thursday, June 5, 2014 9:45:53 AM UTC-3, Bernhard Berger wrote: The templates from localhost:9200/_template get

Re: Elasticsearch/Lucene Delete space reuse? recovery?

2014-06-05 Thread Shannon Monasco
I haven't changed my merge settings. How often should segments be created and how often should merges happen naturally? On Jun 4, 2014 4:58 PM, Ivan Brusic i...@brusic.com wrote: Lucene will hold onto deleted documents until a merge is performed. An update in Lucene is basically an atomic

Re: Elasticsearch/Lucene Delete space reuse? recovery?

2014-06-05 Thread Michael McCandless
The default merge policy in Lucene (TieredMergePolicy) has a bias towards segments with more deletes, so it is trying to merge those ones away. You can increase this bias by setting index.reclaim_deletes_weight (see

Re: Templates are not updated

2014-06-05 Thread Bernhard Berger
Thanks, that was an unexpected behaviour for me. I will avoid filesystem templates in the future and directly PUT templates in my application to Elasticsearch. On 05.06.2014 15:05, Antonio Augusto Santos wrote: AFAIK the templates that live on the filesystem are not put on _template. Also,

Re: Java Client - Error Handling

2014-06-05 Thread Nir Dothan
Thanks. The code I'm developing will support both Node and Transport clients. The selection will be configuration driven. There must be a way to determine if a CRUD operation succeeded. For example, see the following code taken from the Logstash Ruby client based plugin. Is there any Java

Re: Shard count and plugin questions

2014-06-05 Thread Todd Nine
Thanks for the feedback Mark. I agree with your thoughts on the testing. We plan on doing some testing, find our failure point, and dial that back to some value that allows us to still run the migration. This way, we can get ahead of the problem. Since a re-index would actually introduce more

Re: Java Client - Error Handling

2014-06-05 Thread joergpra...@gmail.com
Check the Elasticsearch test code. There, you can see how Java API works. For example GetIndexTemplatesResponse response = client().admin().indices().prepareGetTemplates().get(); You can get an empty response if template does not exist, or the execution throws an exception, when something went

Re: Hourly Shards Elasticsearch/Kibana

2014-06-05 Thread Kellan Strong
I thought I replied to this yesterday. Anyways, it was with Kibana. Thank you for that. On Wednesday, June 4, 2014 9:29:18 AM UTC-7, Antonio Augusto Santos wrote: Hey There, Did you remember to change the Timestamping on Kibana so that it would know you are using an hourly index? Go the

Nested queries and cross referencing

2014-06-05 Thread David Fox
After reading this http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ excellent document on managing relations in Elasticsearch, I have decided that 'nested queries' are the best solution for our particular query needs. Of the list of negatives for nested queries

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
The knapsack plugin does not come with a downtime. You can increase shards on the fly by copying an index over to another index (even on another cluster). The index should be write disabled during copy though. Increasing replica level is a very simple command, no index copy required. It seems

Storing aggregation results back into elasticsearch

2014-06-05 Thread erewh0n
I've recently started using and enjoying ES, in particular I'm keen to exploit the new aggregations feature to report on system metrics data that is currently being fed into ES indexes. I'm experimenting with aggregations that fold up things like request rates per machine or API calls (per

Re: Shard count and plugin questions

2014-06-05 Thread Todd Nine
Hey Jörg, Thank you for your response. A few questions/points. In our use cases, the inability to write or read is considered a downtime. Therefore, I cannot disable writes during expansion. Your alias points raise some interesting research I need to do, and I have a few follow up questions.

Elasticsearch and Hadoop Questions

2014-06-05 Thread ES USER
Try as I might and I have read all the stuff I can find on ES' website about this I understand somewhat how the integration works but not the actual nuts and bolts of it. For example: Is Hadoop just storing the files that would normally be stored in the local filesystem for the ES indexes or

A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Mario Mueller
Hey folks, I kindly ask for a hint to achieve the following thing: The goal is to deliver only a json array of source objects to the client. The php app that sits on the other side uses JMS\Serializer to deserialize the response into entities. At the moment the app needs to take an overhead

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Ivan Brusic
If you are only modifying the REST API calls and not the Java API, such a plugin should be easy. You are not creating a new type of action, merely using the current search one, but changing the output format. Here are two tutorials on simple REST plugins:

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
Thanks for raising the questions, I will come back later in more detail. Just a quick note, the idea about shards scale write and replica scale read is correct, but Elasticsearch is also elastic which means it scales out, by adding node hardware. The shard/replica scale pattern finds its limits

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Mario Mueller
So, if I understood your approach correctly ... I should build a new Rest Action like _search_and_return_source that proxies the original _search one? I've already read those two articles and I've set up my development environment with the help of those ;) On Thursday, 5 June 2014

date_histogram not returning key_as_string

2014-06-05 Thread Tim Heikell
Sorry for the noob question, but is there some setting I am missing? It's not clear to me why I'm not getting a key_as_string field in my results. I'm running v1.1.0, here is my search: GET /_all/_search { aggs: { totalsByHour: { date_histogram: { field: sessionStartTime,

Elasticsearch Cluster discovery

2014-06-05 Thread avery . rozar
I have 3 Elasticsearch servers set up on a CentOS KVM host. I'd like to lock these servers down with iptables, but when I do this it kills the cluster (even with the proper ports open). So I thought I'd have two servers behind the KVM NAT interface, and the primary server with two NICs. One NIC

multiple bulk request hanging

2014-06-05 Thread S
Hi, I am writing a client on the nodejs platform and I am issuing multiple (around 300) HTTP bulk requests one after another, and each request has around 300 index actions for the same index/type. The scenario is that a user can upload files (containing the list of items) to my nodejs server to get

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
Just a quick question, do you just want to extract a field from the json source? There are field filters and parameters for shaping such a JSON result, maybe they can already help? Or can you give an example of the problem? Jörg On Thu, Jun 5, 2014 at 7:45 PM, Mario Mueller ma...@xenji.com

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Mario Mueller
Hey Joerg, I just need the whole content of the _source field like so: [ { HotelName: Plaka, ProductCode: 7050, objectId: 437-de, GroupId: 25223, readonly: false, lang: de, City: Athens }, { HotelName: Hyatt at Fisherman's Wharf,

Re: Shard count and plugin questions

2014-06-05 Thread Todd Nine
Hey Jorg, Thanks for the reply. We're using Cassandra heavily in production, and I'm very familiar with the scale-out concepts. What we've seen in all our distributed systems is that at some point, you reach a saturation of your capacity for a single node. In the case of ES, to me that would

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Ivan Brusic
There is no way to eliminate returning the search metadata. It has been requested often. -- Ivan On Thu, Jun 5, 2014 at 12:40 PM, Mario Mueller ma...@xenji.com wrote: Hey Joerg, I just need the whole content of the _source field like so: [ { HotelName: Plaka,

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Ivan Brusic
I just looked it up and it should be as easy as creating your own RestResponseListener that takes a SearchResponse and creates a simplified version with no metadata. Should be an interesting quick plugin, but it looks like Jorg is going to beat me to it (I'm still at work for several more hours).

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
OK, I think I made it. Good exercise to wrestle with Github before going to sleep... https://github.com/jprante/elasticsearch-arrayformat Best, Jörg On Thu, Jun 5, 2014 at 10:28 PM, Ivan Brusic i...@brusic.com wrote: I just looked it up and it should be as easy as creating your own

min_doc_count on lower/lowest level nested aggregation

2014-06-05 Thread George Lui
I have this query with some nested aggregations: { aggs: { by_date: { date_histogram: { field: timestamp, interval: day }, aggs: { new_users: { filter: { query: {

Could a custom Aggregator be used for general purpose Map/Reduce or bulk update?

2014-06-05 Thread Daniel Winterstein
Hello, So by writing a plugin you can create a custom aggregation.[1] I'd like to explore what we could do with that. Why? I'm looking for ways round a costly scan-and-update-each-document algorithm. Do Aggregators run in a parallel fashion, with your aggregation being run against all shards

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Ivan Brusic
I see that we agree that a new RestResponseListener is the way to go. I have not cloned your project yet, only looked at the code on github, but I noticed that you provided your own parseSearchRequest, but still call RestSearchAction.parseSearchRequest from inside handleRequest. Did I

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
Yes, routing is very powerful. The general use case is to introduce a mapping to a large number of shards so you can store parts of data all at the same shard which is good for locality concepts. For example, combined with index alias working on filter terms, you can create one big concrete index,
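The locality idea can be sketched in Python as follows. Elasticsearch picks the target shard from a hash of the routing value modulo the number of primary shards; the hash below (zlib.crc32) is only a stand-in chosen for illustration and is NOT the function Elasticsearch uses, and the names shard_for and user-42 are hypothetical.

```python
import zlib


def shard_for(routing: str, number_of_shards: int) -> int:
    """Map a routing value to a shard number: hash(routing) % number_of_shards.

    zlib.crc32 is an illustrative stand-in hash, not Elasticsearch's own.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


NUMBER_OF_SHARDS = 10

# Every document indexed with the same routing value lands on the same shard,
# so a search restricted to that routing value only has to visit one shard.
first = shard_for("user-42", NUMBER_OF_SHARDS)
second = shard_for("user-42", NUMBER_OF_SHARDS)
print(first == second)
```

The same arithmetic also shows why the shard count cannot be changed without reindexing: a different modulus would send existing routing values to different shards.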

Re: Storing aggregation results back into elasticsearch

2014-06-05 Thread erewh0n
To clarify, these questions are coming from my desire to dynamically produce real time aggregated information from a stream, which in this case is metric data we're feeding to ES. I'm concerned about unnecessary re-execution of aggregations on (potentially large) data sets that could be

snowball and elusion

2014-06-05 Thread Oto Iashvili
Hello, At first, I was using the analyzer language analyzer and everything seemed to work very well. Until I realize that a is not part of the list of stopwords in french So I decided to test with snowball. It also seemed working well, but in this case it does remove short word like l' ,

Re: Elasticsearch and Hadoop Questions

2014-06-05 Thread Costin Leau
Think of es-hadoop as a connector between Hadoop and Elasticsearch. You would use it to index data in Hadoop to ES or run queries in ES directly from Hadoop. Where does ES store the data? That depends on its configuration (completely separate from es-hadoop itself). In general (and the default) is

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
Ups, yes, a mistake... I bluntly copy/pasted the RestSearchAction. Thanks! Jörg On Fri, Jun 6, 2014 at 12:03 AM, Ivan Brusic i...@brusic.com wrote: I see that we agree that a new RestResponseListener is the way to go. I have not cloned your project yet, only looked at the code on github,

Re: Elasticsearch Cluster discovery

2014-06-05 Thread Mark Walkom
ES runs on an all-or-nothing principle when it comes to networking. You cannot split cluster and API interfaces. On 6 June 2014 04:17, avery.ro...@insecure-it.com wrote:

Re: Best cluster environment for search

2014-06-05 Thread Mark Walkom
This would probably be worth raising as a github issue - https://github.com/elasticsearch/ On 5 June 2014 22:38, Marcelo Paes Rech marcelopaesr...@gmail.com wrote: Hi

Re: Hourly Shards Elasticsearch/Kibana

2014-06-05 Thread Mark Walkom
It depends on a few factors, document size, index size, etc etc. If you are using ES for logging data, then best practise is to use timestamped indexes and then just drop old ones as needed using curator.

Re: Could a custom Aggregator be used for general purpose Map/Reduce or bulk update?

2014-06-05 Thread joergpra...@gmail.com
I'll try to answer some of the queries, though I must admit I am not too familiar with the aggregation source code yet (still exploring). Aggregations work like a search, they are embedded into the search actions, and work over the result set of a search. They run in each shard, just like the

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread Brian
This may or may not help, but the following worked well for me. Just as any database-backed application, the business logic (such as what you described) is best implemented outside of the database. Since ES is a first-class Java citizen and its Java API is clean and superb (documentation

cluster yellow state

2014-06-05 Thread flyer
I have a cluster of two nodes and have the following configs for shards and replicas: index.number_of_shards: 10 index.number_of_replicas: 1 But when I index around 10k documents or just one, I find that there are always 4 replica shards left unallocated. Is there a method to allocate all

Re: Queries, filters and match_all

2014-06-05 Thread Arkadiy Zabazhanov
Yeah, I've got this already, thanks. I'm still confused why the filtered query is returning all results even without match_all in the filtered query. On Thursday, June 5, 2014, 6:21:03 UTC+7, Ivan Brusic wrote: There is no label, but the change was made last December:

Re: cluster yellow state

2014-06-05 Thread flyer
Because it's difficult to recognize which shards are replicas (I haven't installed the head plugin), I removed all of the index data and tried to reindex the data, but got the same result: there were still some replica shards left unallocated. I want to know why there are some replicas not to be

Best Practices on client (java) settings

2014-06-05 Thread Soumya Sanyal
Hi guys, Relative newcomer to the elasticsearch phenomenon here. I'm trying to rationalize a very basic problem with my service. I'm running Jetty with a 100 or so threads (standard RESTful Service with Spring MVC) and one instance of the ES client in the JVM which seems to have around 14 or

If I set index.number_of_replicas:1, then the minimum number of nodes should be 3 to assure that the status of the cluster is green?

2014-06-05 Thread flyer
I have a cluster of two nodes, and set the configs for shard number and replica number as following: index.number_of_shards: 10 index.number_of_replicas: 1 The master node is elected automatically. Before I index data, the state of the cluster is green. After I index data, the state of the

Re: cluster yellow state

2014-06-05 Thread flyer
I added another node to the cluster and now, after I index data, the state of the cluster becomes green. If the replica number is 1, must I have at least 3 nodes to assure that the state of the cluster is green? On Fri, Jun 6, 2014 at 9:46 AM, flyer flyer...@gmail.com wrote: Because it's
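The shard arithmetic behind this question can be sketched in Python. It relies on one general rule: a replica is never allocated on the same node as its primary, so each shard needs replicas + 1 distinct nodes. The function names are hypothetical, and the sketch does not explain why the two-node cluster in this thread stayed yellow; that depends on details not shown in the messages.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus all of their replicas."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_all_copies(number_of_replicas: int) -> int:
    """Each copy of a shard must sit on a different node."""
    return number_of_replicas + 1


# Settings from the messages: 10 shards, 1 replica.
copies = total_shard_copies(10, 1)
nodes_needed = min_nodes_for_all_copies(1)
print(copies, nodes_needed)
```

By this count alone, two nodes are enough to hold a primary and one replica of every shard, so other causes (for example allocation settings or node differences) would need to be checked when replicas remain unassigned.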

Terms Filter lookup with realtime?

2014-06-05 Thread anahap
Hi All, is it possible to use the terms filter lookup mechanism so that changes to the lookup document are used in realtime? For example, I want to filter out already seen documents and have a lookup document that contains already seen document ids, which is updated as the tracking system

Correct way to use TransportClient connection object

2014-06-05 Thread Subhadip Bagui
Hi, I'm using the below code to get a singleton object for TransportClient object. I'm using the getInstance() to get the client object which is already alive in webapplication. public static Client getInstance() { if (instance == null) { logger.debug(the client instance is null, creating