Balance between number of indices and shards per index

2014-07-08 Thread Magnus Bäck
I'm setting up an Elasticsearch-based log cluster and I'm having some doubts about how I should choose the number of indices and shards. By default, Logstash and Kibana use per-day indices and Elasticsearch defaults to five shards per index. I'm worried that this will create an excessive number of

Re: Balance between number of indices and shards per index

2014-07-08 Thread Mark Walkom
Short - Stop worrying! Long - As you mentioned, this is very dependent on your node specs, but ideally you want one shard per node. However, you can over-allocate and not run into problems, plus it allows easier balancing when you add more nodes to the cluster. Using daily is better as you can
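To put rough numbers on the over-allocation question in this thread, here is a minimal sketch that counts the shards produced by per-day indices. The five shards per index come from the original post; the 90-day retention, the single replica, and the 3-node cluster are assumed values for illustration only.

```python
def total_shards(days_retained, shards_per_index=5, replicas=1):
    """Count every shard copy (primaries plus replicas) across daily indices."""
    return days_retained * shards_per_index * (1 + replicas)


def shards_per_node(total, nodes):
    """Average number of shard copies each data node would hold."""
    return total / nodes


# Assumed retention of 90 daily indices on an assumed 3-node cluster.
total = total_shards(days_retained=90)
print(total, shards_per_node(total, nodes=3))
```

Changing `shards_per_index` or `days_retained` shows how quickly the count grows, which is the trade-off the thread is weighing.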

Custom Drop Down Filters

2014-07-08 Thread Hari
Hello, I am trying to add custom filters similar to the existing time filter in the top right corner of the Kibana dashboard. These filters make the dashboard easier for end users to use. For example, the gender field has Male and Female values. Once I click the Male option in the dropdown, the data gets

Aggregating by hour

2014-07-08 Thread Jenny Blunt
I was sort of expecting the following to give me an aggregation which groups the results only by hour: curl http://localhost:9000/stream/_search -d '{ "aggs": { "visitor_count": { "date_histogram": { "field": "created_at", "interval": "hour" } } } }' As it stands, it does group by

Re: ingest performance degrades sharply along with the documents having more fields

2014-07-08 Thread Maco Ma
Hi Kimchy, I reran the benchmark using ES 1.3 with default settings (only disabling _source and _all) and performance improved greatly. However, Solr still outperforms ES 1.3. Table columns: Number of different metadata fields | ES | ES with _all/codec bloom filter disabled | ES 1.3 | Solr

ELASTIC SEARCH - GROUP BY QUERY ON ARRAY

2014-07-08 Thread K.Samanth Kumar Reddy
Hi, I have been working with Elasticsearch for the last 2 months. It provides awesome search capabilities, a good JSON document structure, etc. Currently I am stuck on how to write a group-by query and get the data. Ex: In this example company, prod_type are defined as

2 JOBs: Elasticsearch engineer @ Sematext

2014-07-08 Thread Otis Gospodnetic
Hi, At Sematext http://sematext.com/ we have 2 interesting openings. 1) We are looking for an engineer who knows Elasticsearch (or Solr or both) and wants to use these technologies to implement search and analytics solutions for both Sematext's own products such as SPM

problem with river-mongodb on ES ver 1.2.1

2014-07-08 Thread Антон Кикоть
Has anybody solved the problem of running elasticsearch-river-MongoDB on Elasticsearch 1.2.1? I was not able to run either elasticsearch-river-MongoDB-master or elasticsearch-river-mongodb-1.2.0. I need detailed instructions =) -- You received this message because you are subscribed to the

CLA sign required even for tiny typo fix?

2014-07-08 Thread Lukáš Vlček
Hi, http://www.infoworld.com/d/open-source-software/red-hat-joyent-and-others-break-down-licensing-barriers-244727 Maybe some companies are showing what the future trend will be? :-) (yes, I am biased, I work for one of the companies listed in the blog post) Regards, Lukas

Re: CLA sign required even for tiny typo fix?

2014-07-08 Thread Mark Walkom
I'd support this for what it's worth :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 8 July 2014 21:27, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi,

Re: Elasticsearch Not Working

2014-07-08 Thread Hai S. Ha
I have exactly the same problem as Shriyansh Jain. Does anyone know what happened? Thanks a lot for helping. On Tuesday, July 8, 2014 2:03:44 AM UTC+2, shriyansh jain wrote: When I am verifying the Elasticsearch status, it's giving me the following error message. *elasticsearch dead but

Re: Elasticsearch Not Working

2014-07-08 Thread Hai S. Ha
Hi, here is the way to solve it: you have to set these variables in /etc/elasticsearch/elasticsearch.yml: path.data: path/to/data, path.work: path/to/work, path.logs: /var/log/elasticsearch, path.conf: /etc/elasticsearch. And remember to give the elasticsearch user access to the folder that elasticsearch

I want to know how many types ES can create.

2014-07-08 Thread hongsgo
I want to know how many types ES can create. I want to create 10 million types in each index. Can Elasticsearch manage 10 million types per index? Sorry for my English. Could somebody help me, please? -- View this message in

Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-08 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, Sure. Thanks a lot for the helpful pointers. I will take a look at the classes and create a plugin. If there are any gotchas or certain ways of doing things in this plugin, please tell me so that I can take note. It seems that the plugin would be small with just the Parser/Builder and a

Re: ELASTIC SEARCH - GROUP BY QUERY ON ARRAY

2014-07-08 Thread vineeth mohan
Hi Samanth, first you will need to map that array as a nested type. - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html#mapping-nested-type Then you need to do a two-level aggregation with a terms aggregation at the parent level on field prod_type and sum aggregation on
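The two steps suggested in this reply can be sketched as request bodies. This is only an illustration under assumptions: the array field name `products` and the summed field `amount` are hypothetical (the snippet is cut off before naming them), and the `string`/`not_analyzed` mapping syntax reflects the 1.x releases current at the time of the thread.

```python
import json

# Step 1: map the array as a nested type.
# "products" and "amount" are hypothetical names, not taken from the thread.
mapping = {
    "properties": {
        "company": {"type": "string"},
        "products": {
            "type": "nested",
            "properties": {
                "prod_type": {"type": "string", "index": "not_analyzed"},
                "amount": {"type": "double"},
            },
        },
    }
}

# Step 2: a terms aggregation on prod_type with a sum sub-aggregation,
# wrapped in a nested aggregation so it runs against the nested documents.
search_body = {
    "size": 0,
    "aggs": {
        "products": {
            "nested": {"path": "products"},
            "aggs": {
                "by_prod_type": {
                    "terms": {"field": "products.prod_type"},
                    "aggs": {
                        "total_amount": {"sum": {"field": "products.amount"}}
                    },
                }
            },
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The printed JSON is what would be sent as the body of a `_search` request; each bucket in the response would carry one `prod_type` value and its summed total.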

Classification Pattern: Percolate, Tag, Index

2014-07-08 Thread Peter Passaro
I'm fairly new to Elasticsearch and I'm looking for suggestions on the best pattern to execute something similar to what I've done with other systems. I have a set of fairly complex queries (for about 10 categories) based on a slightly modified version of the Lucene query language. For each new

Classification pattern: Percolate, Tag, Index

2014-07-08 Thread Peter Passaro
I'm fairly new to ES, and wanted to get some guidance about implementing something similar to what I've done with other systems. I have a set of queries I use for classifying documents written in a modified version of the Lucene query syntax. I would like to tag each new document coming into

Re: ingest performance degrades sharply along with the documents having more fields

2014-07-08 Thread kimchy
Yes, this is the equivalent of using RAMDirectory. Please don't use this. Mmap is optimized for random access, and if the Lucene index can fit in heap (to use RAMDirectory), it can certainly fit in OS RAM, without the implications of loading it into heap. On Monday, July 7, 2014 6:26:07 PM UTC+2,

Re: ingest performance degrades sharply along with the documents having more fields

2014-07-08 Thread kimchy
Hi, thanks for running the tests! My tests were capped at 10k fields, and the improvements target that range. Anything more than that, I and everybody here at Elasticsearch (+Lucene: Mike/Robert) simply don't recommend and can't really stand behind when it comes to supporting it. In Elasticsearch, there is a conscious

Hive write data to elastic search

2014-07-08 Thread David Zabner
Hi all, I am trying to write data to elastic search from hive and whenever I try I get this error: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No resource ['es.resource'] (index/query/location) specified The script I am running looks like this: USE pl_10; ADD jar

Re: CLA sign required even for tiny typo fix?

2014-07-08 Thread Ivan Brusic
Elasticsearch as a company is relatively new, so I hope it will adjust its practices, not just the CLA, as time goes on. The codebase has been evolving so rapidly, that I would assume they are working on the code and the revenue stream and not its licensing model. -- Ivan On Tue, Jul 8, 2014

Can a data node be set up to only be allocated replicas?

2014-07-08 Thread Ned Campion
Hi All, Been a real joy working with es, and today I'm trying to see if I can stump es with a tricky config. Say I have 2 data nodes that contain a collection of indices that were set up with number_of_shards=1 and number_of_replicas=1. Say I want to add a data node that only contains

How to index documents without location field ?

2014-07-08 Thread coder
Hi, I need to index a mix of documents. Some of them need to be indexed using geo_point with a location field, but other documents don't contain a location field. Whenever I index them, I keep getting a Mapper parsing exception with location={} during indexing and

Re: How to index documents without location field ?

2014-07-08 Thread Ivan Brusic
In terms of the parsing exception, can you simply index the document without the field entirely? As far as sorting goes, it makes sense to push the location-less documents to the top or bottom. You lost me on the part regarding the rescorer. Do you need the location-less documents to be returned in

Timeout notification from cluster service

2014-07-08 Thread x0ne
I am running a 4-node cluster in EC2 and for the past few days, I have noticed that some nodes occasionally time out on a request, resulting in the following: ConnectionError(HTTPConnectionPool(host='HOST', port=9200): Read timed out. (read timeout=10)) caused by:

Re: ELASTIC SEARCH - GROUP BY QUERY ON ARRAY

2014-07-08 Thread K.Samanth Kumar Reddy
Hi Vineeth, Thank you very much. I will try and let you know. Thanks, Samanth On Tuesday, July 8, 2014 4:23:18 PM UTC+5:30, K.Samanth Kumar Reddy wrote: Hi, I have been working with Elasticsearch for the last 2 months. It provides awesome search capabilities, a good JSON document structure

Re: Can a data node be set up to only be allocated replicas?

2014-07-08 Thread Ned Campion
Never mind. Anyone please correct me if I'm wrong, but after some thought I think I've convinced myself that there is no need for such a setup. I think it's ok for the 3rd node to get primary shards. Still have to test this, but if I were to have auto_expand_replicas set up on all the indices

Re: CLA sign required even for tiny typo fix?

2014-07-08 Thread Lukáš Vlček
Ivan, I think it is mostly about lowering the barrier for contributors regarding small updates (say you found a missing comma in a guide example and you want to fix it) and saving resources on the company side with CLA maintenance. As a company you can always explicitly ask for CLA sign in

Re: How to index documents without location field ?

2014-07-08 Thread coder
Yes, I can index the documents which contain a location field but not those which don't. It gives a parsing exception in that case and then stops importing documents. Is there any way to tell ES to index the location if it's present and otherwise skip it

Custom date histogram interval

2014-07-08 Thread Gabe Gorelick-Feldman
Is there any way to use a custom interval with date histograms (either facets or aggregations)? For example, something like { "date_histogram": { "field": "date", "interval": "fiscalYear" } } Obviously, you can always use a regular histogram, but then the client would

Re: How to index documents without location field ?

2014-07-08 Thread coder
Yes, I got it. You are right, Ivan. I think I should omit the field altogether; that way ES won't find the field and won't try to index it. I think that should work. I'll try it and let you know if it works. But how can I make the use of that location field is also very
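The approach settled on in this thread (omit the field rather than send `location={}`) could be applied on the client side before indexing. A minimal sketch, assuming documents are plain dictionaries and the geo_point field is named `location`; the helper name is hypothetical.

```python
def drop_empty_location(doc, field="location"):
    """Return a copy of doc without the geo field when it is missing or empty.

    Hypothetical helper: an empty dict, empty string, or None counts as
    "no location", so the field is left out of the indexed document.
    """
    cleaned = dict(doc)
    if not cleaned.get(field):
        cleaned.pop(field, None)
    return cleaned


docs = [
    {"name": "with location", "location": {"lat": 40.7, "lon": -74.0}},
    {"name": "without location", "location": {}},
]
for doc in docs:
    print(drop_empty_location(doc))
```

Documents that keep a populated `location` pass through unchanged, so the same mapping can serve both kinds of document.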

Re: Best practice to backup index daily?

2014-07-08 Thread sabdalla80
This looks great. However, I am not sure if I am missing anything. When I take a snapshot with curl, it works fine: curl -XPUT http:local_host:9200/_snapshot/es_repository/snapshot_1?wait_for_completion=true However, with curator, it completes but no snapshots are actually

Re: Aggregating by hour

2014-07-08 Thread Gabe Gorelick-Feldman
I think you want something like a histogram with a value script to decide the bucket. But it looks like histogram doesn't support that, so would a range agg work? Otherwise, it might be easiest to store the hour in addition to the timestamp. On Tuesday, July 8, 2014 4:06:02 AM UTC-4, Jenny
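The last suggestion in this reply, storing the hour alongside the timestamp, could look roughly like the sketch below. The field name `created_at` comes from the original question; the new field name `created_hour` and the timestamp format are assumptions.

```python
from datetime import datetime


def with_hour_of_day(doc, ts_field="created_at", hour_field="created_hour"):
    """Return a copy of doc with the hour of day (0-23) stored as its own field.

    Assumes timestamps look like "2014-07-08T04:06:02" (no timezone suffix).
    """
    enriched = dict(doc)
    ts = datetime.strptime(doc[ts_field], "%Y-%m-%dT%H:%M:%S")
    enriched[hour_field] = ts.hour
    return enriched


# With the hour indexed as a number, a terms aggregation yields one bucket
# per hour of day regardless of the date.
hourly_aggregation = {
    "size": 0,
    "aggs": {"visitor_count": {"terms": {"field": "created_hour", "size": 24}}},
}

print(with_hour_of_day({"created_at": "2014-07-08T04:06:02"}))
```

This trades a little extra index size for a simple query, instead of relying on script support in the aggregation.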

Re: Problems upgrading an existing field to a multi-field

2014-07-08 Thread Ryan Tanner
*bump* Anyone? On Monday, July 7, 2014 5:15:06 PM UTC-6, Ryan Tanner wrote: I'm having trouble upgrading an existing field to a multi-field. I've done this before with no issues on other fields. I think the issue here is that the original mapping specifically defines an analyzer:

Translog stage never ending

2014-07-08 Thread Matías Waisgold
Hi, I'm using Elasticsearch 1.1.1 and after some issues with an instance I had to recreate instances and rebalance. The issue I'm seeing is that recovery never gets past the translog stage, and I don't know why. No errors, nothing. Any ideas? index shard time type stage source_host

Re: CLA sign required even for tiny typo fix?

2014-07-08 Thread Shay Banon
We can’t start to differentiate between one contribution and another, because then we start a different discussion about where the line goes. It’s simpler to have a consistent message. Btw, our CLA is explicitly very lightweight, and it aims at protecting the contributors as well. It’s quite

Re: Problems upgrading an existing field to a multi-field

2014-07-08 Thread Shay Banon
Which version of ES are you using? I believe we fixed a bug around this several versions ago. On Jul 8, 2014, at 20:31, Ryan Tanner ryan.tan...@gmail.com wrote: *bump* Anyone? On Monday, July 7, 2014 5:15:06 PM UTC-6, Ryan Tanner wrote: I'm having trouble upgrading an existing field to a

Re: Problems upgrading an existing field to a multi-field

2014-07-08 Thread Ryan Tanner
1.1.1 in production but I tested this with 1.2.1 locally and had the same problem. On Tuesday, July 8, 2014 12:53:14 PM UTC-6, kimchy wrote: Which version of ES are you using? I believe we fixed a bug around this several versions ago. On Jul 8, 2014, at 20:31, Ryan Tanner ryan@gmail.com

Re: Time range filter

2014-07-08 Thread Shay Banon
Aye, makes sense to add a dedicated filter for this, care to open an issue? On Jul 8, 2014, at 6:06, vineeth mohan vm.vineethmo...@gmail.com wrote: Hello Tom, at this point I can think of 2 approaches - Store an additional field with just the time and not the date information. Do a

A brief blog on query performance optimisation ???

2014-07-08 Thread akun baru
This isn't mine, just something I found online that might be of interest to others. There are a bunch of tests run on AWS that give some good insight into sizing and potential choke points when running queries against a cluster.

Re: CLA sign required even for tiny typo fix?

2014-07-08 Thread Ivan Brusic
Great to see kimchy posting again! On Tue, Jul 8, 2014 at 11:51 AM, Shay Banon kim...@gmail.com wrote: We can’t start to differentiate between one contribution and another, because then we start a different discussion about where the line goes. It’s simpler to have a consistent message.

Re: CLA sign required even for tiny typo fix?

2014-07-08 Thread Lukáš Vlček
Shay, I think it is not fair to say that JBoss is trying to be the beacon here (my formulation could make it sound like that in my initial post - sorry about that), it is more about JBoss catching up with the rest of the company (i.e. Red Hat). See the below link for more details:

Re: Time range filter

2014-07-08 Thread vineeth mohan
Hello Tom, please paste the link to the issue. I am seeing more such requests in the forum. Thanks Vineeth On Wed, Jul 9, 2014 at 1:06 AM, Tom Miller tom.mil...@ebiz.co.uk wrote: Thanks guys - I've created a ticket in github. I'll store the time separately for now as Vineeth

Re: Custom date histogram interval

2014-07-08 Thread vineeth mohan
Hello Gabe , Please elaborate on what you mean by custom interval . Thanks Vineeth On Tue, Jul 8, 2014 at 11:25 PM, Gabe Gorelick-Feldman gabegorel...@gmail.com wrote: Is there any way to use a custom interval with date histograms (either facets or aggregations)? For example,

Re: Problems upgrading an existing field to a multi-field

2014-07-08 Thread Ryan Tanner
Side question: If I try to set lowercase_terms to true, I get a 400 back saying suggester[term] doesn't support [lowercase_terms], which seems to contradict the documentation. "suggest": { "text": "my query string", "person_name": { "term": { "field":

Re: Custom date histogram interval

2014-07-08 Thread Gabe Gorelick-Feldman
Sure. According to the docs [1], the available expressions for date_histogram interval are year, quarter, month, week, day, hour, minute, second. But what if you want to rollup by another interval that's not supported, like decade or millisecond? I was just wondering if there was maybe a way

Re: Custom date histogram interval

2014-07-08 Thread vineeth mohan
Hello Gabe, you can specify the interval as 1.5h or 2w, and so on - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html Hope that might help your cause. Thanks Vineeth On Wed, Jul 9, 2014 at 2:49 AM, Gabe

Re: Custom date histogram interval

2014-07-08 Thread Gabe Gorelick-Feldman
That solves the decade or millisecond problem, but wouldn't work for something like MMWR week or fiscal year which are more complex. Here's the CDC's definition of MMWR week to illustrate my point: The first day of any MMWR week is Sunday. MMWR week numbering is sequential beginning with 1

Re: Can a data node be set up to only be allocated replicas?

2014-07-08 Thread Mark Walkom
There's not really a need as you've discovered. ES is really good at managing distribution of shards, and unless you are running specific hardware for storage (ie tiered storage across different nodes) or you want rack awareness, then it's better to just let ES deal with the allocation of

Re: Custom date histogram interval

2014-07-08 Thread vineeth mohan
Hello Gabe, the only thing I can think of would be to store the fiscal year date as a separate field while indexing, and then do all manipulation on this date. Thanks Vineeth On Wed, Jul 9, 2014 at 3:10 AM, Gabe Gorelick-Feldman gabegorel...@gmail.com wrote: That solves the
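Storing the fiscal year as a separate field at index time, as suggested here, might be sketched like this. The fiscal calendar is an assumption for illustration: a year that starts on 1 October and is labelled by the calendar year in which it ends. Other organisations use other start months and labelling rules, so `start_month` is a parameter.

```python
from datetime import date


def fiscal_year(d, start_month=10):
    """Label a date with its fiscal year.

    Assumed convention: the fiscal year starts on the first day of
    start_month and is named after the calendar year in which it ends.
    With start_month=1 the fiscal year equals the calendar year.
    """
    if start_month == 1:
        return d.year
    return d.year + 1 if d.month >= start_month else d.year


def with_fiscal_year(doc, d, field="fiscal_year"):
    """Return a copy of doc carrying the precomputed fiscal year (hypothetical field name)."""
    enriched = dict(doc)
    enriched[field] = fiscal_year(d)
    return enriched


print(with_fiscal_year({"title": "example"}, date(2014, 7, 8)))
```

Once the value is indexed as a number, a regular histogram or terms aggregation on that field groups by fiscal year without a custom interval.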

Re: A brief blog on query performance optimisation ???

2014-07-08 Thread vineeth mohan
Nice try spammer. (The spam links are invisible as they have a white font colour.) On Wed, Jul 9, 2014 at 1:20 AM, akun baru patihgajahmad...@gmail.com wrote: This isn't mine, just something I found online that might be of interest to others. There are a bunch of tests run on AWS

Re: Custom date histogram interval

2014-07-08 Thread Gabe Gorelick-Feldman
Thanks, I might do that. Just wanted to make sure there wasn't some easy way to have an interval script. On Tuesday, July 8, 2014 5:53:39 PM UTC-4, vineeth mohan wrote: Hello Gabe, the only thing I can think of would be to store the fiscal year date as a separate field while indexing.

Re: Tutorial on Java interface to ElasticSearch?

2014-07-08 Thread Adrian
On Thu, Jul 03, 2014 at 09:20:05AM -0700, Ivan Brusic wrote: Ivan, Currently the best way to learn the Java API is to view the Elasticsearch search code. Or just sift through the generated Java API Documentation. You can find some at: http://javadoc.kyubu.de/elasticsearch. Best, Adrian --

Re: Elasticsearch Not Working

2014-07-08 Thread shriyansh jain
Hi, Thank you very much Hai. It worked. Thanks, Shriyansh On Tuesday, July 8, 2014 5:58:53 AM UTC-7, Hai S. Ha wrote: Hi, Here is the way to solve it: You have to set the variables in /etc/elasticsearch/elasticsearch.yml: path.data: path/to/data path.work: path/to/work path.logs:

Re: Time range filter

2014-07-08 Thread Tom Miller
https://github.com/elasticsearch/elasticsearch/issues/6785 On 8 July 2014 22:06, vineeth mohan vm.vineethmo...@gmail.com wrote: Hello Tom, please paste the link to the issue. I am seeing more such requests in the forum. Thanks Vineeth On Wed, Jul 9, 2014 at 1:06 AM, Tom

Re: CLA sign required even for tiny typo fix?

2014-07-08 Thread joergpra...@gmail.com
I think Elasticsearch CLA is fair. It helps in extreme scenarios, for example if Elasticsearch had to move to a new home or umbrella and continue development from there, without having to ask each and every contributor for permission. Also, playing devil's advocate, Elasticsearch development

Re: New Errors when upgraded from V1.0 to V1.1.0

2014-07-08 Thread sabdalla80
Yes, it was working fine on my 2 node cluster for a long time before upgrading. As a matter of fact, it still does, it indexes docs regardless of the exceptions being printed out. But, I never had the exceptions before upgrading. It is strange, because I can access the 2 nodes and the cluster

Re: New Errors when upgraded from V1.0 to V1.1.0

2014-07-08 Thread Ivan Brusic
How did you upgrade? Are you using repos or tarballs? It could be that you are missing the Lucene jar files or you have different versions of Lucene. Also, are you using the same version of Java across nodes? Java broke network serialization backward compatibility early in 1.7. Probably not the

Query string query mini-language vs. grammar implementation?

2014-07-08 Thread x0ne
Ever since I discovered the mini-language provided through the query string query (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html), I have had a hard time going back to the difficult process of mapping what someone wants to a proper

Re: Aggregating by hour

2014-07-08 Thread Antonio Augusto Santos
You can use the histogram aggregation with a script, something like document[@timestamp].hour

Any issues using 2 shards for an index?

2014-07-08 Thread Drew Kutcharian
Hi All, We are thinking of using two shards per index + 1 replica to keep the number of shards low for some indices. Are there any gotchas with using 2 shards per index besides that at most we can scale the writes to this index to two machines? Thanks, Drew
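For reference, the layout described in this question (two primary shards plus one replica) corresponds to the index settings sketched below. The helper names are hypothetical; `number_of_shards` and `number_of_replicas` are the standard index settings.

```python
def index_settings(shards=2, replicas=1):
    """Build an index-creation body with the given shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def shard_copies(body):
    """Total shard copies (primaries plus replicas) the index will allocate."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings()
print(body, shard_copies(body))
```

With these values the index allocates four shard copies in total, of which only the two primaries determine how far writes can be spread.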

Re: Any issues using 2 shards for an index?

2014-07-08 Thread Mark Walkom
Writes *and* reads :) You may also end up with some nodes holding more, smaller shards than others, which will mean uneven load. If you have potential for many small indexes, check out routing as an alternative. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email:

Re: Any issues using 2 shards for an index?

2014-07-08 Thread Drew Kutcharian
Thanks Mark. I know we can scale the reads by adding more replicas. Also the issue with nodes containing a lot of shards can be fixed using index shard allocation. I mainly wanted to see if there are any other undocumented gotchas. On Jul 8, 2014, at 7:19 PM, Mark Walkom

Re: Any issues using 2 shards for an index?

2014-07-08 Thread Nikolas Everett
You should be fine. We run about 1600 indexes, most of which are single shard. They are pretty low traffic so it works fine. Yes we know about routing, no it won't help us. 1600 isn't enough to cause a problem. On Jul 8, 2014 10:24 PM, Drew Kutcharian d...@venarc.com wrote: Thanks Mark. I know

Re: Problems with file locks on OpenVC

2014-07-08 Thread Patrick Mi
Hi there, where do I configure this? I put the following line in elasticsearch.yml but still couldn't start the server: index.store.fs.lock: none. We are running version 1.2.1 on CentOS 5 using the Lustre file system and for certain reasons we need to turn off the support for native

Every other query slow

2014-07-08 Thread Jonathan Foy
Hello I'm trying to get a new ES cluster tuned properly to actually put into production, and I'm running into some performance issues. While testing, I noticed that when running the same query multiple times, I had alternating fast (~50 ms), and slow (2-3 s) results. It's the exact same

Re: Any issues using 2 shards for an index?

2014-07-08 Thread Mark Walkom
Yep that is all manageable, but you may cross a point where managing that becomes more hassle than it's worth. Something to keep in mind. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 9 July 2014 12:24, Drew

Indexing attachments (11 000), Parsers keep crashing elasticsearch

2014-07-08 Thread aurelien bax
Hi, I'm trying to index 11 000 documents (PDF, Word...). My configuration: Elasticsearch 1.2.1, elasticsearch-river-jdbc-1.2.1.1-plugin.zip, elasticsearch-mapper-attachments/2.0.0 on a Debian server. I'm using elasticSearch-php. I don't think that posting my code is useful. I'm obliged to make

Re: ELASTIC SEARCH - GROUP BY QUERY ON ARRAY

2014-07-08 Thread K.Samanth Kumar Reddy
Thank you very much. It's working. Thanks, Samanth On Tuesday, July 8, 2014 4:23:18 PM UTC+5:30, K.Samanth Kumar Reddy wrote: Hi, I have been working with Elasticsearch for the last 2 months. It provides awesome search capabilities, a good JSON document structure, etc. Currently I am

Queries on elasticsearch index template

2014-07-08 Thread Chetana
I am using index template to store metadata of my application. There are many templates (including nested) created. 1. Is template a good idea for application metadata (not all are search related) storage? 2. Does ES store template details in each node or like index data is it possible to

Re: Indexing attachments (11 000), Parsers keep crashing elasticsearch

2014-07-08 Thread David Pilato
Could you gist the full logs? Do you have some big attachments? Could you copy some failing attachments to bintray or any other service and paste the link here? -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 9 Jul 2014, at 05:42, aurelien bax picol...@gmail.com wrote:

Re: Indexing attachments (11 000), Parsers keep crashing elasticsearch

2014-07-08 Thread aurelien bax
Hi, here is the full log (today): Log https://gist.github.com/anonymous/ef0cbf956714cf9b138f This log also contains other kinds of errors I made, like typos in curl commands, which are not relevant to the indexing problem. Most files are smaller than 2 MB. I had a problem with an 80 MB .rtf file but the file was

Re: Any issues using 2 shards for an index?

2014-07-08 Thread Drew Kutcharian
Yes, but for our usecase we need to use parent/child queries which are pretty much unfeasible to do any other way due to their limitations (can't do parent/child using multiple indices). - Drew On Jul 8, 2014, at 8:17 PM, Mark Walkom ma...@campaignmonitor.com wrote: Yep that is all