I'm setting up an Elasticsearch-based log cluster and I'm having some
doubts about how I should choose the number of indices and shards.
By default, Logstash and Kibana use per-day indices and Elasticsearch
defaults to five shards per index. I'm worried that this will create
an excessive number of
Short - Stop worrying!
Long - As you mentioned this is very dependent on your node specs, but
ideally you want one shard per node. However, you can over-allocate and not
run into problems, plus it allows easier balancing when you add more nodes
to the cluster.
Using daily indices is better as you can
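If you do want to pin the shard count per index rather than rely on the default of five, it can be set at index-creation time, e.g. with curl -XPUT 'http://localhost:9200/logstash-2014.07.09' (index name and numbers here are only illustrative) and a body like:

```json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```

For Logstash-created indices, the same settings can go into an index template so every daily index picks them up automatically.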
Hello..I am trying to add custom filters similar to the existing time
filter on top right corner of the Kibana dashboard. These filters make it
easy to use by the end users of the dashboard. For example, gender field
has Male, Female values. Once I click Male option in dropdown, the data
gets
I was sort of expecting the following to give me an aggregation which
groups the results only by hour:
curl http://localhost:9000/stream/_search -d '{
  "aggs": {
    "visitor_count": {
      "date_histogram": { "field": "created_at", "interval": "hour" }
    }
  }
}'
As it stands, it does group by
Hi Kimchy,
I reran the benchmark using ES 1.3 with default settings (just disabling
_source and _all) and it makes great progress on performance. However,
Solr still outperforms ES 1.3:
Number of different metadata fields | ES | ES with disabled _all/codec bloom filter | ES 1.3 | Solr
Hi,
I have been working on Elasticsearch for the last 2 months. It really provides
awesome searching capabilities, good JSON-structured documents, etc.
Currently I am stuck on the problem of how to write a group-by query and
get the data.
Ex: In this example, company and prod_type are defined as
Hi,
At Sematext http://sematext.com/ we have 2 interesting openings.
1) We are looking for an engineer who knows Elasticsearch (or Solr or both)
and wants to use these technologies to implement search and analytics
solutions for both Sematext's own products such as SPM
Has anybody solved the problem of running elasticsearch-river-mongodb on
Elasticsearch 1.2.1? I was not able to run elasticsearch-river-mongodb-master
nor elasticsearch-river-mongodb-1.2.0. I need detailed instructions =)
--
You received this message because you are subscribed to the
Hi,
http://www.infoworld.com/d/open-source-software/red-hat-joyent-and-others-break-down-licensing-barriers-244727
Maybe some companies are showing what the future trend will be? :-)
(yes, I am biased, I work for one of the companies listed in the blog post)
Regards,
Lukas
I'd support this for what it's worth :)
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 8 July 2014 21:27, Lukáš Vlček lukas.vl...@gmail.com wrote:
Hi,
I have exactly the same problem as Shriyansh Jain. Does anyone know what
happened?
Thanks a lot for helping.
On Tuesday, July 8, 2014 2:03:44 AM UTC+2, shriyansh jain wrote:
When I verify the Elasticsearch status, it gives me the following
error message:
elasticsearch dead but
Hi,
Here is the way to solve it:
You have to set the variables in /etc/elasticsearch/elasticsearch.yml:
path.data: path/to/data
path.work: path/to/work
path.logs: /var/log/elasticsearch
path.conf: /etc/elasticsearch
And remember to give the user elasticsearch access to the folders that
elasticsearch
I want to know how many types it is possible to create in ES. I want to
create 10 million types in each index. Can Elasticsearch manage an index
with 10 million types?
Sorry for my English skills.
Somebody please help me and teach me.
--
View this message in
Hi,
Sure. Thanks a lot for the helpful pointers. I will take a look at the
classes and create a plugin. If there are any gotchas or certain ways of
doing things in this plugin, please tell me so that I can take note.
It seems that the plugin would be small with just the Parser/Builder and a
Hi Samanth,
First you will need to map that array as a nested type:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html#mapping-nested-type
Then you need to do a 2-level aggregation, with a terms aggregation at the
parent level on the field prod_type and a sum aggregation on
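Once the array is mapped as nested, the two-level aggregation could look like this (a sketch only; the nested path products and the numeric field amount are assumptions, so substitute your own field names). POST it to /your_index/_search:

```json
{
  "size": 0,
  "aggs": {
    "products": {
      "nested": { "path": "products" },
      "aggs": {
        "by_prod_type": {
          "terms": { "field": "products.prod_type" },
          "aggs": {
            "total": { "sum": { "field": "products.amount" } }
          }
        }
      }
    }
  }
}
```

Each prod_type bucket then carries its own sum, which is the group-by behaviour asked about upthread.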
I'm fairly new to Elasticsearch and I'm looking for suggestions on the best
pattern to execute something similar to what I've done with other systems.
I have a set of fairly complex queries (for about 10 categories) based on a
slightly modified version of the Lucene query language. For each new
I'm fairly new to ES, and wanted to get some guidance about implementing
something similar to what I've done with other systems. I have a set of
queries I use for classifying documents written in a modified version of
the Lucene query syntax. I would like to tag each new document coming into
Yes, this is the equivalent of using RAMDirectory. Please don't use this.
Mmap is optimized for random access, and if the Lucene index can fit in heap
(to use a RAM dir), it can certainly fit in OS RAM, without the implications
of loading it into heap.
On Monday, July 7, 2014 6:26:07 PM UTC+2,
Hi, thanks for running the tests! My tests were capped at 10k fields, and we
improve for that; any more than that, I, and anybody here at Elasticsearch
(plus Lucene: Mike/Robert), simply don't recommend and can't really stand
behind when it comes to supporting it.
In Elasticsearch, there is a conscious
Hi all,
I am trying to write data to elastic search from hive and whenever I try I
get this error:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No resource
['es.resource'] (index/query/location) specified
The script I am running looks like this:
USE pl_10;
ADD jar
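That error means the es.resource setting never reached the job. With es-hadoop's Hive integration, it is normally supplied in TBLPROPERTIES on the external table definition; a sketch (the table and index names here are made up, not from the original script):

```sql
-- External Hive table backed by an Elasticsearch index.
-- 'es.resource' names the write target as index/type.
CREATE EXTERNAL TABLE artists (
    id   BIGINT,
    name STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'pl_10/artists');
```

If the table definition in the failing script lacks that property (or the es-hadoop jar isn't on the classpath when the INSERT runs), this exception is the result.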
Elasticsearch as a company is relatively new, so I hope it will adjust its
practices, not just the CLA, as time goes on. The codebase has been
evolving so rapidly that I would assume they are working on the code and
the revenue stream and not on its licensing model.
--
Ivan
On Tue, Jul 8, 2014
Hi All,
Been a real joy working with es, and today I'm trying to see if I can stump
es with a tricky config.
Say I have 2 data nodes that contain a collection of indices that were set
up with number_of_shards=1 and number_of_replicas=1.
Say I want to add a data node that only contains
Hi,
I need to index a mix of documents, some of which need to be indexed using
geo_point with a location field, but there are some other documents which
don't contain a location field. Whenever I do indexing, I keep getting a
mapper parsing exception with location={} during indexing and
In terms of the parsing exception, can you simply index the document without
the field entirely?
As far as sorting goes, it makes sense to push the location-less documents
to the top or bottom. You lost me on the part regarding the rescorer. Do
you need the location-less documents to be returned in
I am running a 4 node cluster running in EC2 and for the past few days, I
have noticed that some nodes occasionally timeout on a request resulting in
the following:
ConnectionError(HTTPConnectionPool(host='HOST', port=9200): Read timed out.
(read timeout=10)) caused by:
Hi Vineeth,
Thank you very much. I will try and let you know.
Thanks,
Samanth
On Tuesday, July 8, 2014 4:23:18 PM UTC+5:30, K.Samanth Kumar Reddy wrote:
Hi,
I have been working on Elasticsearch for the last 2 months. It really provides
awesome searching capabilities, good JSON structure
Never mind. Anyone please correct me if I'm wrong, but after some thought I
think I've convinced myself that there is no need for such a setup. I think
it's ok for the 3rd node to get primary shards. Still have to test this,
but if I were to have auto_expand_replicas set up on all the indices
Ivan,
I think it is mostly about lowering the barrier for contributors regarding
small updates (say you found a missing comma in a guide example and want
to fix it) and saving resources on the company side with CLA maintenance.
As a company you can always explicitly ask for a CLA sign in
Yes, I can index the documents which contain the location field, but not
those documents which don't. It gives a parsing exception in that case and
then stops importing documents. Is there any way I can tell ES to index
location if it's present and otherwise skip it
Is there any way to use a custom interval with date histograms (either
facets or aggregations)? For example, something like
{
  "date_histogram": {
    "field": "date",
    "interval": "fiscalYear"
  }
}
Obviously, you can always use a regular histogram, but then the client
would
Yes, I got it. You are right, Ivan. I think I should omit the field
altogether, because that way it won't find that field and will not try to
index it. I think that should work. I'll try it and will let you know if
that works.
But how I can make use of that location field is also very
This looks great. However, I am not sure if I am missing something. When I
take a snapshot with curl, it works fine:
curl -XPUT
http://localhost:9200/_snapshot/es_repository/snapshot_1?wait_for_completion=true
However with curator, it completes but no snapshots are actually
I think you want something like a histogram with a value script to decide
the bucket. But it looks like histogram doesn't support that, so would a
range agg work? Otherwise, it might be easiest to store the hour in
addition to the timestamp.
On Tuesday, July 8, 2014 4:06:02 AM UTC-4, Jenny
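The "store the hour in addition to the timestamp" option could be sketched as a small enrichment step before indexing (the field names created_at and hour_of_day are assumptions, not from the original thread):

```python
from datetime import datetime

def add_hour_field(doc, ts_field="created_at"):
    """Derive an hour-of-day field from an ISO-8601 UTC timestamp so a
    plain terms or histogram aggregation can bucket on it directly."""
    ts = datetime.strptime(doc[ts_field], "%Y-%m-%dT%H:%M:%SZ")
    doc["hour_of_day"] = ts.hour  # 0..23
    return doc

doc = add_hour_field({"created_at": "2014-07-08T18:26:07Z"})
print(doc["hour_of_day"])  # 18
```

With the hour stored as its own numeric field, a regular histogram with interval 1 on hour_of_day groups results by hour regardless of date.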
*bump*
Anyone?
On Monday, July 7, 2014 5:15:06 PM UTC-6, Ryan Tanner wrote:
I'm having trouble upgrading an existing field to a multi-field. I've
done this before with no issues on other fields.
I think the issue here is that the original mapping specifically defines
an analyzer:
Hi, I'm using Elasticsearch 1.1.1, and after some issues with an instance I
had to recreate instances and rebalance.
The issue I'm seeing is that it never passes the translog stage, and I don't
know why. No errors, nothing. Any ideas?
index | shard | time | type | stage | source_host
We can't start to differentiate between one contribution or the other,
because then we start a different discussion about where the line goes. It's
simpler to have a consistent message.
Btw, our CLA is explicitly very lightweight, and it aims at protecting the
contributors as well. It's quite
Which version of ES are you using? I believe we fixed a bug around this several
versions ago.
On Jul 8, 2014, at 20:31, Ryan Tanner ryan.tan...@gmail.com wrote:
*bump*
Anyone?
On Monday, July 7, 2014 5:15:06 PM UTC-6, Ryan Tanner wrote:
I'm having trouble upgrading an existing field to a
1.1.1 in production but I tested this with 1.2.1 locally and had the same
problem.
On Tuesday, July 8, 2014 12:53:14 PM UTC-6, kimchy wrote:
Which version of ES are you using? I believe we fixed a bug around this
several versions ago.
On Jul 8, 2014, at 20:31, Ryan Tanner ryan@gmail.com
Aye, makes sense to add a dedicated filter for this; care to open an issue?
On Jul 8, 2014, at 6:06, vineeth mohan vm.vineethmo...@gmail.com wrote:
Hello Tom ,
At this point , i can think of 2 approaches -
Store an additional field with just the time and not the date information. Do
a
This isn't mine, just something I found online that might be of interest to
others:
There are a bunch of tests run on AWS that give some good insight
into sizing and potential choke points when running queries against a
cluster.
Great to see kimchy posting again!
On Tue, Jul 8, 2014 at 11:51 AM, Shay Banon kim...@gmail.com wrote:
We can't start to differentiate between one contribution or the other,
because then we start a different discussion about where the line goes.
It's simpler to have a consistent message.
Shay,
I think it is not fair to say that JBoss is trying to be the beacon here
(my formulation could make it sound like that in my initial post, sorry
about that); it is more about JBoss catching up with the rest of the
company (i.e. Red Hat). See the link below for more details:
Hello Tom ,
Please paste the link to the issue.
I am seeing more such requests in the forum.
Thanks
Vineeth
On Wed, Jul 9, 2014 at 1:06 AM, Tom Miller tom.mil...@ebiz.co.uk wrote:
Thanks guys - I've created a ticket in github. I'll store the time
separately for now as Vineeth
Hello Gabe ,
Please elaborate on what you mean by custom interval.
Thanks
Vineeth
On Tue, Jul 8, 2014 at 11:25 PM, Gabe Gorelick-Feldman
gabegorel...@gmail.com wrote:
Is there any way to use a custom interval with date histograms (either
facets or aggregations)? For example,
Side question:
If I try to set lowercase_terms to true, I get a 400 back saying
suggester[term] doesn't support [lowercase_terms], which seems to
contradict the documentation.
"suggest": {
  "text": "my query string",
  "person_name": {
    "term": {
      "field":
Sure. According to the docs [1], the available expressions for
date_histogram interval are year, quarter, month, week, day, hour, minute,
second. But what if you want to roll up by another interval that's not
supported, like decade or millisecond? I was just wondering if there was
maybe a way
Hello Gabe ,
You can specify the interval as 1.5h or 2w and so on:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
Hope that helps your cause.
Thanks
Vineeth
On Wed, Jul 9, 2014 at 2:49 AM, Gabe
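In request form, the fractional interval Vineeth mentions would look like this (the field name date is only an assumption):

```json
{
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "date", "interval": "1.5h" }
    }
  }
}
```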
That solves the decade or millisecond problem, but wouldn't work for
something like MMWR week or fiscal year which are more complex. Here's
the CDC's definition of MMWR week to illustrate my point:
The first day of any MMWR week is Sunday. MMWR week numbering is sequential
beginning with 1
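For fiscal year specifically, one workable pattern (in line with the store-it-as-a-field suggestions elsewhere in this thread) is to compute the bucket label at index time. A minimal sketch, assuming a fiscal year that starts October 1; adjust the start month to your calendar, and note that MMWR weeks would need their own, more involved, function:

```python
from datetime import date

FY_START_MONTH = 10  # assumption: fiscal year starts October 1

def fiscal_year(d):
    """Return the fiscal-year label for a date: FY N covers
    Oct 1 of calendar year N-1 through Sep 30 of calendar year N."""
    return d.year + 1 if d.month >= FY_START_MONTH else d.year

print(fiscal_year(date(2014, 9, 30)))   # 2014
print(fiscal_year(date(2014, 10, 1)))   # 2015
```

With the label stored as its own field, a plain terms aggregation buckets by it with no interval tricks at query time.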
There's not really a need, as you've discovered.
ES is really good at managing distribution of shards, and unless you are
running specific hardware for storage (i.e. tiered storage across different
nodes) or you want rack awareness, it's better to just let ES deal
with the allocation of
Hello Gabe ,
The only thing I can think of would be to store the fiscal year date as a
separate field while indexing.
And then do all manipulation on this date.
Thanks
Vineeth
On Wed, Jul 9, 2014 at 3:10 AM, Gabe Gorelick-Feldman
gabegorel...@gmail.com wrote:
That solves the
Nice try, spammer. (The spam links are invisible, as they have a white font
colour.)
On Wed, Jul 9, 2014 at 1:20 AM, akun baru patihgajahmad...@gmail.com
wrote:
This isn't mine, just something I found online that might be of interest to
others:
There are a bunch of tests run on AWS
Thanks, I might do that. Just wanted to make sure there wasn't some easy
way to have an interval script
On Tuesday, July 8, 2014 5:53:39 PM UTC-4, vineeth mohan wrote:
Hello Gabe ,
The only thing I can think of would be to store the fiscal year date as a
separate field while indexing.
On Thu, Jul 03, 2014 at 09:20:05AM -0700, Ivan Brusic wrote:
Ivan,
Currently the best way to learn the Java API is to view the Elasticsearch
search code.
Or just sift through the generated Java API Documentation. You can find some
at: http://javadoc.kyubu.de/elasticsearch.
Best, Adrian
--
Hi,
Thank you very much Hai. It worked.
Thanks,
Shriyansh
On Tuesday, July 8, 2014 5:58:53 AM UTC-7, Hai S. Ha wrote:
Hi,
Here is the way to solve it:
You have to set the variables in /etc/elasticsearch/elasticsearch.yml:
path.data: path/to/data
path.work: path/to/work
path.logs:
https://github.com/elasticsearch/elasticsearch/issues/6785
On 8 July 2014 22:06, vineeth mohan vm.vineethmo...@gmail.com wrote:
Hello Tom ,
Please paste the link to the issue.
I am seeing more such requests in the forum.
Thanks
Vineeth
On Wed, Jul 9, 2014 at 1:06 AM, Tom
I think the Elasticsearch CLA is fair. It helps in extreme scenarios, for
example if Elasticsearch had to move to a new home or umbrella and
continue development from there, without having to ask each and every
contributor for permission.
Also, playing devil's advocate, Elasticsearch development
Yes, it was working fine on my 2 node cluster for a long time before
upgrading. As a matter of fact, it still does, it indexes docs regardless
of the exceptions being printed out. But, I never had the exceptions before
upgrading. It is strange, because I can access the 2 nodes and the cluster
How did you upgrade? Are you using repos or tarballs? It could be that you
are missing the Lucene jar files or you have different versions of Lucene.
Also, are you using the same version of Java across nodes? Java broke
network serialization backward compatibility early in 1.7. Probably not the
Ever since I discovered the mini-language provided through the query string
query
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html),
I have had a hard time going back to the difficult process of mapping what
someone wants to a proper
You can use the histogram aggregation with a script, something like
doc['@timestamp'].date.hourOfDay
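Spelled out as a request body, this might look like the following (a sketch: the @timestamp field name and the Groovy-style Joda accessor are assumptions, and dynamic scripting must be enabled on the cluster):

```json
{
  "aggs": {
    "by_hour": {
      "histogram": {
        "script": "doc['@timestamp'].date.hourOfDay",
        "interval": 1
      }
    }
  }
}
```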
Hi All,
We are thinking of using two shards per index + 1 replica to keep the number of
shards low for some indices. Are there any gotchas with using 2 shards per
index besides that at most we can scale the writes to this index to two
machines?
Thanks,
Drew
Writes *and* reads :)
You may also end up with some nodes holding more, smaller shards than
others, which will mean uneven load.
If you have potential for many small indexes, check out routing as an
alternative.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email:
Thanks Mark. I know we can scale the reads by adding more replicas. Also the
issue with nodes containing a lot of shards can be fixed using index shard
allocation.
I mainly wanted to see if there are any other undocumented gotchas.
On Jul 8, 2014, at 7:19 PM, Mark Walkom
You should be fine. We run about 1600 indexes, most of which are single
shard. They are pretty low traffic so it works out fine.
Yes, we know about routing; no, it won't help us. 1600 isn't enough to cause
a problem.
On Jul 8, 2014 10:24 PM, Drew Kutcharian d...@venarc.com wrote:
Thanks Mark. I know
Hi there,
Where do I configure this? I put the following line in elasticsearch.yml
but still couldn't start up the server:
index.store.fs.lock: none
We are running version 1.2.1 on Centos 5 using Lustre file system and for
certain reasons we need to turn off the support for native
Hello
I'm trying to get a new ES cluster tuned properly to actually put into
production, and I'm running into some performance issues.
While testing, I noticed that when running the same query multiple times, I
had alternating fast (~50 ms), and slow (2-3 s) results. It's the exact
same
Yep that is all manageable, but you may cross a point where managing that
becomes more hassle than it's worth.
Something to keep in mind.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 9 July 2014 12:24, Drew
Hi,
I'm trying to index 11,000 documents (PDF, Word...).
My conf:
Elasticsearch 1.2.1, elasticsearch-river-jdbc-1.2.1.1-plugin.zip,
elasticsearch-mapper-attachments/2.0.0 on a Debian server.
I'm using elasticsearch-php. I don't think that posting my code is useful.
I'm obliged to make
Thank you very much. It's working.
Thanks,
Samanth
On Tuesday, July 8, 2014 4:23:18 PM UTC+5:30, K.Samanth Kumar Reddy wrote:
Hi,
I have been working on Elasticsearch for the last 2 months. It really provides
awesome searching capabilities, good JSON-structured documents, etc.
Currently I am
I am using index templates to store metadata for my application. There are
many templates (including nested ones) created.
1. Are templates a good idea for storing application metadata (not all of it
search related)?
2. Does ES store template details in each node, or, like index data, is it
possible to
Could you gist the full logs?
Do you have some big attachments?
Could you copy some failing attachments to bintray or any other service and
paste the link here?
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
Le 9 juil. 2014 à 05:42, aurelien bax picol...@gmail.com a écrit :
Hi,
Here is the full log (today) : Log
https://gist.github.com/anonymous/ef0cbf956714cf9b138f
this log contains other kinds of errors I made, like typos in curl; not
relevant for the indexing problem.
Most files are less than 2 MB. I had a problem with an 80 MB .rtf file but the
file was
Yes, but for our use case we need to use parent/child queries, which are
pretty much unfeasible to do any other way due to their limitations (you
can't do parent/child across multiple indices).
- Drew
On Jul 8, 2014, at 8:17 PM, Mark Walkom ma...@campaignmonitor.com wrote:
Yep that is all