I haven't heard of a limit to the number of indexes; obviously, the more you
have, the larger the cluster state that needs to be maintained.
You might want to look into routing (
http://exploringelasticsearch.com/advanced_techniques.html or
Number of lines where? You can always show a Count facet that will count
the number of results of a query.
--
Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer Consultant
Author of RavenDB in Action http://manning.com/synhershko/
On Wed, Jun
In the above example if there are documents with terms :
test,tester,testing,tests
and we are querying for test and max_expansions : 2, should it return
only first 2 matching docs?
I see that it is returning all the matching docs. Could you please explain?
Here I'm looking for the number of distinct string values for a certain
field. Say for instance that the log contains the following records:
{ ... user_id: joe ...}
{ ... user_id: mike ...}
{ ... user_id: joe ...}
{ ... user_id: sarah ...}
{ ... user_id: sarah ...}
I'd like to be able to display
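To make the target concrete, here is a minimal sketch in plain Python (not an Elasticsearch query) that computes the number of distinct `user_id` values for the five sample records above, plus a per-value tally. The `records` list only restates the sample; the helper names `distinct_count` and `per_value` are illustrative assumptions.

```python
from collections import Counter

# The five sample log records from the message, reduced to the field of interest.
records = [
    {"user_id": "joe"},
    {"user_id": "mike"},
    {"user_id": "joe"},
    {"user_id": "sarah"},
    {"user_id": "sarah"},
]


def distinct_count(docs, field):
    """Number of distinct values of `field` across the documents."""
    return len({doc[field] for doc in docs if field in doc})


def per_value(docs, field):
    """How many documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


print(distinct_count(records, "user_id"))
print(dict(per_value(records, "user_id")))
```

On the Elasticsearch side, a terms aggregation gives the per-value tally, and the cardinality aggregation (available from 1.1) gives an approximate distinct count.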
Hi,
I wonder whether it is possible to make the date histogram aggregation
DST-aware. From what I understand of the date histogram algorithm, it's
something like:
date + offset - (date + offset) % interval
Maybe a scripted term aggregation would be a better solution if the date
datas
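The rounding formula quoted above can be sketched in Python as follows. This only illustrates the arithmetic as described in the message and is not taken from the Elasticsearch source; `bucket_start` and its millisecond parameters are assumed names. It shows why a single fixed offset cannot follow DST: an offset that is right in winter puts a summer timestamp near midnight into the wrong day.

```python
from datetime import datetime, timezone

HOUR_MS = 3_600_000
DAY_MS = 24 * HOUR_MS


def bucket_start(date_ms, offset_ms, interval_ms):
    """date + offset - (date + offset) % interval, as quoted in the message."""
    shifted = date_ms + offset_ms
    return shifted - shifted % interval_ms


# 2014-06-04 22:30 UTC is 2014-06-05 00:30 local time in a UTC+2 (summer) zone.
ts = int(datetime(2014, 6, 4, 22, 30, tzinfo=timezone.utc).timestamp()) * 1000

summer = bucket_start(ts, 2 * HOUR_MS, DAY_MS)  # correct summer offset
winter = bucket_start(ts, 1 * HOUR_MS, DAY_MS)  # fixed winter offset, reused

# The two offsets put the same instant into different daily buckets.
print((summer - winter) // DAY_MS)
```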
Hello All,
I've been facing a problem with the geo_distance facet for a few hours.
Error is: ElasticsearchParseException[field must be either 'lat', 'lon' or
'geohash']
I am not sure whether this is a bug or I am making a silly mistake here. Please
guide me in the right direction.
Gist:
My Logstash (1.4.1) config to read Squid log is shown below:
input { file { path => "/var/log/squid3/access.log" } }
filter { grok { match => [ "message", "%{NUMBER:timestamp} \s+
%{NUMBER:request_msec:float} %{IPORHOST:src_ip}
Hi.
After upgrading Marvel to 1.2.0 (running on Elasticsearch 1.2.1) I'm
getting errors like
[2014-06-05 10:47:25,346][INFO ][node ] [es-m-3]
version[1.2.1], pid[68924], build[6c95b75/2014-06-03T15:02:52Z]
[2014-06-05 10:47:25,347][INFO ][node ] [es-m-3]
One more hint, you see
org.elasticsearch.common.lucene.search.function.FieldValueFunction
This implements the ScoreFunction and fetches boost values from a
configured field in the doc, for use by the Java API for FunctionScoreQuery.
If you can write a custom ScoreFunction, you could implement
A suggestion for the path model:
- index also the path depth, and name the fields with the depth level
- execute a nested aggregation query over the path depth levels
Example doc with path info:
{
path0 : promo/A,
path1 : sale/B
...
}
In this doc you know the user went from promo/A to
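The suggestion above can be sketched in Python as follows. It builds the depth-numbered fields from an ordered list of visited paths and the matching nested terms aggregation body. The helper names `path_doc` and `nested_path_aggs` and the aggregation names `level0`, `level1`, ... are assumptions made for illustration; only the `path0`/`path1` field naming comes from the message.

```python
def path_doc(visited):
    """Map an ordered list of visited paths to depth-numbered fields."""
    return {f"path{depth}": path for depth, path in enumerate(visited)}


def nested_path_aggs(levels):
    """Terms aggregation on path0 with one nested sub-aggregation per deeper level."""
    body = None
    # Build from the deepest level outwards so each level wraps the next one.
    for depth in reversed(range(levels)):
        agg = {"terms": {"field": f"path{depth}"}}
        if body is not None:
            agg["aggs"] = body
        body = {f"level{depth}": agg}
    return {"aggs": body} if body else {}


print(path_doc(["promo/A", "sale/B"]))
print(nested_path_aggs(2))
```

The resulting dictionary can be serialized to JSON and sent as the body of a search request.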
Hi
I have a discussion board scenario where people create discussion
threads called posts. Others can comment on them; these are called comments. A comment is
the same as a post except that it has a parentId stored in it. In other words, my
database schema for the post table is
Post
PostId, PostSubjectId, PostTitle,
When a field contains an object, in a terms aggregation I can specify a
specific object property that contains the terms I want to use eg
{
terms: {
field: fieldName.propertyContainingTerms
}
}
So with an array-type field that contains a list of strings [first,
second, third] I
Hi,
I'm using the code below to get the average value of cpu_usage using an
aggregation. When I check the individual cpu values and
calculate the average myself, it does not match the average returned by the aggregation. I'm
using a boolQuery along with a rangeFilter here to get the data.
Please help to
Unfortunately, that version of the sqlite driver does not work on OSX:
java.lang.NoClassDefFoundError: org/sqlite/NativeDB
See:
https://bitbucket.org/xerial/sqlite-jdbc/issue/127
On Thursday, 24 April 2014 07:59:11 UTC+1, Jörg Prante wrote:
You must use a JDBC4 driver (jdbc sqlite
Ahh, I just realised that if I solve this, I just bump into the next
problem regarding the readonly flag:
https://github.com/jprante/elasticsearch-river-jdbc/issues/250
Humph :(
On Thursday, 5 June 2014 12:05:24 UTC+1, Matt Burns wrote:
Unfortunately, that version of the sqlite driver does
Please help us. We are trying to build a few reports using your tool through
an ASP.NET web application. We don't know what the process is. Please
help us and provide a few sample applications for building reports through an ASP.NET
web application.
I'm not sure how to handle errors when using the Java client. How do I
programmatically know if my connection was successful, or if indexing of a
document succeeded?
In REST we have the HTTP result code, but in Java I did not see a
documented way to catch checked exceptions or anything like that.
Hi,
I'm seeing weird behaviours with ids on elasticsearch 1.2.0 (recently
upgraded from 1.0.1).
A search retrieves my document, showing the correct value for _id:
[terminal] curl 'myServer:9200/global/_search?q=someField:something
Hi,
This is very likely because of
https://github.com/elasticsearch/elasticsearch/pull/6393
See http://www.elasticsearch.org/blog/elasticsearch-1-2-1-released/ for
more information. We are currently working on a tool that would help
relocate documents to the right shard.
On Thu, Jun 5, 2014 at
Hi Jörg. Thanks for your reply again.
As I said, I already had used ids filter, but I got the same behaviour.
I realized what was wrong. Maybe it could be a bug in ES or not. When I
executed the filter I included the from and size attributes. In this case
size was 99, but the final result
Do you use TransportClient or NodeClient?
With NodeClient, you are tied to the cluster, as the node is a part of
it. With TransportClient, you can count the connected nodes.
The discovery mechanism behind the scenes sends ping actions every few
seconds for you. If an action fails, you will see
The templates from localhost:9200/_template are not updated with the
configured one, even when I create an index.
I am not sure: is this a bug?
Steps to reproduce:
1. In a fresh Elasticsearch 1.2.1 installation, create the file
config/templates/template_1.json as in this example
Hey Mark,
What are you calling a lot of resources? And how do you go about detecting
it?
Currently I'm using TTLs for rolling old logs out of my cluster. It's pretty
small currently (about 40GB of data), but as it gets bigger I want to know
if it will pose a problem.
Thanks
On Wednesday, June
AFAIK the templates that live on the filesystem are not put on _template.
Also, you can update the template on the FS without restarting ES and it
will get the new info there.
On Thursday, June 5, 2014 9:45:53 AM UTC-3, Bernhard Berger wrote:
The templates from localhost:9200/_template get
I haven't changed my merge settings. How often should segments be created
and how often should merges happen naturally?
On Jun 4, 2014 4:58 PM, Ivan Brusic i...@brusic.com wrote:
Lucene will hold onto deleted documents until a merge is performed. An
update in Lucene is basically an atomic
The default merge policy in Lucene (TieredMergePolicy) has a bias towards
segments with more deletes, so it is trying to merge those ones away.
You can increase this bias by setting index.reclaim_deletes_weight (see
Thanks, that was an unexpected behaviour for me. I will avoid filesystem
templates in the future and directly PUT templates in my application to
Elasticsearch.
On 05.06.2014 at 15:05, Antonio Augusto Santos wrote:
AFAIK the templates that live on the filesystem are not put on
_template.
Also,
Thanks. The code I'm developing will support both Node and Transport
clients. The selection will be configuration driven.
There must be a way to determine if a CRUD operation succeeded. For
example, see the following code taken from the Logstash Ruby client based
plugin. Is there any Java
Thanks for the feedback Mark.
I agree with your thoughts on the testing. We plan on doing some testing,
find our failure point, and dial that back to some value that allows us to
still run the migration. This way, we can get ahead of the problem. Since
a re-index would actually introduce more
Check the Elasticsearch test code. There, you can see how the Java API works.
For example
GetIndexTemplatesResponse response =
client().admin().indices().prepareGetTemplates().get();
You can get an empty response if the template does not exist, or the execution
throws an exception, when something went
I thought I replied to this yesterday. Anyway, it was with Kibana. Thank
you for that.
On Wednesday, June 4, 2014 9:29:18 AM UTC-7, Antonio Augusto Santos wrote:
Hey There,
Did you remember to change the Timestamping on Kibana so that it would
know you are using an hourly index ? Go the
After reading this
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
excellent document on managing relations in Elasticsearch, I have decided
that 'nested queries' are the best solution for our particular query
needs. Of the list of negatives for nested queries
The knapsack plugin does not come with a downtime. You can increase shards
on the fly by copying an index over to another index (even on another
cluster). The index should be write disabled during copy though.
Increasing replica level is a very simple command, no index copy required.
It seems
I've recently started using and enjoying ES. In particular, I'm keen to
exploit the new aggregations feature to report on system metrics data that
is currently being fed into ES indexes.
I'm experimenting with aggregations that fold up things like request rates
per machine or API calls (per
Hey Jörg,
Thank you for your response. A few questions/points.
In our use cases, the inability to write or read is considered a downtime.
Therefore, I cannot disable writes during expansion. Your alias points
raise some interesting research I need to do, and I have a few follow up
questions.
Try as I might (and I have read all the stuff I can find on ES' website
about this), I understand somewhat how the integration works but not the
actual nuts and bolts of it.
For example:
Is Hadoop just storing the files that would normally be stored in the local
filesystem for the ES indexes or
Hey folks,
I kindly ask for a hint to achieve the following thing:
The goal is to deliver only a json array of source objects to the client.
The php app that sits on the other side uses JMS\Serializer to deserialize
the response into entities. At the moment the app needs to take an overhead
If you are only modifying the REST API calls and not the Java API, such a
plugin should be easy. You are not creating a new type of action, merely
using the current search one, but changing the output format.
Here are two tutorials on simple REST plugins:
Thanks for raising the questions, I will come back later in more detail.
Just a quick note: the idea that shards scale writes and replicas scale
reads is correct, but Elasticsearch is also elastic, which means it
scales out by adding node hardware. The shard/replica scale pattern
finds its limits
So, if I understood your approach in the right way ... I should build a new
Rest Action like _search_and_return_source that proxies the original
_search one?
I've already read those two articles and I've set up my development
environment with the help of those ;)
On Thursday, 5 June 2014
Sorry for the noob question, but is there some setting I am missing? It's
not clear to me why I'm not getting a key_as_string field in my results.
I'm running v1.1.0, here is my search:
GET /_all/_search
{
aggs: {
totalsByHour: {
date_histogram: {
field: sessionStartTime,
I have 3 Elasticsearch servers set up on a CentOS KVM host. I'd like to lock
these servers down with iptables, but when I do this it kills the cluster
(even with the proper ports open). So I thought I'd have two servers
behind the KVM NAT interface, and the primary server with two NICs. One NIC
Hi
I am writing a client on the Node.js platform, and I am sending multiple (around
300) HTTP bulk requests one after another; each request has around 300
index actions
for the same index/type. The scenario is that a user can upload files
(containing the list of items) to my Node.js server to get
Just a quick question, do you just want to extract a field from the json
source?
There are field filters and parameters for shaping such a JSON result,
maybe they can already help?
Or can you give an example of the problem?
Jörg
On Thu, Jun 5, 2014 at 7:45 PM, Mario Mueller ma...@xenji.com
Hey Joerg,
I just need the whole content of the _source field like so:
[
{
HotelName: Plaka,
ProductCode: 7050,
objectId: 437-de,
GroupId: 25223,
readonly: false,
lang: de,
City: Athens
},
{
HotelName: Hyatt at Fisherman's Wharf,
Hey Jorg,
Thanks for the reply. We're using Cassandra heavily in production, I'm
very familiar with the scale-out concepts. What we've seen in all our
distributed systems is that at some point, you reach a saturation of your
capacity for a single node. In the case of ES, to me that would
There is no way to eliminate returning the search metadata. It has been
requested often.
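Since the metadata cannot be dropped on the server side, one workaround is to strip it in the client. The sketch below is in Python rather than the PHP used by the original poster; it relies only on the standard search response layout (`hits.hits[]._source`). The helper name `sources_only`, the index name `hotels`, and the trimmed sample response are assumptions for illustration.

```python
import json


def sources_only(search_response):
    """Return just the _source objects of a search response, in hit order."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# A trimmed, hypothetical response using field names from the thread.
sample = {
    "took": 3,
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "hotels",
                "_id": "437-de",
                "_source": {"HotelName": "Plaka", "City": "Athens"},
            }
        ],
    },
}

print(json.dumps(sources_only(sample)))
```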
--
Ivan
On Thu, Jun 5, 2014 at 12:40 PM, Mario Mueller ma...@xenji.com wrote:
Hey Joerg,
I just need the whole content of the _source field like so:
[
{
HotelName: Plaka,
I just looked it up and it should be as easy as creating your
own RestResponseListener that takes a SearchResponse and creates a
simplified version with no metadata.
Should be an interesting quick plugin, but it looks like Jorg is going to
beat me to it (I'm still at work for several more hours).
OK, I think I made it. Good exercise to wrestle with Github before going to
sleep...
https://github.com/jprante/elasticsearch-arrayformat
Best,
Jörg
On Thu, Jun 5, 2014 at 10:28 PM, Ivan Brusic i...@brusic.com wrote:
I just looked it up and it should be as easy as creating your
own
I have this query with some nested aggregations
{
  aggs: {
    by_date: {
      date_histogram: {
        field: timestamp,
        interval: day
      },
      aggs: {
        new_users: {
          filter: {
            query: {
Hello,
So by writing a plugin you can create a custom aggregation.[1]
I'd like to explore what we could do with that.
Why? I'm looking for ways round a costly scan-and-update-each-document
algorithm.
Do Aggregators run in a parallel fashion, with your aggregation being run
against all shards
I see that we agree that a new RestResponseListener is the way to go.
I have not cloned your project yet, only looked at the code on github, but
I noticed that you provided your own parseSearchRequest, but still
call RestSearchAction.parseSearchRequest from inside handleRequest. Did I
Yes, routing is very powerful. The general use case is to introduce a
mapping to a large number of shards so you can store parts of data all at
the same shard which is good for locality concepts. For example, combined
with index alias working on filter terms, you can create one big concrete
index,
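The locality effect of routing comes from how a shard is picked: the routing value is hashed and taken modulo the number of primary shards, so every document with the same routing value lands on the same shard. A minimal sketch of that idea follows; `zlib.crc32` is only a stand-in hash chosen for illustration and is not the hash function Elasticsearch uses, and `shard_for` is a hypothetical helper name.

```python
import zlib
from collections import defaultdict


def shard_for(routing, num_primary_shards):
    """Pick a shard as hash(routing) % number_of_primary_shards (stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Documents that share a routing value always end up grouped on one shard.
docs = [("user-1", "doc-a"), ("user-2", "doc-b"), ("user-1", "doc-c")]
placement = defaultdict(list)
for routing, doc_id in docs:
    placement[shard_for(routing, 10)].append(doc_id)

print(dict(placement))
```

This is also why the number of primary shards cannot be changed after index creation without reindexing: the modulo result would change for existing documents.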
To clarify, these questions are coming from my desire to dynamically
produce real time aggregated information from a stream, which in this
case is metric data we're feeding to ES. I'm concerned about unnecessary
re-execution of aggregations on (potentially large) data sets that could be
Hello,
At first, I was using the language analyzer, and everything
seemed to work very well, until I realized that a is not part of the list
of stopwords in French.
So I decided to test with snowball. It also seemed to work well, but in
this case it does remove short words like l' ,
Think of es-hadoop as a connector between Hadoop and Elasticsearch. You
would use it to index data in Hadoop to ES or run queries in ES directly
from Hadoop.
Where does ES store the data? That depends on its configuration (completely
separate from es-hadoop itself). In general (and the default) is
Oops, yes, a mistake... I bluntly copy/pasted the RestSearchAction. Thanks!
Jörg
On Fri, Jun 6, 2014 at 12:03 AM, Ivan Brusic i...@brusic.com wrote:
I see that we agree that a new RestResponseListener is the way to go.
I have not cloned your project yet, only looked at the code on github,
ES runs on an all-or-nothing principle when it comes to networking.
You cannot split cluster and API interfaces.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 6 June 2014 04:17, avery.ro...@insecure-it.com wrote:
This would probably be worth raising as a github issue -
https://github.com/elasticsearch/
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 5 June 2014 22:38, Marcelo Paes Rech marcelopaesr...@gmail.com wrote:
Hi
It depends on a few factors: document size, index size, etc.
If you are using ES for logging data, then best practice is to use
timestamped indexes and then just drop old ones as needed using curator.
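As a rough illustration of the timestamped-index approach, the sketch below picks out the daily indexes that have fallen outside a retention window. It assumes the `logstash-YYYY.MM.DD` naming that Logstash uses by default; `expired_indices` is a hypothetical helper that only computes names, and the actual deletion would still be done by curator or the delete index API.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logstash-"):
    """Names of daily indexes whose date is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our timestamped indexes
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "logstash-2014.06.01",
    "logstash-2014.06.05",
    "logstash-2014.05.20",
    "marvel-2014.05.01",
]
print(expired_indices(names, date(2014, 6, 6), keep_days=3))
```

Dropping a whole index is much cheaper than deleting individual documents from it, which is the main reason this pattern is preferred over per-document TTLs for log data.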
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email:
I'll try to answer some of the questions, though I must admit I am not too
familiar with the aggregation source code yet (still exploring).
Aggregations work like a search, they are embedded into the search
actions, and work over the result set of a search. They run in each shard,
just like the
This may or may not help, but the following worked well for me. Just as with any
database-backed application, the business logic (such as what you
described) is best implemented outside of the database. Since ES is a
first-class Java citizen and its Java API is clean and superb
(documentation
I have a cluster of two nodes and have the following configs for shards and
replicas:
index.number_of_shards: 10
index.number_of_replicas: 1
But whether I index around 10k documents or just one, I find that there are
always 4 replica shards that are not allocated.
Is there a method to allocate all
Yeah, I've got this already, thanks.
I'm still confused why filtered query is returning all results even without
match_all in filtered query.
On Thursday, 5 June 2014 at 6:21:03 UTC+7, Ivan Brusic wrote:
There is no label, but the change was made last December:
Because it's difficult to recognize which shards are replicas (I haven't
installed the head plugin), I removed all of the index data and tried to
reindex the data, but got the same result: there were still some
replica shards that were not allocated.
I want to know why there are some replicas not to be
Hi guys,
Relative newcomer to the elasticsearch phenomenon here. I'm trying to
rationalize a very basic problem with my service. I'm running Jetty with
100 or so threads (standard RESTful Service with Spring MVC) and one
instance of the ES client in the JVM which seems to have around 14 or
I have a cluster of two nodes, and set the configs for shard number and
replica number as follows:
index.number_of_shards: 10
index.number_of_replicas: 1
The master node is elected automatically.
Before I index data, the state of the cluster is green. After I index data,
the state of the
I added another node to the cluster, and now after I index data, the state
of the cluster becomes green.
If the replica number is 1, must I have at least 3 nodes to ensure that the
state of the cluster is green?
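For reference, the node-count arithmetic behind this question can be sketched as follows. It relies on one rule only: a replica is never allocated on the same node as another copy of the same shard, so each shard needs number_of_replicas + 1 distinct data nodes. The sketch assumes no other allocation constraints (disk thresholds, allocation filtering, and so on) are in play, and the function names are illustrative.

```python
def min_data_nodes(number_of_replicas):
    """Fewest data nodes on which every copy of a shard can be placed."""
    return number_of_replicas + 1


def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus replicas for one index."""
    return number_of_shards * (1 + number_of_replicas)


def unassigned_copies(number_of_shards, number_of_replicas, data_nodes):
    """Copies left unassigned purely because there are too few nodes."""
    placeable_per_shard = min(1 + number_of_replicas, data_nodes)
    return number_of_shards * (1 + number_of_replicas - placeable_per_shard)


# Settings from the thread: 10 shards, 1 replica.
print(min_data_nodes(1))
print(total_shard_copies(10, 1))
print(unassigned_copies(10, 1, data_nodes=2))
```

By this rule alone, two data nodes are enough for one replica, so unassigned replicas on a two-node cluster would point to some other constraint rather than to the node count.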
On Fri, Jun 6, 2014 at 9:46 AM, flyer flyer...@gmail.com wrote:
Because it's
Hi All,
Is it possible to use the terms filter lookup mechanism
so that changes to the lookup document are used in real time?
For example, I want to filter out already seen documents, and I have
a lookup document containing the already seen document IDs, which is updated
as the tracking system
Hi,
I'm using the code below to get a singleton TransportClient
object. I'm using getInstance() to get the client object, which is
already alive in the web application.
public static Client getInstance()
{
if (instance == null)
{
logger.debug(the client instance is null, creating