We are developing an application which requires cascaded (flow-based)
search, where the search result of one search becomes the input criteria for
the next search.
Is there a way to do this in ES? If not, can you suggest some third-party
library which can provide cascading functionality over ES
Eventually I solved the issue in a pretty ugly way - I added a new add
command to elasticsearch that does what create does but doesn't throw an
exception ... the bad thing about it is that for each new elasticsearch
version I want to use, I will need to merge those changes :/
On Monday, March 31,
Hi guys,
I'd like to count all entries in my ES instance having a timestamp from
the *last day*, and *group together all entries having the same instanceId*.
With the data below, the count result should be 1 (and not 2), since 2
entries are within the last day but they have the same instanceId
I do an *aggregation* search on my index (*6 nodes*). There are about *200
million lines* of data (port scanning). Each line looks the same, like this:
*{ip: 85.18.68.5, banner: cisco-IOS, country: IT, _type: port-80}*
So you can imagine I have these data sorted into different types by the port
they are
Hi,
There is no such built-in functionality in Elasticsearch, and I don't know
of a third-party library that would provide this.
On Wed, Apr 2, 2014 at 8:10 AM, Chetana ambha.car...@gmail.com wrote:
We are developing an application which requires cascaded (flow based)
search where the
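Since there is no built-in support, the cascading can be driven client-side: run the first search, collect a linking field from its hits, and feed those values into the second search as a terms filter. A minimal Python sketch under that assumption - the `search` callable, `userId` field, and query bodies are all illustrative, not from the thread:

```python
# Client-side "cascading" search sketch: the hits of one search become
# the terms criteria of the next. `search` stands in for a real client
# call (e.g. an HTTP POST to _search); 1.x-era "filtered" query syntax.
def cascade(search, first_body, link_field, second_query):
    hits = search(first_body)
    values = [hit["_source"][link_field] for hit in hits]
    second_body = {
        "query": {
            "filtered": {
                "query": second_query,
                "filter": {"terms": {link_field: values}},
            }
        }
    }
    return search(second_body)

# Demo with a stub client: the first search "finds" two user ids, the
# second receives them as a terms filter.
calls = []
def stub_search(body):
    calls.append(body)
    if len(calls) == 1:
        return [{"_source": {"userId": 1}}, {"_source": {"userId": 2}}]
    return []

cascade(stub_search, {"query": {"match_all": {}}}, "userId", {"match_all": {}})
```

The obvious caveat is that the intermediate result set travels through the client, so this only scales while the linking-value list stays small.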
The smaller index has 1 million lines of data. They are the lines filtered
by prefix:{ip:100.1} from the bigger one.
On Wednesday, April 2, 2014 at 4:04:27 PM UTC+8, vir@gmail.com wrote:
I do an *aggregation* search on my index(*6 nodes*). There are about *200
million lines* of data(port scanning). Each line
I wrote a denormalizer plugin where I use a node client from a field
analyzer for a field type deref. A node client is started as a singleton
per node where the plugin is installed. It can ask other indexes/types for
a doc by a given ID for injecting additional terms from an array of terms of
the
I've tried deleting an index by calling '$ curl -XDELETE
'http://localhost:9200/indexName',
but in the actual file system the 'indexName' directory still persists in the
path 'repository/elasticsearch/data/228.5.8.6/nodes/0/indices/'.
The delete API only deletes the '_state' directory under 'indexName'
Given your description of the problem, I think the issue is that your
Elasticsearch cluster doesn't have enough memory to load field data for the
ip field (which needs to be done for all documents, not only those that
match your query). So you either need to give more nodes to your cluster,
more
Thanks for the reply. I have done the test with 1 node (16GB RAM and 8 CPUs,
allocating 8GB to ES), and I have been able to deal with all events with
only 1 node. Now I'm trying to find out where the bottleneck is.
Next step, I'm going to try benchmarking elasticsearch without external
elements in
But I can do aggregation on the 'banner' field on both clusters. Is that
because values of 'banner' are not so unique compared to the 'ip' field
2014-04-02 16:27 GMT+08:00 Adrien Grand adrien.gr...@elasticsearch.com:
Given your description of the problem, I think the issue is that your
Elasticsearch
On Wed, Apr 2, 2014 at 10:52 AM, 张阳 vir.ca...@gmail.com wrote:
But I can do aggregation on 'banner' field on both cluster. Is that
because values of 'banner' are not so unique compared to 'ip' field
Very likely, yes. Memory usage of field data is higher on high-cardinality
fields.
--
Adrien
Hi Vincent,
I left some replies inline:
On Wed, Apr 2, 2014 at 10:02 AM, Vincent Massol vmas...@gmail.com wrote:
Hi guys,
I'd like to count all entries in my ES instance, having a timestamp from
the *last day* and *group together all entries having the same
instanceId*. With the data
A few days ago we found we get that same error when we search for data.
reason: FetchPhaseExecutionException[[site_production][1]:
query[ConstantScore(cache(_type:ademail))],from[0],size[648]: Fetch Failed
[Failed to fetch doc id [9615533]]]; nested: EOFException[seek past EOF:
Hey,
I am designing a solution for indexing using Hadoop.
I'm thinking of using the same logic as Logstash and creating an index per
period of time for my records (10 days or a month), in order to avoid
working with big index sizes (from experience, merging huge fragments in
Lucene makes the whole index slow)
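The period-per-index idea above can be sketched as a small naming helper: bucket each record's date into a fixed-length period and derive the index name from the bucket start. The `records` prefix and 10-day period are illustrative assumptions:

```python
from datetime import date, timedelta

def index_for(day: date, prefix: str = "records", period_days: int = 10) -> str:
    """Bucket a date into a fixed-length period and derive an index name,
    so writes roll over to a new index every `period_days` days."""
    epoch = date(1970, 1, 1)
    bucket = (day - epoch).days // period_days
    start = epoch + timedelta(days=bucket * period_days)
    return f"{prefix}-{start:%Y.%m.%d}"
```

Searches over a time range then hit only the handful of period indices that overlap it, and old periods can be dropped wholesale instead of merged.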
Thanks a lot for your fast response Adrien!
* I noticed the cardinality aggregation but I was worried by the "an
approximate count of distinct values" part of the documentation. I need an
exact value, not an approximate one :) However I've read more of the
documentation and it may not be a real
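The request being discussed could be sketched like this: a range query restricting to the last day plus a cardinality aggregation on instanceId. The `timestamp` field name is an assumption (the thread only names instanceId), and as noted above the cardinality aggregation is approximate:

```python
import json

# Count distinct instanceId values among entries from the last day.
# "timestamp" is an assumed field name; cardinality is approximate.
body = {
    "size": 0,
    "query": {"range": {"timestamp": {"gte": "now-1d"}}},
    "aggs": {
        "distinct_instances": {"cardinality": {"field": "instanceId"}}
    },
}
print(json.dumps(body, indent=2))
```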
Hello,
Many thanks for your answer.
Could you give me a little example of how to add/remove a single child to/from
a parent object, maybe?
I would like to do this with the elasticsearch php module. Is this possible?
Regards Stefan
On Tuesday, April 1, 2014 at 16:45:16 UTC+2, Binh Ly wrote:
This
Hi Binh,
Great. Thanks for that.
On Wed, Apr 2, 2014 at 12:05 AM, Binh Ly binhly...@yahoo.com wrote:
If you specify explain=true in your query, it will tell you in detail how
the score is computed:
{
  "explain": true,
  "query": {}
}
Some useful info:
Hello ElasticSearch Community,
My name is Colton McInroy and I work with DOSarrest Internet
Security LTD. Over the past few months I have been working with
ElasticSearch fairly closely and building an infrastructure for it. When
dealing with lots of indices, managing them can be
Hi Binh,
The same problem again. I have the following queries :
1)
{
  "from": 0,
  "size": 100,
  "explain": true,
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "happy",
          "fields": [ "DISPLAY_NAME^6", "PERFORMER" ]
        }
      },
      "filter": {
        "query": {
Hello Ryan,
I am trying to build the same type of application (device log collecting)
and I'm also very new to logstash and elasticsearch.
I'm having a hard time setting up a lab environment that can sustain the
load (2000 logs/sec, 1024 KB logs), and only 60% of the logs are indexed (I
count
Mike,
Your script needs to check the status of the cluster before shutting
down a node, i.e. if the state is yellow, wait until it becomes green again
before shutting down the next node. You'll probably want to disable
allocation of shards while each node is being restarted (enable when node
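The two pieces of that advice can be sketched as follows - a settings body that disables shard allocation (the setting name matches the 0.90/1.x era discussed here) and a polling helper that waits for a cluster status between node restarts. `get_status` stands in for a `GET /_cluster/health` call returning the status string:

```python
import time

# Transient cluster settings body: disable shard allocation while a
# node is being restarted (re-enable with False afterwards).
disable_allocation = {
    "transient": {"cluster.routing.allocation.disable_allocation": True}
}

def wait_for_status(get_status, wanted=("green",), timeout=300.0, poll=1.0):
    """Poll cluster health until the status is in `wanted` or we time out.
    `get_status` is a stand-in for GET /_cluster/health."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_status() in wanted:
            return True
        time.sleep(poll)
    return False
```

A rolling-restart loop would then call `wait_for_status(..., wanted=("yellow", "green"))` right after the restart and `wait_for_status(..., wanted=("green",))` before moving to the next node, matching the yellow-then-green ordering discussed later in this thread.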
hey,
is it possible to look at this index / shard? Do you still have it / can
you save it for further investigation? You can ping me directly at simon
AT elasticsearch DOT com
On Wednesday, April 2, 2014 11:23:38 AM UTC+2, Paweł Chabierski wrote:
Few days ago we found we've got that same
I am writing a small script to create a snapshot of my kibana-int index,
and hit an odd race condition.
I delete the old snapshot if it exists:
curl -XDELETE
'http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty'
Then make the new snapshot
curl -XPUT
You are starting a local node, which is using local transport, which is not
listening on port 9300. The log message that you see is from the transport
client that tries to connect to port 9300 but cannot. Try starting just
your node and you will see that nobody listens on port 9300.
On Tuesday,
I am seeing a high number of rejections for the bulk thread pool on a 32
core system. Should I leave the thread pool size fixed to the # of cores
and the default queue size at 50? Are these rejections re-processed?
From my clients sending bulk documents (logstash), do I need to limit the
Hi there,
I have the following Request I send to ES:
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "socks purple",
So shall I set local to false?
On Wednesday, April 2, 2014 at 15:06:04 UTC+1, Igor Motov wrote:
You are starting local node, which is using local transport, which is not
listening on port 9300. The log message that you see is from transport
client that tries to connect to port 9300
The wait_for_completion flag has to be specified on the URL, not in the body.
Try this:
curl -XPUT
'http://localhost:9200/_snapshot/backup/snapshot_kibana?wait_for_completion=true&pretty'
-d '{
  "indices": "kibana-int",
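The split can be sketched explicitly: the flag goes into the URL query string while the snapshot body carries the index list. A hedged Python sketch of building that request:

```python
from urllib.parse import urlencode

# wait_for_completion is a URL query parameter, not part of the body;
# the body only names the indices to snapshot.
base = "http://localhost:9200/_snapshot/backup/snapshot_kibana"
url = base + "?" + urlencode({"wait_for_completion": "true", "pretty": "true"})
body = {"indices": "kibana-int"}
```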
If you want to be able to connect to it using the Transport Client - yes, or
remove it completely. If you still get some failure - post the complete
log here.
On Wednesday, April 2, 2014 10:09:16 AM UTC-4, Dario Rossi wrote:
So shall I set local to false?
On Wednesday, April 2, 2014 at 15:06:04
You should specify the same cluster name for both node and transport
client. It looks like they are running in different clusters:
[2014-04-02 15:19:23,262][WARN ][org.elasticsearch.client.transport] [Humus
Sapien] node [#transport#-1][d][inet[localhost/127.0.0.1:9300]] not part of
the cluster
Thanks that works! I didn't notice that detail. Odd that some parameters
work in the URL or body, and some only in the URL... o_O
Cheers,
-Robin-
On Wednesday, 2 April 2014 16:11:27 UTC+2, Igor Motov wrote:
The wait_for_completion flag has to be specified on URL not in the body.
Try this:
I forgot: after setting up the embedded node, I wait for the cluster status
to be yellow with
Client client = node.client();
client.admin().cluster().prepareHealth().setWaitForYellowStatus().
execute().actionGet();
this is done on the embedded node.
On Wednesday, April 2, 2014
I've been testing concurrent queries. I have just one node in a server (1 * 4
core CPU, 12G memory) and created an index (4 shards, 1 replica). I use 1000
concurrent threads to query (using TransportClient; the search condition
contains a termFilter and a sort on a field). I've found sometimes the testing
Thanks, it works now.
I suggest pointing out the detail about local transport in the docs for
TransportClient.
On Wednesday, April 2, 2014 at 15:31:06 UTC+1, Igor Motov wrote:
You should specify the same cluster name for both node and transport
client. It looks like they are
Hi,
My simplified use case is to search in the pages of a book and show back to
the user on which pages the search phrase was found.
My first thought for such a case was to denormalize the pages structure into
fields in the book, e.g. page_1, page_2, The important thing is that I need
to return back on which page we
I'm currently doing a query that's a mix of multi match and function score.
The important bit of the JSON looks like this:
"function_score": {
  "query": {
    "query_string": {
      "query": "some query",
Hi,
Gist: https://gist.github.com/dazraf/9935814
Basically, I'd like to be able to aggregate a field of an array of
observations, grouped by an ancestor/parent id.
So for example (see gist): Aggregate the timings per contestant across a
set of contests.
I realise that the data can be
shift wrote:
I am seeing a high number of rejections for the bulk thread pool
on a 32 core system. Should I leave the thread pool size fixed
to the # of cores and the default queue size at 50? Are these
rejections re-processed?
From my clients sending bulk documents (logstash), do I need
I just used this to upgrade our labs environment a couple of days ago:
#!/bin/bash
export prefix=deployment-elastic0
export suffix=.eqiad.wmflabs
rm -f servers
for i in {1..4}; do
  echo "$prefix$i$suffix" >> servers
done
cat <<__commands__ > /tmp/commands
wget
Hi,
I've also experimented with nested types using dynamic templates.
Interesting (empty!) aggregation results!
Gist: https://gist.github.com/dazraf/9937198
Would be grateful if anyone can shed some light on this please?
Thank you.
On Wednesday, 2 April 2014 16:05:00 UTC+1, dazraf wrote:
Hi,
Actually I've just realized I'm going to hit a problem... I wanted to use
Kibana to graph this for me but I'm not sure Kibana supports
aggregations...
Any idea?
Thanks
-Vincent
On Wednesday, April 2, 2014 11:38:14 AM UTC+2, Vincent Massol wrote:
Thanks a lot for your fast response Adrien!
If you have 40 search threads on the node running and no queue, you should
not use more than 40 search threads on the client, otherwise rejections are
to be expected.
Jörg
On Wed, Apr 2, 2014 at 9:00 AM, Pandiyan pandy0...@gmail.com wrote:
I've been testing concurrent queries, I have just one
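The advice above - don't let the client run more concurrent searches than the node has search threads - can be sketched as a semaphore-bounded wrapper. Everything here is illustrative; `do_search` stands in for the real client call:

```python
import threading

# Cap in-flight searches on the client so it never exceeds the server's
# search thread pool size (40 in this thread's example). Calls beyond
# the cap block instead of triggering server-side rejections.
class BoundedSearcher:
    def __init__(self, do_search, max_in_flight=40):
        self._do_search = do_search
        self._slots = threading.Semaphore(max_in_flight)

    def search(self, body):
        with self._slots:
            return self._do_search(body)
```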
you need to ask some difficult questions to get some help around
here... oops, wait, this was my post.
On Wednesday, April 2, 2014 11:33:38 AM UTC-4, computer engineer wrote:
I would like to know what is the best setup to have an elasticsearch data
node and kibana server on separate machines. I
Hi,
Since Marvel requires a license for production usage, does this mean that in
order to use the Marvel-bundled Sense against a production instance you are
required to buy a license?
I just got out of a meeting where I told a bunch of people to go download
sense off the chrome store. Whoops :)
People can try and use Marvel, and thus Sense, for free in their dev
environment. If they want to use it with a production cluster, they need a
license for that cluster. It doesn't matter how many developers are using it.
On Wed, Apr 2, 2014 at 7:14 PM, ppearcy ppea...@gmail.com wrote:
Hi,
Could the JSON fields of the document indexed in Elasticsearch have the
following:
1. Capital letters
2. Special characters such as SPACE etc.
--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop
Hi Ivan,
Nope, I didn't disable the norms. Here's the mapping:
{
  "media": {
    "properties": {
      "AUDIO": {
        "type": "string"
      },
      "BILLINGTYPE_ID": {
        "type": "long"
      },
      "CATMEDIA_CDATE": {
        "type": "date",
That is exactly what I'm doing. For some reason the cluster reports as
green even though an entire node is down. The cluster doesn't seem to
notice the node is gone and change to yellow until many seconds later. By
then my rolling restart script has already gotten to the second node and
killed
Hi,
I am running a cluster of 5 servers, Elasticsearch version 0.90.5.
Today we ran into split brain. One of the servers saw all servers and was a
master, and the other 4 servers saw only 4 servers and had another server as
their master. We restarted the broken server, so the problem was gone.
I need to
My scripts do a wait for yellow before waiting for green because, as you
noticed, the cluster does not enter a yellow state immediately following
a cluster (shutdown, replica change) event.
--
Ivan
On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks mik...@gmail.com wrote:
That is exactly what
I'm not sure what is up but my advice is to make sure you read the cluster
state from the node you are restarting. That'll make sure it is up in the
first place and you'll get that node's view of the cluster.
Nik
On Wed, Apr 2, 2014 at 2:08 PM, Mike Deeks mik...@gmail.com wrote:
That is
If it is a matter of paying for Sense, I would vote for a paid Chrome
extension at a reasonable price, so people who need Sense can purchase it
independently from Marvel
Yes that would be very interesting.
I have also got a good workaround to my issue now by using the lookup
script from https://github.com/imotov/elasticsearch-native-script-example
On Wednesday, April 2, 2014 1:17:52 AM UTC-7, Jörg Prante wrote:
I wrote a denormalizer plugin where I use a
In order to better understand the error, I copied your
NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets in
usr/share/elasticsearch/lib. I put these 2 files in a jar named
NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
tried to create the index
Thanks very much Mark! I'll study this and respond back on this thread.
On Wednesday, 2 April 2014 18:31:29 UTC+1, Mark Harwood wrote:
A rough Gist here that sums OK with one level of nesting:
https://gist.github.com/markharwood/9938890
On Wednesday, April 2, 2014 5:13:22 PM UTC+1, dazraf
When you mean different data nodes, do you mean nodes that are part of the
same cluster? If so then all you do is point kibana to one node and it will
read any data from that cluster you request.
You need to remove the extra quotes you have in that variable, you only
need them around the entire
1 - Data from both will be available, you've just told ES not to use the
defaults for one index. A replica is not a backup, it's a 1:1 replica, so it
will contain the same data as the primary shard.
2 - Not sure, but I don't think so, as Lucene will try to split things.
Routing is the recommended
Are you using a full class name? I have no problems with
curl -XPOST 'http://localhost:9200/sim/' -d '
{
  "settings": {
    "similarity": {
      "my_similarity": {
        "type": "org.elasticsearch.index.similarity.NormRemovalSimilarityProvider"
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
Hi,
I'm new to elasticsearch. My use case is to load a CSV file containing some
agencies with geo locations; each line looks like:
id;label;address;zipcode;city;region;*latitude*;*longitude*;(and some
other fields)
I'm using the csv river plugin to index the file.
My mapping is :
{
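For that kind of CSV, the usual approach is to combine the latitude/longitude columns into a single geo_point field. A minimal mapping sketch - the type name `agency`, extra fields, and the `location` field name are illustrative assumptions, not the poster's actual mapping:

```python
# Minimal mapping sketch: latitude/longitude from the CSV are indexed
# together as one geo_point field, enabling geo distance queries.
mapping = {
    "agency": {
        "properties": {
            "label": {"type": "string"},
            "city": {"type": "string"},
            "location": {"type": "geo_point"},
        }
    }
}
```

Each indexed document would then carry something like `"location": {"lat": 48.85, "lon": 2.35}` built from the two CSV columns.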
I am new to elasticsearch, so I may not be constructing the mapping in a
correct way. But my mapping looks as follows:
/myindex/messages/_mapping
{
  "messages": {
    "properties": {
      "author": {
        "type": "string"
      },
      "pipe_id": {
When trying to use Carrot2 with elasticsearch, I need to map the field
which is of type attachment for creating the logical clusters. Will it be
able to cluster the results if the content is in base64-encoded format? (as
that field is of type attachment) and at the moment, it does not seem to be
Hi All
Let's say we are trying to search a field which has some stemming filter
configured for synonyms. If the field has a value x for which there is
a synonym y, can the search result return both x and y?
Should I do something at index time to store it beforehand, or is
there
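One way to get that behavior is a synonym token filter wired into an analyzer, so that searching for x also matches documents containing y. A hedged sketch of the index settings - the filter/analyzer names and the x/y synonym pair are illustrative:

```python
# Index settings sketch: a synonym token filter in the analysis chain.
# Applied at search time, a query for "x" also matches "y" (and vice
# versa); applied at index time, both terms are stored up front.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {"type": "synonym", "synonyms": ["x, y"]}
            },
            "analyzer": {
                "synonym_text": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    }
}
```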
Thanks for the great explanation. Is there also a comparable equivalent
when using query string?
On Wednesday, March 26, 2014 2:25:05 PM UTC-6, Binh Ly wrote:
You probably want to upgrade to the match query - text queries are
older and no longer exist in 1.x. But anyway when you query:
Thanks a lot Mark. That explains a lot.
By backup I meant a copy of the same data.
One last question: for fast searching, which will be the better selection, a
single index with multiple shards or multiple indices with a single shard?
Can you please give some reference on how Lucene splits documents and stores
them in