search cascading in ES

2014-04-02 Thread Chetana
We are developing an application which requires cascaded (flow-based) search, where the search result of one query becomes the input criteria for the next search. Is there a way to do this in ES? If not, can you suggest a third-party library which can provide cascading functionality over ES

Re: How cause ElasticSearch to not throw DocumentAlreadyExistsException?

2014-04-02 Thread Igor Romanov
Eventually I solved the issue in a pretty ugly way: I added a new add command to Elasticsearch that does what create does but doesn't throw an exception ... The bad thing about it is that for each new Elasticsearch version I want to use, I will need to merge those changes :/ On Monday, March 31,

Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Vincent Massol
Hi guys, I'd like to count all entries in my ES instance, having a timestamp from the *last day* and *group together all entries having the same instanceId*. With the data below, the count result should be 1 (and not 2) since 2 entries are within the last day but they have the same instanceId

Aggregation error( Java heap space)

2014-04-02 Thread vir . candy
I run an *aggregation* search on my index (*6 nodes*). There are about *200 million lines* of data (port scanning). Each line looks like this: *{ip:85.18.68.5, banner:cisco-IOS, country:IT, _type:port-80}.* So you can imagine I have this data sorted into different types by the port they are

Re: search cascading in ES

2014-04-02 Thread Adrien Grand
Hi, There is no such built-in functionality in Elasticsearch, and I don't know of a third-party library that would provide this. On Wed, Apr 2, 2014 at 8:10 AM, Chetana ambha.car...@gmail.com wrote: We are developing an application which requires cascaded (flow based) search where the

Re: Aggregation error( Java heap space)

2014-04-02 Thread vir . candy
The smaller index has 1 million lines of data. They are the lines filtered by prefix:{ip:100.1} from the bigger one. On Wednesday, April 2, 2014 at 4:04:27 PM UTC+8, vir@gmail.com wrote: I do an *aggregation* search on my index(*6 nodes*). There are about *200 million lines* of data(port scanning). Each line

Re: using java get document api within a script field

2014-04-02 Thread joergpra...@gmail.com
I wrote a denormalizer plugin where I use a node client from a field analyzer for a field type deref. A node client is started as a singleton per node where the plugin is installed. It can ask other indexes/types for a doc by given ID for injecting additional terms from an array of terms of the

Delete index directory

2014-04-02 Thread 김주은
I've tried deleting an index by calling $ curl -XDELETE 'http://localhost:9200/indexName', but on the actual file system the 'indexName' directory still persists in the path 'repository/elasticsearch/data/228.5.8.6/nodes/0/indices/'. The delete API only deletes the '_state' directory under 'indexName'

Re: Aggregation error( Java heap space)

2014-04-02 Thread Adrien Grand
Given your description of the problem, I think the issue is that your Elasticsearch cluster doesn't have enough memory to load field data for the ip field (which needs to be done for all documents, not only those that match your query). So you either need to give more nodes to your cluster, more

Re: Elasticsearch hardware requirement

2014-04-02 Thread Jorge Román
Thanks for the reply. I have done the test with 1 node (16GB RAM and 8 CPUs, allocating 8GB to ES), and I have been able to deal with all events with only 1 node. Now I'm trying to find out where the bottleneck is. As a next step, I'm going to try benchmarking Elasticsearch without external elements in

Re: Aggregation error( Java heap space)

2014-04-02 Thread 张阳
But I can do an aggregation on the 'banner' field on both clusters. Is that because the values of 'banner' are not as unique as those of the 'ip' field? 2014-04-02 16:27 GMT+08:00 Adrien Grand adrien.gr...@elasticsearch.com: Given your description of the problem, I think the issue is that your Elasticsearch

Re: Aggregation error( Java heap space)

2014-04-02 Thread Adrien Grand
On Wed, Apr 2, 2014 at 10:52 AM, 张阳 vir.ca...@gmail.com wrote: But I can do aggregation on 'banner' field on both cluster. Is that because values of 'banner' are not so unique compared to 'ip' field Very likely, yes. Memory usage of field data is higher on high-cardinality fields. -- Adrien

Re: Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Adrien Grand
Hi Vincent, I left some replies inline: On Wed, Apr 2, 2014 at 10:02 AM, Vincent Massol vmas...@gmail.com wrote: Hi guys, I'd like to count all entries in my ES instance, having a timestamp from the *last day* and *group together all entries having the same instanceId*. With the data

Re: Lucene index corruption on nodes restart

2014-04-02 Thread Paweł Chabierski
A few days ago we found that we get the same error when we search for data. reason: FetchPhaseExecutionException[[site_production][1]: query[ConstantScore(cache(_type:ademail))],from[0],size[648]: Fetch Failed [Failed to fetch doc id [9615533]]]; nested: EOFException[seek past EOF:

[Hadoop] New Feature - to write bulks to different indexes from hadoop...

2014-04-02 Thread Igor Romanov
Hey, I am designing a solution for indexing using Hadoop. I am thinking of using the same logic as Logstash to create an index per period of time of my records (10 days or a month), in order to avoid working with big index sizes (from experience, merging huge segments in Lucene makes the whole index slow)

Re: Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Vincent Massol
Thanks a lot for your fast response Adrien! * I noticed the cardinality aggregation but I was worried by the "an approximate count of distinct values" part of the documentation. I need an exact value, not an approximate one :) However, I've read more of the documentation and it may not be a real
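
The count discussed in this thread (entries from the last day, with entries sharing an instanceId counted once) could be expressed as a date-range filter plus a cardinality aggregation. The following is a minimal sketch that only builds the request body and sends nothing; the field name timestamp and the aggregation name distinct_instances are assumptions, and the filtered-query syntax assumes Elasticsearch 1.x. As noted in the thread, the cardinality aggregation returns an approximate count.

```python
import json


def distinct_instances_last_day(ts_field="timestamp", id_field="instanceId"):
    """Build a search body counting distinct id_field values from the last day.

    ts_field is a hypothetical name for the timestamp field; adjust it to
    match the real mapping.
    """
    return {
        # No hits are needed, only the aggregation result.
        "size": 0,
        "query": {
            "filtered": {
                # Keep only documents whose timestamp is within the last day.
                "filter": {"range": {ts_field: {"gte": "now-1d"}}}
            }
        },
        "aggs": {
            # Approximate number of distinct instance ids among the matches.
            "distinct_instances": {"cardinality": {"field": id_field}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(distinct_instances_last_day(), indent=2))
```

The body would be sent to the index's _search endpoint; the count is then read from aggregations.distinct_instances.value in the response.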

Re: Need some help for creating my model

2014-04-02 Thread Stefan Kruse
Hello, many thanks for your answer. Could you give me a little example of how to add/remove a single child to a parent object? I would like to do this with the Elasticsearch PHP module. Is this possible? Regards Stefan On Tuesday, 1 April 2014 16:45:16 UTC+2, Binh Ly wrote: This

Re: Relevancy sorting of result returned

2014-04-02 Thread chee hoo lum
Hi Binh, Great. Thanks for that. On Wed, Apr 2, 2014 at 12:05 AM, Binh Ly binhly...@yahoo.com wrote: If you specify explain=true in your query, it will tell you in detail how the score is computed: { explain: true, query: {} } Some useful info:

[Tool Contribution] Alfred the ElasticSearch Butler

2014-04-02 Thread Colton
Hello ElasticSearch Community, My name is Colton McInroy and I work with DOSarrest Internet Security LTD. Over the past few months I have been working with ElasticSearch fairly closely and building an infrastructure for it. When dealing with lots of indices, managing lots of them can be

Re: Relevancy sorting of result returned

2014-04-02 Thread chee hoo lum
Hi Binh, The same problem again. I have the following queries : 1) { from : 0, size : 100, explain : true, query : { filtered : { query : { multi_match: { query: happy, fields: [ DISPLAY_NAME^6, PERFORMER ] } }, filter : { query : {

Re: near real time alerts for syslogs

2014-04-02 Thread Antoine Brun
Hello Ryan, I am trying to build the same type of application (device log collection) and I'm also very new to Logstash and Elasticsearch. I'm having a hard time setting up a lab environment that can sustain the load (2000 logs/sec, 1024 KB logs) and only 60% of the logs are indexed (I count

Re: Rolling restart of a cluster?

2014-04-02 Thread Petter Abrahamsson
Mike, Your script needs to check the status of the cluster before shutting down a node, i.e. if the state is yellow, wait until it becomes green again before shutting down the next node. You'll probably want to disable allocation of shards while each node is being restarted (enable when node
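
The two checks described in this reply (wait for green before touching the next node, and disable shard allocation around each restart) could be sketched as below. This is only an illustration under stated assumptions: it assumes the cluster.routing.allocation.enable setting available in Elasticsearch 1.0 and later, and the extra comparison against an expected node count is an added safeguard, not something taken from the reply. The functions build and inspect JSON only; sending the requests is left to the caller.

```python
def allocation_settings(enabled):
    """Body for PUT /_cluster/settings that turns shard allocation on or off.

    Assumes the cluster.routing.allocation.enable setting (Elasticsearch 1.0+).
    """
    value = "all" if enabled else "none"
    return {"transient": {"cluster.routing.allocation.enable": value}}


def safe_to_restart_next(health, expected_nodes):
    """Decide whether the next node may be shut down.

    health is the parsed JSON from GET /_cluster/health. The node-count check
    is an assumption added here so that a cluster which still reports green
    shortly after losing a node is not treated as healthy.
    """
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
    )


if __name__ == "__main__":
    print(allocation_settings(False))
    print(safe_to_restart_next({"status": "yellow", "number_of_nodes": 4}, 4))
```

A restart script would send allocation_settings(False) before stopping a node, allocation_settings(True) once it has rejoined, and poll the health endpoint until safe_to_restart_next returns True.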

Re: Lucene index corruption on nodes restart

2014-04-02 Thread simonw
hey, is it possible to look at this index / shard? do you still have it / can you save it for further investigations? You can ping me directly at simon AT elasticsearch DOT com On Wednesday, April 2, 2014 11:23:38 AM UTC+2, Paweł Chabierski wrote: Few days ago we found we've got that same

wait_for_completion doesn't seem to be working when making a snapshot

2014-04-02 Thread Robin Clarke
I am writing a small script to create a snapshot of my kibana-int index, and hit an odd race condition. I delete the old snapshot if it exists: curl -XDELETE 'http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty' Then make the new snapshot curl -XPUT

Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Igor Motov
You are starting a local node, which uses the local transport and therefore does not listen on port 9300. The log message that you see is from the transport client, which tries to connect to port 9300 but cannot. Try starting just your node and you will see that nobody listens on port 9300. On Tuesday,

bulk thread pool rejections

2014-04-02 Thread shift
I am seeing a high number of rejections for the bulk thread pool on a 32 core system. Should I leave the thread pool size fixed to the # of cores and the default queue size at 50? Are these rejections re-processed? From my clients sending bulk documents (logstash), do I need to limit the

Make a autosuggest-search searching in realtime doesn't work properly

2014-04-02 Thread Alex K
Hi there, I have the following Request I send to ES: { query: { filtered: { query: { bool: { should: [ { multi_match: { query: socks purple,

Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Dario Rossi
So shall I set local to false? On Wednesday, 2 April 2014 15:06:04 UTC+1, Igor Motov wrote: You are starting local node, which is using local transport, which is not listening on port 9300. The log message that you see is from transport client that tries to connect to port 9300

Re: wait_for_completion doesn't seem to be working when making a snapshot

2014-04-02 Thread Igor Motov
The wait_for_completion flag has to be specified in the URL, not in the body. Try this: curl -XPUT 'http://localhost:9200/_snapshot/backup/snapshot_kibana?wait_for_completion=true&pretty' -d '{ indices: kibana-int,
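
To illustrate the point of this reply, here is a small sketch that keeps wait_for_completion in the query string and out of the JSON body. The helper names snapshot_url and snapshot_body are hypothetical; the host, repository, snapshot and index names are the ones used in the thread. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def snapshot_url(base, repo, snapshot, wait_for_completion=True, pretty=False):
    """Build the snapshot URL with request flags as query parameters."""
    params = {}
    if wait_for_completion:
        # This flag is read from the URL, not from the request body.
        params["wait_for_completion"] = "true"
    if pretty:
        params["pretty"] = "true"
    url = f"{base}/_snapshot/{repo}/{snapshot}"
    return f"{url}?{urlencode(params)}" if params else url


def snapshot_body(indices):
    """Build the JSON body, which carries only snapshot options such as indices."""
    return {"indices": ",".join(indices)}


if __name__ == "__main__":
    print(snapshot_url("http://localhost:9200", "backup", "snapshot_kibana", pretty=True))
    print(snapshot_body(["kibana-int"]))
```

The resulting URL and body correspond to a PUT request; with the flag in the URL, the call should return only once the snapshot has finished.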

Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Igor Motov
If you want to be able to connect to it using the Transport Client, then yes, or remove it completely. If you still get a failure, post the complete log here. On Wednesday, April 2, 2014 10:09:16 AM UTC-4, Dario Rossi wrote: So shall I set local to false? On Wednesday, 2 April 2014 15:06:04

Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Igor Motov
You should specify the same cluster name for both node and transport client. It looks like they are running in different clusters: [2014-04-02 15:19:23,262][WARN ][org.elasticsearch.client.transport] [Humus Sapien] node [#transport#-1][d][inet[localhost/127.0.0.1:9300]] not part of the cluster

Re: wait_for_completion doesn't seem to be working when making a snapshot

2014-04-02 Thread Robin Clarke
Thanks that works! I didn't notice that detail. Odd that some parameters work in the URL or body, and some only in the URL... o_O Cheers, -Robin- On Wednesday, 2 April 2014 16:11:27 UTC+2, Igor Motov wrote: The wait_for_completion flag has to be specified on URL not in the body. Try this:

Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Dario Rossi
I forgot: after setting up the embedded node, I wait for the cluster status to be Yellow with Client client = node.client(); client.admin().cluster().prepareHealth().setWaitForYellowStatus().execute().actionGet(); this is done on the embedded node. On Wednesday, 2 April 2014

ESRejectedExecutionException

2014-04-02 Thread Pandiyan
I've been testing concurrent queries. I have just one node on a server (1 * 4 core CPU, 12G memory) and created an index (4 shards, 1 replica). I use 1000 concurrent threads to query (using TransportClient; the search condition contains a termFilter and sorts on a field). I've found sometimes the testing

Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Dario Rossi
Thanks, it works now. I suggest pointing out the detail about local transport in the docs for TransportClient. On Wednesday, 2 April 2014 15:31:06 UTC+1, Igor Motov wrote: You should specify the same cluster name for both node and transport client. It looks like they are

Adding payload and retrieve them in highlighting

2014-04-02 Thread Karol Sikora
Hi, My simplified use case is to search in the pages of a book and show the user on which pages the search phrase was found. My first thought for such a case was to denormalize the page structure into fields in the book, e.g. page_1, page_2, etc. The important thing is that I need to return on which page we

Mixing bool and multi match/function score query

2014-04-02 Thread Garry Welding
I'm currently doing a query that's a mix of multi match and function score. The important bit of the JSON looks like this: function_score:{ query:{ query_string:{ query:some query,

Aggregations on nested array types

2014-04-02 Thread dazraf
Hi, Gist: https://gist.github.com/dazraf/9935814 Basically, I'd like to be able to aggregate a field of an array of observations, grouped by an ancestor/parent id. So for example (see gist): Aggregate the timings per contestant across a set of contests. I realise that the data can be

Re: bulk thread pool rejections

2014-04-02 Thread Drew Raines
shift wrote: I am seeing a high number of rejections for the bulk thread pool on a 32 core system. Should I leave the thread pool size fixed to the # of cores and the default queue size at 50? Are these rejections re-processed? From my clients sending bulk documents (logstash), do I need

Re: Rolling restart of a cluster?

2014-04-02 Thread Nikolas Everett
I just used this to upgrade our labs environment a couple of days ago: #!/bin/bash export prefix=deployment-elastic0 export suffix=.eqiad.wmflabs rm -f servers for i in {1..4}; do echo $prefix$i$suffix >> servers done cat << __commands__ > /tmp/commands wget

Re: Aggregations on nested array types

2014-04-02 Thread dazraf
Hi, I've also experimented with nested types using dynamic templates. Interesting (empty!) aggregation results! Gist: https://gist.github.com/dazraf/9937198 Would be grateful if anyone can shed some light on this please? Thank you. On Wednesday, 2 April 2014 16:05:00 UTC+1, dazraf wrote: Hi,

Re: Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Vincent Massol
Actually I've just realized I'm going to hit a problem... I wanted to use Kibana to graph this for me but I'm not sure Kibana supports aggregations... Any idea? Thanks -Vincent On Wednesday, April 2, 2014 11:38:14 AM UTC+2, Vincent Massol wrote: Thanks a lot for your fast response Adrien!

Re: ESRejectedExecutionException

2014-04-02 Thread joergpra...@gmail.com
If you have 40 search threads on the node running and no queue, you should not use more than 40 search threads on the client, otherwise rejections are to be expected. Jörg On Wed, Apr 2, 2014 at 9:00 AM, Pandiyan pandy0...@gmail.com wrote: I've been testing concurrent queries, I have just one
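
The advice in this reply amounts to capping the number of in-flight searches on the client at the number of search threads on the node. A minimal sketch of such a cap follows, using a fixed-size thread pool; the default of 40 comes from the reply, while run_capped and the stand-in search function are hypothetical and do not talk to a real cluster.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_capped(search_fn, queries, max_in_flight=40):
    """Run search_fn over queries with at most max_in_flight concurrent calls."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # map preserves input order; extra queries wait on the client side
        # instead of being rejected by the server.
        return list(pool.map(search_fn, queries))


class PeakCounter:
    """Track how many calls are active at once (for demonstration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False


if __name__ == "__main__":
    counter = PeakCounter()

    def fake_search(query):
        # Stand-in for a real search call.
        with counter:
            time.sleep(0.001)
            return query

    run_capped(fake_search, range(1000), max_in_flight=40)
    print("peak concurrent searches:", counter.peak)
```

With this shape, 1000 queued queries never put more than 40 requests on the node at the same time.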

Re: elasticsearch data node and kibana on different machines

2014-04-02 Thread computer engineer
you need to ask some difficult questions to get some help around here...oops wait this was my post. On Wednesday, April 2, 2014 11:33:38 AM UTC-4, computer engineer wrote: I would like to know what is the best setup to have an elasticsearch data node and kibana server on separate machines. I

Re: Sense on github abandoned?

2014-04-02 Thread ppearcy
Hi, Since Marvel requires a license for production usage, does this mean that using the Marvel-bundled Sense against a production instance requires you to buy a license? I just got out of a meeting where I told a bunch of people to go download Sense off the Chrome store. Whoops :)

Re: Sense on github abandoned?

2014-04-02 Thread Boaz Leskes
People can try and use marvel and thus sense for free in their dev environment. If they want to use it with a production cluster they need a license for that cluster. It doesn't matter how many developers are using it. On Wed, Apr 2, 2014 at 7:14 PM, ppearcy ppea...@gmail.com wrote: Hi,

ES document field.

2014-04-02 Thread san
Could the JSON fields of a document indexed in Elasticsearch contain the following: 1. Capital letters 2. Special characters such as SPACE etc. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop

Re: Relevancy sorting of result returned

2014-04-02 Thread chee hoo lum
Hi Ivan, Nope, I didn't disable the norm. Here's the mapping : { media: { properties: { AUDIO: { type: string }, BILLINGTYPE_ID: { type: long }, CATMEDIA_CDATE: { type: date,

Re: Rolling restart of a cluster?

2014-04-02 Thread Mike Deeks
That is exactly what I'm doing. For some reason the cluster reports as green even though an entire node is down. The cluster doesn't seem to notice the node is gone and change to yellow until many seconds later. By then my rolling restart script has already gotten to the second node and killed

Delete documents after split brain

2014-04-02 Thread Greg
Hi, I am running a cluster of 5 servers on Elasticsearch version 0.90.5. Today we ran into a split brain. One of the servers saw all servers and was a master, and the other 4 servers saw only 4 servers and had another server as a master. We restarted the broken server, so the problem was gone. I need to

Re: Rolling restart of a cluster?

2014-04-02 Thread Ivan Brusic
My scripts do a wait for yellow before waiting for green, because as you noticed, the cluster does not enter a yellow state immediately following a cluster (shutdown, replica change) event. -- Ivan On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks mik...@gmail.com wrote: That is exactly what

Re: Rolling restart of a cluster?

2014-04-02 Thread Nikolas Everett
I'm not sure what is up but my advice is to make sure you read the cluster state from the node you are restarting. That'll make sure it is up in the first place and you'll get that node's view of the cluster. Nik On Wed, Apr 2, 2014 at 2:08 PM, Mike Deeks mik...@gmail.com wrote: That is

Re: Sense on github abandoned?

2014-04-02 Thread AlexR
If it is a matter of paying for Sense, I would vote for a paid Chrome extension at a reasonable price so people who need Sense can purchase it independently from Marvel

Re: using java get document api within a script field

2014-04-02 Thread mat taylor
Yes that would be very interesting. I have also got a good workaround to my issue now by using the lookup script from https://github.com/imotov/elasticsearch-native-script-example On Wednesday, April 2, 2014 1:17:52 AM UTC-7, Jörg Prante wrote: I wrote a denormalizer plugin where I use a

Re: how to modify term frequency formula?

2014-04-02 Thread geantbrun
In order to better understand the error, I copied your NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets in usr/share/elasticsearch/lib. I put these 2 files in a jar named NormRemovalSimilarity.jar. After restarting the elasticsearch service, I tried to create the index

Re: Aggregations on nested array types

2014-04-02 Thread dazraf
Thanks very much Mark! I'll study this and respond back on this thread. On Wednesday, 2 April 2014 18:31:29 UTC+1, Mark Harwood wrote: A rough Gist here that sums OK with one level of nesting: https://gist.github.com/markharwood/9938890 On Wednesday, April 2, 2014 5:13:22 PM UTC+1, dazraf

Re: elasticsearch data node and kibana on different machines

2014-04-02 Thread Mark Walkom
When you say different data nodes, do you mean nodes that are part of the same cluster? If so then all you do is point Kibana to one node and it will read any data you request from that cluster. You need to remove the extra quotes you have in that variable, you only need them around the entire

Re: data distribution over shards and replicas

2014-04-02 Thread Mark Walkom
1 - Data from both will be available, you've just told ES not to use the defaults for one index. A replica is not a backup, it's a 1:1 replica so it will contain the same data as the primary shard. 2 - Not sure, but I don't think so as lucene will try to split things. Routing is the recommended

Re: how to modify term frequency formula?

2014-04-02 Thread Ivan Brusic
Are you using a full class name? I have no problems with curl -XPOST 'http://localhost:9200/sim/' -d ' { settings : { similarity : { my_similarity : { type : org.elasticsearch.index.similarity.NormRemovalSimilarityProvider } } }, mappings : { post : { properties : {

Copying fields to a geopoint type ?

2014-04-02 Thread Pascal VINCENT
Hi, I'm new to Elasticsearch. My use case is to load a CSV file containing some agencies with geo location; each line looks like: id;label;address;zipcode;city;region;*latitude*;*longitude*;(and some other fields)+ I'm using the csv river plugin to index the file. My mapping is : {

Can a nested attachment type be highlighted?

2014-04-02 Thread Pratikshya Kuinkel
I am new to elasticsearch, so I may not be constructing the mapping in a correct way. But, my mapping looks as follows: /myindex/messages/_mapping { “messages“: { properties: { author: { type: String }, “pipe_id: {

Can _search_with_cluster also cluster the result if content is of type attachment?

2014-04-02 Thread Pratikshya Kuinkel
When trying to use Carrot2 with elasticsearch, I need to map the field which is of type attachment for creating the logical cluster. Will it be able to cluster the result if the content is in base64 encoded format? (as that field is of type attachment) and at the moment, it does not seem to be

Is there a way to know which synonym matched along with field value in the search ouput

2014-04-02 Thread saiprasad mishra
Hi all, let's say we are trying to search a field which has a stemming filter configured for synonyms. If the field has a value x for which there is a synonym y, can the search result return both x and y? Should I do something at index time to store it beforehand or is there

Re: Stree Address Queries

2014-04-02 Thread Henri van den Bulk
Thanks for the great explanation. Is there also a comparable equivalent when using query string? On Wednesday, March 26, 2014 2:25:05 PM UTC-6, Binh Ly wrote: You probably want to upgrade to the match query - text queries are older and no longer exist in 1.x. But anyway when you query:

Re: data distribution over shards and replicas

2014-04-02 Thread Subhadip Bagui
Thanks a lot Mark. That explains a lot. By backup I meant a copy of the same data. One last question: for fast searching, which is the better choice, a single index with multiple shards or multiple indices with a single shard each? Can you please give some reference on how Lucene splits documents and stores them in