Different results with/without preference=_primary_first/_replica_first using count API

2015-01-18 Thread Xiaoting Ye
Hi, I'm bulk indexing a large amount of data. When I checked the status, I found some interesting results. When I called: curl -XGET 'http://localhost:9200/my_index/my_type/_count?pretty' -d '{query : { filtered: {filter : {exists : {field: visibility}}' It returned: { count : 27395968,
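
A minimal sketch of how the two variants of that count request might be built for comparison. The host, index, type, and field names are taken from the snippet above; the helper name count_url and the fully quoted JSON body are assumptions for illustration, and nothing here contacts a cluster.

```python
import json
from urllib.parse import urlencode

# Hypothetical helper: builds the _count URL, optionally with a preference.
def count_url(base, index, doc_type, preference=None):
    params = {"pretty": "true"}
    if preference is not None:
        # e.g. "_primary_first" or "_replica_first", as in the thread subject
        params["preference"] = preference
    return f"{base}/{index}/{doc_type}/_count?{urlencode(params)}"

# Assumed well-formed version of the query shown in the message.
query_body = json.dumps(
    {"query": {"filtered": {"filter": {"exists": {"field": "visibility"}}}}}
)

for pref in (None, "_primary_first", "_replica_first"):
    print(count_url("http://localhost:9200", "my_index", "my_type", pref))
```

Sending the same body to each URL and comparing the returned counts is one way to see whether primaries and replicas disagree while bulk indexing is still in progress.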

JDBC plugin Feeder Mode

2015-01-18 Thread 4m7u1
Hi, This is what I've understood so far: the JDBC plugin in Feeder mode is run as a bash script with parameters similar to the river. The documentation says that it is a push model. Can anyone explain how it works? If I have new data pushed into my db, what role does the feeder play from here

Re: Correct way to use TransportClient connection object

2015-01-18 Thread Subhadip Bagui
Hi, In the same context: sometimes when I'm shutting down Tomcat I get the exception below, and other times it works. Any idea why? Jan 19, 2015 8:59:30 AM org.apache.catalina.core.StandardContext listenerStop SEVERE: Exception sending context destroyed event to listener instance of

Re: Writing custom scripts for indexing data in Elasticsearch

2015-01-18 Thread 4m7u1
Thank you :) On Friday, January 16, 2015 at 6:27:48 PM UTC+5:30, Jörg Prante wrote: schedule triggers the JDBC plugin by the wall clock time of the machine, whereas interval simply waits the given time period between two runs. Jörg On Fri, Jan 16, 2015 at 11:12 AM, Amtul Nazneen
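
The difference Jörg describes can be sketched with a small simulation. This is only an illustration under stated assumptions: a fixed wall-clock period stands in for the plugin's schedule setting, the interval is assumed to be counted from the end of one run to the start of the next, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def interval_starts(first_start, run_duration, interval, runs):
    """Start times when each run waits `interval` after the previous run ends (assumed)."""
    starts = []
    start = first_start
    for _ in range(runs):
        starts.append(start)
        start = start + run_duration + interval
    return starts

def schedule_starts(first_start, period, runs):
    """Start times when runs fire at fixed wall-clock ticks, regardless of run length."""
    return [first_start + i * period for i in range(runs)]

first = datetime(2015, 1, 16, 12, 0)
print(interval_starts(first, timedelta(minutes=2), timedelta(minutes=5), 3))
print(schedule_starts(first, timedelta(minutes=5), 3))
```

With a two-minute run, the interval-driven start times drift by the run duration each cycle, while the schedule-driven ones stay aligned to the clock.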

Re: Elasticsearch JDBC river plugin- Interval vs Schedule.

2015-01-18 Thread 4m7u1
Okay got it. Thanks :). And are both the same when it comes to performance? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to

Re: Only send requests directly to data notes and not master nodes?

2015-01-18 Thread Justin Zhu
Would all transport clients only connect to this client node? Right now we have them connecting to all 3 master nodes. On Sunday, January 18, 2015 at 8:43:08 PM UTC-8, Mark Walkom wrote: It depends on your use, but try adding one client in with 8GB heap and see how you go. On 19 January

Re: Only send requests directly to data notes and not master nodes?

2015-01-18 Thread Mark Walkom
It depends on your use, but try adding one client in with 8GB heap and see how you go. On 19 January 2015 at 16:48, Justin Zhu haoranj...@gmail.com wrote: We give the master nodes 5gb of memory -- stats are showing low CPU and memory utilization. Would you still recommend the client only node? If

Re: Elsticsearch JDBC river plugin metrics

2015-01-18 Thread 4m7u1
Thank you so much Jörg :) ! On Friday, January 16, 2015 at 6:08:49 PM UTC+5:30, Jörg Prante wrote: These are diagnostic messages which crept into one of the releases. The latest version has metrics logging disabled; it must be enabled in the settings. The metrics count the number of

Re: Only send requests directly to data notes and not master nodes?

2015-01-18 Thread Justin Zhu
We give the master nodes 5gb of memory -- stats are showing low CPU and memory utilization. Would you still recommend the client only node? If so, how many, and how powerful? On Saturday, January 17, 2015 at 6:55:12 PM UTC-8, Mark Walkom wrote: Depends, sounds like you need a few client nodes if you

Re: How highlighting actually works?

2015-01-18 Thread Nikolas Everett
Highlighting is complex and more hacky than you'd imagine at first glance. Each highlighter is different and we can't tell which one you are using without seeing your mapping. For the plain highlighter the cost is roughly proportional to the length of the highlighted field. So in your case it's the

A question about keyword_marker

2015-01-18 Thread Nassim
Hi all, I would like to know if there is a limit on the number of words that we can give to the keyword_marker instruction, and whether it has a big impact on the performance of ES. Thank you!

Re: fielddata doesn't agree with _source for a long field

2015-01-18 Thread David Pilato
You should change the mapping for this field and use float or double: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#number David On 18 Jan 2015 at 10:25, Sergey Tsalkov stsal...@gmail.com wrote: Hey guys! I'm a newcomer and have been

Re: fielddata doesn't agree with _source for a long field

2015-01-18 Thread Sergey Tsalkov
Oh, long is actually an integer field, isn't it? I feel like such an idiot. Much gratitude to you! On Sunday, January 18, 2015 at 1:35:22 AM UTC-8, David Pilato wrote: You should change the mapping for this field and use float or double:
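
A rough illustration of why an integer type cannot order decimals between 0 and 1. It assumes the integer representation keeps only the whole-number part; how Elasticsearch coerces the values internally is not shown in the thread, and the sample values are made up.

```python
# Hypothetical sample values for a field holding decimals between 0 and 1.
scores = [0.0, 0.25, 0.5, 0.75, 1.0]

# Assumed behaviour of an integer (long) representation: the fraction is dropped.
as_long = [int(s) for s in scores]

# A floating-point (float/double) representation keeps the fraction.
as_double = [float(s) for s in scores]

def distinct_sort_keys(values):
    """Number of different sort keys available for ordering the records."""
    return len(set(values))

print(as_long, distinct_sort_keys(as_long))
print(as_double, distinct_sort_keys(as_double))
```

Under that assumption only the records that are exactly 0 or exactly 1 get distinguishable sort keys, which is consistent with the symptom in the original post and with the advice to map the field as float or double.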

fielddata doesn't agree with _source for a long field

2015-01-18 Thread Sergey Tsalkov
Hey guys! I'm a newcomer and have been diving deep into ElasticSearch for the last week. Today, I've been trying to debug a maddening issue: I have a long field that contains decimals between 0 and 1, and sorting on it is not working. The records that are exactly 0 or exactly 1 show up in the

How highlighting actually works?

2015-01-18 Thread Karol Sikora
Hi all, I have some specific requirements for highlighting. I need to search the full content of an item for a phrase, and then show on which page the searched phrase occurs. So I've created one field named text_content and fields named text_content_{page_number} (text_content_1, text_content_2,

Re: Unicode characters and spaces in elasticsearch field names

2015-01-18 Thread joergpra...@gmail.com
You can find this in the source code. E.g. org.elasticsearch.index.mapper.ContentPath - see delimiter variable, it is '.' by default org.elasticsearch.index.mapper.Uid - see DELIMITER, it is set to '#' and for '*' org.elasticsearch.index.mapper.FieldMappersLookup and

Improving the default routing hash function

2015-01-18 Thread Andrew White
I noticed that the default routing hash function is DJB. This function is particularly poor at routing when the input keys are short and differ only slightly. For example, basic two-digit hex values 00 - FF produce very large hot spots on clusters of size 11, 16, and 17 and others. By
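
The hot spots Andrew mentions can be explored with a short sketch. It assumes the classic DJB2 formulation (start at 5381, multiply by 33, add each character) and treats the hash as an unsigned 32-bit value; Elasticsearch's own implementation and its handling of negative hashes may differ, and the function names are hypothetical.

```python
from collections import Counter

def djb2(key: str) -> int:
    """Classic DJB2 string hash, kept to 32 bits (assumed variant)."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h

def shard_histogram(keys, num_shards):
    """Count how many keys land on each shard under hash % num_shards."""
    return Counter(djb2(k) % num_shards for k in keys)

# The two-digit hex keys from the message: "00" through "FF".
keys = [format(i, "02X") for i in range(256)]

for n in (11, 16, 17):
    hist = shard_histogram(keys, n)
    print(n, [hist.get(shard, 0) for shard in range(n)])
```

An even spread over 16 shards would put 16 keys on each; comparing the printed counts against that baseline shows how uneven the distribution is for such short keys.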

Re: A question about keyword_marker

2015-01-18 Thread Adrien Grand
Tokens are stored in a hash table, which provides random access in constant time so I would not worry too much about performance. However, these tokens will be stored in memory so you should keep the size of the list reasonable. On Sun, Jan 18, 2015 at 4:58 PM, Nassim nassim.ka...@gmail.com

Re: How to Limit Search With-In Selected Document ID or Document ID List

2015-01-18 Thread Adrien Grand
Hi, This use-case typically looks like a join (search within the results of another search request) so you should look at whether you can change the way that you model your data in order to be able to use nested docs or the parent/child functionality. Otherwise, there is no better way,

Re: Unicode characters and spaces in elasticsearch field names

2015-01-18 Thread George
Does anybody have an idea at least where in the elasticsearch code this is handled? Thanks! On Friday, January 16, 2015 at 9:21:34 AM UTC+1, George wrote: Hello everybody, I've researched a little bit what characters are allowed in elasticsearch field names. However, I couldn't find

Re: Improving the default routing hash function

2015-01-18 Thread Adrien Grand
Hi Andrew, This is indeed an issue. For your information, elasticsearch will switch to murmur3 in the next major version. For backward compatibility, old indices will still use DJB, but newly created indices will use murmur3. There is more background about this issue at