Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Nikolas Everett
Have you profiled it and seen that reading the source is actually the slow part? hot_threads can lie here, so I'd go with a profiler or just a SIGQUIT or something. I've got some reasonably big documents and generally don't see that as a problem even under decent load. I could see an argument for a

Index Size and Replica Impact

2015-04-20 Thread TB
My index size is currently 6 GB, with replicas set to 1. I have a 3-node cluster, and my understanding is that to utilize the whole cluster I would have to set replicas to 3. If I do that, would my index size grow beyond 6 GB on each node? -- You received this message because you are

Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel
Hi, We are having a performance problem in which, for each hit, Elasticsearch parses the entire _source and then generates a new JSON document with only the requested _source fields. To overcome this, we would like to use a mapping transform script that serializes the requested query

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel
Itamar, 1. The _source field includes many fields that are only indexed and many fields that are only needed in query search results; _source includes them both. Projecting the requested fields out of _source is too CPU-intensive to do at search time for each result, especially
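
To make the per-hit cost under discussion concrete: with source filtering, each hit's full _source is parsed and a new JSON object is re-serialized with only the requested fields, whereas a field written once at index time (for example by a transform script) already holds the serialized projection. This is a minimal Python illustration of the work involved, not Elasticsearch's internal code; the field names are hypothetical.

```python
import json

def project_at_search_time(source_json: str, fields: list[str]) -> str:
    """Parse the whole _source, keep only the requested fields, re-serialize."""
    doc = json.loads(source_json)                   # full parse on every hit
    projected = {f: doc[f] for f in fields if f in doc}
    return json.dumps(projected)                    # new JSON on every hit

# Hypothetical document: two result fields plus many index-only fields.
source = json.dumps({"title": "t1", "price": 10,
                     **{f"idx_{i}": "x" * 50 for i in range(200)}})
wanted = ["title", "price"]

# Index-time approach: compute the projection once and keep it as a string.
precomputed = project_at_search_time(source, wanted)

# Search-time approach: the same work repeated for each hit returned.
per_hit = project_at_search_time(source, wanted)
print(per_hit == precomputed)
```

Both paths produce the same JSON; the difference is only how often the full parse happens, which is why the thread keeps coming back to measuring it.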

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itamar Syn-Hershko
This is how _source works. doc_values don't make sense in this regard - what you are looking for is using stored fields and have the transform script write to that. Loading stored fields (even one field per hit) may be slower than loading and parsing _source, though. I'd just put this logic in

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel
A quick check shows no significant performance difference between a doc_value field and a stored field that is not a doc value. I suppose warm-up and file-system caching issues are at play. I do not have that field in the source, since the ETL process does not generate it at this point. The

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel
Also, does fielddata: { loading: eager } make sense with doc_values in this use case? Would that combination be supported in the future? On Tuesday, April 21, 2015 at 2:14:03 AM UTC+3, Itai Frenkel wrote: Itamar, 1. The _source field includes many fields that are only being indexed, and

2.0 ETA

2015-04-20 Thread Matt Weber
Is there an ETA for 2.0? -- Thanks, Matt Weber

How to diagnose slow queries every 10 minutes exactly?

2015-04-20 Thread Dave Reed
I have a 2-node cluster running on some beefy machines. 12g and 16g of heap space. About 2.1 million documents, each relatively small in size, spread across 200 or so indexes. The refresh interval is 0.5s (while I don't need realtime I do need relatively quick refreshes). Documents are

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itamar Syn-Hershko
What if all those fields are collapsed into one, like you suggest, but that one field is projected out of _source (think non-indexed JSON in a string field)? Do you see a noticeable performance gain then? What if that field is set to be stored (and loaded using fields, not via _source)? What is the
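
Itamar's second suggestion (one stored field, loaded with fields rather than via _source) could look like the request bodies below, built in Python. This is a sketch assuming ES 1.x mapping syntax; the type name `doc`, the field name `projection` and the sample hit are hypothetical, and whether it is actually faster is exactly what the thread says needs measuring.

```python
import json

# Hypothetical ES 1.x mapping: one non-indexed string field holding the
# pre-serialized projection, marked as stored so it can be loaded on its own.
mapping = {
    "doc": {
        "properties": {
            "projection": {"type": "string", "index": "no", "store": True}
        }
    }
}

# Search body that skips _source and asks for the stored field instead.
search_body = {
    "query": {"match_all": {}},
    "_source": False,
    "fields": ["projection"],
}

def extract_projection(hit: dict) -> dict:
    """Stored fields come back as arrays under hit['fields']; decode the first value."""
    return json.loads(hit["fields"]["projection"][0])

# Shape of a single hit as it might come back (values are made up).
sample_hit = {"_id": "1", "fields": {"projection": ['{"title": "t1", "price": 10}']}}
print(json.dumps(search_body))
print(extract_projection(sample_hit))
```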

Bulk Index from Remote Host

2015-04-20 Thread TB
We are planning to bulk insert about 10 GB of data; however, we are being forced to do this from a remote host. Is this good practice, and are there any potential issues I should watch out for? Any advice would be great.

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel
To focus the question better: how do I whitelist a specific class in the Groovy script inside a transform? On Tuesday, April 21, 2015 at 1:18:03 AM UTC+3, Itai Frenkel wrote: Hi, We are having a performance problem in which for each hit, elasticsearch parses the entire _source

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel
Hi Nik, when _source is true, the search completes in Elasticsearch very quickly. When _source is a list of fields, it is significantly slower. Itai On Tuesday, April 21, 2015 at 3:06:06 AM UTC+3, Nikolas Everett wrote: Have you profiled it and seen that reading the

Re: 2.0 ETA

2015-04-20 Thread Matt Weber
Thanks Adrien! On Mon, Apr 20, 2015 at 3:38 PM, Adrien Grand adr...@elastic.co wrote: Hi Matt, We have this meta issue which tracks what remains to be done before we release 2.0: https://github.com/elastic/elasticsearch/issues/9970. We plan to release as soon as we can but some of these

Re: Index Size and Replica Impact

2015-04-20 Thread Norberto Meijome
Replicas = 3 means 4 copies of your data (for each shard, 1 primary and 3 replicas). On 21/04/2015 7:54 am, TB txind...@gmail.com wrote: I have my indexes size @ 6 GB currently with replica set @ 1. I have 3 node cluster, in order to utilize the cluster , my understanding that i would have set
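
A small Python sketch of the arithmetic behind Norberto's answer, with one extra fact folded in: Elasticsearch never places two copies of the same shard on one node, so on the 3-node cluster from the question, replicas = 3 would leave one copy of every shard unassigned. The replica values and node count are just those from the thread.

```python
def copies_per_shard(replicas: int) -> int:
    """Each shard has one primary plus `replicas` replica copies."""
    return 1 + replicas

def unassigned_copies_per_shard(replicas: int, data_nodes: int) -> int:
    """Copies of the same shard never share a node, so extras stay unassigned."""
    return max(0, copies_per_shard(replicas) - data_nodes)

for r in (1, 2, 3):
    print(r, copies_per_shard(r), unassigned_copies_per_shard(r, data_nodes=3))
```

With 3 data nodes, replicas = 2 already puts a full copy of the index on every node.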

Re: How to diagnose slow queries every 10 minutes exactly?

2015-04-20 Thread David Pilato
Could you run a hot_threads API call when this happens? Anything in the logs about GC? BTW, 200 indices is a lot for 2 nodes. And how many shards/replicas do you have? Why do you need so many indices for 2M docs? David On 21 Apr 2015 at 01:16, Dave Reed infinit...@gmail.com wrote: I have a
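
To put "200 indices is a lot for 2 nodes" in numbers: Dave's message doesn't give shard settings, so this sketch assumes the 1.x defaults of 5 primary shards and 1 replica per index. Every one of those shards is a separate Lucene index with its own overhead.

```python
def total_shards(indices: int, primaries: int = 5, replicas: int = 1) -> int:
    """Shard copies the cluster has to host: every primary plus its replicas."""
    return indices * primaries * (1 + replicas)

def shards_per_node(indices: int, nodes: int, primaries: int = 5, replicas: int = 1) -> float:
    """Average number of shard copies each node carries."""
    return total_shards(indices, primaries, replicas) / nodes

def docs_per_primary(docs: int, indices: int, primaries: int = 5) -> float:
    """Average number of documents that end up in each primary shard."""
    return docs / (indices * primaries)

print(total_shards(200))                  # 2000
print(shards_per_node(200, 2))            # 1000.0
print(docs_per_primary(2_100_000, 200))   # 2100.0
```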

Re: Bulk Index from Remote Host

2015-04-20 Thread David Pilato
That's fine, but you need to split your bulk into smaller bulk requests. Don't send a 10 GB bulk in one call! :) David On 21 Apr 2015 at 00:40, TB txind...@gmail.com wrote: We are planning to bulk insert about 10 Gig data ,however we are being forced to do this from a remote host. Is
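
A minimal Python sketch of splitting one large load into many smaller _bulk request bodies. The 1,000-document and 5 MB caps and the index/type names are illustrative assumptions to tune, not values recommended in the thread; sending each body (an HTTP POST to /_bulk) is left out.

```python
import json
from typing import Iterable, Iterator

def bulk_bodies(docs: Iterable[dict], index: str, doc_type: str,
                max_docs: int = 1000, max_bytes: int = 5 * 1024 * 1024) -> Iterator[str]:
    """Yield newline-delimited _bulk bodies, each capped by doc count and byte size."""
    lines: list[str] = []
    size = 0
    count = 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        source = json.dumps(doc)
        pair_size = len(action.encode()) + len(source.encode()) + 2  # two newlines
        # Flush the current batch before it would exceed either cap.
        if lines and (count >= max_docs or size + pair_size > max_bytes):
            yield "\n".join(lines) + "\n"   # a _bulk body must end with a newline
            lines, size, count = [], 0, 0
        lines.extend([action, source])
        size += pair_size
        count += 1
    if lines:
        yield "\n".join(lines) + "\n"

# Example: 2,500 small documents become three requests of 1000, 1000 and 500 docs.
batches = list(bulk_bodies(({"n": i} for i in range(2500)), "myindex", "doc"))
print([b.count("\n") // 2 for b in batches])
```

Smaller requests also make retrying a failed batch from the remote host cheap.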

Re: Index Size and Replica Impact

2015-04-20 Thread David Pilato
You don't have to set replicas to 3. It depends on the number of shards you have for your index. If you are using the default (5), then you probably have something like this today:
Node 1: 4 shards
Node 2: 3 shards
Node 3: 3 shards
Each shard should be around 600 MB (if using all defaults). What
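
The layout and shard size in David's reply follow from simple arithmetic, sketched below in Python. It assumes the 6 GB TB reports already includes the replica copies (which David's ~600 MB figure implies) and spreads copies as evenly as possible; Elasticsearch's real allocator uses its own balancing heuristics, so actual placement can differ.

```python
def shard_layout(primaries: int, replicas: int, nodes: int) -> list[int]:
    """Spread all shard copies as evenly as possible across the nodes."""
    total = primaries * (1 + replicas)
    return [total // nodes + (1 if i < total % nodes else 0) for i in range(nodes)]

def shard_size_mb(total_index_mb: float, primaries: int, replicas: int) -> float:
    """Average size of one shard copy when the reported size includes replicas."""
    return total_index_mb / (primaries * (1 + replicas))

print(shard_layout(5, 1, 3))        # [4, 3, 3]
print(shard_size_mb(6000, 5, 1))    # 600.0
print(shard_layout(5, 2, 3))        # [5, 5, 5]
```

With replicas = 2, each of the 3 nodes would hold a full copy of the data (15 copies of about 600 MB each).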

Elasticsearch service often goes down or gets killed

2015-04-20 Thread Sébastien Vassaux
Hello! My web server is running Ubuntu 14.10 with Elasticsearch 1.5.0 and Java 1.7u55. For some reason, the Elasticsearch service often goes down, making my website unavailable to my users (using FOSElasticaBundle with Symfony). I am using systemctl to restart it

elasticsearch machine configuration

2015-04-20 Thread guoyiqincn
Hi folks: I want to know the recommended machine configuration for a single Elasticsearch node. My machine currently has 2 CPUs and 4 GB of memory.

Re: Access to specific kibana dashboards

2015-04-20 Thread Rubaiyat Islam Sadat
Hi Mark, Thanks mate. I have marked it as complete and will try this solution. On Sunday, April 19, 2015 at 4:44:34 AM UTC+2, Mark Walkom wrote: If you load kibana up you will see it gives you URLs like /dashboard/file/default.json or /dashboard/elasticsearch/dashboardname.json.

Re: jdbcRiver rebuilding after restart.

2015-04-20 Thread GWired
I can't look at the feeder setup now but I could in the future. Is my SQL statement incorrect? Should I be doing something differently? Does the river not utilize created_at and updated_at in this setup? I don't have a where clause because I thought using the column strategy it would take

Re: Evaluating Moving to Discourse - Feedback Wanted

2015-04-20 Thread Ivan Brusic
I believe the best developers are cynics. Never trust someone else's code, that API, the OS, etc :) What bothers me about Discourse is that email is an afterthought. They have not built out that feature yet? For me, and apparently many others, email is the first concern. The transition is

Re: creation_date in index setting

2015-04-20 Thread Prashant Agrawal
Hi all, I also need ES to return the index creation time, but when I fire a query like curl -XGET 'http://192.168.0.179:9200/16-04-2015-index/_settings' I don't get the index creation time; instead I get this response:

Re: creation_date in index setting

2015-04-20 Thread tao hiko
Thank you very much, Christian. On Monday, April 20, 2015 at 2:29:29 PM UTC+7, christian...@elasticsearch.com wrote: The creation date is given with millisecond precision. Take away the last 3 digits and your converter gives Fri, 06 Mar 2015 08:44:57 GMT for 1425631497. Christian On

Corrupted Index

2015-04-20 Thread Ranjith Venkatesan
Dear all, We are using ES 1.3.7 for our search application. Some time ago we upgraded from 0.90.5 to 1.3.7. We have 2 master nodes and 3 data nodes. We are getting a CorruptedIndexException when shard initialization happens. This is the second time we have faced such an issue since last

Re: creation_date in index setting

2015-04-20 Thread christian . dahlqvist
The creation date is given with millisecond precision. Take away the last 3 digits and your converter gives Fri, 06 Mar 2015 08:44:57 GMT for 1425631497. Christian On Monday, April 20, 2015 at 5:06:40 AM UTC+1, tao hiko wrote: I query setting information of index and found that have
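
The same conversion without trimming digits by hand: creation_date is epoch milliseconds, so dividing by 1000 gives seconds. In this Python sketch the millisecond part of the input is an illustrative 000, because the full value from the original message isn't shown in the snippet.

```python
from datetime import datetime, timezone

def creation_date_to_utc(millis: int) -> datetime:
    """Convert an epoch-milliseconds creation_date to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

# Seconds value from the thread; the trailing 000 milliseconds are illustrative.
dt = creation_date_to_utc(1425631497000)
print(dt.strftime("%a, %d %b %Y %H:%M:%S GMT"))   # Fri, 06 Mar 2015 08:44:57 GMT
```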

Cannot read from Elasticsearch using Spark SQL

2015-04-20 Thread michele crudele
I wrote this simple notebook in Scala using the Elasticsearch Spark adapter:
%AddJar file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-spark_2.10-2.1.0.BUILD-SNAPSHOT.jar
%AddJar file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.BUILD-SNAPSHOT.jar
//

Re: Cannot read from Elasticsearch using Spark SQL

2015-04-20 Thread Costin Leau
Beta3 works with Spark SQL 1.0 and 1.1. Spark SQL 1.2 was released after that and broke binary backwards compatibility; however, this has been fixed in the master/dev version [1]. Note that Spark SQL 1.3 was released as well and, again, broke backwards compatibility, this time significantly, hence why there

Creating Snapshot Repository on Windows cluster

2015-04-20 Thread Sam Judson
Hi, I'm having some trouble creating a snapshot repository on a cluster running on Windows.
PUT _snapshot/main_backup
{
  "type": "fs",
  "settings": {
    "location": "gbr-t-ess-003\\Snapshots\\backup2\\",
    "compress": true
  }
}
The above fails with a BlobStoreException: Failed to create
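
One detail worth checking with Windows paths: in JSON each `\\` decodes to a single backslash, so the location as it appears in the message would reach Elasticsearch as `gbr-t-ess-003\Snapshots\backup2\`, without the leading `\\` of a UNC share path. The archived snippet has lost its quotes, so whether that is what was actually sent can't be told from here. This Python sketch lets a JSON encoder do the escaping; the UNC form of the path is an assumption.

```python
import json

def fs_repository_body(location: str, compress: bool = True) -> str:
    """Serialize an "fs" repository body; json.dumps doubles every backslash."""
    return json.dumps({"type": "fs",
                       "settings": {"location": location, "compress": compress}})

# Hypothetical UNC path to the share, written as a raw string: \\server\share\folder
unc = r"\\gbr-t-ess-003\Snapshots\backup2"
body = fs_repository_body(unc)
print(body)

# The location exactly as it appears in the message decodes WITHOUT a leading \\ :
as_posted = json.loads(r'"gbr-t-ess-003\\Snapshots\\backup2\\"')
print(as_posted)
```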

KIBANA-4 Flow Chart

2015-04-20 Thread vijay kumar Ks
Hi everyone, I am making a flow chart for Kibana 4, which I am using. My problem is that the Kibana home page loads through JavaScript files, and I am not able to follow such long scripts. Can anyone help with a flow chart for Kibana 4 (or the flow of Kibana through its scripts and HTML)? Thanks in advance.

Re: Distribute Search Results across a specific field / property

2015-04-20 Thread mark
I have a pull request in the works that adds an option for maintaining diversity in results: https://github.com/elastic/elasticsearch/pull/10221 This is mainly for the purposes of sample-based aggregations but if used with the top_hits aggregation it might give you some of what you need. Cheers

MongoDB river not copying all of the data from MongoDB to ES

2015-04-20 Thread Ramdev Wudali
Hi: I have been successful at creating a river between a MongoDB database and an Elasticsearch instance. The specific MongoDB database and collection have 8M+ documents. However, when the river is set up and running, less than half the number of docs are

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread Don Pich
Also, sanity check:
root@logstash:/var/log/logstash# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source

Re: creation_date in index setting

2015-04-20 Thread Colin Goodheart-Smithe
Prashant, What version of Elasticsearch are you using? The index creation date was added to the index settings API in version 1.4.0 and will only show for indices created with that version or later (see https://github.com/elastic/elasticsearch/pull/7218). Colin On Monday, April 20, 2015 at

find missing documents in an index

2015-04-20 Thread seallison
Is there a way for Elasticsearch to tell me documents that are NOT in an index given a set of criteria? I have a field in my documents that contains a unique numerical id. There are some ids that are missing from documents in the index and I want to find those ids. For example: { product_id:
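
A search can only return documents that exist, so the gaps themselves have to be computed client-side. A minimal Python sketch, assuming the existing product_id values have already been pulled back (for example by scrolling over the index) and the expected ids form a known range; the sample ids are made up.

```python
from typing import Iterable

def missing_ids(found_ids: Iterable[int], expected_ids: Iterable[int]) -> list[int]:
    """Ids that should exist but were not found in the index, in ascending order."""
    return sorted(set(expected_ids) - set(found_ids))

# Hypothetical product_id values collected from the index.
found = [1, 2, 4, 5, 8]
print(missing_ids(found, range(1, 9)))   # [3, 6, 7]
```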

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread David Pilato
Having unassigned shards is perfectly fine on a one-node cluster. The fact that your cluster was yellow does not mean it was not behaving correctly. -- David Pilato - Developer | Evangelist elastic.co @dadoonet https://twitter.com/dadoonet | @elasticsearchfr

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread Don Pich
Thanks for that info. Again, training wheels... :-) So below is my logstash config. If I do a tcpdump on port 5044, I see all of my forwarders communicating with the logstash server. However, if I do a tcpdump on port 9300, I do not see any traffic. This leads me to believe that I have a

Re: How to configure max file descriptors on windows OS?

2015-04-20 Thread Xudong You
Thanks Mark! On Friday, April 17, 2015 at 6:22:24 AM UTC+8, Mark Walkom wrote: -1 means unbound, ie unlimited. On 16 April 2015 at 20:54, Xudong You xudon...@gmail.com wrote: Anyone knows how to change the max_file_descriptors on windows? I built ES cluster on Windows and got

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread David Pilato
Might be. But you should ask this on the Logstash mailing list. I think Elasticsearch is working fine here, as you did not see any trouble in the logs. That said, I'd use:
elasticsearch {
  protocol => http
  host => localhost
}
That is, using the REST port (9200). You can also add this

Distribute Search Results across a specific field / property

2015-04-20 Thread Frederik Lipfert
Hi Guys, I am using ES to build out the search for an online store. The operator would like the results returned in a way that showcases the variety of manufacturers he offers. So instead of returning results ordered by score, he would like there to be one result from each store on each
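
One client-side way to approximate this (not an Elasticsearch feature; the pull request mentioned earlier in this digest is a server-side approach still in progress) is to fetch a page of hits by score and re-order it round-robin by manufacturer: the best hit of each manufacturer first, then the second best, and so on. This Python sketch assumes each hit carries a `manufacturer` field and that hits arrive sorted best-first; it only re-orders what was fetched, so paging across requests is left to the caller.

```python
from itertools import zip_longest

def interleave_by(hits: list[dict], key: str) -> list[dict]:
    """Round-robin over groups of `key`, keeping score order inside each group."""
    groups: dict[str, list[dict]] = {}
    for hit in hits:                                  # assumed sorted by score, best first
        groups.setdefault(hit[key], []).append(hit)   # groups keep first-seen order
    gap = object()                                    # filler for exhausted groups
    return [h for rnd in zip_longest(*groups.values(), fillvalue=gap)
            for h in rnd if h is not gap]

# Hypothetical hits, already ordered by score.
hits = [{"id": 1, "manufacturer": "A"}, {"id": 2, "manufacturer": "A"},
        {"id": 3, "manufacturer": "B"}, {"id": 4, "manufacturer": "C"},
        {"id": 5, "manufacturer": "B"}]
print([h["id"] for h in interleave_by(hits, "manufacturer")])   # [1, 3, 4, 2, 5]
```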

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread Don Pich
Thanks David. I will move over to the Logstash list, as I agree that is where the problem seems to be. I appreciate your help!! Don Pich | Jedi Master (aka System Administrator 2) | O: 701-952-5925 3320 Westrac Drive South, Suite A * Fargo, ND 58103 Facebook

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread Don Pich
Hello David, I found this online, and it made my cluster go 'green'. http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ I don't know for certain if that was 100% of the problem, but there are no longer any unassigned shards. root@logstash:/# curl -XGET
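
The linked article deals with split brain. In Elasticsearch 1.x the usual guard is `discovery.zen.minimum_master_nodes` in elasticsearch.yml, set to a quorum of the master-eligible nodes, floor(n / 2) + 1. A small Python sketch of that formula; the node counts are just examples.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1

for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))
```

For a single-node cluster the value is 1. With 2 master-eligible nodes it is 2, which means the cluster cannot elect a master if either node is lost; this is why 3 master-eligible nodes is the common minimum.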

SHIELD terms lookup filter : AuthorizationException BUG

2015-04-20 Thread Bert Vermeiren
Hi, Using:
* ElasticSearch 1.5.1
* SHIELD 1.2
Whenever I use a terms lookup filter in a search query, I get an UnAuthorizedException for the [__es_system_user] user, even though the actual user has 'admin' role privileges. This seems like a bug to me, where the terms filter does not have the

Re: jdbcRiver rebuilding after restart.

2015-04-20 Thread joergpra...@gmail.com
The column strategy is a community effort, it can manipulate SQL statement where clauses with timestamp filter. I do not have enough knowledge about column strategy. You are correct, at node restart, a river does not know from where to restart. There is no method to resolve this within river

Re: creation_date in index setting

2015-04-20 Thread Prashant Agrawal
We are using version 1.3.0. On Apr 20, 2015 7:38 PM, Colin Goodheart-Smithe-2 [via Elasticsearch Users] ml-node+s115913n4073856...@n3.nabble.com wrote: Prashant, What version of Elasticsearch are you using? The index creation date added to the index settings API in version 1.4.0 and will

Cannot specify a query in the target index and through es.query when working with ES, Wikipedia River and Hive

2015-04-20 Thread Gordon
Hi, I've largely got everything setup to integrate ES and Hive. However, when I execute a query against the table wikitable as defined below, I get the error *Cannot specify a query in the target index and through es.query* Versions are ES Hive integration, 2.1.0.Beta3; ES, 1.4.4; and, I'm

Elasticsearch puppet module's problem

2015-04-20 Thread Sergey Zemlyanoy
Dear support, I'm trying to set up ES using this module: https://github.com/elastic/puppet-elasticsearch/blob/master/README.md For the sake of testing, I spin up a new VM and try to apply the default module to it:
node somenode {
  include elasticsearch
}
It ends up with the rpm package installed but absent

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread Don Pich
Hey Christian, 8 gigs of RAM, -Xms6g -Xmx6g

Search Scroll issue

2015-04-20 Thread Shawn Feldman
We are using scroll to do paging. We are encountering an issue where the last result from the initial search appears as the first result in our scroll request, so hits[length-1] == nextPageHits[0]. This only seems to occur after we do a large series of writes and searches. Initially it

How to detect changes in a database and automatically add new rows to an Elasticsearch index

2015-04-20 Thread snosek
What I've already done: I connected my HBase with Elasticsearch via this tutorial: http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html And I get an index with the HBase table content, but after adding a new row to HBase, it is not automatically added to the Elasticsearch index. I tried

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread christian . dahlqvist
Hi, That sounds like a very large number of shards for a node that size, and this is most likely the source of your problems. Each shard in Elasticsearch corresponds to a Lucene instance and carries with it a certain amount of overhead. You therefore do not want your shards to be too small.

Re: Elasticsearch issue with some indices not populating data

2015-04-20 Thread christian . dahlqvist
Hi, Having read through the thread it sounds like your configuration has been working in the past. Is that correct? If this is the case I would reiterate David's initial questions about your node's RAM and heap size as the number of shards look quite large for a single node. Could you please

Re: shingle filter for sub phrase matching

2015-04-20 Thread brian
Did you ever figure this out? I have the same exact issue but using different words. On Wednesday, July 23, 2014 at 10:37:03 AM UTC-4, Nick Tackes wrote: I have created a gist with an analyzer that uses filter shingle in attempt to match sub phrases. For instance I have entries in the
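
For anyone debugging sub-phrase matching, it helps to see which tokens a shingle filter produces. This Python sketch roughly mirrors the shingle token filter's output order (at each position, the unigram and then the shingles starting there). Its defaults match the filter's documented defaults (min_shingle_size 2, max_shingle_size 2, output_unigrams true). It ignores position increments and filler tokens, so treat it as an illustration rather than the real analyzer.

```python
def shingles(tokens: list[str], min_size: int = 2, max_size: int = 2,
             output_unigrams: bool = True) -> list[str]:
    """Word n-grams (shingles) for each start position, space-separated."""
    out: list[str] = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

print(shingles(["new", "york", "city"], max_size=3))
# ['new', 'new york', 'new york city', 'york', 'york city', 'city']
```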

Horizontal Bar Chart in Kibana

2015-04-20 Thread Vijay Nagendrarao
Hi, I need to implement a horizontal bar chart in Kibana 4 and need help with it. Please let me know. Thanks, Vijay.C.N

Query boost values available in script_score?

2015-04-20 Thread Kevin Reilly
Hi. Are query boost values available in script_score? I read the documentation without success, but perhaps I overlooked something. Thanks in advance.

Shards do not get distributed across the cluster

2015-04-20 Thread horst knete
Hey guys, I am currently struggling with shard allocation in my ES cluster. The problem is that the index [logstash-2015.04.13], with 12 shards, lies on the node Storage1. I want the index to be evenly distributed to the 2 other storage nodes; the node with the SSDs in it should be