Re: Using serialized doc_value instead of _source to improve read latency
Have you profiled it and seen that reading the source is actually the slow part? hot_threads can lie here, so I'd go with a profiler or just SIGQUIT or something. I've got some reasonably big documents and generally don't see that as a problem, even under decent load. I could see an argument for a second source field with the long stuff removed, if you see that the JSON decode or the disk read of the source is really slow - but transform doesn't do that.

Nik

On Mon, Apr 20, 2015 at 7:57 PM, Itai Frenkel itaifren...@live.com wrote: [...]
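For readers who haven't met it, the transform feature Nik mentions is a per-type mapping option in the 1.x series. A minimal sketch, with made-up index, type, and field names; note that the transform rewrites the document that gets indexed, while the stored _source is returned untransformed - which is the limitation Nik points out:

    curl -XPUT 'localhost:9200/myindex' -d '{
      "mappings": {
        "mytype": {
          "transform": {
            "lang": "groovy",
            "script": "ctx._source[\"summary\"] = ctx._source[\"a\"] + \" \" + ctx._source[\"b\"]"
          },
          "properties": {
            "summary": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }'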
Index Size and Replica Impact
My index size is currently about 6 GB with replicas set to 1. I have a 3-node cluster, and my understanding is that in order to utilize the whole cluster I would have to set the replica count to 3. If I do that, would my index grow to more than 6 GB on each node?
Using serialized doc_value instead of _source to improve read latency
Hi, We are having a performance problem in which, for each hit, Elasticsearch parses the entire _source and then generates a new JSON document containing only the requested _source fields. To overcome this, we would like to use a mapping transform script that serializes the requested query fields (which are known in advance) into a doc_value. Does that make sense? The actual problem with the transform script is a SecurityException that does not allow using any JSON serialization mechanism. A binary serialization would also be OK. Itai
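For context, this is the request shape in question: asking for a subset of _source makes Elasticsearch parse each hit's stored source and re-serialize only the requested fields (index and field names below are hypothetical):

    curl -XGET 'localhost:9200/myindex/_search' -d '{
      "_source": ["title", "summary"],
      "query": { "match_all": {} }
    }'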
Re: Using serialized doc_value instead of _source to improve read latency
Itamar,

1. The _source field includes many fields that are only being indexed and many fields that are only needed as part of the query result; _source includes them both. The projection from _source for the query result is too CPU-intensive to do at search time for each result, especially if the size is big.

2. I agree that adding another NoSQL store could solve this problem; however, it is currently out of scope, as it would require syncing data with another data store.

3. Wouldn't a big stored field bloat the Lucene index size? Even if not, aren't non_analyzed fields destined to be (or already) doc_values?

On Tuesday, April 21, 2015 at 1:36:20 AM UTC+3, Itamar Syn-Hershko wrote: [...]
Re: Using serialized doc_value instead of _source to improve read latency
This is how _source works. doc_values don't make sense in this regard - what you are looking for is using stored fields and having the transform script write to those. Loading stored fields (even one field per hit) may be slower than loading and parsing _source, though. I'd just put this logic in the indexer instead; it will definitely help with other things as well, such as nasty huge mappings. Alternatively, find a way to avoid IO completely. How about using ES for search and something like Riak for loading the actual data, if IO costs are so noticeable?

-- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant, Lucene.NET committer and PMC member

On Mon, Apr 20, 2015 at 11:18 PM, Itai Frenkel itaifren...@live.com wrote: [...]
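A minimal sketch of the stored-field route Itamar describes, assuming a hypothetical payload field: mark it as stored (and not indexed) in the mapping, then request it via fields instead of _source:

    curl -XPUT 'localhost:9200/myindex/_mapping/mytype' -d '{
      "mytype": {
        "properties": {
          "payload": { "type": "string", "index": "no", "store": true }
        }
      }
    }'

    curl -XGET 'localhost:9200/myindex/_search' -d '{
      "fields": ["payload"],
      "query": { "match_all": {} }
    }'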
Re: Using serialized doc_value instead of _source to improve read latency
A quick check shows there is no significant performance gain between a doc_value and a stored field that is not a doc_value. I suppose warm-up and file-system caching issues are at play. I do not have that field in the source, since the ETL process does not generate it at this point. The ETL could be fixed so that it generates the required field. However, even then I would still prefer doc_values over _source, since I do not need _source at all. You are right to assume that reading the entire source, parsing it, and returning only one field would be fast (I suspect the CPU time is in the JSON generator and not the parser, but confirming that requires more work).

On Tuesday, April 21, 2015 at 2:25:22 AM UTC+3, Itamar Syn-Hershko wrote: [...]
Re: Using serialized doc_value instead of _source to improve read latency
Also - does fielddata: { loading: eager } make sense with doc_values in this use case? Would that combination be supported in the future?

On Tuesday, April 21, 2015 at 2:14:03 AM UTC+3, Itai Frenkel wrote: [...]
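For reference, the combination being asked about would look like this in a 1.x mapping (field name hypothetical); whether eager loading adds anything when the values already live on disk as doc_values is exactly the open question here:

    curl -XPUT 'localhost:9200/myindex/_mapping/mytype' -d '{
      "mytype": {
        "properties": {
          "code": {
            "type": "string",
            "index": "not_analyzed",
            "doc_values": true,
            "fielddata": { "loading": "eager" }
          }
        }
      }
    }'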
2.0 ETA
Is there an ETA for 2.0? -- Thanks, Matt Weber
How to diagnose slow queries every 10 minutes exactly?
I have a 2-node cluster running on some beefy machines: 12g and 16g of heap space. About 2.1 million documents, each relatively small in size, spread across 200 or so indexes. The refresh interval is 0.5s (while I don't need realtime, I do need relatively quick refreshes). Documents are continuously modified by the app, so reindex requests trickle in constantly. By trickle I mean maybe a dozen a minute. All index requests are made with _bulk, although a lot of the time there's only 1 in the list.

Searches are very fast -- normally taking 50ms or less. But oddly, exactly every 10 minutes, searches slow down for a moment. The exact same query that normally takes 50ms takes 9000ms, for example. Any other queries, regardless of what they are, also take multiple seconds to complete. Once this moment passes, search queries return to normal. I have a tester I wrote that continuously posts the same query and logs the results, which is how I discovered this pattern. Here's an excerpt. Notice that query time is great at 3:49:10, then at :11 things stop for 10 seconds. At :21 the queued-up searches finally come through. The numbers reported are the took field from the ES search response. Then things resume as normal. This is true no matter which node I run the search against. This pattern repeats like this every 10 minutes, to the second, for days now.

    3:49:09, 47
    3:49:09, 63
    3:49:10, 31
    3:49:10, 47
    3:49:11, 47
    3:49:11, 62
    3:49:21, 8456
    3:49:21, 5460
    3:49:21, 7457
    3:49:21, 4509
    3:49:21, 3510
    3:49:21, 515
    3:49:21, 1498
    3:49:21, 2496
    3:49:22, 2465
    3:49:22, 2636
    3:49:22, 6506
    3:49:22, 7504
    3:49:22, 9501
    3:49:22, 4509
    3:49:22, 1638
    3:49:22, 6646
    3:49:22, 9641
    3:49:22, 655
    3:49:22, 3667
    3:49:22, 31
    3:49:22, 78
    3:49:22, 47
    3:49:23, 47
    3:49:23, 47
    3:49:24, 93

I've ruled out any obvious periodic process running on either node in the cluster. It's pretty clean. Disk I/O, CPU, RAM, etc. stay pretty consistent and flat, even during one of these blips. These are beefy machines like I said, so there's plenty of CPU and RAM available. Any advice on how I can figure out what ES is waiting for would be helpful. Is there any process ES runs every 10 minutes by default? Thanks! -Dave
Re: Using serialized doc_value instead of _source to improve read latency
What if all those fields are collapsed into one, like you suggest, but that one field is projected out of _source (think non-indexed JSON in a string field)? Do you see a noticeable performance gain then? What if that field is set to be stored (and loaded using fields, not via _source)? What is the performance gain then? Fielddata, and the doc_values optimization on top of it, will not help you here; those data structures aren't used for sending data out, only for aggregations and sorting. Also, using fielddata would require indexing those fields, and it is apparent that you are not looking to do that.

-- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant, Lucene.NET committer and PMC member

On Tue, Apr 21, 2015 at 12:14 AM, Itai Frenkel itaifren...@live.com wrote: [...]
Bulk Index from Remote Host
We are planning to bulk insert about 10 GB of data; however, we are being forced to do this from a remote host. Is this a good practice? Are there any potential issues I should watch out for? Any advice would be great.
Re: Using serialized doc_value instead of _source to improve read latency
If I could focus the question better: how do I whitelist a specific class in the Groovy script inside transform?

On Tuesday, April 21, 2015 at 1:18:03 AM UTC+3, Itai Frenkel wrote: [...]
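In the 1.3/1.4 series the Groovy sandbox was configured through node settings in elasticsearch.yml. The sketch below is an assumption based on the sandbox settings documented for that series - verify the exact setting names against your version before relying on them:

    # Assumed setting names from the 1.3/1.4 Groovy sandbox - verify first.
    # Add specific classes to the sandbox whitelist (comma-separated),
    # e.g. a JSON builder class the transform script needs:
    script.groovy.sandbox.class_whitelist: groovy.json.JsonOutput
    # Or, in a trusted environment only, disable the sandbox entirely:
    script.groovy.sandbox.enabled: false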
Re: Using serialized doc_value instead of _source to improve read latency
Hi Nik, when _source: true, the time it takes for the search to complete in Elasticsearch is very short. When _source is a list of fields, it is significantly slower. Itai

On Tuesday, April 21, 2015 at 3:06:06 AM UTC+3, Nikolas Everett wrote: [...]
Re: 2.0 ETA
Thanks Adrien!

On Mon, Apr 20, 2015 at 3:38 PM, Adrien Grand adr...@elastic.co wrote: Hi Matt, We have this meta issue which tracks what remains to be done before we release 2.0: https://github.com/elastic/elasticsearch/issues/9970. We plan to release as soon as we can, but some of these issues are quite challenging, so it's hard to give you an ETA. It should be a matter of months, but I can't tell how many yet. However, even if the GA release might still take time, there will be beta releases before then as we make progress through this checklist. Sorry if my answer does not give as much information as you hoped; we are all looking forward to this release, and the items on this checklist are very high priorities! -- Adrien

On Mon, Apr 20, 2015 at 10:55 PM, Matt Weber m...@mattweber.org wrote: [...]
Re: Index Size and Replica Impact
Replica = 3 means 4 copies of your data (for each shard, 1 primary and 3 replicas).

On 21/04/2015 7:54 am, TB txind...@gmail.com wrote: [...]
Re: How to diagnose slow queries every 10 minutes exactly?
Could you run a hot_threads API call when this happens? Anything in the logs about GC? BTW, 200 indices is a lot for 2 nodes. How many shards/replicas do you have? Why do you need so many indices for 2 million docs? David

On 21 Apr 2015 at 01:16, Dave Reed infinit...@gmail.com wrote: [...]
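The call David refers to; capturing it during one of the blips (and once during a quiet period, for comparison) is the quickest way to see what the nodes are actually busy with:

    # Busiest threads on every node, taken while the slowdown is happening
    curl -s 'localhost:9200/_nodes/hot_threads?threads=10'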
Re: Bulk Index from Remote Host
That's fine, but you need to split your bulk into smaller bulk requests. Don't send a 10 GB bulk in one call! :) David

On 21 Apr 2015 at 00:40, TB txind...@gmail.com wrote: [...]
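A minimal sketch of that batching, assuming a newline-delimited bulk file (the file name and chunk size are made up; a few MB or a few thousand actions per request is a common starting point):

    # Index actions are typically line pairs, so keep the split count even
    split -l 2000 bulk-payload.json chunk-
    for f in chunk-*; do
      # --data-binary preserves the newlines the _bulk API requires
      curl -s -XPOST 'localhost:9200/_bulk' --data-binary "@$f"
    done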
Re: Index Size and Replica Impact
You don't have to set replicas to 3. It depends on the number of shards you have for your index. If you are using the default (5), then you probably have today something like:

Node 1: 4 shards
Node 2: 3 shards
Node 3: 3 shards

Each shard should be around 600 MB in size (if using all defaults). What are your exact index settings today? David

On 20 Apr 2015 at 23:54, TB txind...@gmail.com wrote: [...]
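For completeness, number_of_replicas is a dynamic setting, so it can be changed on a live index; with 3 nodes, replicas set to 2 already puts a full copy of the data on every node (index name hypothetical):

    curl -XPUT 'localhost:9200/myindex/_settings' -d '{
      "index": { "number_of_replicas": 2 }
    }'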
Elasticsearch service often goes down or gets killed
Hello! My webserver is running Ubuntu 14.10 with Elasticsearch 1.5.0 and Java 1.7u55. For some reason, the Elasticsearch service often goes down, resulting in my website no longer being available to my users (I am using FOSElasticaBundle with Symfony). I am using systemctl to restart it automatically, but I would prefer a proper fix once and for all. I feel the logs I have are not descriptive enough, and being pretty new to managing a server, I need some help. Can someone help me figure out the reason for this failure? Which files should I post here to better understand the issue? Thanks!

My systemctl status gives:

    elasticsearch.service - ElasticSearch
    Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled)
    Active: active (running) since Mon 2015-04-20 12:04:24 CEST; 1h 56min ago  <- restarted 1h56 ago; why did it fail in the first place?
    Main PID: 9120 (java)
    CGroup: /system.slice/elasticsearch.service
            └─9120 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingO...

In my journalctl, I have:

    Apr 18 18:56:19 xx.ovh.net sshd[29397]: error: open /dev/tty failed - could not set controlling tty: Permission denied
    Apr 20 13:52:45 xx.ovh.net sshd[9764]: error: open /dev/tty failed - could not set controlling tty: Permission denied
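A few diagnostics worth posting, assuming the default Ubuntu/.deb log locations; with -Xmx1g on a small machine, the kernel OOM killer is a common culprit for a service that silently dies and gets restarted:

    # Journal entries for the service itself, not the whole journal
    journalctl -u elasticsearch.service --since yesterday
    # Elasticsearch's own log (path assumes the default .deb layout)
    tail -n 200 /var/log/elasticsearch/elasticsearch.log
    # Did the kernel OOM killer terminate the JVM?
    dmesg | grep -iE 'killed process|out of memory'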
elasticsearch machine configuration
Hi folks: I would like to know the recommended machine configuration for a single Elasticsearch node. My current machine has 2 CPUs and 4 GB of memory.
Re: Access to specific kibana dashboards
Hi Mark, Thanks mate. I have marked it as complete and will try this solution.

On Sunday, April 19, 2015 at 4:44:34 AM UTC+2, Mark Walkom wrote: If you load Kibana up you will see it gives you URLs like /dashboard/file/default.json or /dashboard/elasticsearch/dashboardname.json. Using those paths you can then limit access.

On 17 April 2015 at 15:54, Rubaiyat Islam Sadat rubaiyati...@gmail.com wrote: Thanks Mark for your kind reply. Would you be a bit more specific, as I am a newbie? I am sorry if I had not been clear enough about what I want to achieve. As far as I know, Apache-level access control is based on a relative static path/URL; it won't know the details of how Kibana works. I would like to restrict access to 'some' of the Kibana dashboards, not all. Is it possible to achieve this by configuring the Kibana side? If on the Apache side, do I have to restrict the specific URLs of the Kibana dashboards to the specific group of people, e.g. as follows?

    <Location /someDir>
        Order deny,allow
        Deny from all
        Allow from 192.168.
        Allow from 104.113.
    </Location>

    <Location /anotherDir>
        Order deny,allow
        Deny from all
        Allow from 192.168.
        Allow from 104.113.
    </Location>

In this case, for example, if I want to restrict a URL like http://myESHost:9200/_plugin/kopf/#/!/cluster, what do I have to put after <Location /???>? Sorry if I have asked a very naive question. Thanks again for your time. Cheers! Ruby

On Friday, April 17, 2015 at 12:23:50 AM UTC+2, Mark Walkom wrote: You could do this with apache/nginx ACLs, as KB3 simply loads a path, either a file from the server's FS or from ES. If you load it up you will see it in the URL.

On 16 April 2015 at 21:58, Rubaiyat Islam Sadat rubaiyati...@gmail.com wrote: Hi all, As a complete newbie here, I am going to ask you a question which you might find naive (or stupid!). I have a scenario where I would like to restrict access from specific locations (say, IP addresses) to 'specific' dashboards in Kibana. As far as I know, Apache-level access control is based on a relative static path/URL; it won't know the details of how Kibana works. Is there any way to control which users can load which dashboards? Or maybe I'm wrong and there is a way to do that. Your suggestions would be really helpful. I am using Kibana 3 and I am not in a position to use Shield. Cheers! Ruby
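Building on Mark's pointer, a hedged Apache sketch for a single file-based dashboard (the path assumes Kibana 3's default layout, where file dashboards are fetched from app/dashboards/; the dashboard name is made up). Dashboards saved into Elasticsearch would instead need the ACL on the kibana-int index path that the browser queries:

    # Hypothetical: only these subnets may load this one dashboard file
    <LocationMatch "^/app/dashboards/secret\.json$">
        Order deny,allow
        Deny from all
        Allow from 192.168.
        Allow from 104.113.
    </LocationMatch>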
Re: jdbcRiver rebuilding after restart.
I can't look at the feeder setup now, but I could in the future. Is my SQL statement incorrect? Should I be doing something differently? Does the river not utilize created_at and updated_at in this setup? I don't have a WHERE clause because I thought the column strategy would take that into account. This is an example of what I see in SQL Server:

    SELECT id as _id, * FROM [MyDBName].[dbo].[MyTableName] WHERE ({fn TIMESTAMPDIFF(SQL_TSI_SECOND,@P0,createddate)} >= 0)

When I populate @P0 with a timestamp it seems to work fine; on a restart I'm guessing it doesn't know when to start. Is there any way I can check values in Elasticsearch within the column strategy, such as using MAX(CreatedDate), so that it can start there?
Re: Evaluating Moving to Discourse - Feedback Wanted
I believe the best developers are cynics. Never trust someone else's code, that API, the OS, etc. :) What bothers me about Discourse is that email is an afterthought. They have not built out that feature yet? For me, and apparently many others, email is the first concern. The transition is understandable if you want to transition from a closed system. The other reason, not enough fragmentation, is worrisome. If no one uses the logstash list, that is a problem with the site/documentation, not the mailing list itself. I cringe at the thought of an Elasticsearch forum with a dozen subforums. Ivan

On Apr 15, 2015 7:21 PM, Leslie Hawthorn leslie.hawth...@elastic.co wrote: On Wed, Apr 15, 2015 at 9:02 AM, Ivan Brusic i...@brusic.com wrote: I should clarify that I have no issues moving to Discourse, as long as instantaneous email interaction is preserved; I just wanted to point out that I see no issues with the mailing lists. Understood. The question is moot anyway since the change will happen regardless of our input. Actually, I'm maintaining our Forums pre-launch checklist, where there's a line item for don't move forward based on community feedback. I respectfully disagree with your assessment that the change will happen regardless of input from the community. We asked for feedback for a reason. :) I hope we can subscribe to Discourse mailing lists without needing an account. You'll need an account, but it's a one-time login to set up your preferences and then read/interact solely via email. Cheers, LH Cheers, Ivan

On Apr 13, 2015 7:13 PM, Leslie Hawthorn leslie.hawth...@elastic.co wrote: Thanks for your feedback, Ivan. There's no plan to remove threads from the forums, so information would always be archived there as well. Does that impact your thoughts on moving to Discourse? Folks, please keep the feedback coming! Cheers, LH

On Sat, Apr 11, 2015 at 12:09 AM, Ivan Brusic i...@brusic.com wrote: As one of the oldest and most frequent users (before my sabbatical) of the mailing list, I just wanted to say that I never had an issue with it. It works. As long as I can continue using only email, I am happy. For realtime communication, there is the IRC channel. I prefer the mailing list since everything is archived. Ivan

On Apr 2, 2015 5:36 PM, leslie.hawthorn leslie.hawth...@elastic.co wrote: Hello everyone, As we've begun to scale up development on three different open source projects, we've found Google Groups to be a difficult solution for dealing with all of our needs for community support. We've got multiple mailing lists going, which can be confusing for new folks trying to figure out where to go to ask a question. We've also found our lists are becoming noisy in the "good problem to have" kind of way. As we've seen more user adoption, and across such a wide variety of use cases, we're getting widely different types of questions asked. For example, I can imagine that folks not using our Python client would rather not be distracted with emails about it. There are also a few other strikes against Groups as a tool, such as the fact that it is no longer a supported product by Google, it provides no API hooks, and it is not available for users in China. We've evaluated several options and we're currently considering shuttering the elasticsearch-user and logstash-users Google Groups in favor of a Discourse forum.
You can read more about Discourse at http://www.discourse.org We feel Discourse will allow us to provide a better experience for all of our users for a few reasons: * More fine grained conversation topics = less noise and better targeted discussions. e.g. we can offer a forum for each language client, individual logstash plugin or for each city to plan user group meetings, etc. * Facilitates discussions that are not generally happening on list now, such as best practices by use case or tips from moving to development to production * Easier for folks who are purely end users - and less used to getting peer support on a mailing list - to get help when they need it Obviously, Discourse does not function the exact same way as a mailing list - however, email interaction with Discourse is supported and will continue to allow you to participate in discussions over email (though there are some small issues related to in-line replies. [0]) We’re working with the Discourse team now as part of evaluating this transition, and we know they’re working to resolve this particular issue. We’re also still determining how Discourse will handle our needs for both user and list archive migration, and we’ll know the precise details of how that would work soon. (We’ll share when we have them.) The final goal would be to move Google Groups to read-only archives, and cut over to Discourse completely for community support discussions. We’re looking at making the cut over in ~30 days from today, but
Re: creation_date in index setting
Hi All, I also require the indexing time to be returned by ES, but when I fire the query curl -XGET 'http://192.168.0.179:9200/16-04-2015-index/_settings' I am not able to get the index creation time and get the response: {"16-04-2015-index":{"settings":{"index":{"uuid":"rHmX564PSnuI8cye4GxA1g","number_of_replicas":"0","number_of_shards":"5","version":{"created":"1030099"}}}}} Please let me know how I can get the index creation time for the same. ~Prashant -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/creation-date-in-index-setteing-tp4073837p4073853.html Sent from the Elasticsearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1429536209334-4073853.post%40n3.nabble.com. For more options, visit https://groups.google.com/d/optout.
Re: creation_date in index setting
Thank you very much Christian. On Monday, April 20, 2015 at 2:29:29 PM UTC+7, christian...@elasticsearch.com wrote: The creation date is given with millisecond precision. Take away the last 3 digits and your converter gives Fri, 06 Mar 2015 08:44:57 GMT for 1425631497. Christian On Monday, April 20, 2015 at 5:06:40 AM UTC+1, tao hiko wrote: I queried the settings of an index and found that it has a creation_date field, but I cannot understand what the value is. Can you explain it to me? "settings": { "index": { "creation_date": 1425631497164, "uuid": "A9ZXMK7zTjSyB_oWpvf7fg", "analysis": { "analyzer": { "sensible_analyzer": { "type": "custom", "filter": [ "word_filter", "lowercase" ], "tokenizer": "keyword" } }, "filter": { "word_filter": { "split_on_numerics": false, "type": "word_delimiter", "preserve_original": true, "split_on_case_change": false, "generate_number_parts": false, "generate_word_parts": false } } }, "number_of_replicas": 1, "number_of_shards": 6, "version": { "created": 1040399 } } } Thank you, Hiko -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5d9e671b-ef21-45f7-999c-8d5dd979e47e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Corrupted Index
Dear all, We are using ES-1.3.7 for our search application. Sometime back we upgraded from 0.90.5 to 1.3.7. We have 2 master nodes and 3 data nodes. We are getting a CorruptIndexException when shard initialization happens. This is the second time we have faced such an issue since the upgrade. Last time, only one shard got corrupted, but now almost 15 to 20 shards got corrupted. Each shard has only ~500MB of data. Log trace: [2015-04-19 22:49:57,552][WARN ][cluster.action.shard ] [Node1] [138][3] received shard failed for [138][3], node[EkvXNBUOTcuEfWo4SG72bA], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[138][3] failed to fetch index version after copying it over]; nested: CorruptIndexException[[138][3] Corrupted index [corrupted_gb9JvBzdRQKqkhEeaXFEIA] caused by: CorruptIndexException[checksum failed (hardware problem?) : expected=637c1x actual=gavi2b resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@54916f2a)]]; ]] [2015-04-19 22:49:57,626][WARN ][cluster.action.shard ] [Node1] [138][3] received shard failed for [138][3], node[Q1eAQgNtSJ2BLlMevzRzcA], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[138][3] failed recovery]; nested: EngineCreationFailureException[[138][3] failed to open reader on writer]; nested: FileNotFoundException[No such file [_1gy_x.del]]; ]] Thanks in advance Ranjith Venkatesan -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6f1881d0-9f43-4a01-a9e3-9337a1486fed%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
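If you want to inspect a suspect shard directly, Lucene ships a CheckIndex tool that reports (and, with -fix, truncates away) broken segments. A minimal sketch only - the jar version and paths below are assumptions to adjust for your install; stop the node first and run it against a copy of the shard, since -fix permanently drops corrupted segments:

    # ES 1.3.x ships Lucene 4.9; point -cp at the lucene-core jar in the ES lib directory
    java -cp /usr/share/elasticsearch/lib/lucene-core-4.9.1.jar \
        org.apache.lucene.index.CheckIndex \
        /var/lib/elasticsearch/<cluster-name>/nodes/0/indices/138/3/index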
Re: creation_date in index setting
The creation date is given with millisecond precision. Take away the last 3 digits and your converter gives Fri, 06 Mar 2015 08:44:57 GMT for 1425631497. Christian On Monday, April 20, 2015 at 5:06:40 AM UTC+1, tao hiko wrote: I queried the settings of an index and found that it has a creation_date field, but I cannot understand what the value is. Can you explain it to me? "settings": { "index": { "creation_date": 1425631497164, "uuid": "A9ZXMK7zTjSyB_oWpvf7fg", "analysis": { "analyzer": { "sensible_analyzer": { "type": "custom", "filter": [ "word_filter", "lowercase" ], "tokenizer": "keyword" } }, "filter": { "word_filter": { "split_on_numerics": false, "type": "word_delimiter", "preserve_original": true, "split_on_case_change": false, "generate_number_parts": false, "generate_word_parts": false } } }, "number_of_replicas": 1, "number_of_shards": 6, "version": { "created": 1040399 } } } Thank you, Hiko -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4d4f9dea-57e9-4403-988a-6b3012af6a65%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
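A quick way to sanity-check the value from a shell (a minimal sketch; assumes GNU date): creation_date is epoch time in milliseconds, so drop the last three digits to get seconds before converting:

    date -u -d @1425631497
    # Fri Mar  6 08:44:57 UTC 2015   (i.e. 1425631497164 ms -> 1425631497 s)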
Cannot read from Elasticsearch using Spark SQL
I wrote this simple notebook in Scala using the Elasticsearch Spark adapter:

%AddJar file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-spark_2.10-2.1.0.BUILD-SNAPSHOT.jar
%AddJar file:///tools/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.BUILD-SNAPSHOT.jar

// Write something to the spark/docs index
import org.elasticsearch.spark._
val michelangelo = Map("artist" -> "Michelangelo", "bio" -> "Painter, sculptor, architect born in Florence in...")
val leonardo = Map("artist" -> "Leonardo", "bio" -> "Absolute genius; painter, inventor born in the little village of Vinci...")
sc.makeRDD(Seq(michelangelo, leonardo)).saveToEs("spark/docs")

// Search for painters through spark/docs
val painters = sc.esRDD("spark/docs", "?q=painter")
println("Number of painters in spark/docs: " + painters.count())
painters.collect().foreach(x => println("ID: " + x._1))

// Try to read using Spark SQL
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._
val sql = new SQLContext(sc)

// Here is where I get an exception
val docsSql = sql.esRDD("spark/docs")

Name: java.lang.NoSuchMethodError
Message: org.apache.spark.sql.catalyst.types.StructField.init(Ljava/lang/String;Lorg/apache/spark/sql/catalyst/types/DataType;Z)V
StackTrace:
org.elasticsearch.spark.sql.MappingUtils$.org$elasticsearch$spark$sql$MappingUtils$$convertField(MappingUtils.scala:75)
org.elasticsearch.spark.sql.MappingUtils$$anonfun$convertToStruct$1.apply(MappingUtils.scala:54)
org.elasticsearch.spark.sql.MappingUtils$$anonfun$convertToStruct$1.apply(MappingUtils.scala:54)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
org.elasticsearch.spark.sql.MappingUtils$.convertToStruct(MappingUtils.scala:54)
org.elasticsearch.spark.sql.MappingUtils$.discoverMapping(MappingUtils.scala:47)
org.elasticsearch.spark.sql.EsSparkSQL$.esRDD(EsSparkSQL.scala:27)
org.elasticsearch.spark.sql.EsSparkSQL$.esRDD(EsSparkSQL.scala:23)
org.elasticsearch.spark.sql.package$SQLContextFunctions.esRDD(package.scala:16)
[REPL-generated $iwC frames trimmed]

What am I doing wrong? Thanks - Michele -- You received this message because you are subscribed to the Google Groups elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/49a5c624-bda0-4835-98bc-915ec80d4fa3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Cannot read from Elasticsearch using Spark SQL
Beta3 works with Spark SQL 1.0 and 1.1. Spark SQL 1.2 was released after that and broke binary backwards compatibility; however, this has been fixed in the master/dev version [1]. Note that Spark SQL 1.3 was released as well and again broke backwards compatibility, this time significantly, hence why there are now two versions [2] - make sure to use the one appropriate for your Spark version. Cheers, [1] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/install.html#download-dev [2] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html#spark-sql-versions On 4/20/15 1:17 PM, michele crudele wrote: [...] -- Costin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit
Creating Snapshot Repository on Windows cluster
Hi, I'm having some trouble creating a snapshot repository on a cluster running on Windows.

PUT _snapshot/main_backup
{
  "type": "fs",
  "settings": {
    "location": "gbr-t-ess-003\\Snapshots\\backup2\\",
    "compress": true
  }
}

The above fails with a BlobStoreException: Failed to create directory. I've set the share to Everyone read/write access to hopefully get this working, but still no good. I've tried creating symbolic links on each machine and using a local directory path, but no luck either. Anyone got this working? Sam -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/070addde-3528-425d-ab43-f1c3e1399e93%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
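Two things that may be worth ruling out here (both are assumptions, not confirmed fixes; the share name is just the one from the message above). First, a UNC location normally needs the leading double backslash, i.e. \\gbr-t-ess-003\Snapshots\backup2, and each backslash doubles again inside a JSON string. Second, it is the Windows account running the Elasticsearch service on every node - not your own account - that needs read/write access, on both the share and the NTFS permissions:

    curl -XPUT 'localhost:9200/_snapshot/main_backup' -d '{
      "type": "fs",
      "settings": {
        "location": "\\\\gbr-t-ess-003\\Snapshots\\backup2",
        "compress": true
      }
    }'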
KIBANA-4 Flow Chart
Hi every one , i am doing flow chart for kibana4 which i am using. Now my doubt is since kibana home page loads using javascript files i am not able to follow such long scripts. Can any one help in doing a flow chart for kibana-4(or flow of kibana using scripts and html). thanks in advance -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/225e0d7f-11d4-4a88-b876-aeb7105b2714%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Distribute Search Results across a specific field / property
I have a pull request in the works that adds an option for maintaining diversity in results: https://github.com/elastic/elasticsearch/pull/10221 This is mainly for the purposes of sample-based aggregations, but if used with the top_hits aggregation it might give you some of what you need. Cheers Mark On Monday, April 20, 2015 at 3:18:49 PM UTC+1, Frederik Lipfert wrote: Hi Guys, I am using ES to build out the search for an online store. The operator would like to have the results returned in a way that showcases the variety of manufacturers he offers. So instead of returning them ordered by score, he would like there to be one result from each store on each page, at least on the first few pages. The issue is there is one manufacturer who makes 80% of the portfolio, so it always looks like there is only that one store. But if one could mix or distribute the results to showcase the variety, that would be pretty cool. I have seen a bunch of online stores do that somehow, although I do not know how. Is there a way to use ES to somehow do a query like: return 20 results and 2 of each store? Thanks for your help. Frederik -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/feaece21-85c0-405c-8eda-77d627f74bb1%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
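While that PR is pending, one way to approximate "20 results, at most 2 per store" on 1.3+ (a sketch only; the index name, query and store field are made up for illustration) is a terms aggregation with a top_hits sub-aggregation, interleaving the per-store hits client-side:

    curl -XGET 'localhost:9200/products/_search?pretty' -d '{
      "size": 0,
      "query": { "match": { "title": "truck cover" } },
      "aggs": {
        "per_store": {
          "terms": { "field": "store", "size": 10 },
          "aggs": {
            "top_2": { "top_hits": { "size": 2 } }
          }
        }
      }
    }'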
MongoDB river not copying all of the data from MongoDB to ES
Hi: I have been successful at creating a river between a MongoDB database and an Elasticsearch instance. The MongoDB collection in question has 8M+ documents. However, when the river is set up and running, less than 1/2 that number of docs are copied/transferred. I am using elasticsearch-river-mongodb-2.0.7 with Elasticsearch 1.4.4. Here is a sampling of the trace log messages from ES:

[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] Insert operation - id: 553148b8e4b09c4dd2407f92 - contains attachment: false
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] updateBulkRequest for id: [553148b8e4b09c4dd2407fa6], operation: [INSERT]
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] Operation: INSERT - index: twitter - type: one-pct-sane - routing: null - parent: null
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] Insert operation - id: 553148b8e4b09c4dd2407fa6 - contains attachment: false
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] updateBulkRequest for id: [553148bae4b09c4dd2407fd5], operation: [INSERT]
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] Operation: INSERT - index: twitter - type: one-pct-sane - routing: null - parent: null
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] Insert operation - id: 553148bae4b09c4dd2407fd5 - contains attachment: false
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] updateBulkRequest for id: [553148bae4b09c4dd2407ffe], operation: [INSERT]
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] Operation: INSERT - index: twitter - type: one-pct-sane - routing: null - parent: null
[2015-04-17 12:56:22,045][TRACE][org.elasticsearch.river.mongodb.Indexer] Insert operation - id: 553148bae4b09c4dd2407ffe - contains attachment: false
[2015-04-17 12:56:22,055][TRACE][org.elasticsearch.river.mongodb.MongoDBRiver] setLastTimestamp [one_pct_sane] [one-pct-sane.current] [Timestamp.BSON(ts={ $ts : 1429282637 , $inc : 2})]
[2015-04-17 12:56:22,095][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] afterBulk - bulk [57498] success [80 items] [59 ms] total [3952638]
[2015-04-17 12:56:22,217][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] bulkQueueSize [50] - queue [0] - availability [1]
[2015-04-17 12:56:22,217][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] beforeBulk - new bulk [57499] of items [49]
[2015-04-17 12:56:22,259][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] afterBulk - bulk [57499] success [49 items] [42 ms] total [3952687]
[2015-04-17 12:56:22,387][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] bulkQueueSize [50] - queue [0] - availability [1]
[2015-04-17 12:56:22,387][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] beforeBulk - new bulk [57500] of items [1]
[2015-04-17 12:56:22,389][TRACE][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] afterBulk - bulk [57500] success [1 items] [2 ms] total [3952688]
[2015-04-17 12:56:22,390][INFO ][cluster.metadata ] [Star-Dancer] [_river] update_mapping [one_pct_sane] (dynamic)
[2015-04-17 13:06:20,497][INFO ][cluster.metadata ] [Star-Dancer] [_river] update_mapping [one_pct_sane2] (dynamic)
[2015-04-17 13:06:20,513][INFO ][cluster.metadata ] [Star-Dancer] [_river] update_mapping [one_pct_sane2] (dynamic)
[2015-04-17 13:06:20,523][INFO ][cluster.metadata ] [Star-Dancer] [_river] update_mapping [one_pct_sane2] (dynamic)
[2015-04-17 13:06:22,394][INFO ][cluster.metadata ] [Star-Dancer] [_river] update_mapping [one_pct_sane2] (dynamic)

Amongst the several questions I have, these are some: 1. Does the river copy data based on what exists in the oplog? (How does the river use the oplog to get the data?) 2. There aren't any obvious errors being shown and documents do come in, but as I mentioned earlier, less than 1/2 the number of documents in MongoDB are being copied over. Why would that be? Thanks for any help/assistance Ramdev -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fb1f60d1-0204-4021-93a5-85263d939c8c%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
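On question 1: the MongoDB river follows the replica-set oplog for ongoing changes, and the oplog is a capped collection, so operations that rotate out of it before the river catches up can be missed silently. A quick way to check how far back your oplog reaches (a minimal sketch; assumes the mongo shell on the PATH and the default host/port of the primary):

    mongo --eval "db.printReplicationInfo()"
    # "configured oplog size" and "log length start to end" show the oplog window;
    # if the initial import takes longer than that window, inserts can be skipped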
Re: Elasticsearch issue with some indices not populating data
Also, sanity check:

root@logstash:/var/log/logstash# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
root@logstash:/var/log/logstash#

Don Pich | Jedi Master (aka System Administrator 2) | O: 701-952-5925 3320 Westrac Drive South, Suite A * Fargo, ND 58103 Facebook http://www.facebook.com/RealTruck | Youtube http://www.youtube.com/realtruckcom | Twitter http://twitter.com/realtruck | Google+ https://google.com/+Realtruck | Instagram http://instagram.com/realtruckcom | Linkedin http://www.linkedin.com/company/realtruck | Our Guiding Principles http://www.realtruck.com/our-guiding-principles/ “If it goes on a truck we got it, if it’s fun we do it” – RealTruck.com http://realtruck.com/ On Mon, Apr 20, 2015 at 9:38 AM, Don Pich dp...@realtruck.com wrote: Thanks for that info. Again, training wheels... :-) So below is my logstash config. [...]
Re: creation_date in index setting
Prashant, What version of Elasticsearch are you using? The index creation date was added to the index settings API in version 1.4.0 and will only show for indices created with that version or later (see https://github.com/elastic/elasticsearch/pull/7218). Colin On Monday, April 20, 2015 at 2:23:37 PM UTC+1, Prashy wrote: Hi All, I also require the indexing time to be returned by ES, but when I fire the query curl -XGET 'http://192.168.0.179:9200/16-04-2015-index/_settings' I am not able to get the index creation time and get the response: {"16-04-2015-index":{"settings":{"index":{"uuid":"rHmX564PSnuI8cye4GxA1g","number_of_replicas":"0","number_of_shards":"5","version":{"created":"1030099"}}}}} Please let me know how I can get the index creation time for the same. ~Prashant -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/creation-date-in-index-setteing-tp4073837p4073853.html Sent from the Elasticsearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b12a4019-c5ab-4e0f-aaec-0d3ab083da4d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
find missing documents in an index
Is there a way for Elasticsearch to tell me documents that are NOT in an index given a set of criteria? I have a field in my documents that contains a unique numerical id. There are some ids that are missing from documents in the index and I want to find those ids. For example: { "product_id": 1000 }, { "product_id": 1002 }, { "product_id": 1004 }, { "product_id": 1005 }, ... In this example I want to find that documents with a product_id of 1001 and 1003 are missing from the index. Is this something an aggregation could help me identify? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/50ba365a-02c8-40ce-8d9d-c49cedd7e2da%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
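One approach that may help (a sketch; the index name "products" is an assumption, product_id is taken from the example above): a histogram aggregation with interval 1 and min_doc_count set to 0 returns a bucket for every value in the observed range, so the buckets with doc_count 0 are exactly the missing ids:

    curl -XGET 'localhost:9200/products/_search?pretty' -d '{
      "size": 0,
      "aggs": {
        "ids": {
          "histogram": {
            "field": "product_id",
            "interval": 1,
            "min_doc_count": 0
          }
        }
      }
    }'
    # buckets with "doc_count": 0 (here 1001 and 1003) are the gaps

With a very large id range this produces a lot of buckets, so it may be more practical to run it over slices of the range.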
Re: Elasticsearch issue with some indices not populating data
Having unassigned shards is perfectly fine on a one node cluster. The fact that your cluster was yellow does not mean your cluster was not behaving correctly. -- David Pilato - Developer | Evangelist elastic.co @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs Le 20 avr. 2015 à 15:54, Don Pich dp...@realtruck.com a écrit : Hello David, I found this online and it made my cluster go 'green': http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ [...]
Re: Elasticsearch issue with some indices not populating data
Thanks for that info. Again, training wheels... :-) So below is my logstash config. If I do a tcpdump on port 5044, I see all of my forwarders communicating with the logstash server. However, if I do a tcpdump on port 9300, I do not see any traffic. This leads me to believe that I have a problem in my output.

input {
  # comes from logstash-forwarder; we send ALL formats and types through this
  # and control logType and logFormat on the client
  lumberjack {
    # The port to listen on
    port => 5044
    host => "192.168.1.72"
    # The paths to your ssl cert and key
    # new cert needed for latest v of lumberjack-pusher
    ssl_certificate => "/opt/logstash-1.4.2/ssl/certs/lumberjack.crt"
    ssl_key => "/opt/logstash-1.4.2/ssl/private/lumberjack.key"
  }
  tcp {
    # Remember with nxlog we're automatically converting our windows xml to JSON
    ssl_cert => "/opt/logstash-1.4.2/ssl/certs/logstash-forwarder.crt"
    ssl_key => "/opt/logstash-1.4.2/ssl/private/logstash-forwarder.key"
    ssl_enable => true
    debug => true
    type => "windowsEventLog"
    port => 3515
    codec => line
    add_field => { "logType" => "windowsEventLog" }
  }
  tcp {
    # Remember with nxlog we're automatically converting our windows xml to JSON
    # used for NFSServer which apparently cannot connect via SSL :(
    type => "windowsEventLog"
    port => 3516
    codec => line
    add_field => { "logType" => "windowsEventLog" }
  }
}

filter {
  if [logFormat] == "nginxLog" {
    mutate { add_field => [ "receivedAt", "%{@timestamp}" ] } # preserve when we received this
    grok {
      break_on_match => false
      match => [ "message", "%{IP:visitor_ip}\|[^|]+\|%{TIMESTAMP_ISO8601:entryDateTime}\|%{URIPATH:url}%{URIPARAM:query_string}?\|%{INT:http_response}\|%{INT:response_length}\|(?<http_referrer>[^|]+)\|(?<user_agent>[^|]+)\|%{BASE16FLOAT:request_time}\|%{BASE16FLOAT:upstream_response_time}" ]
      match => [ "url", "\.(?<extension>(?:.(?!\.))+)$" ]
    }
    date {
      match => [ "entryDateTime", "ISO8601" ]
      remove_field => [ "entryDateTime" ]
    }
  } else if [logFormat] == "exim4" {
    mutate { add_field => [ "receivedAt", "%{@timestamp}" ] } # preserve when we received this
    grok {
      break_on_match => false
      match => [ "message", "(?<entryDateTime>[^ ]+ [^ ]+) \[(?<processID>.*)\] (?<entry>.*)" ]
    }
    date { match => [ "entryDateTime", "YYYY-MM-dd HH:mm:ss" ] }
  } else if [logFormat] == "proftpd" {
    grok {
      break_on_match => false
      match => [ "message", "(?<ipAddress>[^ ]+) (?<remoteUserName>[^ ]+) (?<localUserID>[^ ]+) \[(?<entryDateTime>.*)\] (?<ftpCommand>\".*\") (?<ftpResponseCode>[^ ]+) (?<ftpResponse>\".*\") (?<bytesSent>[^ ]+)" ]
      add_field => [ "receivedAt", "%{@timestamp}" ] # preserve now before date overwrites
    }
    date {
      match => [ "entryDateTime", "dd/MMM/YYYY:HH:mm:ss Z" ]
      # target => "testDate"
    }
  } else if [logFormat] == "debiansyslog" { # linux sysLog
    grok {
      break_on_match => false
      match => [ "message", "(?<entryDateTime>[a-zA-Z]{3} [ 0-9]+ [^ ]+) (?<hostName>[^ ]+) (?<service>[^:]+):(?<entry>.*)" ]
      add_field => [ "receivedAt", "%{@timestamp}" ] # preserve NOW before date overwrites
    }
    date {
      # Mar 2 02:21:28 primaryweb-wheezy logstash-forwarder[754]: 2015/03/02 02:21:28.607445 Registrar received 348 events
      match => [ "entryDateTime", "MMM dd HH:mm:ss", "MMM d HH:mm:ss" ] # problems with jodatime and missing leading 0 on days, we can supply multiple patterns :)
    }
  } else if [type] == "windowsEventLog" {
    json { source => "message" } # set our source to the entire message as its JSON
    mutate { add_field => [ "receivedAt", "%{@timestamp}" ] }
    if [SourceModuleName] == "eventlog" {
      # use the date/time of the entry and not physical time so viewing acts as expected
      date { match => [ "EventTime", "YYYY-MM-dd HH:mm:ss" ] }
      # message defaults to the entire message. Since we have json data for all properties,
      # copy the event message into it instead
      mutate { replace => [ "message", "%{Message}" ] }
      mutate { remove_field => [ "Message" ] }
    }
  }
}

output {
  if [logType] == "webLog" {
    elasticsearch {
      host => "127.0.0.1"
      port => 9300
      cluster => "es-logstash"
      # node_name => "es-logstash-n1"
      index => "logstash-weblog-events-%{+YYYY.MM.dd}"
    }
  } else if [logType] == "mailLog" {
    elasticsearch {
      host => "127.0.0.1"
      port => 9300
      cluster => "es-logstash"
      # node_name => "es-logstash-n1"
      index => "logstash-mail-events-%{+YYYY.MM.dd}"
    }
  } else if
Re: How to configure max file descriptors on windows OS?
Thanks Mark! On Friday, April 17, 2015 at 6:22:24 AM UTC+8, Mark Walkom wrote: -1 means unbounded, i.e. unlimited. On 16 April 2015 at 20:54, Xudong You xudon...@gmail.com wrote: Anyone know how to change the max_file_descriptors on Windows? I built an ES cluster on Windows and got the following process information: "max_file_descriptors": -1, "open_file_descriptors": -1, What does “-1” mean? Is it possible to change the max file descriptors on the Windows platform? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/aa22c565-80f5-4228-8f03-15d1b1e3f150%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6c87970e-ca32-413b-9333-9ec60a931c9e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
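For reference, the values being discussed come from the nodes info/stats APIs; a quick way to look at them on a local node (a minimal sketch):

    curl 'localhost:9200/_nodes/process?pretty'
    # On Windows the JVM exposes no file descriptor limit, so -1 (unlimited) is reported;
    # unlike the ulimit-based setting on Linux, there is nothing to raise.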
Re: Elasticsearch issue with some indices not populating data
Might be. But you should ask this on the logstash mailing list. I think that elasticsearch is working fine here as you did not see any trouble in the logs. That said I’d use:

elasticsearch {
  protocol => "http"
  host => "localhost"
}

So using the REST port (9200) that is. You can also add this output to make sure something is meant to be sent to elasticsearch:

output {
  stdout { codec => rubydebug }
  elasticsearch {
    protocol => "http"
    host => "localhost"
  }
}

-- David Pilato - Developer | Evangelist elastic.co @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs Le 20 avr. 2015 à 16:38, Don Pich dp...@realtruck.com a écrit : Thanks for that info. Again, training wheels... :-) So below is my logstash config. [...]
Distribute Search Results across a specific field / property
Hi Guys, I am using ES to build out the search for an online store. The operator would like to have the results returned in a way that showcases the variety of manufacturers he offers. So instead of returning them ordered by score, he would like there to be one result from each store on each page, at least on the first few pages. The issue is there is one manufacturer who makes 80% of the portfolio, so it always looks like there is only that one store. But if one could mix or distribute the results to showcase the variety, that would be pretty cool. I have seen a bunch of online stores do that somehow, although I do not know how. Is there a way to use ES to somehow do a query like: return 20 results and 2 of each store? Thanks for your help. Frederik -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5e3f3787-79a5-4481-908a-0c2092da31ea%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Elasticsearch issue with some indices not populating data
Thanks David. I will move over to the logstash list, as I agree that is where the problem seems to be. I appreciate your help!! Don Pich | Jedi Master (aka System Administrator 2) | O: 701-952-5925 3320 Westrac Drive South, Suite A * Fargo, ND 58103 Facebook http://www.facebook.com/RealTruck | Youtube http://www.youtube.com/realtruckcom | Twitter http://twitter.com/realtruck | Google+ https://google.com/+Realtruck | Instagram http://instagram.com/realtruckcom | Linkedin http://www.linkedin.com/company/realtruck | Our Guiding Principles http://www.realtruck.com/our-guiding-principles/ “If it goes on a truck we got it, if it’s fun we do it” – RealTruck.com http://realtruck.com/ On Mon, Apr 20, 2015 at 9:43 AM, David Pilato da...@pilato.fr wrote: Might be. But you should ask this on the logstash mailing list. I think that elasticsearch is working fine here as you did not see any trouble in the logs. [...]
Re: Elasticsearch issue with some indices not populating data
Hello David, I found this online and it made my cluster go 'green': http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ I don't know for certain if that was 100% of the problem, but there are no longer unassigned shards.

root@logstash:/# curl -XGET 'localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "es-logstash",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 2792,
  "active_shards" : 5584,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
root@logstash:/#

However, the root of my problem still exists. I did restart the forwarders, and tcpdump does show that traffic is indeed hitting the server. But my indices folder does not contain fresh data except for one source. Don Pich | Jedi Master (aka System Administrator 2) | O: 701-952-5925 3320 Westrac Drive South, Suite A * Fargo, ND 58103 Facebook http://www.facebook.com/RealTruck | Youtube http://www.youtube.com/realtruckcom | Twitter http://twitter.com/realtruck | Google+ https://google.com/+Realtruck | Instagram http://instagram.com/realtruckcom | Linkedin http://www.linkedin.com/company/realtruck | Our Guiding Principles http://www.realtruck.com/our-guiding-principles/ “If it goes on a truck we got it, if it’s fun we do it” – RealTruck.com http://realtruck.com/ On Sun, Apr 19, 2015 at 10:04 PM, David Pilato da...@pilato.fr wrote: Are you using the same exact JVM version? Where do those logs come from? LS? ES? Could you try the same with a cleaned Elasticsearch? I mean with no data? My suspicion is that you have too many shards allocated on a single (tiny?) node. What is your node size BTW (memory / heap size)? David Le 19 avr. 2015 à 23:09, Don Pich dp...@realtruck.com a écrit : Thanks for taking the time to answer David. Again, got my training wheels on with an ELK stack so I will do my best to answer. Here is an example. The one index that is working has a fresh directory with today's date in the elasticsearch directory. The ones that are not working do not have a directory. Logstash and Elasticsearch are running, with the logs not generating much information as far as pointing to any error.

log4j, [2015-04-19T13:41:44.723] WARN: org.elasticsearch.transport.netty: [logstash-logstash-3170-2032] Message not fully read (request) for [2] and action [internal:discovery/zen/unicast_gte_1_4], resetting
log4j, [2015-04-19T13:41:49.569] WARN: org.elasticsearch.transport.netty: [logstash-logstash-3170-2032] Message not fully read (request) for [5] and action [internal:discovery/zen/unicast_gte_1_4], resetting
log4j, [2015-04-19T13:41:54.572] WARN: org.elasticsearch.transport.netty: [logstash-logstash-3170-2032] Message not fully read (request) for [10] and action [internal:discovery/zen/unicast_gte_1_4], resetting

Don Pich | Jedi Master (aka System Administrator 2) | O: 701-952-5925 3320 Westrac Drive South, Suite A * Fargo, ND 58103 Facebook http://www.facebook.com/RealTruck | Youtube http://www.youtube.com/realtruckcom | Twitter http://twitter.com/realtruck | Google+ https://google.com/+Realtruck | Instagram http://instagram.com/realtruckcom | Linkedin http://www.linkedin.com/company/realtruck | Our Guiding Principles http://www.realtruck.com/our-guiding-principles/ “If it goes on a truck we got it, if it’s fun we do it” – RealTruck.com http://realtruck.com/ On Sun, Apr 19, 2015 at 2:38 PM, David Pilato da...@pilato.fr wrote: From an Elasticsearch point of view, I don't see anything wrong. You have way too many shards for sure, so you might hit OOM exceptions or other troubles. So to answer your question, check your Elasticsearch logs and if nothing looks wrong, check logstash. Just adding that Elasticsearch is not generating data, so you probably meant that logstash stopped generating data, right? HTH David Le 19 avr. 2015 à 21:08, dp...@realtruck.com a écrit : I am new to elasticsearch and have a problem. I have 5 indices. At first all of them were running without issue. However, over the last 2 weeks, all but one have stopped generating data. I have run a tcpdump on the logstash server and confirmed that logging packets are getting to the server. I have looked into the server's health. I have issued the following to check on the cluster:

root@logstash:/# curl -XGET 'localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "es-logstash",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2791,
  "active_shards" : 2791,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2791
}
root@logstash:/#

Can someone please point me in the right direction on troubleshooting this? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop
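On the "too many shards" point: the health output above shows ~2800 primary shards, which with daily logstash indices at the default of 5 shards plus a replica per index adds up quickly. A fast way to see what the cluster is carrying (a minimal sketch; the _cat APIs are available on 1.x):

    curl -s 'localhost:9200/_cat/indices?v'
    curl -s 'localhost:9200/_cat/shards' | wc -l   # total shard copies in the cluster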
SHIELD terms lookup filter: AuthorizationException BUG
Hi, Using: * ElasticSearch 1.5.1 * SHIELD 1.2 Whenever I use a terms lookup filter in a search query, I get an AuthorizationException for the [__es_system_user] user, although the actual user even has 'admin' role privileges. This seems a bug to me, where the terms filter does not have the correct security context. This is very easy to reproduce, see gist: https://gist.github.com/bertvermeiren/c29e0d9ee54bb5b0b73a Scenario:

# Add user 'admin' with default 'admin' role.
./bin/shield/esusers useradd admin -p admin1 -r admin
# create index
curl -XPUT 'admin:admin1@localhost:9200/customer'
# create a document on the index
curl -XPUT 'admin:admin1@localhost:9200/customer/external/1' -d '{ "name": "John Doe", "token": "token1" }'
# create additional index for the terms lookup filter functionality
curl -XPUT 'admin:admin1@localhost:9200/tokens'
# create document in 'tokens' index
curl -XPUT 'admin:admin1@localhost:9200/tokens/tokens/1' -d '{ "group": 1, "tokens": [ "token1", "token2" ] }'
# search with a terms lookup filter on the customer index, referring to the 'tokens' index
curl -XGET 'admin:admin1@localhost:9200/customer/external/_search' -d '{ "query": { "filtered": { "query": { "match_all": {} }, "filter": { "terms": { "token": { "index": "tokens", "type": "tokens", "id": "1", "path": "tokens" } } } } } }'

=> org.elasticsearch.shield.authz.AuthorizationException: action [indices:data/read/get] is unauthorized for user [__es_system_user] -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4419d9d4-9bcc-4fab-afa3-a70799891f44%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
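Until the underlying issue is resolved, one workaround (a sketch only, reusing the index and field names from the reproduction above) is to fetch the lookup document yourself and inline the values, which avoids the server-side GET that runs as __es_system_user:

    # 1) fetch the lookup terms as the real user
    curl -XGET 'admin:admin1@localhost:9200/tokens/tokens/1'
    # 2) pass the returned array directly in the filter
    curl -XGET 'admin:admin1@localhost:9200/customer/external/_search' -d '{
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": { "terms": { "token": [ "token1", "token2" ] } }
        }
      }
    }'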
Re: jdbcRiver rebuilding after restart.
The column strategy is a community effort; it can manipulate SQL statement where clauses with a timestamp filter. I do not have enough knowledge about the column strategy. You are correct: at node restart, a river does not know from where to restart. There is no method to resolve this within the river logic. Jörg On Mon, Apr 20, 2015 at 2:11 PM, GWired garrettcjohn...@gmail.com wrote: I can't look at the feeder setup now but I could in the future. Is my SQL statement incorrect? Should I be doing something differently? Does the river not utilize created_at and updated_at in this setup? I don't have a where clause because I thought that, using the column strategy, it would be taken into account. This is an example of what I see in SQL Server: SELECT id as _id, * FROM [MyDBName].[dbo].[MyTableName] WHERE ({fn TIMESTAMPDIFF(SQL_TSI_SECOND,@P0,createddate)} = 0) Which, when I populate the @P0 with a timestamp, seems to be working fine. On a restart I'm guessing it doesn't know when to start. Any way that I can check values in elasticsearch within the column strategy? Such as using Max(CreatedDate) so that it can start there? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/06e9ce54-8b71-4337-971b-440a5b56f00d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFOXtchcMm%3D3PN2gwA6P%2B%2BZoDNtSpwRAk51e2yNXuAdNQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: creation_date in index setting
We are using version 1.3.0.

On Apr 20, 2015 7:38 PM, Colin Goodheart-Smithe-2 [via Elasticsearch Users] ml-node+s115913n4073856...@n3.nabble.com wrote:

Prashant, What version of Elasticsearch are you using? The index creation date was added to the index settings API in version 1.4.0 and will only show for indices created with that version or later (see https://github.com/elastic/elasticsearch/pull/7218). Colin

On Monday, April 20, 2015 at 2:23:37 PM UTC+1, Prashy wrote:

Hi All, I also require the indexing time to be returned by ES, but when I fire a query like

curl -XGET 'http://192.168.0.179:9200/16-04-2015-index/_settings'

I am not able to get the index creation time, and get a response like:

{"16-04-2015-index":{"settings":{"index":{"uuid":"rHmX564PSnuI8cye4GxA1g","number_of_replicas":"0","number_of_shards":"5","version":{"created":"1030099"}}}}}

Please let me know how I can get the index creation time. ~Prashant
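For reference, on an index created under 1.4.0 or later the creation date shows up as an index setting holding epoch milliseconds. A quick sketch of what to look for; the index name and timestamp here are illustrative, not from the thread:

curl -XGET 'http://localhost:9200/16-04-2015-index/_settings?pretty'

# for an index created under 1.4+, the response includes:
# {
#   "16-04-2015-index" : {
#     "settings" : {
#       "index" : {
#         "creation_date" : "1429537010000",
#         ...

As far as I can tell, indices created under 1.3.x simply have no stored creation date to return, so the options are reindexing under 1.4+ or tracking creation time externally.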
Cannot specify a query in the target index and through es.query when working with ES, Wikipedia River and Hive
Hi, I've largely got everything set up to integrate ES and Hive. However, when I execute a query against the table wikitable as defined below, I get the error *Cannot specify a query in the target index and through es.query*. Versions: ES Hive integration 2.1.0.Beta3; ES 1.4.4; and I'm running on the latest Hadoop/Hive (installed last week directly from Apache). I suspect the error has to do with the definition of the resource for the external table, wikipedia_river/page/_search?q, which I found in this excellent article: http://ryrobes.com/systems/connecting-tableau-to-elasticsearch-read-how-to-query-elasticsearch-with-hive-sql-and-hadoop/ (things may have changed since the article was written). For instance, I had to take the es.host property out of the table definition and instead set es.nodes in a set statement in Hive. Things like that usually take a bit of digging and experimenting to figure out. I've gotten to where there's an attempt to execute the query and would appreciate it if anyone could shed some light on how to get past this point. Thanks!

Hive statements:

set es.nodes=peace;
set es.port=9201;

DROP TABLE IF EXISTS wikitable;

CREATE EXTERNAL TABLE wikitable (
  title string,
  redirect_page string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'wikipedia_river/page/_search?q=*');

select count(distinct title) from wikitable;

Full stack trace:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot specify a query in the target index and through es.query
    at org.elasticsearch.hadoop.rest.Resource.<init>(Resource.java:48)
    at org.elasticsearch.hadoop.rest.RestRepository.<init>(RestRepository.java:88)
    at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:226)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:406)
    at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:112)
    at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:51)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:361)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:428)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
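As I understand es-hadoop's configuration, the error message points at the fix: es.resource should be just index/type, and any query goes in a separate es.query property (a resource that embeds _search?q=... is rejected). A sketch of the corrected table definition, keeping the names from the post:

set es.nodes=peace;
set es.port=9201;

DROP TABLE IF EXISTS wikitable;

CREATE EXTERNAL TABLE wikitable (
  title string,
  redirect_page string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'wikipedia_river/page',
  'es.query' = '?q=*'
);

select count(distinct title) from wikitable;

For a match-all like this, es.query can simply be omitted, since matching everything is the default.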
Elasticsearch puppet module problem
Dear support,

I'm trying to set up ES using this module: https://github.com/elastic/puppet-elasticsearch/blob/master/README.md

For the sake of testing, I spun up a new VM and tried to apply the default module on it:

node somenode {
  include elasticsearch
}

It ends up with the RPM package installed but the service absent:

Info: Applying configuration version '1429537010'
Notice: /Stage[main]/Elasticsearch::Package/Package[elasticsearch]/ensure: created
Info: Computing checksum on file /etc/init.d/elasticsearch
Info: FileBucket got a duplicate file {md5}cbd974e711b257c941dddca3aef79e46
Info: /Stage[main]/Elasticsearch::Config/File[/etc/init.d/elasticsearch]: Filebucketed /etc/init.d/elasticsearch to puppet with sum cbd974e711b257c941dddca3aef79e46
Notice: /Stage[main]/Elasticsearch::Config/File[/etc/init.d/elasticsearch]/ensure: removed
Info: Computing checksum on file /usr/lib/systemd/system/elasticsearch.service
Info: FileBucket got a duplicate file {md5}649153421e86836314671510160c3798
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/lib/systemd/system/elasticsearch.service]/ensure: removed
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/spatial4j-0.4.1.jar]/owner: owner changed 'root' to 'elasticsearch'
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/spatial4j-0.4.1.jar]/group: group changed 'root' to 'elasticsearch'
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/asm-4.1.jar]/owner: owner changed 'root' to 'elasticsearch'
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/asm-4.1.jar]/group: group changed 'root' to 'elasticsearch'
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/lucene-join-4.10.4.jar]/owner: owner changed 'root' to 'elasticsearch'
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/lucene-join-4.10.4.jar]/group: group changed 'root' to 'elasticsearch'
...
Info: Computing checksum on file /etc/elasticsearch/elasticsearch.yml
Info: FileBucket got a duplicate file {md5}08a09998560b7b786eca1e594b004ddc
Info: /Stage[main]/Elasticsearch::Config/File[/etc/elasticsearch/elasticsearch.yml]: Filebucketed /etc/elasticsearch/elasticsearch.yml to puppet with sum 08a09998560b7b786eca1e594b004ddc
Notice: /Stage[main]/Elasticsearch::Config/File[/etc/elasticsearch/elasticsearch.yml]/ensure: removed
Info: Computing checksum on file /etc/elasticsearch/logging.yml
Info: FileBucket got a duplicate file {md5}c0d21de98dc9a6015943dc47f2aa18f5
Info: /Stage[main]/Elasticsearch::Config/File[/etc/elasticsearch/logging.yml]: Filebucketed /etc/elasticsearch/logging.yml to puppet with sum c0d21de98dc9a6015943dc47f2aa18f5
Notice: /Stage[main]/Elasticsearch::Config/File[/etc/elasticsearch/logging.yml]/ensure: removed
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/jna-4.1.0.jar]/owner: owner changed 'root' to 'elasticsearch'
Notice: /Stage[main]/Elasticsearch::Config/File[/usr/share/elasticsearch/lib/jna-4.1.0.jar]/group: group changed 'root' to 'elasticsearch'
Notice: Finished catalog run in 43.57 seconds

Also I noticed that the module doesn't manage the repository by default, so I needed to set:

elasticsearch::manage_repo: true
elasticsearch::repo_version: 1.5

So what am I doing wrong?

Regards
Sergey
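If I read the module's README right for the 0.9x-era releases, the elasticsearch class alone only installs the package; the config files and the service are tied to elasticsearch::instance resources, which would explain both the "ensure: removed" lines and the absent service. A sketch of a manifest under that assumption (the instance name es-01 is arbitrary):

node somenode {
  # class parameters instead of hiera keys; same effect as manage_repo/repo_version above
  class { 'elasticsearch':
    manage_repo  => true,
    repo_version => '1.5',
  }

  # the instance is what creates the per-instance config and service
  elasticsearch::instance { 'es-01': }
}

With that module layout, the instance should produce a config directory and a service named after it (e.g. elasticsearch-es-01), rather than the plain elasticsearch service the package ships.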
Re: Elasticsearch issue with some indices not populating data
Hey Christian,

8 gigs of RAM, -Xms6g -Xmx6g

Don Pich | Jedi Master (aka System Administrator 2) | O: 701-952-5925
3320 Westrac Drive South, Suite A * Fargo, ND 58103
"If it goes on a truck we got it, if it's fun we do it" – RealTruck.com http://realtruck.com/

On Mon, Apr 20, 2015 at 10:29 AM, christian.dahlqv...@elasticsearch.com wrote:

Hi, Having read through the thread it sounds like your configuration has been working in the past. Is that correct? If this is the case I would reiterate David's initial questions about your node's RAM and heap size, as the number of shards looks quite large for a single node. Could you please provide information about this? Best regards, Christian

On Sunday, April 19, 2015 at 8:08:05 PM UTC+1, dp...@realtruck.com wrote:

I am new to Elasticsearch and have a problem. I have 5 indices. At first all of them were running without issue. However, over the last 2 weeks, all but one have stopped generating data. I have run a tcpdump on the logstash server and confirmed that logging packets are getting to the server. I have looked into the server's health with the following:

root@logstash:/# curl -XGET 'localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "es-logstash",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2791,
  "active_shards" : 2791,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2791
}
root@logstash:/#

Can someone please point me in the right direction on troubleshooting this?
Search Scroll issue
We are using scroll to do paging. We are encountering an issue where the last result from the initial search appears as the first result in our scroll request, i.e.

hits[length-1] == nextPageHits[0]

This only seems to occur after we do a large series of writes and searches; initially it doesn't happen. Is this behavior intended, or a known issue? -shawn
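Hard to say without seeing the requests, but one thing worth ruling out is a non-unique sort key: when documents tie on the sort value, a page boundary can fall between them inconsistently. For reference, this is the 1.x scroll flow (index name and size are illustrative); with search_type=scan the hits come back unordered and the initial response carries no hits at all, which sidesteps client-side off-by-one bugs around the first page:

# start the scroll; scan returns only a _scroll_id in the initial response
curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m' -d '
{
  "query" : { "match_all" : {} },
  "size" : 100
}'

# fetch each page; in 1.x the scroll id is sent as the raw request body
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<_scroll_id from the previous response>'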
how to detect changes in a database and automatically add new rows to an elasticsearch index
What I've already done: I connected my HBase to Elasticsearch via this tutorial: http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html

I get an index with the HBase table content, but after adding a new row to HBase, it is not automatically added to the Elasticsearch index. I tried adding this line to my conf:

"schedule" : "* 1/5 * ? * *"

and this mapping:

"mappings" : {
  "jdbc" : {
    "_id" : {
      "path" : "ID"
    }
  }
}

which assigns _id = ID, and ID has a unique value in my HBase table. It works well: when I add a new row to HBase it is uploaded to the index in less than 5 minutes. But it is bad for performance, because every 5 minutes it executes the full query, and old content is only kept out of the index because _id has to be unique. That is fine for a small DB, but I have over 10 million rows in my HBase table, so my index is working all the time. Is there any solution or plugin for Elasticsearch that automatically detects changes in the DB and adds only the new rows to the index? I create the index using:

curl -XPUT 'localhost:9200/_river/jdbc/_meta' -d '
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:phoenix:localhost",
    "user" : "",
    "password" : "",
    "sql" : "select ID, MESSAGE from test",
    "schedule" : "* 1/5 * ? * *"
  }
}'

Thanks for help.
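Two notes from the sidelines. First, the river is pull-based, so it cannot detect changes by itself; the usual alternatives are pushing writes from the application or narrowing the poll. Second, if the plugin uses Quartz-style cron (seconds field first), "* 1/5 * ? * *" fires every second during every fifth minute, whereas "0 0/5 * * * ?" fires once every five minutes. A sketch that narrows the poll with the plugin's column strategy, assuming a monotonically increasing CREATED_AT column exists and the strategy's default column names apply (worth verifying against the plugin README):

curl -XPUT 'localhost:9200/_river/jdbc/_meta' -d '
{
  "type" : "jdbc",
  "jdbc" : {
    "strategy" : "column",
    "url" : "jdbc:phoenix:localhost",
    "user" : "",
    "password" : "",
    "sql" : "select ID as \"_id\", MESSAGE, CREATED_AT from test",
    "schedule" : "0 0/5 * * * ?"
  }
}'

With the column strategy the plugin filters on the timestamp column since the last run instead of re-selecting all 10 million rows each cycle.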
Re: Elasticsearch issue with some indices not populating data
Hi,

That sounds like a very large number of shards for a node that size, and it is most likely the source of your problems. Each shard in Elasticsearch corresponds to a Lucene instance and carries a certain amount of overhead, so you do not want your shards to be too small. For logging use cases a common shard size is at least a few GB. If you are using daily indices and the default 5 shards per index, you may want to consider reducing the shard count for each of your indices and/or switching to weekly or perhaps monthly indices, in order to reduce the number of shards created each day and increase the average shard size going forward. In order to get the instance working again you may also need to start closing the older indices to bring down the number of active shards, and/or upgrade the node to get more RAM.

Best regards,

Christian

On Monday, April 20, 2015 at 4:38:53 PM UTC+1, Don Pich wrote: [...]
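To make the two suggestions concrete: shard counts for new logstash indices are usually set through an index template, and old indices can be closed in bulk. A sketch, assuming the default logstash-YYYY.MM.DD index naming (template name and the month to close are illustrative):

# every new logstash index gets 1 shard and no replicas; on a one-node
# cluster replicas can never be assigned anyway, which is why health is yellow
curl -XPUT 'localhost:9200/_template/logstash_shards' -d '
{
  "template" : "logstash-*",
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
  }
}'

# close an old month of indices to take their shards out of the heap
curl -XPOST 'localhost:9200/logstash-2015.01.*/_close'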
Re: Elasticsearch issue with some indices not populating data
Hi,

Having read through the thread it sounds like your configuration has been working in the past. Is that correct? If this is the case I would reiterate David's initial questions about your node's RAM and heap size, as the number of shards looks quite large for a single node. Could you please provide information about this?

Best regards,

Christian

On Sunday, April 19, 2015 at 8:08:05 PM UTC+1, dp...@realtruck.com wrote: [...]
Re: shingle filter for sub phrase matching
Did you ever figure this out? I have the exact same issue, just with different words.

On Wednesday, July 23, 2014 at 10:37:03 AM UTC-4, Nick Tackes wrote:

I have created a gist with an analyzer that uses a shingle filter in an attempt to match sub-phrases. For instance, I have entries in the table with discrete phrases like

EGFR Lung Cancer
Lung Cancer

and I want to match these when searching the phrase 'EGFR related lung cancer'. My expectation is that the multi-word matches score higher than the single-word matches, for instance:

1. Lung Cancer
2. Lung
3. Cancer
4. EGFR

Additionally, I tried a standard analyzer match, but this didn't yield the desired result either. One complicating aspect of this approach is that min_shingle_size has to be 2 or more. How then would I be able to match single words like 'EGFR' or 'Lung'?

thanks

https://gist.github.com/nicktackes/ffdbf22aba393efc2169
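One hedged suggestion for the single-word problem: the shingle token filter has an output_unigrams option, so it can emit the original single terms alongside the shingles. Single words then match on unigrams, while multi-word phrases also match shingles and naturally score higher. A minimal sketch; the index, type, and field names are illustrative, not from the gist:

curl -XPUT 'localhost:9200/phrases' -d '
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "my_shingles" : {
          "type" : "shingle",
          "min_shingle_size" : 2,
          "max_shingle_size" : 3,
          "output_unigrams" : true
        }
      },
      "analyzer" : {
        "shingled" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase", "my_shingles"]
        }
      }
    }
  },
  "mappings" : {
    "phrase" : {
      "properties" : {
        "name" : { "type" : "string", "analyzer" : "shingled" }
      }
    }
  }
}'

Querying 'EGFR related lung cancer' with a match query should then credit "Lung Cancer" for the "lung cancer" shingle plus two unigrams, while "EGFR" matches a single unigram, giving roughly the ranking you listed.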
Horizontal Bar Chart in Kibana
Hi, I need to implement a horizontal bar chart in Kibana 4 and need help getting started. Please let me know. Thanks, Vijay.C.N
Query boost values available in script_score?
Hi. Are query boost values available in script_score? I read the documentation with no success, but perhaps I overlooked something. Thanks in advance.
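As far as I can tell from the docs, the boost is not exposed as its own script variable; it is already folded into _score, which is what script_score can see alongside doc values. A sketch of what is accessible (index and field names are illustrative):

curl -XGET 'localhost:9200/myindex/_search' -d '
{
  "query" : {
    "function_score" : {
      "query" : { "match" : { "title" : "elasticsearch" } },
      "script_score" : {
        "script" : "_score * doc[\"popularity\"].value"
      }
    }
  }
}'

So if the goal is to weight clauses differently inside the script, one option is to encode the weights as script params rather than query boosts.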
Shards do not get distributed across the cluster
Hey guys, I am currently struggling with shard allocation in my ES cluster. The problem is that the index [logstash-2015.04.13] sits on the node Storage1 with 12 shards. I want the index to be evenly distributed to the 2 other Storage nodes; the node with the SSDs in it should be avoided for allocation. The settings of the index are:

settings for logstash-2015.04.13
{
  "index" : {
    "routing" : {
      "allocation" : {
        "include" : { "tag" : "" },
        "require" : { "tag" : "" },
        "exclude" : { "tag" : "SSD" },
        "total_shards_per_node" : 4
      }
    }
  }
}

Unfortunately I get the following logs in ES:

[2015-04-20 08:01:55,070][DEBUG][cluster.routing.allocation] [SSD] [logstash-2015.04.13][4] allocated on [Node Info of Storage1], but can no longer be allocated on it, moving...
[DEBUG][cluster.routing.allocation] [SSD] [logstash-2015.04.13][4] can't move

I disabled disk-based shard allocation to make sure that isn't the problem. So is there a way to see what denies the shard allocation to the other nodes, maybe a log of the allocation deciders I can enable? Thanks in advance.
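Two hedged pointers. First, total_shards_per_node is a hard limit the deciders enforce: with 12 shards, a limit of 4, and only three eligible non-SSD nodes, the arithmetic leaves exactly zero headroom, which can wedge moves; raising the limit temporarily should give the allocator room to rebalance. Second, decider decisions can be logged by raising the logger level dynamically. A sketch, assuming the 1.x dynamic logger.* cluster setting (revert to INFO afterwards, as TRACE is very noisy):

# give the allocator slack to move shards off Storage1
curl -XPUT 'localhost:9200/logstash-2015.04.13/_settings' -d '
{
  "index.routing.allocation.total_shards_per_node" : 6
}'

# turn on verbose allocation-decider logging
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{
  "transient" : {
    "logger.cluster.routing.allocation.decider" : "TRACE"
  }
}'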