Re: Update Mapping for JDBC river freezes till next request is received

2014-05-27 Thread joergpra...@gmail.com
You should upgrade ES, there were bugs fixed regarding cluster update service and rivers. Jörg On Tue, May 27, 2014 at 6:44 PM, André Morais ano...@gmail.com wrote: Hello, I am using the JDBC river plugin (latest version with the name elasticsearch-river-jdbc-2.2.1.jar on ES 0.90.5) and

Re: implementing a plugin to process the whole input document

2014-05-27 Thread joergpra...@gmail.com
Yes, it is (not only) relevant to library catalog indexing, because Bibframe, a new project by Library of Congress, is built on RDF, and next-generation library systems will embrace W3C semantic web technologies. The RDF data I generate is indexed in JSON-LD format into Elasticsearch but for

Re: Sequence Numbers for Replica Recovery

2014-05-28 Thread joergpra...@gmail.com
I'm not sure if this is related, but there is work on designing sequence numbers that are decentralized, time-based UUIDs. If they were assigned to Lucene segments, shards could declare which segments they already have when a recovery process runs. The feature is planned for 1.3

Re: looking for heavy write optimization

2014-05-28 Thread joergpra...@gmail.com
For maximum write performance, you should - use the fastest disk subsystem (SSD) - use RAID 0 with an expensive controller to max out IO bandwidth - not run more than one ES instance per server - not use virtual servers, use physical servers - for the ES data folder, disable the access time flag (noatime),

Re: How to retrieve just certain amount of docs from a larger query?

2014-05-28 Thread joergpra...@gmail.com
Look into the scan/scroll query http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html It works like a cursor that iterates through all docs of a query result Jörg On Wed, May 28, 2014 at 1:42 PM, Tom t.opp...@superreal.de wrote: Hi, i need to fire
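The cursor idea behind scan/scroll can be sketched in plain Java: pages are fetched lazily, one round trip at a time, until an empty page signals the end. This is not the Elasticsearch client API; `PageSource` is a hypothetical stand-in for one scroll request, and an integer offset takes the place of the scroll id that Elasticsearch returns with each page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ScrollCursor<T> implements Iterator<T> {

    // Hypothetical stand-in for one scroll round trip. Elasticsearch hands back
    // a scroll id with each page; here the number of hits consumed so far plays that role.
    public interface PageSource<E> {
        List<E> fetch(int cursor); // an empty page means the cursor is exhausted
    }

    private final PageSource<T> source;
    private List<T> page = new ArrayList<>();
    private int indexInPage = 0;
    private int cursor = 0;
    private boolean exhausted = false;

    public ScrollCursor(PageSource<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        page = source.fetch(cursor); // fetch the next page only when needed
        cursor += page.size();
        indexInPage = 0;
        exhausted = page.isEmpty();
        return !exhausted;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(indexInPage++);
    }

    // Demo: drains `total` fake hits delivered in pages of `pageSize`.
    public static List<Integer> drain(int total, int pageSize) {
        PageSource<Integer> src = c -> {
            List<Integer> p = new ArrayList<>();
            for (int i = c; i < Math.min(c + pageSize, total); i++) {
                p.add(i);
            }
            return p;
        };
        List<Integer> out = new ArrayList<>();
        ScrollCursor<Integer> it = new ScrollCursor<>(src);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

The caller never holds more than one page in memory, which is the point of using a cursor instead of one huge `size` parameter.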

Re: Elasticsearch and Smile encoded JSON

2014-05-30 Thread joergpra...@gmail.com
, the communication/storage won’t be compressed using LZF? - Drew On May 29, 2014, at 2:52 PM, joergpra...@gmail.com wrote: 1. No (the cluster state of ES - not part of Lucene - is saved to disk in SMILE format) 2. No. 3. Yes, you can use SMILE on XContentBuilder classes. The result can

Re: IDF per customer, many customers per index - best practices

2014-05-30 Thread joergpra...@gmail.com
IDF is calculated per shard; only with the DFS search types is it calculated over all nodes in an initial scatter phase. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_search_options.html#_literal_search_type_literal If you are concerned about IDF in a single multi-user index
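To see why per-shard IDF matters in a multi-customer index, here is a small sketch using the classic Lucene TF-IDF idf formula, idf = 1 + ln(numDocs / (docFreq + 1)). The shard sizes and document frequencies are made-up numbers, chosen so that one customer's documents dominate shard 0.

```java
public class ShardIdf {

    // Classic Lucene TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one term on two equally sized shards.
        long[] docFreqPerShard = {90, 10};
        long[] docsPerShard = {1000, 1000};

        // Default (non-DFS) search: every shard scores with its local statistics,
        // so the same term weighs differently depending on where a doc lives.
        for (int s = 0; s < docFreqPerShard.length; s++) {
            System.out.printf("shard %d local idf = %.4f%n",
                    s, idf(docFreqPerShard[s], docsPerShard[s]));
        }

        // DFS search types: statistics are first collected from all shards.
        long df = 0;
        long n = 0;
        for (int s = 0; s < docFreqPerShard.length; s++) {
            df += docFreqPerShard[s];
            n += docsPerShard[s];
        }
        System.out.printf("global idf = %.4f%n", idf(df, n));
    }
}
```

The global value lies between the two local values, which is why the DFS phase makes scores comparable across shards at the cost of an extra round trip.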

Re: Improving a slow running Match_All Query

2014-05-30 Thread joergpra...@gmail.com
Does match_all always take that long, or does it get faster after the first run? Did you run an optimize with a maximum number of segments? What is your segment count? Jörg On Fri, May 30, 2014 at 9:20 PM, sai...@roblox.com wrote: *Bump* On Wednesday, May 28, 2014 4:10:26 PM UTC-7,

Re: Elasticsearch 1.20 and 1.1.2

2014-05-31 Thread joergpra...@gmail.com
Just look into org.elasticsearch.rest.BytesRestResponse, it supersedes XContentRestResponse Jörg On Sat, May 31, 2014 at 12:28 AM, Ben McCann benjamin.j.mcc...@gmail.com wrote: Jörg thanks for the heads up about XContentRestResponse going away. I've run into that as an issue with a river I

Re: ES 1.1.1 - Plugins _site not found

2014-05-31 Thread joergpra...@gmail.com
Each time you start a node, be it a (transport) client node or a server node, all plugins are checked/loaded at initialization. Each plugin, including JVM plugins on the classpath, is examined by default to see whether a directory named _site can be accessed. The purpose is to classify a plugin as site

Re: Elasticsearch 1.20 and 1.1.2

2014-05-31 Thread joergpra...@gmail.com
any suggestions for replacing XContentThrowableRestResponse and RestXContentBuilder? Thanks, Ben On Sat, May 31, 2014 at 2:35 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: Just look into org.elasticsearch.rest.BytesRestResponse, it supersedes XContentRestResponse Jörg

Re: RFC 6902 requires variant type mapping

2014-06-02 Thread joergpra...@gmail.com
You'd have to use a plugin for such operations, because vanilla ES does not support RFC 6902. I'm also interested in supporting HTTP PATCH in Elasticsearch, because this is a must-have for modifying resources according to the rules of the Linked Data Platform (LDP)

Re: Configuring cross-cloud cluster via REST API

2014-06-02 Thread joergpra...@gmail.com
You have to restart the whole cluster. Switching discovery while running a cluster is not possible. Jörg On Mon, Jun 2, 2014 at 12:49 PM, Martin Harris martin.har...@cloudsoftcorp.com wrote: Hi Folks, I'm trying to setup a cross-cloud elastic-search cluster. As it's cross-cloud, the usual

[ANN] Elasticsearch Simple Action Plugin

2014-06-03 Thread joergpra...@gmail.com
Hi, many of us want to start writing extensions for Elasticsearch. Besides submitting pull requests to the core code, one great advantage of Elasticsearch is the plugin mechanism. Here, custom code can be hooked into Elasticsearch, without having to ask for inclusion into the core code.

Re: Migration from Solr to ElasticSearch

2014-06-03 Thread joergpra...@gmail.com
If you have indexed the data in Solr, you should consider a tool that can traverse the Lucene index and reconstruct the documents. This is not a straightforward process, as you know already, because analyzed fields look different from the original input. The reconstruction may not recover the

Re: Migration from Solr to ElasticSearch

2014-06-03 Thread joergpra...@gmail.com
If you can iterate over the Solr index doc ids and fetch the source docs from a secondary storage, you should consider doing this first. This is the most straightforward method for reindexing. Otherwise, if you cannot access the filesystem storage for the docs (for whatever reason), the idea

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-03 Thread joergpra...@gmail.com
Usually, plugins that extend internal ES functionality should be installed on all nodes. This is easy to remember and preferable from an administrative view. All the nodes in the ES cluster must have access to plugin code under all circumstances, especially when executing actions, mappers,

Re: What's using memory in ElasticSearch? (Details to follow...)

2014-06-03 Thread joergpra...@gmail.com
What ES version is this? Your segment count is very high (1000), which is not efficient. Maybe setting index.codec.bloom.load: false can help reduce heap memory usage. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html Jörg

Re: All primary shards are in same node. Why? Version 1.1.1

2014-06-03 Thread joergpra...@gmail.com
Primary shards are addressed first when writing, but it is a myth that they do all the writing. Secondary shards do the writing too, but only some milliseconds later. There is nothing to worry about. Jörg On Tue, Jun 3, 2014 at 9:49 PM, Santiago Ferrer Deheza sa.ferrer.deh...@gmail.com wrote: Hi

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-03 Thread joergpra...@gmail.com
Not sure if I understand your concern completely - as long as you're doing things right in your code, it should be possible to allocate resources only when required - this holds also for plugins. Jörg On Tue, Jun 3, 2014 at 11:48 PM, virgil virgil...@gmail.com wrote: Thank you Jörg. I see the

Re: Best cluster environment for search

2014-06-03 Thread joergpra...@gmail.com
Can you show your test code? You seem to be looking at the wrong settings: by adjusting the number of nodes, shards, and replicas alone, you cannot find out the maximum node performance. E.g. concurrency settings, index optimizations, query optimizations, thread pooling, and most of all, fast disk

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
You need resources on all nodes that hold shards; you cannot do it with just one instance, because an ES index is distributed. Rescoring would be very expensive if you did it on an extra central instance with an extra scatter/gather phase. It is also very expensive in scripting. A better method is

Re: iptablex trojan experiences?

2014-06-04 Thread joergpra...@gmail.com
One very essential feature, from the very beginning, is that Elasticsearch instances, when started, automatically form a cluster over the network. This is only possible in an open network environment and by having multicast enabled. Are you aware that by talking about safe configuration options

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
the internals and there are no code level comments. I always meant to experiment with the different action hierarchies via simple plugins and document my findings. Perhaps one day... Cheers, Ivan On Wed, Jun 4, 2014 at 1:09 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: Sorry

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
As said, it is true that scoring scripts (like the function score scripts or the AbstractSearchScript) need to reside on data nodes. Accessing fields is a low-level operation in a script, so it is not possible to install such a boost plugin that uses scripting on a data-less node. You would have to

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
://manning.com/synhershko/ On Tue, Jun 3, 2014 at 6:15 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Hi, many of us want to start writing extensions for Elasticsearch. Except submitting pull requests to the core code, one great advantage of Elasticsearch is the plugin mechanism. Here

Re: Best cluster environment for search

2014-06-04 Thread joergpra...@gmail.com
Why do you use terms on the _id field and not the ids filter? The ids filter is more efficient since it reuses the _uid field, which is cached by default. Do the terms in the query vary from query to query? If so, caching might kill your heap. Another possible issue is that your query is not

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-05 Thread joergpra...@gmail.com
One more hint, you see org.elasticsearch.common.lucene.search.function.FieldValueFunction This implements the ScoreFunction and fetches boost values from a configured field in the doc, for use by the Java API for FunctionScoreQuery. If you can write a custom ScoreFunction, you could implement

Re: Inter-document Queries

2014-06-05 Thread joergpra...@gmail.com
A suggestion for the path model: - index also the path depth, and name the fields with the depth level - execute a nested aggregation query over the path depth levels Example doc with path info: { path0 : promo/A, path1 : sale/B ... } In this doc you know the user went from promo/A to
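Building such a document on the client side could look like the sketch below. The field names `path0`, `path1`, ... follow the example above; `path_depth` is a hypothetical name for the indexed depth. A nested aggregation (e.g. a terms aggregation on `path0` with a sub-aggregation on `path1`) would then count transitions between levels.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathDoc {

    // Turns an ordered click path into one field per depth level plus the depth,
    // e.g. ["promo/A", "sale/B"] -> {path_depth=2, path0=promo/A, path1=sale/B}.
    public static Map<String, Object> toFields(List<String> steps) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("path_depth", steps.size()); // hypothetical field name
        for (int depth = 0; depth < steps.size(); depth++) {
            doc.put("path" + depth, steps.get(depth));
        }
        return doc;
    }
}
```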

Re: Java Client - Error Handling

2014-06-05 Thread joergpra...@gmail.com
Do you use TransportClient or NodeClient? With NodeClient, you are tied to the cluster, as the node is part of it; with TransportClient, you can count the connected nodes. The discovery mechanism sends ping actions every few seconds behind the scenes. If an action fails, you will see

Re: Java Client - Error Handling

2014-06-05 Thread joergpra...@gmail.com
Check the Elasticsearch test code. There, you can see how Java API works. For example GetIndexTemplatesResponse response = client().admin().indices().prepareGetTemplates().get(); You can get an empty response if template does not exist, or the execution throws an exception, when something went

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
The knapsack plugin does not require downtime. You can increase shards on the fly by copying an index over to another index (even on another cluster). The index should be write-disabled during the copy, though. Increasing the replica level is a very simple command, no index copy required. It seems

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
, 2014 at 9:21 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: The knapsack plugin does not come with a downtime. You can increase shards on the fly by copying an index over to another index (even on another cluster). The index should be write disabled during copy though. Increasing

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
Just a quick question, do you just want to extract a field from the json source? There are field filters and parameters for shaping such a JSON result, maybe they can already help? Or can you give an example of the problem? Jörg On Thu, Jun 5, 2014 at 7:45 PM, Mario Mueller ma...@xenji.com

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
RestResponseListener that takes a SearchResponse and creates a simplified version with no metadata. Should be an interesting quick plugin, but it looks like Jorg is going to beat me to it (I'm still at work for several more hours). -- Ivan On Thu, Jun 5, 2014 at 1:08 PM, joergpra...@gmail.com

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
probably come up with 2 indexing strategies we can apply to an application's index based on the heuristics from the operations they're performing. Thanks for the feedback! Todd On Thu, Jun 5, 2014 at 10:55 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: Thanks for raising

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
, but I noticed that you provided your own parseSearchRequest, but still call RestSearchAction.parseSearchRequest from inside handleRequest. Did I misinterpret the code or is that a mistake? -- Ivan On Thu, Jun 5, 2014 at 2:37 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: OK, I

Re: Could a custom Aggregator be used for general purpose Map/Reduce or bulk update?

2014-06-05 Thread joergpra...@gmail.com
I'll try to answer some of the questions, though I must admit I am not very familiar with the aggregation source code yet (still exploring). Aggregations work like a search: they are embedded into the search actions and work over the result set of a search. They run in each shard, just like the

Re: If I set index.number_of_replica:1, then the minimum number of nodes should be 3 to assure that the status of the cluster is green?

2014-06-06 Thread joergpra...@gmail.com
1. No. Did you change the configuration? Do you have two data nodes connected? 2. You do not need to be concerned about where primary shards are allocated; secondary shards play the same role (except that primaries receive writes a few milliseconds earlier than secondaries). Elasticsearch randomly

Re: A plugin to change the result set before sending it back to the http client

2014-06-06 Thread joergpra...@gmail.com
I only drink Kölsch :) but it has always turned out fine in the end ("ävver et hätt noh immer joot jejange"). Greetings from Cologne! Jörg On Fri, Jun 6, 2014 at 7:14 AM, Mario Mueller ma...@xenji.com wrote: You guys are totally awesome! Thanks a lot! If you ever visit Duesseldorf drop me a line, I owe you a beer. @Brian: Interesting

Re: Correct way to use TransportClient connection object

2014-06-06 Thread joergpra...@gmail.com
Closing the transport client may not be enough. Try this: - wait for all outstanding actions (all actions send responses asynchronously) - then shut down client.threadPool() (perhaps with shutdownNow() or shutdown()); this effectively prevents new actions from being started - then close the
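The same ordering can be sketched with a plain `ExecutorService` standing in for the client's thread pool; no Elasticsearch classes are used here. With an `ExecutorService`, `shutdown()` already lets queued work finish while refusing new work, so "wait for outstanding actions" and "disable new actions" collapse into `shutdown()` followed by `awaitTermination()`. The 10-second timeout and pool size are arbitrary choices for the demo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OrderlyShutdown {

    public static int runAndShutdown(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> completed.incrementAndGet()); // stand-in for an async action
        }
        // Refuse new work; already queued work still runs to completion.
        pool.shutdown();
        try {
            // Wait for outstanding work, and force-stop stragglers after a timeout.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        // Only now would the client itself be closed (client.close() in ES).
        return completed.get();
    }
}
```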

Re: Analyzing queries in the client side of Elasticsearch but not on the server

2014-06-06 Thread joergpra...@gmail.com
Please ask your question here. Thanks. Jörg On Fri, Jun 6, 2014 at 9:28 AM, ohw o...@zhihu.com wrote: Hi folks I just asked a question in StackOverflow, please have a look if you have encountered similar problem or have some input to it. Thanks in advance!

Re: Analyzing queries in the client side of Elasticsearch but not on the server

2014-06-06 Thread joergpra...@gmail.com
the query parsers into elasticsearch, would you please elaborate more on this? On Fri, Jun 6, 2014 at 4:53 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: The Query DSL is not equivalent to Lucene Query but close to, with enhancements. If you want to make use of Lucene Query

Re: If I set index.number_of_replica:1, then the minimum number of nodes should be 3 to assure that the status of the cluster is green?

2014-06-06 Thread joergpra...@gmail.com
this happened..Is there something I ignore? I want to know how ES allocates nodes. Is there some reference? I googled but couldn't find it. Thank you :D On Fri, Jun 6, 2014 at 3:05 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: 1. No. Did you change the configuration? You have

Re: Get by _id doesn't work but search does.

2014-06-06 Thread joergpra...@gmail.com
Look here for the tool and how to use it http://www.elasticsearch.org/blog/tool-help-routing-issues-elasticsearch-1-2-0/ Jörg On Fri, Jun 6, 2014 at 11:24 AM, Luke Wilson-Mawer lukewilsonma...@gmail.com wrote: Great, thanks Adrien. I will eagerly await the tool. Kind regards, Luke On

Re: What's using memory in ElasticSearch? (Details to follow...)

2014-06-06 Thread joergpra...@gmail.com
No, the settings will not merge existing segments unless you call the _optimize action via the API. And be patient: thousands of segments take time, and merging them also needs quite a lot of memory... I suggest backing up your data first, to stay safe if the merging fails / aborts... Jörg On

Re: Max doc size for indexing over HTTP

2014-06-06 Thread joergpra...@gmail.com
1 GB is a very large document, and it is unusual to index such sizes. There is a limit check against the heap. In order to be able to process such a length, you need a large heap just to store the document source. Depending on the analyzer, heap demand increases even more. You can index documents of

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-06 Thread joergpra...@gmail.com
I mean, you can add a MyOwnFunctionBuilder/MyOwnFunctionParser to Elasticsearch via plugin. See package org.elasticsearch.index.query.functionscore for the standard implementations. The functionscore code is masterpiece quality - no need to modify existing code! It is pluggable. A close example

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-06 Thread joergpra...@gmail.com
For an example function score plugin implementation, see https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/search/functionscore/FunctionScorePluginTests.java Jörg On Fri, Jun 6, 2014 at 7:10 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: I

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-07 Thread joergpra...@gmail.com
I have implemented a function-score-based conditional boost plugin for demonstration. It is very useful for faking relevance scoring depending on document field values that were originally not meant to contribute to boosting. A list of boost values can be specified depending on indexed

Re: What's the difference between bind_host and publish_host in ElasticSearch?

2014-06-07 Thread joergpra...@gmail.com
bind_host is the host that an Elasticsearch node uses in the socket bind call when starting the network. Due to the socket programming model, you can bind to an address. By referencing an address, the socket allows access to one or all underlying network devices. There are several addresses with
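The difference can be illustrated with plain Java sockets (no Elasticsearch code): binding to the wildcard address listens on every local interface, much like a bind_host of 0.0.0.0, while the published address must be one concrete address that peers can actually connect to. The loopback address below is only a placeholder so the sketch runs anywhere; a real publish address would be a routable interface address.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindVsPublish {

    // Bind side: new InetSocketAddress(0) means "wildcard address, any free port",
    // so the socket accepts connections on all local interfaces.
    public static boolean bindsToAllInterfaces() {
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress(0));
            return server.getInetAddress().isAnyLocalAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Publish side: a wildcard address is meaningless to a peer, so a single
    // concrete address is advertised instead (loopback here is a placeholder).
    public static InetSocketAddress publishAddress(int port) {
        return new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
    }
}
```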

Re: What's using memory in ElasticSearch? (Details to follow...)

2014-06-07 Thread joergpra...@gmail.com
Maybe the segment count is just counting new segments as they are created... can you look into the data folders to examine if the segment file count is still high? And can you verify if the settings are really active... not sure what's going on without seeing details. The _optimize call takes a

Re: compression in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
Compression is always enabled by default. Jörg On Sun, Jun 8, 2014 at 6:01 PM, sri 1.fr@gmail.com wrote: Hello everyone, I have read posts and blogs on how elasticsearch compression can be enabled in the previous versions(0.17 - 0.19). I am currently using ES 1.2.1, i wasn't able to

Re: compression in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
The Elasticsearch file size does not only contain compressed fields, but much more, for example term vectors, norms, etc. You would have to disable the field attributes you do not want. Also note that Elasticsearch has replicas enabled by default, and the segment count is not optimized automatically. Jörg

Re: compression in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
Lucene uses LZ4 compression http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1 so you should not run ES on a ZFS file system with compression enabled. Jörg On Sun, Jun 8, 2014 at 8:47 PM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I don't

Re: compression in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
Try this index template for new index creations: curl -XPUT 'localhost:9200/_template/template1' -d '{ "template" : "*", "mappings" : { "_default_" : { "_source" : { "enabled" : false }, "_all" : { "enabled" : false } } } }' See also

Re: JDBC river: trouble getting analyzer in type mapping to be applied

2014-06-09 Thread joergpra...@gmail.com
There is a bug in the JDBC river, introduced recently, that prevents it from using the type_mapping parameter if there is no index_settings parameter defined. It will be fixed ASAP. A workaround might be adding an empty settings parameter like "index_settings" : {} Jörg On Mon, Jun 9, 2014 at 1:00

Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-09 Thread joergpra...@gmail.com
There are many reasons that may cause this, just to name a few - benchmarking tool setup (do they show correct numbers?) - network bandwidth limits - cluster setup (e.g. complex mapping, high latency between nodes) - pattern of the data input - method of data input (bulk vs. index, HTTP vs. Java

Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-09 Thread joergpra...@gmail.com
How do you try to figure out you're hitting limits? I do not have enough information to help. Marvel, Elastic HQ, etc. are all very useful tools but should be combined with OS-related monitoring to get an overall picture. Jörg On Mon, Jun 9, 2014 at 9:31 PM, pranav amin parulpate...@gmail.com

Re: Exposing elastic search query APIs at a public endpoint

2014-06-10 Thread joergpra...@gmail.com
It depends on your requirements and your product strategy; both are possible, with pros and cons: - are your users proficient in a report language? Do they already write report specs in a standard report language? Do you want to support this report language standard? Do you like to share report

Re: elasticsearch Java API for function_score query

2014-06-10 Thread joergpra...@gmail.com
Try this import org.elasticsearch.action.search.SearchRequest; import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder; import java.util.Arrays; import static org.elasticsearch.client.Requests.searchRequest; import static

Re: Creating a browse interface from ES

2014-06-11 Thread joergpra...@gmail.com
Welcome to the show :) I also build library catalogs on Elasticsearch professionally. Some time ago I wrote a Perl Dancer starter app just to show what very basic features like a hit list and facets look like. https://github.com/jprante/Elasticsearch-Dancer-App The browsing UI you mean is a

Re: Urgent

2014-06-11 Thread joergpra...@gmail.com
Have you tried the schedule setting in JDBC river plugin? https://github.com/jprante/elasticsearch-river-jdbc#time-scheduled-execution-of-jdbc-river You can also try the feeder mode of the JDBC plugin, combined with cronjob from your crontab. Best, Jörg 2014-06-11 11:27 GMT+02:00 Sekrafi

Re: Performance as a sql result cache

2014-06-11 Thread joergpra...@gmail.com
You should run your search query more than just once. The first time executed, ES will load the Lucene index fields, and ramp up internal resources, which adds some overhead. Subsequent queries will be faster (around 1ms on my MacBook Pro with SSD but SSD is not important, it is the filesystem

Re: Slow search perfomance when using mmap versus memory.

2014-06-11 Thread joergpra...@gmail.com
Can you share your setup configuration, an example document, and a query, so that it is possible to recreate your situation? Also interesting would be the OS version, ES version, and Java JVM version. Thanks, Jörg On Wed, Jun 11, 2014 at 6:44 PM, MikeP michael...@gmail.com wrote: Our servers have 130

Re: Slow search perfomance when using mmap versus memory.

2014-06-11 Thread joergpra...@gmail.com
started). Index store memory is not faster. Jörg On Wed, Jun 11, 2014 at 11:09 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Can you share your setup configuration, and an example document and a query? So it is possible to recreate your situation? Also interesting would be OS version

Re: Query Result Caching in Elasticsearch similar to SOLR

2014-06-11 Thread joergpra...@gmail.com
In Elasticsearch you use filters in queries where the results are cached. More info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-cache.html Jörg On Wed, Jun 11, 2014 at 10:00 PM, sai...@roblox.com wrote: Is there a way to mimic the Query Result Caching

Re: Slow search perfomance when using mmap versus memory.

2014-06-12 Thread joergpra...@gmail.com
You should use a boolean query and wrap it into a constant score query. The constant score query is important; otherwise each clause will lead to score calculation, which has a significant impact on the overall search response time. There is also a notable difference in performance on AWS between

Re: Elastic Search and consistency

2014-06-12 Thread joergpra...@gmail.com
I think the documentation is quite clear, but I try to explain in my own words. 1.1 Not sure what you mean after the quorum check. Write consistency is a model where ES makes sure there are enough recipients (nodes) before writes are executed. consistency=quorum fails if you have too few nodes to

Re: Securing Data in Elasticsearch

2014-06-12 Thread joergpra...@gmail.com
There are a lot of methods to tamper with ES files, and physically, everything is possible to modify in files as long as your operating system permits more than something like append-only mode for ES files (not that I know this would work) So it depends on your requirements about the security

Re: Sorting on timestamps from multiple fields

2014-06-12 Thread joergpra...@gmail.com
If you have two (or more) date fields to sort on, look at the copy_to mapping feature to copy them over to a third field, e.g. sort_date. Then you have a single field you can happily sort on, without having to change fields in the source. The same method works for tag/category fields in different indexes
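What a descending sort on such a combined field amounts to can be mimicked client-side: copy_to makes sort_date multi-valued, and a descending sort with sort mode max orders each document by its latest date. This is a sketch of the effect, not of the mapping itself; the `Doc` class and its `created`/`updated` fields are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MultiDateSort {

    // Hypothetical document with two independent date fields.
    public static final class Doc {
        public final String id;
        public final LocalDate created;
        public final LocalDate updated; // may be null

        public Doc(String id, LocalDate created, LocalDate updated) {
            this.id = id;
            this.created = created;
            this.updated = updated;
        }

        // The value a max-mode sort would pick from the combined sort_date field.
        public LocalDate sortDate() {
            return (updated == null || created.isAfter(updated)) ? created : updated;
        }
    }

    // Newest first, ordering each doc by the latest of its dates.
    public static List<String> newestFirst(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparing(Doc::sortDate).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }
}
```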

Re: ES 1.2.1 sort by _timestamp

2014-06-12 Thread joergpra...@gmail.com
Do you set the timestamp value from your client, or do you let ES fill it in for you? Do you run more than one node? Are the clocks on your nodes synchronized? Jörg On Thu, Jun 12, 2014 at 2:13 PM, Stefan Eberl cpppw...@gmail.com wrote: Hey all, I have a question regarding sorting by

Re: Securing Data in Elasticsearch

2014-06-12 Thread joergpra...@gmail.com
If you want ES-level security, you should first reduce attack vectors, by closing down all the open ports and resources that are not necessary. One step would be to disable HTTP REST API completely (port 9200) and run Logstash Elasticsearch output only

Re: implementing a plugin to process the whole input document

2014-06-12 Thread joergpra...@gmail.com
Short answer: modifying the source after having executed a standard index or bulk action is not possible. Long answer: it depends, if you look at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/index/TransportIndexAction.java#L188 you can see how

Re: Cassandra with JDBC river plugin

2014-06-13 Thread joergpra...@gmail.com
The Cassandra Java Driver is not a JDBC driver. Jörg On Fri, Jun 13, 2014 at 11:11 AM, Abhishek Mukherjee 4271...@gmail.com wrote: Checking the Elasticsearch log files I found this. No suitable driver found for jdbc:cassandra:// 192.168.1.103:9160/transactionlogdb at

Re: Runtime JRE?

2014-06-13 Thread joergpra...@gmail.com
Yes, you can use Java Server JRE. It is a build without Java desktop graphics library (aka headless JVM). Jörg On Fri, Jun 13, 2014 at 1:53 PM, thatguy1...@gmail.com wrote: I know the guide says the following: While a JRE can be used for the Elasticsearch service, due to its use of a

Re: Securing Data in Elasticsearch

2014-06-13 Thread joergpra...@gmail.com
You should start HTTP only on localhost then and run Kibana on a selected number of nodes only. There are some authentication solutions for Kibana. I am not able to find security features like audit trails or preventing writes in Kibana/ES so you have to take care. Assessing Kibana for attacks

[ANN] Elasticsearch syslog plugin

2014-06-14 Thread joergpra...@gmail.com
Hi, here is a small plugin for Elasticsearch for receiving syslog messages via UDP or TCP. It is very similar to the bulk UDP module, but can parse syslog RFC messages. https://github.com/jprante/elasticsearch-syslog As always, feedback is most welcome. Best, Jörg

Re: Elastic Search and consistency

2014-06-15 Thread joergpra...@gmail.com
index.gateway.local.sync: 0 is related to durability, it means, the underlying data is really going to disk by using the guarantee of FileChannel.force(false). This destroys performance compared to the default value of ES, because there are a lot more I/O operations on OS layer when fsync() is
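The durability guarantee mentioned here comes from `java.nio.channels.FileChannel.force(boolean)`. The sketch below (plain Java NIO, not Elasticsearch code; the `append` helper is hypothetical) shows the trade-off: with `sync` enabled, every write is followed by `force(false)`, which flushes file content (not necessarily metadata) to the device and therefore costs an extra OS-level sync per write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableAppend {

    // Appends a record and, if requested, forces the file content to disk.
    // Returns the file size after the write.
    public static long append(Path file, String record, boolean sync) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                ch.force(false); // durable, but one device sync per call: slow
            }
            return ch.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```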

Re: Securing Data in Elasticsearch

2014-06-15 Thread joergpra...@gmail.com
From what I know about Kibana, it just uses the HTTP API _search endpoint, but I have not examined it more thoroughly. It is quite simple to set up an nginx/apache reverse proxy to filter requests. You should add http.host: 127.0.0.1 to your config/elasticsearch.yml to ensure that HTTP

Re: Securing Data in Elasticsearch

2014-06-15 Thread joergpra...@gmail.com
No, with the setting, you can run Logstash and Kibana on different hosts. Only on the ES node side do you start an additional nginx/apache, to wrap the HTTP port 9200 service with an HTTP port 80 reverse proxy service. In Kibana, you change all port 9200 configs to port 80 configs (also the remote host

Re: Creating a browse interface from ES

2014-06-16 Thread joergpra...@gmail.com
What about this: - build author name index - page size is static (e.g. 20) - absolute position: you must index each author name with absolute position info (sort author names before indexing, use a counter and increment it while indexing) - sort asc/desc works on author's name keyword analyzed
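The position-based paging described above can be sketched in plain Java: sort the names once, number them with a counter, and a page becomes a range on the position. In Elasticsearch this would be a range filter on an indexed position field; the `Entry` class and the `name`/`position` fields here are hypothetical, and case-insensitive ordering stands in for the keyword-analyzed sort.

```java
import java.util.ArrayList;
import java.util.List;

public class RegisterIndex {

    // Hypothetical register entry: an author name plus its absolute position.
    public static final class Entry {
        public final String name;
        public final int position;

        public Entry(String name, int position) {
            this.name = name;
            this.position = position;
        }
    }

    // Sort the names before indexing, then number them with an incrementing counter.
    public static List<Entry> build(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        List<Entry> entries = new ArrayList<>();
        int counter = 0;
        for (String n : sorted) {
            entries.add(new Entry(n, counter++));
        }
        return entries;
    }

    // A page is a range on position: [page * size, (page + 1) * size).
    public static List<String> page(List<Entry> entries, int page, int size) {
        int from = page * size;
        int to = from + size;
        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            if (e.position >= from && e.position < to) {
                out.add(e.name);
            }
        }
        return out;
    }

    // Jumping to a name: find the first entry >= name, then derive its page.
    public static int pageOf(List<Entry> entries, String name, int size) {
        for (Entry e : entries) {
            if (String.CASE_INSENSITIVE_ORDER.compare(e.name, name) >= 0) {
                return e.position / size;
            }
        }
        return entries.isEmpty() ? 0 : (entries.size() - 1) / size;
    }
}
```

Because positions are assigned at indexing time, page boundaries are exact, which is what a register needs; the price is renumbering when names are added.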

Re: IllegalArgumentException[No type mapped for [43]], version 1.2.1

2014-06-16 Thread joergpra...@gmail.com
I guess you hit the following condition: - you insert data with bulk indexing - your index has dynamic mapping and already has huge field mappings - bulk requests span over many nodes / shards / replicas and introduce tons of new fields into the dynamic mapping - you do not wait for bulk

Re: Creating a browse interface from ES

2014-06-17 Thread joergpra...@gmail.com
exact counts, only an estimated count. For register search you need absolutely exact counts. Jörg On Tue, Jun 17, 2014 at 7:28 AM, Robin Sheat ro...@catalyst.net.nz wrote: joergpra...@gmail.com schreef op ma 16-06-2014 om 13:12 [+0200]: This is how I implement register search

Re: Elasticsearch support for Java 1.8?

2014-06-17 Thread joergpra...@gmail.com
Scripting issues were due to MVEL, but with MVEL 2.2.0.Final this has been fixed in ES. So yes, you can run ES on a Java 8 JVM. Jörg On Tue, Jun 17, 2014 at 3:58 PM, Georgi Ivanov georgi.r.iva...@gmail.com wrote: As far as I know, ES will work just fine with java 1.8, except script support.

Re: Scroll Questions

2014-06-17 Thread joergpra...@gmail.com
1. Yes. 2. Facets/aggregations are not very useful while scrolling (I doubt they work at all), because scrolling works at the shard level and aggregations work at the index level. 3. A scroll request takes resources. The purpose of ClearScrollRequest is to release those resources explicitly. This is
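Point 3 is the familiar "always release what you opened" pattern. The Java sketch below models it with a hypothetical in-memory ScrollCursor, not the Elasticsearch client API. Its close() method stands in for sending a ClearScrollRequest, and try-with-resources guarantees the release even if processing a page throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScrollSketch {
    // In-memory stand-in for a server-side scroll context. 'open' models the
    // resources the cluster keeps alive until the scroll is cleared or expires.
    static final class ScrollCursor implements AutoCloseable {
        private final List<String> hits;
        private final int pageSize;
        private int offset = 0;
        boolean open = true;

        ScrollCursor(List<String> hits, int pageSize) {
            this.hits = hits;
            this.pageSize = pageSize;
        }

        // Returns the next page; an empty page means the scroll is exhausted.
        List<String> next() {
            if (!open) {
                throw new IllegalStateException("scroll context already cleared");
            }
            int end = Math.min(offset + pageSize, hits.size());
            List<String> page = new ArrayList<>(hits.subList(offset, end));
            offset = end;
            return page;
        }

        // Plays the role of sending a ClearScrollRequest.
        @Override
        public void close() {
            open = false;
        }
    }

    // Drains the cursor and always releases it, even if processing throws.
    static List<String> readAll(ScrollCursor cursor) {
        List<String> all = new ArrayList<>();
        try (ScrollCursor c = cursor) {
            List<String> page;
            while (!(page = c.next()).isEmpty()) {
                all.addAll(page);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        ScrollCursor cursor = new ScrollCursor(Arrays.asList("d1", "d2", "d3", "d4", "d5"), 2);
        System.out.println(readAll(cursor) + " open=" + cursor.open);
    }
}
```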

Re: Words frequency from a set of tweets within a range of dates

2014-06-18 Thread joergpra...@gmail.com
Execute a range query http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#query-dsl-range-query then you can access term statistics from scripting http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

Re: Securing Data in Elasticsearch

2014-06-18 Thread joergpra...@gmail.com
As I said, you can wrap the HTTP REST API and filter for GET, or only for the _search endpoint, but that is only one part and an incomplete solution. More important is to isolate ES in a private network and to maintain a safe and trusted environment (where every operation on OS level is logged and must
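The rule such a reverse proxy would apply can be written as a small predicate. The Java sketch below is not nginx or Apache configuration. It only illustrates one assumed policy: allow GET or POST, and only for paths ending in _search. As the post stresses, this is not a complete security solution on its own.

```java
import java.util.Locale;

public class ProxyFilter {
    // Hypothetical allow-list policy for a reverse proxy in front of ES:
    // only GET or POST requests whose path ends in the _search endpoint pass.
    static boolean allow(String method, String rawPath) {
        String m = method.toUpperCase(Locale.ROOT);
        if (!m.equals("GET") && !m.equals("POST")) {
            return false;
        }
        // Drop the query string before looking at the path.
        int q = rawPath.indexOf('?');
        String path = q >= 0 ? rawPath.substring(0, q) : rawPath;
        // Reject dot segments as a precaution against path tricks.
        if (path.contains("..")) {
            return false;
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.endsWith("/_search");
    }

    public static void main(String[] args) {
        String[][] requests = {
            {"GET", "/logstash-2014.06.18/_search?q=error"},
            {"POST", "/_search"},
            {"DELETE", "/logstash-2014.06.18"},
            {"POST", "/_cluster/nodes/_shutdown"},
        };
        for (String[] r : requests) {
            System.out.println(r[0] + " " + r[1] + " -> " + (allow(r[0], r[1]) ? "forward" : "reject"));
        }
    }
}
```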

Re: Splunk vs. Elastic search performance?

2014-06-20 Thread joergpra...@gmail.com
You correctly noted that Elasticsearch comes with developer settings; that is exactly what a packaged ES is meant for. If you find issues when configuring and setting up ES for critical use, it would be nice to post them so others can find help too, and maybe share their

Re: Bulk inserting is slow

2014-06-23 Thread joergpra...@gmail.com
Your bulk insert size is too large. It makes no sense to insert 100,000 documents with one request. Use 1000-1 instead. Also, you should submit bulk requests in parallel, not sequentially as you do now. Sequential bulk is slow if the client CPU/network is not saturated. Check if you have disabled the index
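A generic sketch of "smaller batches, submitted in parallel" in plain Java follows. The sendBulk function is a hypothetical stand-in for whatever actually sends one bulk request. The batch size of 1000 and the concurrency of 4 are illustrative, not tuned recommendations. In a real ES client, the org.elasticsearch.action.bulk.BulkProcessor helper recommended later in this thread takes care of batching and concurrent requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelBulk {
    // Splits docs into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    // Submits batches to a fixed pool so up to 'concurrency' requests run at once.
    // sendBulk is hypothetical: it sends one batch and returns how many docs were indexed.
    static <T> int indexAll(List<T> docs, int batchSize, int concurrency,
                            Function<List<T>, Integer> sendBulk)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> sendBulk.apply(batch)));
            }
            int indexed = 0;
            for (Future<Integer> f : pending) {
                indexed += f.get(); // rethrows any failure from a batch
            }
            return indexed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            docs.add(i);
        }
        // Stand-in sender: pretends every doc in the batch was indexed.
        int n = indexAll(docs, 1000, 4, batch -> batch.size());
        System.out.println("indexed " + n + " docs in " + partition(docs, 1000).size() + " bulk requests");
    }
}
```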

Re: Elasticsearch logs in JSON?

2014-06-23 Thread joergpra...@gmail.com
Have you checked https://github.com/logstash/log4j-jsonevent-layout ? Jörg On Mon, Jun 23, 2014 at 10:21 AM, Robin Clarke robi...@gmail.com wrote: Is there any way to configure Elasticsearch to output its logs in JSON (custom log format, or configuration option)? This would make it much

Re: Wait for yellow status

2014-06-23 Thread joergpra...@gmail.com
It would be helpful to add methods like waitForGreenToYellow(), waitForYellowToGreen(), waitForRedToYellow(), waitForYellowToRed(), ... to describe exactly which cluster state transitions to wait for. Jörg On Mon, Jun 23, 2014 at 6:33 PM, Ivan Brusic i...@brusic.com wrote: It appears that
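A client-side approximation of the proposed methods could poll the health status and look for the transition between consecutive observations. The Java sketch below is hypothetical throughout. waitForTransition and the Health enum are illustrative, the status supplier stands in for a real cluster health call, and waitForGreenToYellow merely borrows a name proposed in the post. Polling can miss a transition that happens and reverts between two polls, which is presumably one argument for implementing such waits on the server side.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class ClusterTransitions {
    enum Health { GREEN, YELLOW, RED }

    // Returns true once two consecutive polls show 'from' followed by 'to';
    // returns false after maxPolls polls without seeing that transition.
    static boolean waitForTransition(Health from, Health to, Supplier<Health> status,
                                     int maxPolls, long pollMillis) throws InterruptedException {
        Health previous = null;
        for (int i = 0; i < maxPolls; i++) {
            if (i > 0 && pollMillis > 0) {
                Thread.sleep(pollMillis);
            }
            Health current = status.get();
            if (previous == from && current == to) {
                return true;
            }
            previous = current;
        }
        return false;
    }

    // Convenience wrapper named after one of the methods proposed in the post.
    static boolean waitForGreenToYellow(Supplier<Health> status, int maxPolls, long pollMillis)
            throws InterruptedException {
        return waitForTransition(Health.GREEN, Health.YELLOW, status, maxPolls, pollMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        Iterator<Health> observed = Arrays.asList(
                Health.GREEN, Health.GREEN, Health.YELLOW, Health.GREEN).iterator();
        System.out.println(waitForGreenToYellow(observed::next, 4, 0));
    }
}
```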

Re: Old shards on re-joining nodes useful?

2014-06-23 Thread joergpra...@gmail.com
Yes, if the recovery of an index succeeds, the shards of the rejoined node for that index will be used. Do you mean orphaned shards, whose index no longer exists? Jörg On Mon, Jun 23, 2014 at 7:26 PM, Yongtao You yongtao@gmail.com wrote: Hi, Quick question, please. If a node

Re: TransportClient Throws 'java.lang.OutOfMemoryError: GC overhead limit exceeded' when all nodes in cluster are down (1.1.1)

2014-06-23 Thread joergpra...@gmail.com
Most likely you have memory leaks in your app and your client memory was exhausted. If you can show the client code that submits queries and processes responses, along with the stack traces you receive, more help is possible. A general hint is to switch to Java 7. Jörg On Mon, Jun 23,

Re: Old shards on re-joining nodes useful?

2014-06-23 Thread joergpra...@gmail.com
No, you must not remove any data. There are several options for what ES can do with orphaned shards: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway-local.html Example of a log entry when an orphaned shard is detected: [2014-06-23 21:46:05,841][INFO

Re: TransportClient Throws 'java.lang.OutOfMemoryError: GC overhead limit exceeded' when all nodes in cluster are down (1.1.1)

2014-06-23 Thread joergpra...@gmail.com
Maybe it is not OOM but running out of file descriptors, which can only be seen in the stack trace. TransportClient by default tries to reconnect quite aggressively, so monitoring the number of open network ports while you get the OOM would help the analysis. Maybe you have

Re: Reduce threads used by elasticsearch

2014-06-23 Thread joergpra...@gmail.com
You can reduce the number of Netty workers with the transport.netty.worker_count setting, which defaults to 2 * CPU cores. Jörg On Mon, Jun 23, 2014 at 10:34 PM, jnortey jeremy.nor...@gmail.com wrote: We have a development and production offering that uses elasticsearch. In development, it is not

Re: Bulk inserting is slow

2014-06-24 Thread joergpra...@gmail.com
You should use the org.elasticsearch.action.bulk.BulkProcessor helper class for concurrent bulk indexing. Jörg On Tue, Jun 24, 2014 at 5:34 PM, Frederic Esnault esnault.frede...@gmail.com wrote: Hi again, any idea about how to parallelize the bulk insert process ? I tried creating 4

Re: Rivers are reimporting data at each ElasticSearch restart

2014-06-25 Thread joergpra...@gmail.com
It is up to the river implementation how the data import is handled. The JDBC river, in the simple strategy, imports data when the river is started, regardless of whether the cluster or index already exists. It is possible to implement other strategies, for example, a strategy that performs a check before

Re: Elasticsearch river for postgres

2014-06-25 Thread joergpra...@gmail.com
You did not specify an index for the JDBC river to index into, so it assumes the index name is jdbc. This means that if you search with curl '0:9200/jdbc/_search' you should see some of the indexed documents. Jörg On Wed, Jun 25, 2014 at 11:00 AM, Jorge von Rudno jorge.vonrudno...@googlemail.com wrote:
