You should upgrade ES; there were bugs fixed regarding the cluster update
service and rivers.
Jörg
On Tue, May 27, 2014 at 6:44 PM, André Morais ano...@gmail.com wrote:
Hello,
I am using the JDBC river plugin (latest version with the name
elasticsearch-river-jdbc-2.2.1.jar on ES 0.90.5) and
Yes, it is relevant (and not only) to library catalog indexing, because
Bibframe, a new project by the Library of Congress, is built on RDF, and
next-generation library systems will embrace W3C semantic web technologies.
The RDF data I generate is indexed in JSON-LD format into Elasticsearch but
for
I'm not sure if this is related, but there is work on designing sequence
numbers that are decentralized, time-based UUIDs. If they were assigned to
Lucene segments, shards could declare what segments they already have when
a recovery process runs. The feature is planned for 1.3
For maximum write performance, you should
- use the fastest disk subsystem (SSD)
- use RAID 0 with an expensive controller to max out I/O bandwidth
- not run more than one ES instance per server
- not use virtual servers; use physical servers
- for the ES data folder, disable the access time flag (noatime),
Look into the scan/scroll query
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
It works like a cursor that iterates through all docs of a query result.
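For illustration, a minimal Java API sketch of such a scan/scroll loop (index
name, timeout, and page size are made-up values; client is an existing Client
instance):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

// start the scan: the first response carries no hits, only a scroll id
SearchResponse scrollResp = client.prepareSearch("myindex")
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))   // keep the cursor alive 60s between calls
        .setQuery(QueryBuilders.matchAllQuery())
        .setSize(100)                      // hits per shard per scroll round
        .execute().actionGet();
// pull batches from the cursor until no more hits come back
while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute().actionGet();
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
    for (SearchHit hit : scrollResp.getHits()) {
        System.out.println(hit.getSourceAsString()); // process each doc
    }
}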
Jörg
On Wed, May 28, 2014 at 1:42 PM, Tom t.opp...@superreal.de wrote:
Hi,
I need to fire
, the communication/storage won’t be compressed
using LZF?
- Drew
On May 29, 2014, at 2:52 PM, joergpra...@gmail.com wrote:
1. No (the cluster state of ES - not part of Lucene - is saved to disk in
SMILE format)
2. No.
3. Yes, you can use SMILE on XContentBuilder classes. The result can
IDF is calculated per shard; only with the DFS search types is it calculated
over all nodes in an initial scatter phase.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_search_options.html#_literal_search_type_literal
If you are concerned about IDF in a single multi-user index
Is match_all always running at that time or is it getting faster after a
first run?
Did you run an optimize with maximum number of segments? What is your
segment count?
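In case it helps, a sketch of forcing a merge down to one segment via the Java
API (index name is invented; client assumed):

client.admin().indices().prepareOptimize("myindex")
        .setMaxNumSegments(1)   // merge down to a single segment
        .execute().actionGet();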
Jörg
On Fri, May 30, 2014 at 9:20 PM, sai...@roblox.com wrote:
*Bump*
On Wednesday, May 28, 2014 4:10:26 PM UTC-7,
Just look into org.elasticsearch.rest.BytesRestResponse; it supersedes
XContentRestResponse
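A minimal sketch of the replacement pattern (the channel comes from the REST
handler; the response content is invented):

import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.rest.BytesRestResponse;
import org.elasticsearch.rest.RestStatus;

// build the JSON content, then wrap it in a BytesRestResponse
XContentBuilder builder = XContentFactory.jsonBuilder()
        .startObject().field("acknowledged", true).endObject();
channel.sendResponse(new BytesRestResponse(RestStatus.OK, builder));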
Jörg
On Sat, May 31, 2014 at 12:28 AM, Ben McCann benjamin.j.mcc...@gmail.com
wrote:
Jörg, thanks for the heads-up about XContentRestResponse going away. I've
run into that as an issue with a river I
Each time you start a node, be it a (transport) client node or a server
node, all plugins are checked/loaded at initialization.
Each plugin, including JVM plugins on the classpath, is by default examined
to see whether a directory named _site can be accessed. The purpose is to
classify a plugin as a site
any suggestions for
replacing XContentThrowableRestResponse and RestXContentBuilder?
Thanks,
Ben
On Sat, May 31, 2014 at 2:35 AM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Just look into org.elasticsearch.rest.BytesRestResponse, it supersedes
XContentRestResponse
Jörg
You'd have to use a plugin for this kind of operation, because vanilla ES
does not support RFC 6902.
I'm also interested in HTTP PATCH support in Elasticsearch, because it is
a must-have for modifying resources under the rules of the Linked Data
Platform (LDP)
You have to restart the whole cluster. Switching discovery while running a
cluster is not possible.
Jörg
On Mon, Jun 2, 2014 at 12:49 PM, Martin Harris
martin.har...@cloudsoftcorp.com wrote:
Hi Folks,
I'm trying to set up a cross-cloud Elasticsearch cluster. As it's
cross-cloud, the usual
Hi,
many of us want to start writing extensions for Elasticsearch.
Except submitting pull requests to the core code, one great advantage of
Elasticsearch is the plugin mechanism. Here, custom code can be hooked into
Elasticsearch, without having to ask for inclusion into the core code.
If you have indexed the data in Solr, you should consider a tool that can
traverse the Lucene index and reconstruct the documents. This is not a
straightforward process, as you know already, because analyzed fields look
different than the original input. The reconstruction may not recover the
If you can iterate over the Solr index doc ids and fetch the source docs
from a secondary storage, you should consider doing this first. This is the
most straightforward method for reindexing.
Otherwise, if you cannot access the filesystem storage for the docs (for
whatever reason), the idea
Usually, plugins that extend internal ES functionality should be installed
on all nodes. This is easy to remember and preferable from an
administrative view. All the nodes in the ES cluster must have access to
plugin code under all circumstances, especially when executing actions,
mappers,
What ES version is this?
Your segment count is very high (1000), which is not efficient.
Maybe index.codec.bloom.load: false can help reduce heap memory usage.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html
Jörg
Primary shards are addressed first when writing, but it is a myth that they
do all the writing. Secondary shards do the writing too, only a few
milliseconds later. There is nothing to worry about.
Jörg
On Tue, Jun 3, 2014 at 9:49 PM, Santiago Ferrer Deheza
sa.ferrer.deh...@gmail.com wrote:
Hi
Not sure if I understand your concern completely - as long as you're doing
things right in your code, it should be possible to allocate resources only
when required. This also holds for plugins.
Jörg
On Tue, Jun 3, 2014 at 11:48 PM, virgil virgil...@gmail.com wrote:
Thank you Jörg. I see the
Can you show your test code?
You seem to be looking at the wrong settings - by adjusting node count, shard
count, and replica count alone, you cannot determine the maximum node
performance. E.g. concurrency settings, index optimizations, query
optimizations, thread pooling, and most of all, fast disk
You need resources on all nodes that hold shards; you cannot do it with
just one instance, because the ES index is distributed. Rescoring would be
very expensive if you did it on an extra central instance with an extra
scatter/gather phase. It is also very expensive in scripting.
A better method is
One very essential feature, from the very beginning, is that Elasticsearch
instances, when started, automatically form a cluster over the network.
This is only possible in an open network environment and by having
multicast enabled.
Are you aware that by talking about safe configuration options
the internals and there are
no code level comments. I always meant to experiment with the different
action hierarchies via simple plugins and document my findings. Perhaps one
day...
Cheers,
Ivan
On Wed, Jun 4, 2014 at 1:09 AM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Sorry
As said, it is true that scoring scripts (like the function score scripts or
the AbstractSearchScript) need to reside on data nodes. Accessing fields is
a low-level operation in a script, so it is not possible to install such a
boost plugin that uses scripting on a data-less node. You would have to
://manning.com/synhershko/
On Tue, Jun 3, 2014 at 6:15 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Hi,
many of us want to start writing extensions for Elasticsearch.
Except submitting pull requests to the core code, one great advantage of
Elasticsearch is the plugin mechanism. Here
Why do you use terms on the _id field and not the ids filter? The ids filter
is more efficient since it reuses the _uid field, which is cached by default.
Do the terms in the query vary from query to query? If so, caching might
kill your heap.
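For comparison, a sketch of the ids filter via the Java API (index, type, and
ids are invented):

import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

// the ids filter reuses the cached _uid field instead of a terms lookup on _id
client.prepareSearch("myindex")
        .setQuery(QueryBuilders.filteredQuery(
                QueryBuilders.matchAllQuery(),
                FilterBuilders.idsFilter("mytype").addIds("1", "2", "3")))
        .execute().actionGet();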
Another possible issue is that your query is not
One more hint: look at
org.elasticsearch.common.lucene.search.function.FieldValueFunction.
This implements ScoreFunction and fetches boost values from a
configured field in the doc, for use by the Java API for FunctionScoreQuery.
If you can write a custom ScoreFunction, you could implement
A suggestion for the path model:
- index also the path depth, and name the fields with the depth level
- execute a nested aggregation query over the path depth levels
Example doc with path info:
{
  "path0" : "promo/A",
  "path1" : "sale/B"
  ...
}
In this doc you know the user went from promo/A to
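To aggregate over the depth levels, a Java API sketch (index and aggregation
names are assumptions):

import org.elasticsearch.search.aggregations.AggregationBuilders;

// drill down level by level: terms on path0, then terms on path1 within each bucket
client.prepareSearch("clicks")
        .setSize(0)
        .addAggregation(
                AggregationBuilders.terms("level0").field("path0")
                        .subAggregation(
                                AggregationBuilders.terms("level1").field("path1")))
        .execute().actionGet();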
Do you use TransportClient or NodeClient?
With NodeClient, you are tied to the cluster, as the node is part of it; with
TransportClient, you can count the connected nodes.
The discovery mechanism behind the scenes sends ping actions every few
seconds for you. If an action fails, you will see
Check the Elasticsearch test code. There, you can see how the Java API works.
For example
GetIndexTemplatesResponse response =
client().admin().indices().prepareGetTemplates().get();
You can get an empty response if the template does not exist, or the execution
throws an exception when something went
The knapsack plugin does not come with downtime. You can increase shards
on the fly by copying an index over to another index (even on another
cluster). The index should be write-disabled during the copy, though.
Increasing the replica level is a very simple command; no index copy required.
It seems
, 2014 at 9:21 AM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
The knapsack plugin does not come with a downtime. You can increase
shards on the fly by copying an index over to another index (even on
another cluster). The index should be write disabled during copy though.
Increasing
Just a quick question: do you just want to extract a field from the JSON
source?
There are field filters and parameters for shaping such a JSON result;
maybe they can already help?
Or can you give an example of the problem?
Jörg
On Thu, Jun 5, 2014 at 7:45 PM, Mario Mueller ma...@xenji.com
RestResponseListener that takes a SearchResponse and creates a
simplified version with no metadata.
Should be an interesting quick plugin, but it looks like Jorg is going to
beat me to it (I'm still at work for several more hours).
--
Ivan
On Thu, Jun 5, 2014 at 1:08 PM, joergpra...@gmail.com
probably come up with 2 indexing strategies we can apply to an
application's index based on the heuristics from the operations they're
performing.
Thanks for the feedback!
Todd
On Thu, Jun 5, 2014 at 10:55 AM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Thanks for raising
, but
I noticed that you provided your own parseSearchRequest, but still
call RestSearchAction.parseSearchRequest from inside handleRequest. Did I
misinterpret the code or is that a mistake?
--
Ivan
On Thu, Jun 5, 2014 at 2:37 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
OK, I
I'll try to answer some of the queries, though I must admit I am not too
familiar with the aggregation source code yet (still exploring).
Aggregations work like a search; they are embedded into the search
actions and work over the result set of a search. They run in each shard,
just like the
1. No. Did you change the configuration? You have two data nodes connected?
2. You do not need to be concerned about where primary shards are allocated;
secondary shards play the same role (except that primaries receive writes
a few milliseconds earlier than secondaries). Elasticsearch randomly
I drink Kölsch only :) ävver et hätt noh immer joot jejange (Cologne dialect:
"but it has always gone well so far")
Greetings from Cologne!
Jörg
On Fri, Jun 6, 2014 at 7:14 AM, Mario Mueller ma...@xenji.com wrote:
You guys are totally awesome! Thanks a lot! If you ever visit Duesseldorf
drop me a line, I owe you a beer.
@Brian:
Interesting
Closing the transport client may not be enough.
Try this:
- wait for all outstanding actions (all actions send responses
asynchronously)
- then shut down client.threadpool() (perhaps with shutdownNow() or
shutdown()); this effectively prevents new actions from being started
- then close the
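A rough sketch of that sequence (the waiting step is application-specific;
client.threadPool() as referenced above):

// 1. application-specific: wait until all outstanding action responses arrived
// 2. stop the client thread pool so no new actions can be started
client.threadPool().shutdown();
// 3. finally close the client itself
client.close();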
Please ask your question here. Thanks.
Jörg
On Fri, Jun 6, 2014 at 9:28 AM, ohw o...@zhihu.com wrote:
Hi folks
I just asked a question on StackOverflow; please have a look if you have
encountered a similar problem or have some input on it.
Thanks in advance!
the query parsers into
elasticsearch, would you please elaborate more on this?
On Fri, Jun 6, 2014 at 4:53 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
The Query DSL is not equivalent to the Lucene Query API, but close to it, with
enhancements.
If you want to make use of Lucene Query
this
happened. Is there something I am missing?
I want to know how ES allocates nodes. Is there some reference? I googled
but couldn't find it.
Thank you :D
On Fri, Jun 6, 2014 at 3:05 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
1. No. Did you change the configuration? You have
Look here for the tool and how to use it
http://www.elasticsearch.org/blog/tool-help-routing-issues-elasticsearch-1-2-0/
Jörg
On Fri, Jun 6, 2014 at 11:24 AM, Luke Wilson-Mawer
lukewilsonma...@gmail.com wrote:
Great, thanks Adrien. I will eagerly await the tool.
Kind regards,
Luke
On
No, the settings will not merge existing segments unless you call the
_optimize action via the API.
And have some patience: thousands of segments take time - also, they need
quite a few memory resources to merge...
I suggest backing up your data first, to stay safe if the merging fails /
aborts...
Jörg
On
1 GB is a very large document, and it is unusual to index such sizes.
There is a limit check against the heap. In order to be able to process
such a length, you need a large heap just to store the document source.
Depending on the analyzer, heap demand increases even more.
You can index documents of
I mean, you can add a MyOwnFunctionBuilder/MyOwnFunctionParser to
Elasticsearch via a plugin. See
package org.elasticsearch.index.query.functionscore for the standard
implementations.
The functionscore code is of masterpiece quality - no need to modify existing
code! It is pluggable.
A close example
For an example function score plugin implementation, see
https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/search/functionscore/FunctionScorePluginTests.java
Jörg
On Fri, Jun 6, 2014 at 7:10 PM, joergpra...@gmail.com joergpra...@gmail.com
wrote:
I
I have implemented a function-score-based conditional boost plugin for
demonstration.
Very useful for faking relevance scoring, depending on document field
values that were originally not meant to contribute to boosting.
A list of boost values can be specified depending on indexed
bind_host is the host that an Elasticsearch node uses in the socket bind
call when starting the network. Due to the socket programming model, you can
bind to an address. By referencing an address, the socket allows access
to one or all underlying network devices. There are several addresses with
Maybe the segment count is just counting new segments as they are
created... can you look into the data folders to check whether the segment
file count is still high?
And can you verify that the settings are really active... not sure what's
going on without seeing details.
The _optimize call takes a
Compression is always enabled by default.
Jörg
On Sun, Jun 8, 2014 at 6:01 PM, sri 1.fr@gmail.com wrote:
Hello everyone,
I have read posts and blogs on how elasticsearch compression can be
enabled in previous versions (0.17 - 0.19).
I am currently using ES 1.2.1; I wasn't able to
The Elasticsearch file size contains not only compressed fields, but
much more: for example, term vectors, norms, etc. You would have to disable
the field attributes you do not want. Also note that Elasticsearch has replica
enabled by default, and segment count is not optimized automatically.
Jörg
Lucene uses LZ4 compression
http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1
so you should not run ES on a ZFS file system with compression enabled.
Jörg
On Sun, Jun 8, 2014 at 8:47 PM, Patrick Proniewski elasticsea...@patpro.net
wrote:
Hello,
I don't
Try this index template for new index creations
curl -XPUT 'localhost:9200/_template/template1' -d '
{
  "template" : "*",
  "mappings" : {
    "_default_" : {
      "_source" : { "enabled" : false },
      "_all" : { "enabled" : false }
    }
  }
}
'
See also
There is a bug in the JDBC river, introduced recently, that prevents it from
using the type_mapping parameter if there is no index_settings parameter
defined.
It will be fixed asap.
A workaround might be adding an empty settings parameter like
"index_settings" : {}
Jörg
On Mon, Jun 9, 2014 at 1:00
There are many reasons that may cause this; just to name a few:
- benchmarking tool setup (do they show correct numbers?)
- network bandwidth limits
- cluster setup (e.g. complex mapping, high latency between nodes)
- pattern of the data input
- method of data input (bulk vs. index, HTTP vs. Java
How do you figure out that you're hitting limits? I don't have enough
information to help.
Marvel, Elastic HQ, etc. are all very useful tools but should be combined
with OS-related monitoring to get an overall picture.
Jörg
On Mon, Jun 9, 2014 at 9:31 PM, pranav amin parulpate...@gmail.com
It depends on your requirements and your product strategy - both are
possible, with pros and cons:
- are your users proficient in a report language? Do they already write
report specs in a standard report language? Do you want to support this
report language standard? Do you like to share report
Try this
import org.elasticsearch.action.search.SearchRequest;
import
org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import java.util.Arrays;
import static org.elasticsearch.client.Requests.searchRequest;
import static
Welcome to the show :)
I also build library catalogs on Elasticsearch professionally.
Some time ago I wrote a Perl Dancer starter app just to show what very basic
features like a hit list and facets look like.
https://github.com/jprante/Elasticsearch-Dancer-App
The browsing UI you mean is a
Have you tried the schedule setting in JDBC river plugin?
https://github.com/jprante/elasticsearch-river-jdbc#time-scheduled-execution-of-jdbc-river
You can also try the feeder mode of the JDBC plugin, combined with a cron job
from your crontab.
Best,
Jörg
2014-06-11 11:27 GMT+02:00 Sekrafi
You should run your search query more than just once. The first time it is
executed, ES will load the Lucene index fields and ramp up internal
resources, which adds some overhead. Subsequent queries will be faster
(around 1 ms on my MacBook Pro with SSD, but SSD is not important; it is the
filesystem
Can you share your setup configuration, and an example document and a
query, so it is possible to recreate your situation?
Also interesting would be OS version, ES version, Java JVM version.
Thanks,
Jörg
On Wed, Jun 11, 2014 at 6:44 PM, MikeP michael...@gmail.com wrote:
Our servers have 130
started). Index store
memory is not faster.
Jörg
On Wed, Jun 11, 2014 at 11:09 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Can you share your setup configuration, and an example document and a
query? So it is possible to recreate your situation?
Also interesting would be OS version
In Elasticsearch, you use filters in queries; the filter results are cached.
More info:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-cache.html
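A sketch of a filtered query with an explicitly cached filter (field and value
invented):

import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

// the term filter result is cached as a bitset and reused across queries
client.prepareSearch("myindex")
        .setQuery(QueryBuilders.filteredQuery(
                QueryBuilders.matchQuery("title", "elasticsearch"),
                FilterBuilders.termFilter("status", "published").cache(true)))
        .execute().actionGet();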
Jörg
On Wed, Jun 11, 2014 at 10:00 PM, sai...@roblox.com wrote:
Is there a way to mimic the Query Result Caching
You should use a boolean query and wrap it into a constant score query.
The constant score query is important; otherwise each clause will lead to
score calculation, which has a significant impact on the overall search
response time.
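For example, via the Java API (clauses invented), a sketch:

import org.elasticsearch.index.query.QueryBuilders;

// scoring is skipped: every matching doc gets the same constant score
client.prepareSearch("myindex")
        .setQuery(QueryBuilders.constantScoreQuery(
                QueryBuilders.boolQuery()
                        .must(QueryBuilders.termQuery("color", "red"))
                        .must(QueryBuilders.termQuery("size", "xl"))))
        .execute().actionGet();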
There is also a notable difference of performance on AWS between
I think the documentation is quite clear, but I'll try to explain in my own
words.
1.1 Not sure what you mean after the quorum check. Write consistency is a
model where ES makes sure there are enough recipients (nodes) before writes
are executed. consistency=quorum fails if you have too few nodes to
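On the Java API side, the consistency level is set per write request; a sketch
(all names invented):

import org.elasticsearch.action.WriteConsistencyLevel;

// the index action fails upfront if fewer than a quorum of shard copies are reachable
client.prepareIndex("myindex", "mytype", "1")
        .setSource("{\"user\":\"kimchy\"}")
        .setConsistencyLevel(WriteConsistencyLevel.QUORUM)
        .execute().actionGet();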
There are a lot of ways to tamper with ES files; physically, everything in
the files can be modified, as long as your operating system permits more than
something like an append-only mode for ES files (not that I know this would
work).
So it depends on your requirements about security
If you have two (or more) date fields to sort on, look at the copy_to mapping
feature to copy them over to a third field, e.g. sort_date. Then you have a
single field you can happily sort on, without having to change fields in
the source.
The same method works for tag/category fields in different indexes
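A mapping sketch with copy_to via the Java API (type and field names are
assumptions):

import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// both date fields are copied into the single sort_date field at index time
XContentBuilder mapping = XContentFactory.jsonBuilder()
        .startObject().startObject("doc").startObject("properties")
            .startObject("created").field("type", "date").field("copy_to", "sort_date").endObject()
            .startObject("modified").field("type", "date").field("copy_to", "sort_date").endObject()
            .startObject("sort_date").field("type", "date").endObject()
        .endObject().endObject().endObject();
client.admin().indices().preparePutMapping("myindex")
        .setType("doc").setSource(mapping).execute().actionGet();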
Do you set the timestamp value from your client, or do you let ES fill it in
for you?
Do you run more than one node? Are the clocks on your nodes running
synchronously?
Jörg
On Thu, Jun 12, 2014 at 2:13 PM, Stefan Eberl cpppw...@gmail.com wrote:
Hey all,
I have a question regarding sorting by
If you want ES-level security, you should first reduce attack vectors by
closing down all the open ports and resources that are not necessary.
One step would be to disable the HTTP REST API completely (port 9200) and run
the Logstash Elasticsearch output only
Short answer: modifying the source after having executed a standard index
or bulk action is not possible.
Long answer: it depends, if you look at
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/index/TransportIndexAction.java#L188
you can see how
The Cassandra Java Driver is not a JDBC driver.
Jörg
On Fri, Jun 13, 2014 at 11:11 AM, Abhishek Mukherjee 4271...@gmail.com
wrote:
Checking the Elasticsearch log files I found this.
No suitable driver found for jdbc:cassandra://
192.168.1.103:9160/transactionlogdb
at
Yes, you can use the Java Server JRE. It is a build without the Java desktop
graphics libraries (aka headless JVM).
Jörg
On Fri, Jun 13, 2014 at 1:53 PM, thatguy1...@gmail.com wrote:
I know the guide says the following:
While a JRE can be used for the Elasticsearch service, due to its use of a
You should start HTTP only on localhost then, and run Kibana on a selected
number of nodes only.
There are some authentication solutions for Kibana.
I am not able to find security features like audit trails or write prevention
in Kibana/ES, so you have to take care. Assessing Kibana for attacks
Hi,
here is a small plugin for Elasticsearch for receiving syslog messages via
UDP or TCP. It is very similar to the bulk UDP module, but can parse syslog
RFC messages.
https://github.com/jprante/elasticsearch-syslog
As always, feedback is most welcome.
Best,
Jörg
index.gateway.local.sync: 0 is related to durability: it means the
underlying data really goes to disk, using the guarantee of
FileChannel.force(false). This destroys performance compared to the default
value of ES, because there are a lot more I/O operations on the OS layer when
fsync() is
From what I know about Kibana, it just uses the HTTP API _search endpoint,
but I have not examined it more thoroughly.
It is quite simple to set up an nginx/apache reverse proxy to filter
requests.
You should add
http:
  host: 127.0.0.1
to your config/elasticsearch.yml to ensure that HTTP
No; with that setting, you can run Logstash and Kibana on different hosts.
Only on the ES node side do you start an additional nginx/apache, to wrap the
HTTP port 9200 service with an HTTP port 80 reverse proxy service.
On Kibana, you change all port 9200 configs to port 80 configs (also the
remote host
What about this:
- build an author name index
- page size is static (e.g. 20)
- absolute position: you must index each author name with absolute position
info (sort author names before indexing, use a counter and increment it
while indexing)
- sort asc/desc works on the author's name, keyword-analyzed
I guess you hit the following condition:
- you insert data with bulk indexing
- your index has dynamic mapping and already has huge field mappings
- bulk requests span many nodes / shards / replicas and introduce tons
of new fields into the dynamic mapping
- you do not wait for bulk
exact counts, only an estimated count. For register
search you need absolutely exact counts.
Jörg
On Tue, Jun 17, 2014 at 7:28 AM, Robin Sheat ro...@catalyst.net.nz wrote:
joergpra...@gmail.com schreef op ma 16-06-2014 om 13:12 [+0200]:
This is how I implement register search
Scripting issues were due to MVEL, but with MVEL 2.2.0.Final this has been
fixed in ES.
So yes, you can run ES on a Java 8 JVM.
Jörg
On Tue, Jun 17, 2014 at 3:58 PM, Georgi Ivanov georgi.r.iva...@gmail.com
wrote:
As far as I know, ES will work just fine with Java 1.8,
except for script support.
1. yes
2. facets/aggregations are not very useful while scrolling (I doubt they
even work at all) because scrolling works on the shard level and aggregations
work on the indices level
3. a scroll request takes resources. The purpose of ClearScrollRequest is
to release those resources explicitly. This is
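Releasing them explicitly looks roughly like this in the Java API (assuming a
prior scroll response scrollResp):

// free the scroll context on the server instead of waiting for the scroll timeout
client.prepareClearScroll()
        .addScrollId(scrollResp.getScrollId())
        .execute().actionGet();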
Execute a range query
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#query-dsl-range-query
then you can access term statistics from scripting
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
As said, you can wrap the HTTP REST layer and filter for GET, or just for the
_search endpoint, but that is only one part, and it is an incomplete solution.
More important is to isolate ES in a private network and to maintain a safe
and trusted environment (where every operation on the OS level is logged and
must
It is correct, as you noted, that Elasticsearch comes with developer
settings - that is exactly what a packaged ES is meant for.
If you find issues when configuring and setting up ES for critical use, it
would be nice to post your issues so others can find help too, and
maybe share their
Your bulk insert size is too large. It makes no sense to insert 100,000
docs with one request. Use 1,000-10,000 instead.
Also, you should submit bulk requests in parallel, not sequentially like
you do. Sequential bulk is slow if the client CPU/network is not saturated.
Check if you have disabled the index
Have you checked https://github.com/logstash/log4j-jsonevent-layout ?
Jörg
On Mon, Jun 23, 2014 at 10:21 AM, Robin Clarke robi...@gmail.com wrote:
Is there any way to configure Elasticsearch to output its logs in JSON
(custom log format, or configuration option)? This would make it much
It would be helpful to add methods like waitForGreenToYellow(),
waitForYellowToGreen(), waitForRedToYellow(), waitForYellowToRed(), ...
for describing exactly the cluster state transitions to wait for.
Jörg
On Mon, Jun 23, 2014 at 6:33 PM, Ivan Brusic i...@brusic.com wrote:
It appears that
Yes, if the recovery of an index succeeds, the shards of the rejoined node
will be used for the index. Do you mean orphaned shards, where the index
no longer exists?
Jörg
On Mon, Jun 23, 2014 at 7:26 PM, Yongtao You yongtao@gmail.com wrote:
Hi,
Quick question, please. If a node
Most likely you have memory leaks in your app, and your client memory was
exhausted.
If you can show the client code (how you submit queries and process
responses) and the stack traces you receive, it would be possible to offer
more help.
A general hint is to switch to Java 7.
Jörg
On Mon, Jun 23,
No, you must not remove any data. There are several options for what ES can
do with orphaned shards:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway-local.html
Example of a log entry when an orphaned shard is detected:
[2014-06-23 21:46:05,841][INFO
Maybe it is not OOM but running out of file descriptors; that can only be
seen in the stack trace.
TransportClient, by default, tries to reconnect quite aggressively, so if
you could monitor the number of open network ports while you get the OOM,
this would be helpful for analysis. Maybe you have
You can reduce the netty workers with the transport.netty.worker_count
setting, which is by default set to 2 * CPU cores.
Jörg
On Mon, Jun 23, 2014 at 10:34 PM, jnortey jeremy.nor...@gmail.com wrote:
We have a development and production offering that uses elasticsearch. In
development, it is not
You should use the org.elasticsearch.action.bulk.BulkProcessor helper class
for concurrent bulk indexing.
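A minimal sketch (listener callbacks left empty; sizes and names are
assumptions):

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;

BulkProcessor bulk = BulkProcessor.builder(client, new BulkProcessor.Listener() {
    public void beforeBulk(long id, BulkRequest request) {}
    public void afterBulk(long id, BulkRequest request, BulkResponse response) {}
    public void afterBulk(long id, BulkRequest request, Throwable failure) {}
})
        .setBulkActions(1000)     // flush every 1000 actions
        .setConcurrentRequests(4) // up to 4 bulk requests in flight at once
        .build();

bulk.add(new IndexRequest("myindex", "mytype").source("{\"f\":1}"));
// ... add more requests from any thread; BulkProcessor is thread-safe ...
bulk.close();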
Jörg
On Tue, Jun 24, 2014 at 5:34 PM, Frederic Esnault
esnault.frede...@gmail.com wrote:
Hi again,
any idea about how to parallelize the bulk insert process ?
I tried creating 4
It is up to the river implementation how the data import is handled.
The JDBC river, in the simple strategy, imports data when the river is
started, regardless of the existing cluster or index. It is possible to
implement other strategies, for example a strategy that performs a check
before
You did not specify an index for the JDBC river to index to, so it assumes
the index name is jdbc.
That means, if you search
curl '0:9200/jdbc/_search'
you should see some of the indexed documents.
Jörg
On Wed, Jun 25, 2014 at 11:00 AM, Jorge von Rudno
jorge.vonrudno...@googlemail.com wrote: