Is there anything else we could try here to debug elasticsearch-hadoop
being unable to write to Elasticsearch? We're still seeing the same number
of these fails during the nightly batch runs even after switching to
2.0.2.BUILD-SNAPSHOT,
and I don't see any additional lines from
You can always enable TRACE, though that is likely to create way too much information in production and slow things down
considerably.
The first thing you can do is give ES more breathing space by minimizing the batch size (say
to 512KB) or the number
of entries (500
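If it helps, with Cascading the es.* settings are plain Hadoop job properties, so the two knobs mentioned above can be set directly in the job configuration. A sketch using the example values from this thread (they are just examples, not recommendations):

```properties
# elasticsearch-hadoop bulk tuning - shrink each bulk request
es.batch.size.bytes = 512kb
es.batch.size.entries = 500
```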
Our Hadoop and Elasticsearch clusters are all on AWS. We have 2 MR jobs that write
to ES - one of them works fine, and the other takes forever due to 10-20%
of its tasks failing in the way I've described. So I don't think it's any kind
of network/firewall issue. There are no nightly backups related to ES or
What type of AWS instances are you using? Virtualization tends to interfere in various ways with a running system -
sometimes for good, sometimes for worse.
The number of tasks is useful for computing the total amount of data and number of entries you are throwing at ES at one time. You are
looking at a
Our 4 ES nodes are all m1.large (
http://www.ec2instances.info/?filter=m1.large) and our 5 Hadoop nodes are
all m1.xlarge (http://www.ec2instances.info/?filter=m1.xlarge).
Thanks for the troubleshooting pointers - we'll do some more research.
On Fri, Oct 3, 2014 at 11:27 AM, Costin Leau
Hi Costin - we updated our dependencies to use elasticsearch-hadoop
2.0.2.BUILD-SNAPSHOT, but that didn't seem to change anything. We're still
seeing the same task failures while trying to write to Elasticsearch. The
only difference in the logs is that now I don't see
the
The error indicates the ES nodes don't reply in a timely fashion and thus the connection drops. Based on your logs it
seems to be either a GC or a network issue.
You could try turning on logging in package 'org.elasticsearch.hadoop.rest' to
DEBUG.
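Assuming a stock log4j setup (which Hadoop uses by default), that would be something like the following line in log4j.properties on the task nodes:

```properties
# verbose output from the es-hadoop REST layer (connections, bulk requests, retries)
log4j.logger.org.elasticsearch.hadoop.rest=DEBUG
```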
How many tasks do you have and what's your bulk
This particular job has 1353 map tasks, Hadoop cluster has 5 nodes with
total map task capacity of 25. Elasticsearch cluster has 4 nodes.
Where can I find the bulk size/entries numbers?
Thanks,
Zach
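For reference, the numbers above allow a rough back-of-envelope estimate of the bulk pressure on the cluster. This sketch assumes the es-hadoop defaults (es.batch.size.bytes = 1mb, es.batch.size.entries = 1000); adjust if the job overrides them:

```python
# Back-of-envelope: peak concurrent bulk load on the ES cluster.
concurrent_tasks = 25  # Hadoop map slot capacity (all writing at once, worst case)
batch_size_mb = 1      # assumed es-hadoop default bulk size per task flush
es_nodes = 4           # ES cluster size

peak_bulk_mb = concurrent_tasks * batch_size_mb
per_node_mb = peak_bulk_mb / es_nodes
print(peak_bulk_mb, per_node_mb)  # 25 6.25
```

In other words, up to ~25 MB of bulk data in flight at once, roughly 6 MB per ES node, which is the kind of figure to weigh against what m1.large nodes can absorb.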
On Wed, Oct 1, 2014 at 7:19 AM, Costin Leau costin.l...@gmail.com wrote:
The error indicates
Hi Costin - by bulk size/entries number are you referring to the
es.batch.size.bytes and es.batch.size.entries config values described here?
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/configuration.html#configuration-serialization
It looks like the only
Hi - we're having problems with one of our map-reduce jobs that writes to
Elasticsearch. Lots of map tasks are failing due to ES being unavailable,
with logs like this:
https://gist.githubusercontent.com/zcox/3d6cf4329d49ca03271b/raw/57c46a5e4c9ea04d5c4209414d6f847492d16c0d/gistfile1.txt
Seems
What version of es-hadoop/es/cascading are you using?
On 9/30/14 6:16 PM, Zach Cox wrote:
Hi - we're having problems with one of our map-reduce jobs that writes to
Elasticsearch. Lots of map tasks are failing
due to ES being unavailable, with logs like this:
Hi Costin:
elasticsearch-hadoop 2.0.0
cascading 2.5.4
scalding 0.10.0
Thanks,
Zach
On Tuesday, September 30, 2014 10:25:10 AM UTC-5, Costin Leau wrote:
What version of es-hadoop/es/cascading are you using?
On 9/30/14 6:16 PM, Zach Cox wrote:
Hi - we're having problems with one of our
Can you please try the 2.0.2.BUILD-SNAPSHOT? I think you might be running into issue #256, which was fixed some time ago
and will be part of the upcoming
2.0.2 and 2.1 Beta2 releases.
Cheers,
On 9/30/14 6:43 PM, Zach Cox wrote:
Hi Costin:
elasticsearch-hadoop 2.0.0
cascading 2.5.4
scalding 0.10.0