The problem may be in Hadoop itself, e.g. wrong permissions.

2015-04-29 20:51 GMT+08:00 Volker Janz <volker.j...@innogames.com>:
> Hi,
>
> we are using the storm-hdfs bolt (0.9.4) to write data from Kafka to
> Hadoop (Hadoop 2.5.0-cdh5.2.0).
>
> This works fine for us but we discovered some unexpected behavior:
>
> Our bolt uses the TimedRotationPolicy to rotate finished files from one
> location within HDFS to another. Unfortunately, there are some files that
> remain within the "writing" location and do not get rotated, as the
> following list shows (I performed this command today and our rotation
> policy is set to 180 seconds):
>
> *hadoop fs -ls /tmp/storm-events/valid/collecting | grep "\-25"*
> -rw-r--r--   3 storm storm   20512704 2015-04-25 12:41 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-16-2-1429965520003.txt
> -rw-r--r--   3 storm storm    5559950 2015-04-25 12:32 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-16-270-1429965058462.txt
> -rw-r--r--   3 storm storm    4174336 2015-04-25 00:00 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-16-769-1429916336332.txt
> -rw-r--r--   3 storm storm  125230972 2015-04-25 12:43 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-19-0-1429965627846.txt
> -rw-r--r--   3 storm storm  115531743 2015-04-25 12:45 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-19-0-1429965816167.txt
> -rw-r--r--   3 storm storm  106212613 2015-04-25 12:48 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-19-0-1429965953513.txt
> -rw-r--r--   3 storm storm   25599779 2015-04-25 12:39 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-19-1042-1429965476558.txt
> -rw-r--r--   3 storm storm   20513134 2015-04-25 12:41 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-21-2-1429965520003.txt
> -rw-r--r--   3 storm storm    5556055 2015-04-25 12:32 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-21-270-1429965058462.txt
> -rw-r--r--   3 storm storm    4171264 2015-04-25 00:00 /tmp/storm-events/valid/collecting/events_hdfs-bolt-valid-21-769-1429916336335.txt
>
> If you check those files with "hadoop fsck -openforwrite", there are no
> open filehandles.
>
> Now, if we have a look at the nimbus ui, there are a lot of failed tuples
> (but only on specific workers):
>
> The worker logs gave an explanation of those failures:
>
> *tail -f worker-6704.log*
> 2015-04-29T11:31:58.337+0000 o.a.s.h.b.HdfsBolt [WARN] write/sync failed.
> org.apache.hadoop.ipc.RemoteException: java.lang.ArrayIndexOutOfBoundsException
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347) ~[stormjar.jar:0.1.0]
>         at org.apache.hadoop.ipc.Client.call(Client.java:1300) ~[stormjar.jar:0.1.0]
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) ~[stormjar.jar:0.1.0]
>         at com.sun.proxy.$Proxy8.updatePipeline(Unknown Source) ~[na:na]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_31]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_31]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_31]
>         at java.lang.reflect.Method.invoke(Method.java:483) ~[na:1.8.0_31]
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[stormjar.jar:0.1.0]
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[stormjar.jar:0.1.0]
>         at com.sun.proxy.$Proxy8.updatePipeline(Unknown Source) ~[na:na]
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updatePipeline(ClientNamenodeProtocolTranslatorPB.java:791) ~[stormjar.jar:0.1.0]
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1047) ~[stormjar.jar:0.1.0]
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823) ~[stormjar.jar:0.1.0]
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475) ~[stormjar.jar:0.1.0]
>
> So it seems that the hdfs-bolt still has an instance of FSDataOutputStream
> which points to one of those files, but as soon as it tries to write to or
> rotate it, this exception occurs. I also had a look at the hdfs-bolt
> implementation to find the exact handling of such problems
> (https://github.com/ptgoetz/storm-hdfs):
>
> src/main/java/org/apache/storm/hdfs/bolt/HdfsBolt.java:89-118
>
> @Override
> public void execute(Tuple tuple) {
>     try {
>         [... write and/or rotate ...]
>     } catch (IOException e) {
>         LOG.warn("write/sync failed.", e);
>         this.collector.fail(tuple);
>     }
> }
>
> This handling will just fail the tuple but keep the corrupt
> FSDataOutputStream instance. Therefore, those hdfs-bolt instances will
> always fail for every tuple. Of course, this does not result in data loss,
> because the tuple gets reprocessed and might be handled by a working
> instance, but it still causes some trouble :-).
>
> Since the exception is not propagated, we cannot handle this issue in our
> own implementation. It might be a solution to adjust the exception handling
> within the hdfs-bolt to renew the FSDataOutputStream instance in case of an
> IOException - and still fail the tuple, of course. This might be useful for
> other cases and users as well.
>
> The question now is whether some of you discovered a similar problem and
> whether our solution makes sense.
>
> Thanks a lot and best wishes
> Volker
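For context, the kind of setup described in the quoted mail (an HdfsBolt with a TimedRotationPolicy that moves finished files to another HDFS location on rotation) is typically wired up roughly as in the following sketch. This is based on the storm-hdfs README, not on Volker's actual topology; the paths, delimiter, sync count and namenode URL are made-up placeholders.

    // Sketch of a storm-hdfs (0.9.x) HdfsBolt configuration; classes come from
    // org.apache.storm.hdfs.bolt.*, org.apache.storm.hdfs.bolt.format.*,
    // org.apache.storm.hdfs.bolt.rotation.*, org.apache.storm.hdfs.bolt.sync.*
    // and org.apache.storm.hdfs.common.rotation.*.
    SyncPolicy syncPolicy = new CountSyncPolicy(1000);

    // Rotate every 180 seconds, as mentioned in the mail.
    FileRotationPolicy rotationPolicy =
            new TimedRotationPolicy(180.0f, TimedRotationPolicy.TimeUnit.SECONDS);

    FileNameFormat fileNameFormat = new DefaultFileNameFormat()
            .withPath("/tmp/storm-events/valid/collecting")   // the "writing" location
            .withPrefix("events_")
            .withExtension(".txt");

    RecordFormat recordFormat = new DelimitedRecordFormat().withFieldDelimiter("\t");

    HdfsBolt hdfsBolt = new HdfsBolt()
            .withFsUrl("hdfs://namenode:8020")                // placeholder
            .withFileNameFormat(fileNameFormat)
            .withRecordFormat(recordFormat)
            .withSyncPolicy(syncPolicy)
            .withRotationPolicy(rotationPolicy)
            // Move finished files out of the "collecting" directory on rotation.
            .addRotationAction(new MoveFileAction().toDestination("/tmp/storm-events/valid/rotated/"));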
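Regarding the proposed fix: a minimal sketch of what the adjusted exception handling in HdfsBolt.execute() could look like is below. The renewOutputStream() helper is hypothetical (it does not exist in storm-hdfs), and the field names (out, fs, fileNameFormat, rotation) are assumptions about the bolt's internals. The point is only that, on IOException, the tuple is still failed, but the possibly corrupt FSDataOutputStream is closed and replaced, so the bolt instance can recover instead of failing every subsequent tuple.

    @Override
    public void execute(Tuple tuple) {
        try {
            // ... existing write / sync / rotate logic, unchanged ...
        } catch (IOException e) {
            LOG.warn("write/sync failed.", e);
            this.collector.fail(tuple);   // keep failing the tuple, as today
            renewOutputStream();          // but also throw away the broken stream
        }
    }

    // Hypothetical helper: close the possibly corrupt FSDataOutputStream and
    // start a fresh output file, similar to what the bolt already does when a
    // rotation is triggered.
    private void renewOutputStream() {
        try {
            if (this.out != null) {
                this.out.close();
            }
        } catch (IOException closeEx) {
            LOG.warn("Closing the broken output stream failed, ignoring.", closeEx);
        }
        try {
            Path newFile = new Path(this.fileNameFormat.getPath(),
                    this.fileNameFormat.getName(this.rotation, System.currentTimeMillis()));
            this.out = this.fs.create(newFile);
        } catch (IOException createEx) {
            LOG.error("Could not open a replacement output file.", createEx);
        }
    }

Since the tuple is still failed, Storm's replay guarantees are unchanged; the only difference is that the worker no longer keeps hitting the same broken stream on every retry.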