We reverted to BucketingSink and that works as expected. In conclusion, RollingFileSink needs Hadoop 2.7 or newer for its recoverable writer on the Hadoop file system.
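For comparison, a minimal BucketingSink setup along these lines might look like the sketch below; the HDFS path, bucket format, and element type are illustrative assumptions, not details from this thread.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class BucketingSinkSketch {

    // BucketingSink recovers via truncate when available and otherwise falls
    // back to writing a ".valid-length" marker, so it also works on Hadoop 2.6.
    public static void attach(DataStream<String> events) {
        BucketingSink<String> sink =
                new BucketingSink<>("hdfs://namenode:8020/tmp/kafka-to-hdfs"); // illustrative path
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH")); // hourly buckets
        sink.setWriter(new StringWriter<>());                       // newline-delimited text
        sink.setBatchSize(128L * 1024 * 1024);                      // roll part files at ~128 MB
        events.addSink(sink);
    }
}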
On Sat, Feb 23, 2019 at 2:30 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> Any one ? I am sure there are hadoop 2.6 integrations with 1.7.1 OR I am
> overlooking something...
>
> On Fri, Feb 15, 2019 at 2:44 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
>> Not sure, but it seems this
>> https://issues.apache.org/jira/browse/FLINK-10203 may be a connected issue.
>>
>> On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>>
>>> That log does not appear. It looks like we have a chicken-and-egg issue.
>>>
>>> 2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 10.246.221.10:50010
>>> 2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient - SASL client skipping handshake in unsecured configuration for addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.use.legacy.blockreader.local = false
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.read.shortcircuit = false
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.domain.socket.data.traffic = false
>>> 2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.domain.socket.path =
>>> 2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils - multipleLinearRandomRetry = null
>>> 2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade
>>> 2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
>>> 2019-02-15 16:49:15,080 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 initializing its state (max part counter=58).
>>> 2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437
>>> 2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56
>>> 2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56
>>> 2019-02-15 16:49:15,196 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.
>>>
>>> java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86
>>>     at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)
>>>
>>> I do see
>>>
>>> 2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current Hadoop/Kerberos user: root
>>> 2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
>>> 2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap size: 1204 MiBytes
>>> 2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME: /docker-java-home
>>> 2019-02-15 16:47:33,585 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Hadoop version: 2.7.5
>>>
>>> which is to be expected, given that we are running the Hadoop 2.7 build of Flink 1.7.1.
>>>
>>> Does it make sense to go with a Hadoop-free build and inject the required jar files? Has that been done by anyone?
>>>
>>> On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <myas...@live.com> wrote:
>>>
>>>> Hi
>>>>
>>>> When 'RollingSink' tries to initialize state, it first checks whether the current file system supports the truncate method. If the file system does not support it, it uses a work-around, which means you should not hit this problem. Otherwise 'RollingSink' found the 'truncate' method via reflection even though the file system does not actually support it. You could enable DEBUG level logging to see whether the log below is printed:
>>>> Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.
>>>>
>>>> However, from your second email, the more serious problem is using 'Buckets' with Hadoop 2.6. As far as I know, the `RecoverableWriter` within 'Buckets' only supports Hadoop 2.7+; I'm not sure whether a work-around exists.
>>>>
>>>> Best
>>>> Yun Tang
>>>> ------------------------------
>>>> *From:* Vishal Santoshi <vishal.santo...@gmail.com>
>>>> *Sent:* Friday, February 15, 2019 3:43
>>>> *To:* user
>>>> *Subject:* Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
>>>>
>>>> And yes, we cannot work with RollingFileSink on the Hadoop 2.6 release of 1.7.1 because of this:
>>>>
>>>> java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
>>>>     at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
>>>>     at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
>>>>     at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
>>>>     at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
>>>>
>>>> Any work around?
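For context on the trace above: the sink the thread calls "RollingFileSink" is the 1.7 StreamingFileSink, whose Buckets class requests a RecoverableWriter from the Hadoop file system when it initializes. A rough sketch of how such a sink is typically built follows; the output path, bucket format, and element type are illustrative assumptions, not taken from the thread.

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

public class StreamingFileSinkSketch {

    // The row-format builder creates Buckets, which asks the Hadoop FileSystem
    // for a RecoverableWriter -- the call that throws the
    // UnsupportedOperationException above on Hadoop versions older than 2.7.
    public static void attach(DataStream<String> events) {
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs://namenode:8020/tmp/kafka-to-hdfs"), // illustrative path
                              new SimpleStringEncoder<String>("UTF-8"))
                .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd--HH")) // hourly buckets
                .build();
        events.addSink(sink);
    }
}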
>>>>
>>>> On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>>>>
>>>> The job uses a RollingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,
>>>>
>>>> * get the job running
>>>> * kill the pod.
>>>>
>>>> The k8s deployment relaunches the pod but fails with
>>>>
>>>> java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987
>>>>
>>>> Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>>>>
>>>> The file does exist. We work with hadoop 2.6, which does not have truncate. The previous version would see that "truncate" was not supported, drop a .valid-length file for the .inprogress file, and rename it to a valid part file.
>>>>
>>>> Is this a known issue?
>>>>
>>>> Regards.
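As a simplified illustration of the capability check Yun Tang describes, the legacy sink's behavior amounts to probing for FileSystem#truncate via reflection and falling back to the '.valid-length' marker when the method is missing. This is a sketch, not Flink's actual code. Note that the probe only sees the client jars: with Hadoop 2.7.5 client libraries on the classpath but a 2.6 cluster, the method is found, yet the NameNode still rejects the call with "Unknown method truncate", which matches the behavior reported above.

import java.lang.reflect.Method;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateProbe {

    // Reflectively look up FileSystem#truncate(Path, long), which exists only
    // in Hadoop 2.7+. A null result is the signal to fall back to writing a
    // ".valid-length" file instead of truncating the in-progress part file.
    public static Method findTruncate() {
        try {
            return FileSystem.class.getMethod("truncate", Path.class, long.class);
        } catch (NoSuchMethodException e) {
            return null; // Hadoop < 2.7 client: no truncate available
        }
    }
}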