Re: Support Hadoop 2.6 for StreamingFileSink

2018-08-23 Thread Artsem Semianenka
Hi guys, I've created a Jira ticket for this issue and proposed a possible solution, so that we can continue the discussion and work out a plan to fix it: https://issues.apache.org/jira/browse/FLINK-10203 Cheers, Artsem On Tue, 21 Aug 2018 at 16:59, Artsem Semianenka wrote: > Thanks

Re: Support Hadoop 2.6 for StreamingFileSink

2018-08-21 Thread Artsem Semianenka
Thanks for the reply, Kostas. But as long as there are distributions like Cloudera, whose latest version (5.15) is based on Hadoop 2.6, I and many other Cloudera users are obliged to use an older HDFS version. Moreover, I read a discussion on the Cloudera forum about moving to a newer version of Hadoop, and Cloudera

Re: Support Hadoop 2.6 for StreamingFileSink

2018-08-21 Thread Kostas Kloudas
Hi Artsem, Till is correct in that getting rid of the “valid-length” file was a design decision for the new StreamingFileSink since the beginning. The motivation was that users were reporting that essentially it was very cumbersome to use. In general, when the BucketingSink gets deprecated, I

Re: Support Hadoop 2.6 for StreamingFileSink

2018-08-21 Thread Artsem Semianenka
Thanks for the reply, Till! By the way, if Flink is going to stay compatible with Hadoop 2.6, I don't see another way to achieve it. As I mentioned before, Cloudera, one of the popular distributions, is still based on Hadoop 2.6, and it would be very sad if Flink did not support it. I really want to help the Flink community

Re: Support Hadoop 2.6 for StreamingFileSink

2018-08-21 Thread Till Rohrmann
Hi Artsem, if I recall correctly, we explicitly decided not to support valid-length files with the new StreamingFileSink because they are really hard for users to handle. I've pulled Klou, who is more knowledgeable, into this conversation; he can give you a bit more advice.

Re: Support Hadoop 2.6 for StreamingFileSink

2018-08-20 Thread Artsem Semianenka
My idea is to create a new version of the HadoopRecoverableFsDataOutputStream class (named, for example, LegacyHadoopRecoverableFsDataOutputStream :) ) that works with valid-length files without invoking truncate, and to modify the check in HadoopRecoverableWriter to use
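
Editor's illustration of the valid-length idea discussed in this message: a minimal sketch, not Flink's or Hadoop's actual code. It uses only the local file system via java.nio. The class name, the helper names, and the ".valid-length" sidecar naming are assumptions made for this example. The version check reflects the fact that HDFS truncate is only available from Hadoop 2.7 onwards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ValidLengthSketch {

    // Hypothetical check: HDFS truncate exists only in Hadoop 2.7 and later.
    static boolean supportsTruncate(int major, int minor) {
        return major > 2 || (major == 2 && minor >= 7);
    }

    // Sidecar file that stores the valid length (naming is an assumption).
    static Path markerFor(Path dataFile) {
        return dataFile.resolveSibling(dataFile.getFileName() + ".valid-length");
    }

    // Instead of truncating the data file, record how many bytes are valid.
    static Path writeValidLength(Path dataFile, long validLength) throws IOException {
        Path marker = markerFor(dataFile);
        Files.write(marker, Long.toString(validLength).getBytes(StandardCharsets.UTF_8));
        return marker;
    }

    // A reader must honour the marker and ignore any bytes past the valid length.
    static byte[] readValid(Path dataFile) throws IOException {
        byte[] all = Files.readAllBytes(dataFile);
        Path marker = markerFor(dataFile);
        if (!Files.exists(marker)) {
            return all; // no marker: the whole file is considered valid
        }
        String text = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
        long valid = Long.parseLong(text);
        return Arrays.copyOf(all, (int) Math.min(valid, all.length));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("valid-length-demo");
        Path part = dir.resolve("part-0-0");
        // 12 committed bytes followed by data written after the last checkpoint.
        Files.write(part, "good-recordsGARBAGE".getBytes(StandardCharsets.UTF_8));

        if (!supportsTruncate(2, 6)) {
            // Hadoop 2.6 path: leave the file untouched, write the marker.
            writeValidLength(part, 12);
        }
        System.out.println(new String(readValid(part), StandardCharsets.UTF_8));
    }
}
```

The sketch also shows why this approach is a burden for users, as raised elsewhere in the thread: every consumer of the output has to know about the marker file and apply it when reading.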

Support Hadoop 2.6 for StreamingFileSink

2018-08-20 Thread Artsem Semianenka
Hi guys! I have a question regarding the new StreamingFileSink (introduced in version 1.6). We use this sink to write data in Parquet format, but I ran into an issue when trying to run the job on a YARN cluster and save the result to HDFS. In our case we use the latest Cloudera distribution (CDH 5.15) and it