Hi guys,
I've created a ticket for this issue in Jira and proposed a possible solution,
just to continue our discussion and develop a plan for how to fix it.
https://issues.apache.org/jira/browse/FLINK-10203
Cheers,
Artsem
On Tue, 21 Aug 2018 at 16:59, Artsem Semianenka wrote:
Thanks for the reply, Kostas.
But as long as there are distributions like Cloudera, whose latest version (5.15)
is based on Hadoop 2.6, I and many other Cloudera users are obliged to use an
older HDFS version. Moreover, I read a discussion on the Cloudera forum about
moving to a more recent version of Hadoop, and Cloudera
Hi Artsem,
Till is correct in that getting rid of the “valid-length” file was a design
decision for the new StreamingFileSink from the beginning. The motivation was
that users reported it was very cumbersome to use.
In general, when the BucketingSink gets deprecated, I
Thanks for the reply, Till!
By the way, if Flink is going to keep compatibility with Hadoop 2.6, I don't
see another way to achieve it. As I mentioned before, one of the popular
distributions, Cloudera, is still based on Hadoop 2.6, and it would be very
sad if Flink dropped support for it. I really want to help the Flink community
Hi Artsem,
if I recall correctly, we explicitly decided not to support the valid-length
files with the new StreamingFileSink because they are really hard for users to
handle. I've pulled Klou into this conversation; he is more knowledgeable and
can give you a bit more advice.
I have an idea: create a new version of the HadoopRecoverableFsDataOutputStream
class (for example, named LegacyHadoopRecoverableFsDataOutputStream :) ) which
would work with valid-length files without invoking truncate, and modify the
check in HadoopRecoverableWriter to use
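
A rough sketch of the idea (everything below is hypothetical, not an actual
patch; the marker-file convention mimics what the old BucketingSink does):

// Sketch only: fall back to BucketingSink-style behaviour on Hadoop < 2.7,
// where FileSystem.truncate() is not available. On recovery, instead of
// truncating the part file to the last checkpointed offset, publish a
// marker file telling readers how many bytes of the part file are valid.
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class LegacyHadoopRecoverableFsDataOutputStream {

    private final FileSystem fs;
    private final Path partFile;

    LegacyHadoopRecoverableFsDataOutputStream(FileSystem fs, Path partFile) {
        this.fs = fs;
        this.partFile = partFile;
    }

    /** Called on recovery instead of FileSystem.truncate(). */
    void publishValidLength(long validLength) throws IOException {
        // Naming follows the BucketingSink default: "_<name>.valid-length".
        Path marker = new Path(partFile.getParent(),
                "_" + partFile.getName() + ".valid-length");
        try (FSDataOutputStream out = fs.create(marker, true /* overwrite */)) {
            out.write(String.valueOf(validLength).getBytes(StandardCharsets.UTF_8));
        }
    }
}

The version check in HadoopRecoverableWriter, which as far as I can see
currently rejects Hadoop versions below 2.7 outright, would then dispatch to
this legacy stream instead of throwing.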
Hi guys!
I have a question regarding the new StreamingFileSink (introduced in version
1.6). We use this sink to write data in Parquet format, but I ran into an
issue when trying to run the job on a YARN cluster and save the results to
HDFS. In our case we use the latest Cloudera distribution (CDH 5.15) and it
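
For context, the sink is set up roughly like this (a minimal sketch; MyEvent,
the path, and the checkpoint interval are placeholders for our actual job):

import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class ParquetSinkJob {

    /** Stand-in for our real event POJO. */
    public static class MyEvent {
        public String id;
        public long timestamp;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Bulk formats like Parquet roll part files on each checkpoint.
        env.enableCheckpointing(60_000);

        DataStream<MyEvent> events = env.fromElements(new MyEvent());

        // Writing to HDFS goes through HadoopRecoverableWriter, which is
        // where the truncate() requirement comes from.
        StreamingFileSink<MyEvent> sink = StreamingFileSink
                .forBulkFormat(
                        new Path("hdfs:///tmp/events"),
                        ParquetAvroWriters.forReflectRecord(MyEvent.class))
                .build();

        events.addSink(sink);
        env.execute("parquet-sink-example");
    }
}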