liang yu created FLINK-36112:
--------------------------------

             Summary: Add Support for CreateFlag.NO_LOCAL_WRITE in FLINK on 
YARN's File Creation to Manage Disk Space and Network Load in Labeled YARN Nodes
                 Key: FLINK-36112
                 URL: https://issues.apache.org/jira/browse/FLINK-36112
             Project: Flink
          Issue Type: Improvement
            Reporter: liang yu


{*}Description{*}: I am currently using Apache Flink to write files into 
Hadoop. The Flink application runs on a labeled YARN queue. During operation, 
it has been observed that the local disks on these labeled nodes get filled up 
quickly, and the network load is significantly high. This issue arises because 
Hadoop prioritizes writing files to the local node first, and the number of 
these labeled nodes is quite limited.

 

{*}Problem{*}: The current behavior leads to inefficient disk space utilization 
and high network traffic on these few labeled nodes, which could potentially 
affect the performance and reliability of the application. As shown in the 
picture, the host I circled have a average net_bytes_sent speed 1.2GB/s while 
the others are just 50MB/s, this imbalance in network and disk space nearly 
destroyed the whole cluster. 

 

{*}Implementation{*}: The implementation would involve adding a method of 
FileSystem.class to support the {{CreateFlag.NO_LOCAL_WRITE}}  when we try to 
create a new file through HadoopFileSystem.create() API. What's more, I modify 
the code of FileSink class so that we can choose to enable no_local_write or 
disable this feature. This will provide flexibility to  Flink running in 
labeled Yarn queues to opt for non-local writes when necessary.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to