Hi all,
Have a good day! I used the code below to append data to an HDFS file from a local file. The local file is 85 MB, and the Hadoop cluster (CDH 5.4.2, HDFS 2.6, replication factor 3) has about 140 GB free. Inside a while loop I do:

    FSDataOutputStream out = fs.append(outFile);
    out.write(buffer, 0, bytesRead);
    out.close();

so each iteration appends 1024 bytes from the local file to the HDFS file. This loop runs my cluster out of storage before the program can finish. Here's the full code:

    import java.io.*;
    import java.net.URI;
    import java.net.URISyntaxException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    public class writeflushexisted {
        public static void main(String[] argv) throws IOException, URISyntaxException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(new URI("hdfs://192.168.94.185:8020"), conf);
            Path inFile = new Path("testdata.txt");
            Path outFile = new Path("/myhdfs/testdata.txt");
            File localFile = new File(inFile.toString());

            // Read from the local file and append to the existing HDFS file
            FileInputStream in = new FileInputStream(localFile);
            int i = 0;
            byte buffer[] = new byte[1024];
            try {
                int bytesRead = 0;
                while ((bytesRead = in.read(buffer)) > 0) {
                    // Open, write one 1024-byte chunk, and close the append
                    // stream on every iteration
                    FSDataOutputStream out = fs.append(outFile);
                    out.write(buffer, 0, bytesRead);
                    out.close();
                    i++;
                }
            } catch (IOException e) {
                System.out.println("Error while copying file: " + e.getMessage());
            } finally {
                in.close();
                System.out.println("Number of loop:" + i);
            }
        }
    }

Here's the information before I ran this code:

---------------------------------------------------------------
[hdfs@chdhost125 current]$ hadoop fs -df -h
Filesystem                            Size     Used    Available  Use%
hdfs://chdhost185.vitaldev.com:8020   266.4 G  38.2 G  139.8 G    14%
---------------------------------------------------------------
[hdfs@chdhost125 lib]$ hadoop fs -du -h /
67.7 M   1.3 G    /hbase
0        0        /myhdfs
0        0        /solr
1.8 G    5.4 G    /tmp
10.6 G   31.4 G   /user

And here's the information while the code was running:

---------------------------------------------------------------
Filesystem                            Size     Used     Available  Use%
hdfs://chdhost185.vitaldev.com:8020   266.4 G  170.2 G  95.9 G     64%
---------------------------------------------------------------
[hdfs@chdhost125 lib]$ hadoop fs -du -h /
67.7 M   1.3 G    /hbase
32.9 M   384 M    /myhdfs
0        0        /solr
1.8 G    5.4 G    /tmp
10.6 G   31.4 G   /user

After 10 minutes the cluster is out of storage and my program throws an exception with this error:

    Error while copying file: Failed to replace a bad datanode on the existing
    pipeline due to no more good datanodes being available to try.
    (Nodes: current=[192.168.94.185:50010, 192.168.94.27:50010],
    original=[192.168.94.185:50010, 192.168.94.27:50010]).
    The current failed datanode replacement policy is DEFAULT, and a client may
    configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy'
    in its configuration.

So why does appending a file in small chunks (1024 bytes) run my cluster out of space? The local file is only 85 MB, but HDFS consumes ~140 GB while appending it. Is there a problem with my code? I know that appending in small pieces is not recommended, but I would like to understand why HDFS consumes so much space.

Thanks and Regards,
Quan Nguyen
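
P.S. For comparison, here is the variant I would normally write: open the append stream once, write the whole file, and close it once at the end. This is just a minimal sketch reusing the same address and paths as above (the class name WriteOnceAppend is mine); I haven't verified whether it avoids the space blow-up.

    import java.io.*;
    import java.net.URI;
    import java.net.URISyntaxException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    public class WriteOnceAppend {
        public static void main(String[] argv) throws IOException, URISyntaxException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(new URI("hdfs://192.168.94.185:8020"), conf);
            FileInputStream in = new FileInputStream("testdata.txt");
            // Open the append stream once instead of once per 1024-byte chunk
            FSDataOutputStream out = fs.append(new Path("/myhdfs/testdata.txt"));
            try {
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = in.read(buffer)) > 0) {
                    out.write(buffer, 0, bytesRead); // no per-chunk open/close
                }
            } finally {
                out.close(); // closed once, after the whole file is appended
                in.close();
            }
        }
    }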
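
P.P.S. Regarding the replacement policy named in the error: if I read the message correctly, a client can change it in its own Configuration. A sketch of what I think that would look like (the 'enable' property and the NEVER value are my assumptions; I haven't tested these settings on my cluster):

    // Client-side settings mentioned in the error message; values are for
    // illustration only. On a 3-node cluster there may be no spare datanode to
    // swap into the pipeline, so NEVER keeps the original pipeline instead of
    // failing the write.
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");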