Hi,
On Tue, Jun 21, 2011 at 16:14, Mapred Learn mapred.le...@gmail.com wrote:
The problem is that when 1 text file goes onto HDFS as a 60 GB file, one mapper takes
more than an hour to convert it to a sequence file and finally fails.
I was thinking about how to split it from the client box before uploading it to HDFS.
Simple answer: don't. The Hadoop framework will take care of that for you and
split the file. The logical 60 GB file you see in HDFS actually *is* split
into smaller chunks (the default block size is 64 MB) and physically distributed across
the cluster.
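
If you want to see this for yourself, here is a minimal sketch (the class name and
the path argument are hypothetical, not anything from this thread) that asks HDFS
for the block layout of a file:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(new Path(args[0]));

    // One BlockLocation per physical chunk of the logically single file.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          b.getOffset(), b.getLength(), Arrays.toString(b.getHosts()));
    }
  }
}

Run it with the path of your 60 GB file as the only argument and you should see one
entry per block, each roughly the configured block size and spread over different
datanodes.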
Regards,
Christoph
-----Original Message-----
Hi,
On Mon, Jun 20, 2011 at 16:13, Mapred Learn mapred.le...@gmail.com wrote:
But this file is a gzipped text file. In this case it will only go to 1
mapper, unlike the case where it is
split into 60 1 GB files, which would make the map-red job finish earlier than one
60 GB file as it would
have 60 mappers running in parallel.
Evert Lammerts at Sara.nl did something similar to your problem, splitting
a big 2.7 TB file into chunks of 10 GB.
This work was presented at the BioAssist Programmers' Day in January of
this year under the title
"Large-Scale Data Storage and Processing for Scientists in the Netherlands".
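
Not the job from this thread, but a rough sketch of the text-to-SequenceFile
conversion being discussed (the class name, paths, and the map-only identity-mapper
setup are assumptions on my part). The point it illustrates: if the input directory
holds 60 separate gzipped parts instead of one 60 GB gzip, each part becomes its own
input split, because a gzip file cannot be split, so the job runs with 60 mappers in
parallel:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TextToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "text-to-seqfile");
    job.setJarByClass(TextToSequenceFile.class);

    // Identity mapper: each (offset, line) pair is written straight to the output.
    job.setMapperClass(Mapper.class);
    job.setNumReduceTasks(0);                    // map-only conversion

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Block-compress the output so the resulting SequenceFiles stay splittable
    // for downstream jobs, unlike the gzipped text input.
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. a dir of 60 .gz parts
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}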