Re: AW: How to split a big file in HDFS by size

2011-06-21 Thread Niels Basjes
Hi, On Tue, Jun 21, 2011 at 16:14, Mapred Learn mapred.le...@gmail.com wrote: The problem is that when one text file lands on HDFS as a 60 GB file, a single mapper takes more than an hour to convert it to a sequence file and finally fails. I was thinking about how to split it from the client box before uploading to …
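The client-side split mentioned above can be sketched with standard tools. This is a minimal, illustrative sketch: the file name and sizes are stand-ins, and a real run would use the 60 GB input with a piece size around 1 GB before `hadoop fs -put`.

```shell
# Sketch: split a large text file into fixed-size pieces on the client
# box before uploading. Names and sizes here are illustrative only.
seq 1 100000 > bigfile.txt                  # stand-in for the big input
# --line-bytes keeps whole lines together, so no record is cut in half
split --line-bytes=100000 bigfile.txt chunk_
ls chunk_*                                  # pieces to 'hadoop fs -put'
```

Splitting on line boundaries matters here: a byte-exact split (`split -b`) could cut a record in half, which would corrupt the sequence-file conversion downstream.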

Re: How to split a big file in HDFS by size

2011-06-20 Thread Mapred Learn
Hi Christopher, If I get all 60 GB onto HDFS, can I then split it into 60 1 GB files and run a map-red job on those 60 fixed-length text files? If yes, do you have any idea how to do this? On Sun, Jun 19, 2011 at 11:28 PM, Christoph Schmitz christoph.schm...@1und1.de wrote: JJ, …
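One way to re-split a file that is already in HDFS is to stream it back out through `split`. This is a sketch assuming the standard `hadoop fs` CLI; the HDFS paths are hypothetical. Below, a local file and plain `cat` stand in for the HDFS copy and `hadoop fs -cat`, so the pipeline itself is runnable:

```shell
# Real pipeline would be roughly (paths hypothetical):
#   hadoop fs -cat /user/jj/big.txt | split --line-bytes=1G - part_
#   hadoop fs -put part_* /user/jj/split/
seq 1 50000 > hdfs_copy.txt            # stand-in for the file in HDFS
cat hdfs_copy.txt | split --line-bytes=50000 - part_
ls part_*
```

Note this pulls the whole 60 GB through the client box once, so it is only worth doing if the per-file parallelism (e.g. for unsplittable inputs) outweighs that transfer cost.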

AW: How to split a big file in HDFS by size

2011-06-20 Thread Christoph Schmitz
Simple answer: don't. The Hadoop framework will take care of that for you and split the file. The logical 60 GB file you see in HDFS actually *is* split into smaller chunks (default size is 64 MB) and physically distributed across the cluster. Regards, Christoph -Ursprüngliche …
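For reference, the 64 MB default mentioned above is the HDFS block size, which was configurable in the Hadoop versions of that era via `dfs.block.size` in `hdfs-site.xml` (later renamed `dfs.blocksize`). A config fragment along these lines:

```xml
<!-- hdfs-site.xml: HDFS block size (0.20.x-era property name) -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MB = 64 * 1024 * 1024 bytes -->
</property>
```

Each block is stored and replicated independently, and by default each block becomes one input split, i.e. one map task.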

Re: AW: How to split a big file in HDFS by size

2011-06-20 Thread Niels Basjes
Hi, On Mon, Jun 20, 2011 at 16:13, Mapred Learn mapred.le...@gmail.com wrote: But this file is a gzipped text file. In this case it will only go to one mapper, whereas if it were split into 60 1 GB files the map-red job would finish earlier than with one 60 GB file, as it would have 60 …
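The point above is that gzip is not a splittable codec, so one `.gz` file feeds exactly one mapper regardless of its size. A common workaround (a sketch, with illustrative names and sizes) is to split the plain text first and gzip each piece, so every piece becomes its own input file and can get its own mapper:

```shell
# Split first, then compress each piece; each .gz then maps to one
# mapper, so 60 pieces can run up to 60 mappers in parallel.
seq 1 100000 > big.txt
split --line-bytes=100000 big.txt piece_
gzip piece_*
ls piece_*.gz
```

An alternative is a splittable compression format (e.g. bzip2 in later Hadoop versions, or block-compressed sequence files), which avoids the manual split entirely.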

Re: AW: How to split a big file in HDFS by size

2011-06-20 Thread Marcos Ortiz
Evert Lammerts at Sara.nl did something similar to your problem, splitting a big 2.7 TB file into chunks of 10 GB. This work was presented at the BioAssist Programmers' Day in January of this year under the title Large-Scale Data Storage and Processing for Scientists in The Netherlands …