> , which allows Hadoop to process all blocks in
> parallel.) Note that you’ll need enough storage capacity. I don’t have
> example code, but I’m guessing Google can help.
>
> From: Mapred Learn [mailto:mapred.le...@gmail.com]
> Sent: Monday, 20 June 2011 18:0
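Since no example code was posted for the pass-through idea, here is a minimal
sketch (not from the thread): it just wires up the stock Mapper and Reducer,
which emit their input unchanged, so Hadoop runs one map task per block. The
class name and the input/output paths are placeholders.

    // Pass-through MapReduce job using the stock Mapper and Reducer.
    // TextOutputFormat will also write the byte-offset keys produced by
    // TextInputFormat; a real job would probably drop them.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PassThroughJob {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "pass-through");
        job.setJarByClass(PassThroughJob.class);
        job.setMapperClass(Mapper.class);          // identity map
        job.setReducerClass(Reducer.class);        // identity reduce
        // Consider job.setNumReduceTasks(0) for a map-only copy, or a higher
        // reducer count, so the output is also written in parallel.
        job.setOutputKeyClass(LongWritable.class); // TextInputFormat key type
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /incoming/chunks
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /data/merged
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }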
Evert Lammerts at Sara.nl did something similar to your problem, splitting
a big 2.7 TB file into chunks of 10 GB.
This work was presented at the BioAssist Programmers' Day in January of
this year under the name
"Large-Scale Data Storage and Processing for Scientist in The Netherlands"
http://www
Hi,
On Mon, Jun 20, 2011 at 16:13, Mapred Learn wrote:
> But this file is a gzipped text file. In this case, it will only go to 1
> mapper, unlike the case where it is
> split into 60 1 GB files, which would make the map-red job finish earlier than one
> 60 GB file because it would
> have 60 mappers running in pa
-----Original Message-----
From: Mapred Learn [mailto:mapred.le...@gmail.com]
Sent: Monday, 20 June 2011 16:14
To: mapreduce-user@hadoop.apache.org
Cc: mapreduce-user@hadoop.apache.org
Subject: Re: AW: How to split a big file in HDFS by size
But this file is a gzipped text file. In this case, it will only go to 1 mapper,
unlike the case where it is split into 60 1 GB files, which would make the map-red
job finish earlier than one 60 GB file because it would have 60 mappers running in
parallel. Isn't it so?
Sent from my iPhone
On Jun 20, 2011, at 12
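The gzip behaviour described above can be checked directly: Hadoop will not
split a .gz text file because GzipCodec is not a splittable codec, so the whole
file goes to a single mapper. A minimal sketch, assuming a Hadoop version that
has SplittableCompressionCodec (0.21+); the class name and path are made up.

    // Prints which codec Hadoop picks for a path and whether it is splittable.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class CheckSplittable {
      public static void main(String[] args) {
        CompressionCodec codec = new CompressionCodecFactory(new Configuration())
            .getCodec(new Path(args[0]));   // e.g. /data/big_file.gz
        System.out.println("codec: "
            + (codec == null ? "none" : codec.getClass().getName()));
        System.out.println("splittable: "
            + (codec == null || codec instanceof SplittableCompressionCodec));
      }
    }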
Simple answer: don't. The Hadoop framework will take care of that for you and
split the file. The logical 60 GB file you see in HDFS actually *is* split
into smaller chunks (default size is 64 MB) and physically distributed across
the cluster.
Regards,
Christoph
-----Original Message-----
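To see those blocks for yourself, here is a small sketch (not from the thread)
using the FileSystem API; the path argument is whatever file you uploaded. The
same information is also printed by "hadoop fsck <path> -files -blocks -locations".

    // Prints the block size and block locations of a file already in HDFS,
    // showing that one big logical file is stored as many distributed blocks.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlocks {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path(args[0])); // e.g. /data/big_file.txt
        System.out.println("block size: " + status.getBlockSize() + " bytes");
        for (BlockLocation block :
             fs.getFileBlockLocations(status, 0, status.getLen())) {
          System.out.println("offset " + block.getOffset()
              + " length " + block.getLength()
              + " hosts " + java.util.Arrays.toString(block.getHosts()));
        }
      }
    }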
JJ,
uploading 60 GB single-threaded (i.e. hadoop fs -copyFromLocal etc.) will be
slow. If possible, try to get the files in smaller chunks where they are
created, and upload them in parallel with a simple MapReduce job that only
passes the data through (i.e. uses the standard Mapper and Reducer