I have a similar situation: lots of very small files.
I have never tried HBase (I want to), but you can also group them
and write, say, 20-30 of them into one file, so that every original file becomes a key in that
big file.
There are methods in the API with which you can write an object as a file into HDFS
and read it back again.
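Something along these lines should work for the packing idea (untested sketch; I am assuming SequenceFile here, and the paths, the grouping and the Text/BytesWritable types are just examples):

import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/user/me/packed.seq");          // example output path

    // one big SequenceFile; each small file becomes one key/value record
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, Text.class, BytesWritable.class);
    try {
      for (File f : new File("/local/small-files").listFiles()) {   // example input dir
        byte[] data = Files.readAllBytes(f.toPath());
        writer.append(new Text(f.getName()), new BytesWritable(data));
      }
    } finally {
      writer.close();
    }
  }
}

Reading them back is the mirror image with SequenceFile.Reader, iterating over the key/value pairs.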
The way I typically address this is to write a zip file using the zip
utilities, usually for output.
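For the zip variant, a rough sketch (untested; the paths are made up) is to stream the archive straight into HDFS:

import java.io.File;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ZipToHdfs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream hdfsOut = fs.create(new Path("/user/me/output.zip")); // example path
    ZipOutputStream zip = new ZipOutputStream(hdfsOut);
    try {
      for (File f : new File("/local/results").listFiles()) {       // example local dir
        zip.putNextEntry(new ZipEntry(f.getName()));
        zip.write(Files.readAllBytes(f.toPath()));
        zip.closeEntry();
      }
    } finally {
      zip.close();   // also closes the underlying HDFS stream
    }
  }
}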
HDFS is not optimized for low latency, but for high throughput on bulk
operations.
2009/5/7 Jeff Hammerbacher:
Hey,
You can read more about why small files are difficult for HDFS at
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.
Regards,
Jeff
2009/5/7 Piotr Praczyk:
If you want to use many small files, they probably all have the same
purpose and structure.
Why not use HBase instead of raw HDFS? Many small files would be packed
together and the problem would disappear.
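For illustration, storing and fetching one small file as a single cell could look roughly like this (untested sketch; the table name, column family and the use of a newer HBase client Connection API are my own choices):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SmallFilesInHBase {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("small_files"))) {
      // store one small file as a single cell, keyed by its name
      byte[] content = "file contents".getBytes("UTF-8");
      Put put = new Put(Bytes.toBytes("part-00001.txt"));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("data"), content);
      table.put(put);

      // read it back by key
      Result r = table.get(new Get(Bytes.toBytes("part-00001.txt")));
      byte[] back = r.getValue(Bytes.toBytes("f"), Bytes.toBytes("data"));
    }
  }
}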
cheers
Piotr
2009/5/7 Jonathan Cao:
There are at least two design choices in Hadoop that have implications for
your scenario.
1. All the HDFS metadata is stored in namenode memory; that memory size
is one limitation on how many "small" files you can have.
2. The efficiency of the map/reduce paradigm dictates that each mapper/reducer
should process a reasonable amount of data; with many small files you get a large
number of short-lived tasks, and the per-task overhead dominates.
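As a rough illustration of point 1 (using the ~150 bytes per namenode object figure from the Cloudera post linked above, and assuming one block per file): 10 million small files means roughly 20 million namenode objects (a file entry plus a block each), which is about 3 GB of namenode heap spent on metadata alone, before any actual data is stored.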
Please try -D dfs.block.size=4096000
The specification must be in bytes.
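For example, if the job driver goes through ToolRunner/GenericOptionsParser, it can be passed on the command line like this (jar name, class and paths are placeholders):

hadoop jar myjob.jar MyJobDriver -D dfs.block.size=4096000 /input /output

4096000 bytes is roughly 4 MB, so files written with that block size will later yield roughly one map task per 4 MB of data instead of one per 64 MB (the default block size).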
On Tue, May 5, 2009 at 4:47 AM, Christian Ulrik Søttrup wrote:
> Hi all,
>
> I have a job that creates very big local files, so I need to split it across as
> many mappers as possible. Now the DFS block size I'm
> using ...