Re: file/directory sizes

2008-02-21 Thread Ted Dunning
It is definitely better to combine files into larger ones, if only to make sure that you use sequential reads as much as possible.
On 2/21/08 9:48 PM, "Steve Sapovits" <[EMAIL PROTECTED]> wrote:
> Amar Kamat wrote:
>> File sizes and number of files (assuming that's what you want to tweak)
>>
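To illustrate the combining step suggested above, here is a minimal sketch that groups small files into batches, each of which could then be concatenated into one larger file. It is only a planning helper under stated assumptions: `plan_batches` and the target size are hypothetical names chosen for this example, not part of Hadoop, and sizes are in arbitrary units.

```python
def plan_batches(files, target_size):
    """Group (name, size) pairs, in order, into batches near target_size.

    Hypothetical helper for illustration only. A file larger than
    target_size simply ends up in a batch of its own.
    """
    batches = []
    current, current_size = [], 0
    for name, size in files:
        # Start a new batch if adding this file would exceed the target.
        if current and current_size + size > target_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch would be written out as a single file, so that it can be read sequentially rather than through many separate opens.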

Re: file/directory sizes

2008-02-21 Thread Steve Sapovits
Amar Kamat wrote:
> File sizes and number of files (assuming that's what you want to tweak) are not much of a concern for map-reduce. What ultimately matters is the dfs-block-size and split-size. The basic unit of replication in DFS is the block, while the basic processing unit for map-reduce is th

Re: file/directory sizes

2008-02-21 Thread Amar Kamat
File sizes and number of files (assuming that's what you want to tweak) are not much of a concern for map-reduce. What ultimately matters is the dfs-block-size and split-size. The basic unit of replication in DFS is the block, while the basic processing unit for map-reduce is the split. Other para
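As a rough illustration of how split size relates to the amount of map-side work, the sketch below estimates the number of splits for a set of input files. It rests on a simplified model that is an assumption of this example, not a description of Hadoop's actual input formats: each non-empty file is cut into chunks of at most `split_size` bytes, and a split never spans two files. `estimate_splits` is a hypothetical name, and the 64 MB figure is an assumed split size.

```python
import math

MB = 1024 * 1024

def estimate_splits(file_sizes, split_size):
    """Estimate the split count under a simplified per-file model.

    Assumption: a split never spans files, so every non-empty file
    contributes at least one split. Real input formats apply further
    rules that this sketch ignores.
    """
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)

# The same 1000 MB of data, stored two different ways.
many_small = estimate_splits([1 * MB] * 1000, 64 * MB)
one_large = estimate_splits([1000 * MB], 64 * MB)
```

Under this model the split size sets the number of processing units for large files, while for many files smaller than one split the file count sets it instead.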

file/directory sizes

2008-02-21 Thread Steve Sapovits
I'm looking for any information on "best" type Hadoop configurations, in terms of numbers of files, numbers of files per directory, and file sizes (e.g., are lots of small files more of a problem than fewer larger ones, etc.). Any pointers to documentation or experience feedback appreciated.