It is definitely better to combine files into larger ones, if only to make
sure that you use sequential reads as much as possible.
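For instance, you can pack a directory of small files into a single SequenceFile before running jobs over them. Below is a rough sketch, assuming a Hadoop release that has FileSystem.listStatus and the SequenceFile.createWriter API; the /user/steve/... paths are placeholders I made up, not anything from your setup:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path in = new Path("/user/steve/small-input");  // placeholder
    Path out = new Path("/user/steve/packed.seq");  // placeholder

    // key = original file name, value = raw file contents
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, Text.class, BytesWritable.class);
    try {
      for (FileStatus stat : fs.listStatus(in)) {
        if (stat.isDir()) continue;  // only pack plain files
        byte[] buf = new byte[(int) stat.getLen()];
        FSDataInputStream is = fs.open(stat.getPath());
        try {
          is.readFully(0, buf);  // positioned read of the whole file
        } finally {
          is.close();
        }
        writer.append(new Text(stat.getPath().getName()),
                      new BytesWritable(buf));
      }
    } finally {
      writer.close();
    }
  }
}

A reader of the packed file then scans one large file sequentially instead of opening thousands of small ones.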
On 2/21/08 9:48 PM, "Steve Sapovits" <[EMAIL PROTECTED]> wrote:
> Amar Kamat wrote:
>> File sizes and number of files (assuming that's what you want to tweak)
>> is not much of a concern for map-reduce. What ultimately matters is the
>> dfs-block-size and split-size. The basic unit of replication in DFS is
>> the block, while the basic processing unit for map-reduce is the split.
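To make the block/split distinction concrete, here is a minimal sketch using the old mapred API. The SplitTuning class, the paths, and the 128 MB figure are illustrative assumptions, not recommendations; note that block size is fixed when a file is created, so dfs.block.size only takes effect for files the job writes:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class SplitTuning {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(SplitTuning.class);

    // Block size is decided at file-creation time, so this setting
    // applies to files this job writes into DFS.
    job.setLong("dfs.block.size", 128L * 1024 * 1024);

    // One map task runs per split; a larger minimum split size gives
    // each map a longer sequential read over its share of the input.
    job.setLong("mapred.min.split.size", 128L * 1024 * 1024);

    // Read the packed SequenceFile from the earlier sketch and run the
    // default identity mapper/reducer over it.
    job.setInputFormat(SequenceFileInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(BytesWritable.class);
    FileInputFormat.setInputPaths(job, new Path("/user/steve/packed.seq"));
    FileOutputFormat.setOutputPath(job, new Path("/user/steve/out"));
    JobClient.runJob(job);
  }
}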
I'm looking for any information on "best" type Hadoop configurations, in terms of numbers of files, numbers of files per directory, and file sizes (e.g., are lots of small files more of a problem than fewer larger ones, etc.).
Any pointers to documentation or experience feedback appreciated.