I see your point.
I have one more question. If you are writing 10 blocks for a file and, let's
say, the namenode fails while you write the 10th block, all previous 9 blocks
are lost, because you were not able to close the file and therefore the
namenode did not persist the information about the 9 blocks to the fsimage
file.
First of all, your 10MB files will only be 10MB long and will take one block
rather than 5. This is a win already.
Secondly, you can have a consolidation process that merges your small files
every hour or day into large files. By building the consolidated file in a
side directory that is then moved into place, you avoid exposing a partially
written file.
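The consolidation step described above can be sketched as follows. This is a minimal illustration using the local filesystem in place of HDFS; the `consolidate` function and its paths are hypothetical, and a real job would use the Hadoop APIs (or a MapReduce job) instead:

```python
import os
import shutil
import tempfile

def consolidate(small_dir, final_path):
    """Merge every file in small_dir into one large file at final_path.

    The output is built in a side (temporary) directory first and then
    moved into place with a single rename, so a reader never observes a
    half-written result.
    """
    side_dir = tempfile.mkdtemp(dir=os.path.dirname(final_path))
    tmp_path = os.path.join(side_dir, "consolidated.tmp")
    with open(tmp_path, "wb") as out:
        for name in sorted(os.listdir(small_dir)):
            with open(os.path.join(small_dir, name), "rb") as src:
                shutil.copyfileobj(src, out)
    os.replace(tmp_path, final_path)  # atomic rename on POSIX
    os.rmdir(side_dir)
```

The point of the side directory is that the final rename is atomic: readers see either the old state or the fully consolidated file, never a partial one.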
> But you could do all this with larger blocks as well. Having a large
> block size only says that a block CAN be that long, not that it MUST be
> that long.
No, you cannot.
Imagine a streaming server where users send real-time generated data to your
server, and each file is no more than 100MB.
But you could do all this with larger blocks as well. Having a large block
size only says that a block CAN be that long, not that it MUST be that long.
Also, you said that the average size was ~40MB (20 x 2MB blocks). If that
is so, then you should be able to radically decrease the number of blocks.
fault tolerance. As files are uploaded into our server, we can continuously
write the data in small chunks, and if our server fails we can tolerate this
failure by switching the user to another server, where the user can continue
to write. Otherwise we have to wait on the server until we get the whole
file.
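That write path can be sketched with in-memory `ChunkServer` objects standing in for real storage servers. All names here are hypothetical illustrations, not an actual HDFS API: the client streams fixed-size chunks, and when a write fails it resumes on the next server, so only the chunk in flight has to be retried.

```python
from io import BytesIO

class ChunkServer:
    """Stand-in for a remote storage server (hypothetical, for illustration)."""
    def __init__(self, fail_after=None):
        self.buf = BytesIO()
        self.fail_after = fail_after  # simulate a crash after N successful writes
        self.writes = 0

    def write_chunk(self, data):
        if self.fail_after is not None and self.writes >= self.fail_after:
            raise IOError("server down")
        self.buf.write(data)
        self.writes += 1

def upload(stream, servers, chunk_size=4):
    """Write `stream` in small chunks, failing over to the next server on error."""
    servers = iter(servers)
    current = next(servers)
    written = 0
    while written < len(stream):
        chunk = stream[written:written + chunk_size]
        try:
            current.write_chunk(chunk)
            written += len(chunk)     # this chunk is durable; move on
        except IOError:
            current = next(servers)   # switch servers; retry the same chunk
```

Because each chunk is durable as soon as it is written, a server failure costs at most one chunk of rework rather than the whole upload.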
Cagdas Gerede wrote:
We will have 5 million files each having 20 blocks of 2MB. With the minimum
replication of 3, we would have 300 million blocks.
300 million blocks would store 600TB. At ~10TB/node, this means a 60 node
system.
Do you think these numbers are suitable for Hadoop DFS?
Why did you pick such a small block size?
Why not go with the default of 64MB?
That would give you only 10 million blocks for your 600TB.
I don't see any advantage to the tiny block size.
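The arithmetic in this exchange can be checked directly. A quick back-of-envelope, using decimal MB/TB as the thread does:

```python
files = 5_000_000
blocks_per_file = 20   # 20 x 2 MB = 40 MB average file
replication = 3
block_mb = 2

replica_blocks = files * blocks_per_file * replication
total_tb = replica_blocks * block_mb / 1_000_000   # decimal TB
nodes = total_tb / 10                              # ~10 TB per node

print(replica_blocks)  # 300,000,000 block replicas
print(total_tb)        # 600 TB including replication
print(nodes)           # a 60-node system

# With the 64 MB default, the same 600 TB needs far fewer blocks:
blocks_64 = total_tb * 1_000_000 / 64
print(blocks_64)       # ~9.4 million, i.e. "only 10 million"
```

The factor-of-32 reduction in block count (2 MB to 64 MB) is what drives the drop from 300 million to roughly 10 million blocks.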
On 5/2/08 1:06 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote:
> Thanks Doug for your answers. Our interest is more in the distributed file
> system part than in MapReduce.
Thanks Doug for your answers. Our interest is more in the distributed file
system part than in MapReduce.
I must confess that our block size is not as large as what most people
configure. I would appreciate your and others' input.
Do you think these numbers are suitable?
We will have 5 million files, each having 20 blocks of 2MB. With the minimum
replication of 3, we would have 300 million blocks.
Cagdas Gerede wrote:
In the system I am working, we have 6 million blocks total and the namenode
heap size is about 600 MB and it takes about 5 minutes for namenode to leave
the safemode.
How big are your files? Are they several blocks long on average? Hadoop
is not designed for small files.
In the system I am working, we have 6 million blocks total and the namenode
heap size is about 600 MB and it takes about 5 minutes for namenode to leave
the safemode.
I am trying to estimate what the heap size would be if we had 100 - 150
million blocks, and how long it would take for the namenode to leave the
safemode.
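A rough linear extrapolation from those measurements can be sketched as follows. It assumes heap grows proportionally with block count, which ignores per-file and per-directory objects on the namenode, so treat it as a lower-bound estimate:

```python
measured_blocks = 6_000_000
measured_heap_mb = 600

# ~100 bytes of namenode heap per block at the measured point
bytes_per_block = measured_heap_mb * 1_000_000 / measured_blocks

for target_blocks in (100_000_000, 150_000_000):
    est_gb = target_blocks * bytes_per_block / 1_000_000_000
    print("%d blocks -> ~%.0f GB heap" % (target_blocks, est_gb))
```

Safe-mode time is harder to extrapolate this way, since it depends on how quickly the datanodes can send their block reports, not only on the block count.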