Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
I see your point. I have one more question. Suppose you are writing 10 blocks for a file and the namenode fails while you are writing the 10th block. All nine previous blocks are lost, because you were not able to close the file, and therefore the namenode did not persist the information about those nine blocks to the fsimage file. How

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Ted Dunning
First of all, your 10MB files will only be 10MB long and will take one block rather than 5. This is a win already. Secondly, you can have a consolidation process that merges your small files every hour or day into large files. By building the consolidated file in a side directory that is moved
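The consolidation pattern Ted describes (build the merged file in a side directory, then move it into place) can be sketched on a local filesystem; a real HDFS version would go through the Hadoop FileSystem API instead, but the rename-into-place idea is the same. All names here are illustrative, not from the thread.

```python
import os
import shutil
import tempfile

def consolidate(small_files, dest_dir, merged_name):
    """Merge many small files into one large file, building it in a
    side directory first and then renaming it into place so readers
    never see a half-written file. Illustrative sketch only."""
    # Build the merged file in a sibling side directory on the same filesystem.
    side_dir = tempfile.mkdtemp(dir=os.path.dirname(dest_dir) or ".")
    tmp_path = os.path.join(side_dir, merged_name)
    with open(tmp_path, "wb") as out:
        for path in small_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    # Rename is atomic within one filesystem, so the move is all-or-nothing.
    final_path = os.path.join(dest_dir, merged_name)
    os.rename(tmp_path, final_path)
    os.rmdir(side_dir)
    return final_path
```

An hourly or daily job could call this over the previous period's small files and then delete the originals.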

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
> But you could do all this with larger blocks as well. Having a large block > size only says that a block CAN be that long, not that it MUST be that long. No, you cannot. Imagine a streaming server where users send real-time generated data to your server and each file is no more than 100MB. Let

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Ted Dunning
But you could do all this with larger blocks as well. Having a large block size only says that a block CAN be that long, not that it MUST be that long. Also, you said that the average size was ~ 40MB (20 x 2MB blocks). If that is so, then you should be able to radically decrease the number of b

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
fault tolerance. As files are uploaded to our server, we can continuously write the data in small chunks, and if our server fails, we can tolerate the failure by switching the user to another server, where the user can continue to write. Otherwise, we have to wait on the server until we get the whole

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Doug Cutting
Cagdas Gerede wrote: We will have 5 million files, each having 20 blocks of 2MB. With the minimum replication of 3, we would have 300 million blocks. 300 million blocks would store 600TB. At ~10TB/node, this means a 60-node system. Do you think these numbers are suitable for Hadoop DFS? Why are
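The capacity arithmetic in this exchange works out as follows (all figures come from the thread; decimal units, 1TB = 1,000,000MB, as the round numbers in the thread suggest):

```python
# Worked version of the thread's capacity arithmetic.
files = 5_000_000
blocks_per_file = 20
block_mb = 2
replication = 3

block_replicas = files * blocks_per_file * replication  # 300 million replicas
raw_tb = block_replicas * block_mb / 1_000_000          # 600 TB of raw storage
nodes = raw_tb / 10                                     # at ~10TB/node -> 60 nodes

print(block_replicas, raw_tb, nodes)  # 300000000 600.0 60.0
```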

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Ted Dunning
Why did you pick such a small block size? Why not go with the default of 64MB? That would give you only 10 million blocks for your 600TB. I don't see any advantage to the tiny block size. On 5/2/08 1:06 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote: > Thanks Doug for your answers. Our interes
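A quick sanity check of the 64MB suggestion, using the same 600TB of raw storage and the same decimal units as above:

```python
# The same 600TB of raw storage held in default-sized 64MB blocks.
raw_tb = 600
block_mb = 64
blocks = raw_tb * 1_000_000 / block_mb  # ~9.4 million, i.e. the "only 10 million" above

print(blocks)  # 9375000.0
```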

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
Thanks, Doug, for your answers. Our interest is more in the distributed-file-system side than in MapReduce. I must confess that our block size is not as large as most people configure it. I would appreciate your and others' input. Do you think these numbers are suitable? We will have 5

Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Doug Cutting
Cagdas Gerede wrote: In the system I am working on, we have 6 million blocks total, the namenode heap size is about 600 MB, and it takes about 5 minutes for the namenode to leave safemode. How big are your files? Are they several blocks long on average? Hadoop is not designed for small files, b

Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
In the system I am working on, we have 6 million blocks total, the namenode heap size is about 600 MB, and it takes about 5 minutes for the namenode to leave safemode. I am trying to estimate what the heap size would be if we had 100-150 million blocks, and how much time it would take for the namenod
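A naive linear extrapolation from the figures reported in this message (600MB of heap for 6 million blocks) looks like this. Real namenode memory use also depends on the number of files and directories, so this is only a rough lower bound, not a sizing formula:

```python
# Linear heap extrapolation from the message's measured figures.
measured_heap_mb = 600
measured_blocks = 6_000_000
bytes_per_block = measured_heap_mb * 1024 * 1024 / measured_blocks  # ~105 bytes

def estimated_heap_gb(blocks):
    # Assumes heap scales linearly with block count (a simplification).
    return blocks * bytes_per_block / 1024**3

for target in (100_000_000, 150_000_000):
    print(f"{target:,} blocks -> ~{estimated_heap_gb(target):.1f} GB heap")
```

By this estimate, 100-150 million blocks would need roughly 10-15GB of namenode heap, which is why the thread keeps pushing toward larger blocks.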