Re: Estimating disk space requirements

2013-01-18 Thread Ted Dunning
Jeff makes some good points here.

On Fri, Jan 18, 2013 at 5:01 PM, Jeffrey Buell wrote:
> I disagree. There are some significant advantages to using "many small
> nodes" instead of "few big nodes". As Ted points out, there are some
> disadvantages as well, so you have to look at the trade-offs…

Re: Estimating disk space requirements

2013-01-18 Thread Jeffrey Buell
…ers, sharing physical hardware with other workloads, etc.

Jeff

- Original Message -
From: "Ted Dunning"
To: user@hadoop.apache.org
Sent: Friday, January 18, 2013 3:36:30 PM
Subject: Re: Estimating disk space requirements

If you make 20 individual small servers, that…

Re: Estimating disk space requirements

2013-01-18 Thread Ted Dunning
If you make 20 individual small servers, that isn't much different from carving 20 virtual nodes out of one server. The only difference would be if the neighbors of the separate VMs use fewer resources.

On Fri, Jan 18, 2013 at 3:34 PM, Panshul Whisper wrote:
> ah now i understand what you mean.
> I will be creating 20 in…

Re: Estimating disk space requirements

2013-01-18 Thread Panshul Whisper
Ah, now I understand what you mean. I will be creating 20 individual servers on the cloud, rather than creating one big server and making several virtual nodes inside it. I will be paying for 20 different nodes, all configured with Hadoop and connected to the cluster. Thanks for the intel :)

On Fri, Jan…

Re: Estimating disk space requirements

2013-01-18 Thread Ted Dunning
It is usually better not to subdivide nodes into virtual nodes. You will generally get better performance from the original node because you only pay for the OS once and because your disk I/O will be scheduled better. If you look at EC2 pricing, however, the spot market often has arbitrage opportunities…

Re: Estimating disk space requirements

2013-01-18 Thread Mohammad Tariq
You can attach a separate disk to your instance (for example, an EBS volume in the case of AWS) where you store only Hadoop-related data, and keep another disk for OS-related data.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Sat, Jan 19, 2013 at 4:00 AM, Panshul Whisper…

Re: Estimating disk space requirements

2013-01-18 Thread Panshul Whisper
Thanks for the reply, Ted. You can find 40 GB disks when you make virtual nodes on a cloud like Rackspace ;-) About the OS partitions, I did not exactly understand what you meant. I have made a server on the cloud, and I just installed and configured Hadoop and HBase in the /usr/local folder. And I am…

Re: Estimating disk space requirements

2013-01-18 Thread Ted Dunning
Where do you find 40 GB disks nowadays? Normally your performance is going to be better with more space, but your network may be your limiting factor for some computations. That could give you some paradoxical scaling. HBase will rarely show this behavior. Keep in mind you also want to allow…

Re: Estimating disk space requirements

2013-01-18 Thread Mohammad Tariq
I have been using AWS for quite some time and I have never faced any issues. Personally speaking, I have found AWS really flexible. You get a great deal of flexibility in choosing services depending upon your requirements.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Fri, Ja…

Re: Estimating disk space requirements

2013-01-18 Thread Panshul Whisper
Thank you for the reply. It would be great if someone could suggest whether it is better to set up my cluster on Rackspace or on Amazon using EC2 servers, keeping in mind that Amazon services have been having a lot of downtime... My main points of concern are performance and availability. My cluster has to be very…

Re: Estimating disk space requirements

2013-01-18 Thread Jean-Marc Spaggiari
It all depends on what you want to do with this data and the power of each single node. There is no one-size-fits-all rule. The more nodes you have, the more CPU power you will have to process the data... But if your 80 GB boxes' CPUs are faster than your 40 GB boxes' CPUs, maybe you should take the 80 GB…

Re: Estimating disk space requirements

2013-01-18 Thread Panshul Whisper
If we look at it with performance in mind, is it better to have 20 nodes with a 40 GB HDD each, or 10 nodes with an 80 GB HDD each? They are connected on a gigabit LAN. Thanks.

On Fri, Jan 18, 2013 at 2:26 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> 20 nodes with 40 GB will do…

Re: Estimating disk space requirements

2013-01-18 Thread Jean-Marc Spaggiari
20 nodes with 40 GB will do the job. After that you will have to consider performance based on your access pattern. But that's another story. JM

2013/1/18, Panshul Whisper:
> Thank you for the replies,
>
> So I take it that I should have at least 800 GB of total free space on
> HDFS (combined…

Re: Estimating disk space requirements

2013-01-18 Thread Panshul Whisper
Thank you for the replies. So I take it that I should have at least 800 GB of total free space on HDFS (combined free space of all the nodes connected to the cluster). So I can connect 20 nodes, each having a 40 GB HDD, to my cluster. Will this be enough for the storage? Please confirm. T…

Re: Estimating disk space requirements

2013-01-18 Thread Mirko Kämpf
Hi, some comments are inline in your message ...

2013/1/18 Panshul Whisper:
> Hello,
>
> I was estimating how much disk space I need for my cluster.
>
> I have 24 million JSON documents, approx. 5 KB each.
> The JSON is to be stored into HBase with some identifying data in columns,
> and I also wa…
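
A back-of-the-envelope sketch of the sizing discussed in this thread, in Python. The 20% HBase key/qualifier/metadata overhead factor is an assumption for illustration only, not a figure from the thread:

    # Rough disk-space estimate for 24 million JSON documents of ~5 KB each,
    # stored in HBase on HDFS with the default replication factor of 3.
    DOCS = 24_000_000       # number of JSON documents
    DOC_KB = 5              # approximate size of each document in KB
    REPLICATION = 3         # HDFS replication factor
    OVERHEAD = 1.2          # assumed HBase key/metadata overhead (illustrative)

    raw_gb = DOCS * DOC_KB / 1024**2              # ~114 GB of raw JSON
    disk_gb = raw_gb * OVERHEAD * REPLICATION     # ~412 GB actually written to disk
    print(f"raw: {raw_gb:.0f} GB, on disk: {disk_gb:.0f} GB")

On these assumptions, 20 nodes with 40 GB each (800 GB raw) leave roughly 2x headroom over the ~412 GB footprint for temporary MapReduce output and future growth.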

Re: Estimating disk space requirements

2013-01-18 Thread Jean-Marc Spaggiari
Hi Panshul, If you have 20 GB with a replication factor set to 3, you have only 6.6 GB available, not 11 GB. You need to divide the total space by the replication factor. Also, if you store your JSON into HBase, you need to add the key size to it. If your key is 4 bytes or 1024 bytes, it makes a difference…
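
Jean-Marc's rule of thumb as a minimal sketch: usable HDFS capacity is raw capacity divided by the replication factor.

    # HDFS stores every block REPLICATION times, so usable space is raw / replication.
    def usable_gb(raw_gb: float, replication: int = 3) -> float:
        return raw_gb / replication

    print(usable_gb(20))    # ~6.7 GB -- not 11 GB, as noted above
    print(usable_gb(800))   # ~267 GB usable on 20 nodes x 40 GB each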