Jeff makes some good points here.
On Fri, Jan 18, 2013 at 5:01 PM, Jeffrey Buell wrote:
> I disagree. There are some significant advantages to using "many small
> nodes" instead of "few big nodes". As Ted points out, there are some
> disadvantages as well, so you have to look at the trade-offs, such as
> sharing physical hardware with other workloads, etc.
Jeff
- Original Message -
From: "Ted Dunning"
To: user@hadoop.apache.org
Sent: Friday, January 18, 2013 3:36:30 PM
Subject: Re: Estimating disk space requirements
If you make 20 individual small servers, that isn't much different from 20
virtual machines carved out of one server. The only difference would be if
the neighbors of the separate VMs use fewer resources.
On Fri, Jan 18, 2013 at 3:34 PM, Panshul Whisper wrote:
ah now i understand what you mean.
I will be creating 20 individual servers on the cloud, and not create one
big server and make several virtual nodes inside it.
I will be paying for 20 different nodes.. all configured with hadoop and
connected to the cluster.
Thanx for the intel :)
It is usually better to not subdivide nodes into virtual nodes. You will
generally get better performance from the original node because you only
pay for the OS once and because your disk I/O will be scheduled better.
If you look at EC2 pricing, however, the spot market often has arbitrage
opportunities.
You can attach a separate disk to your instance (for example an
EBS volume in the case of AWS) where you store only
Hadoop-related data, and keep another disk for OS-related files.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Sat, Jan 19, 2013 at 4:00 AM, Panshul Whisper wrote:
Thnx for the reply Ted,
You can find 40 GB disks when you make virtual nodes on a cloud like
Rackspace ;-)
About the OS partitions, I did not exactly understand what you meant.
I have made a server on the cloud, and I just installed and configured
Hadoop and HBase in the /usr/local folder.
Where do you find 40 GB disks nowadays?
Normally your performance is going to be better with more space, but your
network may be your limiting factor for some computations. That could give you
some paradoxical scaling. HBase will rarely show this behavior.
Keep in mind you also want to allow
I have been using AWS for quite some time and I have
never faced any issues. Personally speaking, I found AWS
really flexible. You get a great deal of flexibility in choosing
services depending upon your requirements.
Thank you for the reply.
It will be great if someone can suggest whether setting up my cluster on
Rackspace is better, or on Amazon using EC2 servers,
keeping in mind that Amazon services have been having a lot of downtime...
My main points of concern are performance and availability.
My cluster has to be very
It all depends on what you want to do with this data and the power of each
single node. There is no one-size-fits-all rule.
The more nodes you have, the more CPU power you will have to process
the data... But if your 80GB boxes' CPUs are faster than your 40GB boxes'
CPUs, maybe you should take the 80GB ones.
If we look at it with performance in mind,
is it better to have 20 Nodes with 40 GB HDD
or is it better to have 10 Nodes with 80 GB HDD?
they are connected on a gigabit LAN
Thnx
On Fri, Jan 18, 2013 at 2:26 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:
20 nodes with 40 GB will do the work.
After that you will have to consider performance based on your access
pattern. But that's another story.
JM
2013/1/18, Panshul Whisper :
Thank you for the replies,
So I take it that I should have at least 800 GB of total free space on
HDFS (combined free space of all the nodes connected to the cluster). So
I can connect 20 nodes having 40 GB of HDD each to my cluster. Will
this be enough for the storage?
Please confirm.
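A quick back-of-the-envelope check of that 800 GB figure (a sketch only; the document count, document size, and node sizes are the numbers mentioned elsewhere in this thread, and replication factor 3 is the HDFS default):

```python
# Rough HDFS capacity check for the cluster discussed in this thread.
# Assumptions: 24 million JSON documents of ~5 KB each, replication factor 3.

DOCS = 24_000_000
DOC_SIZE_KB = 5
REPLICATION = 3

raw_gb = DOCS * DOC_SIZE_KB / (1024 * 1024)   # raw data size in GB
replicated_gb = raw_gb * REPLICATION          # space actually consumed on HDFS

nodes, disk_gb = 20, 40
cluster_gb = nodes * disk_gb                  # total raw disk across the cluster

print(f"raw data:         {raw_gb:.0f} GB")
print(f"with replication: {replicated_gb:.0f} GB")
print(f"cluster capacity: {cluster_gb} GB")
```

Note this ignores HBase overhead (keys, column qualifiers, versions) and temp space for MapReduce jobs, so the real requirement sits somewhere above the replicated figure.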
Hi,
some comments are inside your message ...
2013/1/18 Panshul Whisper
> Hello,
>
> I was estimating how much disk space do I need for my cluster.
>
> I have 24 million JSON documents, approx. 5 KB each.
> The JSON is to be stored into HBase with some identifying data in columns
Hi Panshul,
If you have 20 GB with a replication factor set to 3, you have only
6.6GB available, not 11GB. You need to divide the total space by the
replication factor.
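That division in a minimal sketch (the 20 GB figure is the one from the example above):

```python
# Usable HDFS space is raw capacity divided by the replication factor.
raw_capacity_gb = 20
replication = 3

usable_gb = raw_capacity_gb / replication
print(f"{usable_gb:.1f} GB usable")  # roughly the 6.6 GB cited above
```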
Also, if you store your JSON into HBase, you need to add the key size
to it. If your key is 4 bytes, or 1024 bytes, it makes a difference.
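To see how much the key size matters at this scale (a hypothetical calculation; the row count is the 24 million documents mentioned earlier in the thread, and the two key sizes are the examples given above):

```python
# Extra HBase storage consumed by row keys alone, before replication.
ROWS = 24_000_000  # document count from this thread

overheads = {}
for key_bytes in (4, 1024):
    overheads[key_bytes] = ROWS * key_bytes / (1024 ** 3)  # GB of key data
    print(f"{key_bytes:>4}-byte keys: {overheads[key_bytes]:.2f} GB of key data")
```

With 4-byte keys the overhead is negligible, but 1 KB keys add tens of gigabytes before replication even multiplies them.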