On Fri, Jun 10, 2011 at 3:58 AM, Michel Segel wrote:
> Not to mention you don't get a linear boost w port bonding.
>
Well, you don't get a linear boost with switch-level bonding. You can get it,
however.
> You have to be careful on hardware recommendations because there are
> pricing sweet spots
Expensive is relative, and with the latest Intel hardware releases you're
starting to see 10GbE on the motherboard.
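The "no linear boost" point about port bonding can be illustrated with a small simulation: hash-based bonding modes (e.g. 802.3ad) pin each flow to one slave NIC by hashing the flow tuple, so a single flow never exceeds one link's speed. A minimal Python sketch, with illustrative numbers and a simplified hash policy (not the kernel's actual algorithm):

```python
# Sketch: why hash-based NIC bonding is not a linear speedup.
# Each flow is pinned to one slave link by a hash of its flow tuple,
# so per-flow throughput is capped at a single link's speed.
import hashlib

LINKS = 2              # two bonded 1 GbE ports (assumed)
LINK_GBPS = 1.0

def slave_for_flow(src, dst, sport, dport):
    """Pick a slave link by hashing the flow tuple (mimics an xmit hash policy)."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % LINKS

def per_flow_gbps(flows):
    """Best-case throughput per flow: one link, shared with co-hashed flows."""
    load = [0] * LINKS
    for f in flows:
        load[slave_for_flow(*f)] += 1
    return [LINK_GBPS / load[slave_for_flow(*f)] for f in flows]

# One big flow (e.g. a single HDFS block transfer): capped at 1 Gbps, not 2.
print(per_flow_gbps([("10.0.0.1", "10.0.0.2", 50010, 50010)]))
```

Many concurrent flows spread across links and can approach the aggregate, which is why bonding helps MapReduce shuffle traffic more than any single transfer.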
Ensure enough networking bandwidth to match your drive bandwidth, otherwise
your compaction rates are going to be abysmal. 10 GigE ports are expensive,
so consider 2 x 1 GigE per box (or even 4 x 1 GigE if you can get that many
on-board NICs).
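The disk-vs-network balance above can be checked with back-of-envelope arithmetic. A sketch with illustrative assumed figures (per-spindle sequential throughput varies by drive):

```python
# Back-of-envelope: does the NIC keep up with the disks during compaction
# and replication? All numbers are illustrative assumptions.
DISKS = 4
DISK_MB_S = 80          # assumed sustained sequential MB/s per SATA spindle
NIC_GBPS = 1            # 1 x 1 GigE

disk_bw = DISKS * DISK_MB_S                 # aggregate disk MB/s
nic_bw = NIC_GBPS * 1000 / 8                # ~125 MB/s per 1 GigE port

print(f"disk: {disk_bw} MB/s, network: {nic_bw:.0f} MB/s")
ports_needed = -(-disk_bw // nic_bw)        # ceiling division
print(f"1 GigE ports to roughly match the disks: {ports_needed:.0f}")
```

With these assumptions, four spindles already outrun a single 1 GigE port by more than 2x, which is the case for bonding multiple ports.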
On Wed, Jun 8, 2011 at 8:49 AM, Andrew Purtell wrote:
> From: Ted Dunning
> Lots of people are moving towards more spindles per box to
> increase IOP/s
>
> This is particularly important for cases where the working
> set gets pushed out of memory.
Indeed.
Our spec is more like 12x 500 GB SATA disks, to push IOPS and more evenly
balance CPUs (fast du
07, 2011 1:11 AM
To: hbase-u...@hadoop.apache.org
Subject: Hbase Hardware requirement
Dear friends,
Please suggest a standard hardware configuration for hbase cluster which is
going to be used to pull and store a lot of data.
--
Thanks,
Shah
> Sandy Bridge and SolarFlare are changing some of the design
> considerations.
>
>> Date: Tue, 7 Jun 2011 10:32:58 +0200
>> Subject: Re: Hbase Hardware requirement
>> From: timrobertson...@gmail.com
>> To: user@hbase.apache.org
>>
>> http://www.cloudera.com/blo
And even that recommendation isn't right. ;-)
I think Sandy Bridge and SolarFlare are changing some of the design
considerations.
> Date: Tue, 7 Jun 2011 10:32:58 +0200
> Subject: Re: Hbase Hardware requirement
> From: timrobertson...@gmail.com
> To: user@hbase.apa
Lots of people are moving towards more spindles per box to increase IOP/s
This is particularly important for cases where the working set gets pushed out
of memory.
On Tue, Jun 7, 2011 at 1:32 AM, Tim Robertson wrote:
>
> http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic
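Ted's spindle point can be sketched numerically: for a working set that misses cache, random IOPS scale with spindle count, not raw capacity. A hedged sketch (the per-disk figure is an assumed ballpark for 7200 rpm SATA):

```python
# Sketch: aggregate random-read IOPS scales with spindle count, which is
# why 12 x 500 GB can beat 4 x 1.5 TB at the same raw capacity.
IOPS_PER_DISK = 75      # assumed ballpark for a 7200 rpm SATA drive

def cluster_iops(nodes, disks_per_node):
    return nodes * disks_per_node * IOPS_PER_DISK

few_big = cluster_iops(nodes=10, disks_per_node=4)      # 4 spindles/node
many_small = cluster_iops(nodes=10, disks_per_node=12)  # 12 spindles/node
print(few_big, many_small)  # same raw capacity, 3x difference in IOPS
```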
http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
"4 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
2 quad core CPUs, running at least 2-2.5GHz
16-24GBs of RAM (24-32GBs if you’re considering HBase)
Gigabit Ethernet"
HTH,
Tim
Dear friends,
Please suggest a standard hardware configuration for hbase cluster which is
going to be used to pull and store a lot of data.
--
Thanks,
Shah
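The quoted Cloudera spec gives 4 TB raw per node, but usable space is far less once HDFS replication and working headroom are accounted for. A rough estimate, with the replication factor and headroom fraction as assumptions:

```python
# Rough usable-capacity estimate for the quoted spec (4 x 1 TB per node).
# Replication factor 3 and ~25% headroom for MR spill, compactions and
# logs are assumptions, not fixed requirements.
RAW_TB_PER_NODE = 4 * 1.0
REPLICATION = 3
HEADROOM = 0.25

def usable_tb(nodes):
    return nodes * RAW_TB_PER_NODE * (1 - HEADROOM) / REPLICATION

print(round(usable_tb(20), 1))  # e.g. a 20-node cluster
```

So a 20-node cluster of this spec holds on the order of 20 TB of user data, not 80 TB.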
task
and then you have 1GB for the DataNode and 1GB for the TaskTracker.
Then you'd have some headroom to run 1024MB per task, or more if you end up
with larger jobs.
HTH
-Mike
> Date: Mon, 22 Nov 2010 15:50:05 +0200
> Subject: Re: Hadoop/HBase hardware requirement
> From: li...@inf
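The allocation Mike describes can be written out as a per-node memory budget. All figures below are illustrative assumptions for a 16 GB node running DataNode, TaskTracker and an HBase RegionServer:

```python
# Per-node memory budget along the lines Mike describes; the OS reserve
# and RegionServer heap are assumed figures, not recommendations.
TOTAL_MB = 16 * 1024
OS_MB = 1024            # assumed OS / page cache reserve
DATANODE_MB = 1024
TASKTRACKER_MB = 1024
REGIONSERVER_MB = 4096  # assumed HBase RegionServer heap
TASK_MB = 1024          # heap per MR child task

slots = (TOTAL_MB - OS_MB - DATANODE_MB - TASKTRACKER_MB
         - REGIONSERVER_MB) // TASK_MB
print(slots)  # map+reduce slots that fit without swapping
```

Swapping is what kills HBase latency, so erring on the low side for slot count is the safer choice.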
I believe our bottleneck is currently with memory.
Most of the CPU is dedicated to parsing and working with gzip files (we
don't have heavy computational tasks).
But on the other hand, more memory and disk mean we can run more M/R jobs and
scans, which need more CPU.
Lior
On Mon, Nov 22, 2010 at 3
Hi Lior,
Depends on your load: is it IO or CPU bound? Sounds like IO or disk
from the above, right? I would opt for more machines! This will
spread the load better across the cluster. And you can always add more
disks in v2 of your setup.
Lars
On Mon, Nov 22, 2010 at 1:56 PM, Lior Schachter
Hi Lars,
I agree with every sentence you wrote (and that's why we chose HBase).
However, from a managerial point of view the question of the initial
investment is very important (especially when considering a new technology).
Lior
p.s. The price is in USD
On Mon, Nov 22, 2010 at 2:43 PM, La
And another, more concrete question:
let's say that for every machine with two quad-core CPUs, 4TB and 16GB, I can
buy 2 machines with one quad-core CPU, 2TB and 16GB.
Which configuration should I choose?
Lior
On Mon, Nov 22, 2010 at 2:27 PM, Lior Schachter wrote:
> Hi all, Thanks for your input and assistance
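Lior's trade-off can be compared on aggregate resources (price aside). A simple sketch, assuming 10 machines of option A against 20 of option B:

```python
# Comparing the two options Lior describes on aggregate resources.
# Option A: N machines, 2 quad-core CPUs, 4 TB, 16 GB each.
# Option B: 2N machines, 1 quad-core CPU, 2 TB, 16 GB each.
def aggregate(nodes, cores, tb, ram_gb):
    return {"cores": nodes * cores, "tb": nodes * tb, "ram_gb": nodes * ram_gb}

a = aggregate(nodes=10, cores=8, tb=4, ram_gb=16)
b = aggregate(nodes=20, cores=4, tb=2, ram_gb=16)
print(a)
print(b)
# Same total cores and disk, but option B doubles total RAM (and likely
# spindles), which favours more, smaller machines for this workload.
```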
Hi Lior,
I can only hope you state this in Shekels! But 20 nodes with Hadoop
can do quite a lot and you cannot compare a single Oracle box with a
20 node Hadoop cluster as they serve slightly different use-cases. You
need to make a commitment to what you want to achieve with HBase and
that growth
Hi all, Thanks for your input and assistance.
From your answers I understand that:
1. more is better but our configuration might work.
2. there are small tweaks we can do that will improve our configuration
(like having 4x500GB disks).
3. use monitoring (like Ganglia) to find the bottlenecks.
F
Oleg,
Do you have Ganglia or some other graphing tool running against the
cluster? It gives you metrics that are crucial here, for example the
load on Hadoop and its DataNodes as well as insertion rates etc. on
HBase. What is also interesting is the compaction queue to see if the
cluster is going
On Sun, Nov 21, 2010 at 10:39 PM, Krishna Sankar wrote:
> Oleg & Lior,
>
> Couple of questions & couple of suggestions to ponder:
> A) When you say 20 Name Servers, I assume you are talking about 20 Task
> Servers
>
Yes
> B) What type are your M/R jobs ? Compute Intensive vs. storage intensiv
Yes, I fully agree with that, but as I understand it there is a limitation:
bulk load works only with one column family.
That doesn't fit my case. Can you advise a workaround for using bulk load
with multiple column families?
Thanks Oleg.
> Are these insertions the output of the MR jobs?
>
> If s
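At the time, the HFile-based bulk load path handled one column family per job, and the commonly suggested workaround was to run one bulk-load pass per family. A driver-loop sketch; the job command names below are schematic placeholders, not real tools, and the family names are assumed:

```python
# Sketch of the common workaround: run one bulk-load pass per column family.
# "run-hfile-job" is a placeholder for whatever MR job writes the HFiles;
# the family names are assumed for illustration.
FAMILIES = ["cf_meta", "cf_raw", "cf_stats"]

def bulkload_commands(table, input_dir):
    """One HFile-generating job plus a completebulkload step per family."""
    cmds = []
    for cf in FAMILIES:
        out = f"/tmp/hfiles/{table}/{cf}"
        cmds.append(f"run-hfile-job --table {table} --family {cf} "
                    f"--input {input_dir} --output {out}")
        cmds.append(f"completebulkload {out} {table}")
    return cmds

for c in bulkload_commands("events", "/logs/2010-11-21"):
    print(c)
```

Each pass reads the same input and emits HFiles for only its family, so the cost is one extra scan of the input per additional family.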
On Sun, Nov 21, 2010 at 9:55 PM, Jean-Daniel Cryans wrote:
> I'm unclear about the 2TB disk thing, is it 1x2TB or 2x1TB or 4x500GB?
> I hope it's the last one, as you want to have as many spindles as possible.
We have 2x1TB.
> I would prefer 24GB to 16, this is what we run on and it
> works like a charm
On Sun, Nov 21, 2010 at 5:53 AM, Oleg Ruchovets wrote:
> Hi all,
> After testing HBase for a few months with very light configurations (5
> machines, 2 TB disk, 8 GB RAM), we are now planning for production.
> Our Load -
> 1) 50GB log files to process per day by Map/Reduce jobs.
> 2) Insert 4-5GB t
Oleg & Lior,
Couple of questions & couple of suggestions to ponder:
A) When you say 20 Name Servers, I assume you are talking about 20 Task
Servers
B) What type are your M/R jobs? Compute-intensive vs. storage-intensive?
C) What is your data growth?
D) With the current jobs, are you saturat
I'm unclear about the 2TB disk thing, is it 1x2TB or 2x1TB or 4x500GB?
I hope it's the last one, as you want to have as many spindles as
possible. I would prefer 24GB to 16, this is what we run on and it
works like a charm, and gives more room for those memory hungry jobs.
What kind of stability i
Hi all,
After testing HBase for a few months with very light configurations (5
machines, 2 TB disk, 8 GB RAM), we are now planning for production.
Our Load -
1) 50GB log files to process per day by Map/Reduce jobs.
2) Insert 4-5GB to 3 tables in hbase.
3) Run 10-20 scans per day (scanning about 20 r