Re: Cluster Size/Node Density

2011-02-19 Thread Jean-Daniel Cryans
That's the second report I've seen in less than a week of u23 being less stable than u17. Interesting... J-D On Sat, Feb 19, 2011 at 9:43 AM, Wayne wrote: > What JVM is recommended for the new memstore allocator? We switched from u23 > back to u17, which helped a lot. Is this

Re: Cluster Size/Node Density

2011-02-19 Thread Stack
It is not JVM-version dependent. Stack On Feb 19, 2011, at 6:43, Wayne wrote: > What JVM is recommended for the new memstore allocator? We switched from u23 > back to u17, which helped a lot. Is this optimized for a specific JVM or does > it not matter? > > On Fri, Feb 18, 2011 at 5:46 PM, T

Re: Cluster Size/Node Density

2011-02-19 Thread Wayne
What JVM is recommended for the new memstore allocator? We switched from u23 back to u17, which helped a lot. Is this optimized for a specific JVM or does it not matter? On Fri, Feb 18, 2011 at 5:46 PM, Todd Lipcon wrote: > On Fri, Feb 18, 2011 at 12:10 PM, Jean-Daniel Cryans > wrote: > > > The b

Re: Cluster Size/Node Density

2011-02-18 Thread Todd Lipcon
On Fri, Feb 18, 2011 at 12:10 PM, Jean-Daniel Cryans wrote: > The bigger the heap, the longer the stop-the-world GC pause when fragmentation forces one; 8GB is "safer". > On my boxes, a stop-the-world on an 8G heap is already around 80 seconds... pretty catastrophic. Of course we've bumped the ZK tim
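(For readers tuning this: the ZK timeout bump Todd alludes to, plus the CMS flags commonly paired with big heaps at the time, would look roughly like the sketch below. The values are illustrative, not a recommendation.)

    # hbase-env.sh - a hedged sketch; adjust to your hardware
    export HBASE_HEAPSIZE=8000   # heap in MB
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"

    <!-- hbase-site.xml: give region servers room to survive a long pause -->
    <property>
      <name>zookeeper.session.timeout</name>
      <value>180000</value> <!-- milliseconds -->
    </property>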

Re: Cluster Size/Node Density

2011-02-18 Thread Jean-Daniel Cryans
The bigger the heap, the longer the stop-the-world GC pause when fragmentation forces one; 8GB is "safer". In 0.90.1 you can try enabling the new memstore allocator, which seems to do a really good job; check out the JIRA first: https://issues.apache.org/jira/browse/HBASE-3455 J-D On Fri, Feb 18, 201
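(Enabling the allocator J-D mentions is a single property in hbase-site.xml; a minimal sketch, with the property name per HBASE-3455:)

    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>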

Re: Cluster Size/Node Density

2011-02-18 Thread Ted Dunning
Actually, having a smaller heap will decrease the risk of a catastrophic GC. It will probably also increase the likelihood of a full GC. Having a larger heap will let you go longer without a full GC, but with a very large heap a full GC may take your region server off-line long enough to be consider

Re: Cluster Size/Node Density

2011-02-18 Thread Chris Tarnas
Thank you, and that brings me to my next question... What is the current recommendation on the max heap size for HBase if RAM on the server is not an issue? Right now I am at 8GB and have no issues; can I safely do 12GB? The servers have plenty of RAM (48GB), so that should not be an issue - I j

Re: Cluster Size/Node Density

2011-02-18 Thread Jean-Daniel Cryans
That's what I usually recommend; the bigger the flushed files, the better. On the other hand, you only have so much memory to dedicate to the MemStore... J-D On Fri, Feb 18, 2011 at 11:50 AM, Chris Tarnas wrote: > Would it be a good idea to raise the hbase.hregion.memstore.flush.size if you > ha
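(A sketch of what raising that threshold might look like; the 0.90 default is 64MB, and the 256MB value below is only an example, bounded by J-D's caveat about total MemStore memory.)

    <!-- hbase-site.xml -->
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>268435456</value> <!-- 256MB, up from the 64MB default -->
    </property>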

Re: Cluster Size/Node Density

2011-02-18 Thread Chris Tarnas
Would it be a good idea to raise the hbase.hregion.memstore.flush.size if you have really large regions? -chris On Feb 18, 2011, at 11:43 AM, Jean-Daniel Cryans wrote: > Fewer regions, but it's often a good thing if you have a lot of data :) > > It's probably a good idea to bump the HDFS block

Re: Cluster Size/Node Density

2011-02-18 Thread Jean-Daniel Cryans
Fewer regions, but it's often a good thing if you have a lot of data :) It's probably a good idea to bump the HDFS block size to 128 or 256MB since you know you're going to have huge-ish files. Anyway, regarding penalties, I can't think of one that clearly stands out (unless you use a very smal
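(Putting both suggestions into config form, a hedged sketch using the 0.90/Hadoop-0.20-era property names; the 5GB region size matches the figure discussed elsewhere in this thread.)

    <!-- hbase-site.xml -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>5368709120</value> <!-- 5GB -->
    </property>

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.block.size</name>
      <value>268435456</value> <!-- 256MB -->
    </property>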

Re: Cluster Size/Node Density

2011-02-18 Thread Jason Rutherglen
> We are also using a 5GB region size to keep our region > counts in the 100-200 range/node, per Jonathan Gray's recommendation. So there isn't a penalty incurred from increasing the max region size from 256MB to 5GB? On Fri, Feb 18, 2011 at 10:12 AM, Wayne wrote: > We have managed to get a litt

Re: Cluster Size/Node Density

2011-02-18 Thread Wayne
We have managed to get a little more than 1k QPS to date with 10 nodes. Honestly, we are not quite convinced that disk I/O seeks are our biggest bottleneck. Of course they should be... but waiting for RPC connections, network latency, Thrift, etc. all play into the time to get reads. The std dev. of r

Re: Cluster Size/Node Density

2011-02-17 Thread Ryan Rawson
And don't forget that reading that data from the RS does not use compression, so you are limited to about 120 MB/sec of read bandwidth per node, minus bandwidth used for HDFS replication and other incidentals. GigE is just too damn slow. I look forward to 10GbE; perhaps we'll start seeing DC buildou
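(The arithmetic behind the 120 MB/sec figure: 1 GbE = 1000 Mbit/s ÷ 8 = 125 MB/s theoretical; TCP/IP and framing overhead leave roughly 115-120 MB/s usable, before HDFS replication traffic takes its share.)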

Re: Cluster Size/Node Density

2011-02-17 Thread M. C. Srivas
I was reading this thread with interest. Here's my $.02 On Fri, Dec 17, 2010 at 12:29 PM, Wayne wrote: > Sorry, I am sure my questions were far too broad to answer. > > Let me *try* to ask more specific questions. Assuming all data requests are > cold (random reading pattern) and everything come

Re: Cluster Size/Node Density

2010-12-20 Thread Stack
On Mon, Dec 20, 2010 at 9:12 AM, Wayne wrote: > Can we control the WAL and write buffer size via Thrift? We assume we have > to use Java for writes to get access to the settings below, which we assume > we need to get extremely fast writes. We are looking for something in the > range of 100k writes
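(For anyone wondering what those settings look like in code: a minimal sketch against the 0.90-era Java client API. The table name, sizes, and column names are illustrative, and disabling the WAL trades durability for speed - unflushed edits are lost if a server dies.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FastWrites {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        table.setAutoFlush(false);                   // buffer puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024);  // 12MB write buffer

        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        put.setWriteToWAL(false);   // skip the WAL for raw speed
        table.put(put);

        table.flushCommits();       // ship the buffered puts to the region servers
        table.close();
      }
    }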

Re: Cluster Size/Node Density

2010-12-20 Thread Wayne
> ...with HBase as those will use HDFS pread instead of seek/read. For this application, you absolutely must be using pread. Good luck. I'm interested in seeing how you can get HBase to perform, we are here to

Re: Cluster Size/Node Density

2010-12-20 Thread Stack
>> ...seeing how you can get HBase to perform, we are here to help if you have any issues. JG >> > -----Original Message----- >> > From: Wayne [mailto:wav...@gmail.com] >> > Sent: Friday, December 17, 2010 2:28 PM >> > To: user@hbase.apache.

Re: Cluster Size/Node Density

2010-12-20 Thread Wayne
> > -----Original Message----- > > From: Wayne [mailto:wav...@gmail.com] > > Sent: Friday, December 17, 2010 2:28 PM > > To: user@hbase.apache.org > > Subject: Re: Cluster Size/Node Density > > > > What can we expect from HDFS in terms of random reads? It is our o

RE: Cluster Size/Node Density

2010-12-17 Thread Jonathan Gray
...seeing how you can get HBase to perform, we are here to help if you have any issues. JG > -----Original Message----- > From: Wayne [mailto:wav...@gmail.com] > Sent: Friday, December 17, 2010 2:28 PM > To: user@hbase.apache.org > Subject: Re: Cluster Size/Node Density > > What can

Re: Cluster Size/Node Density

2010-12-17 Thread Wayne
> > -----Original Message----- > > From: Wayne [mailto:wav...@gmail.com] > > Sent: Friday, December 17, 2010 12:29 PM > > To: user@hbase.apache.org > > Subject: Re: Cluster Size/Node Density > > > > Sorry, I am sure my questions were far too broad to answer. > > > >

RE: Cluster Size/Node Density

2010-12-17 Thread Jonathan Gray
> To: user@hbase.apache.org > Subject: Re: Cluster Size/Node Density > > Sorry, I am sure my questions were far too broad to answer. > > Let me *try* to ask more specific questions. Assuming all data requests are > cold (random reading pattern) and everything comes from the disks (no > block

Re: Cluster Size/Node Density

2010-12-17 Thread Wayne
Sorry, I am sure my questions were far too broad to answer. Let me *try* to ask more specific questions. Assuming all data requests are cold (random reading pattern) and everything comes from the disks (no block cache), what level of concurrency can HDFS handle? Almost all of the load is controlle

Re: Cluster Size/Node Density

2010-12-17 Thread Jean-Daniel Cryans
Hi Wayne, This question has such a large scope, but is applicable to such a tiny subset of workloads (e.g. yours), that fielding all the questions in detail would probably just waste everyone's cycles. So first I'd like to clear up some confusion. > We would like some help with cluster sizi

Cluster Size/Node Density

2010-12-17 Thread Wayne
We would like some help with cluster sizing estimates. We have 15TB of currently relational data that we want to store in HBase. Once that is replicated to a factor of 3 and stored with secondary indexes etc., we assume we will have 50TB+ of data. The data is basically data-warehouse-style time series data
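(Working out that estimate: 15 TB x 3 replicas = 45 TB raw; secondary indexes and HBase/HDFS overhead push it past 50 TB.)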