This is GREAT information, folks. This is why I like open source communities :-) I will present this to management, but in the meantime, management has thrown another *monkey* wrench: they want me to check the possibility of replacing Netezza with *something*. Of course, I want to propose replacing Netezza with HBase. Anyway, it's best if I start another email thread. Thanks again.

On Wed, Sep 7, 2011 at 10:27 PM, Andrew Purtell <apurt...@apache.org> wrote:

> > While generalizations are dangerous, the one place where C++ code could
> > shine over java (JVM really) is that one does not have to fight the GC.
>
> Yes.
>
> > That being said, the folks working on hbase
> > have been actively addressing this problem to the extent possible
> > in pure java by using unmanaged heap memory. Search for "mslab hbase" to
> > learn more about it.
>
> And Cloudera's Li Pi has been working on using off heap memory as a
> secondary cache in HBASE-4027 and related jiras:
> https://issues.apache.org/jira/browse/HBASE-4027 . I believe this is
> important work. This gets us a lot closer to behaving like a C++-ish "large
> memory" process than we can under a JVM GC regime, until perhaps G1 is
> stable in what people run in production.
>
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
> >________________________________
> >From: Arvind Jayaprakash <w...@anomalizer.net>
> >To: user@hbase.apache.org
> >Sent: Thursday, September 8, 2011 2:49 AM
> >Subject: Re: HBase Vs CitrusLeaf?
> >
> >On Sep 06, Something Something wrote:
> >>Anyway, before I spend a lot of time on it, I thought I should check if
> >>anyone has compared HBase against CitrusLeaf. If you have, I would greatly
> >>appreciate it if you would share your experiences.
> >
> >Disclaimer: I was an early evaluator/tester of citrusleaf about a year
> >ago when it was in its infancy. Though I am not affiliated with them in
> >any manner, I might be more benevolent to them than most readers of this
> >mailing list.
> >
> >The short answer is that hbase & citrusleaf (called CL in the remainder of
> >the mail) are very different products.
> >
> >CL cares a lot more about predictable latencies than hbase does. This is
> >manifested in two aspects of the design:
> >
> >* It is heavily optimized for large RAM + SSD usage. While hbase does
> >a fair job of using RAM, I can say for sure that both the throughput and
> >latency trends are much better with CL in cases where spinning disks are
> >not used directly in the read/write path.
> >
> >* Multiple machines can concurrently/actively handle requests for the
> >same key, so the loss of one server does not mean that a range of keys
> >is temporarily unavailable. A hbase cluster does have a partial,
> >temporary outage when a region server dies. Things don't get back to
> >normal immediately even when a new server takes over, since not all
> >region data may be served by local disk reads. Even if it is, it won't be
> >readily waiting for you in fast memory.
> >
> >* A third aspect, more of a side effect, is that HDFS still has a
> >SPOF in the form of the namenode, which continues to be a cause for
> >concern wrt overall uptime guarantees.
> >
> >
> >Here is where hbase would do much better:
> >
> >* It is designed for much larger data, to the point where it is natural
> >for the entire dataset to be much larger than the total available RAM and
> >for hard disks to be the primary storage medium.
> >
> >* A bigtable implementation is also designed for both ranged scans and
> >full table scans. Last I recall, CL was more of a DHT, so ranged
> >scans are infeasible and doing full scans would qualify as much more than
> >shooting oneself in the foot.
> >
> >
> >And here is where hbase has advantages in principle:
> >
> >* As others mentioned, there are "textbook" advantages of using an open
> >source solution.
> >
> >* hbase definitely has run both longer and on larger clusters than CL
> >possibly has.
> >
> >
> >While generalizations are dangerous, the one place where C++ code could
> >shine over java (JVM really) is that one does not have to fight the GC. I'd
> >personally be more comfortable handing off, say, 48GB of memory to
> >good C/C++ code than to the JVM. That being said, the folks working on hbase
> >have been actively addressing this problem to the extent possible
> >in pure java by using unmanaged heap memory. Search for "mslab hbase" to
> >learn more about it.
> >
> >
> >My conclusion is that the two products address different problem spaces.
> >So I'd urge you to spend time understanding your access patterns and see
> >which one they map to more closely. Feel free to contact me off list
> >if you feel the need to ask anything that is not appropriate for the
> >mailing list but is relevant to this discussion.
> >