Build failed in Hudson: hbase-0.90 #31

2010-12-15 Thread Apache Hudson Server
See Changes: [stack] HBASE-3365 EOFE contacting crashed RS causes Master abort [jdcryans] HBASE-3363 ReplicationSink should batch delete doc fixes for replication -- [...truncated 2739 lines...]

Hudson build is back to normal : hbase-0.90 #32

2010-12-15 Thread Apache Hudson Server
See

RE: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Chad Walters
I was really just trying to address this point that Ryan made: "- They are able to harness larger amounts of RAM, so they are really just testing that vs HBase" In cases where that actually makes a difference (i.e. there are significant amounts of RAM that can't be harnessed), the overhead of ad…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ed Kohlwey
Along the lines of Terracotta BigMemory, apparently what they are actually doing is just using the DirectByteBuffer class (see this forum post: http://forums.terracotta.org/forums/posts/list/4304.page), which is basically the same as using malloc - it gives you non-GC access to a giant pool of memo…
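
Ed's point can be sketched concretely. The class and names below are mine, not Terracotta's — a minimal illustration of why `ByteBuffer.allocateDirect` behaves "basically the same as using malloc": the memory lives outside the GC-managed heap, so it is neither scanned nor moved by the collector.

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    // Memory from allocateDirect lives outside the GC-managed heap:
    // it is not scanned or compacted, and is capped by
    // -XX:MaxDirectMemorySize rather than -Xmx.
    public static ByteBuffer allocate(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(64 * 1024 * 1024);   // 64 MB off-heap
        buf.putLong(0, 42L);      // reads/writes bypass the Java heap
        System.out.println(buf.isDirect() + " " + buf.getLong(0));
    }
}
```

The trade-off, as the thread goes on to discuss, is that nothing manages this region for you: allocation, reuse, and serialization of objects into it are the application's problem.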

Review Request: hbase-3362 If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible

2010-12-15 Thread stack
--- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1298/ --- Review request for hbase and Jonathan Gray. Summary --- M src/main/java/or…

Build failed in Hudson: hbase-0.90 #30

2010-12-15 Thread Apache Hudson Server
See Changes: [jdcryans] HBASE-3360 ReplicationLogCleaner is enabled by default in 0.90 -- causes NPE -- [...truncated 2734 lines...] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elaps…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Andrew Purtell
> From: Ryan Rawson > Purtell has more, but he told me "no longer crashes, but minor pauses > between 50-250 ms". From 1.6_23. That's right. On EC2 m1.xlarge so that's a big caveat... per-test-iteration variance on EC2 in general is ~20%, and EC2 hardware is 2? generations back. Someone with…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Andrew Purtell
> Does anybody have a recent report about how G1 is coming along? Not in general, but as it pertains to HBase, I tried it recently with 1.6.0u23 and ran a generic heavy write test without crashing any more, so that is something. But I have not tried stressing it at "production" workloads. Best regar…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ted Dunning
That isn't really the trade-off. The 10x is on an undocumented benchmark with apples-to-oranges tuning. Moreover, HBase has had massive speedups since then. Being able to set heap size actually lets me control memory use more precisely, and running a single JVM lets me amortize JVM cost. Java do…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
I've looked into this a lot, and the summary is 'easier said than done'. If you look at Terracotta, they are using serialization of data structures to off-heap RAM, so it really is kind of like those EMM systems from ye olde DOS days. Having done some prototypes of this, the most likely use case…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Todd Lipcon
On Wed, Dec 15, 2010 at 12:27 PM, Vladimir Rodionov wrote: > Why don't you use off-heap memory for this purpose? If it's a block cache (all > blocks are of equal size), the alloc/free algorithm is pretty simple - you do not have to re-implement > malloc in Java. The block cache unfortunately i…

RE: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Vladimir Rodionov
Why don't you use off-heap memory for this purpose? If it's a block cache (all blocks are of equal size), the alloc/free algorithm is pretty simple - you do not have to re-implement malloc in Java. I think something like an open-source version of Terracotta BigMemory is a good candidate for Apache…
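
The equal-size-block scheme Vladimir describes can be sketched as below — an illustrative toy, not code from HBase or BigMemory: one big direct buffer, with "malloc" and "free" reduced to popping and pushing slot indices, since same-sized blocks need no coalescing or fit strategy.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class FixedBlockCache {
    private final ByteBuffer arena;                 // one big off-heap region
    private final int blockSize;
    private final Deque<Integer> freeSlots = new ArrayDeque<>();

    public FixedBlockCache(int blockSize, int blockCount) {
        this.blockSize = blockSize;
        this.arena = ByteBuffer.allocateDirect(blockSize * blockCount);
        for (int i = 0; i < blockCount; i++) freeSlots.push(i);
    }

    // "malloc": hand out a free slot index, or -1 when the cache is full
    // (a real cache would evict an old block here instead of failing).
    public int allocate() {
        Integer slot = freeSlots.poll();
        return slot == null ? -1 : slot;
    }

    // A view over one block; writes go straight to the off-heap arena.
    public ByteBuffer block(int slot) {
        ByteBuffer dup = arena.duplicate();
        dup.position(slot * blockSize);
        dup.limit((slot + 1) * blockSize);
        return dup.slice();
    }

    // "free": push the slot index back; no coalescing is needed
    // because every block is the same size.
    public void free(int slot) {
        freeSlots.push(slot);
    }

    public int freeCount() {
        return freeSlots.size();
    }
}
```

As Todd's reply notes, the catch is that the real HBase block cache does not actually hold equal-sized blocks, which is where this simple scheme breaks down.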

RE: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Chad Walters
Sure, but if the tradeoff is being unable to use all the memory effectively and suffering 10x unfavorable benchmark comparisons, then running 2 or more JVMs with a regionserver per VM seems like a reasonable stopgap until the GC works better. Chad -Original Message- From: Ryan Rawson […

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
Why do that? You reduce the cache effectiveness and up the logistical complexity. As a stopgap maybe, but not as a long term strategy. Sun just needs to fix their GC. Er, Oracle. -ryan On Wed, Dec 15, 2010 at 11:55 AM, Chad Walters wrote: > Why not run multiple JVMs per machine? > > Chad > >

RE: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Chad Walters
Why not run multiple JVMs per machine? Chad -Original Message- From: Ryan Rawson [mailto:ryano...@gmail.com] Sent: Wednesday, December 15, 2010 11:52 AM To: dev@hbase.apache.org Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase The malloc thing was pointing out…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
The malloc thing was pointing out that we have to contend with Xmx and GC, so it makes it harder for us to maximally use all the available RAM for block cache in the regionserver. Which you may or may not want to do for alternative reasons. At least with Xmx you can plan and control your deploym…
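
For context, the planning Ryan mentions is expressed through the stock deployment knobs — a hedged example of the 0.90-era settings (the numbers are illustrative, not recommendations):

```shell
# hbase-env.sh: a fixed heap ceiling the GC must live within.
export HBASE_HEAPSIZE=8000   # MB; becomes -Xmx8000m on the regionserver

# The matching hbase-site.xml knob is hfile.block.cache.size, the
# fraction of that heap handed to the block cache (0.2 by default in 0.90).
```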

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Todd Lipcon
On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma wrote: > Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have > given them a further advantage but as you said, not much is known about the > test source code. I think Hypertable does use tcmalloc or jemalloc (forget which)

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Gaurav Sharma
Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have given them a further advantage but as you said, not much is known about the test source code. On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson wrote: > So if that is the case, I'm not sure how that is a fair test. One > sy…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
Purtell has more, but he told me "no longer crashes, but minor pauses between 50-250 ms". From 1.6_23. Still not usable in a latency-sensitive prod setting. Maybe in other settings? -ryan On Wed, Dec 15, 2010 at 11:31 AM, Ted Dunning wrote: > Does anybody have a recent report about how G1 is…
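
For anyone wanting to repeat these G1 experiments: on the 1.6 JVMs under discussion G1 was still experimental and had to be unlocked explicitly, along these lines (placing it in hbase-env.sh is my assumption, not something stated in the thread):

```shell
# Enable the (then-experimental) G1 collector on Java 1.6u14+.
export HBASE_OPTS="$HBASE_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC"
```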

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ted Dunning
Does anybody have a recent report about how G1 is coming along? On Wed, Dec 15, 2010 at 11:22 AM, Ryan Rawson wrote: > As G1 GC improves, I expect our ability to use larger and larger heaps > would blunt the advantage of a C++ program using malloc. >

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
So if that is the case, I'm not sure how that is a fair test. One system reads from RAM, the other from disk. The results are as expected. Why not test one system with SSDs and the other without? It's really hard to get an apples-to-apples comparison. Even if you are doing the same workloads on 2 divers…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ted Dunning
From the small comments I have heard, the RAM-versus-disk difference is mostly what they were testing. On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson wrote: > We don't have the test source code, so it isn't very objective. However > I believe there are 2 things which help them: > - T…

Re: Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
Hi, We don't have the test source code, so it isn't very objective. However I believe there are 2 things which help them: - They are able to harness larger amounts of RAM, so they are really just testing that vs HBase - There have been substantial performance improvements in HBase since the version…

Hypertable claiming upto >900% random-read throughput vs HBase

2010-12-15 Thread Gaurav Sharma
Folks, my apologies if this has been discussed here before, but can someone please shed some light on how Hypertable is claiming up to 900% higher throughput on random reads and up to 1000% on sequential reads in their performance evaluation vs HBase (modeled after the perf-eval test in section 7…

Re: Review Request: Allow Observers to completely override base function

2010-12-15 Thread Ted Dunning
Indeed, if you look at the implementation of AtomicInteger, it is mostly just a volatile int. On Tue, Dec 14, 2010 at 6:36 PM, Ryan Rawson wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/CoprocessorHost.java > > >unless you need CA…
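
Ted's observation can be illustrated with a small sketch (class and names are mine): `AtomicInteger` does hold a volatile int internally, and layers compare-and-swap on top so that read-modify-write operations become atomic — which a bare volatile field never gets you.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    // A plain volatile int gives cross-thread visibility, but ++ on it
    // is still a non-atomic read-modify-write (two threads can lose updates).
    static volatile int plain = 0;

    // AtomicInteger also stores a volatile int internally, plus
    // compareAndSet loops that make increments atomic.
    static final AtomicInteger atomic = new AtomicInteger(0);

    public static int bumpAtomic() {
        return atomic.incrementAndGet();
    }

    public static void main(String[] args) {
        plain++;                        // safe only if a single thread writes
        System.out.println(bumpAtomic());
    }
}
```

So the choice in the review comes down to whether the counter is ever incremented concurrently: if yes, the CAS is needed; if reads only need visibility, volatile suffices.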

Build failed in Hudson: hbase-0.90 #29

2010-12-15 Thread Apache Hudson Server
See Changes: [jdcryans] HBASE-3358 Recovered replication queue wait on themselves when terminating HBASE-3359 LogRoller not added as a WAL listener when replication is enabled -- [...trunca…

Re: Review Request: Allow Observers to completely override base function

2010-12-15 Thread Andrew Purtell
--- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1295/#review2066 --- src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverC…