Regarding HDFS-347, I believe the following to be true:
- The bastard option, i.e. Ryan's patch against 0.20 that just does local
reads via File, does lower latency enough to make a difference in HBase random
read latencies as measured. I forget the magnitude of the difference offhand
but
HDFS-941
The trunk has moved on so the patch won't apply. There have been significant
changes in HDFS lately, so it will require more than a simple rebase/merge. If
the original assignee is busy, I am willing to help.
HDFS-347
The analysis is pointing out that local socket communication is
A competing project is out with intranode security
We are running secure HBase in production at Trend Micro now. And secure
ZooKeeper. This is full integration with Secure HDFS and MapReduce (including
auth tokens for MR), secure RPC, and policy enforcement of table/column ACLs
implemented as
I completely agree with Ryan. Most of the measurements in HDFS-347 are point
comparisons: data rate over socket, single-threaded sequential read from
datanode, single-threaded random read from datanode, etc. These measurements
are good, but when you run the entire HBase system at load, you
Thanks everybody for commenting on this thread.
We'd certainly like to lobby for movement on these two tickets, and although we
don't have anybody that is familiar with the source code, we'd be happy to
perform some tests and get some performance numbers.
Per Kihwal's comments, it sounds like
On Fri, Jun 3, 2011 at 12:50 PM, Doug Meil
doug.m...@explorysmedical.com wrote:
I have patches for HDFS-347 and HDFS-941 (and HDFS-918) for CDH3U0.
- Andy
From: Doug Meil doug.m...@explorysmedical.com
Subject: RE: HDFS-1599 status? (HDFS tickets to improve HBase)
To: dev@hbase.apache.org dev@hbase.apache.org
Date: Friday, June 3, 2011, 12:50 PM
On Fri, Jun 3, 2011 at 3:38 PM, Andrew Purtell apurt...@apache.org wrote:
I have patches for HDFS-347 and HDFS-941 (and HDFS-918) for CDH3U0.
Does your 347 patch do security? or just the one where it sneaks around back?
Have you tested the others under real load for a couple days?
- Andy
Yes, and though I have patches, and I'm happy to provide them if you want...
Indeed, 347 doesn't do security or checksums, so it needs work to say the
least. We use it with HBase given a privileged role such that it shares
group-readable DFS data directories with the DataNodes. It works for us,
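For context, the "sneak around back" read path amounts to opening the block file on the local filesystem instead of going through the DataNode's socket protocol. A minimal sketch of the idea, not the actual HDFS-347 implementation (the data-dir layout and block lookup here are simplified assumptions; the real DataNode directory structure and block-location RPC are more involved):

```python
import os

def read_block_direct(data_dir, block_id, offset, length):
    """Read a block file straight off the local disk, bypassing the
    DataNode's socket protocol. Requires the reader to have filesystem
    permission on the data dir (e.g. group-readable, as described above).
    Note: no checksum verification happens here -- one of the gaps
    mentioned in the thread."""
    # Hypothetical layout: block files live flat under data_dir as blk_<id>
    path = os.path.join(data_dir, "blk_%d" % block_id)
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The latency win comes from skipping the DataNode entirely: a positioned read is one `seek`+`read` on a local file descriptor rather than a round trip over the loopback socket.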
From: Todd Lipcon t...@cloudera.com
Does your 347 patch do security? or just the one where it sneaks around back?
Have you tested the others under real load for a couple days?
We use the sneaky 347 and, sure, it's a
I think one'd need to checksum only once on the first file system
instantiation, or first access of the file? As mentioned in
HDFS-2004, HBase's usage of HDFS is outside of the initial design
motivation. Eg, the rules may need to be bent in order to enable
performant use of HBase with HDFS. The
An hdfs-347 that checksums is over in the hadoop branch that FB
published on github (Dhruba and Jon pointed me at it); I've been
meaning to put the patch up in the hdfs-347 issue.
St.Ack
On Fri, Jun 3, 2011 at 4:42 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Not to be too mean and discouraging to everyone passing around patches
against CDH3 and/or 0.20-append, but just an FYI: there is no chance
that these things will get committed to an 0.20 branch without first
going through trunk. Sharing patches and testing them on real
workloads on 0.20 is a nice
I'm looking for a sample data set to benchmark the Lucene FST,
specifically the keys. I'm guessing a common key type for HBase users
is timestamp? Perhaps simply creating timestamps for tens of millions
of keys would be a reasonable benchmark? Though synthetic it's also
easy to adjust (eg,
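A throwaway generator for such a synthetic key set might look like the sketch below. The key format, start time, and write rate are all made up for illustration; the one real constraint is that keys come out in sorted order, which Lucene's FST builder requires:

```python
import datetime

def timestamp_keys(n, start_ms=1307000000000, keys_per_ms=4):
    """Yield n timestamp strings in sorted order, several per
    millisecond, mimicking a high-write-rate row-key pattern.
    Duplicate keys within a millisecond are intentional -- see the
    FST discussion below about how duplicates compress."""
    for i in range(n):
        ms = start_ms + i // keys_per_ms
        dt = datetime.datetime.utcfromtimestamp(ms / 1000.0)
        # e.g. '2011-06-02 08:53:20,000' -- fixed-width, so
        # lexicographic order matches chronological order
        yield dt.strftime("%Y-%m-%d %H:%M:%S") + ",%03d" % (ms % 1000)
```

Tuning `keys_per_ms` is the easy-to-adjust knob: it directly controls how many duplicates the benchmark feeds the FST.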
Thanks for the feedback Stack. Some inline responses:
On Thu, Jun 2, 2011 at 9:48 PM, Stack st...@duboce.net wrote:
High-level this sounds great.
Inline below is some feedback and a bit of history on how we got here
in case it helps:
On Thu, Jun 2, 2011 at 3:28 PM, Matt Corgan
Also the next thing to measure with the FST is the key lookup speed.
I'm not sure what that'd look like, or how to compare with HBase right
now?
On Fri, Jun 3, 2011 at 8:42 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Here's a nice preliminary number with the FST, 50 million dates of
Jason - are you feeding it that whole string for each date? Input data is
17 bytes per record * 50mm records = 850MB, and that reduces to 984 bytes?
Is it possible to compress by that much? Maybe I'm missing something about
how the FST works.
Matt
On Fri, Jun 3, 2011 at 8:51 PM, Jason
Ah - I see. It's generating multiple duplicate timestamps per millisecond,
so there are fewer than 50mm unique strings. Duplicates just require
incrementing a counter. Agree it's very cool though!
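Matt's resolution can be checked with quick arithmetic: at millisecond granularity, the number of distinct timestamp strings is bounded by elapsed milliseconds, not by record count, so an FST only stores the small unique-key set while duplicates just bump counters. A sketch of that accounting (the numbers are illustrative, not from Jason's run):

```python
from collections import Counter

def dedup(keys):
    """Return (unique_count, counts) for a key stream. Each duplicate
    only increments a counter rather than adding structure -- this is
    why 50mm records can collapse to a tiny compressed size when few
    keys are unique."""
    counts = Counter(keys)
    return len(counts), counts

# e.g. 1,000 records emitted at 4 per millisecond -> 250 unique keys
n_unique, counts = dedup(str(i // 4) for i in range(1000))
```

Scaled up, 50mm records written fast enough to share milliseconds leaves the FST with orders of magnitude fewer unique inputs than records, which is how an 850MB raw stream can shrink so dramatically.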
sent from my phone
On Jun 3, 2011 9:02 PM, Jason Rutherglen jason.rutherg...@gmail.com
wrote:
That can't be true? (smile) How would you search a 'key' in the FST?
St.Ack
On Fri, Jun 3, 2011 at 9:01 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Yeah it's truly super wild! Here's the code: http://pastebin.com/bnB53UQz
You can see the line that's adding the string: