CPU cache effectiveness

2011-06-04 Thread Matt Corgan
I mentioned a bunch of stuff in that prefix compression email about cache lines, prefetching, trie node sizes, etc. The gist of it all is that memory has become relatively slow, to the point where you need to start thinking of it in ways similar to how we think of disk/network. I dug up and cleaned

Re: Pluggable block index

2011-06-04 Thread Jason Rutherglen
> Oh BTW, you can't mmap anything in HBase unless you copy it to local > disk first. HDFS => no mmap. Right. I know that! Once the block index is pluggable, the FST would be an in-heap byte[]. On Sat, Jun 4, 2011 at 3:49 PM, Ryan Rawson wrote: > Oh BTW, you can't mmap anything in HBase unless

Re: Pluggable block index

2011-06-04 Thread Ryan Rawson
Oh BTW, you can't mmap anything in HBase unless you copy it to local disk first. HDFS => no mmap. Just thought you'd like to know. On Sat, Jun 4, 2011 at 3:41 PM, Jason Rutherglen wrote: >> It can be hard to know you have all the corner cases down and you >> won't find out in 6 months that ever

Re: Pluggable block index

2011-06-04 Thread Jason Rutherglen
> It can be hard to know you have all the corner cases down and you > won't find out in 6 months that every single piece of data you have > put in HBase is corrupt. Keeping it simple is one strategy. Isn't the block index separate from the actual data? So corruption in that case is unlikely. >

Re: Pluggable block index

2011-06-04 Thread Ryan Rawson
Also, don't break it :-) Part of the goal of HFile was to build something quick and reliable. It can be hard to know you have all the corner cases down and you won't find out in 6 months that every single piece of data you have put in HBase is corrupt. Keeping it simple is one strategy. I have pr

Re: Pluggable block index

2011-06-04 Thread Jason Rutherglen
> You'd have to change how the Scanner code works, etc. You'll find out. Nice! Sounds fun. On Sat, Jun 4, 2011 at 3:27 PM, Ryan Rawson wrote: > What are the specs/goals of a pluggable block index?  Right now the > block index is fairly tied deep in how HFile works. You'd have to > change how t

Re: Pluggable block index

2011-06-04 Thread Ryan Rawson
What are the specs/goals of a pluggable block index? Right now the block index is tied fairly deeply into how HFile works. You'd have to change how the Scanner code works, etc. You'll find out. On Sat, Jun 4, 2011 at 3:17 PM, Stack wrote: > I do not know of one.  FYI hfile is pretty standalone re

Re: Pluggable block index

2011-06-04 Thread Stack
I do not know of one. FYI hfile is pretty standalone as regards tests etc. There is even a perf testing class for hfile. On Jun 4, 2011, at 14:44, Jason Rutherglen wrote: > I want to take a wh/hack at creating a pluggable block index, is there > an open issue for this? I looked and couldn't fi

Pluggable block index

2011-06-04 Thread Jason Rutherglen
I want to take a wh/hack at creating a pluggable block index. Is there an open issue for this? I looked and couldn't find one.

Re: prefix compression

2011-06-04 Thread Stack
On Fri, Jun 3, 2011 at 7:03 PM, Matt Corgan wrote: >> Pluggable formats would help here so you could tune for mem vs cpu. More history. At the time of KV and hfile incubation, we thought about making these building blocks pluggable but it was thought that there would be a performance cost doing

Re: prefix compression

2011-06-04 Thread Jason Rutherglen
Here's some more data for the 10 mil dates: 68.1 MB random increment up to 1000 87.1 MB random increment up to 10,000 162.1 MB total not using the FST On Fri, Jun 3, 2011 at 10:57 PM, Stack wrote: > That can't be true?  (smile)  How would you search a 'key' in the FST? > St.Ack > > On Fri, Jun 3

Re: HDFS-1599 status? (HDFS tickets to improve HBase)

2011-06-04 Thread Andrew Purtell
> From: Todd Lipcon > Not to be too mean and discouraging to everyone passing around patches > against CDH3 and/or 0.20-append, but just an FYI: there is no chance > that these things will get committed to an 0.20 branch without first > going through trunk. Sharing patches and testing them on real

Re: prefix compression

2011-06-04 Thread Jason Rutherglen
I varied the ms increment randomly between 1-20, then created 10 mil dates. The FST was then 58,481,582 bytes, i.e., about 55.8 MB. Guess it's not perfect! 19,739,994 bytes, i.e., 18.8 MB, for random 1-5 increments. I think that's still pretty good. I need to try varying the long value stored alongside to
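
For anyone wanting to reproduce the shape of this test data, here is a minimal Python sketch, assuming the dates are epoch-millisecond timestamps that each grow by a uniformly random step. The names `make_dates` and `bytes_per_entry`, the starting timestamp, and the seed are hypothetical choices for illustration, and the FST construction itself is not shown. The last two calls only divide the byte counts reported in the message by the 10 mil entries.

```python
import random


def make_dates(n, max_increment, start_ms=1_307_145_600_000, seed=0):
    """Return n strictly increasing millisecond timestamps.

    Each timestamp is the previous one plus a random step in
    [1, max_increment]. start_ms and seed are arbitrary (assumed) values.
    """
    rng = random.Random(seed)
    dates = []
    current = start_ms
    for _ in range(n):
        current += rng.randint(1, max_increment)
        dates.append(current)
    return dates


def bytes_per_entry(total_bytes, entries):
    """Average storage cost per key, in bytes."""
    return total_bytes / entries


if __name__ == "__main__":
    # Small sample with 1-20 ms steps; the original test used 10 mil dates.
    print(make_dates(5, 20))
    # Byte counts as reported in the message above.
    print(round(bytes_per_entry(58_481_582, 10_000_000), 2))  # 1-20 ms steps
    print(round(bytes_per_entry(19_739_994, 10_000_000), 2))  # 1-5 ms steps
```

On the reported figures this works out to roughly 5.85 bytes per date for 1-20 ms steps and roughly 1.97 bytes per date for 1-5 ms steps, which is one way to see how the spread of the increments affects the size.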