I mentioned a bunch of stuff in that prefix compression email about cache
lines, prefetching, trie node sizes, etc. The gist of it all is that
memory has become relatively slow, to the point where you need to start
thinking of it the way we think of disk/network.
I dug up and cleaned
> Oh BTW, you can't mmap anything in HBase unless you copy it to local
> disk first. HDFS => no mmap.
Right. I know that! Once the block index is pluggable, the FST would
be an in-heap byte[].
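
To make the idea concrete, here is a rough sketch of what a pluggable block index could look like. Everything in it is hypothetical: `BlockIndex`, `SortedArrayBlockIndex` and `findBlock` are names invented for illustration and are not HBase APIs. The baseline implementation shown is a plain binary search over the first key of each block, held entirely on the heap; an FST-backed implementation would presumably implement the same interface over a single byte[].

```java
/** Hypothetical plug-in point; not an actual HBase interface. */
interface BlockIndex {
    /**
     * Returns the position of the last block whose first key is <= key,
     * or -1 if key sorts before the first block.
     */
    int findBlock(byte[] key);
}

/** Baseline in-heap implementation: binary search over each block's first key. */
final class SortedArrayBlockIndex implements BlockIndex {
    private final byte[][] firstKeys; // assumed sorted ascending, one entry per block

    SortedArrayBlockIndex(byte[][] firstKeys) {
        this.firstKeys = firstKeys;
    }

    @Override
    public int findBlock(byte[] key) {
        int lo = 0;
        int hi = firstKeys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(firstKeys[mid], key) <= 0) {
                found = mid;   // candidate block; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    /** Unsigned lexicographic byte comparison. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With an interface like this, the scanner code would only call `findBlock` and would not care whether the index behind it is a sorted array or an FST.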
On Sat, Jun 4, 2011 at 3:49 PM, Ryan Rawson wrote:
Oh BTW, you can't mmap anything in HBase unless you copy it to local
disk first. HDFS => no mmap.
just thought you'd like to know.
On Sat, Jun 4, 2011 at 3:41 PM, Jason Rutherglen
wrote:
> It can be hard to know you have all the corner cases down and you
> won't find out in 6 months that every single piece of data you have
> put in HBase is corrupt. Keeping it simple is one strategy.
Isn't the block index separate from the actual data? So corruption in
that case is unlikely.
Also, don't break it :-)
Part of the goal of HFile was to build something quick and reliable.
It can be hard to know you have all the corner cases down and you
won't find out in 6 months that every single piece of data you have
put in HBase is corrupt. Keeping it simple is one strategy.
I have pr
> You'd have to change how the Scanner code works, etc. You'll find out.
Nice! Sounds fun.
On Sat, Jun 4, 2011 at 3:27 PM, Ryan Rawson wrote:
What are the specs/goals of a pluggable block index? Right now the
block index is tied fairly deeply into how HFile works. You'd have to
change how the Scanner code works, etc. You'll find out.
On Sat, Jun 4, 2011 at 3:17 PM, Stack wrote:
I do not know of one. FYI, HFile is pretty standalone as regards tests, etc. There
is even a perf testing class for HFile.
On Jun 4, 2011, at 14:44, Jason Rutherglen wrote:
I want to take a wh/hack at creating a pluggable block index, is there
an open issue for this? I looked and couldn't find one.
On Fri, Jun 3, 2011 at 7:03 PM, Matt Corgan wrote:
>> Pluggable formats would help here so you could tune for mem vs cpu.
More history. At the time of KV and hfile incubation, we thought
about making these building blocks pluggable but it was thought that
there would be a performance cost doing
Here's some more data for the 10 mil dates:
68.1 MB random increment up to 1000
87.1 MB random increment up to 10,000
162.1 MB total not using the FST
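
To put those sizes in per-key terms, here is a small sketch. It is not the test harness used for the numbers above; `DateKeySketch`, `timestamps` and `bytesPerKey` are made-up names, and it assumes "MB" means 2^20 bytes and that the 162.1 MB figure covers the same 10 million keys. The generator only illustrates what "random increment up to N" could mean for a sorted run of millisecond timestamps.

```java
import java.util.Random;

final class DateKeySketch {
    // Assumption: the sizes quoted in the thread use 1 MB = 2^20 bytes.
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Strictly increasing timestamps; each step is a random 1..maxIncrementMs. */
    static long[] timestamps(int count, int maxIncrementMs, long startMs, long seed) {
        Random rnd = new Random(seed);
        long[] out = new long[count];
        long t = startMs;
        for (int i = 0; i < count; i++) {
            t += 1 + rnd.nextInt(maxIncrementMs); // nextInt(n) yields 0..n-1
            out[i] = t;
        }
        return out;
    }

    /** Average bytes spent per key for a structure of the given size. */
    static double bytesPerKey(double sizeMb, long keyCount) {
        return sizeMb * BYTES_PER_MB / keyCount;
    }

    public static void main(String[] args) {
        long keys = 10_000_000L;
        System.out.printf("no FST:             %.2f bytes/key%n", bytesPerKey(162.1, keys));
        System.out.printf("FST, incr <= 1000:  %.2f bytes/key%n", bytesPerKey(68.1, keys));
        System.out.printf("FST, incr <= 10000: %.2f bytes/key%n", bytesPerKey(87.1, keys));

        // A tiny sample of generated keys, just to show the shape of the input.
        for (long t : timestamps(5, 1000, 0L, 42L)) {
            System.out.println(t);
        }
    }
}
```

Under those assumptions the non-FST figure works out to roughly 17 bytes per key, and the two FST figures to roughly 7 and 9 bytes per key.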
On Fri, Jun 3, 2011 at 10:57 PM, Stack wrote:
> That can't be true? (smile) How would you search a 'key' in the FST?
> St.Ack
>
> On Fri, Jun 3
> From: Todd Lipcon
> Not to be too mean and discouraging to everyone passing around patches
> against CDH3 and/or 0.20-append, but just an FYI: there is no chance
> that these things will get committed to an 0.20 branch without first
> going through trunk. Sharing patches and testing them on real
I varied the ms increment randomly between 1-20, then created 10 mil
dates. The FST was then 58,481,582 bytes, i.e., 55.8 MB. Guess it's not
perfect! With random 1-5 increments it was 19,739,994 bytes, i.e., 18.8 MB. I
think that's still pretty good. I need to try varying the long value
stored alongside to