I didn't get a chance to investigate thoroughly or get any benchmarks. We
were looking for an alternative indexing strategy to B+ trees (with JDBM2)
since we knew the keys a priori, but looking at the source I was a bit
daunted at porting it, and the license wasn't something that we could use.
This looks great. Actually, more than BDZ, the intriguing part is CHM, as
it's order preserving.
I wonder how it behaves for unseen keys. Do you know anything about it?
What did you find more intriguing on this topic? :)
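On the unseen-keys question, a minimal sketch may help. An order-preserving minimal perfect hash only promises distinct slots to the keys it was built from; any other string still lands on some slot in [0, N), so a lookup has to compare against the stored key (or a fingerprint of it). The toy Java class below follows the CHM idea: each key is an edge between two hashed vertices, and if that graph is acyclic, vertex values g are chosen so that (g[h1(k)] + g[h2(k)]) mod N equals the key's original position. It is not CMPH's code or API; the class name ChmIndex, the seeded FNV-style hash and the 2.09 vertices-per-key ratio are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

// Toy CHM-style order-preserving minimal perfect hash (hypothetical, not CMPH).
final class ChmIndex {
    private final String[] keys; // original keys, kept so unseen keys can be rejected
    private final int[] g;       // per-vertex values, each in [0, n)
    private final long seed1, seed2;
    private final int m;         // number of graph vertices

    private ChmIndex(String[] keys, int[] g, long seed1, long seed2, int m) {
        this.keys = keys;
        this.g = g;
        this.seed1 = seed1;
        this.seed2 = seed2;
        this.m = m;
    }

    static ChmIndex build(List<String> keys, long rngSeed) {
        if (keys.isEmpty() || new HashSet<>(keys).size() != keys.size()) {
            throw new IllegalArgumentException("need a non-empty set of distinct keys");
        }
        int m = (int) Math.ceil(2.09 * keys.size()); // assumed ratio: 2.09 vertices per key
        Random rnd = new Random(rngSeed);
        while (true) { // retry with fresh seeds until the key graph is acyclic
            long s1 = rnd.nextLong(), s2 = rnd.nextLong();
            int[] g = assign(keys, s1, s2, m);
            if (g != null) return new ChmIndex(keys.toArray(new String[0]), g, s1, s2, m);
        }
    }

    // Assumed hash: seeded FNV-1a-style loop over the chars, a mixing step, then mod m.
    private static int hash(String key, long seed, int m) {
        long h = 0xcbf29ce484222325L ^ seed;
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return (int) Math.floorMod(h, (long) m);
    }

    // Key i is the edge {h1(key), h2(key)}. Returns g, or null if the graph has a cycle.
    private static int[] assign(List<String> keys, long s1, long s2, int m) {
        int n = keys.size();
        List<List<int[]>> adj = new ArrayList<>();
        for (int v = 0; v < m; v++) adj.add(new ArrayList<>());
        for (int i = 0; i < n; i++) {
            int a = hash(keys.get(i), s1, m), b = hash(keys.get(i), s2, m);
            if (a == b) return null;           // a self-loop counts as a cycle
            adj.get(a).add(new int[] {b, i});  // {neighbour, edge id = key position}
            adj.get(b).add(new int[] {a, i});
        }
        int[] g = new int[m];
        boolean[] seen = new boolean[m];
        for (int root = 0; root < m; root++) {
            if (seen[root]) continue;
            seen[root] = true;                 // g[root] stays 0
            Deque<int[]> stack = new ArrayDeque<>();
            stack.push(new int[] {root, -1});  // {vertex, edge used to reach it}
            while (!stack.isEmpty()) {
                int[] top = stack.pop();
                int x = top[0];
                for (int[] e : adj.get(x)) {
                    if (e[1] == top[1]) continue;         // skip the edge we arrived by
                    int y = e[0];
                    if (seen[y]) return null;             // second route to y: cycle
                    seen[y] = true;
                    g[y] = Math.floorMod(e[1] - g[x], n); // so g[x] + g[y] == id (mod n)
                    stack.push(new int[] {y, e[1]});
                }
            }
        }
        return g;
    }

    // Slot in [0, n) for ANY string; only build-set keys are guaranteed their own slot.
    int slotOf(String key) {
        return Math.floorMod(g[hash(key, seed1, m)] + g[hash(key, seed2, m)], keys.length);
    }

    // Original position of key, or -1 if the key was not in the build set.
    int indexOf(String key) {
        int i = slotOf(key);
        return keys[i].equals(key) ? i : -1;
    }
}
```

With keys ["apple", "banana", "cherry", "date", "elderberry"], indexOf returns each key's list position, while indexOf("fig") returns -1 even though slotOf("fig") still yields some index in [0, 5).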
On 7/19/11 3:02 AM, Casey Stella wrote:
I looked into minimal perfect hashing (MPH) a while ago and came across
Sebastiano's work, but was even more intrigued by CMPH
(http://cmph.sourceforge.net/), which claims to work on the order of a
billion keys. I attempted a Java port of BDZ (acyclic random 3-graphs FTW :)
at one point, but gave up as I found something
el
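For anyone curious about the "acyclic random 3-graphs" part, here is a rough Java sketch of the BDZ construction under stated assumptions: the class BdzSketch, the seeded FNV-style hash and the ~1.23 vertices-per-key ratio are illustrative choices, and this is not a port of CMPH. Each key becomes a 3-edge with one vertex in each third of the table; the 3-graph is peeled by repeatedly removing an edge that has a degree-1 vertex; then, in reverse peel order, each key's free vertex gets a value in {0,1,2} so that (g[v0] + g[v1] + g[v2]) mod 3 points back at it. Only the perfect (non-minimal) stage is shown, with hash values in [0, m); BDZ's ranking step that compresses this to [0, N) is left out.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

// Toy BDZ-style perfect hash (hypothetical): peel a random 3-graph, then assign g in {0,1,2}.
final class BdzSketch {
    private final byte[] g;  // one value in {0,1,2} per vertex
    private final long seed;
    private final int r;     // vertices per third; the table has m = 3r vertices

    private BdzSketch(byte[] g, long seed, int r) {
        this.g = g;
        this.seed = seed;
        this.r = r;
    }

    static BdzSketch build(List<String> keys, long rngSeed) {
        if (new HashSet<>(keys).size() != keys.size()) {
            throw new IllegalArgumentException("keys must be distinct");
        }
        int r = Math.max(1, (int) Math.ceil(1.23 * keys.size() / 3.0)); // assumed ratio 1.23
        Random rnd = new Random(rngSeed);
        while (true) { // retry with a fresh seed until the 3-graph peels completely
            long seed = rnd.nextLong();
            byte[] g = tryBuild(keys, seed, r);
            if (g != null) return new BdzSketch(g, seed, r);
        }
    }

    // Vertex j (0..2) of a key's edge, drawn from the j-th third of the table.
    private static int vertex(String key, long seed, int j, int r) {
        long h = 0xcbf29ce484222325L ^ (seed + j * 0x9e3779b97f4a7c15L);
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return j * r + (int) Math.floorMod(h, (long) r);
    }

    private static byte[] tryBuild(List<String> keys, long seed, int r) {
        int n = keys.size(), m = 3 * r;
        int[][] edge = new int[n][3];
        int[] degree = new int[m];
        int[] xorOfEdges = new int[m]; // XOR of incident edge ids: the lone edge when degree == 1
        for (int e = 0; e < n; e++) {
            for (int j = 0; j < 3; j++) {
                int v = vertex(keys.get(e), seed, j, r);
                edge[e][j] = v;
                degree[v]++;
                xorOfEdges[v] ^= e;
            }
        }
        // Peeling: repeatedly remove an edge that owns a degree-1 ("free") vertex.
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < m; v++) if (degree[v] == 1) queue.add(v);
        int[] order = new int[n], free = new int[n];
        int peeled = 0;
        while (!queue.isEmpty()) {
            int v = queue.poll();
            if (degree[v] != 1) continue;   // its edge was already peeled via another vertex
            int e = xorOfEdges[v];
            order[peeled] = e;
            free[peeled++] = v;
            for (int u : edge[e]) {
                degree[u]--;
                xorOfEdges[u] ^= e;
                if (degree[u] == 1) queue.add(u);
            }
        }
        if (peeled != n) return null;       // a 2-core remains: the 3-graph is not acyclic
        // Reverse peel order: each key's free vertex is untouched by edges assigned so far.
        byte[] g = new byte[m];
        for (int t = n - 1; t >= 0; t--) {
            int[] ed = edge[order[t]];
            int j = ed[0] == free[t] ? 0 : (ed[1] == free[t] ? 1 : 2);
            int others = g[ed[0]] + g[ed[1]] + g[ed[2]] - g[free[t]];
            g[free[t]] = (byte) Math.floorMod(j - others, 3); // sum over the edge == j (mod 3)
        }
        return g;
    }

    // Perfect (not minimal) hash: distinct values in [0, 3r) for the build-set keys.
    int hash(String key) {
        int[] v = {vertex(key, seed, 0, r), vertex(key, seed, 1, r), vertex(key, seed, 2, r)};
        return v[(g[v[0]] + g[v[1]] + g[v[2]]) % 3];
    }

    int range() {
        return 3 * r;
    }
}
```

When peeling fails (a 2-core remains), the sketch simply retries with a fresh seed.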
On Mon, Jul 18, 2011 at 9:22 AM, Claudio Martella wrote:
> Yes, I had a look at it a while ago. As far as I know, perfect hashing
> doesn't work that well for many elements. With millions of items it
> should be computationally expensive, and the probability of finding such
> a perfect hash function should be low. Did y
On 7/18/11 6:05 PM, Stack wrote:
On Mon, Jul 18, 2011 at 4:04 AM, Claudio Martella wrote:
> No, you can have collisions, so the index is not perfect (which means
> you can have buckets for colliding keys and empty unused entries in the
> hashtable directory).
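A directory along the lines Claudio describes, with colliding keys sharing a bucket and some slots left empty, could look roughly like this Java sketch. Everything in it is an assumption for illustration (the class BucketedHashIndex, String row keys, long byte offsets into the data file, and HashMap-style spreading of String.hashCode()); it is not HBase code. Entries are grouped by bucket into two flat arrays, which would be easy to serialize next to a data file, and a lookup scans only its own bucket.

```java
import java.util.List;
import java.util.Map;

// Hypothetical non-perfect hash directory: collisions share a bucket, some buckets stay empty.
final class BucketedHashIndex {
    private final int[] bucketStart; // bucket b's entries are [bucketStart[b], bucketStart[b + 1])
    private final String[] keys;
    private final long[] offsets;    // byte offset of each key's record in the data file

    BucketedHashIndex(List<Map.Entry<String, Long>> pairs, int numBuckets) {
        bucketStart = new int[numBuckets + 1];
        for (Map.Entry<String, Long> p : pairs) {
            bucketStart[bucket(p.getKey(), numBuckets) + 1]++; // count entries per bucket
        }
        for (int b = 0; b < numBuckets; b++) {
            bucketStart[b + 1] += bucketStart[b];               // prefix sums -> start positions
        }
        int[] next = bucketStart.clone();                       // next free slot per bucket
        keys = new String[pairs.size()];
        offsets = new long[pairs.size()];
        for (Map.Entry<String, Long> p : pairs) {
            int slot = next[bucket(p.getKey(), numBuckets)]++;
            keys[slot] = p.getKey();
            offsets[slot] = p.getValue();
        }
    }

    // String.hashCode() with HashMap-style high-bit spreading (an assumed choice of hash).
    static int bucket(String key, int numBuckets) {
        int h = key.hashCode();
        return Math.floorMod(h ^ (h >>> 16), numBuckets);
    }

    // Offset of key, or -1 if absent; only the key's own bucket is scanned.
    long lookup(String key) {
        int b = bucket(key, bucketStart.length - 1);
        for (int i = bucketStart[b]; i < bucketStart[b + 1]; i++) {
            if (keys[i].equals(key)) return offsets[i];
        }
        return -1;
    }

    // Number of directory slots that no key hashed into.
    int emptyBuckets() {
        int empty = 0;
        for (int b = 0; b + 1 < bucketStart.length; b++) {
            if (bucketStart[b] == bucketStart[b + 1]) empty++;
        }
        return empty;
    }
}
```

Compared with a perfect hash, the trade-off is a short scan per lookup and some unused directory slots, in exchange for a build that always succeeds on the first try.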
Well, if a perfect index is what you are after, you can generate
hashi
On 7/16/11 10:08 PM, Stack wrote:
> On Fri, Jul 15, 2011 at 10:06 AM, Claudio Martella wrote:
>> On 7/15/11 6:24 PM, Stack wrote:
>>> How do you figure the N in the below, Claudio?
>> N is the total number of pairs in the sequence file. You know that when
>> you finish flushing a memstore or comp
There are a couple of other projects out there on GitHub, so you may want
to check them out.
HTH
-Mike
Date: Fri, 15 Jul 2011 14:32:50 +0200
From: claudio.marte...@tis.bz.it
To: user@hbase.apache.org
Subject: Hash indexing of HFiles
On Fri, Jul 15, 2011 at 10:06 AM, Claudio Martella wrote:
> On 7/15/11 6:24 PM, Stack wrote:
>> How do you figure the N in the below, Claudio?
> N is the total number of pairs in the sequence file. You know that when
> you finish flushing a memstore or compacting files.
So a perfect index? If thi
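Because N only becomes known once the memstore flush or compaction has written every pair, the directory can be sized at close time. A small sketch of that arithmetic, assuming a hypothetical helper and a power-of-two directory so that the bucket is just hash & (buckets - 1):

```java
// Hypothetical helper: size a power-of-two hash directory once N (pairs written) is known.
final class DirectorySizing {
    static int bucketsFor(long n, double maxLoad) {
        if (n < 0 || !(maxLoad > 0 && maxLoad <= 1)) {
            throw new IllegalArgumentException("need n >= 0 and 0 < maxLoad <= 1");
        }
        long needed = Math.max(1L, (long) Math.ceil(n / maxLoad)); // keep n / buckets <= maxLoad
        long buckets = Long.highestOneBit(needed);
        if (buckets < needed) buckets <<= 1;                      // round up to a power of two
        if (buckets > (1L << 30)) {
            throw new IllegalArgumentException("directory too large for int indexing");
        }
        return (int) buckets;
    }

    // With a power-of-two size, the bucket is a bit mask instead of a modulo.
    static int bucketOf(int hash, int buckets) {
        return hash & (buckets - 1);
    }
}
```

For example, N = 1,000,000 with a maximum load of 0.75 needs at least 1,333,334 buckets, which rounds up to 2,097,152 (2^21).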
mness without having to try to build a separate index.
>>> But we're still using the base key for the row. It's not like we're
>>> creating a secondary index on a column value.
Hello list,
At SIGMOD this year I've seen a number of different storage file formats
for HBase, built on different techniques. My scenario and usage pattern
don't really require range queries, so I thought I'd take advantage of
even faster random I/O from hash indexing of the data in each sequence file.
Does anybo
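To make the random-I/O argument concrete, here is a hedged Java sketch: if the index maps each key directly to a byte offset, a point get is one seek plus one record read, with no search over sorted blocks. All names are hypothetical (HashIndexedFile, the writeUTF key/value record layout, and a HashMap standing in for an on-disk hash directory); this is not the HFile or SequenceFile format.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hash-indexed data file: key -> byte offset, so get() is one seek plus one read.
final class HashIndexedFile implements AutoCloseable {
    private final Map<String, Long> index = new HashMap<>(); // stand-in for an on-disk hash directory
    private final RandomAccessFile file;

    HashIndexedFile(File path, Map<String, String> rows) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            for (Map.Entry<String, String> row : rows.entrySet()) {
                index.put(row.getKey(), (long) out.size()); // bytes written so far = record offset
                out.writeUTF(row.getKey());                 // made-up record layout: key, then value
                out.writeUTF(row.getValue());
            }
        }
        file = new RandomAccessFile(path, "r");
    }

    // Value for key, or null if the index has no entry for it.
    String get(String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) return null;
        file.seek(offset);
        if (!file.readUTF().equals(key)) throw new IOException("index points at the wrong record");
        return file.readUTF();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Re-reading and checking the stored key on each get guards against a stale or corrupt directory entry.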