Re: Indexed Hashtables

2009-01-16 Thread Renaud Delbru
Tokyo Cabinet? http://tokyocabinet.sourceforge.net/index.html -- Renaud Delbru Delip Rao wrote: Hi, I need to look up a large number of key/value pairs in my map(). Is there any indexed hashtable available as a part of Hadoop I/O API? I find HBase overkill for my application; something on

Re: Indexed Hashtables

2009-01-16 Thread Delip Rao
Thanks, everyone, for the suggestions! I have tried all the options so far except Voldemort (Steve), and here's my evaluation: memcached (Sean) -- works very fast. Good option if used along with an existing slow index. MapFile (Peter) -- excellent option that is a part of Hadoop but works very slowly for large

Re: Indexed Hashtables

2009-01-15 Thread Jim Twensky
Delip, why do you think HBase would be overkill? I do something similar to what you're trying to do with HBase and I haven't encountered any significant problems so far. Can you give some more info on the size of the data you have? Jim On Wed, Jan 14, 2009 at 8:47 PM, Delip Rao wrote: > Hi,

Re: Indexed Hashtables

2009-01-15 Thread pr-hadoop
Delip, what about Hadoop MapFile? http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/MapFile.html Regards, Peter
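
For readers unfamiliar with it: MapFile, as far as I know, keeps key/value records sorted by key in a data file and holds a much smaller index of every Nth key in memory; a lookup binary-searches that index and then scans a short run of records. The Python sketch below illustrates only that idea, entirely in memory. It is not the Hadoop API: `SparseIndexedMap`, `index_interval`, and `get` are hypothetical names chosen for illustration.

```python
import bisect


class SparseIndexedMap:
    """In-memory stand-in for a sorted data file plus a sparse key index.

    Hypothetical illustration only; this does not use or mimic Hadoop classes.
    """

    def __init__(self, items, index_interval=128):
        # Records are kept sorted by key, as they would be in a sorted file.
        self.records = sorted(items, key=lambda kv: kv[0])
        self.index_interval = index_interval
        # The sparse index holds every index_interval-th key and its position.
        self.index_pos = list(range(0, len(self.records), index_interval))
        self.index_keys = [self.records[i][0] for i in self.index_pos]

    def get(self, key):
        # Find the last indexed key that is <= the requested key.
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None  # key sorts before every stored record (or map is empty)
        start = self.index_pos[slot]
        end = min(start + self.index_interval, len(self.records))
        # Scan at most index_interval records starting at the indexed position.
        for k, v in self.records[start:end]:
            if k == key:
                return v
            if k > key:
                break  # passed the spot where the key would have been
        return None


if __name__ == "__main__":
    table = SparseIndexedMap(
        [(f"key{i:04d}", i) for i in range(1000)], index_interval=16
    )
    print(table.get("key0500"))
    print(table.get("missing"))
```

The trade-off in this kind of layout is between index size and scan length: a larger interval means a smaller in-memory index but more records read per lookup.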

Re: Indexed Hashtables

2009-01-15 Thread Steve Loughran
Sean Shanny wrote: Delip, So far we have had pretty good luck with memcached. We are building a Hadoop-based solution for data warehouse ETL on XML-based log files that represent click stream data on steroids. We process about 34 million records, or about 70 GB of data, a day. We have to proce

Re: Indexed Hashtables

2009-01-14 Thread Sean Shanny
Delip, So far we have had pretty good luck with memcached. We are building a Hadoop-based solution for data warehouse ETL on XML-based log files that represent click stream data on steroids. We process about 34 million records, or about 70 GB of data, a day. We have to process dimensional da

Indexed Hashtables

2009-01-14 Thread Delip Rao
Hi, I need to look up a large number of key/value pairs in my map(). Is there any indexed hashtable available as a part of Hadoop I/O API? I find HBase overkill for my application; something along the lines of HashStore (www.cellspark.com/hashstore.html) should be fine. Thanks, Delip