Hi,
AFAIU, scaling fulltext search is usually done by processing partitions of
posting lists concurrently. That is essentially what you get with sharded
solr/katta/elasticsearch. I wonder how you would map things to HBase so that
this would be possible. HBase scales on the row key, so if you use
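A minimal sketch of the concurrent fan-out being described, with a hypothetical
Shard interface standing in for a solr/katta/elasticsearch node (the names here
are illustrative, not any real API):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy fan-out: each shard owns one partition of the posting lists and is
// queried concurrently; results are merged on the way back.
public class FanOutSearch {
    // Hypothetical shard interface -- stands in for one search node.
    interface Shard {
        List<Long> search(String term) throws Exception;
    }

    static List<Long> search(List<Shard> shards, final String term)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        List<Future<List<Long>>> futures = new ArrayList<Future<List<Long>>>();
        for (final Shard shard : shards) {
            futures.add(pool.submit(new Callable<List<Long>>() {
                public List<Long> call() throws Exception {
                    return shard.search(term); // scan this partition only
                }
            }));
        }
        List<Long> merged = new ArrayList<Long>();
        for (Future<List<Long>> f : futures) {
            merged.addAll(f.get()); // naive merge; real engines rank/merge
        }
        pool.shutdown();
        return merged;
    }
}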
Brian,
Thanks for the response.
solr/katta/elasticsearch
These don't have a distributed solution for realtime search [yet].
E.g., a transaction log is required, and a place to store the versioned
documents; that sounds a lot like HBase? The technique of query
sharding/partitioning is fairly trivial,
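HBase's built-in cell versioning is one reason the mapping is tempting: each
write of a document body becomes a new version of the same cell, which covers
the "versioned documents" piece. A rough sketch against the 0.90 client API;
the "docs" table and "d:body" column are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedDocs {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable docs = new HTable(conf, "docs"); // hypothetical table
        // Each put of the same cell becomes a new version in HBase.
        Put put = new Put(Bytes.toBytes("doc-42"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("body"),
                Bytes.toBytes("new revision of the document"));
        docs.put(put);

        // Fetch up to the last three revisions of the document body.
        Get get = new Get(Bytes.toBytes("doc-42"));
        get.setMaxVersions(3);
        Result r = docs.get(get);
        System.out.println(r);
        docs.close();
    }
}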
Thank you all for the great insight. Based on your thoughts I am going to try a
hybrid approach: split the children into buckets based on id range and store
one bucket per row.
The row key would then be parent-id:bucket-id, where bucket-id = child-id / n
and n is a specifically chosen bucket size; see the sketch below.
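A hypothetical helper for that key scheme. Note that real keys would want
fixed-width, zero-padded ids so lexicographic byte order matches numeric
order; plain string concatenation is only used here to keep the sketch short:

public class BucketKeys {
    // bucket-id = child-id / n, for a chosen bucket size n
    static String rowKey(long parentId, long childId, long n) {
        return parentId + ":" + (childId / n);
    }

    public static void main(String[] args) {
        // children 200..299 of parent 17 land in bucket 2 when n = 100
        System.out.println(rowKey(17, 250, 100)); // prints "17:2"
    }
}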
So in giving this a day of breathing room, it looks like HBase loads
values as it's scanning a column? I think that'd be a killer for some
Lucene queries; e.g., we'd be loading entire or partial posting lists just
for a linear scan of the terms dict? Or we'd probably instead want to
place the posting
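One partial workaround, assuming the terms dict lives in the row/column keys
and the postings in the values: a KeyOnlyFilter strips values server-side, so
a dictionary walk at least doesn't ship postings back to the client. It does
not change the fact that keys and values sit together in the HFile, which is
the concern raised above. A sketch against the 0.90 client API, with a
made-up "terms" table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class TermsDictScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable terms = new HTable(conf, "terms"); // hypothetical table
        Scan scan = new Scan();
        // Values are dropped server-side; only keys cross the wire.
        scan.setFilter(new KeyOnlyFilter());
        ResultScanner scanner = terms.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            terms.close();
        }
    }
}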
Hi,
We are going to port our production environment to 0.90 and I have a couple
of questions:
1) We are using HTablePool, which returned HTable in version 0.20.3, but now
it returns HTableInterface.
In our code we used these HTable class methods:
getStartEndKeys();
setAutoFlush(false);
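For what it's worth, in 0.90 the handle the pool returns is typically still a
plain HTable underneath, so a cast is one way to keep reaching those two
methods; the instanceof guard covers the case where it isn't. A sketch,
assuming a table named "mytable":

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.util.Pair;

public class PoolCast {
    public static void main(String[] args) throws Exception {
        HTablePool pool = new HTablePool();
        HTableInterface ti = pool.getTable("mytable"); // hypothetical table
        if (ti instanceof HTable) {
            HTable t = (HTable) ti;
            t.setAutoFlush(false); // not on HTableInterface in 0.90
            Pair<byte[][], byte[][]> keys = t.getStartEndKeys();
            System.out.println(keys.getFirst().length + " regions");
        }
        pool.putTable(ti); // return the handle to the pool
    }
}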
I really think that putting update semantics into Katta would be much
easier.
Building the write-ahead log for the Lucene case isn't all that hard. If
you follow the ZooKeeper model of having a WAL thread that writes batches of
log entries, you can get pretty high speed as well. The basic idea
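A minimal sketch of that group-commit idea (not ZooKeeper's actual code):
callers enqueue entries, and a single writer thread drains whatever has
accumulated, writes the whole batch, and pays one sync for all of it. A real
implementation would also ack each caller once its batch is durable:

import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingWal implements Runnable {
    private final BlockingQueue<byte[]> queue =
            new LinkedBlockingQueue<byte[]>();
    private final FileOutputStream out;

    public BatchingWal(String path) throws Exception {
        this.out = new FileOutputStream(path, true); // append mode
    }

    public void append(byte[] entry) {
        queue.add(entry);
    }

    public void run() {
        List<byte[]> batch = new ArrayList<byte[]>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                batch.add(queue.take());  // block for the first entry
                queue.drainTo(batch);     // grab whatever else is waiting
                for (byte[] entry : batch) {
                    out.write(entry);
                }
                out.getFD().sync();       // one sync for the whole batch
                batch.clear();
            }
        } catch (Exception e) {
            // real code would fail any pending writers here
        }
    }
}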
Great questions.
For #2, I think the Hadoop append feature is there for durability.
From the master log, you would see:
2011-02-11 00:34:09,494 INFO
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
-- HDFS-200
2011-02-11 00:34:09,495 INFO
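For reference, dfs.support.append is the flag behind that "Using syncFs"
line; a quick way to check what the client-side Configuration sees, assuming
an 0.90-era classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CheckAppend {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Must be true in hdfs-site.xml/hbase-site.xml for WAL syncs
        // to actually reach the datanodes.
        System.out.println("dfs.support.append = "
                + conf.getBoolean("dfs.support.append", false));
    }
}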
Let me be clear about the amount of testing I did: extremely little. I
should also point out that at first I did not fully appreciate the meaning
of your earlier comment to Vijay saying this is a little off; I now
realize you were in fact saying that Vijay told me to do things backward.
If you are taking the jar that we ship and slamming it into a Hadoop
0.20.2-based distro, that might work. I'm not sure if there are any
differences beyond pure code (which would then be expressed in the jar
only), so this approach might work.
You could also check out the revision that we built