Hi,

I know there have been many attempts to make Lucene searches distributed, but I haven't seen one that implements a Lucene Directory on top of HBase/Hadoop, apart from the discussion in this article [1]. I've worked with HBase and I believe this is a good approach to combining the two.
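To make the idea concrete, here is a minimal, hypothetical sketch of one storage layout such a Directory could use: each Lucene index file is split into fixed-size blocks, stored one block per row under keys like "<file>/<seq>". The class name and key scheme are my own invention, and a sorted in-memory map stands in for the HBase table so the sketch is self-contained; a real implementation would replace it with HBase client calls and implement org.apache.lucene.store.Directory.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a block-per-row layout for Lucene index files.
// A TreeMap (sorted by key) stands in for an HBase table, which also keeps
// rows sorted by row key; a real version would use the HBase client API.
public class BlockStoreSketch {
    // Tiny block size for demonstration; a real store would use e.g. 64 KB.
    static final int BLOCK_SIZE = 4;

    // row key -> cell value, mimicking one column of an HBase table
    private final Map<String, byte[]> table = new TreeMap<>();

    // Write a logical index file as a series of block rows "<name>/<seq>".
    public void writeFile(String name, byte[] data) {
        for (int off = 0, seq = 0; off < data.length; off += BLOCK_SIZE, seq++) {
            int len = Math.min(BLOCK_SIZE, data.length - off);
            byte[] block = new byte[len];
            System.arraycopy(data, off, block, 0, len);
            // Zero-padded sequence number keeps lexicographic key order
            // identical to block order, so a prefix scan reads in sequence.
            table.put(String.format("%s/%08d", name, seq), block);
        }
    }

    // Reassemble a logical file by scanning its block rows in key order.
    public byte[] readFile(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> e : table.entrySet()) {
            if (e.getKey().startsWith(name + "/")) {
                out.write(e.getValue(), 0, e.getValue().length);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BlockStoreSketch dir = new BlockStoreSketch();
        byte[] content = "segments data".getBytes();
        dir.writeFile("_0.cfs", content);
        System.out.println(new String(dir.readFile("_0.cfs")).equals("segments data")); // prints true
    }
}
```

The point of the fixed-size blocks is that random access inside a large index file becomes a single row lookup by computed key, instead of streaming the whole file, which is what the small-file/random-access discussion below is about.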
The appeal of this concept is that you could very easily build a distributed search by running multiple search slaves, each searching one part of the index, and then aggregating the results. If you dig deep enough, you could make those searches take advantage of data locality (run each search on the node/region server that holds that part of the index data), and then you really are in business.

A hybrid HBase/Hadoop solution is also possible: store some data in HBase and the bigger parts directly in HDFS inside a file structure, to work around HDFS's small-file issues. This could allow the HBase queries to perform better, but it would complicate the design a bit.

I'm interested in hearing your opinions on this, and I also wish to propose it as a GSoC idea that I'm interested in implementing.

[1] http://www.infoq.com/articles/LuceneHbase

--
Ioan Eugen Stan
http://ieugen.blogspot.com/