Hi Lars,

Yeah, maybe. Somewhere in the back of my head was a completely fuzzy idea that if one were to sneak Lucene in at that low level, one could get the full-text search over HBase data that people periodically ask about. Also, I was thinking that having Lucene down there could make it possible to run ad-hoc reports on data in HBase without having to figure out the key structure ahead of time.

But I think Jacques makes a good point - there are already ElasticSearch and Solr. They are full-text search engines, but people also use them for pure boolean matching, as key value stores, etc.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Fri, Oct 5, 2012 at 5:11 AM, Lars George <[email protected]> wrote:
> Hi Otis,
>
> My initial reaction was, "interesting idea". On second thoughts though I do
> not see how this makes more sense compared to what we have now. HFiles
> combined with Bloom filters are fast to look up anyways. Adding Lucene as
> another "Storage Engine" (getting us close to Voldemort or MySQL with
> replaceable storage backends) does seem to not add any value, and more so,
> might even have a few drawbacks. Especially range scans will suffer, as
> HFiles and their block oriented layout plus caching makes for really fast
> I/O. Lucene is for search, not xyzbytes of data transfers. And simply
> replacing the block index and Blooms with Lucene is also I think overkill.
> Just saying.
>
> Lars
>
> On Oct 5, 2012, at 5:34 AM, Otis Gospodnetic <[email protected]>
> wrote:
>
>> Hi,
>>
>> Has anyone attempted using Lucene instead of HFiles (see
>> https://twitter.com/otisg/status/254047978174701568 )?
>>
>> Is that a completely crazy, bad, would-never-work,
>> don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or
>> not?
>>
>> Thanks,
>> Otis
>> --
>> Search Analytics - http://sematext.com/search-analytics/index.html
>> Performance Monitoring - http://sematext.com/spm/index.html
>
