Hi,

With respect to point 3, I know there is a new codec in Lucene 4.0 for append-only filesystems such as HDFS (LUCENE-2373).
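
For instance, a minimal sketch of plugging it in (hypothetical, and
assuming the AppendingCodec in the lucene-codecs module is what
LUCENE-2373 ended up as in 4.0):

  import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
  import org.apache.lucene.codecs.appending.AppendingCodec;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.util.Version;

  public class AppendOnlyConfig {
      public static IndexWriterConfig create() {
          IndexWriterConfig iwc = new IndexWriterConfig(
              Version.LUCENE_40, new WhitespaceAnalyzer(Version.LUCENE_40));
          // AppendingCodec writes every index file strictly append-only,
          // which is the constraint an HDFS-backed Directory imposes.
          iwc.setCodec(new AppendingCodec());
          return iwc;
      }
  }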

It would also depend on the use case. At the moment, for storing data, I would expect HFile to be much more efficient in terms of compression than Lucene's stored fields (in fact, there is no real compression, apart from compressing the field byte stream yourself before storing it; a sketch of that is below). There is some work underway to make Lucene more efficient for small and medium sized fields (LUCENE-4226 - block-style compression of stored fields), but I think HFile is far more optimised for this task. In fact, another interesting idea would be to investigate using HFile as a StoredFieldsFormat in Lucene. Efficient storage of data in Lucene is imho quite a missing feature.
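
To make the "compress it yourself" point concrete, here is a rough
sketch (class and field names are made up) using Lucene 4.0's
StoredField plus plain java.util.zip:

  import java.io.ByteArrayInputStream;
  import java.io.ByteArrayOutputStream;
  import java.util.zip.DeflaterOutputStream;
  import java.util.zip.InflaterInputStream;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.StoredField;

  public class ManualFieldCompression {
      // Compress the field bytes ourselves, since the stored fields
      // format currently writes them as-is.
      static byte[] deflate(byte[] raw) throws Exception {
          ByteArrayOutputStream bos = new ByteArrayOutputStream();
          DeflaterOutputStream out = new DeflaterOutputStream(bos);
          out.write(raw);
          out.close();
          return bos.toByteArray();
      }

      // Reverse step when reading the document back.
      static byte[] inflate(byte[] compressed) throws Exception {
          InflaterInputStream in =
              new InflaterInputStream(new ByteArrayInputStream(compressed));
          ByteArrayOutputStream bos = new ByteArrayOutputStream();
          byte[] buf = new byte[4096];
          for (int n; (n = in.read(buf)) != -1; ) {
              bos.write(buf, 0, n);
          }
          return bos.toByteArray();
      }

      static Document docWithCompressedValue(byte[] value) throws Exception {
          Document doc = new Document();
          // Binary stored field holding the pre-compressed payload.
          doc.add(new StoredField("value", deflate(value)));
          return doc;
      }
  }

HFile does block-level compression across records out of the box,
which is the gap I mean.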

my2c
Regards
--
Renaud Delbru

On 05/10/12 07:36, Adrien Mogenet wrote:
"Don't bother trying this in production" ;-)

1. Are you sure lookups by key are faster?
2. Updating Lucene files in a lock-free manner and ensuring good
concurrency can be a bit tricky
3. AFAIK, Lucene files don't fit on HDFS (no random writes) and thus
another distributed storage is required. Katta does not look as
powerful as Hadoop.

On Fri, Oct 5, 2012 at 5:34 AM, Otis Gospodnetic
<[email protected]> wrote:
Hi,

Has anyone attempted using Lucene instead of HFiles (see
https://twitter.com/otisg/status/254047978174701568 )?

Is that a completely crazy, bad, would-never-work,
don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or
not?

Thanks,
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


