Hi Renaud,

On Fri, Oct 5, 2012 at 4:48 AM, Renaud Delbru wrote:
> Hi,
>
> With respect to point 3, I know there is a new codec in Lucene 4.0 for
> append-only filesystems such as HDFS (LUCENE-2373)

Yeah. Though I think nobody wants to search indices directly in HDFS,
for performance reasons.

> Also, it would depend on the use case. At the moment, for storing data,
> I would expect HFile to be much more efficient in terms of compression than
> Lucene file system (in fact, there is no real compression, apart from
> compressing the field byte stream yourself before storing it). There is some
> work to try to make Lucene more efficient for small and medium sized fields
> (LUCENE-4226 - block-style compression and storing), but I think HFile is
> far more optimised for this task.

I wouldn't know... though I was under the impression there has been
other work around packing things tightly both on disk and in memory.
Check http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
... slide 16, etc.

> In fact, another interesting idea would be to investigate the use of HFile
> as a StoredFieldFormat in Lucene. Efficient storage of data in Lucene is
> imho quite a missing feature.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

> On 05/10/12 07:36, Adrien Mogenet wrote:
>>
>> "Don't bother trying this in production" ;-)
>>
>> 1. Are you sure lookups by key are faster?
>> 2. Updating Lucene files in a lock-free manner and ensuring good
>> concurrency can be a bit tricky
>> 3. AFAIK, Lucene files don't fit in HDFS and thus another distributed
>> storage is required. Katta does not look as powerful as Hadoop.
>>
>> On Fri, Oct 5, 2012 at 5:34 AM, Otis Gospodnetic wrote:
>>>
>>> Hi,
>>>
>>> Has anyone attempted using Lucene instead of HFiles (see
>>> https://twitter.com/otisg/status/254047978174701568 )?
>>>
>>> Is that a completely crazy, bad, would-never-work,
>>> don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or
>>> not?
>>>
>>> Thanks,
>>> Otis
>>> --
>>> Search Analytics - http://sematext.com/search-analytics/index.html
>>> Performance Monitoring - http://sematext.com/spm/index.html
