[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-28 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-1799: - Attachment: Benchmark.java OK, hopefully the right Benchmark.java this time ;-) > Unicode compr

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-28 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-1799: - Attachment: Benchmark.java > Unicode compression > --- > > Key:

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-28 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: Benchmark.java Attached is my benchmark for English text. UTF-8: 15530ms BOCU-1: 1568

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-28 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799.patch Here it is with a first stab at a decoder (it's correct against random ic

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-28 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799.patch oops, forgot a check in the surrogate case. > Unicode compression >

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-28 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799.patch i optimized the surrogate case here, moving it into the 'prev' calcu

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-27 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799.patch removed some ifs for the positive unrolled cases. > Unicode compres

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1799: --- Attachment: LUCENE-1799.patch Inlines/unwinds the 3-byte cases. I think we can leav

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1799: --- Attachment: LUCENE-1799.patch Just inlines the 2-byte diff case. > Unicode compress

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1799: --- Attachment: LUCENE-1799.patch Duh -- that was some ancient wrong patch. This one sh

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1799: --- Attachment: LUCENE-1779.patch Slightly more optimized version of BOCU1 encode (but i

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-27 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799.patch Attached is a patch for the start of a "BOCUUtil" with unicodeutil l

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799_big.patch attached is a really really rough patch that sets bocu-1 as the

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-21 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch A new patch that completely separates the BOCU factory from the

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-21 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch Here is a 100% legally valid implementation: - Linking to icu4j

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch > Unicode compression > --- > >

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: (was: LUCENE-1799.patch) > Unicode compression > --- > >

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: (was: LUCENE-1799.patch) > Unicode compression > --- > >

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch The last one that could be used with any charset > Unicode comp

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch > Unicode compression > --- > >

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch Here is a heavy-reuse variant. > Unicode compression > -

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch One more violation. Now it's correct! > Unicode compression > --

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: (was: LUCENE-1799.patch) > Unicode compression > --- > >

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1799: -- Attachment: LUCENE-1799.patch Here is the policed one :-) In my opinion something is better than

[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799.patch attached is a simple prototype for encoding terms as BOCU-1 So whil