[
https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yonik Seeley updated LUCENE-1799:
-
Attachment: Benchmark.java
OK, hopefully the right Benchmark.java this time ;-)
Yonik Seeley updated LUCENE-1799:
-
Attachment: Benchmark.java
> Unicode compression
> ---
>
> Key:
Robert Muir updated LUCENE-1799:
Attachment: Benchmark.java
Attached is my benchmark for English text.
UTF-8: 15530ms
BOCU-1: 1568
Robert Muir updated LUCENE-1799:
Attachment: LUCENE-1799.patch
here it is with a first stab at the decoder (it's correct against random ic
Robert Muir updated LUCENE-1799:
Attachment: LUCENE-1799.patch
oops, forgot a check in the surrogate case.
Robert Muir updated LUCENE-1799:
Attachment: LUCENE-1799.patch
I optimized the surrogate case here, moving it into the 'prev' calcu
Robert Muir updated LUCENE-1799:
Attachment: LUCENE-1799.patch
removed some ifs for the positive unrolled cases.
Michael McCandless updated LUCENE-1799:
---
Attachment: LUCENE-1799.patch
Inlines/unwinds the 3-byte cases. I think we can leav
Michael McCandless updated LUCENE-1799:
---
Attachment: LUCENE-1799.patch
Just inlines the 2-byte diff case.
Michael McCandless updated LUCENE-1799:
---
Attachment: LUCENE-1799.patch
Duh -- that was some ancient wrong patch. This one sh
Michael McCandless updated LUCENE-1799:
---
Attachment: LUCENE-1779.patch
Slightly more optimized version of BOCU1 encode (but i
Robert Muir updated LUCENE-1799:
Attachment: LUCENE-1799.patch
attached is a patch for the start of a "BOCUUtil" with unicodeutil l
Robert Muir updated LUCENE-1799:
Attachment: LUCENE-1799_big.patch
attached is a really really rough patch that sets BOCU-1 as the
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
A new patch that completely separates the BOCU factory from the
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
Here is a 100% legally valid implementation:
- Linking to icu4j
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
Uwe Schindler updated LUCENE-1799:
--
Attachment: (was: LUCENE-1799.patch)
Uwe Schindler updated LUCENE-1799:
--
Attachment: (was: LUCENE-1799.patch)
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
The last one that could be used with any charset
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
Here is a heavy-reuse variant.
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
One more violation. Now it's correct!
Uwe Schindler updated LUCENE-1799:
--
Attachment: (was: LUCENE-1799.patch)
Uwe Schindler updated LUCENE-1799:
--
Attachment: LUCENE-1799.patch
Here is the policed one :-)
In my opinion something is better than
Robert Muir updated LUCENE-1799:
Attachment: LUCENE-1799.patch
attached is a simple prototype for encoding terms as BOCU-1
So whil