On Sun, Jan 27, 2013 at 10:43 AM, Ivan Raikov <ivan.g.rai...@gmail.com> wrote:
>
> Hi Alex,
>
> Yes, I would have thought that more people would be interested in
> having UTF-8 support in core Chicken (or at least wide-char compatible
> srfi-14). I have changed the title of this thread to reflect the subject
> more accurately :-)
>
> Personally, I think that adding UTF-8 in core is much better than the
> hacks I had to do in mbox, and is a no brainer considering the benchmark
> results you have below. But I am sure that opinions vary on this subject...
>
> Can you post your bounds-check patches to srfi-14 on the mailing list,
> and/or create a ticket for it? Hopefully there will be more responses this
> time.
>

Well, I'm not necessarily proposing UTF-8 support in the
core. I understand that has pros and cons and opinions
may differ. I was just pointing out that we've already got
3 char-set implementations, 2 of them in the core distribution,
and there are no real cons to simplifying this and replacing
srfi-14 with one of the Unicode-capable implementations.

The simplest change I made was replacing:

  (define-inline (si=0? s i)
    (zero? (%char->latin1 (string-ref s i))))
  (define-inline (si=1? s i)
    (not (si=0? s i)))

with:

  (define-inline (si=0? s i)
    (if (>= i 256)
        #t
        (zero? (%char->latin1 (string-ref s i)))))
  (define-inline (si=1? s i)
    (and (< i 256)
         (eq? 1 (%char->latin1 (string-ref s i)))))

which is actually faster and, while it doesn't support wide
char-sets, at least gives the correct answers when passed
wide chars.

--
Alex

> Ivan
>
> On Sat, Jan 26, 2013 at 1:42 PM, Alex Shinn <alexsh...@gmail.com> wrote:
>
>> On Wed, Jan 23, 2013 at 5:09 PM, Alex Shinn <alexsh...@gmail.com> wrote:
>>
>>> On Wed, Jan 23, 2013 at 3:45 PM, Ivan Raikov <ivan.g.rai...@gmail.com> wrote:
>>>
>>>> Yes, I ran into this when I was adding UTF-8 support to mbox... If you
>>>> were to add wide char support in srfi-14, is there a way to quantify the
>>>> performance penalty?
>>>>
>>>
>>> To add the bounds check so it doesn't error? Practically
>>> nothing.
>>>
>>> To branch to a separate path for a wide-char table if
>>> the bounds check fails? Same cost if the input is ASCII.
>>>
>>> For efficient handling in the case of Unicode input...
>>> how small/fast do you want it?
>>>
>>
>> I've never met such stony silence in response to an offer to do work...
>>
>> I ran the following simple char-set-contains? benchmark with
>> a few variations:
>>
>> (time
>>  (do ((i 0 (+ i 1)))
>>      ((= i 10000))
>>    (do ((j 0 (+ j 1)))
>>        ((= j 256))
>>      (char-set-contains? char-set:letter (integer->char j)))))
>>
>> This is what most people are concerned about for speed, as
>> the boolean and construction operations are less common.
>>
>> The results:
>>
>> ;; reference implementation
>> ;; 0.312s CPU time, 1/2059 GCs (major/minor)
>>
>> ;; "fixed" reference implementation (no error but no support for non-latin-1)
>> ;; 0.257s CPU time, 1/1706 GCs (major/minor)
>>
>> ;; utf8-srfi-14 with full Unicode char-set:letter
>> ;; 0.243s CPU time, 0/1526 GCs (major/minor)
>>
>> ;; utf8-srfi-14 with ASCII-only char-set:letter
>> ;; 0.242s CPU time, 0/1526 GCs (major/minor)
>>
>> I was able to add the check and make the reference
>> implementation faster because I fixed the common case -
>> it was optimized for checking for 0 instead of 1.
>>
>> Even with the enormous and complex definition of a
>> Unicode "letter", utf8-srfi-14 is faster than srfi-14.
>>
>> As for what we want in Chicken, the answer depends
>> on what you're optimizing for. utf8-srfi-14 will always
>> win for space, and generally for speed as well.
>>
>> If the biggest concern is code-size, then you might want
>> to borrow the char-set definition from irregex and use
>> that as a "fallback" for non-latin-1 chars in the srfi-14
>> reference impl. This would have the same perf as
>> srfi-14 for latin-1, yet still support full Unicode and not
>> increase the size of the Chicken distribution.
>>
>> --
>> Alex
>>
>>
>
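
For readers following the thread, here is a minimal sketch of the "bounds check plus fallback" idea discussed above. It is not the srfi-14 reference code: the names make-latin1-table, table-contains?, ascii-letter? and toy-wide-letter? are hypothetical, and the fallback predicate is a toy stand-in for a real Unicode table such as the irregex definitions mentioned in the quoted message.

```scheme
;; Sketch only (assumed representation): a char-set is modelled as a
;; 256-char string whose i-th char has code 1 if the char with code i
;; is a member, and code 0 otherwise.
(define (make-latin1-table pred)
  (let ((s (make-string 256 (integer->char 0))))
    (do ((i 0 (+ i 1)))
        ((= i 256) s)
      (if (pred (integer->char i))
          (string-set! s i (integer->char 1))))))

;; Membership test: bounds-check the code point first, so wide chars
;; never index past the table; they go to the fallback predicate.
(define (table-contains? table fallback c)
  (let ((i (char->integer c)))
    (if (< i 256)
        (= 1 (char->integer (string-ref table i)))
        (fallback c))))

;; Hypothetical predicates for illustration.
(define (ascii-letter? c)
  (let ((i (char->integer c)))
    (or (and (>= i 65) (<= i 90))
        (and (>= i 97) (<= i 122)))))

;; Toy fallback: Greek small letters U+03B1..U+03C9 only.
(define (toy-wide-letter? c)
  (let ((i (char->integer c)))
    (and (>= i #x3b1) (<= i #x3c9))))

(define ascii-letters (make-latin1-table ascii-letter?))

(table-contains? ascii-letters toy-wide-letter? #\a)                   ; => #t
(table-contains? ascii-letters toy-wide-letter? (integer->char #x3bb)) ; => #t
```

For input below 256 the only extra work over a bare table lookup is the single comparison, which matches the "same cost if the input is ASCII" point in the quoted message; how the fallback is represented is where the space/speed trade-off lives.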
_______________________________________________ Chicken-users mailing list Chicken-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-users