Hi Behdad,
> Well, there's a bit more to it. Just because some bytes in a file are invalid
> according to the spec doesn't mean your text editor should refuse to open the
> file. While g_utf8_get_char() and friends do assume valid UTF-8 data, it's an
> unwritten assumption that for invalid bytes
Hi,
On Saturday, 27.03.2010, at 18:04 -0400, Behdad Esfahbod wrote:
> Sure, I wasn't referring to valid data. In valid UTF-8, there are no 5-byte or
> 6-byte sequences either.
True, but that was a post-hoc restriction imposed afterwards, when
Unicode was redefined as a 21-bit character set, presu
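
To make the point about sequence lengths concrete, here is a minimal sketch of how many bytes the original UTF-8 scheme needs per code point. The function names are hypothetical and this is not GLib code; it only illustrates that everything within the 21-bit range fits in four bytes, so 5-byte and 6-byte forms only arise for values beyond it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a GLib function): number of bytes the original
 * UTF-8 scheme uses to encode code point c, or 0 if c needs more than 31 bits. */
int utf8_encoded_length(uint32_t c)
{
    if (c < 0x80u)       return 1;  /* up to 7 bits  */
    if (c < 0x800u)      return 2;  /* up to 11 bits */
    if (c < 0x10000u)    return 3;  /* up to 16 bits */
    if (c < 0x200000u)   return 4;  /* up to 21 bits */
    if (c < 0x4000000u)  return 5;  /* up to 26 bits */
    if (c < 0x80000000u) return 6;  /* up to 31 bits */
    return 0;
}

/* Usage example: the highest Unicode code point still fits in 4 bytes. */
void utf8_encoded_length_demo(void)
{
    assert(utf8_encoded_length(0x10FFFFu) == 4);    /* highest Unicode code point */
    assert(utf8_encoded_length(0x200000u) == 5);    /* first value needing 5 bytes */
    assert(utf8_encoded_length(0x7FFFFFFFu) == 6);  /* largest 31-bit value */
}
```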
Hi,
On Saturday, 27.03.2010, at 17:40 -0400, Behdad Esfahbod wrote:
> On 03/27/2010 05:21 PM, Daniel Elstner wrote:
> > Well, I assume that ints are at least 32 bits wide on any platform
> > supported by GLib. But if you meant to say that it would break with
> > larger ints,
Hi,
On Saturday, 27.03.2010, at 16:51 -0400, Behdad Esfahbod wrote:
> On 03/27/2010 04:27 PM, Daniel Elstner wrote:
> >
> > It is not meant to check for errors.
>
> Good point.
>
> > I think it is totally arbitrary to handle some potential errors but not
> >
Hi,
On Saturday, 27.03.2010, at 16:12 -0400, Behdad Esfahbod wrote:
> Err, you're right. My bad. It's still broken though since it doesn't check
> that the fragment bytes all start with the bits 10. Missing error checking.
It is not meant to check for errors.
I think it is totally arbitrary
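
For readers unfamiliar with the check being discussed, the following is a minimal sketch of what "start with the bits 10" means for trailing bytes. The helper names are hypothetical and this is not the GLib or glibmm implementation; whether a decoder such as g_utf8_get_char() should perform this check at all is exactly what the thread disputes.

```c
#include <assert.h>
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
int is_continuation(unsigned char b)
{
    return (b & 0xC0u) == 0x80u;
}

/* Hypothetical helper: returns 1 if the n bytes following the lead byte
 * p[0] are all continuation bytes, 0 otherwise. The caller must guarantee
 * that p[0..n] is readable. */
int trailing_bytes_ok(const unsigned char *p, size_t n)
{
    for (size_t i = 1; i <= n; i++) {
        if (!is_continuation(p[i]))
            return 0;
    }
    return 1;
}

/* Usage example with U+20AC (E2 82 AC) and a corrupted variant. */
void trailing_bytes_demo(void)
{
    const unsigned char euro[] = { 0xE2, 0x82, 0xAC };
    const unsigned char bad[]  = { 0xE2, 0x41, 0xAC };

    assert(trailing_bytes_ok(euro, 2) == 1);
    assert(trailing_bytes_ok(bad, 2) == 0);
}
```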
Hi again,
On Friday, 26.03.2010, at 22:43 +0100, Daniel Elstner wrote:
> On Friday, 26.03.2010, at 13:25 -0400, Behdad Esfahbod wrote:
>
> > * The construct borrowed from glibmm, as beautiful as it is, is WRONG for
> > 6-byte-long UTF-8. It just doesn't
Hi Behdad,
On Friday, 26.03.2010, at 13:25 -0400, Behdad Esfahbod wrote:
> * The construct borrowed from glibmm, as beautiful as it is, is WRONG for
> 6-byte-long UTF-8. It just doesn't work. We historically support those
> sequences.
What? In what way exactly is it wrong?
--Daniel
Hi,
On Tuesday, 16.03.2010, at 23:51 +0100, Mathieu Lacage wrote:
> loading offsets are usually randomized once in a while and the whole
> system is prelinked with these randomized offsets so that all further
> loads do use the same 'random' (per-machine) offset until the next
> offset randomi
Hi,
On Wednesday, 17.03.2010, at 00:17 +0200, Mikhail Zabaluev wrote:
> Yes, though we are already in the buffer overflow territory with all
> implementations of g_utf8_get_char considered so far.
Only read past the end, thus no security implications beyond a potential
for DoS in the unlikely e
Hi,
On Tuesday, 16.03.2010, at 23:18 +0200, Mikhail Zabaluev wrote:
> Umm. I had the conception of a DSO being one position-independent blob
> with all references made relative, even if basic ELF allows different
> segments loaded independently.
Impossible. There are no relative function poi
Hi,
On Tuesday, 16.03.2010, at 22:52 +0200, Mikhail Zabaluev wrote:
> I could try that, after I take your one to good internal use where it
> already shows more effect. But my current tests do not account for any
> hidden costs of inlining longish and branched code.
Addendum: It's actually no
Hi,
On Tuesday, 16.03.2010, at 22:52 +0200, Mikhail Zabaluev wrote:
> I already made some minor changes to restrict what it produces (like,
> c & 0x3f is safer than c - 0x80),
No -- this was on purpose! Using addition and subtraction here instead
of bitwise-and and bitwise-or allows the two
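
The sentence above is cut off in this listing, so the following is only a sketch of the arithmetic involved, not a reconstruction of the argument. For a valid 2-byte sequence, masking and subtracting give the same result, and with subtraction the two constants can be combined into one. On an invalid trailing byte the two variants produce different garbage. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Variant using bitwise-and and bitwise-or. */
uint32_t decode2_mask(unsigned char b0, unsigned char b1)
{
    return ((uint32_t)(b0 & 0x1Fu) << 6) | (uint32_t)(b1 & 0x3Fu);
}

/* Variant using subtraction: ((b0 - 0xC0) << 6) + (b1 - 0x80).
 * The two constants fold into one: (0xC0 << 6) + 0x80 == 0x3080. */
uint32_t decode2_sub(unsigned char b0, unsigned char b1)
{
    return ((uint32_t)b0 << 6) + (uint32_t)b1 - 0x3080u;
}

/* Usage example: both variants agree on every well-formed pair of a
 * 2-byte lead (0xC0..0xDF) and a continuation byte (0x80..0xBF). */
void decode2_demo(void)
{
    for (unsigned b0 = 0xC0; b0 <= 0xDF; b0++) {
        for (unsigned b1 = 0x80; b1 <= 0xBF; b1++) {
            assert(decode2_mask((unsigned char)b0, (unsigned char)b1) ==
                   decode2_sub((unsigned char)b0, (unsigned char)b1));
        }
    }
    /* U+00E9 is encoded as C3 A9. */
    assert(decode2_sub(0xC3, 0xA9) == 0xE9u);
}
```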
Hi,
On Tuesday, 16.03.2010, at 21:05 +0200, Mikhail Zabaluev wrote:
> I have tested your solution as applied to mainline g_utf8_get_char(),
> and not inlined, except for any intra-file optimizations. The results are
> for ARM this time.
[...]
> From the looks of it, some lesser oomph from the non-i
Hi,
On Tuesday, 16.03.2010, at 14:09 -0400, Behdad Esfahbod wrote:
> That's one of the worst ideas as far as software goes. If an operation takes
> 1% of your application time and you make it 1000 times faster, you know how
> much faster your application would run in total? 1.01x faster...
Yes,
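
The quoted figure can be checked with Amdahl's law, which the quote does not name but which is the usual way to state this. Here p is the fraction of run time spent in the operation and s is the speedup applied to it:

```latex
\[
S = \frac{1}{(1 - p) + \dfrac{p}{s}}
  = \frac{1}{0.99 + \dfrac{0.01}{1000}}
  = \frac{1}{0.99001}
  \approx 1.0101
\]
```

Even as s grows without bound, the result cannot exceed 1/0.99, which is also about 1.0101.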
Hi,
On Tuesday, 16.03.2010, at 19:49 +0200, Mikhail Zabaluev wrote:
> I'm wary of inlining non-trivial code which has some branching in it,
> for the same reasons of cache pressure, killing branch prediction, and
> so on.
Well yes. That's why I would have liked numbers. :-) In any case, I
e
Hi,
On Tuesday, 16.03.2010, at 19:38 +0200, Mikhail Zabaluev wrote:
> 2010/3/16 Daniel Elstner:
>
> > Also, do you realize that you have just single-handedly introduced 256 (!)
> > address references that need to be resolved by the dynamic linker at
> > library load
Hi,
On Tuesday, 16.03.2010, at 13:01 -0400, Behdad Esfahbod wrote:
> >
> > I've made a glib branch where I tried to optimize the UTF-8 decoding
> > routines:
> > http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8
>
> Before any changes are made, can you pr
Hi again,
On Tuesday, 16.03.2010, at 18:47 +0200, Daniel Elstner wrote:
> On Tuesday, 16.03.2010, at 17:20 +0200, Mikhail Zabaluev wrote:
>
> > The new code uses a table of unrolled functions to decode byte
> > sequences, dispatched by the first character. g_utf8
Hi,
On Tuesday, 16.03.2010, at 17:20 +0200, Mikhail Zabaluev wrote:
> I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
> http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8
>
> The new code uses a table of unrolled functions to
On Friday, 10.04.2009, at 14:08 +0200, Christian Dywan wrote:
> For the sake of demonstration, it took me 2 minutes to write a simple
> substring function in C that does what you want, have a look how it
> works. :)
It doesn't. Your function allocates memory using a byte count but then
uses t
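
The function under discussion is not shown in this listing and the objection is cut off, so the following is an independent sketch rather than a corrected version of it. It illustrates the general pitfall with UTF-8 substrings: offsets given in characters must be converted to byte positions before allocating and copying. The names are hypothetical and the input is assumed to be valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Advance p by n characters, stopping early at the terminating NUL.
 * Assumes valid UTF-8. */
const char *utf8_advance(const char *p, size_t n)
{
    while (n > 0 && *p != '\0') {
        p++;
        /* Skip the continuation bytes (10xxxxxx) of the current character. */
        while (((unsigned char)*p & 0xC0u) == 0x80u)
            p++;
        n--;
    }
    return p;
}

/* Hypothetical helper: returns a newly allocated copy of the characters
 * [start, end) of str, or NULL if allocation fails. Caller frees. */
char *utf8_substring(const char *str, size_t start, size_t end)
{
    const char *first = utf8_advance(str, start);
    const char *last = utf8_advance(first, end > start ? end - start : 0);
    size_t nbytes = (size_t)(last - first);  /* byte count, not character count */
    char *out = malloc(nbytes + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, first, nbytes);
    out[nbytes] = '\0';
    return out;
}

/* Usage example: character 1 of "a", U+20AC, "b" occupies 3 bytes. */
void utf8_substring_demo(void)
{
    char *s = utf8_substring("a\xE2\x82\xAC" "b", 1, 2);

    assert(s != NULL);
    assert(strlen(s) == 3);
    assert(strcmp(s, "\xE2\x82\xAC") == 0);
    free(s);
}
```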
On Wednesday, 03.06.2009, at 23:10 -0700, Brian J. Tarricone wrote:
> On 06/03/2009 05:36 PM, Paul LeoNerd Evans wrote:
>
> > Yes; we messed up 30 years ago and said "k" when we
> > meant "Ki". Oops. Sorry about that.
>
> Well, no, 30 years ago there was no "Ki". So people did the logical
> th
On Monday, 13.04.2009, at 21:26 -0400, Behdad Esfahbod wrote:
> On 04/13/2009 05:00 AM, Butrus Damaskus wrote:
> > Hi!
> >
> > This page: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ claims to
> > have a better (quicker and smaller?) UTF-8 decoder. Maybe it would be
> > worth looking at?
>
> Fu
On Friday, 17.11.2006, at 11:57 -0500, Matthew Barnes wrote:
> On Fri, 2006-11-17 at 17:32 +0100, Murray Cumming wrote:
> > This seems similar to a class we have in glibmm, Glib::Dispatcher:
> > http://www.gtkmm.org/docs/glibmm-2.4/docs/reference/html/classGlib_1_1Dispatcher.html#_details
> > th