Re: [fltk.development] Unicode character display page

2011-02-27 Thread Bill Spitzak
On 04/28/2010 01:36 AM, Duncan Gibson wrote: The first problem here is that the various macros to handle extended mappings, ie. ERRORS_TO_ISO8859_1, ERRORS_TO_CP1252 and STRICT_RFC3629 only apply to the fl_utf8decode() function in fl_utf.c [from FLTK2 ?] The other functions there, e.g.

Re: [fltk.development] Unicode character display page

2011-02-27 Thread Bill Spitzak
On 04/28/2010 01:55 AM, MacArthur, Ian (SELEX GALILEO, UK) wrote: OK - yes, this is a mess. I think the assumption was always that we were (somehow) going to make the input text utf8 clean when we read it, then the majority of the functions and methods would never have to worry about this

Re: [fltk.development] Unicode character display page

2010-04-30 Thread Duncan Gibson
Me: I've just opened STR 2348: http://www.fltk.org/str.php?L2348 test/editor fails to display misc/cp1252.txt and can hang I've implemented an fl_utf8len(const char* src) function that uses fl_utf8decode() internally to get the number of bytes in the next character sequence starting at src.
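For readers following along, the idea of a "length of the next sequence" helper can be sketched as below. This is only an illustration under stated assumptions, not the fl_utf8len() that was committed: the name utf8_seq_len() and the extra end pointer are hypothetical, and the validity check is deliberately simplified (it does not reject every ill-formed form, such as overlong three-byte sequences). The point it shows is that the length has to be derived from a pointer into the buffer, and that a stray byte such as a bare CP1252 0x80-0x9f should count as one byte so the caller always advances.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the FLTK implementation.
   Returns the number of bytes in the sequence starting at src.
   Returns 0 at end of buffer. Invalid or truncated sequences are
   reported as length 1 so that a caller can never get stuck. */
static int utf8_seq_len(const char *src, const char *end)
{
    const unsigned char *p = (const unsigned char *)src;
    int need, i;

    if (src >= end) return 0;
    if (p[0] < 0x80) return 1;        /* plain ASCII */
    if (p[0] < 0xC2) return 1;        /* stray continuation byte or overlong lead */
    else if (p[0] < 0xE0) need = 2;
    else if (p[0] < 0xF0) need = 3;
    else if (p[0] < 0xF5) need = 4;
    else return 1;                    /* 0xF5-0xFF never start a sequence */

    if (end - src < need) return 1;   /* truncated at end of buffer */
    for (i = 1; i < need; i++)
        if ((p[i] & 0xC0) != 0x80) return 1;  /* missing continuation byte */
    return need;
}
```

Treating every undecodable byte as a one-byte "character" is one possible policy; it keeps cursor movement and display loops progressing even on mixed CP1252/UTF-8 input.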

Re: [fltk.development] Unicode character display page

2010-04-30 Thread Matthias Melcher
On 30.04.2010, at 21:45, Duncan Gibson wrote: Me: I've just opened STR 2348: http://www.fltk.org/str.php?L2348 test/editor fails to display misc/cp1252.txt and can hang I've implemented an fl_utf8len(const char* src) function that uses fl_utf8decode() internally to get the number of bytes

Re: [fltk.development] Unicode character display page

2010-04-28 Thread Duncan Gibson
Bill: You need to remove the function that takes a single byte and says how long the UTF-8 character is. This was mostly removed in fltk2.0 which is why it works better, but this was not finished. The text editor has a lot of api that takes a byte rather than a pointer making fixing this

Re: [fltk.development] Unicode character display page

2010-04-28 Thread MacArthur, Ian (SELEX GALILEO, UK)
In the misc/cp1252.txt example, these 0x80-0x9f bytes appear as standalone bytes in the text. The Fl_Text_{Buffer,Display,Editor} code iterates through arrays of bytes. If the top bit is not set, it is plain old ascii: tabs, C0 control codes 0x01-0x1f and DEL 0x7f Hmmm - is Fl_Text_* using

Re: [fltk.development] Unicode character display page

2010-04-28 Thread MacArthur, Ian (SELEX GALILEO, UK)
As well as being able to count forward by the correct number of bytes, in order to move the cursor right by one character for example, we also need to be able to count backward by the correct number of bytes, eg to move the cursor left. And counting backwards now seems to be a more
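A rough sketch of the backward step, for illustration only. The names utf8_prev() and lead_len() are hypothetical and this is not how fl_utf8back() is written. It assumes the policy that a byte which does not form part of a complete sequence ending exactly at the cursor is treated as a single one-byte character, which is what standalone CP1252 bytes would need.

```c
#include <stddef.h>

/* Hypothetical: sequence length implied by a lead byte, 0 if c cannot
   start a sequence (continuation byte or invalid lead). */
static int lead_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c < 0xE0) return 2;
    if (c >= 0xE0 && c < 0xF0) return 3;
    if (c >= 0xF0 && c < 0xF5) return 4;
    return 0;
}

/* Hypothetical helper, NOT fl_utf8back(): returns the start of the
   character that ends just before pos, never going below start. */
static const char *utf8_prev(const char *start, const char *pos)
{
    const char *p;
    int back;

    if (pos <= start) return start;
    p = pos - 1;
    /* Step back over at most three continuation bytes (10xxxxxx). */
    for (back = 0;
         back < 3 && p > start && (((unsigned char)*p) & 0xC0) == 0x80;
         back++)
        p--;
    /* Accept p only if a sequence starting there ends exactly at pos. */
    if (lead_len((unsigned char)*p) == (int)(pos - p)) return p;
    return pos - 1;   /* otherwise the last byte stands alone */
}
```

The final check is what makes backward movement harder than forward movement: stepping back over continuation bytes is not enough, because the candidate lead byte may belong to a shorter or broken sequence.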

Re: [fltk.development] Unicode character display page

2010-04-28 Thread Duncan Gibson
Me: In the misc/cp1252.txt example, these 0x80-0x9f bytes appear as standalone bytes in the text. The Fl_Text_{Buffer,Display,Editor} code iterates through arrays of bytes. If the top bit is not set, it is plain old ascii: tabs, C0 control codes 0x01-0x1f and DEL 0x7f Ian: Hmmm - is

Re: [fltk.development] Unicode character display page

2010-04-28 Thread MacArthur, Ian (SELEX GALILEO, UK)
I was about to answer The problem is... but then I thought of some more and felt a Monty Python Spanish Inquisition moment coming on :-) So long as I get the comfy chair, then... The first problem here is that the various macros to handle extended mappings, ie. ERRORS_TO_ISO8859_1,

Re: [fltk.development] Unicode character display page

2010-04-28 Thread Duncan Gibson
Me: As well as being able to count forward by the correct number of bytes, in order to move the cursor right by one character for example, we also need to be able to count backward by the correct number of bytes, eg to move the cursor left. And counting backwards now seems to be a more

Re: [fltk.development] Unicode character display page

2010-04-28 Thread Duncan Gibson
Me: I will look at the fl_utf8fwd() and fl_utf8back() functions tonight, if I have time, to see how they handle the following byte sequences: a a a^123 X a a a a a 123 X^a a a a X X^X X a a where 'a' is an ascii byte (0x01-0x7f), 'X' is a CP1252 byte (0x80-0x9f) and '1' is a utf-8 header

Re: [fltk.development] Unicode character display page

2010-04-28 Thread Duncan Gibson
Me: The first problem here is that the various macros to handle extended mappings, ie. ERRORS_TO_ISO8859_1, ERRORS_TO_CP1252 and STRICT_RFC3629 only apply to the fl_utf8decode() function in fl_utf.c [from FLTK2 ?] The other functions there, e.g. fl_utf8fwd() and fl_utf8back() assume they
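To make the "errors to CP1252" idea concrete, here is a small sketch of what such a fallback mapping amounts to. It is not copied from fl_utf.c. The names cp1252_c1 and stray_byte_to_ucs() are hypothetical, and passing the five bytes that CP1252 leaves undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d) through unchanged is an assumption made for the example.

```c
/* Hypothetical fallback table: Unicode code points for the CP1252
   bytes 0x80-0x9f. Bytes that CP1252 leaves undefined map to
   themselves here (an assumption for this sketch). */
static const unsigned short cp1252_c1[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Hypothetical helper: code point to use for a byte that did not
   decode as part of a valid UTF-8 sequence. */
static unsigned stray_byte_to_ucs(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_c1[b - 0x80];
    return b;   /* ASCII and 0xa0-0xff coincide with ISO 8859-1 */
}
```

A mapping like this only helps at decode time. Functions that step forward or backward through the buffer still need to agree that such a byte is one character wide, otherwise decoding and cursor movement disagree about where characters start.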

Re: [fltk.development] Unicode character display page

2010-04-27 Thread Sparkaround
Duncan Gibson wrote: BTW.: did you see the new files (and README) in the misc/ directory? Particularly the *-utf8.txt files are interesting for FLTK-1.3 tests. Yes, thanks for those, they clarified a lot of the CP1252 discussion. I verified that Fl_Text_* handled them correctly after

Re: [fltk.development] Unicode character display page

2010-04-27 Thread Duncan Gibson
Me: If I kick off the both test/editor to read misc/cp1252*.txt I get: 1.3: cp1252.txt: missing columns 128 onward, ie no 8-bit chars, and missing the right wall of the table too [I've switched workspaces and it has hung too] Ian: Well, for the record, I get the same effect,

Re: [fltk.development] Unicode character display page

2010-04-27 Thread MacArthur, Ian (SELEX GALILEO, UK)
PS. To revert or not to revert, that is still the question! I don't know. I'm inclined to keep going forwards (as the clean-up seemed sensible) but I'm not actually doing anything constructive to make that happen so... -- Ian SELEX Galileo Ltd Registered Office: Sigma House, Christopher

Re: [fltk.development] Unicode character display page

2010-04-27 Thread Duncan Gibson
Me: I've just opened STR 2348: http://www.fltk.org/str.php?L2348 test/editor fails to display misc/cp1252.txt and can hang Unfortunately, it looks like this problem didn't exist back in r7400 before the big refactoring, but was in the last snapshot, r7513, so it looks like something got

Re: [fltk.development] Unicode character display page

2010-04-23 Thread Duncan Gibson
Me: If I kick off the both test/editor to read misc/cp1252*.txt I get: 1.3: cp1252.txt: missing columns 128 onward, ie no 8-bit chars, and missing the right wall of the table too [I've switched workspaces and it has hung too] Ian: Well, for the record, I get the same effect,

Re: [fltk.development] Unicode character display page

2010-04-22 Thread Albrecht Schlosser
On 21.04.2010, at 18:32, MacArthur, Ian (SELEX GALILEO, UK) wrote: I just tested this with my Ubuntu/firefox, too. After setting the default character set to UTF-8, everything in cp1252_utf-8.txt displays okay, except 0xAD (U+00AD), which is the soft hyphen. I'd say that it's okay for a

Re: [fltk.development] Unicode character display page

2010-04-22 Thread Albrecht Schlosser
Duncan Gibson wrote: So, the question would be: should we display a soft hyphen or not? Currently we do, and mk_wcwidth() is consistent (width=1). In my explorations of Fl_Text_{Buffer,Display} so far, I haven't been looking at, or for, advanced features such as optional hyphenation. We

Re: [fltk.development] Unicode character display page

2010-04-22 Thread MacArthur, Ian (SELEX GALILEO, UK)
I also tried FLTK 2.0 with cp1252_utf-8.txt, and it works like a charm. Even 0x98 (U+02DC, SMALL TILDE) is displayed and handled correctly (WRT cursor movement). No line length problems as in FLTK 1.3 - maybe we should have a look at FLTK 2's implementation? Well, there's a

Re: [fltk.development] Unicode character display page

2010-04-22 Thread Duncan Gibson
I also tried FLTK 2.0 with cp1252_utf-8.txt, and it works like a charm. Even 0x98 (U+02DC, SMALL TILDE) is displayed and handled correctly (WRT cursor movement). No line length problems as in FLTK 1.3 - maybe we should have a look at FLTK 2's implementation? Here at work I'm on a 64-bit

Re: [fltk.development] Unicode character display page

2010-04-22 Thread Michael Sweet
On Apr 22, 2010, at 1:06 AM, Albrecht Schlosser wrote: On 21.04.2010, at 18:32, MacArthur, Ian (SELEX GALILEO, UK) wrote: I just tested this with my Ubuntu/firefox, too. After setting the default character set to UTF-8, everything in cp1252_utf-8.txt displays okay, except 0xAD (U+00AD),

Re: [fltk.development] Unicode character display page

2010-04-22 Thread Duncan Gibson
If I kick off the both test/editor to read misc/cp1252*.txt I get: 1.3: cp1252.txt: missing columns 128 onward, ie no 8-bit chars, and missing the right wall of the table too [I've switched workspaces and it has hung too] Just repeated this at home on a 32-bit Lunar Linux

Re: [fltk.development] Unicode character display page

2010-04-22 Thread imacarthur
On 22 Apr 2010, at 18:55, Duncan Gibson wrote: If I kick off the both test/editor to read misc/cp1252*.txt I get: 1.3: cp1252.txt: missing columns 128 onward, ie no 8-bit chars, and missing the right wall of the table too [I've switched workspaces and it has hung too]

Re: [fltk.development] Unicode character display page

2010-04-21 Thread Duncan Gibson
BTW.: did you see the new files (and README) in the misc/ directory? Particularly the *-utf8.txt files are interesting for FLTK-1.3 tests. Yes, thanks for those, they clarified a lot of the CP1252 discussion. I verified that Fl_Text_* handled them correctly after adding the fl_wcwidth() stuff

Re: [fltk.development] Unicode character display page

2010-04-21 Thread Albrecht Schlosser
Duncan Gibson wrote: BTW.: did you see the new files (and README) in the misc/ directory? Particularly the *-utf8.txt files are interesting for FLTK-1.3 tests. Yes, thanks for those, they clarified a lot of the CP1252 discussion. I verified that Fl_Text_* handled them correctly after adding

Re: [fltk.development] Unicode character display page

2010-04-21 Thread Duncan Gibson
Albrecht: My test case is to open them in test/editor. I noticed that they all display correctly on Linux (Ubuntu), but there are a few problems on Windows: (1) 0x98 / U+02DC draws okay, but doesn't seem to advance the cursor correctly. [...] (2) calculation of line widths seems

Re: [fltk.development] Unicode character display page

2010-04-21 Thread MacArthur, Ian (SELEX GALILEO, UK)
I just tested this with my Ubuntu/firefox, too. After setting the default character set to UTF-8, everything in cp1252_utf-8.txt displays okay, except 0xAD (U+00AD), which is the soft hyphen. I'd say that it's okay for a browser to hide the soft hyphen, isn't it? MK's wcwidth() says: