On Sat, 2017 Apr 22 23:26+0000, Thorsten Glaser wrote:
>
> >Oh, so you mean like if(c=='[') and such? That is certainly
> >reasonable. The program would be tied to the compile-time codepage no
> >worse than most other programs.
>
> Right. So either something like -DMKSH_EBCDIC_CP=1047 or limiting
> EBCDIC support to precisely one codepage.

I don't think the former sort of directive should be necessary. There is
enough auto-conversion magic going on that it should be possible to
piggyback on that... where it all "just works" when you compile the
code.

> >(If you could do everything in terms of character literals, without
> >depending on constructs like if(c>='A'&&c<='Z'), your code would be
> >pretty much EBCDIC-proof.)
>
> Yesss… but…
>
> ① not all characters are in every codepage, and

True, but ASCII should be a given. (There are some older EBCDIC
codepages that lack certain common characters, I forget which ones, but
no one will want to use those anyway.)

> ② I need strictly monotonous ordering for all 256 possible octets
>   for e.g. sorting strings in some cases and for [a-z] ranges

That sounds no worse than what is usually done for LC_COLLATE and
such...

> OK, I can live with that, so I just need to swap the conversion tables
> I got (which map 15 to NEL and 25 to LF).

Always thought it was funny that it's the weirdo mainframe platform that
has a proper "newline" character instead of pressing LF into service as
one ^_^

> > #pragma convert("ISO8859-1")
> […]
> >That may or may not be useful. Of course, the pragma would need to be
>
> Interesting, but I can’t think of where that would be useful at the
> moment. But good to know.
>
> Hmm. Can this be used to construct the table?
>
> Something like running this at configure time:
>
> main() {
>     int i = 1;
>
>     printf("#pragma convert(\"ISO8859-1\")\n");
>     printf("static const unsigned char map[] = \"");
>     while (i <= 255)
>         printf("%c", i++);
>     printf("\";\n");
> }
>
> And then feed its output into the compiling, and have
> some code generating the reverse map like:
>
>     i = 0;
>     while (i < 255)
>         revmap[map[i]] = i + 1;
>
> But this reeks of fragility compared with supporting a known-good
> hand-edited set of codepages.

Probably easier just to use etoa(), or atoe()?
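
If the table route gets taken anyway, note that the reverse-map loop
quoted above never increments i, so it would spin forever. A minimal
sketch of the inversion, assuming the forward table is a bijection on
1..255 (so no entry is 0); build_revmap is a made-up name for
illustration, not anything in mksh:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from mksh): invert a forward table.
 * map[i] holds the translation of octet i + 1, matching the quoted
 * generator, whose string literal starts at octet 1.
 * Assumption: map is a bijection on 1..255, so no entry is 0.
 */
static void
build_revmap(const unsigned char map[255], unsigned char revmap[256])
{
    size_t i;

    revmap[0] = 0;    /* NUL is assumed to map to itself */
    for (i = 0; i < 255; i++)
        revmap[map[i]] = (unsigned char)(i + 1);
}
```

After the call, revmap[map[i]] == i + 1 holds for every i in 0..254,
which is the round-trip property the two tables need.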

I don't think explicit hand-edited tables should be needed for EBCDIC,
unless you're already doing those for other encodings.

> (Not to say we can’t do this manually once in order to actually _get_
> those mappings.)

Certainly the above code would either need some tweaking, or the output
some massaging, so the odd characters (especially '"') don't throw off
the compiler.

> >Let me know if I can help any more!
>
> Okay, sure, thanks. I must admit I’m not actively working on this
> still but I’m considering making a separate branch on which we can try
> things until they work, then merge it back.

I'm happy to test iterations of this, as long as it doesn't need much
diagnosing...

> But first, the character class changes themselves. That turned out to
> be quite a bit more effort than I had estimated and will keep me busy
> for another longish hacking session. Ugh. Oh well. But on the plus
> side, this will make support much nicer as *all* constructs like
> “(c >= '0' && c <= '9')” will go away and even the OS/2 TEXTMODE line
> endings (where CR+LF is also supported) need less cpp hackery.

Sounds great! That'll certainly make EBCDIC easier to deal with.

I might suggest looking at Gnulib, specifically lib/c-ctype.h, for
inspiration. I helped them get their ctype implementation in order on
z/OS (and at one point we were even trying to deal with *signed* EBCDIC
chars, where 'A' has a negative value!), and it works solidly now.
They've got a good design for dealing with non-ASCII weirdness; they
were clearly thinking of that from the start.


Happy hacking,


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
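
P.S. To make the character-literal point concrete: a range test like
c>='A'&&c<='Z' goes wrong on EBCDIC because the letters sit in three
separate runs (A-I, J-R, S-Z) with other characters in the gaps. Here is
a rough sketch of the literal-only alternative. The name
is_upper_portable is made up for illustration, and this is not the
Gnulib API, just the same general idea:

```c
#include <stdbool.h>

/*
 * Hypothetical example: classify upper-case letters using only
 * character literals, so the compiler supplies the right code points
 * for whatever codepage the source is compiled in.
 */
static bool
is_upper_portable(int c)
{
    switch (c) {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
    case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
    case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
    case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
    case 'Y': case 'Z':
        return true;
    default:
        return false;
    }
}
```

Digits are the one exception: C guarantees that '0' through '9' are
contiguous, so a range test on digits is safe on any codepage.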