On Tue, Jan 2, 2024 at 4:28 PM Rob Landley <r...@landley.net> wrote:
>
> On 1/2/24 17:21, enh wrote:
> >> > if you really care, not even icu4c (my usual answer to such
> >> > questions, and something bionic regularly forwards such questions to),
> >> > you want to talk to something like
> >> > https://en.wikipedia.org/wiki/HarfBuzz instead --- this shit gets
> >> > weird, fast.
> >>
> >> Yes, but that's not really the question I'm asking.
> >
> > no, but it's the question you actually _need_ to ask if you're worried
> > about doing something _useful_
>
> I'm worried about implementing unicode-aware interactive line editing for
> toysh, which may someday get retrofitted onto the vi implementation but for
> now that's not my problem.
>
> The way I _thought_ fold worked is how line editing has to work: backspace
> undoes the previous character, including jumping back to the start of variable
> width tabs, so I've got to checkpoint the previous position for backspace to
> return to.
>
> There are various horrible alternatives, including send the ansi position
> query after every keystroke or jumping to the left edge and rewriting the
> entire line each time with "clear to end of line" sequence at the end, but
> I'd rather use a solution that ISN'T crazy.
>
> > --- it's probably better to think of
> > some scripts as "nothing but combining characters".
>
> Then what do they combine _with_?

https://github.com/n8willis/opentype-shaping-documents/blob/master/opentype-shaping-arabic.md

> I tried putting an umlaut on low ascii characters. It didn't even work with
> "tab"...
>
> >> How often do new unicode
> >> tables come out and do they ever really make big changes?
> >
> > "about one/year" [citation needed?
> > https://en.wikipedia.org/wiki/Unicode#Versions]
> >
> >> There are only 1.1
> >> million possible values, this is not a big table of numbers in a modern
> >> computing context, and there presumably ARE answers?
> >
> > my point is that it's the _combinations_ that are interesting. that's
> > why i mentioned harfbuzz.
> > https://harfbuzz.github.io/why-do-i-need-a-shaping-engine.html is a
> > good high-level intro (the paragraph containing the word "arabic" in
> > particular).
>
> Um... if combining characters change the width of the base character, I think
> I'm just plain gonna get the fontmetrics wrong there. I don't see how I can
> avoid it.
>
> >> Anyway, why is this NOT a couple bitmaps for 0 and 1 and an if/else
> >> staircase for oddballs, else size 2. I'm aware the xfce terminal isn't
> >> exactly canonical, and maybe it's printing something when it shouldn't,
> >> but this is the question I'm trying to ask with wcwidth(). When I print
> >> this, how many columns does that consume on the terminal? It's giving a
> >> width to these characters.
> >
> > (see the harfbuzz documentation for why "character width" isn't a
> > meaningful concept for all the world's scripts :-) )
>
> Then I can't support all the world's scripts.
>
> The perfect is the enemy of the good. I want to figure out the subset I _can_
> support. And right now, it's not handling japanese.
>
> If I have to make simplifying assumptions, then "low ascii is weird", and
> every other unicode codepoint is either 0, 1, or 2 characters, and maybe I
> need to handle the right to left direction switching codepoints but I'm not
> entirely sure how.
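
For what it's worth, the "checkpoint the previous position" idea combined with
the "every codepoint is 0, 1, or 2 columns" assumption can be sketched in a few
lines of C. This is only an illustration under stated assumptions, not toybox
code: cp_width(), line_push(), line_backspace(), the tiny range table, and the
8-column tab stop are all made up for the example, and the table covers just a
handful of blocks. A real version would presumably call wcwidth() or use a
table generated from the Unicode data files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical width table: a few illustrative blocks only, NOT complete. */
struct range { unsigned lo, hi; int width; };

static const struct range ranges[] = {
  {0x0300, 0x036F, 0},  /* combining diacritical marks */
  {0x3040, 0x30FF, 2},  /* Hiragana and Katakana */
  {0x4E00, 0x9FFF, 2},  /* CJK unified ideographs */
  {0xAC00, 0xD7A3, 2},  /* Hangul syllables */
  {0xFF01, 0xFF60, 2},  /* fullwidth forms */
};

/* Columns for one codepoint: -1 for control characters (the caller
 * decides what to do, e.g. tabs), otherwise 0, 1, or 2. */
int cp_width(unsigned cp)
{
  size_t i;

  if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return -1;
  for (i = 0; i < sizeof(ranges)/sizeof(*ranges); i++)
    if (cp >= ranges[i].lo && cp <= ranges[i].hi) return ranges[i].width;

  return 1;  /* assumption: anything not listed is one column */
}

#define LINE_MAX_CHARS 256
#define TAB_STOP 8  /* assumed tab stop */

struct line {
  int col;                    /* current cursor column */
  int len;                    /* characters typed so far */
  int start[LINE_MAX_CHARS];  /* checkpoint: column each character began at */
};

/* Append a codepoint; returns how many columns the cursor advanced. */
int line_push(struct line *l, unsigned cp)
{
  int w = (cp == '\t') ? TAB_STOP - l->col % TAB_STOP : cp_width(cp);

  assert(l->len < LINE_MAX_CHARS);
  if (w < 0) w = 0;  /* other control characters: treated as invisible here */
  l->start[l->len++] = l->col;
  l->col += w;

  return w;
}

/* Undo the last character; returns how many columns to move left. */
int line_backspace(struct line *l)
{
  int back;

  if (!l->len) return 0;
  back = l->col - l->start[--l->len];
  l->col = l->start[l->len];

  return back;
}
```

Pushing 'a', a tab, U+3042 and U+0301 onto a zeroed struct line leaves the
cursor at column 10; three backspaces then report 0, 2 and 7 columns to move
left. Note that a combining mark is undone on its own here, which may or may
not be what a user expects.
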
>
> It sounds like getting this perfect is a full-time job for a dedicated domain
> expert, and even they can't package it up in a useful fashion so people who
> AREN'T domain experts can ask simple questions that get answers. (If the
> unicode consortium produced a mess that goes non-Euclidean in places, I only
> have so much brain to try to understand the results with.)

right, but then i'm back to "why don't you just trust wcwidth() and
move on with your life?" :-) isn't that all the competition is doing?
(i actually have no idea --- i don't speak any rtl languages, so
korean is the most exotic thing i've ever done at the prompt, and
that's not really any more complicated than german in this sense.)

> Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net