* Dan Jacobson wrote:
And tidy -utf8 miscalculates how wide Chinese Unicode characters are.
They are three bytes wide in UTF-8, but that doesn't matter.
What matters is that they are two characters wide on the screen.
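To make the distinction concrete, here is a minimal Python sketch (not tidy's code) that measures the same text three ways: UTF-8 bytes, code points, and screen columns. The helper name display_width is hypothetical, and the rule "two columns for East Asian Wide or Fullwidth characters, one otherwise" is a simplifying assumption that ignores combining marks and control characters. The sample is the first five characters of the QP-encoded string quoted later in this thread.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal columns (hypothetical helper, simplified rule)."""
    # Assumption: Wide ("W") and Fullwidth ("F") characters take two columns,
    # everything else takes one. Combining marks are not special-cased.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# First five characters of the QP-encoded sample from this thread.
sample = bytes.fromhex("E8A9B1E5B180E7A2BCE6A0B8E9858D").decode("utf-8")

print(len(sample.encode("utf-8")))  # size in UTF-8 bytes
print(len(sample))                  # number of code points
print(display_width(sample))        # approximate screen columns
```

A line wrapper that counts bytes would treat this text as 15 units wide, one that counts code points as 5, while a terminal shows it in about 10 columns.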
I think it is perfectly reasonable to treat all characters the same in
this regard. You
Bug confirmed in version 20051018-1 too. For now using worrisome workaround:
$ tidy -utf8 Chinese.html|
perl -nwe 'BEGIN{use Text::Wrap; $Text::Wrap::huge=overflow;
$Text::Wrap::unexpand=0} if(/pre/../\/pre/){print;next}
$l=$_; s/[^[:ascii:]]{3}/12/g; if(length>80){
print wrap("","",$l)}else{print
tidy -utf8 -raw -q will change (QP-encoded bytes):
=E8=A9=B1=E5=B1=80=E7=A2=BC=E6=A0=B8=E9=85=8D=E7=
to
=E8=A9=B1=E5=B1=80=E7=A2=BC=E6 =B8=E9=85=8D=E7=8F=
I.e., it changes the byte 0xA0 to a space (SPC), and probably only in special
parts of certain HTML files, just to be stealthy :-(
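The damage described above can be reproduced outside tidy. The following Python sketch only simulates the reported symptom (one 0xA0 byte replaced by a space); it makes no claim about why tidy does it. The helper name is_valid_utf8 is hypothetical. In the sample, 0xA0 is the second byte of the three-byte sequence E6 A0 B8, so replacing it leaves a lead byte followed by a non-continuation byte, which is no longer valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# First fifteen bytes of the sample quoted in this thread.
good = bytes.fromhex("E8A9B1E5B180E7A2BCE6A0B8E9858D")

# Simulate the reported corruption: the byte 0xA0 becomes a space (0x20).
bad = good.replace(b"\xa0", b" ")

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```

A check like this, run over tidy's output, would catch the substitution even when it happens in only a few places in a file.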
# su - nobody
$ wget -O - http://jidanni.org/geo/taipower/sunriver/index.html|
iconv -f big5 -t utf8 > i.html
$ tidy -q -utf8 -raw i.html|perl -nwe 'print if length>80'|wc
4 5 419 #and they are just a tiny bit over 80. OK.
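The length test in that pipeline most likely counts bytes, since perl is run without any Unicode switches. A rough Python equivalent that counts approximate columns instead can borrow the substitution trick from the workaround earlier in this thread: every run of three non-ASCII bytes is replaced by two placeholder bytes before measuring. The function names are hypothetical, and the sketch assumes every non-ASCII character in the input is a three-byte, two-column character, which holds for the Chinese text here but not in general.

```python
import re


def approx_columns(line: bytes) -> int:
    """Estimate screen columns of a UTF-8 line (hypothetical helper)."""
    # Assumption: each non-ASCII character is 3 bytes and 2 columns wide,
    # so every 3-byte non-ASCII run is swapped for 2 placeholder bytes.
    # Unlike the Perl one-liner, the trailing newline is not counted.
    stand_in = re.sub(rb"[^\x00-\x7f]{3}", b"12", line.rstrip(b"\n"))
    return len(stand_in)


def overlong(lines, limit=80):
    """Return the lines whose estimated width exceeds limit columns."""
    return [ln for ln in lines if approx_columns(ln) > limit]


wide_char = bytes.fromhex("E8A9B1")   # one three-byte character
lines = [
    wide_char * 30 + b"\n",           # 90 bytes, about 60 columns
    wide_char * 41 + b"\n",           # 123 bytes, about 82 columns
    b"x" * 81 + b"\n",                # plain ASCII, 81 columns
]
print(len(overlong(lines)))
```

With a byte-based test all three sample lines would be reported as longer than 80, while the column estimate flags only the last two.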
$ tidy -q -raw -utf8 i.html|perl -nwe 'print if