Bug#354640: [Tidy-dev] Bug#354640: tidy -utf8 -raw or else wrapping wacko on wide chars

2006-03-25 Thread Bjoern Hoehrmann
* Dan Jacobson wrote: And tidy -utf8 miscalculates how wide Chinese Unicode characters are. They are three bytes wide, but that doesn't matter. What matters is that they are two characters wide on the screen. I think it is perfectly reasonable to treat all characters the same in this regard. You
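
The distinction drawn above (bytes vs. characters vs. screen columns) can be illustrated with a small Python sketch. It is not tidy's wrapping logic; `display_width` is a hypothetical helper, and the "two columns for East Asian Wide/Fullwidth characters" rule is a common approximation rather than something stated in the thread. The sample bytes are the complete UTF-8 sequences from the quoted-printable example in the 2006-03-01 message.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns needed for text.

    Assumption: East Asian Wide (W) and Fullwidth (F) characters take
    two columns, combining marks take none, everything else takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


# Five CJK characters, taken from the bytes quoted in the bug report.
sample = bytes.fromhex("E8A9B1E5B180E7A2BCE6A0B8E9858D").decode("utf-8")

print(len(sample.encode("utf-8")))  # 15 bytes
print(len(sample))                  # 5 characters
print(display_width(sample))        # 10 columns
```

A wrapper that counts bytes would treat this string as 15 units, one that counts characters as 5, while a terminal typically shows it in 10 columns.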

Bug#354640: [Tidy-dev] Bug#354640: tidy -utf8 -raw or else wrapping wacko on wide chars

2006-03-02 Thread Dan Jacobson
Bug confirmed in version 20051018-1 too. For now using worrisome workaround: $ tidy -utf8 Chinese.html | perl -nwe 'BEGIN{use Text::Wrap; $Text::Wrap::huge=overflow; $Text::Wrap::unexpand=0} if(/pre/../\/pre/){print;next} $l=$_; s/[^[:ascii:]]{3}/12/g; if(length>80){ print wrap("", "", $l)}else{print

Bug#354640: [Tidy-dev] Bug#354640: tidy -utf8 -raw or else wrapping wacko on wide chars

2006-03-01 Thread Dan Jacobson
tidy -utf8 -raw -q will change (qp-encoded bytes:) =E8=A9=B1=E5=B1=80=E7=A2=BC=E6=A0=B8=E9=85=8D=E7= to =E8=A9=B1=E5=B1=80=E7=A2=BC=E6 =B8=E9=85=8D=E7=8F= I.e., changing 0xA0 to SPC, and probably just in special parts of certain HTML files, just to be stealthy :-( And tidy -utf8 miscalculates how
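
Why a 0xA0-to-space substitution is harmful here can be shown with a short Python sketch. It does not run tidy or reproduce its code path; it only takes the one three-byte sequence from the report that contains 0xA0 (E6 A0 B8), applies the same byte replacement by hand, and checks whether the result is still valid UTF-8. The helper name `is_valid_utf8` is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The three-byte character from the report whose middle byte is 0xA0.
original = bytes.fromhex("E6A0B8")

# Simulate the reported damage: the byte 0xA0 replaced by an ASCII space.
damaged = original.replace(b"\xa0", b" ")

print(is_valid_utf8(original))  # True
print(is_valid_utf8(damaged))   # False
```

In UTF-8, 0xA0 inside a multi-byte sequence is a continuation byte, not a no-break space (which would be encoded as C2 A0), so replacing it splits the character and leaves an invalid byte sequence.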

Bug#354640: [Tidy-dev] Bug#354640: tidy -utf8 -raw or else wrapping wacko on wide chars

2006-02-27 Thread Dan Jacobson
# su - nobody $ wget -O - http://jidanni.org/geo/taipower/sunriver/index.html | iconv -f big5 -t utf8 > i.html $ tidy -q -utf8 -raw i.html|perl -nwe 'print if length>80'|wc 4 5 419 #and they are just a tiny bit over 80. OK. $ tidy -q -raw -utf8 i.html|perl -nwe 'print if