Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Junio C Hamano
Jakub Narebski writes:
> I think the problem is not with aligning, otherwise we would simply get
> bad alignment, and not visible corruption. The ACTUAL PROBLEM is most
> probably because of concatenating strings marked as UTF-8 and strings
> not marked as UTF-8. Strange things
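
The concatenation effect described in this message can be reproduced in a few lines. This is a minimal sketch, not gitweb code: the variable names are made up for illustration, and `Encode::decode` stands in for whatever marks a string as UTF-8 in the real script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $raw     = "\xC3\xA9";                   # "e-acute" as two undecoded UTF-8 bytes
my $decoded = decode('UTF-8', "\xC3\xA9");  # the same text as one character

# On concatenation Perl upgrades $raw as if each byte were a Latin-1
# character, so the result holds three characters:
# U+00C3, U+00A9 (from $raw) and U+00E9 (from $decoded).
my $mixed = $raw . $decoded;

# Encoding for output then re-encodes the two stray characters,
# which is the visible corruption (mojibake) rather than a mere misalignment.
my $out = encode('UTF-8', $mixed);

printf "characters: %d, output bytes: %d\n", length($mixed), length($out);
```

Had both operands been decoded first, the result would be two characters and four output bytes.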

Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Shin Kojima
> One solution would be to force conversion to UTF-8 on input via "open"
> pragma (e.g. "use open ':encoding(UTF-8)';"). But there is no
> UTF-8-with_fallback encoding available - we would have to write one, and
> install it as module (or fake it via Perl trickery). This mechanism is
> almost

Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Jakub Narebski
Junio C Hamano writes:
> Shin Kojima writes:
>
>> Offset positions should not be counted by byte length, but by actual
>> character length.
>> ...
>> # escape tabs (convert tabs to spaces)
>> sub untabify {
>> -my $line = shift;
>> +my $line =

Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-02 Thread Shin Kojima
> ideally we should be able to say "function X takes non-UTF8 and
> works on it", "function Y takes UTF8 and works on it", and "function
> Z takes non-UTF8 and gives UTF8 data back" for each function
> clearly, not "function W can take either UTF8 or any other garbage
> and tries to return UTF8".

Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-02 Thread Junio C Hamano
Shin Kojima writes:
> Offset positions should not be counted by byte length, but by actual
> character length.
> ...
> # escape tabs (convert tabs to spaces)
> sub untabify {
> - my $line = shift;
> + my $line = to_utf8(shift);
>
> while ((my $pos =
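
To see why decoding before measuring the tab position matters, here is a rough sketch of tab expansion to 8-column stops. `expand_tabs` is a hypothetical stand-in written for this illustration, not the actual `untabify` from gitweb, and `Encode::decode` is assumed here in place of gitweb's `to_utf8`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the tab expansion under discussion:
# pad each tab out to the next multiple of 8 columns.
sub expand_tabs {
    my ($line) = @_;
    while ((my $pos = index($line, "\t")) != -1) {
        my $spaces = ' ' x (8 - $pos % 8);
        $line =~ s/\t/$spaces/;
    }
    return $line;
}

my $bytes = "\xC3\xA9\tx";             # "e-acute", tab, "x" as raw UTF-8 bytes
my $chars = decode('UTF-8', $bytes);   # the same text as characters

# The tab sits at byte offset 2 but at character offset 1,
# so the two calls pad with a different number of spaces.
my $from_bytes = expand_tabs($bytes);
my $from_chars = expand_tabs($chars);

printf "bytes: %d spaces, characters: %d spaces\n",
    ($from_bytes =~ tr/ //), ($from_chars =~ tr/ //);
```

With the undecoded input the tab is padded with 6 spaces; with the decoded input it gets 7, which is what a reader looking at the rendered line would expect.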

[PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-01 Thread Shin Kojima
Offset positions should not be counted by byte length, but by actual
character length.

>5183 # We need to untabify lines before split()'ing them;
>5184 # otherwise offsets would be invalid.

Horizontal tab is not the only case we need to consider.

Please excuse me for using
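
The difference between byte offsets and character offsets applies to any multibyte text, not only to lines containing tabs. The following is a small sketch under assumed sample data (three CJK characters encoded as UTF-8); the variable names are illustrative only and nothing here is taken from gitweb.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three characters (U+65E5 U+672C U+8A9E), three UTF-8 bytes each.
my $bytes = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
my $chars = decode('UTF-8', $bytes);

my $byte_len = length($bytes);   # counts bytes
my $char_len = length($chars);   # counts characters

# The same offset means different things in the two views:
# two bytes stop in the middle of the first character,
# two characters cover the first two characters completely.
my $byte_prefix = substr($bytes, 0, 2);
my $char_prefix = encode('UTF-8', substr($chars, 0, 2));

printf "length: %d bytes vs %d characters\n", $byte_len, $char_len;
printf "prefix of 2: %d bytes vs %d bytes\n",
    length($byte_prefix), length($char_prefix);
```

An offset computed on one representation and applied to the other will therefore split or skip characters.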