Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-04 Thread Jeff King
On Fri, Apr 03, 2015 at 03:24:09PM -0700, Kyle J. McKay wrote: I thought that meant we could also optimize out the map call entirely, and just use the first split (with *) to end up with a list of $COLOR chunks and single characters, but it does not seem to work. So maybe I am misreading

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-03 Thread Jeff King
On Thu, Apr 02, 2015 at 06:59:50PM -0700, Kyle J. McKay wrote: It should work as well as the original did for any 1-byte encoding. That is, if it's not valid UTF-8 it should pass through unchanged and any single byte encoding should just work. But, as you point out, multibyte encodings
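
The fallback behavior described in this message (valid UTF-8 is split per character, anything else passes through byte by byte) could be sketched roughly as follows. This is a Python illustration only, not the Perl code from the patch; the function name `split_chars` is hypothetical.

```python
def split_chars(line: bytes) -> list:
    """Split a line into per-character byte chunks.

    Assumption for illustration: if the line is valid UTF-8, each chunk
    holds one whole character; otherwise fall back to one chunk per byte,
    which keeps single-byte encodings working as before.
    """
    try:
        text = line.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: treat every byte as its own unit.
        return [bytes([b]) for b in line]
    # Valid UTF-8: keep the bytes of each character together.
    return [ch.encode("utf-8") for ch in text]


utf8_chunks = split_chars("진a".encode("utf-8"))
raw_chunks = split_chars(b"\xff\xfeab")
```

With this approach a three-byte character such as 진 stays in one chunk, while input that fails to decode is handled exactly as a byte-oriented splitter would handle it.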

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-03 Thread Kyle J. McKay
On Apr 3, 2015, at 15:08, Jeff King wrote: Doing: diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index 08c88bb..1c4b599 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -165,7 +165,7 @@ sub

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-03 Thread Jeff King
On Fri, Apr 03, 2015 at 11:19:24AM +0900, Yi, EungJun wrote: I timed this one versus the existing diff-highlight. It's about 7% slower. That's not great, but is acceptable to me. The String::Multibyte version was a lot faster, which was nice (but I'm still unclear on _why_). I think

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-02 Thread Jeff King
On Thu, Apr 02, 2015 at 05:49:24PM -0700, Kyle J. McKay wrote: Subject: [PATCH v2] diff-highlight: do not split multibyte characters When the input is UTF-8 and Perl is operating on bytes instead of characters, a diff that changes one multibyte character to another that shares an initial

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-02 Thread Kyle J. McKay
On Apr 2, 2015, at 18:24, Jeff King wrote: On Thu, Apr 02, 2015 at 05:49:24PM -0700, Kyle J. McKay wrote: Subject: [PATCH v2] diff-highlight: do not split multibyte characters When the input is UTF-8 and Perl is operating on bytes instead of characters, a diff that changes one multibyte

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-02 Thread Kyle J. McKay
On Mar 30, 2015, at 15:16, Jeff King wrote: Yeah, I agree the current output is not ideal, and this should address the problem. I was worried that multi-byte splitting would make things slower, but in my tests, it actually speeds things up! [...] Unfortunately, String::Multibyte is not a

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-04-02 Thread Yi, EungJun
I timed this one versus the existing diff-highlight. It's about 7% slower. That's not great, but is acceptable to me. The String::Multibyte version was a lot faster, which was nice (but I'm still unclear on _why_). I think the reason is here: sub split_line { local $_ = shift;

Re: [PATCH] diff-highlight: Fix broken multibyte string

2015-03-30 Thread Jeff King
On Tue, Mar 31, 2015 at 12:55:33AM +0900, Yi EungJun wrote: From: Yi EungJun eungjun...@navercorp.com Highlighted string might be broken if the common subsequence is a proper subset of a multibyte character. For example, if the old string is 진 and the new string is 지, then we expect the

[PATCH] diff-highlight: Fix broken multibyte string

2015-03-30 Thread Yi EungJun
From: Yi EungJun eungjun...@navercorp.com Highlighted string might be broken if the common subsequence is a proper subset of a multibyte character. For example, if the old string is 진 and the new string is 지, then we expect the diff is rendered as follows: -진 +지 but actually it
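
The problem reported here can be seen by comparing the two example strings at the byte level and at the character level. The sketch below is a Python illustration under that assumption, not diff-highlight's Perl code; the helper `common_prefix_len` is hypothetical.

```python
def common_prefix_len(a, b):
    """Return the number of leading elements shared by two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


old_text, new_text = "진", "지"
old_bytes = old_text.encode("utf-8")  # b'\xec\xa7\x84'
new_bytes = new_text.encode("utf-8")  # b'\xec\xa7\x80'

# Byte-wise comparison: the first two bytes match, so a byte-oriented
# highlighter would cut each character after its second byte.
byte_prefix = common_prefix_len(old_bytes, new_bytes)

# Character-wise comparison: the two characters differ entirely,
# so the whole character should be highlighted.
char_prefix = common_prefix_len(old_text, new_text)
```

Here `byte_prefix` is 2 and `char_prefix` is 0: the two characters share their first two UTF-8 bytes, which is why highlighting only the differing byte produces a broken character on output.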