RE: SVN Blame Returns Corrupt Data

Bob Archer Fri, 11 Oct 2013 08:20:14 -0700

> On 11.10.2013 16:55, Bob Archer wrote:
> >> On 11.10.2013 15:58, Bob Archer wrote:
> >>>> On Thu, Oct 10, 2013 at 5:49 PM, Bob Archer <[email protected]> wrote:
> >>>> I assume he was asking how to "fix" the blame. Cause, sure, he
> >>>> could open the file, convert it back to UTF-8 with CRLF line
> >>>> endings... and commit it... of course, now blame is going to show
> >>>> him on every line, since he just changed every line.
> >>>>
> >>>> That's exactly what I meant.  You're correct with how the blame is
> >>>> handled.  I committed the UTF-8 copy to a test branch, diff'd, and
> >>>> it showed every line as being changed.  Unfortunately it looks like
> >>>> this is our best option.
> >>> Yep, we have done the same thing. As a matter of fact, I just over
> >>> the past few days rescripted all our database scripts to be UTF-8
> >>> since merging them just doesn't work correctly when they are UTF-16
> >>> even if you remove the binary mime type.
> >>>> On Thu, Oct 10, 2013 at 7:07 PM, Ben Reser <[email protected]> wrote:
> >>>> At current blame is not UTF-16 aware.
> >>> It's not just blame that isn't... the diff engine, or whatever
> >>> detects file types, always considers UTF-16 files to be binary. If
> >>> you "add" a UTF-16 file you see that svn adds the
> >>> application/octet-stream mime type.  There is an issue in the bug
> >>> database about this from when I reported/complained about it...
> >>> however it hasn't been addressed. I'm surprised that at this time
> >>> svn still can't support UTF-16 text files as text wrt adding,
> >>> diffing, blaming, etc.
> >>
> >> It's quite simple: no-one has written the necessary code. While I can
> >> understand it's an interesting feature for Windows users, most
> >> Subversion developers have other things to do. This being a volunteer
> >> project, and most of us do not use Windows, you can hardly expect
> >> anyone to spend several weeks on solving a problem that has a
> >> perfectly simple workaround. Since
> >> UTF-8 and UTF-16 can be interchanged without data loss, there are
> >> other, much more important things to do in Subversion.
> > I appreciate all that you said. I didn't expect that UTF-16 was so
> > uncommon in non-Windows OSes. A large number of dev tools that I work
> > with on Windows, especially the Microsoft tools, default to creating
> > UTF-16 files.
> >
> > I disagree with your "can be converted without data loss". If you need
> > UTF-16 then you need it. Also, if you are working in an international
> > team and you have developers with other-language OSes which have
> > different code pages, then what you see when you look at a UTF-8 file
> > might be different than what I see.
> 
> I don't follow. Both UTF-16 and UTF-8 are complete representations of the
> Unicode character set. Exactly the same code sequences can be represented
> in both encodings. You can convert from UTF-16 to UTF-8 and back and get
> exactly the same sequence of bytes.
>


OK, I have to backpedal a bit here.  You are correct: UTF-8 is a Unicode
encoding and can store all characters, so it's not a UTF-8 vs UTF-16 issue
(Friday senior moment). What I recall being told by one of the Subversion
developers was that Subversion only supported the ASCII character set, and
that while UTF-8 was compatible with ASCII, Subversion didn't truly support
Unicode files.
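
As a quick sanity check of the round-trip point quoted above, here is a
minimal Python sketch (an illustration only, not Subversion code). The sample
string and variable names are made up, and it assumes an explicit
little-endian UTF-16 encoding with no byte order mark; a file that starts
with a BOM would need that BOM carried across the conversion separately.

```python
# Sketch: convert UTF-16 -> UTF-8 -> UTF-16 and compare the bytes.
# Assumption: well-formed text, explicit UTF-16LE, no byte order mark.

# Hypothetical sample: accented Latin, CJK, a character outside the BMP
# (stored as a surrogate pair in UTF-16), and a CRLF line ending.
text = "SELECT 'caf\u00e9', '\u65e5\u672c\u8a9e', '\U0001F600';\r\n"

# The "original" file contents as UTF-16LE bytes.
utf16_original = text.encode("utf-16-le")

# Convert to UTF-8, as one would before committing.
as_utf8 = utf16_original.decode("utf-16-le").encode("utf-8")

# Convert back to UTF-16LE.
utf16_again = as_utf8.decode("utf-8").encode("utf-16-le")

# True when the round trip reproduced the original bytes exactly.
round_trip_ok = utf16_again == utf16_original
print(round_trip_ok)
```

The comparison is on raw bytes, so it also covers the CRLF line ending; only
the encoding changes in between.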

However, this blog entry seems to dispute that:

http://rhubbarb.wordpress.com/2012/04/28/svn-unicode/

Would adding that mime-type to this file fix the blame issues this user is 
seeing?

BOb
