#define MBE multi-byte encoding
#define SBE single-byte encoding

Stefan Sperling wrote on Tue, May 31, 2011 at 01:07:02 +0200:
> On Tue, May 31, 2011 at 01:41:54AM +0300, Daniel Shahaf wrote:
> > How would you handle a repository that contains the following
> > nodes/fspaths:
> > 
> > /foo/bår    (in UTF-8)
> > /foo/bår    (in latin1)
> > 
> > ?
> > 
> > 
> > How would you handle a repository that contains:
> > /foo/barÉ   (in latin1)
> > /foo/barŠ   (in latin2)
> > 
> > ?
> 
> All the ISO-8859 (latin) encodings are single-byte encodings.
> It's not possible to know what the encoding is supposed to be if
> paths in different ISO-8859 encodings entered the repository.
> They all decode to different but valid strings of characters.
> 
> In the first iteration of this feature I would simply assume one
> user-specified source encoding and try to convert data that isn't
> UTF-8 from the source encoding to UTF-8.
> In case multiple single-byte encodings are present this means that some
> characters will be wrong but the repository will work again without
> manual intervention. In case multiple multi-byte encodings other than
> UTF-8 are present this approach can fail and might require manual fixing
> (no worse than the current situation).
> This could still be improved upon if necessary.

True, I had overlooked these points.
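
To make the ambiguity concrete, here is a minimal Python sketch (mine, not
anything svn does today) that takes the bytes latin2 would store for the
second example path and decodes them under both encodings. The variable
names are only illustrative.

```python
# Bytes a latin2 (ISO-8859-2) client would have stored for "/foo/barŠ".
raw = "/foo/barŠ".encode("iso-8859-2")

# Both single-byte decodings succeed; neither raises an error.
as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")

# The two results are different strings, and nothing in the bytes
# themselves says which one the committer meant.
print(repr(as_latin1), repr(as_latin2), as_latin1 == as_latin2)
```

So with a single user-specified source encoding, one of the two paths
necessarily comes out with a wrong (but valid) character.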

One thing that comes to mind is to have a list of encodings to
try, i.e.,

   svnadmin load --recode-paths-from=MBE1,MBE2,SBE

would attempt to interpret paths as UTF-8, failing that as MBE1, failing
that as MBE2, failing that as SBE.

(I know you use vim, so: compare vim's 'fencs' ('fileencodings') option.)
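
A rough Python sketch of the fallback chain I have in mind, purely as
illustration: recode_path() and its arguments are hypothetical names, and
the --recode-paths-from option above does not exist yet. It tries UTF-8
first, then each listed encoding in order, and reports which one matched.

```python
def recode_path(raw, fallbacks):
    """Decode raw path bytes as UTF-8, else as the first fallback that works.

    Returns (decoded_path, encoding_used). Hypothetical helper, not svn code.
    """
    for encoding in ["utf-8", *fallbacks]:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; move on to the next candidate.
            continue
    raise ValueError("path is not valid in any of the given encodings")


# A path already in UTF-8 is left alone; a latin1 path falls through.
print(recode_path("/foo/bår".encode("utf-8"), ["euc-jp", "iso-8859-1"]))
print(recode_path(b"/foo/b\xe5r", ["iso-8859-1"]))
```

Note that the SBE only makes sense as the last entry: a single-byte
encoding such as ISO-8859-1 accepts any byte sequence, so nothing listed
after it would ever be tried. A multi-byte encoding can also accept bytes
that were really meant as something else, so the order of the list matters.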
