On 2010-08-10 20:59:00 +0200, Stefan Sperling wrote:
> On Tue, Aug 10, 2010 at 07:44:35PM +0200, Vincent Lefevre wrote:
> > This is easy (at least from the specification point of view): once the
> > encoding has been determined[*], typically at checkout time, store the
> > encoding in the WC metadata (with the current WC layout, that would be
> > some file under the .svn directory), so that the next time the svn
> > client is used for this WC, the same encoding will be used, avoiding
> > inconsistencies (such as currently obtained by two "svn up" under two
> > different locales).
>
> I doubt this can be made to work properly. A feature like that is just
> asking people to shoot themselves in the foot.

I don't see any problem with it. If you want another method, then fine,
but in any case, a command like "svn up" should not fail just because
it is executed under locales unexpected by the client.

> People simply should not mix character sets like that in their
> working copies.

It seems that you didn't understand what I proposed. My proposal is
just to *avoid* mixing character sets in filenames (contrary to what
svn currently does), i.e. to use a single character set, defined at
checkout time (for instance).

> There should be a project-wide convention about the encoding used for
> filenames, and everyone should be using that encoding

For the repository, of course, but it is already the case: UTF-8.
For working copies, if a single encoding must be defined, it should
be UTF-8 too, in particular to be sure to be able to represent all
the filenames that can occur.

> (unless there
> really is a project-specific need to have filenames in multiple encodings
> for some reason, but that's really rare -- and whoever does this should be
> smart enough to deal with the consequences).
>
> Right now, if the filename cannot be represented in the current locale,
> you get this error: "svn: Can't convert string from 'UTF-8' to native
> encoding"

which is bad and prevents users from writing POSIX-conforming scripts
using svn, i.e. under the POSIX locale (except on systems where the
POSIX locale uses UTF-8, but I don't know any).

> The native encoding is determined by the locale, but that does not matter.
> The point is that, wherever encoding configuration happens to come from,
> if the configured encoding cannot represent the character string stored
> as UTF-8 in the repository, what is Subversion supposed to do? It cannot
> really do anything with a filename it cannot represent in the character
> set configured by the user, other than throwing an error.

For filenames stored on disk, they (all of them) can be encoded using UTF-8.
Remember, filenames on a POSIX system are just sequences of bytes.
For what is output to the terminal, non-representable characters can
be displayed by a replacement character such as "?". This can still
be better than an error.

> The filename conversion to UTF-8 and back must not be lossy. Because
> to uniquely identify a file the client needs to send the same UTF-8 byte
> sequence it got from the server back to the server. And it needs to keep
> doing so for backwards compatibility. This is biting us on Mac OS X by the
> way, because some characters have multiple representations in UTF-8,
> see http://subversion.tigris.org/issues/show_bug.cgi?id=2464

This problem is due to the fact that Subversion doesn't enforce a
canonical representation (either NFC or NFD). Anyway, there would still
be problems with case-insensitive filesystems, for instance.

> > [*] There are several ways to do that, such as:
> > 1. Use a charset specified by the user in the svn config file.
>
> That provides no advantage over checking the current locale.

The advantage is that the user doesn't need to remember to use a UTF-8
based locale for the checkout. This would also allow the user to do
a checkout from portable POSIX scripts (i.e. with LC_ALL=POSIX).

> > 2. Use the current locale.
>
> That's what's being done. But we're not writing the information down in the
> working copy meta data, and doing so is quite pointless as described above.

It's not pointless, or at least, something else needs to be done.
Currently "svn up" fails to work, and that's a problem.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
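
To illustrate two of the points above, here is a minimal sketch in
Python. It is not what svn does, and the function names display_name
and same_name are made up for the example. It assumes the filename is
stored on disk as UTF-8 bytes: for terminal output, characters the
terminal encoding cannot represent are replaced by "?" instead of
raising an error, and two names are compared after normalizing both
to NFC (NFD would work equally well, as long as one form is chosen).

```python
import unicodedata


def display_name(raw: bytes, terminal_encoding: str) -> str:
    """Return a printable form of a UTF-8 filename (hypothetical helper).

    Characters that terminal_encoding cannot represent become "?".
    The bytes stored on disk are not modified.
    """
    text = raw.decode("utf-8")
    printable = text.encode(terminal_encoding, errors="replace")
    return printable.decode(terminal_encoding)


def same_name(a: str, b: str) -> bool:
    """Compare two filenames after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Filename as stored on disk: UTF-8 bytes, with a precomposed e-grave.
on_disk = "Lef\u00e8vre.txt".encode("utf-8")

# Under an ASCII-only terminal the name is still shown, with "?" as fallback.
print(display_name(on_disk, "ascii"))
# Under a UTF-8 terminal the name is shown unchanged.
print(display_name(on_disk, "utf-8"))

# The same name in decomposed form (e + combining grave accent), as an
# NFD-normalizing filesystem would return it.
decomposed = "Lefe\u0300vre.txt"
print(same_name("Lef\u00e8vre.txt", decomposed))
```

The replacement only affects what is displayed; the UTF-8 byte
sequence sent back to the server stays the one received from it.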