On Thu, May 23, 2013 at 10:06 PM, Philip Martin <philip.mar...@wandisco.com> wrote:
> Dongsheng Song <dongsheng.s...@gmail.com> writes:
>
>> On Thu, May 23, 2013 at 9:28 PM, Philip Martin
>> <philip.mar...@wandisco.com> wrote:
>>> Dongsheng Song <dongsheng.s...@gmail.com> writes:
>>>
>>>> On Thu, May 23, 2013 at 9:11 PM, Philip Martin
>>>> <philip.mar...@wandisco.com> wrote:
>>>>> Philip Martin <philip.mar...@wandisco.com> writes:
>>>>>
>>>>>> So it appears the UTF8 to native conversion is missing from
>>>>>> repos_notify_handler. I think repos_notify_handler should be using
>>>>>> svn_stream_printf_from_utf8 rather than svn_stream_printf.
>>>>>
>>>>> I've fixed trunk to use svn_cmdline_cstring_from_utf8 and proposed it
>>>>> for 1.8.
>>>>>
>>>>
>>>> As GETTEXT(3) man pages said, If and only if
>>>> defined(HAVE_BIND_TEXTDOMAIN_CODESET),
>>>> your commit is OK.
>>>>
>>>> So you should check HAVE_BIND_TEXTDOMAIN_CODESET when you use
>>>> svn_cmdline_cstring_from_utf8.
>>>
>>> Are you saying there is a problem with my change? If there is a problem
>>> doesn't already apply to all other uses of svn_cmdline_cstring_from_utf8?
>>>
>>
>> I thinks so. In the subversion/libsvn_subr/nls.c file:
>>
>> #ifdef HAVE_BIND_TEXTDOMAIN_CODESET
>> bind_textdomain_codeset(PACKAGE_NAME, "UTF-8");
>> #endif /* HAVE_BIND_TEXTDOMAIN_CODESET */
>>
>> bind_textdomain_codeset only called when HAVE_BIND_TEXTDOMAIN_CODESET
>> defined. In this case, you can assume GETTEXT(3) returned string is
>> UTF-8 encoded.
>
> I still don't understand if you are claiming my change has a problem or
> if there is a problem in all uses of svn_cmdline_cstring_from_utf8.
>
> I recall a related thread from last year:
>
> http://svn.haxx.se/dev/archive-2012-08/index.shtml#34
> http://mail-archives.apache.org/mod_mbox/subversion-dev/201208.mbox/%3Cop.wilcelggnngjn5@tortoise%3E
>
> I think we assume that the translations are UTF-8.
>
> Is there some code change you think we should make?
>

Even if ALL the translations are UTF-8, GETTEXT(3) still returns the
string encoded in the ***current locale's codeset***. Here is a snippet
from the GETTEXT(3) man page:

  In both cases, the functions also use the LC_CTYPE locale facet in
  order to convert the translated message from the translator's codeset
  to the ***current locale's codeset***, unless overridden by a prior
  call to the bind_textdomain_codeset function.

So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8
encoded; it is encoded in the ***current locale's codeset***.

I think the best solution is: DO NOT convert the messages returned by
GETTEXT(3). Write them ***AS IS***, since GETTEXT(3) has already done
the correct conversion for us.
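To make the two cases from the man page concrete, here is a minimal C sketch (not Subversion code) that reports which codeset gettext() would use for a domain. The helper names message_codeset and messages_are_utf8 and the domain name "demo" are made up for this example. It assumes a libintl that provides bind_textdomain_codeset, and comparing against the literal "UTF-8" is a simplification, since codeset spellings can differ between platforms.

```c
#include <assert.h>
#include <langinfo.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the name of the codeset in which gettext()
   hands back messages for DOMAIN. */
const char *message_codeset(const char *domain)
{
    const char *bound;

    assert(domain != NULL);

    /* With a NULL codeset argument, bind_textdomain_codeset() only queries
       the current binding and returns NULL if none has been set. */
    bound = bind_textdomain_codeset(domain, NULL);
    if (bound != NULL)
        return bound;

    /* No override: gettext() converts to the LC_CTYPE locale's codeset. */
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: non-zero only if messages for DOMAIN come back as
   UTF-8, i.e. only then would a "from UTF-8" conversion be appropriate.
   The exact string comparison is a simplification. */
int messages_are_utf8(const char *domain)
{
    return strcmp(message_codeset(domain), "UTF-8") == 0;
}
```

With no codeset bound, the answer depends entirely on the locale; after bind_textdomain_codeset("demo", "UTF-8") the helper reports UTF-8 regardless of the locale, which is the case nls.c only sets up when HAVE_BIND_TEXTDOMAIN_CODESET is defined.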