On 2020-06-29 12:11, Manuel Jacob wrote:
Hi,

In a Python application, I want to convert a path (as native Unix
bytes) to a file URL (and later probably also convert other paths
between the "file system encoding" and UTF-8). The Subversion bindings
have functions for this. However, so that I can deal with the familiar
Python exceptions, I'd like to do the decoding/encoding in Python. For
that, I need to find out which encoding Subversion uses when
converting UTF-8 to the "file system encoding".
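For illustration, the Python-side conversion could look roughly like the sketch below, once the encoding is known. `path_to_file_url` is a hypothetical helper, it assumes an absolute POSIX path, and its percent-escaping is not guaranteed to match what Subversion's own URL functions produce.

```python
from urllib.parse import quote


def path_to_file_url(path, fsencoding):
    """Hypothetical helper: native absolute POSIX path (bytes) -> file:// URL."""
    if not path.startswith(b"/"):
        raise ValueError("expected an absolute path")
    # Raises the familiar UnicodeDecodeError if path is invalid in fsencoding.
    text = path.decode(fsencoding)
    # Re-encode as UTF-8 and percent-escape everything except '/'.
    return "file://" + quote(text.encode("utf-8"), safe="/")
```

With this shape, decoding errors surface as ordinary Python exceptions instead of Subversion errors.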

Subversion seems to use the encoding returned by
apr_os_locale_encoding(), which is however not exposed by the Python
bindings.

import codecs
import ctypes
import libsvn._core
import svn.core

lib = ctypes.CDLL(libsvn._core.__file__)
lib.apr_os_locale_encoding.argtypes = [ctypes.c_void_p]  # apr_pool_t *
lib.apr_os_locale_encoding.restype = ctypes.c_char_p
with util.with_lc_ctype():
    # Pass the raw apr_pool_t pointer wrapped by the SWIG pool object.
    es = lib.apr_os_locale_encoding(int(svn.core.application_pool.this))
    # es is bytes on Python 3 (str on Python 2); normalize the codec name.
    fsencoding = codecs.lookup(es.decode('ascii')).name

I forgot to mention what `with util.with_lc_ctype()` does: it calls
`setlocale(LC_CTYPE, '')` before the block and restores the previous
LC_CTYPE setting after it. I put it around all calls to the Subversion
bindings so that Subversion works correctly, while locale-dependent str
methods on Python 2 keep their usual behavior everywhere else.
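A minimal sketch of such a helper, assuming it is a plain context manager (the application's real `util` module isn't shown, so this only illustrates the described behavior):

```python
import contextlib
import locale


@contextlib.contextmanager
def with_lc_ctype():
    """Set LC_CTYPE from the environment for the block, then restore it."""
    # Calling setlocale() without a locale argument only queries the value.
    saved = locale.setlocale(locale.LC_CTYPE)
    locale.setlocale(locale.LC_CTYPE, "")
    try:
        yield
    finally:
        # Restore the previous setting even if the block raised.
        locale.setlocale(locale.LC_CTYPE, saved)
```

Note that setlocale() is process-global, so this is only safe if no other thread depends on LC_CTYPE at the same time.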

Is there an easier way? I could emulate what apr_os_locale_encoding()
does, which, on the systems Python supports, means calling
nl_langinfo(CODESET) and falling back to ISO-8859-1. Is it reasonable
to assume that this logic will stay? Or, asked differently: which
approach is least likely to stop returning the "file system encoding"
Subversion actually uses, the ctypes code or calling nl_langinfo (with
the ISO-8859-1 fallback)?
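The emulation alternative could be sketched like this. It is an approximation of APR's Unix behavior as described above, not a Subversion API, and `guess_svn_locale_encoding` is a hypothetical name. It assumes it is called while LC_CTYPE is set from the environment, e.g. inside the `with util.with_lc_ctype()` block.

```python
import codecs
import locale


def guess_svn_locale_encoding():
    """Approximate apr_os_locale_encoding() on Unix (assumption, not an API)."""
    charset = ""
    # locale.nl_langinfo() only exists on Unix-like systems.
    if hasattr(locale, "nl_langinfo"):
        charset = locale.nl_langinfo(locale.CODESET)
    if not charset:
        # Fallback described above for an empty or unavailable CODESET.
        charset = "ISO-8859-1"
    # Normalize to Python's canonical codec name; LookupError if unknown.
    return codecs.lookup(charset).name
```

Unlike the ctypes version, this does not need the APR symbol to be reachable, but it silently diverges if APR ever changes its logic.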

Thanks,
Manuel