[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694247#action_12694247 ]
Jonathan Ellis commented on THRIFT-395:
---------------------------------------
> The consistency is that in every Thrift language, we use the native "string"
> type to represent the Thrift "string" type.
Then you should be honest and just use binary everywhere, because native string
types are not at all cross-platform.
> We do not try to force Unicode semantics on languages where they are
> non-idiomatic.
I've explained what modern Python idiom is: a string may be an ASCII `str` or
any `unicode`. Binary data is also represented as `str`, but that does not make
it a "string."
So I'm very skeptical of this appeal to idiom when the current behavior has NOT
been idiomatic Python at any time since the unicode type was added (2.0, October
2000).
> For what it's worth, protocol buffers use a blob type for strings in C++.
See http://code.google.com/apis/protocolbuffers/docs/proto.html. "A string
must always contain UTF-8 encoded or 7-bit ASCII text."
> It gives application writers the option of putting unicode objects in their
> Thrift structures
to be read out as str? Doing half of encode/decode is worse than not doing it
at all.
> We do: str
You just admitted that when you write unicode it reads back as str.
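To make the round-trip complaint concrete, here is a minimal sketch of what
symmetric handling could look like. It is not the Thrift library's API:
`write_string`, `read_string`, `write_binary` and `read_binary` are hypothetical
names used only for illustration. It is written for Python 3, where `str` is the
text type and `bytes` is the binary type (roughly the `unicode` and `str` of the
Python 2 under discussion).

```python
# Hypothetical helpers for illustration only; these are NOT Thrift functions.

def write_string(value):
    """Serialize a 'string' field: text goes in, UTF-8 bytes go on the wire."""
    if isinstance(value, bytes):
        # Accept byte strings only when they are plain ASCII, mirroring the
        # "ascii str or any unicode" idiom; anything else raises
        # UnicodeDecodeError instead of being passed through silently.
        value = value.decode("ascii")
    return value.encode("utf-8")


def read_string(wire):
    """Deserialize a 'string' field: UTF-8 bytes come in, text comes out."""
    return wire.decode("utf-8")


def write_binary(value):
    """Serialize a 'binary' field: bytes pass through untouched."""
    if not isinstance(value, bytes):
        raise TypeError("binary fields require bytes")
    return value


def read_binary(wire):
    """Deserialize a 'binary' field: bytes come back as bytes."""
    return wire


if __name__ == "__main__":
    text = "caf\u00e9"
    # What is written as text is read back as the same text type.
    print(read_string(write_string(text)) == text)
    # Arbitrary bytes survive only through the binary path.
    blob = b"\xff\x00"
    print(read_binary(write_binary(blob)) == blob)
```

The point of the sketch is that encoding on write is paired with decoding on
read, so a value comes back with the type it went in with, while data that is
not text goes through the binary path instead.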
---
"if you have code that is using the Thrift string type when it should be
binary, s/string/binary/ in your IDL is a virtually painless change to make."
Assuming for the sake of argument that strings should be UTF-8 (which includes
ASCII!), do you agree with the above statement?
> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
> Key: THRIFT-395
> URL: https://issues.apache.org/jira/browse/THRIFT-395
> Project: Thrift
> Issue Type: Bug
> Components: Compiler (Python), Library (Python)
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Priority: Blocker
> Fix For: 0.1
>
> Attachments:
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch,
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch,
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch,
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch,
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done. So if a unicode object is passed
> to a (regular, non-binary) string, an exception is raised.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.