[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693976#action_12693976 ]
Jonathan Ellis commented on THRIFT-395:
---------------------------------------
You are right; I didn't know the history here. But citing the (outdated)
whitepaper, which you have already violated when convenient, isn't a very
convincing argument.
Both the old way ("there is no string, only binary") and the Java and C# way
(strings are UTF-8; binary is byte[]) are self-consistent and make sense. But
"there is string, and binary, and sometimes the former is UTF-8-encoded, but
not always" is not.
Personally, I think the Java / C# way is better, since it solves a common
problem across languages, which is one of the reasons to bother using Thrift.
But if you want to argue the other way, fine, let's file bugs against Java and
C# and remove the misleading type. (I would argue that `string` is the
misleading one and `binary` is the proper name for its behavior.)
(For what it's worth, Protocol Buffers defines `string` and `bytes` types,
corresponding to the behavior of `string` and `binary` in what we are calling
the "Java and C# way" here.)
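To make the distinction concrete, here is a minimal Python sketch of what I
mean by the "Java and C# way". It is not the actual Thrift library code: the
helper names (`write_string`, `read_string`, `write_binary`, `read_binary`)
and the 4-byte length prefix are assumptions chosen only for illustration.
The point is that `string` always goes through UTF-8 on both sides, while
`binary` passes raw bytes through untouched.

```python
import struct


def write_binary(payload: bytes) -> bytes:
    """Frame raw bytes with a 4-byte big-endian length prefix (illustrative)."""
    if not isinstance(payload, bytes):
        raise TypeError("binary fields take bytes, got %s" % type(payload).__name__)
    return struct.pack("!i", len(payload)) + payload


def read_binary(frame: bytes) -> bytes:
    """Return the raw payload of a frame produced by write_binary."""
    (length,) = struct.unpack("!i", frame[:4])
    return frame[4:4 + length]


def write_string(text: str) -> bytes:
    """Encode text as UTF-8, then frame it exactly like a binary field."""
    if not isinstance(text, str):
        raise TypeError("string fields take str, got %s" % type(text).__name__)
    return write_binary(text.encode("utf-8"))


def read_string(frame: bytes) -> str:
    """Decode a framed payload as UTF-8; invalid bytes raise UnicodeDecodeError."""
    return read_binary(frame).decode("utf-8")
```

With this split, `read_string(write_string(u"\u00e9"))` round-trips the text,
and arbitrary bytes that are not valid UTF-8 belong in a `binary` field
instead, so neither type behaves differently "sometimes".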
> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
> Key: THRIFT-395
> URL: https://issues.apache.org/jira/browse/THRIFT-395
> Project: Thrift
> Issue Type: Bug
> Components: Compiler (Python), Library (Python)
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Priority: Blocker
> Fix For: 0.1
>
> Attachments:
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch,
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch,
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch,
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch,
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done. So if a unicode object is passed
> to a (regular, non-binary) string, an exception is raised.