[ 
https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693976#action_12693976
 ] 

Jonathan Ellis commented on THRIFT-395:
---------------------------------------

You are right, I didn't know the history here.  But citing the (outdated) 
whitepaper which you have already violated when convenient isn't a very 
convincing argument.

Both the old way ("there is no string, only binary") and the Java and C# way 
(strings are UTF-8; binary is byte[]) are self-consistent and make sense.  But 
"there is string, and binary, and sometimes the former is UTF-8-encoded, but not 
always" is not.

Personally I think the Java / C# way is better, since it solves a common 
problem across languages, which is one of the reasons to bother using Thrift.  
But if you want to argue the other way, fine: let's file bugs against Java and 
C# and remove the misleading type.  (I would argue that `string` is the 
misleading one and `binary` is the proper name for its behavior.)

(For what it's worth, Protocol Buffers defines `string` and `bytes` types, 
corresponding to the behavior of `string` and `binary` in what we are calling 
the "Java and C# way" here.)
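
To make the "Java and C# way" concrete, here is a rough sketch of those 
semantics in Python.  The helper names (write_string, write_binary, 
read_string, read_binary) are hypothetical and are not the Thrift library's 
API.  The 4-byte big-endian length prefix and the use of Python 3 str/bytes 
are assumptions made for illustration only.

```python
import struct

# Hypothetical helpers for illustration; not part of the Thrift library.

def write_binary(value):
    """Frame raw bytes with an assumed 4-byte length prefix; no encoding."""
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError("binary fields take bytes, got %s" % type(value).__name__)
    return struct.pack("!i", len(value)) + bytes(value)

def write_string(value):
    """Encode text as UTF-8, then frame it exactly like binary."""
    if not isinstance(value, str):
        raise TypeError("string fields take text, got %s" % type(value).__name__)
    return write_binary(value.encode("utf-8"))

def read_binary(buf):
    """Return the raw payload bytes untouched."""
    (length,) = struct.unpack("!i", buf[:4])
    return buf[4:4 + length]

def read_string(buf):
    """Return the payload decoded from UTF-8 as text."""
    return read_binary(buf).decode("utf-8")
```

On the wire the two types look identical; the only difference is that 
`string` always goes through UTF-8 at the edges and `binary` never does.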

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, 
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings 
> -- no encoding/decoding to UTF-8 is done.  So if a unicode object is passed 
> to a (regular, non-binary) string, an exception is raised.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
