[ https://issues.apache.org/jira/browse/THRIFT-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jens Geyer resolved THRIFT-5303.
--------------------------------
    Fix Version/s: 0.14.0
         Assignee: Quanlong Huang
       Resolution: Fixed

> Unicode decode errors in _fast_decode
> -------------------------------------
>
>                 Key: THRIFT-5303
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5303
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>    Affects Versions: 0.11.0
>         Environment: Ubuntu 16.04.6 LTS
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: 0.14.0
>
>
> Impala currently uses thrift-0.11.0 on the client side and thrift-0.9.3 on the
> server side (the server-side upgrade is blocked by other issues). We hit an
> error when decoding UTF-8 bytes on the client side: the result contains a
> partial UTF-8 code point, and Thrift does not handle the decode error
> gracefully. The stack trace:
> {code:java}
> Traceback (most recent call last):
>   File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1210, in _do_beeswax_rpc
>     ret = rpc()
>   File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1113, in <lambda>
>     self.fetch_size))
>   File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 254, in fetch
>     return self.recv_fetch()
>   File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 275, in recv_fetch
>     result.read(iprot)
>   File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 1410, in read
>     iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 3: unexpected end of data
> {code}
> This is similar to THRIFT-2087, but here the error occurs at the boundary
> between the Python and C++ code. As in THRIFT-2087, we need to provide
> error-handling behavior for decoding UTF-8 bytes in
> {{TBinaryProtocolAccelerated._fast_decode}}. The relevant code is at
> [https://github.com/apache/thrift/blob/0.11.0/lib/py/src/ext/protocol.tcc#L708]:
> {code:c++}
>   case T_STRING: {
>     char* buf = NULL;
>     int len = impl()->readString(&buf);
>     if (len < 0) {
>       return NULL;
>     }
>     if (isUtf8(typeargs)) {
>       return PyUnicode_DecodeUTF8(buf, len, 0);  // <--- needs an error handling method here
>     } else {
>       return PyBytes_FromStringAndSize(buf, len);
>     }
>   }
> {code}
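
For illustration, the failure mode can be reproduced in pure Python, outside the C++ extension. This is only a sketch and not the change that went into Thrift: the payload and the decode_lenient helper are hypothetical, and Python 3 is assumed. Passing 0 (NULL) as the third argument of PyUnicode_DecodeUTF8 selects strict error handling, which corresponds to the plain decode() call here; the "replace" and "ignore" handlers show the kind of behavior an error-handling option could select instead.

```python
# Hypothetical payload: "abc" followed by only the first byte (0xe6) of a
# three-byte UTF-8 sequence, i.e. a partial code point at the end of the data.
truncated = b"abc" + "\u6c49".encode("utf-8")[:1]


def decode_lenient(data, errors="replace"):
    """Hypothetical helper: decode UTF-8 using a non-strict error handler."""
    return data.decode("utf-8", errors)


# Strict decoding (the default) raises on the partial code point.
try:
    truncated.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

print(strict_error)
print(repr(decode_lenient(truncated)))            # malformed tail becomes U+FFFD
print(repr(decode_lenient(truncated, "ignore")))  # malformed tail is dropped
```

Which handler is appropriate depends on the caller: "replace" keeps a visible marker (U+FFFD) where the data was malformed, while "ignore" silently drops the bad bytes.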



--
This message was sent by Atlassian Jira
(v8.3.4#803005)