Thanks for the report. I filed KUDU-2845 to track the issue.

On Tue, Jun 11, 2019 at 9:44 AM Todd Lipcon <t...@cloudera.com> wrote:
>
> I guess the issue is that we use rapidjson's 'String' support to write out 
> C++ strings, which are binary data, not valid UTF8. That's somewhat incorrect 
> of us, and we should be base64-encoding such binary data.
>
> Fixing this is a little bit incompatible, but for something like partition 
> keys I think we probably should do it anyway and release note it, considering 
> partition keys are quite likely to be invalid UTF8.
>
> -Todd
>
> On Tue, Jun 11, 2019 at 6:08 AM Pavel Martynov <mr.xk...@gmail.com> wrote:
>>
>> Hi, guys!
>>
>> We are trying to use the output of "kudu cluster ksck master -ksck_format 
>> json_compact" for integration with our monitoring system and hit something 
>> a little strange. Part of the output can't be read as UTF-8 with Python 3:
>> $ kudu cluster ksck master -ksck_format json_compact > kudu.json
>> $ python
>> with open('kudu.json', mode='rb') as file:
>>   bs = file.read()
>>   bs.decode('utf-8')
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 705196: 
>> invalid start byte
>>
>> Here is how Sublime Text shows this block of text: 
>> https://yadi.sk/i/4zpWKZ37iP8OEA
>> As you can see, the kudu tool encodes zeros as \u0000 but doesn't encode some 
>> other non-text bytes.
>>
>> What do you think about it?
>>
>> --
>> with best regards, Pavel Martynov
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
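
For anyone who needs to consume the current ksck JSON output before a fix lands, here is a minimal Python 3 sketch of one possible workaround. It assumes the raw bytes are otherwise well-formed JSON and that only string values contain non-UTF-8 bytes. The field name "partition_key" and the helper names are hypothetical, chosen for illustration, and are not taken from the actual ksck schema. The last helper shows roughly what a base64-encoded value, as suggested above, would look like to a client.

```python
import base64
import json


def load_lenient(raw: bytes):
    """Parse JSON bytes that may contain non-UTF-8 bytes inside strings.

    'surrogateescape' maps each undecodable byte to a lone surrogate
    code point instead of raising UnicodeDecodeError.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return json.loads(text)


def to_bytes(value: str) -> bytes:
    """Recover the original bytes of a string parsed by load_lenient."""
    return value.encode("utf-8", errors="surrogateescape")


def to_base64(value: str) -> str:
    """Render the recovered bytes as ASCII-safe base64 text."""
    return base64.b64encode(to_bytes(value)).decode("ascii")


# Hypothetical sample: one raw 0x80 byte followed by two JSON-escaped
# control characters, similar in shape to the reported output.
sample = b'{"partition_key": "\x80\\u0000\\u0001"}'
doc = load_lenient(sample)
key_bytes = to_bytes(doc["partition_key"])
key_b64 = to_base64(doc["partition_key"])
```

The surrogate escapes round-trip, so the binary key can be recovered exactly, but the parsed strings should be converted back to bytes (or base64) before being written anywhere that expects valid UTF-8.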
