Thanks for the report. I filed KUDU-2845 to track the issue.
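
Until that is fixed, a possible stopgap on the consumer side is to decode the ksck output with Python's `surrogateescape` error handler, so that bytes which are not valid UTF-8 survive JSON parsing and can be turned back into the original bytes afterwards. This is only a minimal sketch: the helper names and the `partition_key` field in the sample are made up for illustration and are not taken from the real ksck schema.

```python
import json


def load_ksck_json(raw: bytes):
    """Parse JSON bytes that may contain raw non-UTF-8 bytes inside strings.

    Undecodable bytes are mapped to lone surrogates (U+DC80..U+DCFF)
    instead of raising UnicodeDecodeError.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return json.loads(text)


def to_bytes(value: str) -> bytes:
    """Recover the original bytes of a string parsed by load_ksck_json."""
    return value.encode("utf-8", errors="surrogateescape")


# Hypothetical sample: an escaped zero byte followed by two raw bytes
# (0x80, 0xff) that are not valid UTF-8, as in the reported output.
sample = b'{"partition_key": "\\u0000\x80\xffxyz"}'

doc = load_ksck_json(sample)
key_bytes = to_bytes(doc["partition_key"])
print(key_bytes)
```

In the real case, `raw` would be the content of kudu.json read with `mode='rb'`. Strings recovered this way should be treated as opaque bytes by the monitoring code rather than printed as text.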

On Tue, Jun 11, 2019 at 9:44 AM Todd Lipcon <t...@cloudera.com> wrote:
>
> I guess the issue is that we use rapidjson's 'String' support to write out
> C++ strings, which are binary data, not valid UTF8. That's somewhat incorrect
> of us, and we should be base64-encoding such binary data.
>
> Fixing this is a little bit incompatible, but for something like partition
> keys I think we probably should do it anyway and release note it, considering
> partition keys are quite likely to be invalid UTF8.
>
> -Todd
>
> On Tue, Jun 11, 2019 at 6:08 AM Pavel Martynov <mr.xk...@gmail.com> wrote:
>>
>> Hi, guys!
>>
>> We trying to use an output of "kudu cluster ksck master -ksck_format
>> json_compact" for integration with our monitoring system and hit a little
>> strange. Some part of output can't be read as UTF-8 with Python 3:
>> $ kudu cluster ksck master -ksck_format json_compact > kudu.json
>> $ python
>> with open('kudu.json', mode='rb') as file:
>>     bs = file.read()
>> bs.decode('utf-8')
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 705196:
>> invalid start byte
>>
>> There how SublimeText shows this block of text:
>> https://yadi.sk/i/4zpWKZ37iP8OEA
>> As you can see kudu tool encodes zeros as \u0000, but don't encode some
>> other non-text bytes.
>>
>> What do you think about it?
>>
>> --
>> with best regards, Pavel Martynov
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera