[ https://issues.apache.org/jira/browse/IMPALA-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-11325.
------------------------------------
    Fix Version/s: Impala 4.2.0
         Assignee: Joe McDonnell
       Resolution: Fixed

> Impala-shell hits UnicodeDecodeError when outputting Unicode via --output_file
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-11325
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11325
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Clients
>    Affects Versions: Impala 4.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>             Fix For: Impala 4.2.0
>
>
> When running impala-shell and trying to output Unicode to a file via
> --output_file, it fails:
> {noformat}
> ishell -B -q "select '引'" --output_file=joetest3.txt
> /home/joe/view2/Impala/shell/option_parser.py:359: UnicodeWarning: Unicode
> equal comparison failed to convert both arguments to Unicode - interpreting
> them as being unequal
>   if '--live_progress' in sys.argv and '--disable_live_progress' in sys.argv:
> /home/joe/view2/Impala/shell/option_parser.py:363: UnicodeWarning: Unicode
> equal comparison failed to convert both arguments to Unicode - interpreting
> them as being unequal
>   if '--strict_hs2_protocol' in sys.argv:
> /home/joe/view2/Impala/shell/option_parser.py:369: UnicodeWarning: Unicode
> equal comparison failed to convert both arguments to Unicode - interpreting
> them as being unequal
>   if '--verbose' in sys.argv and '--quiet' in sys.argv:
> Starting Impala Shell with no authentication using Python 2.7.16
> Warning: live_progress only applies to interactive shell sessions, and is
> being skipped for now.
> Opened TCP connection to localhost:21050
> Connected to localhost:21050
> Server version: impalad version 4.1.0-SNAPSHOT DEBUG (build
> 4236c307b971881a3b1d85068db5b053a9c34cfa)
> Query: select '引'
> Query submitted at: 2022-05-31 08:31:50 (Coordinator:
> http://joemcdonnell:25000)
> Query progress can be monitored at:
> http://joemcdonnell:25000/query_plan?query_id=2347462fe8a18544:bbeedc1800000000
> UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0:
> ordinal not in range(128)
> Please check for columns containing binary data to find the possible source
> of the error.
> Could not execute command: select '引'
> {noformat}
> This is specific to file output. The same query works when outputting to the
> console.
> This line seems to be the problem:
> {noformat}
> with open(self.filename, 'ab') as out_file:
>   # Note that instances of this class do not persist, so it's fine to
>   # close the file handle after each write.
>   out_file.write(formatted_data.encode('utf-8'))  # file opened in binary mode <--------
>   out_file.write(b'\n')
> {noformat}
> [https://github.com/apache/impala/blob/master/shell/shell_output.py#L115]
> It seems to work if we remove the .encode('utf-8').
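
The traceback is consistent with formatted_data already being a UTF-8 byte string under Python 2.7: calling .encode('utf-8') on a byte string makes Python 2 first decode it implicitly with the 'ascii' codec, which fails on byte 0xe5 (the first byte of '引' in UTF-8). The sketch below shows one way to guard against that by encoding only when the value is text. The helper names to_utf8_bytes and append_line are hypothetical and are not taken from shell_output.py; this is an illustration under that assumption, not the change that was committed for this issue.

```python
# -*- coding: utf-8 -*-
import io


def to_utf8_bytes(data):
    """Return data as UTF-8 bytes, encoding only when it is text.

    Hypothetical helper for illustration; not part of impala-shell.
    """
    if isinstance(data, bytes):
        # Already encoded. On Python 2, bytes is str, and calling
        # .encode('utf-8') here would trigger an implicit ASCII decode
        # that raises UnicodeDecodeError on non-ASCII bytes.
        return data
    return data.encode('utf-8')


def append_line(filename, formatted_data):
    """Append one line to filename, which is opened in binary mode."""
    with io.open(filename, 'ab') as out_file:
        out_file.write(to_utf8_bytes(formatted_data))
        out_file.write(b'\n')
```

With this guard, append_line accepts either a text string or an already-encoded byte string, so the same call works regardless of which type the formatter produces.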