[ https://issues.apache.org/jira/browse/CASSANDRA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107129#comment-15107129 ]
Paulo Motta commented on CASSANDRA-11030:
-----------------------------------------

There are two issues at play here. The first is that the default Windows terminal encoding is not {{utf-8}}, so in order to display or input {{utf-8}} characters you must set the terminal encoding (code page, in Windows nomenclature) to {{cp65001}} by issuing the command {{chcp 65001}} before starting cqlsh. The second issue is that there is no codec for {{cp65001}} in Python < 3.3 (this was fixed in issue [13216|https://bugs.python.org/issue13216] in Python [3.3+|https://docs.python.org/dev/whatsnew/3.3.html#codecs]). A known workaround is to register a copy of the {{utf-8}} codec to encode/decode {{cp65001}}.

So, if the platform is native Windows (the issue doesn't happen on Cygwin) and the encoding is set to {{utf-8}} but the terminal encoding is not {{cp65001}}, a warning is printed asking the user to change the console code page to {{cp65001}} to support {{utf-8}} encoding. Furthermore, if {{cp65001}} is the default encoding and the Python version is less than 3.3, the {{utf-8}} codec is registered as {{cp65001}}.
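For illustration only, the two checks described above could be sketched roughly as follows. This is not the code from the linked branches: the function names {{needs_codepage_warning}} and {{register_cp65001_alias}} are hypothetical, and treating {{sys.platform == 'win32'}} as "native Windows" is an assumption.

```python
import codecs


def needs_codepage_warning(platform, requested_encoding, console_encoding):
    """Return True when the user should be told to run 'chcp 65001'.

    Hypothetical helper: `platform` is expected to look like sys.platform,
    where native Windows reports 'win32' and Cygwin reports 'cygwin'.
    """
    if platform != 'win32':
        return False
    wants_utf8 = requested_encoding.lower().replace('_', '-') in ('utf-8', 'utf8')
    console_is_utf8 = (console_encoding or '').lower() == 'cp65001'
    return wants_utf8 and not console_is_utf8


def register_cp65001_alias():
    """Make 'cp65001' resolve to the utf-8 codec when Python lacks it.

    Returns True if an alias was registered, False if the interpreter
    already knows the codec and nothing had to be done.
    """
    try:
        codecs.lookup('cp65001')
        return False
    except LookupError:
        pass

    def search(name):
        # Codec search functions return None for names they do not handle.
        if name.lower() == 'cp65001':
            return codecs.lookup('utf-8')
        return None

    codecs.register(search)
    return True
```

A caller would presumably pass {{sys.platform}}, the value of {{--encoding}}, and {{sys.stdout.encoding}} to the first function, and call the second one once at startup before any console I/O is decoded.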
||2.2||3.0||3.3||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-11030]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-11030]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.3...pauloricardomg:3.3-11030]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11030]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.3-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11030-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.3-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11030-dtest/lastCompletedBuild/testReport/]|

Below is a sample execution with different encoding variations (default vs utf-8/cp65001):

{noformat}
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;
bla
--------------
joπo ßlcides
bla
nπoτ
(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';
bla
-----
(0 rows)
cqlsh> exit;

C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat --encoding utf-8
WARNING: console codepage must be set to cp65001 to support utf-8 encoding on Windows platforms.
If you experience encoding problems, change your console codepage with 'chcp 65001' before starting cqlsh.
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;
bla
--------------
jo├úo ├ílcides
bla
n├úo├º
(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';
Traceback (most recent call last):
  File "C:\Users\Paulo\Repositories\cassandra\bin\\cqlsh.py", line 1044, in get_input_line
    self.lastcmd = raw_input(prompt).decode(self.encoding)
  File "C:\tools\python2\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x87 in position 39: invalid start byte
WARNING: console codepage must be set to cp65001 to support utf-8 encoding on Windows platforms.
If you experience encoding problems, change your console codepage with 'chcp 65001' before starting cqlsh.
cqlsh> exit;

C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> chcp 65001
Active code page: 65001

C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat --encoding utf-8
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;
bla
--------------
joão álcides
bla
nãoç
(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';
bla
------
nãoç
(1 rows)
cqlsh> insert into bla.test (bla ) VALUES ( 'ãnothér' );
cqlsh> select * from bla.test where bla = 'ãnothér';
bla
---------
ãnothér
(1 rows)
cqlsh> exit;
{noformat}

[~Stefania] would you mind reviewing? Would you have a Windows10 box to test it? I tested only on win7 and it works correctly.

> non-ascii characters incorrectly displayed/inserted on cqlsh on Windows
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-11030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11030
> Project: Cassandra
> Issue Type: Bug
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Minor
> Labels: cqlsh, windows
>
> {noformat}
> C:\Users\Paulo\Repositories\cassandra [2.2-10948 +6 ~1 -0 !]> .\bin\cqlsh.bat --encoding utf-8
> Connected to test at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
> Use HELP for help.
> cqlsh> INSERT INTO bla.test (bla ) VALUES ('não') ;
> cqlsh> select * from bla.test;
> bla
> -----
> n?o
> (1 rows)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)