Loke created COUCHDB-3173:
-----------------------------

Summary: Views return corrupt data for text fields containing non-BMP characters
Key: COUCHDB-3173
URL: https://issues.apache.org/jira/browse/COUCHDB-3173
Project: CouchDB
Issue Type: Bug
Components: View Server Support
Reporter: Loke

When inserting a non-BMP character (i.e. a character with a Unicode code point above {{U+FFFF}}), the content is corrupted when it is read back from a view. Every instance of such a character is returned with an appended {{U+FFFD REPLACEMENT CHARACTER}}.

To reproduce, use the following commands.

Create a document containing a field with the character {{U+1F604 SMILING FACE WITH OPEN MOUTH AND SMILING EYES}}:

{noformat}
$ curl -X PUT -d '{"type":"foo","value":"😄"}' http://localhost:5984/foo/foo2
{"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"}
{noformat}

Get the document to ensure that it was saved properly:

{noformat}
$ curl -X GET http://localhost:5984/foo/foo2
{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"}
{noformat}

Create a view that will return that document:

{noformat}
$ curl --user user:password -X PUT -d '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}' http://localhost:5984/foo/_design/bugdemo
{"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"}
{noformat}

Get the document from the view:

{noformat}
$ curl -X GET http://localhost:5984/foo/_design/bugdemo/_view/v
{"total_rows":1,"offset":0,"rows":[
{"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄�"}}
]}
{noformat}

The field {{value}} now contains two characters: the original character followed by {{U+FFFD}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
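
For reference, here is a minimal JavaScript sketch of what "non-BMP" means for the test character, and of one way to check a returned string for the stray {{U+FFFD}}. It only illustrates the UTF-16 representation of {{U+1F604}} and does not claim to identify where the corruption happens. The helper names {{codeUnits}} and {{hasReplacementChar}} are hypothetical and not part of CouchDB.

```javascript
// U+1F604 lies above U+FFFF, so UTF-16 (the encoding of JavaScript
// strings) represents it as a surrogate pair of two code units.
const smiley = String.fromCodePoint(0x1F604);

// Hypothetical helper: list the UTF-16 code units of a string as hex.
function codeUnits(s) {
  const units = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units.join(" ");
}

// Hypothetical helper: true if the string contains U+FFFD.
function hasReplacementChar(s) {
  return s.includes("\uFFFD");
}

console.log(codeUnits(smiley));                      // D83D DE04
console.log(hasReplacementChar(smiley));             // false
console.log(hasReplacementChar(smiley + "\uFFFD"));  // true
```

The last call mimics the value returned by the view in the report, where the original character is followed by {{U+FFFD}}.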