On Wed, Jun 8, 2011 at 12:32 PM, MK <[email protected]> wrote:
> Is there any intention to fix couch's handling of "unusual" unicode
> characters? One of the "unusual" characters is the right single quote
> (226,128,153) which is a valid utf8 character and also not very
> "unusual" IMO.
>
> I have an interface which allows users to add and edit text in a db
> document (again, not very unusual) and this one came up because of
> someone cutting and pasting some text from a source which used the
> right single quote as an apostrophe (which is just plain common -- in
> fact they are used in the online "Definitive Guide").
>
> So I am having to maintain a switch statement which filters out these
> characters and replaces them with html entities before they get sent
> to couch, which is okay in my case since the documents are just being
> used as html pages anyway.
>
> But it's an awkward and unnecessary solution: individual
> developers should not have to be dealing with this, proper utf8
> handling should be hard coded into couch. For one thing, it means that
> anyone worried about such "unusual" possibilities cannot use
> couchapp or couch directly -- data has to be filtered first server side.
> Although spidermonkey handles utf8 fine, depending on client side
> filtering is not always an alternative.
>
> Sincerely, MK
>
> --
> "Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
> "The angel of history[...]is turned toward the past." (Walter Benjamin)
>

What version of CouchDB are you using, and what does an actual request
look like? A recent check on trunk shows both decoders handle your case
fine:

1> mochijson2:decode(<<"\"", 226,128,153, "\"">>).
<<226,128,153>>
2> ejson:decode(<<"\"", 226,128,153, "\"">>).
<<226,128,153>>
3>
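
To help narrow down where the character gets mangled, it may be worth checking the bytes on the client side before they ever reach couch. The following is only a rough sketch in Python using the standard library; the variable names are made up for illustration, and it says nothing about what CouchDB itself does with the request. It just confirms that 226,128,153 is the UTF-8 encoding of U+2019 and that a JSON string containing it survives a decode/encode round trip unchanged.

```python
import json

# The three bytes from the original report: UTF-8 for U+2019
# (RIGHT SINGLE QUOTATION MARK).
raw = bytes([226, 128, 153])
char = raw.decode("utf-8")

# A JSON string body as a client might send it, with the character
# left unescaped (hypothetical payload, for illustration only).
body = b'"' + raw + b'"'
decoded = json.loads(body.decode("utf-8"))

# Re-encode without forcing ASCII escapes, so the bytes stay the same.
encoded = json.dumps(decoded, ensure_ascii=False).encode("utf-8")

print(hex(ord(char)), decoded == char, encoded == body)
```

If a check like this passes but the stored document still comes back wrong, the actual HTTP request (headers and body) would be the next thing to look at.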
