One problem that often bites me: someone forgets to include the UTF-8 charset in the Content-Type header. Omitting it can mangle the handling of high-byte characters.
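To illustrate the kind of mangling meant here, a minimal Python sketch follows. It takes the three bytes quoted later in this thread (226,128,153) and decodes them twice: once as UTF-8, and once as Windows-1252. The Windows-1252 fallback is an assumption chosen purely for illustration; which encoding a given client or server actually falls back to when no charset is declared is not stated in this thread.

```python
# The three bytes quoted in this thread for the right single quote.
raw = bytes([226, 128, 153])

# Decoded as UTF-8, the three bytes form one character: U+2019.
as_utf8 = raw.decode("utf-8")

# Decoded as Windows-1252 (an assumed wrong guess, for illustration only),
# every byte turns into a separate character.
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8), len(as_utf8))
print(repr(as_cp1252), len(as_cp1252))
```

The same bytes yield one character under UTF-8 and three unrelated characters under the single-byte encoding, which is the typical symptom when sender and receiver disagree about the charset.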
When setting your Content-Type with curl, this is typically done like so:

    curl -H "Content-Type: application/json; charset=utf-8" ....

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International

On Jun 8, 2011, at 9:35 AM, Paul Davis wrote:

> On Wed, Jun 8, 2011 at 12:32 PM, MK <[email protected]> wrote:
>> Is there any intention to fix couch's handling of "unusual" unicode
>> characters? One of the "unusual" characters is the right single quote
>> (226,128,153) which is a valid utf8 character and also not very
>> "unusual" IMO.
>>
>> I have an interface which allows users to add and edit text in a db
>> document (again, not very unusual) and this one came up because of
>> someone cutting and pasting some text from a source which used the
>> right single quote as an apostrophe (which is just plain common -- in
>> fact they are used in the online "Definitive Guide").
>>
>> So I am having to maintain a switch statement which filters out these
>> characters and replaces them with html entities before they get sent
>> to couch, which is okay in my case since the documents are just being
>> used as html pages anyway.
>>
>> But it's an awkward and unnecessary solution: individual
>> developers should not have to be dealing with this, proper utf8
>> handling should be hard coded into couch. For one thing, it means that
>> anyone worried about such "unusual" possibilities cannot use
>> couchapp or couch directly -- data has to be filtered first server side.
>> Although spidermonkey handles utf8 fine, depending on client side
>> filtering is not always an alternative.
>>
>> Sincerely, MK
>>
>> --
>> "Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
>> "The angel of history[...]is turned toward the past." (Walter Benjamin)
>>
>>
>
> What version of CouchDB are you using and what does an actual request look like?
>
> A recent check on trunk shows both decoders handle your case fine:
>
> 1> mochijson2:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 2> ejson:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 3>
