Thanks Jim, nice tip which I was not aware of!
A+ Dave

On 9 June 2011 07:28, Jim Klo <[email protected]> wrote:

> One problem that often bites me - someone forgets to include the UTF-8
> charset in the Content-Type header. Missing that can often mangle the
> handling of high byte characters.
>
> When setting your Content-Type with curl this is often done something like:
>
> curl -H "Content-Type: application/json; charset=utf-8" ....
>
> Jim Klo
> Senior Software Engineer
> Center for Software Engineering
> SRI International
>
> On Jun 8, 2011, at 9:35 AM, Paul Davis wrote:
>
> On Wed, Jun 8, 2011 at 12:32 PM, MK <[email protected]> wrote:
>
> > Is there any intention to fix couch's handling of "unusual" unicode
> > characters? One of the "unusual" characters is the right single quote
> > (226,128,153) which is a valid utf8 character and also not very
> > "unusual" IMO.
> >
> > I have an interface which allows users to add and edit text in a db
> > document (again, not very unusual) and this one came up because of
> > someone cutting and pasting some text from a source which used the
> > right single quote as an apostrophe (which is just plain common -- in
> > fact they are used in the online "Definitive Guide").
> >
> > So I am having to maintain a switch statement which filters out these
> > characters and replaces them with html entities before they get sent
> > to couch, which is okay in my case since the documents are just being
> > used as html pages anyway.
> >
> > But it's an awkward and unnecessary solution: individual
> > developers should not have to be dealing with this, proper utf8
> > handling should be hard coded into couch. For one thing, it means that
> > anyone worried about such "unusual" possibilities cannot use
> > couchapp or couch directly -- data has to be filtered first server side.
> > Although spidermonkey handles utf8 fine, depending on client side
> > filtering is not always an alternative.
> >
> > Sincerely, MK
> >
> > --
> > "Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
> > "The angel of history[...]is turned toward the past." (Walter Benjamin)
>
> What version of CouchDB are you using and what is an actual request look
> like?
>
> A recent check on trunk shows both decoders handle your case fine:
>
> 1> mochijson2:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 2> ejson:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 3>
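For anyone sending documents from a script rather than curl, here is a minimal sketch of Jim's tip, assuming Python 3 and only its standard json and urllib.request modules. It checks that the right single quote (U+2019) really is the three bytes 226,128,153 mentioned above, then builds (but does not send) a PUT whose Content-Type carries the UTF-8 charset. The database name "mydb" and document id "mydoc" in the URL are hypothetical placeholders; 5984 is CouchDB's default port.

```python
import json
import urllib.request

# U+2019 RIGHT SINGLE QUOTATION MARK, the character from MK's report.
RIGHT_QUOTE = "\u2019"

# Its UTF-8 encoding is the byte sequence quoted in the thread.
assert list(RIGHT_QUOTE.encode("utf-8")) == [226, 128, 153]


def build_put(url: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT carrying the document as raw UTF-8 JSON."""
    # ensure_ascii=False keeps U+2019 as its three UTF-8 bytes instead of
    # escaping it, so the charset in Content-Type describes the real body.
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


# Hypothetical database "mydb" and document id "mydoc" on CouchDB's default port.
req = build_put("http://localhost:5984/mydb/mydoc", {"text": "it" + RIGHT_QUOTE + "s"})

# urllib normalises header names with str.capitalize(), hence "Content-type".
print(req.get_header("Content-type"))  # application/json; charset=utf-8
print(req.data)                        # b'{"text": "it\xe2\x80\x99s"}'

# Default ensure_ascii=True escapes the character instead: pure ASCII body.
print(json.dumps({"text": "it" + RIGHT_QUOTE + "s"}))  # {"text": "it\u2019s"}
```

Note that json.dumps escapes non-ASCII characters as \u2019 by default (ensure_ascii=True). In that case the body is plain ASCII and a missing or wrong charset cannot mangle it; ensure_ascii=False is used in the sketch only to reproduce the raw-UTF-8 case Jim describes.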
