Re: when will utf8 handling be fixed?

Paul Davis Wed, 08 Jun 2011 09:37:30 -0700

On Wed, Jun 8, 2011 at 12:32 PM, MK <[email protected]> wrote:
> Is there any intention to fix couch's handling of "unusual" unicode
> characters?  One of the "unusual" characters is the right single quote
> (226,128,153) which is a valid utf8 character and also not very
> "unusual" IMO.
>
> I have an interface which allows users to add and edit text in a db
> document (again, not very unusual) and this one came up because of
> someone cutting and pasting some text from a source which used the
> right single quote as an apostrophe (which is just plain common -- in
> fact they are used in the online "Definitive Guide").
>
> So I am having to maintain a switch statement which filters out these
> characters and replaces them with html entities before they get sent
> to couch, which is okay in my case since the documents are just being
> used as html pages anyway.
>
> But it's an awkward and unnecessary solution: individual
> developers should not have to be dealing with this, proper utf8
> handling should be hard coded into couch.   For one thing, it means that
> anyone worried about such "unusual" possibilities cannot use
> couchapp or couch directly -- data has to be filtered first server side.
> Although spidermonkey handles utf8 fine, depending on client side
> filtering is not always an alternative.
>
> Sincerely, MK
>
> --
> "Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
> "The angel of history[...]is turned toward the past." (Walter Benjamin)
>
>


What version of CouchDB are you using, and what does an actual request look like?

A recent check on trunk shows both decoders handle your case fine:

1> mochijson2:decode(<<"\"", 226,128,153, "\"">>).
<<226,128,153>>
2> ejson:decode(<<"\"", 226,128,153, "\"">>).
<<226,128,153>>
3>
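
For anyone who wants to repeat the byte-level check outside an Erlang shell, here is a rough sketch in Python. It is not CouchDB code and says nothing about the HTTP layer; it only confirms that the bytes 226,128,153 are the UTF-8 encoding of U+2019 and that they survive a JSON string decode unchanged. The names RIGHT_SINGLE_QUOTE_UTF8 and roundtrip_json_string are made up for this illustration.

```python
import json

# The three bytes from the report. Assumption for this sketch: they are
# meant as the UTF-8 encoding of U+2019 (right single quotation mark).
RIGHT_SINGLE_QUOTE_UTF8 = bytes([226, 128, 153])


def roundtrip_json_string(raw: bytes) -> bytes:
    """Wrap raw UTF-8 bytes in JSON quotes, decode the JSON string,
    and return the UTF-8 bytes of the decoded value."""
    payload = b'"' + raw + b'"'
    decoded = json.loads(payload.decode("utf-8"))
    return decoded.encode("utf-8")


if __name__ == "__main__":
    text = RIGHT_SINGLE_QUOTE_UTF8.decode("utf-8")
    print(hex(ord(text)))  # code point of the decoded character
    print(list(roundtrip_json_string(RIGHT_SINGLE_QUOTE_UTF8)))
```

If the bytes come back different from what was sent, the next thing to look at is the request itself (headers and body encoding) rather than the JSON decoder.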
