Hey, Aidan.

Give it another try?  After looking long and hard at our code, I  
discovered that the problem was in fact not in our code. :)  Perl's  
Encode library was outdated on our servers.  I verified your test  
case passed after upgrading Encode on a QA machine, and then we  
pushed the upgrade out to the rest of the site.  If we're lucky, you  
shouldn't encounter that problem ever again.  That's not to say you  
won't encounter other problems.  If you do, please let us know.

Also, thanks for the detailed problem report.  UTF-8 can be a tricky  
bastard, so the more information the better.  Especially since I've  
got to track these things down. :)

-- 
Rocco Caputo - [EMAIL PROTECTED]

On May 2, 2007, at 15:57, Aidan Kehoe wrote:

>
> On the second day of May, Joshua Schachter wrote:
>
> > Are you submitting non-utf8 stuff to begin with?
>
> This is the standard web interface. No code I’ve written is involved.
>
> If Firefox were submitting non-UTF-8 (because it thought the form took
> Latin 1, for example), the [Φ] would have been encoded as [&#934;] [1].
> Try pasting it (phi) into the form here:
>
> http://www.parhasard.net/latin-1-form.php
>
> to see that behaviour.
>
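To make the comparison concrete, here is a minimal Python sketch of the two submissions Aidan contrasts. It assumes that a browser treating the form as Latin-1 replaces characters outside Latin-1 with decimal numeric character references before percent-encoding (the `&#934;` spelled out in the footnote below); Python's `xmlcharrefreplace` error handler produces that same form. The helper name `form_encode` is hypothetical.

```python
from urllib.parse import quote_plus


def form_encode(value: str, charset: str) -> str:
    """Hypothetical helper: percent-encode a form value roughly the way a
    browser would for a form submitted in `charset`.

    Assumption: characters the charset cannot represent are first replaced
    with decimal numeric character references such as &#934;.
    """
    raw = value.encode(charset, errors="xmlcharrefreplace")
    # quote_plus accepts bytes and escapes everything except A-Z a-z 0-9 _.-~
    return quote_plus(raw)


print(form_encode("[Φ]", "utf-8"))    # %5B%CE%A6%5D
print(form_encode("[Φ]", "latin-1"))  # %5B%26%23934%3B%5D
```

The UTF-8 result is the `%5B%CE%A6%5D` that appears in the captured request, which is why the Latin-1 fallback can be ruled out here.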
> The tags from the POST string below:
>
> tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D+from-peter-t-daniels+pasta.cantbedone.org
>
> The non-alphanumeric-ASCII tags as hex:
>
> %5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D
>
> Some Emacs Lisp:
>
> (decode-coding-string "\x5B\xCE\xA6\x5D+\x5Bh\x5D+\x5B\xC3\xA7\x5D" 'utf-8)
> => "[Φ]+[h]+[ç]"
>
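The same check as the Emacs Lisp above can be sketched in Python. It percent-decodes the tag string to raw bytes and then decodes those bytes both as UTF-8 and, for contrast, as Latin-1. A Latin-1 misreading would give mojibake rather than the truncation being reported. The variable names are illustrative only.

```python
from urllib.parse import unquote_to_bytes

# Tag portion of the captured POST body. In form-encoded data '+' stands
# for a space, so translate it before percent-decoding.
encoded_tags = "%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D"
raw = unquote_to_bytes(encoded_tags.replace("+", " "))
print(raw)  # b'[\xce\xa6] [h] [\xc3\xa7]'

# Strict UTF-8 decode: raises UnicodeDecodeError if the bytes were not UTF-8.
as_utf8 = raw.decode("utf-8")      # '[Φ] [h] [ç]'

# Latin-1 decode never fails (every byte maps to U+0000..U+00FF),
# but here it produces mojibake.
as_latin1 = raw.decode("latin-1")  # '[Î¦] [h] [Ã§]'
```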
> I see the same behaviour with Firefox 2.0.0.3 on Mac OS X 10.4.9, and
> with Safari 2.0.4 on the same OS.
>
> If you can’t reproduce this, try looking at the same entry from a page
> you haven’t previously viewed today; as far as I can work out, the
> caching is pretty aggressive right now.
>
> Again, I’ve made a UTF-8 encoded version of this mail available here:
>
> http://parhasard.net/[EMAIL PROTECTED]
>
> Please look at that if you find anything unclear in this mail. I can
> guarantee that the file at that address is served as UTF-8, is encoded
> as UTF-8, and conveys what I intend, but guaranteeing that Yahoo Groups
> will not make my message contradict itself is beyond my powers.
>
> Bye,
> Aidan
>
> [1] That is, in case Yahoo chooses to do its own thing with this mail:
> left square bracket, ampersand, hash-mark-otherwise-known-as-North-American-pound-sign,
> digit nine, digit three, digit four, semicolon, right square bracket.
>
> > > -----Original Message-----
> > > From: [email protected]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of aeohek
> > > Sent: Wednesday, May 02, 2007 4:24 AM
> > > To: [email protected]
> > > Subject: [ydn-delicious] Latin 1, and only Latin 1, lost on tag edit
> > >
> > > Hi,
> > >
> > > When I edit the tags on:
> > >
> > > http://del.icio.us/url/4720413157054e2059cf355a64300a3c
> > >
> > > (my username is aidan; right now I'm the only person to have posted
> > > that URI), any non-ASCII Latin-1 character in a tag causes that tag to
> > > be truncated just before the first such character. This happens both
> > > with an Ajax edit and with a full-screen edit.
> > >
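The reported symptom can be stated as a precise check, which may help when turning a report like this into a regression test. This is a hypothetical Python sketch, not anything del.icio.us exposes: `first_latin1_index`, `predicted_under_bug` and `shows_bug` are illustrative names, and the model simply assumes the behaviour described above (the tag is cut just before its first character in U+0080..U+00FF, while tags such as [Φ] pass through unchanged).

```python
from typing import Optional


def first_latin1_index(tag: str) -> Optional[int]:
    """Index of the first non-ASCII Latin-1 character (U+0080..U+00FF), or None."""
    for i, ch in enumerate(tag):
        if 0x80 <= ord(ch) <= 0xFF:
            return i
    return None


def predicted_under_bug(tag: str) -> str:
    """Hypothetical model of the reported bug: keep only the part of the tag
    before its first non-ASCII Latin-1 character."""
    i = first_latin1_index(tag)
    return tag if i is None else tag[:i]


def shows_bug(sent: str, stored: str) -> bool:
    """True if `stored` is exactly the truncated form the report describes."""
    return sent != stored and stored == predicted_under_bug(sent)


# Examples taken from the report: [ç] -> [, español -> espa, [Φ] unaffected.
for sent, stored in [("[ç]", "["), ("español", "espa"), ("[Φ]", "[Φ]")]:
    print(ascii(sent), ascii(stored), shows_bug(sent, stored))
```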
> > > In detail: currently my tags are:
> > >
> > > japanese language [Φ] [h] [ç] from-peter-t-daniels pasta.cantbedone.org
> > >
> > > (You can't see them on the URL page right now; it appears to be
> > > cached. Try http://del.icio.us/tag/%5B%CE%A6%5D if the cache hasn't
> > > expired by the time you read this.) Φ is a Greek character, U+03A6;
> > > ç is a Latin-1 character, U+00E7. The rest of the characters are
> > > US-ASCII. The browser is Firefox 1.5.0.11 on Windows XP.
> > >
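The code points and encodings quoted above are easy to confirm directly. This short Python sketch prints each character's code point, its UTF-8 bytes and its Latin-1 byte (None when the character has no Latin-1 form), plus the percent-encoded path segment for the [Φ] tag.

```python
from urllib.parse import quote

for ch in ("Φ", "ç"):
    utf8 = ch.encode("utf-8")
    try:
        latin1 = ch.encode("latin-1")
    except UnicodeEncodeError:
        latin1 = None  # not representable in ISO-8859-1
    print(f"U+{ord(ch):04X}", utf8, latin1)
# U+03A6 b'\xce\xa6' None
# U+00E7 b'\xc3\xa7' b'\xe7'

# Percent-encoded form of the [Φ] tag as it would appear in a tag URL path.
print(quote("[Φ]"))  # %5B%CE%A6%5D
```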
> > > I edit those tags: I click on edit, then full-screen edit, and add
> > > hi-there as a tag, so that the displayed text is now:
> > >
> > > japanese language [Φ] [h] [ç] from-peter-t-daniels
> > > pasta.cantbedone.org hi-there
> > >
> > > I then click save, and it redirects away from that tag. But when I
> > > examine the URL details again, it no longer has a [ç] tag, but it has
> > > a new [ tag. It still has the [Φ] tag. The same happens when I edit
> > > other tags containing Latin 1, or when I create new entries with tags
> > > containing Latin 1.
> > >
> > > Using the Live HTTP Headers extension, I see that the POST request
> > > submitted was:
> > >
> > > POST /aidan/%5B%C3%A7%5D?779822
> > > url=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm
> > > &oldurl=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm
> > > &description=%5B%CE%A6%5D%2C+%5Bh%5D+in+Japanese
> > > &notes=Cute%3B+Japanese+went+through+a+historical+development+similar+to+the+%2Ff%2F
> > > &tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D+from-peter-t-daniels+pasta.cantbedone.org+
> > > &jump=no
> > > &date=2007-03-10T14%3A31%3A10Z
> > > &key=1540afd502e4ce4af3cb2bac8df225d1
> > >
> > > which, when URL-decoded and converted to UTF-8, gives this:
> > >
> > > POST /aidan?312757
> > > url=http://pasta.cantbedone.org/pages/PXXU7p.htm
> > > &oldurl=http://pasta.cantbedone.org/pages/PXXU7p.htm
> > > &description=[Φ],+[h]+in+Japanese
> > > &notes=Cute;+Japanese+went+through+a+historical+development+similar+to+the+/f/
> > > &tags=japanese+language+[Φ]+[h]+[ç]+from-peter-t-daniels+pasta.cantbedone.org+
> > > &jump=no
> > > &date=2007-03-10T14:31:10Z
> > > &key=1540afd502e4ce4af3cb2bac8df225d
> > >
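To check Aidan's point that the tags variable arrives intact, the captured body can be run through a standard form decoder. This Python sketch uses `urllib.parse.parse_qs`, which splits on `&`, turns `+` into spaces and decodes `%XX` escapes as UTF-8. It is only a client-side check; it says nothing about how the del.icio.us server actually parsed the request.

```python
from urllib.parse import parse_qs

# POST body copied from the captured request above (request line omitted).
body = (
    "url=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm"
    "&oldurl=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm"
    "&description=%5B%CE%A6%5D%2C+%5Bh%5D+in+Japanese"
    "&notes=Cute%3B+Japanese+went+through+a+historical+development"
    "+similar+to+the+%2Ff%2F"
    "&tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D"
    "+from-peter-t-daniels+pasta.cantbedone.org+"
    "&jump=no&date=2007-03-10T14%3A31%3A10Z"
    "&key=1540afd502e4ce4af3cb2bac8df225d1"
)

# errors="strict" makes any invalid UTF-8 raise instead of being replaced.
fields = parse_qs(body, encoding="utf-8", errors="strict")
tags = fields["tags"][0].split()  # del.icio.us tags are space-separated
print(ascii(tags))
```

Every tag, including `[ç]`, comes through this decoding unchanged, which fits the conclusion that the truncation happened on the server side.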
> > > Now, the tags CGI variable is correct there, so this seems to be a
> > > server-side problem. I can work around it by renaming the tag [ to
> > > [ç], or espa to español.
> > >
> > > I've made a UTF-8 encoded version of this email at
> > > http://www.parhasard.net/del.icio.us-latin-1-problem.txt ,
> > > since Yahoo Groups appears to prefer to treat it as Latin 1.
> > >
> > > Best regards, and please tell me if I should report this somewhere else.
> > >
> > > Aidan
