2010/7/23 Ian Fette (イアンフェッティ) <ife...@google.com>:
> http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization
> lists some interesting cases we've come across on the anti-phishing team in
> Google. To the extent you're concerned with / interested in
> canonicalization, it may be worth taking a look at (not to suggest you
> follow that in determining how to parse/canonicalize URLs, but rather to
> make sure that you have some "correct" way of handling the listed URLs).

Thanks. That's helpful.

> BTW, are you covering canonicalization?

Yes. The three main things I'm hoping to cover are parsing,
canonicalization, and resolving relative URLs.

Adam

> On Fri, Jul 23, 2010 at 9:02 PM, Boris Zbarsky <bzbar...@mit.edu> wrote:
>> On 7/23/10 11:59 PM, Silvia Pfeiffer wrote:
>>> Is that URLs as values of attributes in HTML or is that URLs as pasted
>>> into the address bar? I believe their processing differs...
>>
>> It certainly does in Firefox (the latter have a lot more fixup done to
>> them, and there are also differences in terms of how character encodings are
>> handled).
>>
>> I would be particularly interested in data on this last, across different
>> browsers, operating systems, and locales... There seem to be servers out
>> there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1,
>> and it's not clear to me how to make things work with them all.
>>
>> -Boris
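To make the UTF-8 vs. ISO-8859-1 mismatch Boris mentions concrete, here is a minimal Python sketch. It is only an illustration of the byte-level problem, not a model of how any particular browser or server behaves, and the path segment "café" is a hypothetical example. The same character percent-encodes to different bytes under each encoding, and decoding with the other encoding garbles it.

```python
from urllib.parse import quote, unquote

# Hypothetical path segment containing U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
segment = "caf\u00e9"

# Percent-encode the same string under two different character encodings.
utf8_form = quote(segment, encoding="utf-8")      # two bytes: %C3%A9
latin1_form = quote(segment, encoding="latin-1")  # one byte: %E9

# A server expecting ISO-8859-1 that receives the UTF-8 form sees two
# characters (U+00C3, U+00A9) instead of one.
utf8_as_latin1 = unquote(utf8_form, encoding="latin-1")

# A server expecting UTF-8 that receives the ISO-8859-1 form sees the lone
# byte 0xE9, which is not valid UTF-8; with errors="replace" it becomes U+FFFD.
latin1_as_utf8 = unquote(latin1_form, encoding="utf-8", errors="replace")

print(utf8_form)
print(latin1_form)
```

Neither encoded form round-trips through the other encoding, which is why a client cannot pick one choice that works with both kinds of servers.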