Scott, thanks for the reference. It looks like that encoding is the intent of quoteFilename, and the fix I checked in this a.m. should handle the edge cases that were causing errors in some of our testing.
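The two-step encoding Scott quotes from the W3C (UTF-8 first, then %HH for every byte that is not an ASCII letter or digit) can be sketched in a few lines of Python. This is a minimal illustration assuming Python 3; the Sycamore code of this era targets Python 2, so it is not a drop-in patch for wikiutil.py. `quote_filename_utf8` and `unquote_filename_utf8` are hypothetical names, not Sycamore's actual quoteFilename.

```python
import string
from urllib.parse import unquote_to_bytes

# Bytes that pass through unchanged: ASCII letters and digits only,
# following the W3C two-step description literally.
_SAFE_BYTES = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def quote_filename_utf8(name):
    """Hypothetical helper: percent-encode a Unicode filename.

    Step 1: convert the string to a sequence of bytes using UTF-8.
    Step 2: replace each byte that is not an ASCII letter or digit with %HH.
    """
    out = []
    for byte in name.encode("utf-8"):  # iterating bytes yields ints
        if byte in _SAFE_BYTES:
            out.append(chr(byte))
        else:
            out.append("%%%02X" % byte)  # e.g. 0xC3 -> "%C3"
    return "".join(out)


def unquote_filename_utf8(quoted):
    """Hypothetical inverse: decode the %HH escapes to bytes, then decode UTF-8."""
    return unquote_to_bytes(quoted).decode("utf-8")
```

For example, `quote_filename_utf8("Café menu")` returns `'Caf%C3%A9%20menu'`. Read literally, the W3C wording also escapes `-`, `_`, `.` and `~`; Python's `urllib.parse.quote(name, safe="")` does the same UTF-8 step but leaves those four RFC 3986 unreserved characters as-is, and both forms decode back to the same filename.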
The remaining UTF-8 issue is another error in search.py that Philip apparently has a fix for but hasn't checked in [1]. According to a note in that ticket (dated 1/26), Philip fixed it in his wikis branch and was planning to port it to trunk. Maybe it would be better for the project if Philip checked in his wikis branch "as is" so that we could work on merging it as a community. Philip, do you have any thoughts on that? I'll move on to other bugs until I hear back.

----------------
[1] http://sycamore.devjavu.com/projects/sycamore/ticket/17

On 5/18/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >> There are a couple of bugs in trac related to UTF-8. It looks like
> >> all file names and URLs are run through the pretty restrictive
> >> quoteFilename in wikiutil.py. This recodes all characters that aren't
> >> in (A-Z,a-z,1-9). In a UTF-8 environment, it doesn't work on UTF-8
> >> URLs.
> >
> > It looks like[1] only these ascii characters are allowed in a URI:
> > Unreserved Characters (no encoding needed)
> > A-Z (uppercase letters)
> > a-z (lowercase letters)
> > 0-9 (numbers)
> > - (dash)
> > _ (underscore)
> > . (period)
> > ~ (tilde)
> >
> > Reserved Characters (allowed only if encoded)
> > ! = %21
> > * = %2A
> > ' = %27
> > ( = %28
> > ) = %29
> > ; = %3B
> > : = %3A
> > @ = %40
> > & = %26
> > = = %3D
> > + = %2B
> > $ = %24
> > , = %2C
> > / = %2F
> > ? = %3F
> > % = %25
> > # = %23
> > [ = %5B
> > ] = %5D
> >
> > If the filename is meant to be displayed in the browser it makes sense to
> > encode it using percent encoding.
>
> To clarify[1]...
>
> "For worldwide interoperability, URIs have to be encoded uniformly.
> To map the wide range of characters used worldwide into the 60 or so allowed
> characters in a URI, a two-step process is used:
>
> * Convert the character string into a sequence of bytes using the
> UTF-8 encoding
> * Convert each byte that is not an ASCII letter or digit to %HH, where
> HH is the hexadecimal value of the byte"
>
> Scott
> --------
> [1] http://www.w3.org/International/O-URL-code.html
>
> _______________________________________________
> Sycamore-Dev mailing list
> [EMAIL PROTECTED]
> http://www.projectsycamore.org/
> https://tools.cernio.com/mailman/listinfo/sycamore-dev