ID:               28646
Comment by:       papercrane at reversefold dot com
Reported By:      php at richardneill dot org
Status:           Open
Bug Type:         Feature/Change Request
PHP Version:      4.3.6
New Comment:

If you're really worried about magic_quotes (which I don't use
anyway...), then str_demoronise() should be magic_quotes aware, escaping
quotes only if magic_quotes_runtime is on. Or perhaps a second parameter
to the function should control whether quotes are escaped. Hard-wiring
it one way or the other would break *someone's* scripts.


Previous Comments:
------------------------------------------------------------------------

[2004-06-06 00:59:16] php at richardneill dot org

For safety's sake, it's probably wiser to have < > \` \' \" as the
replacements. Otherwise, we have a nice big security hole, since
magic_quotes gets bypassed.

------------------------------------------------------------------------

[2004-06-05 23:55:34] php at richardneill dot org

Description:
------------
Feature request: str_demoronise()

On my website, I often find users pasting content that was written in
Microsoft Word, and which contains undisplayable "ASCII" characters
where there should be single/double quotes. Anyone viewing the result on
a non-MS platform gets to see rectangles instead of quotes.

The problem has been solved in perl here:
http://www.fourmilab.ch/webtools/demoroniser/

I quote:
============
Microsoft use their own "extension" to Latin-1, in which a variety of
characters which do not appear in Latin-1 are inserted in the range 0x82
through 0x95--this having the merit of being incompatible with both
Latin-1 and Unicode, which reserve this region for additional control
characters.
=============

I'd like to suggest the addition of a str_demoronise() function which
fixes these wrong characters, and replaces them with the correct ASCII.

Reproduce code:
---------------
From the source of demoroniser, here are the substitutions made. The MS
column is what Microsoft use (in hex); the FIX column is the
replacement:

MS      FIX
0x82    ,
0x83    <em>f</em>
0x84    ,,
0x85    ...
0x88    ^
0x89    ' °/°°'        <-- note the leading whitespace; the '' quotes are not part of the replacement
0x8B    <
0x8C    Oe
0x91    `
0x92    '
0x93    "
0x94    "
0x95    *
0x96    -
0x97    --
0x98    <sup>~</sup>
0x99    <sup>TM</sup>
0x9B    >
0x9C    oe

------------------------------------------------------------------------
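
For illustration only, the substitution table from the report could be sketched in userland PHP roughly as follows. Nothing here is an existing PHP function: the name str_demoronise_sketch() and the $escape_quotes parameter are hypothetical, the latter modelled on the "second parameter" idea from the comment. The sketch assumes the input is single-byte text (Windows-1252 style); it would corrupt UTF-8 input, where bytes 0x82-0x9C occur inside multi-byte sequences. The degree signs in the 0x89 replacement are written as the Latin-1 byte 0xB0, which is also an assumption.

```php
<?php
// Hypothetical sketch of the requested function; not part of PHP itself.
// Assumes $text is single-byte (Windows-1252 style) text, NOT UTF-8.
function str_demoronise_sketch($text, $escape_quotes = false)
{
    // Byte => replacement pairs, copied from the table in the report.
    $map = array(
        "\x82" => ',',
        "\x83" => '<em>f</em>',
        "\x84" => ',,',
        "\x85" => '...',
        "\x88" => '^',
        "\x89" => " \xB0/\xB0\xB0",   // leading space; 0xB0 is the Latin-1 degree sign
        "\x8B" => '<',
        "\x8C" => 'Oe',
        "\x91" => '`',
        "\x92" => "'",
        "\x93" => '"',
        "\x94" => '"',
        "\x95" => '*',
        "\x96" => '-',
        "\x97" => '--',
        "\x98" => '<sup>~</sup>',
        "\x99" => '<sup>TM</sup>',
        "\x9B" => '>',
        "\x9C" => 'oe',
    );

    if ($escape_quotes) {
        // Optional behaviour (an assumption): backslash-escape the quote
        // characters that the substitution introduces.
        $map["\x91"] = '\\`';
        $map["\x92"] = "\\'";
        $map["\x93"] = '\\"';
        $map["\x94"] = '\\"';
    }

    // strtr() with an array replaces each key in a single pass and does
    // not rescan the text it has already substituted.
    return strtr($text, $map);
}

// Usage: Word-style quotes, dash, apostrophe and ellipsis become plain ASCII.
echo str_demoronise_sketch("\x93Hello\x94 \x96 it\x92s here\x85"), "\n";
// prints: "Hello" - it's here...
```

Leaving escaping off by default keeps the output unchanged for callers who do not rely on magic_quotes; whether the default should instead follow magic_quotes_runtime is the open question raised in the comments.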