You'll get more predictable results by focusing on what you should allow than by focusing on what you should strip.
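A minimal sketch of the allow-list idea, in Python. The language, the helper name `safe_filename`, and the permitted set `ALLOWED_PUNCT` are illustrative assumptions and do not come from this thread.

Instead of listing bad bytes such as '/', ':' and '\n', the sketch keeps only letters, digits and a few punctuation marks. Everything else becomes '_', including bytes that are not valid UTF-8. It also truncates to 255 *bytes* rather than 255 characters, because ext2/3/4 limit a single name component to 255 bytes and a multi-byte UTF-8 character uses several of them.

```python
# Hypothetical allow-list: anything not alphanumeric and not in this set is replaced.
ALLOWED_PUNCT = set(" ._-()")
MAX_NAME_BYTES = 255  # per-component limit on ext2/3/4 is 255 bytes, not characters


def safe_filename(raw: bytes, substitute: str = "_") -> str:
    """Build a file name from untrusted tag bytes using an allow-list (sketch)."""
    # Invalid UTF-8 sequences decode to U+FFFD, which the allow-list then replaces.
    text = raw.decode("utf-8", errors="replace")
    # Keep allowed characters; replace everything else ('/', ':', '\n', '\t', ...).
    cleaned = "".join(
        ch if ch.isalnum() or ch in ALLOWED_PUNCT else substitute
        for ch in text
    )
    # Truncate to 255 bytes; dropping an incomplete trailing UTF-8 sequence
    # avoids splitting a multi-byte character.
    truncated = cleaned.encode("utf-8")[:MAX_NAME_BYTES]
    cleaned = truncated.decode("utf-8", errors="ignore")
    # Leading dots would give hidden files or "." / ".."; trim stray spaces too.
    cleaned = cleaned.lstrip(". ").rstrip(" ")
    # Never return an empty name.
    return cleaned or substitute
```

For example, `safe_filename(b"AC/DC: Back in Black")` returns `'AC_DC_ Back in Black'`. With a deny-list, every dangerous byte has to be anticipated in advance. With an allow-list, anything you did not think of is replaced by default.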
Richard

On Sat 12 December 2009 11:14:48 AJ ONeal <[email protected]> wrote:
> If I have an untrusted byte-stream (mp3 and mp4 tags) which I'm using to
> create file names I should do this before allowing the file to be created:
> 1. truncate the name to 255 characters
> 2. replace the characters '\', '\t', '\n', '\r', ':', and '/' with a
> substitute character, say '_'
> 3. replace non-utf8 characters with a substitute character, say '_'
>
> I'm basing that process on these assumptions:
> 1. A *nix file system (ext2, ext3, ext4, zfs) stores a name as an array of
> bytes.
> 2. Of those bytes, the only one that CANNOT be part of the file name is '/'
> (and ':' on mac and zfs, I believe, and windows doesn't like '\\' but will
> handle '*', '?', '`', etc)
> 3. \\, \t, \n, \r, the bell sound of the terminal, although annoying, are
> all valid byte strings for a filename.
> 4. The filesystem will try to decode the filename as utf8 if it can, but
> otherwise just show <?> to signify an unrecognized byte-sequence.
> 5. *, ?, ', ", `, !, #, all other characters are all valid bytes for a
> filename
>
> AJ ONeal

--------------------
BYU Unix Users Group
http://uug.byu.edu/

The opinions expressed in this message are the responsibility of their
author. They are not endorsed by BYU, the BYU CS Department or BYU-UUG.
___________________________________________________________________
List Info (unsubscribe here): http://uug.byu.edu/mailman/listinfo/uug-list
