On 12 July 2012 19:28, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:
> Perhaps they're being displayed as question marks, but the actual
> character is different?

This is very likely the case: a mangled encoding can leave one with entirely unexpected characters, and the '?' probably stands in for a non-displayable character.

One way to find the actual characters is to examine the content in a capable Unicode editor. In my experience, yudit ( http://yudit.org ) has the best Unicode support, and will show you the actual hex code point even for non-displayable characters.

The bad news is that there is probably no way to recover the actual encoding. Plus, it might be difficult to find a generic way of identifying such documents.

Regards,
Gora
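
P.S. If a Unicode editor is not at hand, a few lines of Python can show the same information. This is only a rough sketch: the helper name describe_chars and the sample string are made up for illustration, and it assumes the text has already been decoded into a Python string. It prints the hex code point and Unicode name of each character, which is enough to tell a genuine '?' from something like the replacement character that merely gets displayed as one.

```python
import unicodedata


def describe_chars(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<no name>"))
        for ch in text
    ]


# Hypothetical sample: a real question mark followed by U+FFFD, which
# many displays also render as a question mark.
sample = "?\ufffd"
for code_point, name in describe_chars(sample):
    print(code_point, name)
# U+003F QUESTION MARK
# U+FFFD REPLACEMENT CHARACTER
```

Characters without an assigned name (control characters, for instance) come back as "<no name>", but the code point is still shown.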