That would probably be best, yes. I didn't see how to do so.

On Wed, 16 Sep 2015 at 15:31, Sebastien Bacher <seb...@ubuntu.com> wrote:
> Thanks, that should probably be subscribed upstream for review...
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1470032
>
> Title:
>   libpst / readpst incorrectly decodes latin1 contacts, etc.
>
> Status in libpst package in Ubuntu:
>   New
>
> Bug description:
>   After a client of ours moved from Exchange 2003 to Office 365 we had
>   to get some data out of PST-files, which mostly worked well, but
>   apparently Contacts and some Tasks have a tendency to be incorrectly
>   decoded into gibberish.
>
>   As far as I can tell, the problem is that the data is interpreted to
>   be UTF16 that needs to be converted to UTF8 and the charset I defined
>   on the commandline for readpst is not consulted in this transaction.
>
>   When inspecting the debug log, it is clear to human eyes that this
>   conversion is incorrect and if anything should have been from the
>   charset I specified to UTF8 and not from UTF16 to UTF8.
>
>   As far as I can tell, the problem occurs in the 'pst_vb_utf16to8',
>   which seems to be called indiscriminately, and it seems that the charset
>   I specify to readpst is rarely used, if ever.
>
>   I wonder if it would be possible to have a switch to present the user
>   with the unconverted version and possibly a couple of encodings and let
>   the user decide the proper one. There are several contacts that are
>   fine, but over 200 that suffer from this garbling of the data.
>   Unfortunately it is more or less impossible to get from the utf8
>   version of the non-utf16 data back to latin1, as far as I can tell.
>
>   This is a sample contact that has the issue (Most are totally illegible,
>   but a few had some text I could search for):
>   FN:Ballerup Politi
>   N:汋獯整�;潊湨祮;;;
>   EMAIL:慂汬牥灵倠汯瑩<U+2069>䨨䡃灀汯瑩<U+2E69>此�
>   ADR;TYPE=work:;;;;;;
>   LABEL;TYPE=work:汇<U+202E><U+E552>桤獵敶<U+206A>㤱\n慂汬牥灵 㜲〵\n慄浮牡�
>   TEL;TYPE=work,voice:㤳㔠‴㐱㐠‸潬慫<U+206C>㐠㌲�
>   TEL;TYPE=cell,voice: 72 58 78 29 (20 90 98 02)
>   TITLE:楖散潰楬楴潫浭獩狦
>   NOTE:Gladsaxe Politi (kredsen) 3969 1448\n
>   VERSION: 3.0
>   END:VCARD
>
>   Attached is debug version of the parsing of this contact.
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 14.04
>   Package: pst-utils 0.6.59-1build1
>   ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
>   Uname: Linux 3.13.0-24-generic x86_64
>   ApportVersion: 2.14.1-0ubuntu3.11
>   Architecture: amd64
>   CurrentDesktop: X-Cinnamon
>   Date: Tue Jun 30 11:05:51 2015
>   EcryptfsInUse: Yes
>   InstallationDate: Installed on 2014-07-27 (337 days ago)
>   InstallationMedia: Linux Mint 17 "Qiana" - Release amd64 20140624
>   ProcEnviron:
>    SHELL=/bin/bash
>    TERM=xterm
>    PATH=(custom, no user)
>    LANG=da_DK.UTF-8
>    XDG_RUNTIME_DIR=<set>
>   SourcePackage: libpst
>   UpgradeStatus: No upgrade log present (probably fresh install)
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu/+source/libpst/+bug/1470032/+subscriptions
>

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1470032

Title:
  libpst / readpst incorrectly decodes latin1 contacts, etc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libpst/+bug/1470032/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
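
For anyone triaging this: the garbling in the sample contact looks consistent with single-byte (latin1) text being decoded as UTF-16LE, two bytes per character. If that reading is right, the damage may be partly reversible after the fact by re-encoding the garbled text as UTF-16LE and decoding the bytes with the intended charset. The Python below is only a sketch of that idea under that assumption; it is not libpst code, and the names garble and recover are made up for illustration. Where a trailing odd byte was lost (shown as U+FFFD in the output), it cannot be recovered and is marked with "?".

```python
REPLACEMENT = "\ufffd"  # U+FFFD, emitted where a byte could not be paired


def garble(text: str) -> str:
    """Reproduce the symptom: latin1 bytes misread as UTF-16LE."""
    return text.encode("latin-1").decode("utf-16-le", errors="replace")


def recover(garbled: str, charset: str = "latin-1") -> str:
    """Undo the misdecoding, assuming the original bytes were `charset`.

    Each U+FFFD stands for data that was lost, so it becomes '?'.
    """
    parts = garbled.split(REPLACEMENT)
    decoded = [p.encode("utf-16-le").decode(charset) for p in parts]
    return "?".join(decoded)


if __name__ == "__main__":
    print(garble("Johnny"))                 # three CJK-looking characters
    print(recover("\u6f4a\u6e68\u796e"))    # Johnny
```

Pasting individual garbled fields from the vCard into recover() would be one way to check the assumption against the real data; fields that contain placeholders such as <U+2069> would first need those turned back into the actual characters.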