--------------------------------------------
On Fri, 11/22/13, Steven D'Aprano <st...@pearwood.info> wrote:
Subject: Re: [Tutor] Is there a package to "un-mangle" characters?
To: tutor@python.org
Date: Friday, November 22, 2013, 4:30 PM

On Thu, Nov 21, 2013 at 12:04:19PM -0800, Albert-Jan Roskam wrote:
> Hi,
>
> Today I had a csv file in utf-8 encoding, but part of the accented
> characters were mangled. The data were scraped from a website and it
> turned out that at least some of the data were mangled on the website
> already. Bits of the text were actually cp1252 (or cp850), I think,
> even though the webpage was in utf-8. Is there any package that helps
> to correct such issues?

Python has superpowers :-)

http://blog.luminoso.com/2012/08/20/fix-unicode-mistakes-with-python/

====> Cool website! Love the corny terminology he uses. The function he created may come in handy in situations where chardet, charset and icu are of little help: a small amount of textual data that's a total mess.
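
For anyone reading along, here is a rough sketch of the basic trick for this kind of mangling. It is not the function from the linked post, just a minimal illustration under two assumptions: the text was originally UTF-8, and somewhere along the way its bytes were decoded as cp1252 (one round only). The name `unmangle` is made up for this example, and it is written for Python 3.

```python
def unmangle(text, wrong_encoding="cp1252"):
    """Hypothetical helper: undo one round of 'UTF-8 bytes read as cp1252'.

    Re-encode the text with the encoding that was wrongly used to decode it,
    which recovers the original bytes, then decode those bytes as UTF-8.
    If either step fails, assume the text was not mangled this way and
    return it unchanged.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


mangled = "cafÃ©"          # what "café" looks like after the wrong decode
print(unmangle(mangled))   # the repaired string
print(unmangle("café"))    # already correct, so it is left alone
```

This only handles the simplest case. Text that mixes correct and mangled parts in one string, or that was mangled more than once, needs something smarter, which is where a dedicated function like the one in the blog post earns its keep.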