Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-21 Thread Marko Rauhamaa
pjmcle...@gmail.com: > not sure why utf-8 gives an error when that's the most wide all characters > inclusive right?/ Not all sequences of bytes are legal in UTF-8. For example, >>> b'\x80'.decode("utf-8") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'u
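A minimal sketch of the point made in this reply, assuming Python 3. The helper name `try_decode` is hypothetical and does not come from the thread. UTF-8 can represent every Unicode character, but a lone 0x80 is a continuation byte and cannot start a sequence, so decoding it fails.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.reason and exc.start are standard UnicodeDecodeError attributes.
        return f"{encoding}: {exc.reason} at position {exc.start}"


# A lone continuation byte is not legal UTF-8.
result_bad = try_decode(b"\x80", "utf-8")

# Bytes that really were produced by a UTF-8 encoder decode cleanly.
result_ok = try_decode("é".encode("utf-8"), "utf-8")

print(result_bad)
print(result_ok)
```

The same helper can be pointed at any byte string and codec name to see where, and why, a decode attempt stops.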

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-21 Thread pjmclenon
On Saturday, October 20, 2018 at 1:23:50 PM UTC-4, Terry Reedy wrote: > On 10/20/2018 8:24 AM, pjmcle...@gmail.com wrote: > > On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote: > > > i have a sort of decode error > > UnicodeDecodeError; 'utf-8' can't decode byte 0xb0 in position 83064:

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-21 Thread pjmclenon
) > File "C:\Python30\lib\encodings\cp1252.py", line 23, in decode > return codecs.charmap_decode(input,self.errors,decoding_table)[0] > UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position > 10442: character maps to > > The string at position 10442

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-20 Thread Terry Reedy
On 10/20/2018 8:24 AM, pjmcle...@gmail.com wrote: On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote: i have a sort of decode error UnicodeDecodeError; 'utf-8' can't decode byte 0xb0 in position 83064: invalid start byte * and it seems to refer to my code line:

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-20 Thread MRAB
On 2018-10-20 13:47, Peter J. Holzer wrote: On 2018-10-20 05:24:37 -0700, pjmcle...@gmail.com wrote: On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote: > with open(join("docs", path), encoding="utf-8") as f: hello MRAB and google forum I feel somewhat excluded by this salutation, a
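The `open(..., encoding="utf-8")` line quoted above can be tried in isolation. What follows is only a sketch under stated assumptions: the file name `sample.txt` and its contents are made up, and the file is written to a temporary directory rather than the poster's `docs` folder. The sample text contains curly quotes, whose UTF-8 encoding includes the byte 0x9d named in the thread title.

```python
import os
import tempfile

# Hypothetical sample: U+201C and U+201D encode to E2 80 9C and E2 80 9D in UTF-8.
sample = "He said \u201chello\u201d"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Explicit encoding: the result does not depend on the platform default.
    with open(path, encoding="utf-8") as f:
        text_utf8 = f.read()

    # Forcing cp1252 on the same bytes fails when it reaches 0x9d.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

print(text_utf8 == sample, cp1252_failed)
```

Passing `encoding=` explicitly is what removes the dependence on the machine's locale; the cp1252 branch is there only to reproduce the kind of failure the thread is about.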

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-20 Thread Peter J. Holzer
On 2018-10-20 05:24:37 -0700, pjmcle...@gmail.com wrote: > On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote: > > with open(join("docs", path), encoding="utf-8") as f: > > hello MRAB and google forum I feel somewhat excluded by this salutation, as I'm not MRAB and I don't read this on

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-20 Thread pjmclenon
On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote: > On 2018-10-14 00:13, pjmcle...@gmail.com wrote: > > On Wednesday, June 13, 2018 at 7:14:06 AM UTC-4, INADA Naoki wrote: > >> > 1st is this script is from a library module online open source > >> > >> If it's open source, why didn't

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-15 Thread pjmclenon
On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote: > On 2018-10-14 00:13, pjmcle...@gmail.com wrote: > > On Wednesday, June 13, 2018 at 7:14:06 AM UTC-4, INADA Naoki wrote: > >> > 1st is this script is from a library module online open source > >> > >> If it's open source, why didn't

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-13 Thread MRAB
On 2018-10-14 00:13, pjmcle...@gmail.com wrote: On Wednesday, June 13, 2018 at 7:14:06 AM UTC-4, INADA Naoki wrote: > 1st is this script is from a library module online open source If it's open source, why didn't you show the link to the source? I assume your code is this: https://github.com/

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-13 Thread pjmclenon
On Wednesday, June 13, 2018 at 7:14:06 AM UTC-4, INADA Naoki wrote: > > 1st is this script is from a library module online open source > > If it's open source, why didn't you show the link to the source? > I assume your code is this: > > https://github.com/siddharth2010/String-Search/blob/6770c7

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-10-05 Thread pjmclenon
On Wednesday, June 13, 2018 at 7:14:06 AM UTC-4, INADA Naoki wrote: > > 1st is this script is from a library module online open source > > If it's open source, why didn't you show the link to the source? > I assume your code is this: > > https://github.com/siddharth2010/String-Search/blob/6770c7

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread pjmclenon
On Thursday, August 30, 2018 at 2:05:16 PM UTC-4, pjmc...@gmail.com wrote: > On Thursday, August 30, 2018 at 1:29:48 PM UTC-4, MRAB wrote: > > On 2018-08-30 17:57, pjmcle...@gmail.com wrote: > > > On Thursday, August 30, 2018 at 9:28:09 AM UTC-4, Steven D'Aprano wrote: > > >> On Thu, 30 Aug 2018 05

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread pjmclenon
On Thursday, August 30, 2018 at 1:29:48 PM UTC-4, MRAB wrote: > On 2018-08-30 17:57, pjmcle...@gmail.com wrote: > > On Thursday, August 30, 2018 at 9:28:09 AM UTC-4, Steven D'Aprano wrote: > >> On Thu, 30 Aug 2018 05:21:30 -0700, pjmclenon wrote: > >> > >> > my question is ... at the moment i can

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread MRAB
On 2018-08-30 17:57, pjmcle...@gmail.com wrote: On Thursday, August 30, 2018 at 9:28:09 AM UTC-4, Steven D'Aprano wrote: On Thu, 30 Aug 2018 05:21:30 -0700, pjmclenon wrote: > my question is ... at the moment i can only run it on windows cmd prompt > with a multiple line entry as so:: > > pyth

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread pjmclenon
On Thursday, August 30, 2018 at 9:28:09 AM UTC-4, Steven D'Aprano wrote: > On Thu, 30 Aug 2018 05:21:30 -0700, pjmclenon wrote: > > > my question is ... at the moment i can only run it on windows cmd prompt > > with a multiple line entry as so:: > > > > python createIndex_tfidf.py stopWords.dat t

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread Steven D'Aprano
On Thu, 30 Aug 2018 05:21:30 -0700, pjmclenon wrote: > my question is ... at the moment i can only run it on windows cmd prompt > with a multiple line entry as so:: > > python createIndex_tfidf.py stopWords.dat testCollection.dat > testIndex.dat titleIndex.dat > > and then to query and use the n

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread pjmclenon
On Thursday, August 30, 2018 at 8:21:47 AM UTC-4, pjmc...@gmail.com wrote: > On Wednesday, June 13, 2018 at 7:14:06 AM UTC-4, INADA Naoki wrote: > > > 1st is this script is from a library module online open source > > > > If it's open source, why didn't you show the link to the source? > > I assu

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread pjmclenon
On Wednesday, June 13, 2018 at 7:14:06 AM UTC-4, INADA Naoki wrote: > > 1st is this script is from a library module online open source > > If it's open source, why didn't you show the link to the source? > I assume your code is this: > > https://github.com/siddharth2010/String-Search/blob/6770c7

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-13 Thread bellcanadardp
On Wednesday, 13 June 2018 09:12:32 UTC-4, Steven D'Aprano wrote: > On Wed, 13 Jun 2018 03:55:58 -0700, bellcanadardp wrote: > > > the collFile has to be like a variable that would refer to the file > > Collection.dat..thats my best guess also in the error line , it doesnt > > actually open the f

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-13 Thread bellcanadardp
On Wednesday, 13 June 2018 07:14:06 UTC-4, INADA Naoki wrote: > > 1st is this script is from a library module online open source > > If it's open source, why didn't you show the link to the source? > I assume your code is this: > > https://github.com/siddharth2010/String-Search/blob/6770c7a1e81

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-13 Thread Steven D'Aprano
On Wed, 13 Jun 2018 03:55:58 -0700, bellcanadardp wrote: > the collFile has to be like a variable that would refer to the file > Collection.dat..thats my best guess also in the error line , it doesnt > actually open the file ... The file has to be opened if you are reading from it. If it isn't op

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-13 Thread Steven D'Aprano
On Wed, 13 Jun 2018 04:01:24 -0700, bellcanadardp wrote: > for line in self.collFile.decode("utf-8"): > i actually write.encode...then i tried the decode but both dont have any > effect Raising AttributeError isn't an effect? py> f = open("/tmp/x") py> f.write.decode Traceback (most recent call
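To make the AttributeError above concrete, here is a small sketch, assuming Python 3: `.decode()` is a method of `bytes`, not of `str`, and not of a file object's `write` method. An in-memory `io.StringIO` stands in for the real file, which is an assumption made only to keep the example self-contained.

```python
import io

raw = b"caf\xc3\xa9"          # bytes: has .decode()
text = raw.decode("utf-8")    # str: has .encode(), but no .decode()

# In Python 3, text strings have no decode method.
has_decode_on_str = hasattr(text, "decode")

# A bound method such as f.write has no decode attribute either,
# which is why f.write.decode raises AttributeError.
f = io.StringIO()
has_decode_on_method = hasattr(f.write, "decode")

print(text, has_decode_on_str, has_decode_on_method)
```

For a file opened in text mode, decoding has already happened by the time a line reaches the loop, so the encoding has to be chosen where the file is opened.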

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-13 Thread INADA Naoki
> 1st is this script is from a library module online open source If it's open source, why didn't you show the link to the source? I assume your code is this: https://github.com/siddharth2010/String-Search/blob/6770c7a1e811a5d812e7f9f7c5c83a12e5b28877/createIndex.py And self.collFile is opened h

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-13 Thread bellcanadardp
On Sunday, 10 June 2018 17:29:59 UTC-4, Cameron Simpson wrote: > On 10Jun2018 13:04, bellcanada...@gmail.com wrote: > >here is the full error once again > >to summarize, my script works fine in python2 > >i get this error trying to run it in python3 > >plz see below after the error, my settings f

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-13 Thread bellcanadardp
On Sunday, 10 June 2018 17:29:59 UTC-4, Cameron Simpson wrote: > On 10Jun2018 13:04, bellcanada...@gmail.com wrote: > >here is the full error once again > >to summarize, my script works fine in python2 > >i get this error trying to run it in python3 > >plz see below after the error, my settings f

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-10 Thread bellcanadardp
On Sunday, 10 June 2018 17:29:59 UTC-4, Cameron Simpson wrote: > On 10Jun2018 13:04, bellcanada...@gmail.com wrote: > >here is the full error once again > >to summarize, my script works fine in python2 > >i get this error trying to run it in python3 > >plz see below after the error, my settings f

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-10 Thread Cameron Simpson
On 10Jun2018 13:04, bellcanada...@gmail.com wrote: here is the full error once again to summarize, my script works fine in python2 i get this error trying to run it in python3 plz see below after the error, my settings for python 2 and python 3 for me it seems i need to change some settings to '

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-10 Thread Chris Angelico
On Mon, Jun 11, 2018 at 2:49 AM, wrote: > > excuse but sorry > i took the time to manually write the code error from the traceback as you > said > and thats because i cant seem to find a way to attach files here..which would > make it so easier for me and also i could attach snippets of the act

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-10 Thread bellcanadardp
On Friday, 8 June 2018 18:26:28 UTC-4, Cameron Simpson wrote: > On 05Jun2018 06:42, bellcanada...@gmail.com wrote: > >On Sunday, 3 June 2018 20:11:43 UTC-4, Steven D'Aprano wrote: > >> Don't retype a summary of what you think the error is. "character > >> undefieed" is not a thing, and there is

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-10 Thread bellcanadardp
On Sunday, 10 June 2018 10:23:47 UTC-4, Steven D'Aprano wrote: > Do you enjoy wasting your own time (as well as ours) by failing to follow > instructions? > > We can't read your mind to see the code you are using, and I am getting > frustrated from telling you the same thing again and again. >

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-10 Thread Steven D'Aprano
Do you enjoy wasting your own time (as well as ours) by failing to follow instructions? We can't read your mind to see the code you are using, and I am getting frustrated from telling you the same thing again and again. PLEASE PLEASE PLEASE PLEASE help us to help you. Start by reading this: h

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-10 Thread bellcanadardp
On Friday, 8 June 2018 07:42:34 UTC-4, Steven D'Aprano wrote: > On Fri, 08 Jun 2018 03:35:12 -0700, bellcanadardp wrote: > > > hello steven are you there?? > > i posted the full error message... > > No you didn't. > > I saw your post, and ignored it, because you didn't follow instructions. > I

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-08 Thread Steven D'Aprano
On Sat, 09 Jun 2018 08:26:10 +1000, Cameron Simpson wrote: > It is possible that Python 2 is just glossing over the problem; Python 3 > has a more rigorous view of character data. I would say that is more than just possible, it is almost certain. -- Steven D'Aprano "Ever since I learned about

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-08 Thread Cameron Simpson
On 05Jun2018 06:42, bellcanada...@gmail.com wrote: On Sunday, 3 June 2018 20:11:43 UTC-4, Steven D'Aprano wrote: Don't retype a summary of what you think the error is. "character undefieed" is not a thing, and there is no such thing as "byte 1x09". You need to COPY AND PASTE the EXACT error t

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-08 Thread Steven D'Aprano
On Fri, 08 Jun 2018 03:35:12 -0700, bellcanadardp wrote: > hello steven are you there?? > i posted the full error message... No you didn't. I saw your post, and ignored it, because you didn't follow instructions. I told you we need to see the *full* traceback, starting from the line beginning

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-08 Thread bellcanadardp
On Sunday, 3 June 2018 20:11:43 UTC-4, Steven D'Aprano wrote: > On Sun, 03 Jun 2018 16:36:12 -0700, bellcanadardp wrote: > > > hello peter ...how exactly would i solve this issue .i have a script > > that works in python 2 but not pytho3..i did 2 to 3.py ...but i still > > get the errro...cha

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-05 Thread bellcanadardp
On Sunday, 3 June 2018 20:11:43 UTC-4, Steven D'Aprano wrote: > On Sun, 03 Jun 2018 16:36:12 -0700, bellcanadardp wrote: > > > hello peter ...how exactly would i solve this issue .i have a script > > that works in python 2 but not pytho3..i did 2 to 3.py ...but i still > > get the errro...cha

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-04 Thread Peter J. Holzer
On 2018-06-03 16:36:12 -0700, bellcanada...@gmail.com wrote: > On Tuesday, 22 May 2018 17:23:55 UTC-4, Peter J. Holzer wrote: > > On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote: > > > On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanada...@gmail.com wrote: > > > > thank you for the reply, but

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-03 Thread Steven D'Aprano
On Sun, 03 Jun 2018 16:36:12 -0700, bellcanadardp wrote: > hello peter ...how exactly would i solve this issue .i have a script > that works in python 2 but not pytho3..i did 2 to 3.py ...but i still > get the errro...character undefieed..unicode decode error cant decode > byte 1x09 in line 74

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-06-03 Thread bellcanadardp
On Tuesday, 22 May 2018 17:23:55 UTC-4, Peter J. Holzer wrote: > On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote: > > On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanada...@gmail.com wrote: > > > > > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro wrote: > > > > As Chris indicated,

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-30 Thread Peter J. Holzer
On 2018-05-29 16:20:36 +, Steven D'Aprano wrote: > On Tue, 29 May 2018 14:04:19 +0200, Peter J. Holzer wrote: > > > The OP has one file. > > We don't know that. All we know is that he had one file which he was > unable to read. For all we know, he has a million files, and this was > merely

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Steven D'Aprano
On Tue, 29 May 2018 14:04:19 +0200, Peter J. Holzer wrote: > The OP has one file. We don't know that. All we know is that he had one file which he was unable to read. For all we know, he has a million files, and this was merely the first of many failures. > He wants to read it. The very fact

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Steven D'Aprano
On Tue, 29 May 2018 10:34:50 +0200, Peter J. Holzer wrote: > On 2018-05-23 06:03:38 +, Steven D'Aprano wrote: >> On Wed, 23 May 2018 00:31:03 +0200, Peter J. Holzer wrote: >> > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote: >> >> You can find an encoding which is capable of decoding a fil

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 21:13:43 +1000, Chris Angelico wrote: > You can always solve a subset of problems. Using your own knowledge of > German, you are able to better solve problems involving German text. > But that doesn't make you any better than chardet at validating > Chinese text, or Korean text, or Kl

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 8:59 PM, Peter J. Holzer wrote: > On 2018-05-29 20:28:54 +1000, Chris Angelico wrote: >> Sure, but you're describing a set of rules. If you can define a set of >> rules that pin down the encoding, you could teach chardet to follow >> those rules. If you can't teach chardet

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 20:28:54 +1000, Chris Angelico wrote: > On Tue, May 29, 2018 at 8:09 PM, Peter J. Holzer wrote: > > On 2018-05-29 19:46:24 +1000, Chris Angelico wrote: > >> That's basically what the chardet module does, and its error rate is > >> far FAR higher than that. If you think it's easy to d

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 8:09 PM, Peter J. Holzer wrote: > On 2018-05-29 19:46:24 +1000, Chris Angelico wrote: >> On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer wrote: >> > So if the text is German it will contain more words with >> > umlauts and each byte which is part of a correctly spelled Ge

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 19:47:37 +1000, Chris Angelico wrote: > On Tue, May 29, 2018 at 6:34 PM, Peter J. Holzer wrote: > > On 2018-05-23 06:03:38 +, Steven D'Aprano wrote: > >> Mojibake is especially difficult to deal with when you are dealing with > >> short text snippets like file names or user names

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 19:46:24 +1000, Chris Angelico wrote: > On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer wrote: > > So if the text is German it will contain more words with > > umlauts and each byte which is part of a correctly spelled German word > > when interpreted according to ISO-8859-1 increas

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 6:34 PM, Peter J. Holzer wrote: > On 2018-05-23 06:03:38 +, Steven D'Aprano wrote: >> Mojibake is especially difficult to deal with when you are dealing with >> short text snippets like file names or user names which can contain >> arbitrary characters, where there is r

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer wrote: > So if the text is German it will contain more words with > umlauts and each byte which is part of a correctly spelled German word > when interpreted according to ISO-8859-1 increases the probability that > decoding with ISO-8859-1 will prod

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-23 06:03:38 +, Steven D'Aprano wrote: > On Wed, 23 May 2018 00:31:03 +0200, Peter J. Holzer wrote: > > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote: > >> You can find an encoding which is capable of decoding a file. That's > >> not the same thing. > > > > If the result is corr

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-23 08:43:02 +1000, Chris Angelico wrote: > On Wed, May 23, 2018 at 8:31 AM, Peter J. Holzer wrote: > > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote: > >> > 1) For any given file it is almost always possible to find the correct > >> >encoding (or *a* correct encoding, as there

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-23 Thread Chris Angelico
On Thu, May 24, 2018 at 6:48 AM, Dan Stromberg wrote: > On Sat, May 19, 2018 at 3:58 PM, wrote: >> On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminarayanan >> wrote: >>> > It does auto-detect it as cp1252- look at the files in the traceback and >>> > you'll see lib\encodings\cp12

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-23 Thread Dan Stromberg
On Sat, May 19, 2018 at 3:58 PM, wrote: > On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminarayanan wrote: >> > It does auto-detect it as cp1252- look at the files in the traceback and >> > you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong >> > encoding, try ope

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-22 Thread Steven D'Aprano
On Wed, 23 May 2018 00:31:03 +0200, Peter J. Holzer wrote: > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote: [...] >> You can find an encoding which is capable of decoding a file. That's >> not the same thing. > > If the result is correct, it is the same thing. But how do you know what is co
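The distinction argued over here, an encoding that can decode a file versus the encoding that is correct, can be shown in a few lines. This is only an illustrative sketch with made-up sample data: Latin-1 assigns a character to every byte value, so it never raises, yet it turns UTF-8 input into mojibake.

```python
# Latin-1 maps all 256 byte values, so decoding with it cannot fail.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# "ü" encoded as UTF-8 is the two bytes C3 BC.
utf8_bytes = "ü".encode("utf-8")

wrong = utf8_bytes.decode("latin-1")   # succeeds, but yields two wrong characters
right = utf8_bytes.decode("utf-8")     # the intended single character

print(len(decoded), repr(wrong), repr(right))
```

A decode that raises no exception therefore says nothing about whether the resulting text is what the author wrote.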

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 8:31 AM, Peter J. Holzer wrote: > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote: >> On Wed, May 23, 2018 at 7:23 AM, Peter J. Holzer wrote: >> >> The best you can do is to go ask the canonical source of the >> >> file what encoding the file is _supposed_ to be in. >>

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-22 Thread Peter J. Holzer
On 2018-05-23 07:38:27 +1000, Chris Angelico wrote: > On Wed, May 23, 2018 at 7:23 AM, Peter J. Holzer wrote: > >> The best you can do is to go ask the canonical source of the > >> file what encoding the file is _supposed_ to be in. > > > > I disagree on both counts. > > > > 1) For any given file

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-22 Thread Chris Angelico
On Wed, May 23, 2018 at 7:23 AM, Peter J. Holzer wrote: >> The best you can do is to go ask the canonical source of the >> file what encoding the file is _supposed_ to be in. > > I disagree on both counts. > > 1) For any given file it is almost always possible to find the correct >encoding (or

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-22 Thread Peter J. Holzer
On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote: > On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanada...@gmail.com wrote: > > > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro wrote: > > > As Chris indicated, you'll have to figure out the correct encoding. You > > > might want to ch

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-20 Thread bellcanadardp
On Sunday, 20 May 2018 08:58:32 UTC-4, Richard Damon wrote: > On 5/20/18 7:59 AM, bellcanada...@gmail.com wrote: > > On Saturday, 19 May 2018 19:03:09 UTC-4, Chris Angelico wrote: > >> On Sun, May 20, 2018 at 8:58 AM, wrote: > >>> On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminar

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-20 Thread Karsten Hilbert
On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanada...@gmail.com wrote: > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro wrote: > > As Chris indicated, you'll have to figure out the correct encoding. You > > might want to check out the chardet module (available on PyPI, I believe) > > a

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-20 Thread Skip Montanaro
> how exactly am i supposed to find out what is the correct encoding? It seems you are a Python beginner. Rather than just tell you how to use this one module, I'll point you at some of the ways to get help through Python. * On pypi.org, search for "chardet" and see if the author provided onlin

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-20 Thread Richard Damon
On 5/20/18 7:59 AM, bellcanada...@gmail.com wrote: > On Saturday, 19 May 2018 19:03:09 UTC-4, Chris Angelico wrote: >> On Sun, May 20, 2018 at 8:58 AM, wrote: >>> On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminarayanan >>> wrote: > It does auto-detect it as cp1252- look at t

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-20 Thread bellcanadardp
On Saturday, 19 May 2018 19:03:09 UTC-4, Chris Angelico wrote: > On Sun, May 20, 2018 at 8:58 AM, wrote: > > On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminarayanan > > wrote: > >> > It does auto-detect it as cp1252- look at the files in the traceback and > >> > you'll see lib\e

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-20 Thread bellcanadardp
On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro wrote: > As Chris indicated, you'll have to figure out the correct encoding. You > might want to check out the chardet module (available on PyPI, I believe) > and see if it can come up with a better guess. I imagine there are other > encoding

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-20 Thread Peter Otten
bellcanada...@gmail.com wrote: > On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminarayanan > wrote: >> > It does auto-detect it as cp1252- look at the files in the traceback >> > and you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the >> > wrong encoding, try opening it

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-19 Thread Skip Montanaro
As Chris indicated, you'll have to figure out the correct encoding. You might want to check out the chardet module (available on PyPI, I believe) and see if it can come up with a better guess. I imagine there are other encoding guessers out there. That's just one I'm familiar with. Skip -- https:
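The sketch below is not chardet and does not use its API. It is a much cruder, standard-library-only stand-in for guessing: try a short list of candidate encodings in order and keep the first that decodes without error. The function name `first_decodable` and the candidate list are assumptions chosen for illustration.

```python
def first_decodable(data: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes cleanly.

    A clean decode only means the bytes are legal in that encoding;
    it does not prove the encoding is the right one.
    """
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


# UTF-8 data is accepted by the first candidate.
enc_a, _ = first_decodable("na\u00efve \u201cquote\u201d".encode("utf-8"))

# A bare 0xEF followed by ASCII is not legal UTF-8, so the loop falls through to cp1252.
enc_b, text_b = first_decodable(b"na\xefve")

print(enc_a, enc_b, text_b)
```

Putting the strictest codec first matters: UTF-8 rejects most non-UTF-8 input, whereas `latin-1` at the end accepts anything and so acts as a last resort rather than a real guess.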

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-19 Thread Chris Angelico
On Sun, May 20, 2018 at 8:58 AM, wrote: > On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminarayanan wrote: >> > It does auto-detect it as cp1252- look at the files in the traceback and >> > you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong >> > encoding, try ope

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-19 Thread bellcanadardp
On Thursday, 29 January 2009 12:09:29 UTC-5, Anjanesh Lekshminarayanan wrote: > > It does auto-detect it as cp1252- look at the files in the traceback and > > you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong > > encoding, try opening it as utf-8 or latin1 and see if that fixe

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-02-26 Thread Anjanesh Lekshminarayanan
> (1) what is produced on Anjanesh's machine >>> sys.getdefaultencoding() 'utf-8' > (2) it looks like a small snippet from a Python source file! It's a file containing just JSON data - but has some unicode characters as well as it has data from the web. > Anjanesh, Is it a .py file It's a .json fil

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread John Machin
Benjamin Kaplan case.edu> writes: > First of all, you're right that might be confusing. I was thinking of auto-detect as in "check the platform and locale and guess what they usually use". I wasn't thinking of it like the web browsers use it. I think it uses locale.getpreferredencoding(). You're
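A quick way to see what "check the platform and locale" amounts to on a given machine, as a hedged sketch: the exact rule `open()` follows when no `encoding` is given has varied between Python versions, so treat the first value printed as an indication rather than a guarantee. The variable names are illustrative.

```python
import locale
import sys

# The locale's preferred encoding; on many Windows setups this is cp1252.
default_text_encoding = locale.getpreferredencoding(False)

# A separate setting: the default for str.encode() and bytes.decode(), which is 'utf-8' in Python 3.
internal_default = sys.getdefaultencoding()

print("locale preferred encoding:", default_text_encoding)
print("sys.getdefaultencoding():", internal_default)
```

The two values can differ, which is consistent with the report elsewhere in the thread of `sys.getdefaultencoding()` returning 'utf-8' on a machine where the file was nevertheless read through cp1252.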

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread Benjamin Kaplan
y UTF-8 encoded. Thinking about it now, it could also be MacRoman but that isn't as common as UTF-8. > > > If you want to read the file as text, find out which encoding it actually > is. > In one of those encodings, you'll probably see some nonsense characters. If > yo

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread John Machin
in binary mode rather than text. That way, you'll avoid this issue altogether (just make sure you use byte strings instead of unicode strings). In fact, inspection of Anjanesh's report: """UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position
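Working with bytes also makes it possible to look at what actually sits at the reported position. The sketch below uses made-up data in place of the poster's file, and the helper `context_around` is hypothetical. It decodes a byte string with cp1252, catches the error, and slices out the neighbouring bytes.

```python
def context_around(data: bytes, position: int, width: int = 8) -> bytes:
    """Return the bytes within `width` positions on either side of `position`."""
    start = max(0, position - width)
    return data[start:position + width + 1]


# Stand-in for file contents read with open(path, "rb").read():
# a UTF-8 encoded right curly quote (E2 80 9D) surrounded by ASCII.
data = b"x" * 20 + "\u201d".encode("utf-8") + b"y" * 20

try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    bad_pos = exc.start                     # index of the byte that could not be decoded
    snippet = context_around(data, bad_pos, 4)

print(bad_pos, snippet)
```

Seeing `\xe2\x80` immediately before the failing byte is a strong hint that the data is UTF-8 rather than cp1252, since that three-byte pattern is how UTF-8 encodes a closing curly quote.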

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread Benjamin Peterson
Anjanesh Lekshminarayanan anjanesh.net> writes: > > > It does auto-detect it as cp1252- look at the files in the traceback and > > you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong > > encoding, try opening it as utf-8 or latin1 and see if that fixes it. > > Thanks a lot !

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread Benjamin Kaplan
On Thu, Jan 29, 2009 at 12:09 PM, Anjanesh Lekshminarayanan < m...@anjanesh.net> wrote: > > It does auto-detect it as cp1252- look at the files in the traceback and > > you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong > > encoding, try opening it as utf-8 or latin1 and see if

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread Anjanesh Lekshminarayanan
> It does auto-detect it as cp1252- look at the files in the traceback and > you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong > encoding, try opening it as utf-8 or latin1 and see if that fixes it. Thanks a lot ! utf-8 and latin1 were accepted ! -- http://mail.python.org/mail

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread Benjamin Kaplan
"C:\Python30\lib\encodings\cp1252.py", line 23, in decode > return codecs.charmap_decode(input,self.errors,decoding_table)[0] > UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position > 10442: character maps to > > The string at position 10

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2009-01-29 Thread Anjanesh Lekshminarayanan
.buffer.read(), final=True)) File "C:\Python30\lib\io.py", line 1295, in decode output = self.decoder.decode(input, final=final) File "C:\Python30\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecod
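The traceback ends in `cp1252.py` because that codec has no character assigned to byte 0x9d. As a small check, not taken from the thread, the following loop asks Python's own cp1252 codec which single byte values it refuses to decode.

```python
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        # This byte value has no mapping in Python's cp1252 table.
        undefined.append(value)

print([hex(v) for v in undefined])
```

Any file containing one of these byte values cannot be read as cp1252 with the default strict error handling, whatever the rest of its contents look like.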