Øyvind wrote:
>>> Where are you getting these errors (what line of the program)? Do you
>>> know what kind of strings objSelection.Find.Execute() is expecting?
>>>
>>> Kent
>>
>> The program stops working and gives me these errors when I try to run it
>> when it encounters a non-English letter.
>>
>> This is the full error:
>> Traceback (most recent call last):
>>   File "C:\Python23\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
>>     exec codeObject in __main__.__dict__
>>   File "C:\Python\BA\Oversett.py", line 47, in ?
>>   File "C:\Python\BA\Oversett.py", line 23, in kjor
>>     en = i.split('\t')[0]
>>   File "C:\Python23\lib\codecs.py", line 388, in readlines
>>     return self.reader.readlines(sizehint)
>>   File "C:\Python23\lib\codecs.py", line 314, in readlines
>>     return self.decode(data, self.errors)[0].splitlines(1)
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 168-170:
>> invalid data

> This is fairly strange as the line
>     en = i.split('\t')[0]
> should not call any method in codecs. I don't know how you can get such a
> stack trace.

The file f where en comes from contains lots of lines with one English word
followed by a tab and a Norwegian one (approximately 25000 lines). A line can
look like this:

    core\tkjærne

So en is supposed to be the English word that the program needs to find in
MS Word, and to is the replacement word. So wouldn't that be a string that
should be handled by codecs?

    for i in self.f.readlines():
        en = i.split('\t')[0]

> Maybe try deleting all the .pyc files to make sure they are in sync with
> the source and try again?

This didn't seem to help.

> The actual error indicates that the input data is not valid utf-8. Are you
> sure that is the correct encoding for the input file? If the file is utf-8
> and has bad characters you could pass errors='ignore' or errors='replace' as
> a parameter to codecs.open() to change the error handling style to
> something more forgiving.

Is it not valid utf-8? I have tried with latin-1 as well, to no avail. The
letters that cause the problem are æøå. They shouldn't be that exotic?

>> Traceback (most recent call last):
>>   File "C:\Python23\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
>>     exec codeObject in __main__.__dict__
>>   File "C:\Python\BA\Oversett.py", line 49, in ?
>>   File "C:\Python\BA\Oversett.py", line 33, in kjor
>>     if t % 1000 == 0:
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 17:
>> ordinal not in range(128)

> Again this stack trace doesn't make sense, the indicated line doesn't do
> any string operation.

> This error message normally occurs when a non-ascii string is converted to
> unicode using the default encoding (which is 'ascii'). Often the
> conversion is implicit in some other operation but I don't see any such
> operation here.

But regardless, shouldn't 'ascii' be excluded here?
Since I tell the program to use utf-8, not only once but twice?

>> objSelection.Find.Execute() is supposed to accept any kind of string. (It
>> is the Search & Replace function in MS Word.)

> It has to make some assumption about the type of the string. Does it want
> unicode or encoded bytes? If encoded bytes, what encoding does it expect?

I think the letters should be accepted. The Python script here is written to
replace about 25000 MS Word macros, so all the letters have been accepted by
MS Word when fed by Visual Basic. All I have done now is to extract the
words from the macros and put them in a file.
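
For reference, a minimal sketch of one way to check which encoding the word
list is really saved in before decoding it. It assumes a current Python 3
rather than the Python 2.3 in the tracebacks, and uses the built-in open(),
which takes the same encoding and errors arguments as codecs.open(). The
names detect_encoding and read_pairs are made up for illustration and are not
part of Oversett.py. The byte 0xe5 in the second traceback is how å is stored
in latin-1 (and cp1252), which would fit a file that is not utf-8 at all, but
that is only a guess.

```python
def detect_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes the whole file.

    latin-1 maps every possible byte to a character, so it never fails.
    Getting "latin-1" back therefore only means "not valid utf-8"; it is
    a fallback guess, not proof of the real encoding.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def read_pairs(path, encoding, errors="strict"):
    """Yield (english, norwegian) pairs from a tab-separated word list.

    errors is passed straight to open(); "replace" swaps undecodable
    bytes for U+FFFD instead of raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors=errors) as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # partition() tolerates lines without a tab (to is then "")
            en, _, to = line.partition("\t")
            yield en, to
```

If detect_encoding() reports latin-1 for the word list, passing that name to
read_pairs() should give strings with æøå intact. errors="replace" is mainly
useful for seeing where the bad bytes are, since the replaced characters
would no longer match anything in the Word document.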