Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-14 Thread wxjmfauth
On Tuesday, 13 May 2014 22:26:51 UTC+2, MRAB wrote: On 2014-05-13 20:01, scottca...@gmail.com wrote: On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote: You may have missed my follow up post, where I said I had not noticed you were operating on a binary .doc file.

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-14 Thread alister
On Tue, 13 May 2014 23:12:40 -0700, wxjmfauth wrote: On Tuesday, 13 May 2014 22:26:51 UTC+2, MRAB wrote: On 2014-05-13 20:01, scottca...@gmail.com wrote: On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote: You may have missed my follow up post, where I said I had not

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-14 Thread scottcabit
On Tuesday, May 13, 2014 4:26:51 PM UTC-4, MRAB wrote: 0x96 is a hexadecimal literal for an int. Within a string you need \x96 (it's \x for 2 hex digits, \u for 4 hex digits, \U for 8 hex digits). Yes, that was my problem. Figured it out just after posting my last message. using \x96
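
The distinction MRAB describes can be sketched as follows. This is only an illustration: the sample bytes are made up, and reading byte 0x96 as an en dash assumes the text is stored as Windows-1252, which may not hold for every .doc file.

```python
# 0x96 is an integer literal; "\x96" inside quotes is a string/bytes escape.
as_int = 0x96              # the int 150, not usable as a search pattern
as_bytes = b"\x96"         # a single byte with value 150

# Escape widths: \x takes 2 hex digits, \u takes 4, \U takes 8.
four_hex = "\u2013"        # EN DASH written with \u
eight_hex = "\U00002013"   # the same character written with \U

# In Windows-1252 (an assumption about the file), byte 0x96 is the en dash.
decoded = b"\x96".decode("cp1252")

# Hypothetical sample data: replace the raw byte with a plain hyphen.
data = b"pages 10\x9620"
cleaned = data.replace(b"\x96", b"-")
```

Note that in a text (str) literal, "\x96" is the control character U+0096 rather than the en dash U+2013, so the \x form matches the en dash only when working on the raw bytes.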

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-13 Thread Dave Angel
On 05/12/2014 01:35 PM, scottca...@gmail.com wrote: On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote: Good: # Untested fStr = re.sub(b'#x(201[2-5])|(2E3[AB])|(00[2A]D)', b'-', fStr) Still doesn't work. Guess whatever the code is for endash and mdash are not

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-13 Thread Steven D'Aprano
On Mon, 12 May 2014 10:35:53 -0700, scottcabit wrote: On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote: Good: fStr = re.sub(b'#x2012', b'-', fStr) Doesn't work...the document has been verified to contain endash and emdash characters, but this does NOT

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-13 Thread Chris Angelico
On Tue, May 13, 2014 at 11:49 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: This {EN DASH} is an n-dash. or: x\x9c\x0b\xc9\xc8,V\xa8v\xf5Spq\x0c\xf6\xa8U\x00r\x12 \xf3\x14\xf2tS\x12\x8b3\xf4\x00\x82^\x08\xf8 (that last one is the text passed through the zlib

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-13 Thread scottcabit
On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote: You may have missed my follow up post, where I said I had not noticed you were operating on a binary .doc file. If you're not willing or able to use a full-blown doc parser, say by controlling Word or LibreOffice, the

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-13 Thread MRAB
On 2014-05-13 20:01, scottca...@gmail.com wrote: On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote: You may have missed my follow up post, where I said I had not noticed you were operating on a binary .doc file. If you're not willing or able to use a full-blown doc parser, say

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-12 Thread scottcabit
On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote: Good: fStr = re.sub(b'#x2012', b'-', fStr) Doesn't work...the document has been verified to contain endash and emdash characters, but this does NOT replace them. Better: fStr =

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-12 Thread Rustom Mody
On Monday, May 12, 2014 11:05:53 PM UTC+5:30, scott...@gmail.com wrote: On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote: fStr = fStr.replace(b'#x2012', b'-') Still doesn't work Best: # Untested fStr =

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-10 Thread wxjmfauth
On Saturday, 10 May 2014 06:22:00 UTC+2, Rustom Mody wrote: On Saturday, May 10, 2014 1:21:04 AM UTC+5:30, scott...@gmail.com wrote: Hi, here is a snippet of code that opens a file (fn contains the path\name) and first tried to replace all endash, emdash etc characters

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-10 Thread Tim Golden
On 10/05/2014 08:11, wxjmfa...@gmail.com wrote: Anyway, as Python may fail as soon as one uses an EM DASH or an EM DASH, I think it's not worth the effort to spend too much time with it. Nope -- seems all right to me. (Hopefully helping the OP out as well as rebutting a rather foolish

Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread scottcabit
Hi, here is a snippet of code that opens a file (fn contains the path\name) and first tried to replace all endash, emdash etc characters with simple dash characters, before doing a search. But the replaces are not having any effect. Obviously a syntax problem...what silly thing am I doing

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread MRAB
On 2014-05-09 20:51, scottca...@gmail.com wrote: Hi, here is a snippet of code that opens a file (fn contains the path\name) and first tried to replace all endash, emdash etc characters with simple dash characters, before doing a search. But the replaces are not having any effect.

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread Chris Angelico
On Sat, May 10, 2014 at 5:51 AM, scottca...@gmail.com wrote: But the replaces are not having any effect. Obviously a syntax problem...what silly thing am I doing wrong? Thanks! fn = 'z:\Documentation\Software' def processdoc(fn,outfile): fStr = open(fn, 'rb').read()

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread Tim Chase
On 2014-05-09 12:51, scottca...@gmail.com wrote: here is a snippet of code that opens a file (fn contains the path\name) and first tried to replace all endash, emdash etc characters with simple dash characters, before doing a search. But the replaces are not having any effect. Obviously a

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread scottcabit
re.sub _returns_ its result (strings are immutable). Ahh...so I tried this for each re.sub fStr = re.sub(b'#x2012','-',fStr) No errors running it, but it still does nothing. -- https://mail.python.org/mailman/listinfo/python-list
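
A minimal sketch of the point about re.sub returning its result. The sample bytes are hypothetical, and the bytes 0x96 and 0x97 stand for en dash and em dash only under the assumption of Windows-1252 text.

```python
import re

# Hypothetical sample: one en dash (0x96) and one em dash (0x97) as raw bytes.
text = b"a\x96b\x97c"

# Calling re.sub without using the return value changes nothing,
# because bytes (like str) objects are immutable.
re.sub(b"[\x96\x97]", b"-", text)
unchanged = text

# Assigning the return value keeps the substituted copy.
fixed = re.sub(b"[\x96\x97]", b"-", text)
```

Here the pattern and the replacement are both bytes, matching the bytes read from a file opened in 'rb' mode.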

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread scottcabit
On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote: A Word doc (as your subject mentions) is a binary format. There's the older .doc and the newer .docx (which is actually a .zip file with a particular content-structure renamed to .docx). I am using .doc files only.. For
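
To illustrate Tim Chase's remark that a .docx is a ZIP archive while a .doc is not, here is a rough sketch. The helper name looks_like_docx is made up for this example, and checking for the member word/document.xml is an assumption about how such files are usually laid out, not a full validity test.

```python
import io
import zipfile

def looks_like_docx(data: bytes) -> bool:
    """Hypothetical helper: True if data is a ZIP archive holding word/document.xml."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False  # e.g. an old binary .doc, which is not a ZIP file
    with zipfile.ZipFile(buf) as zf:
        return "word/document.xml" in zf.namelist()

# Build a tiny stand-in archive in memory (not a real Word document).
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
docx_like = sample.getvalue()

# Stand-in for an old .doc: the OLE2 compound-file signature plus padding.
doc_like = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 100
```

With these stand-ins, looks_like_docx(docx_like) is true and looks_like_docx(doc_like) is false, which is why a byte-level search behaves so differently on the two formats.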

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread Steven D'Aprano
On Fri, 09 May 2014 12:51:04 -0700, scottcabit wrote: Hi, here is a snippet of code that opens a file (fn contains the path\name) and first tried to replace all endash, emdash etc characters with simple dash characters, before doing a search. But the replaces are not having any

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread Steven D'Aprano
On Fri, 09 May 2014 13:49:56 -0700, scottcabit wrote: On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote: A Word doc (as your subject mentions) is a binary format. There's the older .doc and the newer .docx (which is actually a .zip file with a particular content-structure renamed to

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread Rustom Mody
On Saturday, May 10, 2014 1:21:04 AM UTC+5:30, scott...@gmail.com wrote: Hi, here is a snippet of code that opens a file (fn contains the path\name) and first tried to replace all endash, emdash etc characters with simple dash characters, before doing a search. But the replaces