On Tuesday, May 13, 2014 22:26:51 UTC+2, MRAB wrote:
On 2014-05-13 20:01, scottca...@gmail.com wrote:
On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote:
You may have missed my follow up post, where I said I had not noticed you
were operating on a binary .doc file.
On Tue, 13 May 2014 23:12:40 -0700, wxjmfauth wrote:
On Tuesday, May 13, 2014 22:26:51 UTC+2, MRAB wrote:
On 2014-05-13 20:01, scottca...@gmail.com wrote:
On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote:
You may have missed my follow up post, where I said I had not
On Tuesday, May 13, 2014 4:26:51 PM UTC-4, MRAB wrote:
0x96 is a hexadecimal literal for an int. Within a string you need \x96
(it's \x for 2 hex digits, \u for 4 hex digits, \U for 8 hex digits).
Yes, that was my problem. Figured it out just after posting my last message.
using \x96
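MRAB's distinction between an int literal and a string escape can be checked directly in Python 3. This is a minimal sketch that assumes, as the thread goes on to conclude, that the en dash is stored in the file as the single cp1252 byte 0x96; the sample bytes are made up.

```python
import re

# 0x96 is just an int literal, not a one-byte string
assert 0x96 == 150

# Inside a bytes literal, \x takes exactly two hex digits
en_dash = b'\x96'
assert len(en_dash) == 1

# \u (4 hex digits) and \U (8 hex digits) are str-literal escapes
assert '\u2013' == '\U00002013'

# With the pattern written as b'\x96', the replacement works
data = b'pages 10\x9620'  # hypothetical sample bytes
data = re.sub(b'\x96', b'-', data)
print(data)  # b'pages 10-20'
```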
On 05/12/2014 01:35 PM, scottca...@gmail.com wrote:
On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:
Good:
# Untested
fStr = re.sub(b'#x(?:201[2-5]|2E3[AB]|00[2A]D)', b'-', fStr)
Still doesn't work.
Guess whatever the code is for endash and mdash are not
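As originally posted, the untested pattern `#x(201[2-5])|(2E3[AB])|(00[2A]D)` has a precedence problem: `|` splits the whole expression, so only the first alternative requires the `#x` prefix. Grouping the alternatives fixes that. The sketch below uses made-up sample bytes. Note that either version only matches the literal text `#x2013` and friends (the spelling of an XML/HTML character reference), not the dash character itself, which fits the report that nothing changed in the binary .doc.

```python
import re

sample = b'a#x2013b 2E3A c#x00ADd'  # hypothetical sample bytes

# Ungrouped: '2E3A' and '00AD' match even without the '#x' prefix
loose = re.sub(b'#x(201[2-5])|(2E3[AB])|(00[2A]D)', b'-', sample)

# Grouped: every alternative must follow the literal '#x'
strict = re.sub(b'#x(?:201[2-5]|2E3[AB]|00[2A]D)', b'-', sample)

print(loose)   # b'a-b - c#x-d'
print(strict)  # b'a-b 2E3A c-d'
```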
On Mon, 12 May 2014 10:35:53 -0700, scottcabit wrote:
On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:
Good:
fStr = re.sub(b'#x2012', b'-', fStr)
Doesn't work...the document has been verified to contain endash and
emdash characters, but this does NOT
On Tue, May 13, 2014 at 11:49 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
This {EN DASH} is an n-dash.
or:
x\x9c\x0b\xc9\xc8,V\xa8v\xf5Spq\x0c\xf6\xa8U\x00r\x12
\xf3\x14\xf2tS\x12\x8b3\xf4\x00\x82^\x08\xf8
(that last one is the text passed through the zlib
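Chris's point is that the same text can be stored as very different bytes. A small sketch of that idea: the en dash encodes differently under cp1252, UTF-8 and UTF-16-LE, and zlib-compressed bytes look nothing like the text yet decompress back to it exactly. Which of these encodings a given .doc actually uses is not established in this thread.

```python
import zlib

text = 'This \u2013 is an n-dash.'

# One character, three different byte sequences
print('\u2013'.encode('cp1252'))     # b'\x96'
print('\u2013'.encode('utf-8'))      # b'\xe2\x80\x93'
print('\u2013'.encode('utf-16-le'))  # b'\x13 ' (bytes 0x13, 0x20)

# Compressed bytes are unrecognisable, but round-trip losslessly
packed = zlib.compress(text.encode('utf-8'))
print(zlib.decompress(packed).decode('utf-8') == text)  # True
```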
On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote:
You may have missed my follow up post, where I said I had not noticed you
were operating on a binary .doc file.
If you're not willing or able to use a full-blown doc parser, say by
controlling Word or LibreOffice, the
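Without a real doc parser, one rough byte-level approach is to normalise the dashes in each encoding the file might use before searching. This sketch assumes, as an assumption about the .doc format rather than something stated in the thread, that text is stored either as cp1252 single bytes or as UTF-16-LE code units. Because it rewrites raw bytes blindly it can also hit non-text data or misaligned byte pairs, so it is only suitable for searching an in-memory copy, never for writing the file back. The helper name `normalise_dashes` is hypothetical.

```python
# Hypothetical helper: map en/em dashes to '-' in raw .doc bytes for searching.
# Assumption: text is stored as cp1252 bytes or as UTF-16-LE code units.

# cp1252: 0x96 is EN DASH, 0x97 is EM DASH
CP1252_DASHES = bytes.maketrans(b'\x96\x97', b'--')

# UTF-16-LE code units for EN DASH (U+2013) and EM DASH (U+2014)
UTF16_DASHES = {
    '\u2013'.encode('utf-16-le'): '-'.encode('utf-16-le'),
    '\u2014'.encode('utf-16-le'): '-'.encode('utf-16-le'),
}


def normalise_dashes(data: bytes) -> bytes:
    # Replace two-byte UTF-16 units, keeping the same width (b'-\x00')
    for dash, hyphen in UTF16_DASHES.items():
        data = data.replace(dash, hyphen)
    # Then remap the single cp1252 dash bytes in one pass
    return data.translate(CP1252_DASHES)


sample = b'10\x9620 ' + 'a\u2014b'.encode('utf-16-le')  # hypothetical bytes
print(normalise_dashes(sample))  # b'10-20 a\x00-\x00b\x00'
```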
On 2014-05-13 20:01, scottca...@gmail.com wrote:
On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote:
You may have missed my follow up post, where I said I had not noticed you
were operating on a binary .doc file.
If you're not willing or able to use a full-blown doc parser, say
On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:
Good:
fStr = re.sub(b'#x2012', b'-', fStr)
Doesn't work...the document has been verified to contain endash and emdash
characters, but this does NOT replace them.
Better:
fStr =
On Monday, May 12, 2014 11:05:53 PM UTC+5:30, scott...@gmail.com wrote:
On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:
fStr = fStr.replace(b'#x2012', b'-')
Still doesn't work
Best:
# Untested
fStr =
On Saturday, May 10, 2014 06:22:00 UTC+2, Rustom Mody wrote:
On Saturday, May 10, 2014 1:21:04 AM UTC+5:30, scott...@gmail.com wrote:
Hi,
here is a snippet of code that opens a file (fn contains the path\name)
and first tried to replace all endash, emdash etc characters
On 10/05/2014 08:11, wxjmfa...@gmail.com wrote:
Anyway, as Python may fail as soon as one uses an
EN DASH or an EM DASH, I think it's not worth
spending too much time on it.
Nope -- seems all right to me. (Hopefully helping the OP out as well as
rebutting a rather foolish
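As a quick check of the counter-claim: once text has been decoded, Python 3 `str` objects handle EN DASH and EM DASH like any other character. This sketch uses the standard `unicodedata` module and `str.translate`; the sample text is made up.

```python
import unicodedata

text = 'pages 10\u201320 \u2014 see below'

# Both are ordinary code points with standard Unicode names
print(unicodedata.name('\u2013'))  # EN DASH
print(unicodedata.name('\u2014'))  # EM DASH
assert '\N{EN DASH}' == '\u2013'

# Map both to ASCII hyphen-minus in one pass
table = str.maketrans({'\u2013': '-', '\u2014': '-'})
print(text.translate(table))  # pages 10-20 - see below
```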
Hi,
here is a snippet of code that opens a file (fn contains the path\name) and
first tried to replace all endash, emdash etc characters with simple dash
characters, before doing a search.
But the replaces are not having any effect. Obviously a syntax
problem... what silly thing am I doing
On 2014-05-09 20:51, scottca...@gmail.com wrote:
Hi,
here is a snippet of code that opens a file (fn contains the path\name) and
first tried to replace all endash, emdash etc characters with simple dash
characters, before doing a search.
But the replaces are not having any effect.
On Sat, May 10, 2014 at 5:51 AM, scottca...@gmail.com wrote:
But the replaces are not having any effect. Obviously a syntax
problem... what silly thing am I doing wrong?
Thanks!
fn = r'z:\Documentation\Software'
def processdoc(fn, outfile):
    fStr = open(fn, 'rb').read()
On 2014-05-09 12:51, scottca...@gmail.com wrote:
here is a snippet of code that opens a file (fn contains the
path\name) and first tried to replace all endash, emdash etc
characters with simple dash characters, before doing a search. But
the replaces are not having any effect. Obviously a
re.sub _returns_ its result (strings are immutable).
Ahh... so I tried this for each re.sub:
fStr = re.sub(b'#x2012', b'-', fStr)
No errors running it, but it still does nothing.
--
https://mail.python.org/mailman/listinfo/python-list
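MRAB's point is that `re.sub` never modifies its argument; it returns a new object that must be bound back to a name. The sketch below, with made-up sample bytes, also shows why the rebound version still appeared to do nothing: `b'#x2012'` is six literal ASCII bytes, not a dash, so on data containing the actual dash byte there is simply no match. It also keeps pattern and replacement both as bytes, since mixing a bytes pattern with a str replacement is an error in Python 3 whenever a match occurs.

```python
import re

data = b'10\x9620'  # hypothetical: en dash stored as the cp1252 byte 0x96

# Calling re.sub without using its result leaves data unchanged
re.sub(b'\x96', b'-', data)
assert data == b'10\x9620'

# The result has to be bound back to a name
data = re.sub(b'\x96', b'-', data)
assert data == b'10-20'

# b'#x2012' is six literal bytes, so it never matches a real dash
assert len(b'#x2012') == 6
assert re.sub(b'#x2012', b'-', b'10\x9620') == b'10\x9620'
```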
On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote:
A Word doc (as your subject mentions) is a binary format. There's
the older .doc and the newer .docx (which is actually a .zip file
with a particular content-structure renamed to .docx).
I am using .doc files only.
For
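Tim's remark that a .docx is a zip archive can be checked with the standard `zipfile` module. This applies only to .docx files (the OP is working with .doc), and it assumes the usual Office Open XML layout where the body text lives in `word/document.xml`; the in-memory archive below is a made-up stand-in for a real file.

```python
import io
import zipfile

# Build a tiny stand-in archive in memory (a real .docx has many more parts)
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('word/document.xml', '<w:t>10\u201320</w:t>')

# A .docx is a zip file; a binary .doc is not
assert zipfile.is_zipfile(buf)

with zipfile.ZipFile(buf) as zf:
    xml = zf.read('word/document.xml').decode('utf-8')

# Inside the XML the dash is an ordinary Unicode character
print(xml.replace('\u2013', '-'))  # <w:t>10-20</w:t>
```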
On Fri, 09 May 2014 12:51:04 -0700, scottcabit wrote:
Hi,
here is a snippet of code that opens a file (fn contains the path\name)
and first tried to replace all endash, emdash etc characters with
simple dash characters, before doing a search.
But the replaces are not having any
On Fri, 09 May 2014 13:49:56 -0700, scottcabit wrote:
On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote:
A Word doc (as your subject mentions) is a binary format. There's the
older .doc and the newer .docx (which is actually a .zip file with a
particular content-structure renamed to
On Saturday, May 10, 2014 1:21:04 AM UTC+5:30, scott...@gmail.com wrote:
Hi,
here is a snippet of code that opens a file (fn contains the path\name) and
first tried to replace all endash, emdash etc characters with simple dash
characters, before doing a search.
But the replaces