Hi Kent,

Thanks for your response.

> > I have to remove the thousand separator by moving the numbers before
> > it to right.
> > So the number and char groups has to be left in their original
> > position.
> >
> > I have to make this kind of changes on the problematic lines:
> >      MOATOT    79   47.281,680      DKK4
> >      MOATOT    79    47281,680      DKK4
> >
> > I have no idea how to make it :(

> Break it up into smaller problems:
> for each line in the data:
> break the line up into fields
> fix the field containing the amount
> rebuild the line
> 
> You don't really have to make a regex for the whole line. re.split() is
> useful for splitting the line and preserving the whitespace so you can
> rebuild the line with the same format.

> Kent

I couldn't find a way to preserve the whitespace when splitting.

But I found a way to do it with a regular expression:

import glob
import re

# A group of digits, followed by a dot, followed by 3 digits and a comma
pat = re.compile(r'(\d+)\.(\d{3},)')

def replace_thousand_separator(line):
    # When the '.' is removed, a space has to be prepended
    # so that the following columns stay in place
    return pat.sub(r' \1\2', line)

for filename in glob.glob('*.doc'):
    # Read everything first, then rewrite the same file in place
    with open(filename, 'r') as f:
        lines = f.readlines()
    lines = [replace_thousand_separator(line) for line in lines]
    with open(filename, 'w') as f:
        f.writelines(lines)
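
A quick check of the substitution on the sample line from my earlier
mail. This is only a sketch: the pattern is assumed to be the same one
as above, and the sample strings are typed in by hand, so the exact
column widths are an assumption.

```python
import re

# Same pattern as above: digits, a dot, then 3 digits and a comma
pat = re.compile(r'(\d+)\.(\d{3},)')

def replace_thousand_separator(line):
    # Drop the '.' and prepend one space so the line length is unchanged
    return pat.sub(r' \1\2', line)

before = "     MOATOT    79   47.281,680      DKK4"
after = replace_thousand_separator(before)

print(before)
print(after)
# The line keeps its length, so the columns after the amount do not move
print(len(before) == len(after))
```

Lines without a thousand separator do not match the pattern and should
come back unchanged.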

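For the record, re.split() does keep the whitespace if the separator
pattern is put in a capturing group, so Kent's approach seems workable
too. A minimal sketch, assuming the amount is the only field of the
form digits, dot, 3 digits, comma, digits; the name fix_line is
hypothetical and not from the original script.

```python
import re

# Assumed shape of the amount field, e.g. '47.281,680'
amount = re.compile(r'\d+\.\d{3},\d+')

def fix_line(line):
    # Because the whitespace pattern is captured, the separators are kept
    # in the result, and ''.join(parts) rebuilds the line exactly
    parts = re.split(r'(\s+)', line)
    for i, part in enumerate(parts):
        if amount.fullmatch(part):
            fixed = part.replace('.', '')
            # Pad on the left so the field keeps its original width
            parts[i] = fixed.rjust(len(part))
    return ''.join(parts)

print(fix_line("     MOATOT    79   47.281,680      DKK4"))
```

This follows the steps Kent listed: break the line into fields, fix the
amount field, rebuild the line.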


Yours sincerely, 
______________________________
Janos Juhasz 
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
