"Jay Mutter III" <jmutter at uakron.edu> wrote

> See the example below:
> A.-C. Manufacturing Company. (See Sebastian, A. A.,
> and Capes, assignors.)
>...
>Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
>Jan. 27 ; v. 270 ; p. 554.
>
> For instance, I would like to go to the end of each line and, if the last
> character is a comma, semicolon or hyphen,
> remove the CR.

It would look something like:

output = open('example.fixed', 'w')
for line in open('example.txt'):
    if line.rstrip('\r\n')[-1:] in (',', ';', '-'):  # check the last character before the newline
        output.write(line.rstrip('\r\n'))            # drop the line ending so the next line joins on
    else:
        output.write(line)                           # keep the line complete with its newline
output.close()
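Because every line read from a file still ends in its newline, the test has to look at the character just before it, not at the very last one. As a sketch (Python 3; the function name join_continued_lines is hypothetical, not something from this thread), the same idea can be wrapped in a function and tried on the sample entries quoted above:

```python
def join_continued_lines(lines, joiners=(',', ';', '-')):
    """Join each line that ends in one of `joiners` onto the line after it.

    `lines` is any iterable of strings that still carry their line endings,
    such as an open text file. `join_continued_lines` is a hypothetical name.
    """
    out = []
    for line in lines:
        body = line.rstrip('\r\n')        # the text without its line ending
        if body[-1:] in joiners:          # [-1:] is '' for a blank line, never a joiner
            out.append(body)              # drop the line ending so the next line runs on
        else:
            out.append(line)              # keep the line exactly as read
    return ''.join(out)


sample = [
    "A.-C. Manufacturing Company. (See Sebastian, A. A.,\n",
    "and Capes, assignors.)\n",
    "Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;\n",
    "Jan. 27 ; v. 270 ; p. 554.\n",
]
print(join_continued_lines(sample))
```

To process a real file, pass the open file object, e.g. dst.write(join_continued_lines(src)) with src opened on example.txt and dst on example.fixed. Lines are joined with no space between them; for commas and semicolons you may prefer to append a space before the next line.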




Working from the above suggestion (and thank you very much; I did enjoy your online tutorial),
I came up with the following:

import os
import sys
import re
import string

# The next 5 lines are so I have an idea of how many lines I started with in the file.

in_filename = raw_input('What is the COMPLETE name of the file you want to open: ')
in_file = open(in_filename, 'r')
text = in_file.read()
num_lines = text.count('\n')
print 'There are', num_lines, 'lines in the file', in_filename

output = open("cleandata.txt","a") # file for writing data to after stripping newline character

# read file, copying each line to new file
for line in text:
    if line[:-1] in '-':
        line = line.rstrip()
        output.write(line)
    else: output.write(line)

print "Data written to cleandata.txt."

# close the files
in_file.close()
output.close()

The above ran with no errors and gave me the number of lines in my original file, but when I opened the cleandata.txt file
I got:

A.-C.䴀愀渀甀昀愀挀琀甀爀椀渀最Company.⠀匀攀攀 Sebastian,䄀⸀A.,and䌀愀瀀攀猀Ⰰassignors.)A.䜀⸀A.刀 愀椀氀眀愀礀Light☀Signal䌀漀⸀(See䴀攀搀攀渀 ⰀElofHassignor.)A-N䌀漀洀瀀愀渀礀ⰀThe.⠀匀攀攀 Alexander愀渀搀Nasb,愀猀ⴀ猀椀最渀漀爀猀⸀㬀䄀一 Company,吀栀攀⸀(See一愀猀栀ⰀIt.䨀⸀Ⰰand䄀氀攀砀 愀渀搀攀爀Ⰰas-

So what did I do to cause all of the strange characters?
Plus, since the output runs on like this, it looks as if it removed every \n and not just the ones after a hyphen, which I was using as my test case.
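Two separate things seem to be going on. First, text = in_file.read() returns the whole file as one string, and looping over a string yields single characters, not lines. For a one-character string, line[:-1] is the empty string, and '' in '-' is always True, so every character goes through rstrip(), which deletes every space and newline, not just the ones after hyphens. Second, the CJK-looking characters are consistent with the input file having been saved as UTF-16 (an assumption; the thread does not say how the file was saved): each character then takes two bytes, and deleting the single bytes for spaces and newlines shifts the byte pairs, so an editor decoding the result as UTF-16 reads a zero byte followed by 'M' as 䴀 (U+4D00). A minimal Python 3 sketch of a fix follows; the function name and the encoding parameter are hypothetical, and the encoding must be set to match the real file:

```python
# Demonstration of the pitfall in the loop above (Python 3).
text = "as-\nsignors\n"
print(list(text)[:3])    # ['a', 's', '-']: looping over a string gives characters
print('' in '-')         # True: the empty string is "in" every string


def remove_newline_after_hyphen(in_path, out_path, encoding='utf-8'):
    """Copy in_path to out_path, joining each line that ends in '-' to the next.

    Returns the number of lines read. The default encoding is an assumption;
    pass encoding='utf-16' if the input was saved as UTF-16.
    """
    count = 0
    with open(in_path, encoding=encoding) as src, \
            open(out_path, 'w', encoding=encoding) as dst:
        for line in src:                  # a file object yields whole lines
            count += 1
            body = line.rstrip('\r\n')    # the line without its line ending
            if body.endswith('-'):        # test the last real character
                dst.write(body)           # drop the newline so the next line joins on
            else:
                dst.write(line)           # copy the line unchanged
    return count
```

Iterating over the file object (or over text.splitlines(True)) hands you whole lines, which is what the original loop assumed. Opening the output with 'w' rather than 'a' also stops repeated runs from appending to the same cleandata.txt.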

Thanks again.

Jay



> Then move line by line through the file and delete everything
> after a numerical sequence

Slightly trickier, because you need to use a regular expression.
But if you know regex, only slightly.
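For example, assuming the "numerical sequence" means a comma-grouped number such as the patent number 1,329,155 in the sample (the thread does not pin this down), a sketch with the re module might keep the number and drop the rest of the line. The names PATENT_NUMBER and cut_after_number are hypothetical:

```python
import re

# Assumption: the "numerical sequence" is a comma-grouped number such as the
# patent number 1,329,155. The number is kept; the rest of the line is dropped.
PATENT_NUMBER = re.compile(r'(\d{1,3}(?:,\d{3})+).*')


def cut_after_number(line):
    """Return `line` truncated just after the first comma-grouped number."""
    # '.' does not match '\n', so a trailing newline survives the substitution.
    return PATENT_NUMBER.sub(r'\1', line, count=1)


print(cut_after_number("Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;"))
# Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155
```

Lines with no such number, like "Jan. 27 ; v. 270 ; p. 554.", come back unchanged, so it is safe to run over every line after the joining step.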

>  I am wondering if Python would be a good tool

Absolutely, it's one of the areas where Python excels.

> find information on how to accomplish this

You could check my tutorial on these three topics:

Handling text
Handling files
Regular Expressions.

Also see the standard Python documentation: the general tutorial
(assuming you've done some basic programming in another language
before), plus the documentation for the re module.

> using something like the unix tool awk or something else??

awk or sed could both be used, but Python is more generally
useful, so unless you already know awk, I'd take the time to
learn the basics of Python (a few hours, maybe) and use that.

--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
