I have a legacy program at work that outputs a text file with this header:

ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º Radio Source Precession Program º
º by John B. Doe º
º 31 August 1992 º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ
Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004
Enter the Catalog Name or C/R for CATALOG.SRC >
The Julian Date is = 2453153.5
0022+002 5.6564 +0.2713 00:22:37.54 00:16:16.65
0106+013 17.2117 +1.6052 01:08:50.80 01:36:18.58
.
.
.
(many more regular data lines follow, to the end of the section)

The program creates one section each time it runs, and each section begins with one of these headers. Every run appends its section to the end of the file, so each new header follows the last data line of the previous section.


I am trying to write a Python script to strip these headers (the first five lines of each section) from the file. The name of this legacy program is PRECESS. Because every run of PRECESS appends a new section, the header is repeated throughout the file, not just at the top.

Here's my code so far:

(code)
import re

def main():
    f = open('/home/jerry/sepoct08.txt', 'r') # sepoct08.txt is the PRECESS output file
    for line in f:
        if re.search('ÉÍÍÍ', line):
            print line
        elif re.search('> ..-..-....', line): # this line prints out
            print line
        elif re.search('Catalog', line): # this line prints out
            print line
        elif re.search('Julian', line): # this line prints out
            print line
        print "Hi there!" # I print out this just so I know my script is looping

    f.close()

if __name__ == "__main__":
    main()
(/code)

Here's the output from my code:

(output)
Hi there!
Hi there!
Hi there!
Hi there!
Hi there!
Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004

Hi there!
Enter the Catalog Name or C/R for CATALOG.SRC >

Hi there!
The Julian Date is = 2453153.5

Hi there!
Hi there!
.
.
.
. end of file
(/output)

As you can see, I can print out the three lines after the strange header lines, but not the strange character lines. How can I match on those strange characters? What are they?

For now I'm just trying to print out each line of the header; later I will modify the code to process those lines as needed. My problem is the strange characters in the top part of the header: the re module doesn't recognize them. How can I match on them, so I can delete those lines? I can't do it by line number because those lines aren't recognized.
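
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the frame looks like it was drawn with the box-drawing bytes of IBM PC code page 437 (common in DOS-era programs), and those same bytes show up as É, Í, », º and È when they are read as Latin-1. The sketch below decodes a short run of such bytes both ways and matches them as raw bytes rather than as typed characters. The byte values and the names BOX_BYTES and looks_like_box_line are illustrative only; nothing here is taken from PRECESS itself. It is written for Python 3 and should also run on Python 2.6 or later.

```python
import re

# Assumed byte values (code page 437 box-drawing set); not confirmed for PRECESS.
# 0xC9 top-left, 0xCD horizontal, 0xBB top-right,
# 0xBA vertical, 0xC8 bottom-left, 0xBC bottom-right
BOX_BYTES = re.compile(b'[\xc9\xcd\xbb\xba\xc8\xbc]')

def looks_like_box_line(raw_line):
    """Return True if the raw (undecoded) line contains an assumed box byte."""
    return BOX_BYTES.search(raw_line) is not None

# A short stand-in for the top border of the header.
top_border = b'\xc9' + b'\xcd' * 4 + b'\xbb'

as_cp437 = top_border.decode('cp437')     # box-drawing characters
as_latin1 = top_border.decode('latin-1')  # the "strange" accented letters
```

If the script file is saved as UTF-8, a typed 'ÉÍÍÍ' is stored as two bytes per character, which could be why the pattern never matches the single bytes in the data file. Matching on byte values sidesteps that question.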

The original PRECESS code cannot be modified. So, short of rewriting the PRECESS program, I thought it would be easy to modify the output as needed. I'm pretty sure PRECESS is written in C.
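
A minimal sketch of that post-processing step, assuming each framed header line begins with one of the single bytes 0xC9, 0xBA or 0xC8 (the code page 437 values for the left-hand corners and the vertical bar). That byte assumption, the function names, and the idea of writing to a separate output file are all hypothetical choices for illustration. The file is opened in binary mode so the bytes are compared exactly as PRECESS wrote them.

```python
# Assumed first bytes of the five framed header lines (code page 437):
# 0xC9 top-left corner, 0xBA vertical bar, 0xC8 bottom-left corner.
BOX_LINE_STARTS = (b'\xc9', b'\xba', b'\xc8')

def strip_box_lines(lines):
    """Return the lines that do not start with an assumed box-drawing byte."""
    return [line for line in lines
            if not line.lstrip().startswith(BOX_LINE_STARTS)]

def strip_headers(in_path, out_path):
    """Copy in_path to out_path, leaving out every framed header line."""
    with open(in_path, 'rb') as src:      # binary mode: no decoding at all
        kept = strip_box_lines(src)
    with open(out_path, 'wb') as dst:
        dst.writelines(kept)
```

Because the test runs on every line, it removes the frame from each appended section, not just the first. It leaves the three prompt lines in place; those already match with the re.search tests in the script above and could be dropped the same way.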


Sorry for the long post; I tried to include only the relevant information. Please fire away with questions and comments.


TIA,
Jerry

