On 03/06/15 21:13, richard kappler wrote:
I was trying to keep it simple, you'd think by now I'd know better. My fault and my apology.

It's definitely not all dates and times; the data and character types vary. This is the output from my log parser script, which you helped with the other day. There are essentially two types of line:

Tue Jun 2 10:22:42 2015<usertag1 name="SE">SE201506012200310389PS01CT1407166S0011.40009.00007.6IN 000000000018.1LB000258]C10259612019466862270088094]L0223PDF</usertag1>
Tue Jun 2 10:22:43 2015<usertag1 name="SE">SE0389icdim01307755C0038.20033.20012.0IN1000000000 0032]C10259612804038813568089577</usertag1>

I have to do several things:
The first type can be of variable length. Everything after the ] is an identifier that I have to separate; some lines have one, some have more than one, of variable length, always delimited by a ].

So why not just split by ']'?

identifiers = line.split(']')[1:]  # lose the first one
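
As a rough sketch using the first sample line above: the closing </usertag1> would otherwise stay stuck to the last identifier, so it is stripped first. That cleanup step (and the name end_tag) is my assumption about what you want, not something stated in your post.

```python
# Sample line copied from the post above; the closing-tag cleanup is an assumption.
line = ('Tue Jun 2 10:22:42 2015<usertag1 name="SE">'
        'SE201506012200310389PS01CT1407166S0011.40009.00007.6IN 000000000018.1LB000258'
        ']C10259612019466862270088094]L0223PDF</usertag1>')

end_tag = '</usertag1>'
body = line.strip()
if body.endswith(end_tag):
    body = body[:-len(end_tag)]    # drop the closing tag before splitting

identifiers = body.split(']')[1:]  # lose the first one
print(identifiers)                 # ['C10259612019466862270088094', 'L0223PDF']
```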


and finally, I have to break these apart and put a descriptor with each.


Nope. I don't understand that.
Break what apart? And how do you 'put a descriptor with each'?
What is a descriptor for that matter?!


While I was waiting for a response to this, I put together a script to start figuring things out (what could possibly go wrong?!?!?! :-) )

and I can't post the exact script but the following is the guts of it:

f1 = open('unformatted.log', 'r')
f2 = open('formatted.log', 'a')

for line in f1:
    for tag in ("icdm"):
        if tag in line:
            newline = 'log datestamp:' + line[0:24] # + and so on to format the lines with icdm in them including adding 14 x's for the missing timestamp
            f2.write(newline) #write the formatted output to the new log
        else:
            newline = 'log datestamp:' + line[0:24] # + and so on to format the non-icdm lines
            f2.write(newline)


So this checks each line for the 4 tags:  i,c,d and m.
If the tag is in the line it does the if clause, including writing to f2.
If the tag is not in the line it does the else which also writes to f2.
So you always write 4 lines to f2. Is that correct?
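
In case the four tags come as a surprise: parentheses on their own do not create a tuple in Python, so ("icdm") is just the string 'icdm', and looping over a string yields its characters. A quick check at the interpreter shows the difference (the variable names here are mine, purely for illustration):

```python
tags_as_written = ("icdm")    # parentheses alone: this is still the string 'icdm'
tags_as_tuple = ("icdm",)     # the trailing comma makes a one-element tuple

print(type(tags_as_written).__name__, list(tags_as_written))  # str ['i', 'c', 'd', 'm']
print(type(tags_as_tuple).__name__, list(tags_as_tuple))      # tuple ['icdm']
```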

The problems are:
1. For some reason this iterates over the 24-line file 5 times, and it writes the 14 x's to every line, so my non-icdm code (the else:) isn't getting executed. I'm missing something basic and obvious but have no idea what.

That's not what I'd expect. I'd expect it to write 4 lines out for every input line. What gets written depends on how many of the 4 tags are found in the line.

Since we only have partial code we don't know what the formatted lines look like.
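
To see the four-writes-per-line behaviour without touching your real logs, here is a small sketch that runs the posted loop against an in-memory file and compares it with a single substring test per line. Assumptions on my part: the two input lines are made up, the function names are hypothetical, the ' [tagged]' suffix and the added newline merely stand in for the formatting you elided, and I'm guessing the intent was to test for the whole substring 'icdm'.

```python
import io

def as_posted(lines, out):
    # Same structure as the posted loop: the inner loop visits 'i', 'c', 'd', 'm'.
    for line in lines:
        for tag in ("icdm"):
            if tag in line:
                out.write('log datestamp:' + line[0:24] + '\n')
            else:
                out.write('log datestamp:' + line[0:24] + '\n')

def one_test_per_line(lines, out, tag="icdm"):
    # Assumed intent: one substring test, so exactly one write per input line.
    for line in lines:
        if tag in line:
            out.write('log datestamp:' + line[0:24] + ' [tagged]\n')  # placeholder formatting
        else:
            out.write('log datestamp:' + line[0:24] + '\n')

# Made-up input lines, not taken from the real log.
lines = ['Tue Jun 2 10:22:42 2015 first made-up line\n',
         'Tue Jun 2 10:22:43 2015 second made-up line with icdm\n']

posted, fixed = io.StringIO(), io.StringIO()
as_posted(lines, posted)
one_test_per_line(lines, fixed)

print(len(posted.getvalue().splitlines()))  # 8: four writes per input line
print(len(fixed.getvalue().splitlines()))   # 2: one write per input line
```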

2. I still don't know how to handle the differences at the end of the non-icdm lines (potentially more than one identifier, ]-delimited, as described above).

I'm not clear on this yet either.
I suspect that once you clarify what you are trying to do you will know how to do it...

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
