[Slightly busy at the moment; can someone else help? Alan, in the future, don't send replies directly to me: send them to the Tutor list. It's an ad-hoc way to load-balance your questions across all the tutors.]
---------- Forwarded message ----------
Date: Thu, 24 Nov 2005 03:55:00 -0600
From: Alan <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: RE: lil help please - updated

Sorry, here it is in a little better English.

I have about 150 lines of Python that extract text from a large file. The problem is that I need a few lines to clean the file first, to avoid the problems the script is running into.

Overview

There is a large text file that I am trying to organize so the Python script can process it. It is badly organized, and I have attempted to arrange it in the following form, which the master script understands.

Keywords:

##### is a number from 1 through 99999
|H    paragraphs
|F    reFerence
|R    Rating

BEFORE I organized the text with a global search and replace, each set of tokens looked like this:

##### paragraph
F reference
R rating

NOW (the form the master script understands):

|H##### paragraph
|F reference
|R rating

Notice there is no ##### in |F and |R.

PROBLEMS

Phase I

PROBLEM 1

The |H paragraph (which can span multiple lines) has some words between parentheses, such as (xyz blah words). These can also span multiple lines:

...( blah blah blah blah) ...

We need to move that text to the end of the |F reference (xyz blah words).

Example

BEFORE

|H 00100 a friend in need is a friend indeed (author means both young \
and old) so select the best friend as soon as you can blah
|F Old London book
|R Cool

AFTER your process

|H 00100 "a friend in need is a friend indeed so select the best friend as soon as you can blah"
|F Old London book
|R Cool

PROBLEM 2

I need to find out where the order is broken so that I can go and fix it by hand. The expected order is

|H##### |F |R

A set in any other order should be written to ErrorOrderLogFile, for example

|H##### paragraph
|H paragraph
|R rating

or any other order like that.

For example, run the new cleaning script on

|H00299 paragraph
|F Reference
|H Rating
|H00300 paragraph
|H paragraph
|H rating

and then cat ErrorOrderLogFile:

bad set orders
|H00300 paragraph

Phase II

PROBLEM 3

Once I have fixed the order by hand, I need to renumber everything from, say, 00001 to 99999, in this format:

|H00001 paragraph
|F00001 reference
|R00001 rating

|H99999 paragraph
|F99999 reference
|R99999 rating

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.778 / Virus Database: 525 - Release Date: 10/15/2004
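
For PROBLEM 1, here is a rough sketch of one way to move the parenthesized text out of the |H paragraph and onto the end of the |F reference. It is only a starting point under some assumptions that the message does not confirm: each set is handled as one string, the parentheses are never nested, and a trailing "\" at a line break is not treated specially. The name move_parentheticals is made up for this example.

```python
import re

# Assumption: parentheticals are not nested, but may span several lines.
PAREN = re.compile(r"\s*\(([^()]*)\)")

# Assumption: a set is exactly one |H part, then one |F part, then one |R part.
SET = re.compile(r"(\|H.*?)(\n\|F.*?)(\n\|R.*)", re.DOTALL)


def move_parentheticals(record):
    """Move every (...) found in the |H part to the end of the |F part.

    `record` is one |H / |F / |R set held in a single string.
    """
    match = SET.match(record)
    if match is None:
        return record  # not a well-formed set: leave it untouched
    h_part, f_part, r_part = match.groups()
    # Collect the parenthesized texts, collapsing line breaks inside them.
    found = [" ".join(text.split()) for text in PAREN.findall(h_part)]
    # Remove them (and the whitespace just before them) from the paragraph.
    h_part = PAREN.sub("", h_part)
    for text in found:
        f_part += " (" + text + ")"
    return h_part + f_part + r_part
```

Called on the "a friend in need" set from the example, this leaves the paragraph without the parenthesized words and turns the reference line into "|F Old London book (author means both young and old)". Sets that do not match the |H, |F, |R layout come back unchanged, so they can still be caught by an order check.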
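
For PROBLEM 2, a minimal sketch of an order check might look like the following. It assumes that a new set starts at every line of the form |H followed by a number, that any other line starting with "|" and a letter is a tag line, and that everything else is continuation text. The function name find_bad_sets is hypothetical.

```python
import re

EXPECTED = ["H", "F", "R"]  # the only accepted tag order within a set


def find_bad_sets(lines):
    """Return the header line of every set whose tags are not exactly H, F, R.

    Assumption: a set starts at each line matching |H followed by digits.
    """
    bad = []
    header, tags = None, []
    for line in lines:
        if re.match(r"\|H\s*\d+", line):
            # A new set begins: judge the one that just ended.
            if header is not None and tags != EXPECTED:
                bad.append(header)
            header, tags = line.rstrip("\n"), ["H"]
        elif header is not None and re.match(r"\|[A-Za-z]", line):
            tags.append(line[1].upper())
        # Any other line is continuation text and does not affect the order.
    if header is not None and tags != EXPECTED:
        bad.append(header)
    return bad
```

The returned list can then be written, one header per line, to ErrorOrderLogFile. A set whose lines are tagged |H, |H, |H is reported by its |H##### header, while a set tagged |H, |F, |R is not.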
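
For PROBLEM 3, renumbering could be sketched roughly as below. This assumes the order has already been fixed, so every |H line opens a new set and the |F and |R lines that follow belong to it. An old number is dropped from |H lines, and from |F and |R lines only when it is attached directly to the tag. The name renumber and its start parameter are invented for this example.

```python
import re


def renumber(lines, start=1):
    """Give each set a sequential five-digit number on its |H, |F and |R lines.

    Assumption: the order is already correct, so each |H line starts a set.
    """
    out = []
    number = start - 1
    for line in lines:
        line = line.rstrip("\n")
        h_line = re.match(r"\|H\s*\d*\s*(.*)", line)
        fr_line = re.match(r"\|([FR])\d*\s*(.*)", line)
        if h_line:
            number += 1
            out.append("|H%05d %s" % (number, h_line.group(1)))
        elif fr_line and number >= start:
            out.append("|%s%05d %s" % (fr_line.group(1), number, fr_line.group(2)))
        else:
            out.append(line)  # continuation text is copied unchanged
    return out
```

Running it a second time on its own output gives the same result, so it should be safe to re-run after further fixes by hand.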
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor