a regular expression question, or at least a text manipulation question

Peter Alcibiades Wed, 27 Aug 2008 13:35:48 -0700

How do you do the following?

I have a series of lines which go like this


|  [record separator, new record starts]
AAA consectetur adipisicing elit, sed
BBB lorem ipsum
CCC consectetur adipisicing elit, sed
CCC laboris nisi ut aliquip ex ea
DDD ut aliquip ex ea commodo
| [record separator]
AAA adipisicing elit, sed   [new record starts]

| is the record separator.

In the above, it's CCC that is repeated, but it could be any prefix.  Also, CCC 
is next to its repetition.  This will always be the case.

I want to go through the file.  When I find a single prefix (like AAA), the 
line should be written to the output file.  When the next line starts with the 
same prefix (as in the CCC case), I want to put both occurrences on the same 
line.  So the desired output would be

AAA consectetur adipisicing elit, sed
BBB lorem ipsum
CCC consectetur adipisicing elit, sed CCC laboris nisi ut aliquip ex ea
DDD ut aliquip ex ea commodo
EOR
AAA adipisicing elit, sed

How do I detect a repetition of that sort and do this? 
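
One possible way to approach this, sketched in Python rather than Revolution purely for illustration. It assumes the prefix is the first whitespace-delimited word of each line, that any line starting with | is a record separator, and that a separator before the first record produces no EOR line. The function name and parameters are hypothetical.

```python
def merge_repeated_prefixes(lines, separator="|", eor="EOR"):
    """Join adjacent lines that share the same first word.

    Assumptions (not stated in the original post): the prefix is the
    first whitespace-delimited word, and a separator that appears
    before any record has been seen is dropped instead of emitting EOR.
    """
    out = []
    prev_prefix = None      # prefix of the most recently written line
    seen_record = False     # becomes True once any data line is written
    for raw in lines:
        line = raw.strip()
        if not line:
            continue        # ignore blank lines
        if line.startswith(separator):
            if seen_record:
                out.append(eor)
            prev_prefix = None   # never merge across a record boundary
            continue
        prefix = line.split(None, 1)[0]
        if prefix == prev_prefix:
            out[-1] += " " + line    # same prefix: append to previous line
        else:
            out.append(line)
        prev_prefix = prefix
        seen_record = True
    return out
```

Because the repetition is always adjacent, remembering only the previous line's prefix is enough; no lookahead or regex is needed.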

A similar question, if the line is

CCC  adipisicing elit, sed TAB CCC  adipisicing elit, sed

How do you detect the multiple occurrence (I can do this with regex) and then 
write out in place of the above expression (this I don't see how to do) the 
following:

CCC  adipisicing elit, sed CCC  adipisicing elit, sed

Obviously, the pseudo latin is different in each case, so no way to check 
using that.
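
For this second case, a rough sketch (again in Python for illustration only, not Revolution) might split the line on the tab and compare the first words of the pieces. It assumes TAB above stands for a literal tab character, possibly surrounded by spaces, and that the prefix is the first whitespace-delimited word; the function name is hypothetical.

```python
def join_tabbed_repeats(line):
    """Replace tabs with single spaces when every tab-separated piece
    starts with the same first word; otherwise return the line unchanged.

    Assumption: "TAB" in the post means a literal tab character.
    """
    parts = line.split("\t")
    if len(parts) < 2:
        return line          # no tab, nothing to do
    # First word of each piece, or None for an empty piece.
    firsts = [p.split(None, 1)[0] if p.strip() else None for p in parts]
    if firsts[0] is not None and all(f == firsts[0] for f in firsts):
        # Same prefix everywhere: rejoin with one space, trimming
        # any stray spaces that surrounded the tab.
        return " ".join(p.strip() for p in parts)
    return line
```

Only the prefixes are compared, so it does not matter that the text after each prefix differs.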

Peter
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
