Re: deleting repeated blocks of text

Gerald Lai Wed, 03 May 2006 18:03:50 -0700

On Wed, 3 May 2006, Gerald Lai wrote:

On Wed, 3 May 2006, Vim Visual wrote:

Hi,

this is the continuation of a post... The point is that I have a file
where blocks of text appear sometimes once, sometimes twice or even
three times etc...
I would like to find out how to delete the blocks that are repeated,
so that in the end I am left with a text file in which the blocks
appear only ONCE....

The text file looks like this, for instance (please note that there
are NO blank lines in my text file; they only appeared after pasting here):

What I call a "block" is a paragraph starting with a "<br><a
href="http://xxx.lanl.gov/..." and running until the next "<br><a
href="http://xxx.lanl.gov/..."

[snip]

One possibility of what you're asking for is a uniq of custom-defined
blocks. Blocks that are repeated in a row will be reduced to a single
block.

The other possibility, getting rid of duplicate blocks that are
separated by other blocks, is much more complicated to do in Vim, and
would require storing every unique block in the text for comparison. Some
might argue that if you did manage to store every unique block, then you
would have done the work of uniq-ing already.
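
To illustrate the "store every unique block" idea outside Vim, here is a
rough Python sketch, not something from the original thread. It assumes a
block starts at any line containing "xxx.lanl.gov" (as described in the
question); the function name dedupe_blocks and the sample lines are made up
for illustration.

```python
def dedupe_blocks(lines, marker="xxx.lanl.gov"):
    """Keep only the first occurrence of each block, wherever repeats appear."""
    # Split into blocks: a line containing the marker starts a new block.
    blocks = []
    for line in lines:
        if marker in line or not blocks:
            blocks.append([])
        blocks[-1].append(line)

    seen = set()  # every distinct block encountered so far
    kept = []
    for block in blocks:
        key = tuple(block)  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            kept.extend(block)
    return kept


# Made-up sample data; the "..." in the URLs is kept as in the question.
a = ['<br><a href="http://xxx.lanl.gov/...">paper A</a>', "abstract A"]
b = ['<br><a href="http://xxx.lanl.gov/...">paper B</a>', "abstract B"]
c = ['<br><a href="http://xxx.lanl.gov/...">paper C</a>', "abstract C"]

# Blocks arrive as A, B, A, B, C: the repeats are not adjacent.
deduped = dedupe_blocks(a + b + a + b + c)
```

As noted above, the set of seen blocks is itself the uniq-ed result, so the
storage and the answer are the same thing here.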

For blocks uniq, the following commands should perform what you want.

First, place block delimiters so that the end of one block is distinct
from the start of the next. I chose "#end#" on its own line as the end
delimiter.

 :g/xxx\.lanl\.gov\|<!-- acaba -->/put!='#end#'
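
To make the effect of that command concrete, here is a rough Python sketch
of the same step (an illustration only, not part of the Vim solution). Like
:put!, it inserts the delimiter ABOVE every matching line. The function name
add_end_delimiters and the sample lines are assumptions.

```python
import re

# Same alternation as in the :g command: a block start or the closing comment.
START = re.compile(r"xxx\.lanl\.gov|<!-- acaba -->")


def add_end_delimiters(lines, delimiter="#end#"):
    """Insert the delimiter on its own line above every matching line."""
    out = []
    for line in lines:
        if START.search(line):
            out.append(delimiter)
        out.append(line)
    return out


# Made-up sample data; the "..." in the URLs is kept as in the question.
a = ['<br><a href="http://xxx.lanl.gov/...">paper A</a>', "abstract A"]
b = ['<br><a href="http://xxx.lanl.gov/...">paper B</a>', "abstract B"]
sample = a + a + b + ["<!-- acaba -->"]

marked = add_end_delimiters(sample)
```

Note that the very first block also gets a "#end#" above it; that stray
delimiter is harmless and goes away in the final cleanup step.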

Then the uniq command is a simple

 :g/^\(.*xxx\.lanl\.gov\_.\{-}\_^#end#\)\n\1$/.,/^#end#$/d A

[snip]

Another way of doing this is:

  :%s/^\(.*xxx\.lanl\.gov\_.\{-}\_^#end#\)\%(\n\1\)\+$/\1
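
For readers more at home with other regex dialects, the substitute can be
approximated in Python as sketched below. This is an assumption-laden
translation, not a tested equivalent of Vim's engine: \_.\{-} becomes the
lazy [\s\S]*?, \_^#end# becomes \n#end#, and \%(...\) becomes (?:...). The
sample text is made up and already contains the "#end#" delimiters.

```python
import re

# Group 1 is one block up to and including its "#end#" line; the
# non-capturing group then eats one or more immediate repeats of it.
REPEATED_BLOCK = re.compile(
    r"^(.*xxx\.lanl\.gov[\s\S]*?\n#end#)(?:\n\1)+$",
    re.MULTILINE,
)

# Made-up sample data; the "..." in the URLs is kept as in the question.
a = ['<br><a href="http://xxx.lanl.gov/...">paper A</a>', "abstract A"]
b = ['<br><a href="http://xxx.lanl.gov/...">paper B</a>', "abstract B"]
text = "\n".join(
    ["#end#"] + a + ["#end#"] + a + ["#end#"] + b + ["#end#", "<!-- acaba -->"]
)

# Collapse each run of identical consecutive blocks to a single copy.
collapsed = REPEATED_BLOCK.sub(r"\1", text)

# Drop the delimiter lines afterwards.
cleaned = [line for line in collapsed.split("\n") if line != "#end#"]
```

As with the Vim version, only repeats that directly follow one another are
collapsed.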

After that, you just need to clean up the end delimiters:

 :%s/^#end#$//


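It would probably be better to delete the delimiter lines outright, since
the substitute above leaves an empty line behind in place of each one: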

  :g/^#end#$/d

HTH :)
--
Gerald
