Re: Regular Expression Question

Tim Chase Fri, 26 Jan 2007 06:05:17 -0800

print ("small string");
print (
  "This is a very long string");
and I need to format it as so:

print ("small string\n");
print (
  "This is a very long string\n");
Ideally, I would like to do this in one command and I would also like to understand the regex itself. So, given the above, here is what I understand of the regex pattern:
    %s/print\s*(\s*"[^"]*\(\\n\)\@<!\ze"/&\\n/g
% - globally (every line in the file)
s              - substitute
/              - delimiter
print\s*(\s*" - my phrase to match, including zero or more whitespace characters after print, then a literal paren, then zero or more spaces up until the quote
[^"]*       - then everything that is not a quote (zero or more)


Doing well up through here...

(             - The beginning of the group ???
\\n          - literal \n
)             - End group ????
\@<!          - Nothing, requires no match behind ???

You've got the understanding right (though those parens are "\(" and "\)" with backslashes). Those four lines in concert assert that a literal "\n" doesn't come before the current point. Without the grouping, it would only ensure that the previous atom (in this case, the "n") didn't appear here, so you'd have problems with things like


        print("terminal n")

because it sees the terminal "n" so it doesn't do the substitution. By grouping them, you assert "and when you get to this point [before the closing quote] and there isn't a literal backslash-en here, then we match".
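
To see the difference the grouping makes outside of Vim, here is a minimal sketch in Python's `re` module. It is an analogue, not the Vim pattern itself: Python spells the negative lookbehind `(?<!...)` where Vim uses `\(...\)\@<!`, and a lookahead `(?=")` stands in for `\ze"`. The sample lines and the names `atom_only`, `grouped` and `matches` are made up for illustration.

```python
import re

# Hypothetical sample lines, modeled on the ones in this thread.
lines = [
    'print("terminal n")',       # ends in a plain "n"
    'print("already done\\n")',  # ends in a literal backslash + "n"
]

# Lookbehind on the single atom "n": rejects any string that ends in "n".
atom_only = re.compile(r'print\s*\(\s*"[^"]*(?<!n)(?=")')

# Lookbehind on the grouped two-character sequence backslash + "n".
grouped = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')

def matches(pattern, line):
    """Return True if the pattern matches somewhere in the line."""
    return pattern.search(line) is not None

for line in lines:
    print(line, matches(atom_only, line), matches(grouped, line))
```

With the single-atom version, the "terminal n" line is wrongly skipped. The grouped version matches it and still skips the line that already ends in a literal \n.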

In here, you're missing the "\ze" which means "when doing the replacement, treat it as though the thing we're substituting ended here, even though there's more stuff we're looking for (namely, the double-quote that's next)"

"             - my ending quote to match in the pattern print ("")


correct

/&          - ???

This is standard substitution...the slash is the break between the search and its replacement. The ampersand is "the whole previous match". In this case, it's slightly tweaked because of the "\ze" that we used...the thing we replace goes up through (but not including) the second double-quote. So it drops in everything from "print" through the end of the internal string (sans-closing-quote)
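
The same "whole match plus a suffix" idea can be sketched in Python's `re` module, again as an analogue rather than the Vim command. Python has no `\ze`, so this sketch uses a lookahead `(?=")` to require the closing quote without including it in the match, and `\g<0>` plays the role of Vim's `&`. The names `PATTERN` and `add_newline` are hypothetical.

```python
import re

# The lookahead (?=") stands in for Vim's \ze" here: the closing quote
# must follow, but it is not part of the matched text.
PATTERN = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')

def add_newline(text):
    """Append a literal backslash-n inside each print string that lacks one."""
    # \g<0> is the whole match (Vim's &); \\ in the template is one backslash.
    return PATTERN.sub(r'\g<0>\\n', text)

print(add_newline('print ("small string");'))
# prints: print ("small string\n");
```

Because the match stops just before the closing quote, the appended \n lands inside the string. Running the function a second time changes nothing, since the lookbehind now sees the \n.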

\\n          - literal \n


correct...appending the literal \n you want.

/             - delimiter
g            - each occurrence on the line

Then we have the spanning multiple lines option:

\_ [^"]*


that's

        \_[

not

        \_ [

\_ - match text over multiple lines (Is this like another regex engine, like the one sed uses?)


It's a vim thing:

        :help /\_

should drop you in the fray. It prefixes (infixes?) a number of atoms that could include whitespace, so for your change, you'd likely want to do something like change the \s atoms to \_s to include newlines.
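
A rough way to picture the \s versus \_s distinction is the following Python sketch. It is only an analogue and the mapping is an assumption of this example: Python's `\s` already includes newlines (so it behaves like Vim's `\_s`), while `[ \t]` is used here to approximate Vim's plain `\s`. Likewise, Python's `[^"]` already spans newlines, much like Vim's `\_[^"]`. The names `single_line`, `multi_line` and `fix` are hypothetical.

```python
import re

# Approximates Vim's plain \s: spaces and tabs only, no newlines.
single_line = re.compile(r'print[ \t]*\([ \t]*"[^"]*(?<!\\n)(?=")')

# Python's \s includes newlines, roughly like Vim's \_s.
multi_line = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')

def fix(pattern, text):
    """Append a literal backslash-n before the closing quote of each match."""
    return pattern.sub(r'\g<0>\\n', text)

# The two-line print call from the original question.
text = 'print (\n  "This is a very long string");'

print(fix(single_line, text) == text)  # the line break blocks the match
print(fix(multi_line, text))
```

The spaces-and-tabs version leaves the two-line call untouched. The newline-aware version matches across the line break and appends the \n.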

Does this make sense? The area I am having difficulty with is /& and how the grouping is working.

Hopefully this sheds some light on matters and helps you tweak your own regexps in the future. If you have any questions, feel free to ask.


-tim
