print ("small string");
print (
  "This is a very long string");

and I need to format it as so:

print ("small string\n");
print (
  "This is a very long string\n");

Ideally, I would like to do this in one command and I would also like to understand the regex itself. So, given the above, here is what I understand of the regex pattern:

    %s/print\s*(\s*"[^"]*\(\\n\)\@<!\ze"/&\\n/g
%              - every line in the file
s              - substitute
/              - delimiter
print\s*(\s*" - my phrase to match including zero or more matching spaces at the end print, then a literal paren then zero or more spaces up until the quote
[^"]*       - then everything that is not a quote (zero or more)

Doing well up through here...

(             - The beginning of the group ???
\\n          - literal \n
)             - End group ????
\@<!          - Nothing, requires no match behind ???

You've got the understanding right (though those parens are "\(" and "\)" with backslashes). Those four lines in concert assert that a literal "\n" doesn't come before the current point. Without the grouping, it would only ensure that the previous atom (in this case, the "n") didn't appear here, so you'd have problems with things like

        print("terminal n")

because it sees the terminal "n" and so doesn't do the substitution. By grouping them, you assert "when you get to this point [just before the closing quote] and there isn't a literal backslash-n right behind you, then we match".
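
If you want to poke at the grouped versus ungrouped behaviour outside of Vim, here is a rough sketch in Python. It is only an analogue: Python's "re" module writes a negative lookbehind as "(?<!...)" instead of "\(...\)\@<!", and I've used a lookahead "(?=")" in place of "\ze"". The UNGROUPED pattern models the case described above, where only the "n" is checked. The names and sample strings are made up for illustration.

```python
import re

# Python analogues of the Vim pattern (an assumed translation, not Vim syntax).
# (?=") stands in for \ze" and (?<!...) stands in for \@<!.
UNGROUPED = re.compile(r'print\s*\(\s*"[^"]*(?<!n)(?=")')    # checks only the n
GROUPED = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')    # checks backslash + n

SAMPLES = [
    r'print("terminal n")',    # ends in a plain n
    r'print("done\n")',        # already ends in a literal backslash-n
    r'print("small string")',  # still needs the \n appended
]

for sample in SAMPLES:
    # Show whether each pattern would select this line for substitution.
    print(sample, bool(UNGROUPED.search(sample)), bool(GROUPED.search(sample)))
```

The interesting row is the first one: the single-character check refuses the string that merely ends in "n", while the grouped check accepts it and only refuses a string that really ends in backslash-n.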

In here, you're missing the "\ze" which means "when doing the replacement, treat it as though the thing we're substituting ended here, even though there's more stuff we're looking for (namely, the double-quote that's next)"

"             - my ending quote to match in the pattern print ("")

correct

/&          - ???

This is standard substitution...the slash is the break between the search and its replacement. The ampersand is "the whole previous match". In this case, it's slightly tweaked because of the "\ze" that we used...the thing we replace goes up through (but not including) the second double-quote. So it drops in everything from "print" through the end of the internal string (sans-closing-quote)
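
The same "match up to, but not including, the closing quote, then re-insert the match plus something" idea can be sketched in Python, if that helps make the "&" concrete. This is an assumed translation rather than anything Vim runs: the lookahead "(?=")" plays the part of "\ze"", and "m.group(0)" plays the part of "&". The function name "append_newline" is hypothetical.

```python
import re

# Assumed Python translation of the Vim pattern: (?<!\\n) replaces
# \(\\n\)\@<! and the lookahead (?=") replaces \ze".
PATTERN = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')


def append_newline(text: str) -> str:
    """Append a literal backslash-n inside each print string that lacks one."""
    # m.group(0) is the whole match, which is what & means in Vim. It stops
    # just before the closing quote because the lookahead consumes nothing.
    return PATTERN.sub(lambda m: m.group(0) + r'\n', text)


print(append_newline('print ("small string");'))
```

Because the closing quote is never part of the match, it doesn't have to be put back in the replacement; it simply stays where it was.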

\\n          - literal \n

correct...appending the literal \n you want.

/             - delimiter
g            - each occurrence on the line

Then we have the spanning multiple lines option:

\_ [^"]*

that's

        \_[

not

        \_ [

\_ - match text over multiple lines (Is this like another regex engine, like the one sed uses?)

It's a vim thing:

        :help /\_

should drop you in the fray. The "\_" prefixes (infixes?) a number of atoms so that they match a line break as well, so for your change you'd likely want to turn the \s atoms into \_s to include newlines.
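
For comparison, here is a small Python sketch run over the two-statement sample from the top of this thread. It is an assumed translation of the pattern, not Vim syntax, and it shows one difference worth knowing: in Python, "\s" and "[^"]" already match newline characters, so nothing like "\_" is needed there, whereas Vim's versions stop at the end of the line unless you ask otherwise.

```python
import re

# Assumed Python translation of the Vim pattern. No \_ variant is needed
# here because Python's \s and [^"] already match newline characters.
PATTERN = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')

# The sample from the question, with the second call split across lines.
SOURCE = (
    'print ("small string");\n'
    'print (\n'
    '  "This is a very long string");\n'
)

# Re-insert each match followed by a literal backslash-n.
RESULT = PATTERN.sub(lambda m: m.group(0) + r'\n', SOURCE)
print(RESULT)
```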

Does this make sense? The area I am having difficulty with is /& and how the grouping is working.


Hopefully this sheds some light on matters and helps you tweak your own regexps in the future. If you have any questions, feel free to ask.

-tim


