On 02/03/2011 02:15 PM, Peter Otten wrote:
Karim wrote:

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
consecutive double quotes:

     * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
  >>>  expression = ' "" '
  >>>  re.subn(r'([^\\])?"', r'\1\\"', expression)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

  >>>  re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution... _But this is not the same REGEX._ And
count=2 does nothing. By default all occurrences should be substituted.
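
A note on why only one substitution happens here: the group ([^\\]) consumes the character in front of the quote, so once the first quote has been matched, the second quote has no unconsumed predecessor left to match against. A negative lookbehind inspects the preceding character without consuming it. A minimal sketch, assuming the goal is to escape every double quote that is not directly preceded by a backslash (the name escape_quotes is made up for illustration):

```python
import re

def escape_quotes(text):
    """Return (new_text, count) with each bare double quote escaped."""
    # (?<!\\) is a zero-width negative lookbehind: it checks the previous
    # character but does not consume it, so two adjacent quotes both match.
    return re.subn(r'(?<!\\)"', r'\\"', text)

result, count = escape_quotes(' "" ')
print(result, count)
```

One caveat with this sketch: a quote that follows an escaped backslash (\\") is treated as already escaped, which may or may not be what the TCL side expects.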

     * *On linux using my good old sed command, it is working with my '?'
       (0-1 match):*

$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
   \"\"

*Indeed what's the matter with RE module!?*
You should really fix the problem with your email program first;
It is a Thunderbird issue with bold type (it appears as stars), but I don't know how to fix it yet.
  afterwards
it's probably a good idea to try and explain your goal clearly, in plain
English.

I already did (cf. the earlier mails in this thread). But to summarize: I pass the expression string to a TCL command, which delimits strings with double quotes only.
So I get an error with nested double quotes => that's the key problem.
Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive double
quotes that can be done with

s = s.replace('""', r'\"\"')
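
When the replacement is written as a Python string literal, the r prefix matters: in an ordinary literal, \" is just an escape for a plain double quote, so no backslash ends up in the string. A small sketch to check this (the variable names are made up for illustration):

```python
# In an ordinary literal, \" is an escape sequence for a plain double quote.
plain = '\"\"'
# In a raw literal, the backslashes are kept as real characters.
raw = r'\"\"'

print(len(plain), len(raw))

s = 'say ""'
escaped = s.replace('""', raw)
print(escaped)
```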

I have already done it as a workaround, but I have to add another replacement before it to cover all the other cases.
I want to make the original command work so that I can drop the workaround.


but that's probably *not* what you want. Assuming you want to escape two
consecutive double quotes and make sure that the first one isn't already
escaped,

You hit it !:-)

this is my attempt:

>>> def sub(m):
...     s = m.group()
...     return r'\"\"' if s == '""' else s
...
>>> print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" \\" \"')

That is not what I want. I want to escape any " that is not already escaped. The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have been writing regexes on Unix for 15 years).

For me the equivalent Python regex is buggy: r'([^\\])?"', r'\1\\"'
The '?' is not accepted. Why? It means a character that is not a backslash, with 0 or 1 occurrence. This is quite simple.
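
A note on the exception, as far as the behaviour can be pinned down: in Python 2.7, a \1 in the replacement template raises "unmatched group" whenever the optional group took no part in the match, whereas sed substitutes an empty string in that case (Python 3.5 and later do the same as sed). One possible workaround is to pass a function instead of a template. A minimal sketch, where the names PATTERN and repl are made up for illustration:

```python
import re

# Same pattern as in the failing call: an optional non-backslash
# character, followed by a double quote.
PATTERN = re.compile(r'([^\\])?"')

def repl(match):
    # group(1) is None when the optional group did not participate;
    # fall back to an empty string, which is what sed does.
    prefix = match.group(1) or ''
    return prefix + '\\"'

print(PATTERN.subn(repl, ' "" '))
```

On the ' "" ' example this performs two substitutions, like the sed command. The string returned by the function is inserted as-is, with no template processing.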

I am a poor tradesman but I don't deny evidence.

Regards
Karim

\\\"" \\\"\" \"" \"\" \\\" \\" \"

Compare that with

$ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
\\\"\" \\"\" \"\" \"\" \\\\" \\\" \\"

Concerning the exception and the discrepancy between sed and python's re, I
suggest that you ask it again on comp.lang.python aka the python-list
mailing list where at least one regex guru will read it.
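
For comparison with the two outputs above, here is a sketch of a variant that escapes every double quote not already escaped, while stepping over backslash pairs so that an escaped backslash is not mistaken for an escape of the following quote. This is only one reading of the requirement, and the function name escape_unescaped_quotes is made up for illustration:

```python
import re

# Either a backslash followed by any character (an existing escape,
# kept as-is) or a bare double quote (which gets escaped).
TOKEN = re.compile(r'\\.|"')

def escape_unescaped_quotes(text):
    def repl(match):
        token = match.group()
        # A lone quote is the only token that needs a backslash added.
        return '\\"' if token == '"' else token
    return TOKEN.sub(repl, text)

print(escape_unescaped_quotes(r'\\\"" \\"" \"" "" \\\" \\" \"'))
```

Because the alternation consumes each backslash together with the character it escapes, a quote is only seen as bare when an even number of backslashes precedes it.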

Peter

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
