On 31/07/2013 3:03 a.m., Tsantilas Christos wrote:
Hi all,
I am posting an alternate patch.
This patch:

  1) If token starts from ( do not consider it as function parameters
start point and return a token from ( to the next white space

OK.

  2) If configuration_includes_quoted_values is set to "on" (new style
enabled) and token ends on a ( character, consider it as function name.
If the token is "parameters" return FunctionParameters type else return
FunctionNameUnknown type and parsing fails

Um. I am okay with this but do forsee a few tricky points here ... (not quite problems, but tricky).

This above logic still means that we are requiring all regex patterns to be quoted strings whenever they contain () brackets around the final segment of pattern.

=> If we are going to make any regex require special quoting, then I think it worthwhile being consistent and making them all require quoting.
    Are we agreed that making regex "-quoted is a good idea?

The '\' escape is already used internally by regex and layering multiple levels of \-escaping gets nasty quite quickly.

=> Luckily we only have 2 layers here. But it does mean that simply wrapping existing patterns in "" is not enough, each pattern needs to be individually marked up with escaping inside the quotes.

=> The number of admin using long lists of regex is quite high and means that we are likely going to have to support a mix of quoted and non-quoted regex simultaneously in the same file when configuration_includes_quoted_values is set to "on"

3) If configuration_includes_quoted_values is set to "off" (new style
enabled) and token ends on a (, if the token is "parameters" return
FunctionParameters type, else do not end the token on '(' but to the
next whitespace character.
For example for the string "test(test) more"  will return as token the
"test(test)"

config set to "off" is legacy parser. Typo in your description?

If quoted-values is OFF. I would be expecting to get the token... "test(test) ... notice the missing end-quote since start-quote should have been ignored and treated as part of the single word token.


For the new style if someone want to use '(' character on a regex for
example, he should use quotes:
  'test(.*/)'
  "test(.*/)"

If someone does not want interpret macros he should use single quotes:
   'test(.*)\/$'

Is it OK?

Alex and I had a small discussion on IRC and neither of us could see a clear reason why this parser is doing anything with $macros right now. The only $macros supported by Squid so far are replaced earlier and don't exist in the config file lines this parse layer is tokenising. If you have any good reson to keep it please mention. Otherwise please remove that '$' special case inside quoted-string parsing. (my patch added #if 0 around it).


I was going to suggest removing the single-quoted strings support as well to avoid letting people quote randomly with either. But with the regex problems I think we should leave that in as the method to prevent \-unescaping and document that regex patterns be single-quoted strings while other tokens be double-quoted.


Also, Alex review of my patch mentioned the Bungled #1 and #2 markers as a problem. They can be dropped,it was a leftover from me trying to figure out when file nesting was occuring. The debugs lines on push/pop do a better job.

Amos

On 07/29/2013 08:25 PM, Amos Jeffries wrote:
Looks like nobody actually ran the trivial "-k parse" test using 3.HEAD
or the initial quoted strings code.

Attached is a patch which adds a secondary form of undo for handling
incorrectly identified ConfigParser::FunctionNameToken elements. Instead
of aborting Squid on any non-function token containing a '(' we take the
wrongly identified assumed function name element and reform it with the
'(' delimiter and trailing element. The resulting aggregate token which
should have been identified initially is then sent to the upper parser
layer.

Amos


Reply via email to