On 2/10/2013 5:12 a.m., Tsantilas Christos wrote:
On 09/27/2013 08:47 PM, Alex Rousskov wrote:
On 09/27/2013 09:39 AM, Amos Jeffries wrote:
On 28/09/2013 3:18 a.m., Tsantilas Christos wrote:
On 09/27/2013 08:23 AM, Alex Rousskov wrote:
Using approach (2) with flexible RE delimiter, we could write

      acl foo url_regex /ends[) (]/
or
      acl foo url_regex {ends[) (]}
or
      acl foo url_regex @ends[) (]@

and it will all work without double escaping.

Alex, in the "Revised approach to fixing configuration syntax" mail
thread you are proposing to use "regex::" prefix for regular
expressions. This is required for grammar consistency.
This means that the regex should look like:

       acl foo url_regex regex::/ends[) (]/
   or
       acl foo url_regex regex::{ends[) (]}
   or
       acl foo url_regex regex::@ends[) (]@
Yes, IF that syntax is adopted.
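
To make the proposal concrete, here is a minimal sketch of how a parser might split such a token. It is only an illustration: the function name split_regex_token and the bracket-pairing table are assumptions (the pairing is borrowed from Perl, and the thread only shows the {...} form), and this is not Squid's actual config parser. It handles neither escaping nor nested brackets.

```python
# Hypothetical helper, not Squid code: split "regex::<delim>pattern<delim>".
PREFIX = "regex::"

# Assumption: an opening bracket is closed by its mirror image (Perl-like);
# any other delimiter closes itself.
PAIRED = {"{": "}", "(": ")", "[": "]", "<": ">"}


def split_regex_token(text):
    """Return (pattern, rest_of_line) for text that starts with regex::."""
    if not text.startswith(PREFIX):
        raise ValueError("missing regex:: prefix")
    body = text[len(PREFIX):]
    if not body:
        raise ValueError("missing delimiter")
    opener = body[0]
    closer = PAIRED.get(opener, opener)
    # First occurrence of the closing delimiter ends the pattern;
    # no escaping or nesting is handled in this sketch.
    end = body.find(closer, 1)
    if end < 0:
        raise ValueError("unterminated regex")
    return body[1:end], body[end + 1:]


for token in ("regex::/ends[) (]/", "regex::{ends[) (]}", "regex::@ends[) (]@"):
    print(split_regex_token(token))
```

All three spellings yield the same pattern, ends[) (], with the space preserved and no double escaping. Note that a "(" delimiter would not work for this particular pattern, since the pattern itself contains ")".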


Okay Alex, I think we can agree on that flexible-delimiter syntax to
avoid escaping.

I also agree with that regex:: prefix.

Is there anything else we have been disagreeing on?

As far as REs are concerned, we need to decide

1) Whether we want to support the new regex:: syntax at all or keep
using spaceless REs as before (at least for now) while reserving the
regex:: prefix.

What benefit would be gained from not using it?



2) If we want to support the new regex:: syntax:

2a) What characters do we allow as RE delimiters? Perl allows virtually
any non-whitespace character, even #, but we probably want to be more
restrictive.
Any non-whitespace character is a good choice, I think. Otherwise, any
non-alphanumeric, non-whitespace character.

Any ASCII character 33 through 126 should be okay. The alphanumerical ones make little sense though. I have no objection if you want to narrow it down to punctuation characters. You may find it a bit difficult or a waste of CPU to test for complex boundaries in the character set when validating the delimiter start byte though.
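
For comparison, the two candidate rules (any ASCII 33 through 126, or punctuation only) could be checked roughly as follows. This is a sketch under assumptions: is_valid_delimiter and its punctuation_only flag are hypothetical names, and "punctuation" is taken to mean Python's string.punctuation set, which lies entirely within 33 through 126. With a precomputed set the narrower rule is one extra lookup here; what it would cost in Squid's own parser is not something this sketch measures.

```python
import string

# Assumption: "punctuation" means the 32 ASCII punctuation characters.
PUNCT_DELIMS = frozenset(string.punctuation)


def is_valid_delimiter(ch, punctuation_only=True):
    """Check a candidate RE delimiter byte (hypothetical helper)."""
    if len(ch) != 1:
        return False
    # Broad rule from the thread: printable, non-space ASCII (33..126).
    if not 33 <= ord(ch) <= 126:
        return False
    # Narrow rule: exclude alphanumerics by requiring punctuation.
    return ch in PUNCT_DELIMS if punctuation_only else True


for ch in ("/", "{", "@", "#", "a", "7", " "):
    print(repr(ch), is_valid_delimiter(ch), is_valid_delimiter(ch, False))
```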



2b) Do we add support for escaping sequences? As discussed a few emails
back, that support is necessary if we want to support arbitrary REs,
which is somewhat important for automated config generators. It is also
needed for (2c).
Escaping is important. The user will select the delimiter that
requires the least escaping, but may not be able to avoid it entirely,
e.g. select this one:
     regex::#A/test/with/one\#and/many/#
instead of this:
     regex::/A\/test\/with\/one#and\/many\//

If we allow the entire range of 33-126 characters it will be an exceedingly rare case where this is necessary. With regex one can always use . in place of a difficult character in the pattern.
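
If delimiter escaping is supported, the scan for the closing delimiter could look like the sketch below. The name extract_pattern is hypothetical, and the policy is an assumption: a backslash followed by the delimiter yields a literal delimiter, while every other backslash sequence is passed through untouched for the regex library to interpret.

```python
def extract_pattern(body):
    """body starts at the opening delimiter; return (pattern, rest).

    Hypothetical helper: only backslash+delimiter is unescaped here.
    """
    if not body:
        raise ValueError("missing delimiter")
    delim = body[0]
    out = []
    i = 1
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # \<delim> becomes a literal delimiter; anything else
            # (e.g. \. or \d) is left intact for the regex library.
            out.append(delim if nxt == delim else ch + nxt)
            i += 2
            continue
        if ch == delim:
            return "".join(out), body[i + 1:]
        out.append(ch)
        i += 1
    raise ValueError("unterminated regex")


print(extract_pattern(r"#A/test/with/one\#and/many/#"))
print(extract_pattern(r"/A\/test\/with\/one#and\/many\//"))
```

Both spellings from the example above produce the same pattern, A/test/with/one#and/many/. One caveat of this policy: if the chosen delimiter is itself a regex metacharacter (such as "."), unescaping it changes the meaning of the pattern, so such delimiters would need separate treatment.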



2c) Do we add support for character sequences so that one can add
special characters and such? This also requires a form of escaping. For
example, here are some of the sequences supported by Perl (we need not
support all of them immediately, of course, but we need to reserve
\-escape if we want them in the future):

Sequence       Description
\t             tab               (HT, TAB)
\n             newline           (NL)
\r             return            (CR)
\f             form feed         (FF)
\b             backspace         (BS)
\a             alarm (bell)      (BEL)
\e             escape            (ESC)
\x{263A}       hex char          (example: SMILEY)
\x1b           restricted range hex char (example: ESC)
\N{name}       named Unicode character or character sequence
\N{U+263D}     Unicode character (example: FIRST QUARTER MOON)
\c[            control char      (example: chr(27))
\o{23072}      octal char        (example: SMILEY)
\033           restricted range octal char  (example: ESC)
We could also try to abuse existing character class [[:class:]] syntax
for those. For example, we can find and replace [[:squid::octal(32):]]
sequences with a space character.
It looks like a good idea to me to support Perl syntax ...

No. Remember we are not parsing and compiling this pattern ourselves. There is a library behind it all for the pattern compilation. Those libraries already support these things in a far better way than we can, and there is no reason for us to allow these control codes as syntax delimiters in squid.conf.

Yes it would be a good idea to support other syntax patterns. But for that we can change the regex:: token to a new name for a new pattern syntax.


Amos
