Hi Romain,

On 8/17/15 11:03 AM, Romain Jacquinot wrote:
For now, the following regular expression features are supported by
content blockers:

  * Matching any character with “.”.
  * Matching ranges with the range syntax [a-b].
  * Quantifying expressions with “?”, “+” and “*”.
  * Groups with parentheses.
  * Beginning of line (“^”) and end of line (“$”) markers.

However, there doesn’t seem to be a way to find any of the alternatives
specified with “|” or find any character not between the brackets “[^]”.

Actually the "[^]" character set syntax is supported.

It could cause compile-time issues in earlier betas. That has been fixed in beta 5.

This is an issue when you want to block addresses like
http://www.example.com/, https://example.com/foobar.jpg,
http://example.com:8080 but not http://example.com.hk.

The URLs are canonicalized before being processed by Content Blockers. That ensures some invariants on the format. For example, the domain name is always followed by ":" or "/", and it is always lowercase.

Typically, I write domain triggers like this:

"trigger": {
    "url-filter": "^https://([^:/]+\\.)?example\\.com[:/]",
    "url-filter-is-case-sensitive": true
}
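
A domain trigger of this shape can be sanity-checked against a few canonicalized URLs. The following is only a sketch under stated assumptions: Python's `re` module stands in for the content blocker regex engine (the two are different engines, although the features used here behave alike), the sample URLs are made up, and the pattern is the variant with the dot in "example\.com" escaped and the subdomain group made optional.

```python
import re

# Assumption: Python's `re` is used as a stand-in for the content blocker engine.
# This is the JSON value "^https://([^:/]+\\.)?example\\.com[:/]" after JSON unescaping.
DOMAIN_FILTER = re.compile(r"^https://([^:/]+\.)?example\.com[:/]")


def matches(url: str) -> bool:
    """Return True if the (already canonicalized) URL matches the filter."""
    return DOMAIN_FILTER.search(url) is not None


# Hypothetical sample URLs, chosen only for illustration.
samples = [
    "https://example.com/",
    "https://www.example.com/foobar.jpg",
    "https://example.com:8080/",
    "https://example.com.hk/",
    "https://notexample.com/",
]

for url in samples:
    print(url, matches(url))
```

The trailing `[:/]` relies on the canonicalization invariant that a domain name is always followed by ":" or "/", which is what keeps "example.com.hk" from matching.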


With at least one of those features, you could write something like:

     {
         "action" : {
             "type" : "block"
         },
         "trigger" : {
             "url-filter": "^https?://(www\\.)?example\\.com(/|:|?)+"

This does not work, but the character set form
    "^https?://(www\\.)?example\\.com[/:?]+"
is equivalent.

         }
     }
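
The claim that a character set can replace an alternation of single characters can be illustrated outside Safari. This is a sketch, assuming Python's `re` module as a stand-in engine (it supports "|", unlike content blockers) and using made-up sample URLs. The "?" inside the alternation is escaped here so that it is treated as a literal character.

```python
import re

# Assumption: Python's `re` is only a stand-in; content blockers do not accept "|".
# Alternation form, with "?" escaped so it is a literal character.
ALTERNATION = re.compile(r"^https?://(www\.)?example\.com(/|:|\?)+")
# Character-set form, using only features content blockers support.
CHAR_SET = re.compile(r"^https?://(www\.)?example\.com[/:?]+")

# Hypothetical sample URLs, chosen only for illustration.
samples = [
    "http://www.example.com/",
    "https://example.com/foobar.jpg",
    "http://example.com:8080/",
    "http://example.com?x=1",
    "http://example.com.hk/",
]


def agree(url: str) -> bool:
    """Return True if both forms give the same match/no-match answer."""
    return (ALTERNATION.search(url) is None) == (CHAR_SET.search(url) is None)


for url in samples:
    print(url, CHAR_SET.search(url) is not None, agree(url))
```

Both forms accept one or more of "/", ":" or "?" directly after the domain, so they agree on every sample.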

or:

     {
         "action" : {
             "type" : "block"
         },
         "trigger" : {
             "url-filter" : "^https?://(www\\.)?example\\.com[^.]"

This pattern should work fine in beta 5.

         }
     }
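
The behaviour of the negated set "[^.]" can be explored with a quick check. This is a sketch, assuming Python's `re` module as a stand-in for the content blocker engine and using made-up hostnames. It also shows one side effect worth knowing: "[^.]" accepts any character other than ".", so a longer hostname that merely starts with "example.com" matches as well.

```python
import re

# Assumption: Python's `re` is used as a stand-in for the content blocker engine.
NEGATED_SET = re.compile(r"^https?://(www\.)?example\.com[^.]")


def blocked(url: str) -> bool:
    """Return True if the URL matches the negated-set filter."""
    return NEGATED_SET.search(url) is not None


# Hypothetical sample URLs, chosen only for illustration.
samples = [
    "http://example.com/",
    "http://example.com:8080/",
    "http://example.com.hk/",
    "http://example.community/",
]

for url in samples:
    print(url, blocked(url))
```

If matching hostnames such as "example.community" is not wanted, a set that lists the allowed following characters, such as "[:/]", is the stricter choice.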

Please note that in this case, the if-domain field wouldn’t help for
embedded content.

Should I write the same rule many times for the different cases (“/”,
“:”, “?”)? That doesn’t feel like a very elegant solution, though. Since
they share the same prefix, will these rules be optimized? On the WebKit
blog, it is written: “The rules are grouped by the prefix ‘https?://’,
and it only counts as one rule with quantifiers.” Does it mean that it
will only count as one rule against the 50,000 rule limit?

Having 3 rules with 3 different endings is fine as long as they are not quantified. Their common prefix would be merged in the compiler frontend.

Having 3 rules with quantifiers per URL would likely cause your rules to be rejected by the compiler even under the 50k rule limit.

In any case, the 50k rule limit is on the number of triggers. The number of rules is counted before rules are merged.
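
The "one rule per ending" approach can be sketched as a small generator. This is illustrative only: the helper name `make_rules` and the shared prefix are assumptions, and reading "not quantified" as applying to the differing endings (each a single literal character here) is also an assumption. Since rules are counted before merging, the three rules below would count as three against the limit.

```python
import json

# Hypothetical shared prefix; the endings are the three cases from the question.
PREFIX = r"^https?://(www\.)?example\.com"
ENDINGS = ["/", ":", r"\?"]  # "?" is escaped so it is a literal, not a quantifier


def make_rules(prefix: str, endings: list[str]) -> list[dict]:
    """Build one block rule per ending; each ending is a single literal character."""
    return [
        {"action": {"type": "block"}, "trigger": {"url-filter": prefix + ending}}
        for ending in endings
    ]


rules = make_rules(PREFIX, ENDINGS)

# json.dumps doubles the backslashes, giving the "\\." form used in rule files.
print(json.dumps(rules, indent=4))
print("rules counted against the limit:", len(rules))
```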

Do you see an elegant solution to handle this case? If not, could you
please consider adding at least one of those regular expression features
for content blockers in Safari?

Are the solutions above good enough for your use case?

Benjamin
_______________________________________________
webkit-help mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-help
