[Wikidata-bugs] [Maniphest] [Updated] T105126: Evaluate pattern constraints (safely)

2015-10-28 Thread thiemowmde
thiemowmde added a project: Wikidata-Sprint-2015-11-03.

TASK DETAIL
  https://phabricator.wikimedia.org/T105126

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: thiemowmde
Cc: thiemowmde, Nikki, Popcorndude, Aklapper, daniel, Wikidata-bugs, aude, 
GWicke, csteipp



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T105126: Evaluate pattern constraints (safely)

2015-10-27 Thread thiemowmde
thiemowmde added a blocking task: T102892: Collect all regular expressions used 
in Wikidata's Template:Constraint:Format.


To: thiemowmde
Cc: thiemowmde, Nikki, Popcorndude, Aklapper, daniel, Wikidata-bugs, aude, 
GWicke, csteipp





[Wikidata-bugs] [Maniphest] [Updated] T105126: Evaluate pattern constraints (safely)

2015-10-26 Thread Popcorndude
Popcorndude added a comment.

I experimented with the constraints a bit, and it would be pretty easy to get 
up to ~50% with the constraints you outlined (the numbers I gave before were 
probably too low, because I may have forgotten to skip newlines). Adding + and 
* covers 3/4, and most of the rest could be rewritten without too much trouble 
(other than https://phabricator.wikimedia.org/P1793 and possibly a few others 
that are basically impossible).

  (\\.|[^(){}\[\]\\]|\{\d+(,\d+)?\}|\[(?!^)(\\\[|\\\]|[^\[\]])*\])*

should work (though it does let character classes through).
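
As a rough illustration of how such a whitelist could be applied, the sketch 
below runs the expression above over a few format strings with Python's 
re.fullmatch. The name is_allowed and the sample formats are made up for this 
example. One thing to be aware of: (?!^) tests the start-of-string anchor 
rather than a literal caret, so negated classes such as [^a] pass as written; 
(?!\^) would reject them, if that was the intent.

```python
import re

# The whitelist expression from the comment above, copied verbatim.
ALLOWED = re.compile(
    r"(\\.|[^(){}\[\]\\]|\{\d+(,\d+)?\}|\[(?!^)(\\\[|\\\]|[^\[\]])*\])*"
)


def is_allowed(fmt: str) -> bool:
    """Return True if the whole format string consists of whitelisted tokens."""
    return ALLOWED.fullmatch(fmt) is not None


# Hypothetical sample formats, not taken from real properties.
print(is_allowed(r"\d{4}-\d{2}"))  # escapes, fixed repetition, a literal
print(is_allowed(r"[A-Z]+\d*"))    # character class plus + and *
print(is_allowed(r"(a|b)+"))       # groups are not whitelisted
print(is_allowed(r"[^a]"))         # negated class, see the note above
```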

After a bit more fiddling:

https://phabricator.wikimedia.org/P234, https://phabricator.wikimedia.org/P274, 
https://phabricator.wikimedia.org/P305, P353, 
https://phabricator.wikimedia.org/P395, https://phabricator.wikimedia.org/P428, 
https://phabricator.wikimedia.org/P529, https://phabricator.wikimedia.org/P866, 
https://phabricator.wikimedia.org/P998, 
https://phabricator.wikimedia.org/P1190, 
https://phabricator.wikimedia.org/P1256, 
https://phabricator.wikimedia.org/P1257, 
https://phabricator.wikimedia.org/P1472, 
https://phabricator.wikimedia.org/P1612, 
https://phabricator.wikimedia.org/P1662, 
https://phabricator.wikimedia.org/P1793, 
https://phabricator.wikimedia.org/P1986, 
https://phabricator.wikimedia.org/P2015 are the only properties where I don't 
see a way to rewrite the format with your constraints plus infinite repetition 
(i.e. no groups), and for many of these there probably is an upper limit; it's 
just that no one bothered to write it into the format.

With that in mind, the list shortens to https://phabricator.wikimedia.org/P234, 
https://phabricator.wikimedia.org/P274, https://phabricator.wikimedia.org/P428, 
https://phabricator.wikimedia.org/P1472, 
https://phabricator.wikimedia.org/P1612, 
https://phabricator.wikimedia.org/P1793 (most of which are probably things you 
didn't intend to support anyway).



To: Popcorndude
Cc: Nikki, Popcorndude, Aklapper, daniel, Wikidata-bugs, aude, GWicke, csteipp





[Wikidata-bugs] [Maniphest] [Updated] T105126: Evaluate pattern constraints (safely)

2015-10-25 Thread Popcorndude
Popcorndude added a comment.

Those criteria accept 62 (8%) of the current constraints.
Adding character classes (\d is everywhere) brings it up to 166 (23%).

I would suggest allowing infinite repetition if the thing being repeated cannot 
overlap with the next thing. However, to keep the checking regex from having to 
execute the format on itself, maybe infinite repetition should only be allowed 
on character classes that don't overlap with the following thing (e.g. 
\d*[a-z] but not \d*0). Such repetitions would then also be treated as atomic 
(automatically converting \d+ to (?>\d+)).
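
To make the overlap criterion concrete, here is a minimal sketch in Python. It 
is not the checking regex itself: it brute-forces the test over an assumed 
alphabet (printable ASCII) and assumes the following atom matches exactly one 
character. The names overlaps and make_atomic are hypothetical.

```python
import re
import string

# Assumed alphabet for the brute-force check; real values may use other characters.
ALPHABET = string.printable


def overlaps(repeated_class: str, next_atom: str) -> bool:
    """True if some single character matches both the repeated class and the next atom."""
    first = re.compile(repeated_class)
    second = re.compile(next_atom)
    return any(first.fullmatch(c) and second.fullmatch(c) for c in ALPHABET)


def make_atomic(repeated: str) -> str:
    """Wrap an already-quantified atom in an atomic group (PCRE-style syntax)."""
    return "(?>" + repeated + ")"


print(overlaps(r"\d", "[a-z]"))  # \d*[a-z]: the two atoms share no character
print(overlaps(r"\d", "0"))      # \d*0: "0" matches both atoms
print(make_atomic(r"\d+"))
```

With these inputs, \d*[a-z] would be accepted and \d*0 rejected, matching the 
two examples above. make_atomic only builds the string; whether the target 
regex engine accepts (?>...) depends on the engine and version.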

It would probably make things a lot easier for the people making the formats if 
each property was allowed to have multiple format strings (match this or this).
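
A minimal sketch of what "match this or this" could look like on the checking 
side, in Python. The function name matches_any and the two sample formats are 
assumptions for illustration; together the two formats accept the same values 
as 0?\d{8} without needing an optional prefix.

```python
import re
from typing import Iterable


def matches_any(value: str, formats: Iterable[str]) -> bool:
    """Return True if the value fully matches at least one format string."""
    return any(re.fullmatch(fmt, value) for fmt in formats)


# Hypothetical pair of simple formats standing in for a single 0?\d{8}.
formats = [r"\d{8}", r"0\d{8}"]

print(matches_any("12345678", formats))
print(matches_any("012345678", formats))
print(matches_any("1234567", formats))
```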

As for grouping, maybe allow it, but only as non-nested atomic groups, since the 
alternative is to rewrite this (admittedly slightly extreme) example as a flat 
list of alternatives:

https?://((academia|android|anime|apple|arduino|astronomy|aviation|beer|bicycles|biology|bitcoin|blender|boardgames|bricks|buddhism|chemistry|chess|chinese|christianity|codegolf|codereview|cogsci|cooking|craftcms|crypto|cs|cstheory|datascience|dba|diy|drupal|dsp|earthscience|ebooks|electronics|ell|emacs|english|expatriates|expressionengine|fitness|freelancing|french|gamedev|gaming|gardening|genealogy|german|gis|graphicdesign|ham|hermeneutics|hinduism|history|homebrew|islam|italian|japanese|joomla|judaism|linguistics|magento|martialarts|math|matheducators|mathematica|mechanics|meta|moderators|money|movies|music|networkengineering|opendata|outdoors|parenting|patents|pets|philosophy|photo|physics|pm|poker|politics|productivity|programmers|puzzling|quant|raspberrypi|reverseengineering|robotics|rpg|russian|salesforce|scicomp|scifi|security|sharepoint|skeptics|softwarerecs|sound|space|spanish|sports|sqa|startups|stats|sustainability|tex|tor|travel|tridion|unix|ux|video|webapps|webmasters|windowsphone|wordpress|workplace|writers)\.stackexchange\.com|askubuntu\.com|mathoverflow\.net|pt\.stackoverflow\.com|serverfault\.com|stackapps\.com|stackoverflow\.com|superuser\.com)/(tags|questions/tagged)/.*
(https://phabricator.wikimedia.org/P1482)

I'll see if I can make some regexes for this in a bit.



To: Popcorndude
Cc: Popcorndude, Aklapper, daniel, Wikidata-bugs, aude, GWicke, csteipp





[Wikidata-bugs] [Maniphest] [Updated] T105126: Evaluate pattern constraints (safely)

2015-10-24 Thread Popcorndude
Popcorndude added a comment.

I did some analysis of what regex features are actually used: 
https://www.wikidata.org/wiki/User:Popcorndude/formats

Only 6 properties use backrefs, lookahead, or lookbehind: 
https://phabricator.wikimedia.org/P1256, 
https://phabricator.wikimedia.org/P1257, 
https://phabricator.wikimedia.org/P1297, 
https://phabricator.wikimedia.org/P1472, 
https://phabricator.wikimedia.org/P1612, 
https://phabricator.wikimedia.org/P1793. Everything else uses only "?*+ [] () | 
{} {,} \d\D\s\S\w ."
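
For reference, a check along these lines can be sketched in a few lines of 
Python. This is not the script used for the analysis page above; the function 
name is made up, and the scan is deliberately naive: it does not treat 
character classes specially and does not look for named backreferences.

```python
def uses_advanced_features(pattern: str) -> bool:
    """Naively detect numbered backreferences, lookahead, and lookbehind."""
    lookarounds = ("(?=", "(?!", "(?<=", "(?<!")
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            # A backslash followed by 1-9 is a numbered backreference.
            if i + 1 < len(pattern) and pattern[i + 1] in "123456789":
                return True
            i += 2  # skip the escaped character, e.g. \\ or \d
            continue
        if pattern.startswith(lookarounds, i):
            return True
        i += 1
    return False


print(uses_advanced_features(r"(\d)\1"))    # numbered backreference
print(uses_advanced_features(r"(?!0)\d+"))  # negative lookahead
print(uses_advanced_features(r"\d{4}"))     # only basic features
```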

Is this analysis helpful? If so, is there anything else you'd like me to 
check?



To: Popcorndude
Cc: Popcorndude, Aklapper, daniel, Wikidata-bugs, aude, GWicke, csteipp





[Wikidata-bugs] [Maniphest] [Updated] T105126: Evaluate pattern constraints (safely)

2015-10-24 Thread Popcorndude
Popcorndude added a comment.

I'm letting things like 0?\d{8} through the filter, and most of what's left is 
checking file extensions. I can make them not backtrack at all if Commons 
filenames don't contain periods (I don't know what characters are allowed). 
They are generally of the form .*\.()
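
A small Python sketch of the rewrite in question, under the assumption stated 
above that filenames contain no period other than the one before the 
extension. The extension list (svg|png) is a placeholder, not taken from any 
property.

```python
import re

# Typical current form: .* can run past every period and then has to back off.
WITH_DOT_STAR = re.compile(r".*\.(svg|png)")

# Rewritten form: [^.]* stops at the first period, so there is nothing to retry.
WITHOUT_PERIODS = re.compile(r"[^.]*\.(svg|png)")

for name in ["Flag.svg", "a.b.svg"]:
    print(
        name,
        WITH_DOT_STAR.fullmatch(name) is not None,
        WITHOUT_PERIODS.fullmatch(name) is not None,
    )
```

The second name shows the cost of the rewrite: a filename with an extra period 
is accepted by the original form and rejected by the rewritten one, which is 
why the assumption about allowed characters matters.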

Besides files and the 6 I mentioned before, the only ones I see are
https://phabricator.wikimedia.org/P1949 oai:.*:.*
https://phabricator.wikimedia.org/P274  
([αβγδφωλμπ]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[])|,₁₂₃₄₅₆₇₈₉₀]*(·\(?[-0-9.]*n?\)?)?)+
https://phabricator.wikimedia.org/P367  .+ (S|s)ymbol.*\.svg
https://phabricator.wikimedia.org/P395  ([A-Z0-9\p{L}]+(,\w)?)+
https://phabricator.wikimedia.org/P866  
(([a-z]+|39)\-)*([1-9][0-9]*|[a-z]+|[1-9][a-z0-9][a-z0-9])

Everything else either doesn't have this problem (as far as I can tell) or is 
guaranteed to be looking for fewer than 10 characters.



To: Popcorndude
Cc: Popcorndude, Aklapper, daniel, Wikidata-bugs, aude, GWicke, csteipp





[Wikidata-bugs] [Maniphest] [Updated] T105126: Evaluate pattern constraints (safely)

2015-07-08 Thread daniel
daniel added projects: Security-General, Performance.


To: daniel
Cc: Aklapper, daniel, Wikidata-bugs, aude, GWicke, csteipp, Malyacko, P.Copp


