Hi all,
I followed the discussion about the/some regex and I'm a bit surprised
that there seems to be not that much real practice with regex.
This is independent of w3af, so I've marked everything below as off-topic.
I also put some of the comments together here instead of replying to
each one, as in my opinion all comments should be read in context.
Please be tolerant of my pedantic comments below. It's just about regex,
nothing else ;-)
Achim
<off-topic>
-----
!! Are these equivalent?
!!
!! <\W*script[^>]*>
!! <\W*script.*>
!!
!! Yes it does make sense because, using <\W*script.*> you get a greedy regular
!! expression. I haven't checked the source, but if the response parsing function
!! parses more than one line (or the html text is one liner) things could get
!! really buggy. <\W*script.*> will make a similar [0] match, while
!! <\W*script[^>]*> will make a proper [1] match...
Besides the newline behaviour, some other modifiers may also make a
difference. I'm not sure which regex lib is used by python, but if
it is a (so-called) PCRE lib, you also need to know some compile-time
flags (mainly those that change the behaviour of . and $).
Regex are no fun here, unfortunately :-(
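
To make the greedy vs. negated-class point and the flag dependency concrete, here is a minimal sketch. It assumes Python's standard `re` module is the flavour in question (which, as far as I know, is Python's own engine rather than a PCRE build); the sample strings are made up for illustration.

```python
import re

# Made-up one-line HTML: more '>' characters follow the opening script tag.
html = '<script type="text/javascript">alert(1)</script><b>x</b>'

greedy = re.search(r'<\W*script.*>', html)
negated = re.search(r'<\W*script[^>]*>', html)

greedy_match = greedy.group(0)    # greedy .* runs on to the last '>' of the line
negated_match = negated.group(0)  # [^>]* cannot step over the first '>'

# '$' matches at the end of every line only when re.MULTILINE is set.
text = "first line\nsecond line"
dollar_default = re.findall(r'line$', text)
dollar_multiline = re.findall(r'line$', text, re.MULTILINE)

if __name__ == "__main__":
    print(repr(greedy_match))
    print(repr(negated_match))
    print(dollar_default, dollar_multiline)
```

Other flavours may use different defaults, so results like these should be re-checked against the engine actually in use.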
-----
!! > don't fully understand the meaning of this section: "[^>]*",
!! > could you explain it?
!!
!! Sure, it means "zero or more of any character except for >".
No, not always; see the conditions above and below.
"Any character" may mean any character except a newline or any character
including a newline, which makes a big difference.
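
A small sketch of the newline question, assuming Python's standard `re` module (other flavours may behave differently). The tag with an embedded newline is a made-up test string.

```python
import re

# Made-up tag that is split across two lines.
tag = '<script\ntype="text/javascript">'

# By default '.' does not match a newline in Python's re module.
dot_default = re.search(r'<\W*script.*?>', tag)

# With re.DOTALL, '.' matches a newline as well.
dot_dotall = re.search(r'<\W*script.*?>', tag, re.DOTALL)

# A negated character class such as [^>] matches a newline here
# without any flag, because a newline is simply "not >".
negated = re.search(r'<\W*script[^>]*>', tag)

if __name__ == "__main__":
    print(dot_default)
    print(repr(dot_dotall.group(0)))
    print(repr(negated.group(0)))
```

So in this flavour the two patterns differ on multi-line input unless the dot pattern is compiled with `re.DOTALL`.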
-----
!! I don't know if there is a difference in performance.
!! --------------
!!
!! I did the test in performance... for the following string:
!! <script type="text/javascript">
!! The results are self explanatory:
!!
!! <\W*script[^>]*> matched it in 29 steps
!! <\W*script.*?> matched it in 73 steps
Such performance tests should be taken with care.
There are so many preconditions and things you need to know to qualify
such a test that simply posting 2 numbers is rather useless.
Just 2 questions out of many others:
did you use an NFA or a DFA regex engine?
did you use a pre-compiled regex?
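
For what it's worth, a timing test that states its preconditions could look roughly like the sketch below. It assumes Python's standard `re` module (a backtracking engine, as far as I know) and pre-compiled patterns; the helper name `bench` and the repeat counts are my own choices, and the numbers it prints are only valid for this engine, this input and this machine.

```python
import re
import timeit

# The test string from the quoted message.
sample = '<script type="text/javascript">'

# Pre-compile both patterns so compilation cost is not part of the timing.
negated = re.compile(r'<\W*script[^>]*>')
lazy = re.compile(r'<\W*script.*?>')

# Sanity check: both patterns must match the same text before timing them.
assert negated.search(sample).group(0) == lazy.search(sample).group(0)


def bench(pattern, text, number=100_000):
    """Return the best total time (seconds) of 5 runs of `number` searches."""
    return min(timeit.repeat(lambda: pattern.search(text),
                             number=number, repeat=5))


if __name__ == "__main__":
    print("engine: Python re, patterns pre-compiled")
    print("negated class:", bench(negated, sample))
    print("lazy dot     :", bench(lazy, sample))
```

Taking the minimum of several repeats reduces noise from other processes; it still says nothing about other engines or longer inputs.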
-----
!! http://blog.stevenlevithan.com/archives/greedy-lazy-performance
hmm, surprise :-/
Although Steve's site is a good resource for regex problems in general,
this description is not true in all cases, as it also depends on the type
of the regex engine (NFA vs. DFA).
-----
If someone is interested in getting better at regex, I highly recommend
reading Jeffrey Friedl's Mastering Regular Expressions (1st *and* 3rd edition).
In particular, it contains very interesting examples (including detailed
descriptions) about performance and things like .* vs. [^>]* etc.
In general, very complex but unambiguous/non-greedy regex are most often much
faster than a "lazy", human-understandable regex. A very good practical example
is ModSecurity's core rule sets.
Anyway, I guess speed is not as important for w3af as it is for a WAF :)
Also (even though I've not yet seen anyone doing it here): do not use other
tools to test/improve your regex unless you know 101% that these tools
are designed to support your regex flavour (most likely python for w3af).
There are some good tools out in the wild, but you need to know the
dragons of your regex flavour and the dragons of the tool. Take care.
<shameless ad>
If someone just wants a simple "human" description of a regex, have
a look at http://ende.my-stp.net/EnDe.html?onlyRX
(which does not support python 100%, but also does not claim to do so :)
All comments welcome ..
</shameless ad>
</off-topic>
------------------------------------------------------------------------------
_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop