Martin,

On Wed, Apr 14, 2010 at 4:25 AM, Martin Holst Swende <mar...@swende.se> wrote:
> Hi,
>
> I noticed that some greppers took an extremely long time to run for certain
> input; two of them in particular almost appeared to hang when I ran them.
> Those were ajax and svnusers. In ajax.py, the following regexp is used:
>
>        regex_string = '< *?script.*?>.*?'
>        regex_string += '(XMLHttpRequest|eval\(\)|ActiveXObject\("Msxml2.XMLHTTP"\)|'
>        regex_string += 'ActiveXObject\("Microsoft.XMLHTTP"\))'
>        regex_string += '.*?</ *?script *?>'
>
> This is a very 'loose' regexp with a lot of wildcards, so for certain
> pages it basically causes a ReDoS. I suggest changing this
> to just checking for the calls. Also, it looks like the construct
> checking eval matches only the literal "eval()", not "eval(foo)".
> Something like this should work, if we want to catch any use of eval:

    Yes, I've been experiencing the same issue in my tests. One of the
things I'm afraid of is that we'll never be able to make all the
regexes run fast. Python's regular expression engine is not as fast
as most people believe. w3af has one big problem right now: we don't
use the DOM in grep plugins, simply because it's not there; the core
does not provide a DOM for each HTTP response. If we added that core
feature, many plugins could stop using those regular expressions,
which in some cases are *slow*, and the framework's performance would
improve.
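
    To illustrate the idea, here is a minimal sketch of how a grep
plugin might inspect script bodies through a parser instead of one big
regex. It uses html.parser from the Python 3 standard library. The
names ScriptCollector, find_ajax_calls and AJAX_MARKERS are
hypothetical and are not part of w3af; the marker list is only loosely
based on the strings ajax.py searches for.

```python
from html.parser import HTMLParser

# Hypothetical marker list, loosely based on the strings ajax.py looks for.
AJAX_MARKERS = ("XMLHttpRequest", "eval(", "ActiveXObject(")


class ScriptCollector(HTMLParser):
    """Collect the text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the
        # body that was opened by the last <script> tag.
        if self._in_script:
            self.scripts[-1] += data


def find_ajax_calls(html):
    """Return the markers that appear inside any <script> body."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [
        marker
        for marker in AJAX_MARKERS
        if any(marker in body for body in collector.scripts)
    ]
```

    Plain substring checks on the collected script bodies avoid
backtracking altogether, at the cost of parsing each response once.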

    Still, regular expressions are going to be needed, so they need
some kind of enhancement as well. Something I have started to research
is the Aho-Corasick algorithm [0] [1], which could improve the speed a
lot (I hope!).

[0] http://sourceforge.net/apps/trac/w3af/ticket/160053
[1] http://pypi.python.org/pypi/acora/1.4
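
    To give an idea of what multi-keyword matching looks like, here is
a minimal pure-Python sketch of the Aho-Corasick approach: all keywords
are compiled into one automaton and the text is scanned a single time.
This is neither the acora API nor w3af code; build_automaton and
find_keywords are hypothetical names, and a C-backed implementation
would be needed for any real speed gain.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and outputs for the keywords."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # fail[state] -> state to fall back to on a mismatch
    output = [[]]    # output[state] -> keywords that end at this state

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        state = 0
        for char in word:
            if char not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        output[state].append(word)

    # Phase 2: breadth-first pass to compute the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for char, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and char not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(char, 0)
            # Inherit keywords that end at the fallback state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_keywords(text, automaton):
    """Scan text once and return (end_index, keyword) for every match."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for index, char in enumerate(text):
        while state and char not in goto[state]:
            state = fail[state]
        state = goto[state].get(char, 0)
        for word in output[state]:
            matches.append((index, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_keywords("ushers", automaton))
# [(3, 'she'), (3, 'he'), (5, 'hers')]
```

    The scan cost depends on the length of the text and the number of
matches, not on how many keywords were loaded, which is what makes the
approach attractive for a set of fixed strings.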

>        regex_string = '(XMLHttpRequest|eval\(|ActiveXObject\("Msxml2.XMLHTTP"\)|'
>        regex_string += 'ActiveXObject\("Microsoft.XMLHTTP"\))'
>
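
    A quick sketch to check the eval point: the alternative "eval\(\)"
only matches the literal text "eval()", while "eval\(" also matches
calls with arguments. The patterns are copied from the snippets above
(without the <script> wrappers); OLD_CALLS, NEW_CALLS and the sample
string are illustrative names only.

```python
import re

# Alternation from the current ajax.py, without the <script> wrappers.
OLD_CALLS = re.compile(
    r'(XMLHttpRequest|eval\(\)|ActiveXObject\("Msxml2.XMLHTTP"\)|'
    r'ActiveXObject\("Microsoft.XMLHTTP"\))'
)

# Proposed alternation: "eval\(" matches any eval call.
NEW_CALLS = re.compile(
    r'(XMLHttpRequest|eval\(|ActiveXObject\("Msxml2.XMLHTTP"\)|'
    r'ActiveXObject\("Microsoft.XMLHTTP"\))'
)

sample = '<script>var data = eval(response);</script>'

print(OLD_CALLS.search(sample))           # None
print(NEW_CALLS.search(sample).group(1))  # eval(
```
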
> svnusers.py contains the following
>    regex = '\$.*?: .*? .*? \d{4}[-/]\d{1,2}[-/]\d{1,2}'
>    regex += ' \d{1,2}:\d{1,2}:\d{1,2}.*? (.*?) (Exp )?\$'
>
> This can be enhanced by replacing wildcards with stricter matches and
> removing the optional "(Exp )?" at the end.
> However, it seems to me that the following regexp would work and be much
> quicker:
>        regex  = "date:.*author:\W(\w+);"

    I will try these new regular expressions and see how they work out.
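
    For the svnusers case, here is one hedged way to tighten the
wildcards: each ".*?" field between spaces becomes "\S+", so a field can
never stretch across a space. This is only a sketch, not the regex
used in svnusers.py; TIGHT_ID_RE and the two sample keyword strings
are assumptions, and the optional "Exp " is kept (non-capturing) so
that CVS-style keywords still match.

```python
import re

# Hypothetical tightened pattern: \S+ replaces the ".*?" fields, so a field
# can never span a space. Group 1 is the user name.
TIGHT_ID_RE = re.compile(
    r'\$\w+: \S+ \S+ \d{4}[-/]\d{1,2}[-/]\d{1,2}'
    r' \d{1,2}:\d{1,2}:\d{1,2}\S* (\S+) (?:Exp )?\$'
)

# Assumed sample inputs in SVN and CVS keyword style.
svn_keyword = '$Id: calc.c 148 2006-07-28 21:30:43Z sally $'
cvs_keyword = '$Id: main.c,v 1.5 2004/03/02 10:11:12 bob Exp $'

for line in (svn_keyword, cvs_keyword):
    match = TIGHT_ID_RE.search(line)
    print(match.group(1) if match else None)
# sally
# bob
```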

> Additionally, both of them contain the construction ".*?", which is
> strange. Unless I am missing something special about Python regexps,
> this should be ".*", as * means zero or more times, and ? is optional,
> which is one or zero times.

    As Tom said, a "?" placed right after "*" makes the match non-greedy.
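
    A small example of the difference, using a made-up HTML string:

```python
import re

html = '<b>one</b> and <b>two</b>'

# Greedy: ".*" grabs as much as possible, up to the last </b>.
greedy = re.search(r'<b>.*</b>', html).group(0)

# Non-greedy: ".*?" stops at the first </b> that lets the match succeed.
lazy = re.search(r'<b>.*?</b>', html).group(0)

print(greedy)  # <b>one</b> and <b>two</b>
print(lazy)    # <b>one</b>
```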

> Regards,
> Martin Holst Swende
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> W3af-develop mailing list
> W3af-develop@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/w3af-develop
>



-- 
Andrés Riancho
Founder, Bonsai - Information Security
http://www.bonsai-sec.com/
http://w3af.sf.net/
