Taras,

    Please read inline,

On Thu, Jan 21, 2010 at 12:34 PM, Taras <naplan...@gmail.com> wrote:
> Hi, all!
>
> Today when I tested w3af on a large site with URLs like
> http://site.com/news.php?id=1, http://site.com/categories.php?id=2, and
> so on, I found a problem. It seems that webSpider doesn't consider such
> URLs to be the "same".

    w0w, it's amazing how we stumble upon the same problems all the
time! If you take a look at my commit[0] from about a week ago (Jan 15),
and specifically at the comment, you'll see that I'm trying to fix
this issue in the code section BEFORE the audit process. This is my
code comment, extracted from w3afCore.py:

    #
    #   What I want to do here is filter the repeated fuzzable requests.
    #   For example, if the spidering process found:
    #       - http://host.tld/?id=3739286
    #       - http://host.tld/?id=3739285
    #       - http://host.tld/?id=3739282
    #       - http://host.tld/?id=3739212
    #
    #   I don't want to keep all these different fuzzable requests. The
    #   reason is that audit plugins will try to send the payload to each
    #   parameter, thus generating the following requests:
    #       - http://host.tld/?id=payload1
    #       - http://host.tld/?id=payload1
    #       - http://host.tld/?id=payload1
    #       - http://host.tld/?id=payload1
    #
    #   w3af has a cache, but it's still a waste of time to send those
    #   requests.
    #
    #   Now let's analyze this with more than one parameter. Spidered URIs:
    #       - http://host.tld/?id=3739286&action=create
    #       - http://host.tld/?id=3739285&action=create
    #       - http://host.tld/?id=3739282&action=remove
    #       - http://host.tld/?id=3739212&action=remove
    #
    #   Generated requests:
    #       - http://host.tld/?id=payload1&action=create
    #       - http://host.tld/?id=3739286&action=payload1
    #       - http://host.tld/?id=payload1&action=create
    #       - http://host.tld/?id=3739285&action=payload1
    #       - http://host.tld/?id=payload1&action=remove
    #       - http://host.tld/?id=3739282&action=payload1
    #       - http://host.tld/?id=payload1&action=remove
    #       - http://host.tld/?id=3739212&action=payload1
    #
    #   In cases like this one, I'm sending these repeated requests:
    #       - http://host.tld/?id=payload1&action=create
    #       - http://host.tld/?id=payload1&action=create
    #       - http://host.tld/?id=payload1&action=remove
    #       - http://host.tld/?id=payload1&action=remove
    #
    #   But there is not much I can do about it... (except for having a
    #   nice cache)
    #
    #   TODO: Is the previous statement completely true?
    #
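
    To make the idea in that comment concrete, here's a minimal,
standalone sketch (NOT the actual w3afCore code; all the names here are
made up) of grouping URLs by their parameter names while ignoring the
parameter values:

    from urllib.parse import urlparse, parse_qs

    def variant_key(url):
        # Two URLs get the same key when they share scheme, host, path
        # and query parameter NAMES; the parameter values are ignored.
        parts = urlparse(url)
        names = tuple(sorted(parse_qs(parts.query, keep_blank_values=True)))
        return (parts.scheme, parts.netloc, parts.path, names)

    def filter_repeated(fuzzable_urls):
        # Keep only the first URL seen for each key.
        seen, unique = set(), []
        for url in fuzzable_urls:
            key = variant_key(url)
            if key not in seen:
                seen.add(key)
                unique.append(url)
        return unique

    spidered = ['http://host.tld/?id=3739286&action=create',
                'http://host.tld/?id=3739285&action=create',
                'http://host.tld/?id=3739282&action=remove']
    print(filter_repeated(spidered))
    # -> ['http://host.tld/?id=3739286&action=create']

Note that this naive key also collapses "action=create" and
"action=remove" into a single request, which is exactly what the TODO
above is asking about.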

The problem that I wanted to fix there is that the discovery process
will navigate ALL URLs, so the audit process gets a lot of "repeated"
URLs. The fix I added was simple and filtered some URLs before they got
to audit, but there is still the problem of the discovery plugins (not
only the webSpider) following ALL URLs. This is something that might
bring problems, or might be a blessing. Here are two examples:

    - The good case:
        http://host.tld/index.php?id=1 links to http://host.tld/index.php?id=3
        http://host.tld/index.php?id=1 links to http://host.tld/index.php?id=5
        http://host.tld/index.php?id=1 links to http://host.tld/index.php?id=7
        http://host.tld/index.php?id=7 links to http://host.tld/run_command.php?cmd=whoami

    In this case, if w3af decides that "id" is not an action parameter
and decides not to follow all the different ids, then we would never
have gotten to id=7, which links to run_command.php, the page with the
vulnerability.


    - The bad case:
        http://host.tld/index.php?id=1 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        http://host.tld/index.php?id=2 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        http://host.tld/index.php?id=3 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        http://host.tld/index.php?id=4 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        http://host.tld/index.php?id=5 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        http://host.tld/index.php?id=6 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        http://host.tld/index.php?id=7 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        http://host.tld/index.php?id=8 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
        ...
        http://host.tld/index.php?id=65265 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )

    In this case, the w3af spider will follow all the links with the
different ids and, because of the application's nature, it won't find
anything interesting by doing so.

    This seems to be a complex issue, but it's actually not that
complex. The problem is that we have to decide whether or not to follow
all the links. We could create a function that decides if a link should
be followed based on the previously seen links and their contents. For
the bad case, I'm thinking of something like this: the webSpider could
follow the first 100 links with 100 different ids, then analyze the
situation, and when it finds the 101st id, simply ignore it.

    The problem that we could find here is that maybe... "?id=3855"
has a link to a part of the application that is vulnerable, and we're
not following that link because of our "101st decision".
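
    In code, the heuristic could look something like this (a rough,
hypothetical sketch; none of these names exist in w3af, and the 100
cut-off is exactly the open question):

    from collections import defaultdict
    from urllib.parse import urlparse, parse_qs

    class VariantCounter:
        # Counts how many URLs with the same "shape" (same host, path
        # and parameter names, different values) were followed so far.
        def __init__(self, max_variants=100):
            self._count = defaultdict(int)
            self._max = max_variants

        def should_follow(self, url):
            parts = urlparse(url)
            key = (parts.netloc, parts.path,
                   tuple(sorted(parse_qs(parts.query,
                                         keep_blank_values=True))))
            self._count[key] += 1
            # Follow the first 100 variants, ignore the 101st onwards.
            return self._count[key] <= self._max

    counter = VariantCounter()
    urls = ('http://host.tld/index.php?id=%d' % i for i in range(1, 65266))
    followed = [u for u in urls if counter.should_follow(u)]
    print(len(followed))  # -> 100

Under this scheme, "?id=3855" only gets followed if it shows up before
the cut-off; that's the cost of the heuristic.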

> And from some point of view that is correct behavior. But very often
> these params are not "action" params that produce really different
> pages with a different structure; the pages simply differ in some text
> content.

    Yes, I totally agree. I think that maybe these URLs should be
compared based on their "structure" (outgoing links). If the pages for
id=1 to id=100 all link to (a, b, c), then we could safely (???) infer
that id=101 to id=N are also going to link to (a, b, c), and that we're
not missing anything important.
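
    That inference could be something like this (again just a sketch;
the exact-set comparison and the sample size are my assumptions, not
anything that exists in w3af today):

    def same_structure(link_sets):
        # True when every observed page produced exactly the same set
        # of outgoing links.
        return len(set(frozenset(s) for s in link_sets)) == 1

    # Outgoing links extracted from the first pages with id=1, id=2:
    observed = [
        {'http://host.tld/search.php?text=',
         'http://host.tld/view_image.php?name='},
        {'http://host.tld/search.php?text=',
         'http://host.tld/view_image.php?name='},
    ]
    if same_structure(observed):
        # Infer (???) that further ids link to the same pages, and
        # stop following new ids.
        print('stop following new ids')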

    The question here is... how safe do we want to be? Should we
follow 20? 100? 1000? 10k? I think that 300 would be a good number,
but I don't really have a way to back up my feeling that 300 would be
ok. What do you guys think?

> I also found a message on our mailing list from Adi Mutu, "[W3af-users]
> mod_rewrite functionality and variations" [0], about the same problem.
> As far as I know, the current workaround is to use the
> maxDiscoveryLoops option.

    Yes... but that's not a real solution. The solution to this problem
would be to fix the REAL problem.

> But what is the best way to scan such sites with w3af? Maybe we should
> add some option to webSpider, or do we need some core option?

    I think that this should be an option in the core. In the discovery
function in w3afCore, we should perform some strict analysis of what
the plugins are returning, and then ignore some of the found links if
necessary.
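
    Sketching only the shape of that hook (hypothetical names; this is
not the actual w3afCore discovery loop, and should_follow() would be
one of the heuristics discussed above):

    def discovery_loop(seed_urls, plugins, should_follow):
        # Core-level filtering: every URL returned by any discovery
        # plugin goes through should_follow() before being re-queued,
        # no matter which plugin found it.
        to_walk = list(seed_urls)
        seen = set(to_walk)
        while to_walk:
            url = to_walk.pop()
            for plugin in plugins:
                for new_url in plugin.discover(url):
                    if new_url not in seen and should_follow(new_url):
                        seen.add(new_url)
                        to_walk.append(new_url)
        return seen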

    Anyone feel like trying to fix this issue?

@Taras: Thank you for one of the most interesting emails to this
mailing list :) You really made me work on this one :)

> [0] 
> http://sourceforge.net/mailarchive/message.php?msg_id=140339.90793.qm%40web43507.mail.sp1.yahoo.com

[0] 
http://w3af.svn.sourceforge.net/viewvc/w3af/trunk/core/controllers/w3afCore.py?r1=3243&r2=3303

> Taras
> --
> "Software is like sex: it's better when it's free.", - Linus Torvalds.



-- 
Andrés Riancho
Founder, Bonsai - Information Security
http://www.bonsai-sec.com/
http://w3af.sf.net/

