Taras,

On Wed, Apr 11, 2012 at 12:11 PM, Andres Riancho
<andres.rian...@gmail.com> wrote:
> On Wed, Apr 11, 2012 at 4:56 AM, Taras <ox...@oxdef.info> wrote:
>> Andres,
>>
>>
>>>>>     If the framework IS working like this, I think that the shared
>>>>> fuzzable request list wouldn't do much good. If it is not working like
>>>>> this (and I would love to get an output log to show it), it seems that
>>>>> we have a lot of work ahead of us.
>>>>
>>>>
>>>> And w3afCore need to filter requests from discovery plugins on every loop
>>>> in
>>>> _discover_and_bruteforce(), am I right?
>>>
>>>
>>> It should filter things as they come out of the plugin and before
>>> adding them to the fuzzable request list,
>>
>> Agree, but as I see in w3afCore.py there is no filtering in it.
>> I just have added it [0]. It shows good results on the test suite (see
>> attachment).
>>
>> Without filtering:
>>  Found 2 URLs and 87 different points of injection.
>>  ...
>>  Scan finished in 3 minutes 30 seconds.
>>
>> With filtering:
>>  Found 2 URLs and 3 different points of injection.
>>  ...
>>  Scan finished in 11 seconds.
>
> Reviewing this and reproducing in my environment. Will have some opinions in 
> ~1h

All right... now I see your concern and understand it. I ran the scan
you proposed and was able to reproduce the issue, which actually comes
down to a single constant:

    webSpider.py:
    MAX_VARIANTS = 40

Let me explain what is going on here and what your patch is doing:
    #1 In the current trunk version, w3af's webSpider parses the
index.php file you sent and identifies many links, most of them
variants of each other. Before returning them to the w3afCore, the
webSpider uses the variant_db class and MAX_VARIANTS to decide
whether enough variants of that link have already been analyzed. If
not, the variant still needs to be analyzed, so it is returned to the
core (a sketch of this logic follows after #2). Given that
MAX_VARIANTS is 40 [Note: I changed this to 5 in the latest commit.],
the webSpider returns all/most of the links in your index.php to the
core.

    a) This makes sense, since a link to a previously unknown section
might be present in "article.php?id=25" and NOT present in
"article.php?id=35", so w3af needs to choose how many of those
variants will be analyzed and how many will be left out.

    b) The same happens with vulnerabilities: there might be a
vulnerability in the foo parameter of "article.php?id=28&foo=bar"
when id is 28 that is NOT present when id is 32.

    #2 With your patch, which filters all variants and "flattens" the
previously found ones, w3afCore only ends up with
"article.php?id=number" and "article.php?id=number&foo=string", which
won't allow other discovery plugins to analyze the variants (#1-a) or
the audit plugins to identify the more complex vulnerabilities
(#1-b). What will happen (of course) is that the scanner will be VERY
fast.
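
To make #1 concrete, here is a minimal, self-contained sketch of the
variant-limiting idea. The real variant_db class inside w3af differs
in its details; the VariantDB class and need_more_variants() helper
below are only illustrative:

    from collections import defaultdict
    from urlparse import urlparse, parse_qsl  # Python 2, like w3af

    MAX_VARIANTS = 5  # was 40 in trunk until the latest commit

    class VariantDB(object):
        def __init__(self, max_variants=MAX_VARIANTS):
            self._seen = defaultdict(int)
            self._max = max_variants

        def need_more_variants(self, url):
            # Two URLs are variants of each other when they share the
            # host, the path and the parameter NAMES, and only differ
            # in the parameter VALUES.
            parsed = urlparse(url)
            names = tuple(sorted(k for k, _ in parse_qsl(parsed.query)))
            key = (parsed.netloc, parsed.path, names)
            if self._seen[key] >= self._max:
                return False
            self._seen[key] += 1
            return True

The webSpider asks this question for every extracted link and only
returns the link to the core when the answer is True; your patch is
roughly equivalent to running the same logic with max_variants=1.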

But let's try to understand what happens with the audit plugins when
they are presented with multiple variants. According to #1-b they
should send multiple requests, and those should generate a lot of
network traffic, slowing the scan down. Here is a grep of a scan with
the audit.sqli plugin enabled:

dz0@dz0-laptop:~/workspace/w3af$ grep "d'z\"0" output-w3af.txt
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=145&foo=d'z"0
returned HTTP code "200" - id: 93
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
returned HTTP code "200" - id: 94
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
returned HTTP code "200" - id: 96
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
returned HTTP code "200" - id: 98 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
returned HTTP code "200" - id: 100 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
returned HTTP code "200" - id: 102 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0
returned HTTP code "200" - id: 104 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
returned HTTP code "200" - id: 106 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=122&foo=d'z"0
returned HTTP code "200" - id: 107
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
returned HTTP code "200" - id: 109 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=119&foo=d'z"0
returned HTTP code "200" - id: 110
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
returned HTTP code "200" - id: 112 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=82&foo=d'z"0
returned HTTP code "200" - id: 113
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
returned HTTP code "200" - id: 115 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=75&foo=d'z"0
returned HTTP code "200" - id: 116
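
The repeated pattern in that log follows directly from how an audit
plugin mutates each fuzzable request: one mutated request per
injectable parameter. A hypothetical sketch (mutants() is not the
real w3af API) shows why every variant appears once with id=d'z"0 and
once with foo=d'z"0:

    PAYLOAD = 'd\'z"0'

    def mutants(base_url, params):
        # params is an ordered list of (name, original_value) tuples
        for i in range(len(params)):
            mutated = list(params)
            mutated[i] = (params[i][0], PAYLOAD)
            query = '&'.join('%s=%s' % (k, v) for k, v in mutated)
            yield '%s?%s' % (base_url, query)

    # mutants('article.php', [('id', '28'), ('foo', 'bar')]) yields:
    #   article.php?id=d'z"0&foo=bar
    #   article.php?id=28&foo=d'z"0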

The most important things to notice here are the repeated HTTP
requests to the variants and the "from cache" strings at the end of
the repeated requests. For example:

GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
returned HTTP code "200" - id: 93
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar
returned HTTP code "200" - id: 95 - from cache.
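
Those "from cache" responses are cheap: the HTTP client keeps a
response cache keyed by the request, so an identical GET never hits
the network twice. w3af does this inside its own HTTP layer; the
snippet below is only a bare-bones illustration of the idea, not the
real implementation:

    import urllib2

    class CachingOpener(object):
        def __init__(self):
            self._cache = {}  # (method, url) -> response body

        def GET(self, url):
            key = ('GET', url)
            if key in self._cache:
                return self._cache[key], True   # True == "from cache"
            body = urllib2.urlopen(url).read()
            self._cache[key] = body
            return body, False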

And then, following the logic from #1-b, we're actually sending these
two requests to the remote web application:

GET http://moth/w3af/discovery/web_spider/variants/article.php?id=215&foo=d'z"0
returned HTTP code "200" - id: 96
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=29&foo=d'z"0
returned HTTP code "200" - id: 105

I'm not saying that this is all perfect. The downsides of this scan
strategy are:
    * Slow
        - Because more HTTP requests are sent
        - Because more pattern matching is applied to more HTTP responses
        - Because (maybe) even fetching the responses from the cache
takes some time
        - Because "MAX_VARIANTS = 40" was too high

But of course this strategy has upsides, like #1-a and #1-b, which
provide the scanner with better code coverage in the end.

Maybe we could have different scan strategies, or make MAX_VARIANTS a
user-defined parameter, or... (please send your ideas).-
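
For the user-defined parameter idea, the change would be small. If I
remember the plugin option machinery correctly, it would look roughly
like the snippet below inside webSpider (treat the exact option API
as an approximation, not as tested code):

    from core.data.options.option import option
    from core.data.options.optionList import optionList

    def getOptions(self):
        ol = optionList()
        d = 'Maximum number of variants of the same URL to analyze'
        o = option('maxVariants', self._max_variants, d, 'integer')
        ol.add(o)
        return ol

    def setOptions(self, options_map):
        self._max_variants = options_map['maxVariants'].getValue()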

Regards,
>>
>>
>>> Please let me know if the discovery process is NOT working as we
>>> expect and if we have to filter stuff somewhere
>>
>> See above.
>>
>> [0] https://sourceforge.net/apps/trac/w3af/changeset/4861
>> --
>> Taras
>> http://oxdef.info
>
>
>
> --
> Andrés Riancho
> Project Leader at w3af - http://w3af.org/
> Web Application Attack and Audit Framework



-- 
Andrés Riancho
Project Leader at w3af - http://w3af.org/
Web Application Attack and Audit Framework
