Taras,

On Wed, Apr 11, 2012 at 12:11 PM, Andres Riancho <andres.rian...@gmail.com> wrote:
> On Wed, Apr 11, 2012 at 4:56 AM, Taras <ox...@oxdef.info> wrote:
>> Andres,
>>
>>>>> If the framework IS working like this, I think that the shared
>>>>> fuzzable request list wouldn't do much good. If it is not working like
>>>>> this (and I would love to get an output log to show it), it seems that
>>>>> we have a lot of work ahead of us.
>>>>
>>>> And w3afCore needs to filter requests from discovery plugins on every
>>>> loop in _discover_and_bruteforce(), am I right?
>>>
>>> It should filter things as they come out of the plugin and before
>>> adding them to the fuzzable request list,
>>
>> Agree, but as I see in w3afCore.py there is no filtering in it.
>> I just added it [0]. It shows good results on the test suite (see
>> attachment).
>>
>> Without filtering:
>> Found 2 URLs and 87 different points of injection.
>> ...
>> Scan finished in 3 minutes 30 seconds.
>>
>> With filtering:
>> Found 2 URLs and 3 different points of injection.
>> ...
>> Scan finished in 11 seconds.
>
> Reviewing this and reproducing in my environment. Will have some
> opinions in ~1h
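The filtering Taras describes in his changeset [0] can be sketched roughly like this (a hypothetical illustration, not the actual w3afCore code): each fuzzable request coming out of a discovery plugin is reduced to a key built from its method, path, and parameter *names*, so two URLs that differ only in parameter values collide and the duplicate is dropped before it reaches the shared list.

```python
# Hypothetical sketch of request deduplication, NOT the real w3af code.
# Two URLs that differ only in parameter values map to the same key.
from urllib.parse import urlparse, parse_qsl

def request_key(method, url):
    parsed = urlparse(url)
    param_names = tuple(sorted(name for name, _ in parse_qsl(parsed.query)))
    return (method.upper(), parsed.netloc, parsed.path, param_names)

def filter_new_requests(seen, candidates):
    """Yield only requests whose key has not been seen before."""
    for method, url in candidates:
        key = request_key(method, url)
        if key not in seen:
            seen.add(key)
            yield method, url

seen = set()
urls = [
    ("GET", "http://moth/article.php?id=25"),
    ("GET", "http://moth/article.php?id=35"),          # variant: same key as id=25
    ("GET", "http://moth/article.php?id=28&foo=bar"),  # new parameter set, kept
]
unique = list(filter_new_requests(seen, urls))
# id=35 is flattened away; id=28&foo=bar survives because its
# parameter names differ
```

This explains the drop from 87 to 3 injection points: every "variant" of a URL collapses into a single representative.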
All right... now I see your concern and understand it. I ran the scan you proposed and was able to reproduce the issue, which is actually generated by a simple constant in webSpider.py:

    MAX_VARIANTS = 40

Let me explain what is going on here and what your patch is doing:

#1 In the current trunk version, w3af's webSpider parses the index.php file you sent and identifies many links, most of them variants of each other. Before returning them to the w3afCore, the webSpider uses the variant_db class and MAX_VARIANTS to decide whether enough variants of that link have already been analyzed. If there are not enough, then the variant needs to be analyzed, so it is returned to the core. Given that MAX_VARIANTS is 40 [Note: I changed this to 5 in the latest commit.], the webSpider returns all/most of the links in your index.php to the core.

a) This makes sense, since a link to a previously unknown section might be present in "article.php?id=25" and NOT present in "article.php?id=35", so w3af needs to make a choice about how many of those variants are going to be analyzed and how many are going to be left out.

b) The same happens with vulnerabilities: there might be a vulnerability in the foo parameter of "article.php?id=28&foo=bar" when id=28, and the vulnerability might NOT be present when the id is 32.

#2 With your patch, which filters all variants and "flattens" the previously found ones, w3afCore only ends up with "article.php?id=number" and "article.php?id=number&foo=string", which won't allow other discovery plugins to analyze the variants (#1-a) or audit plugins to identify the more complex vulnerabilities (#1-b). What will happen (of course) is that the scanner will be VERY fast.

But let's try to understand what happens with the audit plugins when they are presented with multiple variants. According to #1-b, they should send multiple requests, and those should generate a lot of network traffic, slowing the scan down.
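The MAX_VARIANTS behavior described above could be sketched like this (class and method names are illustrative, not w3af's actual variant_db API): instead of keeping a single representative per URL "shape", up to MAX_VARIANTS concrete instances are allowed through, so audit plugins still see several different parameter values.

```python
# Hedged sketch of the variant_db / MAX_VARIANTS idea; names are
# illustrative, not w3af's real API.
from collections import Counter
from urllib.parse import urlparse, parse_qsl

MAX_VARIANTS = 5  # the value the constant was lowered to in the latest commit

def template_of(url):
    """Reduce a URL to its 'shape': host, path and sorted parameter names."""
    parsed = urlparse(url)
    names = tuple(sorted(n for n, _ in parse_qsl(parsed.query)))
    return (parsed.netloc, parsed.path, names)

class VariantDB:
    def __init__(self, max_variants=MAX_VARIANTS):
        self._counts = Counter()
        self._max = max_variants

    def need_more_variants(self, url):
        """True while fewer than max_variants of this shape have been seen."""
        tmpl = template_of(url)
        if self._counts[tmpl] >= self._max:
            return False
        self._counts[tmpl] += 1
        return True

db = VariantDB(max_variants=5)
accepted = [u
            for u in ("http://moth/article.php?id=%d" % i for i in range(40))
            if db.need_more_variants(u)]
# only the first 5 of the 40 variants are returned to the core;
# the remaining 35 are considered "enough analyzed" and dropped
```

With the old value of 40, all 40 variants in this example would have been accepted, which is exactly the slowdown reproduced in the scan.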
Here is a grep of a scan with the audit.sqli plugin enabled:

dz0@dz0-laptop:~/workspace/w3af$ grep "d'z\"0" output-w3af.txt
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=145&foo=d'z"0 returned HTTP code "200" - id: 93
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar returned HTTP code "200" - id: 94
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0 returned HTTP code "200" - id: 96
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0 returned HTTP code "200" - id: 98 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0 returned HTTP code "200" - id: 100 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0 returned HTTP code "200" - id: 102 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0 returned HTTP code "200" - id: 104 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar returned HTTP code "200" - id: 106 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=122&foo=d'z"0 returned HTTP code "200" - id: 107
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar returned HTTP code "200" - id: 109 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=119&foo=d'z"0 returned HTTP code "200" - id: 110
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar returned HTTP code "200" - id: 112 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=82&foo=d'z"0 returned HTTP code "200" - id: 113
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar returned HTTP code "200" - id: 115 - from cache.
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=75&foo=d'z"0 returned HTTP code "200" - id: 116

The most important thing to notice here is the repeated HTTP requests to the variants and the "from cache" strings at the end of the repeated requests. For example:

GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar returned HTTP code "200" - id: 93
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=d'z"0&foo=bar returned HTTP code "200" - id: 95 - from cache.

And then, following the logic from #1-b, we actually send these two requests to the remote web application:

GET http://moth/w3af/discovery/web_spider/variants/article.php?id=215&foo=d'z"0 returned HTTP code "200" - id: 96
GET http://moth/w3af/discovery/web_spider/variants/article.php?id=29&foo=d'z"0 returned HTTP code "200" - id: 105

I'm not saying that this is all perfect. The downsides of this scan strategy are:

* Slow:
  - Because more HTTP requests are sent
  - Because more pattern matching is applied to more HTTP responses
  - Because (maybe) the responses that are retrieved from the cache are slow to get
  - Because "MAX_VARIANTS = 40" was too high

But of course this also has good things like #1-a and #1-b, which provide the scanner with better code coverage at the end.

Maybe we could have different scan strategies, or change MAX_VARIANTS to be a user-defined parameter, or... (please send your ideas).

Regards,

>>
>>> Please let me know if the discovery process is NOT working as we
>>> expect and if we have to filter stuff somewhere
>>
>> See above.
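The "from cache" entries in the log come from the framework's HTTP response cache: repeated identical requests are answered locally instead of hitting the remote web application. A minimal sketch of such a cache, keyed by method and URL (hypothetical; w3af's real cache lives in its HTTP client layer and this is not its API):

```python
# Minimal, hypothetical sketch of a response cache producing
# "from cache" behavior like the log entries above.
class CachingClient:
    def __init__(self, send_func):
        self._send = send_func   # callable performing the real HTTP request
        self._cache = {}
        self.network_requests = 0

    def get(self, url):
        """Return (response, from_cache) for a GET request."""
        key = ("GET", url)
        if key in self._cache:
            return self._cache[key], True
        self.network_requests += 1
        response = self._send(url)
        self._cache[key] = response
        return response, False

# Stand-in for a real HTTP call, so the sketch is self-contained.
client = CachingClient(lambda url: '200 OK for %s' % url)
_, cached1 = client.get("http://moth/article.php?id=d'z\"0")
_, cached2 = client.get("http://moth/article.php?id=d'z\"0")
# the second identical request is served from cache; only one
# network request was actually sent
```

This is why repeated variant requests are cheap in network terms, although (as noted above) even cache hits still cost pattern-matching time on the responses.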
>>
>> [0] https://sourceforge.net/apps/trac/w3af/changeset/4861
>> --
>> Taras
>> http://oxdef.info
>
> --
> Andrés Riancho
> Project Leader at w3af - http://w3af.org/
> Web Application Attack and Audit Framework

--
Andrés Riancho
Project Leader at w3af - http://w3af.org/
Web Application Attack and Audit Framework

_______________________________________________
W3af-develop mailing list
W3af-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/w3af-develop