Taras, please read inline,
On Thu, Jan 21, 2010 at 12:34 PM, Taras <naplan...@gmail.com> wrote:
> Hi, all!
>
> Today when I tested w3af on a large site with URLs like
> http://site.com/news.php?id=1, http://site.com/categories.php?id=2 and
> so on, I found a problem. It seems that webSpider doesn't consider such
> URLs to be the "same".

Wow, it's amazing how we stumble upon the same problems all the time! If you
take a look at my commit[0] from exactly one week ago (Jan 15), and
specifically at the comment, you'll see that I'm trying to fix this issue in
the code section BEFORE the audit process. This is my code comment, extracted
from w3afCore.py:

#
#   What I want to do here is filter the repeated fuzzable requests.
#   For example, if the spidering process found:
#       - http://host.tld/?id=3739286
#       - http://host.tld/?id=3739285
#       - http://host.tld/?id=3739282
#       - http://host.tld/?id=3739212
#
#   I don't want to have all these different fuzzable requests. The reason
#   is that audit plugins will try to send the payload to each parameter,
#   thus generating the following requests:
#       - http://host.tld/?id=payload1
#       - http://host.tld/?id=payload1
#       - http://host.tld/?id=payload1
#       - http://host.tld/?id=payload1
#
#   w3af has a cache, but it's still a waste of time to send those requests.
#
#   Now let's analyze this with more than one parameter.
#   Spidered URIs:
#       - http://host.tld/?id=3739286&action=create
#       - http://host.tld/?id=3739285&action=create
#       - http://host.tld/?id=3739282&action=remove
#       - http://host.tld/?id=3739212&action=remove
#
#   Generated requests:
#       - http://host.tld/?id=payload1&action=create
#       - http://host.tld/?id=3739286&action=payload1
#       - http://host.tld/?id=payload1&action=create
#       - http://host.tld/?id=3739285&action=payload1
#       - http://host.tld/?id=payload1&action=remove
#       - http://host.tld/?id=3739282&action=payload1
#       - http://host.tld/?id=payload1&action=remove
#       - http://host.tld/?id=3739212&action=payload1
#
#   In cases like this one, I'm sending these repeated requests:
#       - http://host.tld/?id=payload1&action=create
#       - http://host.tld/?id=payload1&action=create
#       - http://host.tld/?id=payload1&action=remove
#       - http://host.tld/?id=payload1&action=remove
#   But there is not much I can do about it... (except for having a nice
#   cache)
#
#   TODO: Is the previous statement completely true?
#

The problem I wanted to fix there is that the discovery process navigates ALL
URLs, so the audit process gets a lot of "repeated" URLs. The fix I added was
simple and filters some URLs before they get to audit, but there is still the
problem of the discovery plugins (not only webSpider) following ALL URLs.
This is something that might bring problems, or might be a blessing. Here are
two examples:

- The good case:

http://host.tld/index.php?id=1 links to http://host.tld/index.php?id=3
http://host.tld/index.php?id=1 links to http://host.tld/index.php?id=5
http://host.tld/index.php?id=1 links to http://host.tld/index.php?id=7
http://host.tld/index.php?id=7 links to http://host.tld/run_command.php?cmd=whoami

In this case, if w3af decides that "id" is not an action parameter and
decides not to follow all the different ids, then we wouldn't have gotten to
id=7, which links to run_command.php, the one page that has the
vulnerability.
- The bad case:

http://host.tld/index.php?id=1 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
http://host.tld/index.php?id=2 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
http://host.tld/index.php?id=3 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
http://host.tld/index.php?id=4 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
http://host.tld/index.php?id=5 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
http://host.tld/index.php?id=6 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
http://host.tld/index.php?id=7 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
http://host.tld/index.php?id=8 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )
...
http://host.tld/index.php?id=65265 links to ( http://host.tld/search.php?text= , http://host.tld/view_image.php?name= )

In this case, the w3af spider will follow all the links with the different
ids, and because of the application's nature it won't find anything
interesting by doing so.

This seems to be a complex issue, but it's actually not that complex: we have
to decide whether to follow all the links or not. We could create a function
that decides if a link should be followed based on the previously seen links
and their contents.

I'm thinking about how to fix the bad case: the webSpider could follow the
first 100 links with 100 different ids, then perform an analysis of the
situation, and when it finds the 101st id, just ignore it. The problem we
could run into here is that maybe "?id=3855" has a link to a part of the
application that is vulnerable, and we're not following that link because of
our "101st decision".

> And from some point of view it is correct behavior.
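The "follow the first 100 ids, then ignore the 101st" heuristic could look
roughly like this (just a sketch to make the idea concrete — `VariantLimiter`
and all its names are hypothetical, not actual w3af code):

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


class VariantLimiter:
    """Stop following a URL once too many value-variants of the same
    "form" (host + path + parameter names) have already been spidered."""

    def __init__(self, max_variants=100):
        self.max_variants = max_variants
        self._counters = defaultdict(int)

    def _signature(self, url):
        # Ignore the parameter *values*; keep host, path and the set of
        # parameter *names*, so ?id=1 and ?id=65265 share one signature.
        parsed = urlparse(url)
        names = tuple(sorted(parse_qs(parsed.query, keep_blank_values=True)))
        return (parsed.netloc, parsed.path, names)

    def should_follow(self, url):
        sig = self._signature(url)
        self._counters[sig] += 1
        return self._counters[sig] <= self.max_variants


limiter = VariantLimiter(max_variants=100)
followed = [i for i in range(1, 200)
            if limiter.should_follow('http://host.tld/index.php?id=%d' % i)]
# Only the first 100 ids are followed; 101..199 are ignored.
```

The open question is of course the value of `max_variants` — whether 100 (or
300) is safe enough is exactly the trade-off described above.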
> But very often these params are not "action" params which make really
> different pages with different structure. These pages simply differ in
> some text content.

Yes, I totally agree. I think that maybe these URLs should be compared based
on their "structure" (outgoing links). If the contents of all the URLs with
id=1 to id=100 link to (a, b, c), then we could safely (???) infer that
id=101 to id=N are also going to link to (a, b, c), and that we're not
missing anything important.

The question here is... how safe do we want to be? Should we follow 20? 100?
1000? 10k? I think that 300 would be a good number, but I don't really have a
way to back up my feeling that 300 would be ok. What do you guys think?

> I also found a letter in our mailing list from Adi Mutu, "[W3af-users]
> mod_rewrite functionality and variations" [0], about the same problem. As
> far as I know, the current workaround is using the maxDiscoveryLoops
> option.

Yes... but that's not a real solution. The solution to this problem would be
to fix the REAL problem.

> But what is the best way to scan such sites with w3af? Maybe we could add
> some option to webSpider, or do we need some core option?

I think that this would be an option in the core. In the discovery function
in w3afCore, we should perform some strict analysis on what the plugins are
returning, and then ignore some of the found links if necessary.

Anyone feel like trying to fix this issue?

@Taras: Thank you for one of the most interesting emails to this mailing
list :) You really made me work on this one :)

> [0]
> http://sourceforge.net/mailarchive/message.php?msg_id=140339.90793.qm%40web43507.mail.sp1.yahoo.com

[0] http://w3af.svn.sourceforge.net/viewvc/w3af/trunk/core/controllers/w3afCore.py?r1=3243&r2=3303

> Taras
>
> --
> "Software is like sex: it's better when it's free." - Linus Torvalds
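PS: to make the outgoing-links comparison concrete, here is a minimal
sketch. It assumes the spider can collect one set of outgoing links per
already-spidered variant; the helper and its names are hypothetical, not
actual w3af code:

```python
def structure_stable(link_sets, window=5):
    """link_sets: outgoing-link sets of the variants spidered so far,
    e.g. one set per index.php?id=N page. Return True once the last
    `window` variants all produced the same outgoing links, meaning we
    can (probably!) stop following further ids."""
    if len(link_sets) < window:
        return False
    tail = link_sets[-window:]
    return all(links == tail[0] for links in tail)


# id=1..id=6 all link to the same two pages, so the structure is stable:
history = [{'search.php?text=', 'view_image.php?name='} for _ in range(6)]
```

`window` plays the same role as the 100/300 number above: a bigger window is
safer but wastes more requests.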
> ------------------------------------------------------------------------------
> Throughout its 18-year history, RSA Conference consistently attracts the
> world's best and brightest in the field, creating opportunities for
> Conference attendees to learn about information security's most important
> issues through interactions with peers, luminaries and emerging and
> established companies. http://p.sf.net/sfu/rsaconf-dev2dev
> _______________________________________________
> W3af-develop mailing list
> W3af-develop@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/w3af-develop

--
Andrés Riancho
Founder, Bonsai - Information Security
http://www.bonsai-sec.com/
http://w3af.sf.net/