Javier,

On Tue, Jul 19, 2011 at 10:21 AM, Javier Andalia <javier_anda...@rapid7.com> wrote:
> List,
>
> This is my attempt to improve the performance of the xpath evaluation given
> a DOM Element.
> The original (and current) version is in httpResponse.py. Examples of how
> this is used can be found at:
> ajax.py, fileUpload.py, formAutocomplete.py, etc
>
> def getDOM2(self):
>     '''
>     TODO: Put docstring here
>     '''
>     class DOM(object):
>         def xpath(self, tag, xpathpredicate='.'):
>             xpath = etree.XPath(xpathpredicate)
>             root = etree.fromstring(self.body,
>                                     etree.HTMLParser(recover=True))
>
>             context = etree.iterwalk(root, events=('start',), tag=tag)
>             try:
>                 for evt, elem in context:
>                     if xpath(elem):
>                         yield elem
>                     while elem.getprevious() is not None:
>                         del elem.getparent()[0]
>             except etree.XPathSyntaxError:
>                 om.out.debug('Invalid XPath expression: "%s"' %
>                              xpathpredicate)
>                 raise
>             del context

Are you sure that this is equivalent to the old implementation? I'm guessing
that the old implementation is faster because it's C with a Python wrapper,
while this is Python calling many different C functions many times. Have you
tried [0] to see WHERE the CPU time is being spent?

[0] http://code.google.com/p/jrfonseca/wiki/Gprof2Dot

>     dom = DOM()
>     dom.body = self.body
>     return dom
>
> Unfortunately this didn't work out as expected. It is slower.
>
> >>> code = '''
> f = open("index-form-two-fields.html")
> html = f.read()
> f.close()
> u = url_object('http://w3af.com')
> res = core.data.url.httpResponse.httpResponse(200, html, {'content-type': 'text/html'}, u, u)
> for i in res.getDOM2().xpath('input', "translate(@type,'PASWORD','pasword')='password'"):
>     pass
> '''
> >>> setup = '''import sys
> sys.path.append('/home/jandalia/workspace/w3af.unicode');
> from core.data.parsers.urlParser import url_object;
> import core.data.url.httpResponse
> '''
> >>> t = timeit.Timer(code, setup)
> >>> min(t.repeat(repeat=3, number=10000))
> 27.584304094314575
> >>>
>
> Using the original version:
> >>> code = '''
> f = open("/home/jandalia/Desktop/index-form-two-fields.html")
> html = f.read()
> f.close()
> u = url_object('http://w3af.com')
> res = core.data.url.httpResponse.httpResponse(200, html, {'content-type': 'text/html'}, u, u)
> dom = res.getDOM()
> for i in dom.xpath("//input[translate(@type,'PASWORD','pasword')='password']"):
>     pass
> '''
> >>> t = timeit.Timer(code, setup)
> >>> min(t.repeat(repeat=3, number=10000))
> 3.8396580219268799
>
> In other words, it is about 7 times slower.
> If anyone has an idea on how to improve this code it would be very
> appreciated. The HTML doc used for the tests is attached.
>
> Thanks!
>
> Javier
>
> Note: Some useful info can be found here:
> http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
>
> ------------------------------------------------------------------------------
> Magic Quadrant for Content-Aware Data Loss Prevention
> Research study explores the data loss prevention market. Includes in-depth
> analysis on the changes within the DLP market, and the criteria used to
> evaluate the strengths and weaknesses of these DLP solutions.
> http://www.accelacomm.com/jaw/sfnl/114/51385063/
> _______________________________________________
> W3af-develop mailing list
> W3af-develop@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/w3af-develop

--
Andrés Riancho
Director of Web Security at Rapid7 LLC
Founder at Bonsai Information Security
Project Leader at w3af
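
For reference, here is a minimal sketch of the profiling step suggested above, so the "where is the CPU consumed" question can be answered before guessing. It assumes Python 3 and uses only the standard library (cProfile and pstats). The names profile_call and placeholder_workload are hypothetical and not part of w3af; the placeholder only stands in for the loop over res.getDOM2().xpath(...) from the benchmark.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report).

    Hypothetical helper, not part of w3af. The report lists the 15
    entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(15)
    return result, buf.getvalue()


def placeholder_workload(n=20000):
    # Stand-in for the code under test. In the benchmark above this
    # would be the loop that consumes res.getDOM2().xpath(...).
    return sum(len(str(i)) for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(placeholder_workload)
    print(report)
```

If the graphical view from [0] is preferred, the same profiler can write its data to a file with profiler.dump_stats("out.pstats"), and gprof2dot should be able to read that file with its "-f pstats" option. Rows in the report that point into Python code rather than built-in lxml calls would support the guess that the per-element Python loop is where the time goes.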