Javier,
On Tue, Jul 19, 2011 at 10:21 AM, Javier Andalia
<[email protected]> wrote:
> List,
>
> This is my attempt to improve the performance of the xpath evaluation given
> a DOM Element.
> The original (and current) version is in httpResponse.py. Examples of how
> this is used can be found at:
> ajax.py, fileUpload.py, formAutocomplete.py, etc
>
>
> def getDOM2(self):
>     '''
>     TODO: Put docstring here
>     '''
>     class DOM(object):
>         def xpath(self, tag, xpathpredicate='.'):
>             xpath = etree.XPath(xpathpredicate)
>             root = etree.fromstring(self.body,
>                                     etree.HTMLParser(recover=True))
>
>             context = etree.iterwalk(root, events=('start',), tag=tag)
>             try:
>                 for evt, elem in context:
>                     if xpath(elem):
>                         yield elem
>                     while elem.getprevious() is not None:
>                         del elem.getparent()[0]
>             except etree.XPathSyntaxError:
>                 om.out.debug('Invalid XPath expression: "%s"' %
>                              xpathpredicate)
>                 raise
>             del context

Are you sure that this is equivalent to the old implementation?
My guess is that the old implementation is faster because it is C
behind a thin Python wrapper, while this version is Python calling
many different C functions, many times. Have you tried [0] to see
WHERE the CPU time is spent?

[0] http://code.google.com/p/jrfonseca/wiki/Gprof2Dot
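
To make the profiling suggestion concrete, here is a minimal sketch
using the standard library's cProfile and pstats (Python 3 assumed).
The function evaluate_xpath_stand_in and the helper profile_call are
hypothetical names of mine; the stand-in body is only a placeholder
for the real res.getDOM2().xpath(...) loop.

```python
import cProfile
import io
import pstats


def evaluate_xpath_stand_in(n=2000):
    # Hypothetical stand-in for the code under test, e.g. iterating over
    # res.getDOM2().xpath(...). Replace the body with the real call.
    return sum(len(str(i)) for i in range(n))


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats('cumulative').print_stats(10)
    return result, stream.getvalue()


if __name__ == '__main__':
    result, report = profile_call(evaluate_xpath_stand_in)
    print(report)
```

If you would rather look at a call graph, profiler.dump_stats() writes
a stats file that Gprof2Dot should be able to read with its pstats
input format.
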
>     dom = DOM()
>     dom.body = self.body
>     return dom
>
>
>
> Unfortunately this didn't work out as expected. It is slower.
>
> >>> code = '''
> f = open("index-form-two-fields.html")
> html = f.read()
> f.close()
> u = url_object('http://w3af.com')
> res = core.data.url.httpResponse.httpResponse(200, html,
>     {'content-type': 'text/html'}, u, u)
> for i in res.getDOM2().xpath('input',
>         "translate(@type,'PASWORD','pasword')='password'"):
>     pass
> '''
> >>> setup = '''import sys
> sys.path.append('/home/jandalia/workspace/w3af.unicode');
> from core.data.parsers.urlParser import url_object;
> import core.data.url.httpResponse
> '''
> >>> t = timeit.Timer(code, setup)
> >>> min(t.repeat(repeat=3, number=10000))
> 27.584304094314575
>
>
> Using the original version:
>
> >>> code = '''
> f = open("/home/jandalia/Desktop/index-form-two-fields.html")
> html = f.read()
> f.close()
> u = url_object('http://w3af.com')
> res = core.data.url.httpResponse.httpResponse(200, html,
>     {'content-type': 'text/html'}, u, u)
> dom = res.getDOM()
> for i in dom.xpath("//input[translate(@type,'PASWORD','pasword')='password']"):
>     pass
> '''
> >>> t = timeit.Timer(code, setup)
> >>> min(t.repeat(repeat=3, number=10000))
> 3.8396580219268799
>
>
> In other words, it is about 7 times slower.
> If anyone has an idea on how to improve this code, it would be much
> appreciated. The HTML doc used for the tests is attached.
>
> Thanks!
>
> Javier
>
> Note: Some useful info can be found here:
> http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
>
>
>
> _______________________________________________
> W3af-develop mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/w3af-develop
>
>
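
One more thought on the timings: both snippets read the file and build
the httpResponse inside the timed code, so that cost is measured along
with the XPath evaluation. Below is a rough sketch of moving the
one-off work into timeit's setup so only one step is timed. The names
find_password_inputs and best_time and the inline HTML string are
hypothetical stand-ins, not w3af code.

```python
import timeit

# Everything in SETUP runs once per repeat and is not included in the timing.
SETUP = '''
html = "<input type='password'>" * 50   # stand-in for the file contents

def find_password_inputs(doc):
    # Hypothetical stand-in for the XPath evaluation being measured.
    return doc.lower().count("type='password'")
'''

# Only this statement is executed `number` times and timed.
STMT = 'find_password_inputs(html)'


def best_time(stmt=STMT, setup=SETUP, repeat=3, number=1000):
    """Return the fastest of `repeat` runs, each executing stmt `number` times."""
    timer = timeit.Timer(stmt, setup)
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == '__main__':
    print('best of 3: %.6f s' % best_time())
```

With the real code, the file read and the httpResponse construction
would go into the setup string and the xpath loop into the statement.
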
--
Andrés Riancho
Director of Web Security at Rapid7 LLC
Founder at Bonsai Information Security
Project Leader at w3af