List,

This is my attempt to improve the performance of XPath evaluation against a DOM element. The original (and current) version is in httpResponse.py. Examples of how it is used can be found in ajax.py, fileUpload.py, formAutocomplete.py, etc.


    def getDOM2(self):
        '''
        TODO: Put docstring here
        '''
        class DOM(object):
            def xpath(self, tag, xpathpredicate='.'):
                root = etree.fromstring(self.body,
                                        etree.HTMLParser(recover=True))
                # Visit only the elements whose tag matches `tag`
                context = etree.iterwalk(root, events=('start',), tag=tag)
                try:
                    # Compile inside the try block: XPathSyntaxError is
                    # raised by etree.XPath() itself, not during iteration
                    xpath = etree.XPath(xpathpredicate)
                    for evt, elem in context:
                        if xpath(elem):
                            yield elem
                        # Drop already-visited siblings to free memory
                        while elem.getprevious() is not None:
                            del elem.getparent()[0]
                except etree.XPathSyntaxError:
                    om.out.debug('Invalid XPath expression: "%s"' %
                                 xpathpredicate)
                    raise
                del context

        dom = DOM()
        dom.body = self.body
        return dom



Unfortunately this didn't work out as expected: the new version is slower than the original. Timing the new version:

import timeit

code = '''
f = open("index-form-two-fields.html")
html = f.read()
f.close()
u = url_object('http://w3af.com')
res = core.data.url.httpResponse.httpResponse(
    200, html, {'content-type': 'text/html'}, u, u)
for i in res.getDOM2().xpath(
        'input', "translate(@type,'PASWORD','pasword')='password'"):
    pass
'''
setup = '''import sys
sys.path.append('/home/jandalia/workspace/w3af.unicode')
from core.data.parsers.urlParser import url_object
import core.data.url.httpResponse
'''
t = timeit.Timer(code, setup)
min(t.repeat(repeat=3, number=10000))
27.584304094314575




Using the original version:

code = '''
f = open("/home/jandalia/Desktop/index-form-two-fields.html")
html = f.read()
f.close()
u = url_object('http://w3af.com')
res = core.data.url.httpResponse.httpResponse(
    200, html, {'content-type': 'text/html'}, u, u)
dom = res.getDOM()
for i in dom.xpath("//input[translate(@type,'PASWORD','pasword')='password']"):
    pass
'''
t = timeit.Timer(code, setup)
min(t.repeat(repeat=3, number=10000))
3.8396580219268799


In other words, the new version is about 7 times slower.
If anyone has an idea on how to improve this code, it would be much appreciated. The HTML document used for the tests is attached.
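
One thing that may be worth ruling out: in both benchmarks the timed statement also reads the file and builds the response object. That cost is the same in both runs, so it should not change which version wins, but it does dilute the ratio. Below is a minimal sketch of moving one-off preparation into the timeit setup so that only the search is measured. It is written for Python 3 and uses only the standard library; build_html() and count_inputs() are hypothetical stand-ins for the file/response preparation and the xpath() call, not w3af code.

```python
import timeit

# Hypothetical stand-ins (not part of w3af): build_html() plays the role of
# reading the file and building the response, count_inputs() plays the role
# of the xpath() call being measured.
def build_html(n_fields=50):
    return '<form>' + '<input type="PassWord">' * n_fields + '</form>'

def count_inputs(html):
    return html.lower().count('type="password"')

# Variant 1: the preparation is repeated inside the timed statement.
everything = timeit.Timer('count_inputs(build_html())', globals=globals())

# Variant 2: the preparation runs once in setup; only the call is timed.
search_only = timeit.Timer('count_inputs(html)',
                           setup='html = build_html()',
                           globals=globals())

t_everything = min(everything.repeat(repeat=3, number=1000))
t_search_only = min(search_only.repeat(repeat=3, number=1000))
print('with preparation: %.4f s, search only: %.4f s'
      % (t_everything, t_search_only))
```

The same split could be applied to the two benchmarks above by moving the open/read and httpResponse construction lines from code into setup, assuming the response object can safely be reused across iterations.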

Thanks!

Javier

Note: Some useful info can be found here: http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
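
For readers unfamiliar with the predicate used in both benchmarks: XPath 1.0 has no lower-case function, so translate(@type,'PASWORD','pasword') is used to lower-case just the letters that occur in "password" before comparing. A rough Python 3 equivalent, as a sketch only (is_password_type is a hypothetical helper name, not something in w3af):

```python
# XPath translate(s, 'PASWORD', 'pasword') replaces each character found in
# the second argument with the character at the same position in the third.
TABLE = str.maketrans('PASWORD', 'pasword')

def is_password_type(type_attr):
    """Mimic translate(@type,'PASWORD','pasword')='password'."""
    return type_attr.translate(TABLE) == 'password'

for value in ('password', 'PASSWORD', 'PassWord', 'text'):
    print(value, is_password_type(value))
```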


Title: Two password fields
Your document number
Your Santander Río password
Your username