List,
This is my attempt to improve the performance of XPath evaluation
against a DOM element.
The original (and current) version is in httpResponse.py. Examples of
how it is used can be found in ajax.py, fileUpload.py,
formAutocomplete.py, etc.
def getDOM2(self):
    '''
    TODO: Put docstring here
    '''
    class DOM(object):

        def xpath(self, tag, xpathpredicate='.'):
            try:
                # Compiling is what raises XPathSyntaxError, so it has to
                # happen inside the try block.
                xpath = etree.XPath(xpathpredicate)
                root = etree.fromstring(self.body,
                                        etree.HTMLParser(recover=True))
                context = etree.iterwalk(root, events=('start',), tag=tag)
                for evt, elem in context:
                    if xpath(elem):
                        yield elem
                    # Free the siblings that precede this element.
                    while elem.getprevious() is not None:
                        del elem.getparent()[0]
            except etree.XPathSyntaxError:
                om.out.debug('Invalid XPath expression: "%s"' %
                             xpathpredicate)
                raise
            del context

    dom = DOM()
    dom.body = self.body
    return dom
Unfortunately this didn't work out as expected. It is slower.
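To make the two approaches easier to compare outside of w3af, here is a
minimal standalone sketch. It assumes lxml is installed. The inline HTML
and the helper names walk_and_filter and single_query are made up for
illustration and are not part of w3af. The first helper evaluates the
compiled predicate once per matching tag from Python; the second hands
the whole selection to a single XPath query.

```python
from lxml import etree

# Hypothetical test document, not the attached one.
HTML = b"""<html><body><form>
<input type="text" name="user">
<input type="PASSWORD" name="pass1">
<input type="password" name="pass2">
</form></body></html>"""

PREDICATE = "translate(@type,'PASWORD','pasword')='password'"


def walk_and_filter(body, tag, predicate):
    """Parse the whole body, then test the predicate on each <tag> element."""
    check = etree.XPath(predicate)  # compiled once, called per element
    root = etree.fromstring(body, etree.HTMLParser(recover=True))
    for _event, elem in etree.iterwalk(root, events=('start',), tag=tag):
        if check(elem):
            yield elem


def single_query(body, tag, predicate):
    """Parse the whole body, then let one XPath query do the selection."""
    root = etree.fromstring(body, etree.HTMLParser(recover=True))
    return root.xpath('//%s[%s]' % (tag, predicate))


walked = [e.get('name') for e in walk_and_filter(HTML, 'input', PREDICATE)]
queried = [e.get('name') for e in single_query(HTML, 'input', PREDICATE)]
print(walked, queried)
```

Both helpers parse the complete document up front, so walking the tree
afterwards cannot save any parsing work; it only adds a Python-level
call per candidate element.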
code = '''
f = open("index-form-two-fields.html")
html = f.read()
f.close()
u = url_object('http://w3af.com')
res = core.data.url.httpResponse.httpResponse(200, html, {'content-type':
    'text/html'}, u, u)
for i in res.getDOM2().xpath('input',
        "translate(@type,'PASWORD','pasword')='password'"):
    pass
'''
setup = '''import sys
sys.path.append('/home/jandalia/workspace/w3af.unicode');
from core.data.parsers.urlParser import url_object;
import core.data.url.httpResponse
'''
import timeit
t = timeit.Timer(code, setup)
min(t.repeat(repeat=3, number=10000))
27.584304094314575
Using the original version:
code = '''
f = open("/home/jandalia/Desktop/index-form-two-fields.html")
html = f.read()
f.close()
u = url_object('http://w3af.com')
res = core.data.url.httpResponse.httpResponse(200, html, {'content-type':
    'text/html'}, u, u)
dom = res.getDOM()
for i in dom.xpath("//input[translate(@type,'PASWORD','pasword')='password']"):
    pass
'''
t = timeit.Timer(code, setup)
min(t.repeat(repeat=3, number=10000))
3.8396580219268799
In other words, it is about 7 times slower.
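For reference, the ratio comes straight from the two timings above. A
small sketch of the arithmetic (the variable names are mine; the
per-call figures assume each timing covers the 10000 runs passed as
number=10000):

```python
RUNS = 10000  # value passed as number= to t.repeat()

new_time = 27.584304094314575   # seconds, getDOM2().xpath(...)
old_time = 3.8396580219268799   # seconds, getDOM().xpath(...)

slowdown = new_time / old_time
new_ms_per_run = new_time / RUNS * 1000.0
old_ms_per_run = old_time / RUNS * 1000.0

print("slowdown: %.2fx" % slowdown)
print("per run: %.3f ms vs %.3f ms" % (new_ms_per_run, old_ms_per_run))
```

Note that each timed run also includes reading the file and building
the httpResponse object, so the per-run figures are not pure XPath
evaluation times.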
If anyone has an idea on how to improve this code, it would be much
appreciated. The HTML document used for the tests is attached.
Thanks!
Javier
Note: Some useful info can be found here:
http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
Title: Two password fields