Hi Fulvio,

At 08.09 23/01/2005 +0100, Fulvio Risso wrote:
Hi Gareth.

I didn't provide an example because what I'm saying is evident from the
Xerces code: there is a global list instead of something local to each
node, and the list is probably too small to handle the data associated
with every node.

In every piece of code there are trade-offs: you can write the fastest code, but it will be heavy on memory; or you can write the smallest code, but it will be slower. So you optimize against a reasonable test case. For this reason it's important to see your test case, as it shows how you use the setUserData/getUserData API (e.g. do you associate 1 key or 10 keys with a node? with every node or only with elements? how many nodes are involved? and so on).


So, can you send us (or just to me, privately, if you have concerns about posting your code in public) a small testcase? For example, take DOMCount and, after the parsing phase, populate the document with all the user data you need and then query it as many times as you need. With such a testcase we can try the various data structures and find a model that has acceptable performance in almost all usage scenarios.
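
Something along these lines would be enough (a minimal sketch against the stock 2.x DOM API; the key name, payload and pass count are placeholders, and error handling is omitted):

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>

using namespace xercesc;

// Attach one value to every element node. Whether it is 1 key or 10,
// and elements only or all nodes, is exactly what the questions above ask.
static void populate(DOMNode* n, const XMLCh* key, void* value)
{
    if (n->getNodeType() == DOMNode::ELEMENT_NODE)
        n->setUserData(key, value, 0);    // no DOMUserDataHandler in this sketch
    for (DOMNode* c = n->getFirstChild(); c != 0; c = c->getNextSibling())
        populate(c, key, value);
}

// Walk the tree calling getUserData() on every node.
static long query(DOMNode* n, const XMLCh* key)
{
    long hits = (n->getUserData(key) != 0) ? 1 : 0;
    for (DOMNode* c = n->getFirstChild(); c != 0; c = c->getNextSibling())
        hits += query(c, key);
    return hits;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
        return 1;

    XMLPlatformUtils::Initialize();
    {
        XercesDOMParser parser;
        parser.parse(argv[1]);
        DOMDocument* doc = parser.getDocument();

        XMLCh* key  = XMLString::transcode("my-key");   // placeholder key name
        int payload = 42;                               // placeholder value

        populate(doc->getDocumentElement(), key, &payload);

        long total = 0;
        for (int pass = 0; pass < 100; ++pass)          // placeholder pass count
            total += query(doc->getDocumentElement(), key);

        XMLString::release(&key);
    }
    XMLPlatformUtils::Terminate();
    return 0;
}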

Thanks,
Alberto


After writing this email, I changed the implementation of the DOMNodeImpl
class, adding a pointer that is used whenever get/setUserData() is invoked
on the node.

I know that this is not a clean implementation (since this pointer supports
only ONE piece of associated data, while set/getUserData() can associate
multiple values through keys), but these are the results of my application:

- xerces 2.4.0 (Win32): 11 sec
- xerces 2.6.0 (Win32): 36 sec
- xerces 2.6.0 (Win32) + my patch: 3.5 sec
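
In sketch form, the idea of the patch is simply this (a simplified
illustration, not my actual code; the names here are made up):

// Simplified view of a node class with one extra inline slot. A real
// patch would still have to honour the key-based contract; this only
// shows why the fast path gets fast: no hash lookup at all.
class DOMNodeImpl
{
public:
    DOMNodeImpl() : fUserData(0) { }

    void* getSingleUserData() const     { return fUserData; }   // O(1), no hash
    void  setSingleUserData(void* data) { fUserData = data; }

private:
    void* fUserData;   // one pointer per node, NULL when unused
};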

In my case, the overhead of setUserData() is not important, since I call
this function only a few times; what matters is the overhead of
getUserData(), which gets called more than 1M times during the execution.

If you want, I can test a possible patch for Xerces (Win32) without any
problem (it is, on the other hand, pretty difficult to send my code to you,
since it has several dependencies).

For instance, if having a pointer associated with each node is an
unacceptable overhead (as Alberto said in another reply), it would be very
helpful to have a function that can change the size of the hash table (if
such a function already exists, my apologies; I wasn't able to locate it).
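
Something like this (the call below is purely hypothetical; I am not aware
of any such function in Xerces, the name and signature are invented here):

// Hypothetical call, invented to illustrate the request:
doc->setUserDataHashModulus(65537);   // grow the table before the query phase

// Internally this would be a standard rehash: allocate a larger bucket
// array and move every entry to bucket hash(node) % newModulus, so that
// the >1M getUserData() calls walk much shorter collision chains.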

Thanks for your time,

        fulvio

===================================================================
Date: Mon, 17 Jan 2005 15:48:52 +0000
From: Gareth Reakes <[EMAIL PROTECTED]>
Subject: Performance issues in DOMNode::setUserData() /
DOMNode::getUserData()


Hi,

Fulvio Risso wrote:


> However, from an architecture point of view, I believe data should be
> attached to every single node instead of using a global list.

I don't recall why it was done this way; does anybody else? It was like
that when I first started using Xerces. I would assume it was done to
keep the DOM nodes smaller. This is not a frequently used feature.
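
Roughly the shape of the trade-off, sketched here with a plain std::map
rather than the actual Xerces internals:

#include <map>

struct Node { /* ... the usual DOM node fields, no user-data slot ... */ };

// One document-wide table: sizeof(Node) stays small, and documents that
// never touch user data pay nothing, but every getUserData() call pays
// a lookup keyed on the node pointer.
std::map<const Node*, void*> gUserDataTable;

void* getUserData(const Node* n)
{
    std::map<const Node*, void*>::const_iterator it = gUserDataTable.find(n);
    return (it == gUserDataTable.end()) ? 0 : it->second;
}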

>
> Does anyone have any suggestion on how to avoid this problem?

Is the slowdown from accessing the data or from putting it in? You could
try making the hash modulus bigger. Alby put an optimization in a while ago
for this stuff. I have had a chat with him and he says he has another
possible optimization. Could you provide us with the test case and the use
case for it (if the docs are large then send them off-list)?

Cheers,

Gareth


--
Gareth Reakes, Managing Director
Parthenon Computing
+44-1865-811184
http://www.parthcomp.com

