On 22.08.2012 12:23, Józef Kucia wrote:
On Tue, Aug 21, 2012 at 10:52 PM, gurketsky <gurket...@googlemail.com
<mailto:gurket...@googlemail.com>> wrote:

    I just like to present the state of the ID3DXConstantTable
    implementation, so that possibly no work is done twice. This goes
    specifically to Józef. I'm not sure what the plan is on this. There
    are two problems which arise and I have not had time to sort those
    out yet.


Thanks for notifying me. I was about to write some tests for structures
in constant tables. I see that you've already written such tests.

Cheers,
Józef Kucia

Well, I just had a closer look again. My speed test triggered a problem, so the results weren't really comparable. It looks like native only allocates handles when it really needs them, so I'm not sure whether we want to take the same approach or whether it is fine to allocate them all up front, as I've done. After I fixed the test, native's speed advantage was gone. If that's fine with you, you could reuse the code, or I could clean it up a bit and send it.

I haven't improved and sent it yet, because the choice between solution 1. and 2. isn't settled for me. I'm a little bit against version 1., because I think speed might be an issue, and no one has given a technical argument against 2. I don't think an extra handle list for handles with "small" values is the way to go, because such values may also fall in the range where strings could be in memory. The extra handle list would be solution 3., but I think it needs a lot more memory just to add a "layer" for checking the handles. Thus 2. is a lot better than 3. (which I haven't explained in detail).

To tackle the problem:
1. The handle and table mixing could be worked around by using a global list for all tables and searching the handles in there. That should be easy to add. The problem I see with that might be speed related: when we have a lot of handles, searching the list will be slow. Still, I'm fine with that solution and would add it. Testing will show whether it really is that slow...
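
A minimal sketch of what option 1. could look like, to make the speed concern concrete. Everything here is hypothetical: the simplified `struct ctab_constant`, the fixed-size `handle_registry`, and the names `register_handle` and `lookup_handle` are stand-ins for illustration, not the actual Wine code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real constant structure has more fields. */
struct ctab_constant { const char *name; };

#define MAX_HANDLES 1024 /* arbitrary limit for this sketch */

/* One global list shared by all constant tables. */
static const struct ctab_constant *handle_registry[MAX_HANDLES];
static size_t handle_count;

/* Record a constant so its address can later be recognized as a handle.
 * Returns 1 on success, 0 if the registry is full. */
int register_handle(const struct ctab_constant *constant)
{
    assert(constant != NULL);
    if (handle_count == MAX_HANDLES) return 0;
    handle_registry[handle_count++] = constant;
    return 1;
}

/* Linear search, O(n) in the number of registered handles.
 * Returns the constant if value is a known handle, otherwise NULL,
 * in which case the caller would treat value as a name string. */
const struct ctab_constant *lookup_handle(const void *value)
{
    size_t i;

    for (i = 0; i < handle_count; i++)
        if ((const void *)handle_registry[i] == value) return handle_registry[i];
    return NULL;
}
```

Every call that takes a D3DXHANDLE would pay for this scan, which is where the cost comes from once many handles exist. A sorted array or a hash set would reduce the lookup cost, at the price of more bookkeeping.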

2. But the other solution isn't dead for me yet. I had another look at the D3DXHANDLE usage and at the question of what the hell D3DXCONSTTABLE_LARGEADDRESSAWARE is used for. It was said that it's bad and broken, but I haven't seen why. It is ugly, but I couldn't see a technical reason not to do it. Specifically:

The argument is that D3DXHANDLEs are distinguished from strings by the highest bit (bit #31). Thus, with LARGEADDRESSAWARE, using strings as D3DXHANDLEs is not allowed anymore (see http://msdn.microsoft.com/en-us/library/windows/desktop/bb943959%28v=vs.85%29.aspx). This would also speed up the detection of a handle dramatically. It doesn't check for validity, but native doesn't do that either: if you pass a garbled handle, it will crash.

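/* Turn a constant pointer into a handle by setting bit 31. In
 * large-address-aware mode the pointer is passed through unchanged. */
D3DXHANDLE handle_from_constant(struct ctab_constant *constant)
{
    if (largeadressaware && constant) return (D3DXHANDLE)constant;
    if (constant) return (D3DXHANDLE)((UINT_PTR)constant | 0x80000000);
    return NULL;
}

/* Bit 31 set: the value is a handle, so strip the tag. Otherwise it is
 * treated as a name string and looked up in the table. */
struct ctab_constant *is_valid_constant(struct ID3DXConstantTableImpl *table, D3DXHANDLE handle)
{
    if (largeadressaware) return (struct ctab_constant *)handle;
    if ((UINT_PTR)handle >> 31) return (struct ctab_constant *)((UINT_PTR)handle & 0x7fffffff);
    return get_constant_by_name(table, NULL, handle);
}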
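
To try the bit-31 scheme outside of d3dx9, here is a self-contained sketch that models a 32-bit address as a `uint32_t`, so it behaves the same on any host. The names `tag_handle`, `is_tagged_handle` and `untag_handle` are made up for this illustration. The assertion spells out the assumption the whole idea rests on: without LARGEADDRESSAWARE, user-mode addresses are expected to stay below 2 GB.

```c
#include <assert.h>
#include <stdint.h>

#define HANDLE_TAG 0x80000000u /* bit 31 marks a value as a handle */

/* Tag a modeled 32-bit address as a handle. 0 stays 0 (NULL handle). */
uint32_t tag_handle(uint32_t address)
{
    /* Assumption: the address lies in the lower 2 GB, so bit 31 is free. */
    assert((address & HANDLE_TAG) == 0);
    return address ? (address | HANDLE_TAG) : 0;
}

/* Nonzero if the value carries the handle tag, 0 if it would be a string. */
int is_tagged_handle(uint32_t value)
{
    return (value & HANDLE_TAG) != 0;
}

/* Strip the tag to recover the original address. */
uint32_t untag_handle(uint32_t value)
{
    return value & ~HANDLE_TAG;
}
```

For example, `tag_handle(0x00401000u)` gives `0x80401000u`, and `untag_handle` maps it back to `0x00401000u`. With addresses above 2 GB the assertion would fire, which is the case where the pointer has to be passed through untagged.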

According to http://en.wikipedia.org/wiki/Virtual_address_space:
32-bit on 32-bit without LARGEADDRESSAWARE: only 2 GB (32-bit default)
32-bit on 64-bit without LARGEADDRESSAWARE: only 2 GB (32-bit default)
64-bit on 64-bit without LARGEADDRESSAWARE: only 2 GB
32-bit on 32-bit with LARGEADDRESSAWARE: 3 GB
32-bit on 64-bit with LARGEADDRESSAWARE: 4 GB
64-bit on 64-bit with LARGEADDRESSAWARE: 8 TB (64-bit default)
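
Whether an executable is linked with LARGEADDRESSAWARE is recorded in its PE header, as bit 0x0020 of the COFF Characteristics field. The following is a rough sketch of reading that flag from an image held in memory. The function name `pe_is_large_address_aware` and the little-endian helpers are made up for this example, and it only checks the header fields it needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE_OFFSET_FIELD 0x3c /* e_lfanew in the DOS header */
#define IMAGE_FILE_LARGE_ADDRESS_AWARE 0x0020

static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the flag is set, 0 if it is clear, -1 if the buffer
 * does not look like a PE image. */
int pe_is_large_address_aware(const unsigned char *image, size_t size)
{
    uint32_t pe_offset;

    if (size < PE_OFFSET_FIELD + 4 || image[0] != 'M' || image[1] != 'Z') return -1;
    pe_offset = read_le32(image + PE_OFFSET_FIELD);
    /* Need the 4-byte signature plus the 20-byte COFF header. */
    if (pe_offset > size || size - pe_offset < 24) return -1;
    if (memcmp(image + pe_offset, "PE\0\0", 4) != 0) return -1;
    /* Characteristics is the last 2 bytes of the COFF header. */
    return (read_le16(image + pe_offset + 22) & IMAGE_FILE_LARGE_ADDRESS_AWARE) ? 1 : 0;
}
```

This only shows where the linker flag lives. It says nothing about how native d3dx9 decides between the two handle modes.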

So in cases where the exe is linked with LARGEADDRESSAWARE, d3dx9 would have to be used with D3DXCONSTTABLE_LARGEADDRESSAWARE. That way it's the same on 32-bit and 64-bit operating systems. The only problem I see is that nowhere is it said that the 2 GB will always be the lowest 2 GB. But in my test runs, allocated memory always had addresses that fit in the lower 31 bits. So either I'm very unlucky in never getting a higher address, or this is simply the way it works on Windows. Does anyone have a technical argument against this solution?

I hope this helps you to make the correct decisions.

Cheers
Rico

