Re: for the record, my ole32 binary tree search patch is correct

2009-10-28 Thread Yuriy Kaminskiy
On 27.10.2009 22:50, Vincent Povirk wrote:
 I sent the following patch recently:
 
 commit ee6856d874d687c4504914e61bcde3e6b8823bca
 Author: Vincent Povirk madewokh...@gmail.com
 Date:   Fri Oct 23 13:57:42 2009 -0500
 
 ole32: Don't use IEnumSTATSTG to search for elements of storages.
 
 We use it to do a linear search of a binary tree, which is overkill.
 Replace it with a simple binary search.
 
 It was accepted and caused this bug:
 http://bugs.winehq.org/show_bug.cgi?id=20477
 
 It actually turned out that the mistake was in a previous patch, but
 it didn't matter for reading files until we started to depend on the
 tree to be valid. Nonetheless, I've done a test to make sure we can
 really rely on the binary trees in storage files to be valid.
 
[...]
 This means that Windows does depend on the tree in a storage file to
 be correct, and Wine's behavior is now closer to Windows, even though
 it now fails in some cases when it used to succeed.

If I've not mistaken, that uses lstricmp internally for comparing keys.

Note, that wine lstr[i]cmp and ms windows lstr[i]cmp uses different sort order
(see last [disabled] test in dlls/kernel32/tests/locale.c).

So binary search using wine lstricmp on presored array (used windows lstricmp)
will certainly fail on some files.

PS looking at code one more time, current code seems already depends on
lstricmp working same [in updatePropertyChain()], so already broken. It's pure
luck that it is not failed already.

Disclaimer: I'm not expert in wine code, looked at code very quickly, and may be
misread something :-\





Re: for the record, my ole32 binary tree search patch is correct

2009-10-28 Thread Vincent Povirk
 If I've not mistaken, that uses lstricmp internally for comparing keys.

It uses lstrcmpiW, which according to MSDN can behave differently
based on locale, but I don't think the Wine version does.

The comparison is in the propertyNameCmp function.

I think that the MS storage code probably does something else.

 PS looking at code one more time, current code seems already depends on
 lstricmp working same [in updatePropertyChain()], so already broken. It's 
 pure
 luck that it is not failed already.

Well, yes, now it just depends on it even more (we'll fail reading
files too instead of just writing invalid ones).

I've checked the compound file spec from MS and the one from
OpenOffice. Neither of them specify the comparison beyond character
by character

I think what saves us is that programs rarely expose storage and
stream names directly to users, so the set of characters we have to
deal with in practice is limited.

Still, we really should use a comparison function that's at least
guaranteed to be consistent.

-- 
Vincent Povirk




Re: for the record, my ole32 binary tree search patch is correct

2009-10-28 Thread Yuriy Kaminskiy
On 28.10.2009 20:30, Vincent Povirk wrote:
 If I've not mistaken, that uses lstricmp internally for comparing keys.
 
 It uses lstrcmpiW, which according to MSDN can behave differently
 based on locale, but I don't think the Wine version does.

Note, that, it is not only difference in 0x7f nls chars (somewhat
expected), there are difference in ascii range too (see that test from
test/locale.c). [I'm saying about difference between wine vs. windows; I'm not
sure if windows lstricmpW /really/ work differently in different locales]

[...]
 I think what saves us is that programs rarely expose storage and
 stream names directly to users, so the set of characters we have to
 deal with in practice is limited.

I've looked at some .msi files - stream names contain problematic characters:
'.' vs. '_' (mismatches between wine and windows; there are more such chars),
'_' vs. '0' (mismatches between strcasecmp and strcoll|lstricmp).

So, while problematic files maybe very rare, but not impossible.

 Still, we really should use a comparison function that's at least
 guaranteed to be consistent.





for the record, my ole32 binary tree search patch is correct

2009-10-27 Thread Vincent Povirk
I sent the following patch recently:

commit ee6856d874d687c4504914e61bcde3e6b8823bca
Author: Vincent Povirk madewokh...@gmail.com
Date:   Fri Oct 23 13:57:42 2009 -0500

ole32: Don't use IEnumSTATSTG to search for elements of storages.

We use it to do a linear search of a binary tree, which is overkill.
Replace it with a simple binary search.

It was accepted and caused this bug:
http://bugs.winehq.org/show_bug.cgi?id=20477

It actually turned out that the mistake was in a previous patch, but
it didn't matter for reading files until we started to depend on the
tree to be valid. Nonetheless, I've done a test to make sure we can
really rely on the binary trees in storage files to be valid.

I wrote two programs: one that reads the directory entries of a
storage file and prints some information to stdout (useful for
debugging), and one that swaps the left and right child of every
directory entry, corrupting the trees in the file.

It turns out that corrupting an msi file in this way makes it fail on
both windows and on wine (although it crashes on wine; msi's error
handling code is broken). Running the program again (thus swapping the
children back to their original, correct position) makes it work.

However, on an older version of Wine without my change, the corrupted
msi file loads fine.

This means that Windows does depend on the tree in a storage file to
be correct, and Wine's behavior is now closer to Windows, even though
it now fails in some cases when it used to succeed.

(I think that current Wine without the fix for bug 20477 will load
only the corrupted files, but I didn't test this.)

I've attached my test programs so they're in the public record.

-- 
Vincent Povirk
#!/usr/bin/env python

import sys
import struct

DIFSECT = 0xfffc
FATSECT = 0xfffd
ENDOFCHAIN = 0xfffe
FREESECT = 0x

def read_sector(string, sector):
return string[512*(sector+1):512*(sector+2)]

def write_sector(f, string, sector):
f.seek(512*(sector+1), 0)
f.write(string)

f = file(sys.argv[1], 'r+b')

data = f.read()

num_fat_sectors = struct.unpack(L, data[0x2c:0x30])[0]

fat_sectors = list(struct.unpack(109L, data[0x4c:0x4c+109*4]))

next_difat_sector = struct.unpack(L, data[0x44:0x48])[0]
while next_difat_sector != ENDOFCHAIN:
next_difat_data = read_sector(data, next_difat_sector)
fat_sectors.extend(struct.unpack(127L, next_difat_data[0:508]))
next_difat_sector = struct.unpack(L, next_difat_data[508:512])[0]

fat_entries = []

for fat_sector in fat_sectors:
if fat_sector == FREESECT:
break
fat_data = read_sector(data, fat_sector)
fat_entries.extend(struct.unpack(128L, fat_data))

next_dir_sector = struct.unpack(L, data[0x30:0x34])[0]
current_sid = 0
while next_dir_sector != ENDOFCHAIN:
dir_data = read_sector(data, next_dir_sector)
new_dir_data = list(dir_data)
for i in range(4):
dirent = dir_data[128*i:128*(i+1)]
# swap the left and right child links
new_dir_data[128*i+0x48:128*i+0x4c] = dirent[0x44:0x48]
new_dir_data[128*i+0x44:128*i+0x48] = dirent[0x48:0x4c]
current_sid += 1
new_dir_data = ''.join(new_dir_data)
write_sector(f, new_dir_data, next_dir_sector)
next_dir_sector = fat_entries[next_dir_sector]

f.close()
#!/usr/bin/env python

import sys
import struct

DIFSECT = 0xfffc
FATSECT = 0xfffd
ENDOFCHAIN = 0xfffe
FREESECT = 0x

def read_sector(string, sector):
return string[512*(sector+1):512*(sector+2)]

data = file(sys.argv[1], 'rb').read()

num_fat_sectors = struct.unpack(L, data[0x2c:0x30])[0]

fat_sectors = list(struct.unpack(109L, data[0x4c:0x4c+109*4]))

next_difat_sector = struct.unpack(L, data[0x44:0x48])[0]
while next_difat_sector != ENDOFCHAIN:
next_difat_data = read_sector(data, next_difat_sector)
fat_sectors.extend(struct.unpack(127L, next_difat_data[0:508]))
next_difat_sector = struct.unpack(L, next_difat_data[508:512])[0]

fat_entries = []

for fat_sector in fat_sectors:
if fat_sector == FREESECT:
break
fat_data = read_sector(data, fat_sector)
fat_entries.extend(struct.unpack(128L, fat_data))

next_dir_sector = struct.unpack(L, data[0x30:0x34])[0]
current_sid = 0
while next_dir_sector != ENDOFCHAIN:
dir_data = read_sector(data, next_dir_sector)
for i in range(4):
dirent = dir_data[128*i:128*(i+1)]
print SID 0x%x % current_sid
print Name=%s % repr(dirent[0:64].decode('utf_16'))
print Type=%i % ord(dirent[0x42])
print Left Child=0x%x % struct.unpack(L, dirent[0x44:0x48])[0]
print Right Child=0x%x % struct.unpack(L, dirent[0x48:0x4c])[0]
print Dir Root=0x%x % struct.unpack(L, dirent[0x4c:0x50])[0]
current_sid += 1
next_dir_sector = fat_entries[next_dir_sector]