Public bug reported:

Binary package hint: python-fuse

The xmp.py example filesystem included in the python-fuse - and thus probably 
many, if not all, filesystems derived from that one and directly accessing an 
underlying filesystem tree - has some issues with reading somewhat large files. 
By large I mean e.g. 4GB, where the error (described below) is nearly 
guaranteed to occure. Large can but also mean 3MB, where the bug only shows up 
from time to time.
Copying such a file into a mountpoint mounted by xmp.py and then comparing the 
original with that copy gives differences. I.e. "diff original mount/copy" 
would give a differing line and similarly "cmp original mount/copy" a differing 
byte. Actually  not only one line/byte, but from that first line/byte onwards, 
the whole copy seems to be "shifted". The size of the two files although is 
equal.
As said, depending on the size of the file, this bug appears more or less 
often. The difference also starts on seemingly random positions. These two 
circumstances lead me to think of a race condition and here is, what I found 
out:

python-fuse (or FUSE in general?) has a debug mode, so I mounted an xmp FS with 
the "-d" option, that activates it.
python /usr/share/doc/python-fuse/examples/xmp.py -d -o root=/home/user/test 
/home/user/mount
Now we have a fuse daemon running in the foreground, kindly telling us, what he 
is doing on the FS mounted at ~/mount, which is just a "mirror" of ~/test.
No assume that ~/lipsum.txt is a file of any kind but with a size that is 
appropriate to meet the above description, so that it triggers the bug:
cp ~/lipsum.txt ~/mount
Now just compare until cmp reports a differing byte:
cmp ~/lipsum.txt ~/mount/lipsum.txt
(NOTE for ~3MB files, this sometimes took me 20 tries, so be patient or choose 
an even bigger file)
Let's assume we got this:
lipsum.txt mount/lipsum.txt differieren: Byte 376833, Zeile 64658.
Now the good thing about cmp is, that - per default - stops reading after 
finding the first difference. We should now be able to see some odd behavior in 
the last lines of debug output of the xmp FS.
So here they are:
---
READ[136293708] 131072 bytes from 114688
   READ[136293708] 131072 bytes
   unique: 1594, error: 0 (Success), outsize: 131088
unique: 1595, opcode: READ (15), nodeid: 2, insize: 80
READ[136293708] 131072 bytes from 245760
   READ[136293708] 131072 bytes
   unique: 1595, error: 0 (Success), outsize: 131088
unique: 1596, opcode: READ (15), nodeid: 2, insize: 80
unique: 1597, opcode: READ (15), nodeid: 2, insize: 80
!!!!!!
READ[136293708] 131072 bytes from 376832
READ[136293708] 131072 bytes from 507904
!!!!!!
   READ[136293708] 131072 bytes
   unique: 1597, error: 0 (Success), outsize: 131088
   READ[136293708] 131072 bytes
   unique: 1596, error: 0 (Success), outsize: 131088
unique: 1598, opcode: FLUSH (25), nodeid: 2, insize: 64
FLUSH[136293708]
   unique: 1598, error: 0 (Success), outsize: 16
unique: 1599, opcode: RELEASE (18), nodeid: 2, insize: 64
RELEASE[136293708] flags: 0x8000
   unique: 1599, error: 0 (Success), outsize: 16
---
I marked the important part with exclamation marks: Two read commands directly 
after another and note the similarity between the offset of the first of them 
and the offset for the first difference, that cmp reported: the first byte read 
from that offset - in that first read command - differs; now to the second read 
command:
using cmp -b can also give you the values of the differing bytes. The following 
is just an example: Say cmp finds that byte 376833 differs and says "the 
original is 'x'" and the copy has a 'z' there, then you can open your file in a 
hex editor (or any other tool, that has a function like "Go to byte offset"), 
go to the offset mentioned by the _second_ read between my exclamation marks, 
and you will find exactly that 'z' there.
So here is the race condition: The two marked read commands happen 
simultaniously; the first one seeks to the offset mentioned first. the second 
one seeks to the second offset. The first "thread" (I don't know whether this 
is about threads, let me just call it thread for now) takes up its work again; 
it reads - BUT from the second offset (explaining the 'z' on the position of 
the 'x').
Outputting the offset and the actual file position (self.file.tell()) inside 
the read method gives differing results for the two marked reads, they can for 
example be reversed (e.g. first read has offset 20, tell() says 10, second read 
has offset 20, tell() says 10; the first read puts the cursor to 20 using seek, 
the second one does the same with offset 10. the first then reads from 10 on -- 
10 bytes -- and comes to stop at offset 20, from which position then the second 
one starts actually reading), for this particular example I got:
1. offset: 376832, position: 507904
2. offset: 507904, position: 638976,
so the second read again starts at the position, where the first one stopped 
reading AFTER being placed to a wrong offset by the second seek().

All this is very hard to explain - especially for me as a non native
English speaker. I hope you got the idea, if there are open questions,
please don't hesitate to demand an answer / further explanation!

This is on a dual core, 64bit AMD processer (especially the "dual core" may be 
important?) running Kubuntu 8.04.1 Hardy Heron - the 32 bit, KDE 3 version of 
it, to be precise.
python-fuse is version 1:0.2-pre3-4ubuntu1, which should be the latest. In 
general I should have all current updates applied; if you need the version of 
another package, please ask for that, too.

Ah and thanks for reading this far ;)

** Affects: python-fuse (Ubuntu)
     Importance: Undecided
         Status: New

-- 
python-fuse's xmp.py bad at handling larger files
https://bugs.launchpad.net/bugs/262961
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to