So far, what I've read in XEP-0234 doesn't say anything about file
checking. I suppose it is implied that once the hashes have been
compared, the program has to inform the user of possible file
corruption, and the transfer then has to be restarted manually.

This is very inefficient and awkward for the user. Instead of
resending the whole file, we should be able to detect in which part of
the file there is corruption and resend that part using ranged
transfers (as defined in
http://xmpp.org/extensions/xep-0234.html#range).

There are various issues with this. In the current implementation of
jingleFT, a ranged file transfer has to initiate a new session in
order to transfer the file from a given offset. Ideally, we would want
to reuse the current session.

Example:

* initiator sends file
* responder receives it
* initiator communicates the hash
* responder checks and notices that the file is corrupt, then finds
where the corruption took place (using techniques explained later)
* responder asks for a ranged file transfer over the same session
(note: we haven't terminated the session)
* initiator resends the requested part of the file

I'm not sure that providing only an offset as the range can accomplish
this; it would probably be a better idea to provide an actual range
(where the part starts and where it ends). I might be wrong about this.

Now, the question is, how do we know which part of the file is corrupted?

One way to go about this is to use hash lists, which is how BitTorrent
makes sure there is no corruption in the files being transferred.

Basically, we send the file in parts. For each part, the initiator
sends an IQ with the hash of that part. The responder replies with a
result IQ or an error. If the responder replies with an error, the
initiator resends that part.
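
To make the hash-list idea concrete, here is a rough sketch in Python.
It is not taken from XEP-0234 or any existing client: the names
chunk_hashes, corrupt_chunks and CHUNK_SIZE are made up for
illustration, SHA-256 is just an assumed hash, and both sides are
simulated in memory instead of exchanging IQ stanzas per part.

```python
import hashlib
from typing import List

# Hypothetical part size; tiny here so the demo is easy to follow.
CHUNK_SIZE = 4


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Hash every fixed-size part of the data (the 'hash list')."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def corrupt_chunks(received: bytes, sender_hashes: List[str],
                   chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the indexes of the parts whose hash does not match."""
    local = chunk_hashes(received, chunk_size)
    return [i for i, (mine, theirs) in enumerate(zip(local, sender_hashes))
            if mine != theirs]


# Initiator side: hash each part of the original file.
original = b"abcdefghijkl"
hashes = chunk_hashes(original)

# Responder side: one byte got damaged in the second part.
received = b"abcdXfghijkl"
bad = corrupt_chunks(received, hashes)
```

In this sketch the responder would ask for part 1 again (bytes 4 to 7),
which corresponds to the error reply described above.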

Easy, right? The problem with this algorithm is that it is too
expensive. It makes sense in p2p programs, because a file may be
shared with other peers before it is complete. It doesn't make much
sense in XMPP, where there is usually just one sender and one
receiver. Still, this technique is very illustrative and easy to
understand.

An alternative technique that we can use is binary-hashing (I made
the name up, I don't know what this is called). With this technique we
wait until the file is transferred as usual. Then, if the hashes
mismatch when we check them, we split the file into two parts, take
the hash of each part, and ask the initiator to confirm those hashes.
If one mismatches, that tells us where the corrupt part is, and we
keep repeating this process until the parts are small enough to be
reasonably resent. If the file is small enough to begin with, then we
can skip this and resend the whole file.
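
Here is a rough Python sketch of the splitting step, just to show the
idea. Everything in it is an assumption for illustration: the function
names, the use of SHA-256, and in particular remote_hash, which stands
in for a round trip to the initiator asking for the hash of a byte
range (no such request exists in XEP-0234 as far as I know). It also
assumes both copies of the file have the same length.

```python
import hashlib
from typing import Callable, List, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_corrupt_ranges(received: bytes,
                        remote_hash: Callable[[int, int], str],
                        start: int, end: int,
                        min_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges that differ from the sender's copy.

    remote_hash(start, end) is a hypothetical callback returning the
    initiator's hash of bytes [start, end) of the original file.
    """
    # Matching hashes: this part is fine, nothing to resend.
    if sha256_hex(received[start:end]) == remote_hash(start, end):
        return []
    # Small enough to resend as a whole; stop splitting.
    if end - start <= max(min_size, 1):
        return [(start, end - start)]
    # Otherwise split in two and check each half.
    mid = (start + end) // 2
    return (find_corrupt_ranges(received, remote_hash, start, mid, min_size)
            + find_corrupt_ranges(received, remote_hash, mid, end, min_size))


# Simulate both sides in memory: a 64-byte file with one damaged byte.
original = bytes(range(64))
damaged = bytearray(original)
damaged[37] ^= 0xFF
received = bytes(damaged)

ranges = find_corrupt_ranges(
    received,
    lambda s, e: sha256_hex(original[s:e]),  # stand-in for the initiator
    0, len(received), 8)
```

The result is a list of (offset, length) pairs that the responder could
request with ranged transfers. Checking both halves (instead of
stopping at the first bad one) also covers the case where the file is
corrupt in more than one place.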

The cost (in terms of computation) of this technique on a corrupt file
might be greater than that of a hash list. But the overall cost is
many times smaller, because we only need it when we find that the file
is corrupt, which is an unlikely event.

I would love to know what you guys think of this.

-- 
Jefry Lagrange
