So far, nothing I've read in XEP-0234 says anything about file integrity checking. I suppose it is implied that once the hashes have been compared, the program has to inform the user of a possible file corruption, and the user then has to restart the file transfer manually.
This is very inefficient and awkward for the user. Instead of resending the whole file, we should be able to detect which part of the file is corrupt and resend only that part using ranged transfers (as defined in http://xmpp.org/extensions/xep-0234.html#range).

There are various issues with this. In the current implementation of jingleFT, a ranged file transfer has to initiate a new session in order to transfer the file from a given offset. Ideally, we would want to reuse the current session. Example:

* the initiator sends the file
* the responder receives it
* the initiator communicates the hash
* the responder checks it and notices that the file is corrupt; it finds where the corruption took place (using techniques explained later)
* the responder asks for a ranged file transfer over the same session (note: we haven't terminated the session)
* the initiator resends the requested part of the file

I'm not sure that providing only an offset as the range can accomplish this; it would probably be a better idea to provide an actual range (a start and an end, rather than just a start). I might be wrong about this, though.

Now, the question is: how do we know which part of the file is corrupted? One way to go about this is using hash lists, which is how BitTorrent makes sure there is no corruption in the files being transferred. Basically, we send the file in parts. For each part, the initiator sends an IQ with the hash of that part. The responder replies with a result IQ or an error. If the responder replies with an error, the initiator resends that part. Easy, right?

The problem with this algorithm is that it is too expensive: every part costs an extra hash exchange, even when nothing goes wrong. It makes sense in p2p programs, because a file may be shared with other peers before it is completed. It doesn't make much sense in XMPP, where there is usually just one sender and one receiver. Still, this technique is very illustrative and easy to understand.
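A minimal Python sketch of the hash-list idea, assuming a fixed part size and a SHA-1 hash per part (both are illustration choices, not something XEP-0234 defines). The per-part IQ round trip is simulated by a plain function call, and `send_with_hash_list` and `make_flaky_channel` are hypothetical names.

```python
import hashlib
from typing import Callable, Set, Tuple

PART_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def send_with_hash_list(data: bytes,
                        channel: Callable[[bytes], bytes],
                        part_size: int = PART_SIZE,
                        max_attempts: int = 3) -> Tuple[bytes, int]:
    """Send data part by part, verifying each part against its own SHA-1 hash.

    Returns the reassembled data and how many part transmissions were rejected.
    """
    received = bytearray()
    rejected = 0
    for offset in range(0, len(data), part_size):
        part = data[offset:offset + part_size]
        # Hash the initiator would announce for this part (one IQ per part).
        expected = hashlib.sha1(part).hexdigest()
        for _ in range(max_attempts):
            got = channel(part)  # what actually reaches the responder
            if hashlib.sha1(got).hexdigest() == expected:
                received += got  # responder acknowledges with a result IQ
                break
            rejected += 1  # responder answers with an error IQ
        else:
            raise RuntimeError(f"part at offset {offset} failed {max_attempts} times")
    return bytes(received), rejected


def make_flaky_channel(corrupt_calls: Set[int]) -> Callable[[bytes], bytes]:
    """Simulated transport that flips the first byte on the given call numbers."""
    calls = 0

    def channel(part: bytes) -> bytes:
        nonlocal calls
        calls += 1
        if calls in corrupt_calls:
            return bytes([part[0] ^ 0xFF]) + part[1:]
        return part

    return channel


data = b"hello, jingle world!"
out, rejected = send_with_hash_list(data, make_flaky_channel({2}))
# out == data, rejected == 1: the second part was rejected once and resent
```

The sketch also makes the cost visible: every single part pays for a hash and a round trip, whether or not anything went wrong.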
Another technique we could use is binary hashing (I made the name up; I don't know what this is actually called). With this technique, we wait until the file has been transferred as usual and then compare the hashes. If there is a mismatch, we split the file into two parts, hash both parts, and ask the initiator to confirm those hashes. If one of them mismatches, that tells us where the corrupt part is, and we keep repeating this process on that part until the pieces are small enough to be reasonably resent. If the file is small enough to begin with, we can skip this step and simply resend it.

In terms of computation, this technique might be more expensive than hash lists, since parts of the file get hashed repeatedly. But the overall cost is many times smaller, because we only need to use it when we find that the file is corrupt, which is an unlikely event.

I would love to know what you guys think of this.

-- Jefry Lagrange
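A minimal Python sketch of the binary-hashing idea. `remote_hash(offset, length)` is a hypothetical stand-in for asking the initiator for the hash of a slice of its own copy; nothing like it exists in XEP-0234 today. The sketch assumes the received file has the same length as the original, uses SHA-1 only for illustration, and recurses into every mismatching half, so it still works if corruption hits both halves.

```python
import hashlib
from typing import Callable, List, Tuple


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def find_corrupt_ranges(received: bytes,
                        remote_hash: Callable[[int, int], str],
                        min_size: int) -> List[Tuple[int, int]]:
    """Return the (offset, length) ranges that should be re-requested.

    remote_hash(offset, length) is a hypothetical query asking the initiator
    for the hash of that slice of its own copy of the file.
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    bad: List[Tuple[int, int]] = []

    def check(offset: int, length: int) -> None:
        local = sha1(received[offset:offset + length])
        if local == remote_hash(offset, length):
            return  # this slice matches the initiator's copy
        if length <= min_size:
            bad.append((offset, length))  # small enough to resend as a range
            return
        half = length // 2  # split in two and check both halves
        check(offset, half)
        check(offset + half, length - half)

    check(0, len(received))  # the first call is the ordinary whole-file check
    return bad


original = bytes(range(256)) * 4  # 1024 bytes
received = bytearray(original)
received[700] ^= 0x01  # flip one bit in transit
ranges = find_corrupt_ranges(bytes(received),
                             lambda off, n: sha1(original[off:off + n]),
                             min_size=64)
# ranges == [(640, 64)]
```

With a single corrupted region this takes 2·log2(size/min_size) + 1 hash queries (9 for the 1024-byte example above), after which the returned ranges would be fetched with ranged transfers.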