You understand the problem space well. One thing to think about with the
stripe-wise retrieval/parallel decoding is to compare your decoding rate with
your receive rate. If you're using K=128, that means decoding can only begin
at around 127/128 = 99.2% completion. If you're able to decode an entire block
in the time it takes to receive a new packet/symbol at your estimated
reception rate of 256 Kbps, then you should be able to parallelize most of the
decoding. If you find that decoding can't keep up with reception, you can
introduce some slight preferencing into the system and retrieve "super blocks"
of data: split the blocks into two (or more) groups, prioritize the first
group, then move on to the second. In this manner, with two superblocks or
groups, you'll be able to start decoding the first half at 50% completion.
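A minimal sketch of that arithmetic, assuming purely illustrative numbers
(K=128, two superblocks, 32kB packets, a 256 Kbps receive rate, and a guessed
native decode throughput); none of the class or variable names come from the
FEC library itself:

// Rough sketch of the scheduling arithmetic above. All figures are
// illustrative assumptions, not measurements or library defaults.
public class SuperblockSketch {
    public static void main(String[] args) {
        int k = 128;                                 // data blocks per segment
        int groups = 2;                              // number of "super blocks"
        double packetBytes = 32 * 1024;              // assumed packet/stripe size
        double receiveBytesPerSec = 256_000 / 8.0;   // 256 Kbps reception rate
        double decodeBytesPerSec = 4_000_000;        // assumed native decode speed

        // Without superblocks, K distinct blocks must arrive before any decoding,
        // i.e. roughly (K-1)/K = 99.2% of the segment when blocks arrive randomly.
        System.out.printf("single group: decoding starts at ~%.1f%% completion%n",
                100.0 * (k - 1) / k);

        // If group 1's blocks are fetched first, the whole group is on hand once
        // roughly 1/groups of the segment has arrived, so its decode can overlap
        // with fetching the remaining groups.
        System.out.printf("group 1 of %d: decoding starts at ~%.0f%% completion%n",
                groups, 100.0 / groups);

        // Decoding keeps pace with reception if a packet decodes faster than the
        // next packet arrives.
        double arrivalSec = packetBytes / receiveBytesPerSec;
        double decodeSec = packetBytes / decodeBytesPerSec;
        System.out.printf("packet arrives every %.2fs, decodes in %.3fs -> %s%n",
                arrivalSec, decodeSec,
                decodeSec <= arrivalSec ? "decoding keeps up" : "decoding lags");
    }
}

With two groups the first decode can start at 50% as described; with four it
would start at 25%, at the cost of less randomness in which blocks are fetched
first.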
At the end of the day LDPC codes are a bunch of marketing BS. For most
applications, network and storage I/O is the bottleneck, not the CPU. As long
as hard disk seek latency exists, they will perform horribly for large files.
In our high speed deployments, such as digital cinema where we're moving
200GB+ files, it's always the disk that's the bottleneck, never the actual
codes.

Have fun!

-Justin

> Now, let's assume that there is a good patent-free LDPC code we could
> use, so we still have a choice:
>
> - LDPC uses a randomized structure which exhibits extremely poor spatial
>   locality of reference. The result is that although you COULD encode a
>   600MB file as a single segment, it would be *extremely* slow to decode
>   unless it and its check blocks fitted entirely into RAM. And even
>   then, it will be limited by your RAM speed, whereas something with
>   small segments will often fit in L2 cache.
> - According to the docs on the Onion codec (which is MDS), it is not
>   possible to even start to decode a segment until you have K of N
>   blocks. FECFile offers some level of progressive decoding but is
>   limited by this fact - if we fetch data blocks randomly from the whole
>   file, then there won't be any parallel decoding happening at all until
>   we have a complete segment, which will likely be after we have a *LOT*
>   of blocks. The result of this is that FECFile is not a great
>   convenience; it may be easier to implement parallel decoding of one
>   segment while we fetch the blocks of the next ourselves.
> - With either codec, blocks are internally "striped"; this is why we can
>   decode a 192MB (128+64) segment in 24MB of RAM. I don't know how small
>   the stripes/packets can be; 1kB sounds plausible given the likely
>   typical applications of these codes. It is likely that the smaller the
>   stripes/packets, the greater the setup cost...
> - If we assume zero setup cost, it should be possible to use quite large
>   segments and still fit the stripes being decoded in a smallish amount
>   of RAM. However, reading the data in the first place (and then writing
>   it after the decode) means a lot of randomized disk access. Well okay,
>   it doesn't have to be randomized; you can keep it on disk in
>   contiguous blocks of a file, so you fetch blocks 1, 33, 65, 97 for the
>   first stripe decode...
> - So we could have segments of up to:
>     (max memory usage) * (block size / packet size)
>   (size after decoding; includes all data and check blocks)
> - So if we have say 16MB of RAM to dedicate (we currently use 24, but we
>   will need some more for temporary stuff etc), we can then have
>   segments of 512MB (341MB plaintext), which would be adequate for our
>   needs. If we want segments of 256MB plaintext, we would only need 24MB
>   of RAM (plus temp space); if we go for 128MB segments, we can get away
>   with 12MB of RAM plus temp space, a significant improvement on current
>   performance (RAM is an issue...).
>
> A few issues Justin mentioned that are peculiar to Freenet:
> - It would be preferable not to have to expand the file enormously -
>   e.g. Justin suggested a 32/256 code for reliability. So far we have
>   used 50% redundancy (128/192) and this has appeared to be adequate.
> - It is preferable to have segments as large as possible, because while
>   we can rerequest blocks we won't necessarily be able to get them, and
>   while healing helps a lot, it only helps the download if somebody else
>   has recently downloaded the file.
> - Block integrity is dealt with at a lower level; both the splitfile and
>   the blocks in the splitfile have separate hashes.
>
> BTW if you get this, Justin, thanks a lot for your previous advice - am
> I right about striping?
>
> On Thu, Sep 29, 2005 at 10:59:45PM -0400, Ken Snider wrote:
> > As discussed, I was able to get some information from Justin Chapweske,
> > one of the chief engineers at openCOLA, the prior owner of the FEC
> > codebase, before Onion Networks acquired it.
> >
> > He had the following to say in response to your queries:
> >
> > ===
> >
> > The #1 most important thing is to make sure that the native/JNI FEC
> > codes are being used. They are many times faster than the pure Java
> > ones.
>
> Indeed!
>
> > Secondly, doing progressive decoding using FecFile will save you a
> > lot of headaches, because as long as your decoding speed is as fast
> > as your reception speed, then you'll experience no delay due to
> > decoding.
> >
> > As far as "using less redundancy" as Matthew suggests, I assume
> > that he's talking about decreasing the "N" value. In truth, the
> > value of N has no impact on performance unless you're using an
> > N > 256. If N is larger than 256, then you take a 4x speed reduction
> > as 16-bit codes need to be used rather than 8-bit codes. The #1 impact
> > on performance is the value of K, so if you're having problems with
> > K=32, then you can try K=16, but that means that you'll have to split
> > your files into even more pieces.
> >
> > The other thing to look at is your I/O performance. If you're doing
> > a lot of random I/O, your files may become highly fragmented and may
> > actually end up being the bottleneck in your system. We've
> > successfully used FEC for applications well over 100 Mbps, so it is
> > certainly possible to do high speed data transfers, but you have to
> > pay very close attention to your I/O patterns.
> >
> > Besides that, you could try LDPC codes, but their non-deterministic
> > nature will absolutely kill you on I/O performance, so even though
> > they'll give you a boost on the CPU side, you'll quickly hit a wall
> > with respect to I/O performance.
> >
> > Hope this helps.
> >
> > -Justin
> >
> > ===
> >
> > --
> > Ken Snider
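To make the segment-size figures in the quoted message concrete, here is a
small worked example of the formula (max memory usage) * (block size / packet
size). The block-to-packet ratio of 32 is an assumption back-derived from the
16MB -> 512MB example; the 128+64 code rate is the one the thread says is in
use:

// Worked example of the segment-size formula quoted above:
//   max segment (data + check) = RAM budget * (block size / packet size)
// The block/packet ratio of 32 is an assumption back-derived from the
// 16MB -> 512MB example; 128 data + 64 check blocks is the rate used so far.
public class SegmentSizeSketch {
    public static void main(String[] args) {
        double ramMB = 16;            // RAM dedicated to the stripe being decoded
        double blockOverPacket = 32;  // assumed block size / packet (stripe) size
        int dataBlocks = 128;         // K
        int checkBlocks = 64;         // N - K

        double segmentMB = ramMB * blockOverPacket;   // 512 MB (data + check)
        double plaintextMB = segmentMB * dataBlocks / (dataBlocks + checkBlocks);

        System.out.printf("RAM %.0fMB -> segment %.0fMB, plaintext ~%.0fMB%n",
                ramMB, segmentMB, plaintextMB);   // 512MB / ~341MB, as in the post
    }
}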
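And a tiny illustration of the 8-bit vs 16-bit point in Justin's advice: with
N <= 256 the codes can use 8-bit symbols, while a larger N forces 16-bit
symbols, which he estimates at roughly a 4x slowdown. The threshold and the 4x
figure are taken from the quoted email; the helper below is hypothetical, not
the library's API:

// Minimal illustration of the quoted "8 bit vs 16 bit" rule. The N <= 256
// threshold and the ~4x slowdown estimate come from the quoted advice; this
// helper is a hypothetical sketch, not part of the FEC library.
public class FieldWidthSketch {
    static int symbolBits(int n) {
        if (n <= 0) throw new IllegalArgumentException("N must be positive");
        return (n <= 256) ? 8 : 16;   // assumed rule from the quoted email
    }

    public static void main(String[] args) {
        System.out.println("N=192  -> " + symbolBits(192) + "-bit symbols");   // 8
        System.out.println("N=1024 -> " + symbolBits(1024) + "-bit symbols");  // 16
    }
}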
