> >>EME* essentially does two passes of ECB mode AES, plus three extra AES calls
>
> This means that a parallel implementation can perform all of the ECB mode
> AES calls in the top and bottom rows (a latency of 2), plus two of the
> extra AES operations in the leftmost column (Figure 2 of the EME*
> paper). We have a total delay of 4 AES operations, plus change. In the
> case of XCB, the question boils down to how parallelizable the GHash
> operation is. Because it is a hash function, it is basically sequential,
> is it not? This would give the advantage to EME*, unless there were a
> parallel version of the function h in XCB.
>
> For a sequential implementation, the speed relation depends on the speed
> of AES vs. a GHash block operation. A GHash step can be implemented
> faster in HW, can it not? That would make the sequential XCB faster. How
> about the royalties?

From what I remember of looking at EME (with my hardware hat on), it was
fine if you were prepared to put down 32 (or more) parallel AES
encryptors. Ideally you would put down 65 so you could pipeline the whole
thing nicely. With any smaller number, such as 4 or 8, you end up having
to store lots of intermediate results in pipeline registers or local RAM
before you can start the bottom row.

It simply didn't scale well in hardware for the lower throughputs (media
transfer rates) that are typical of disk drives, or for larger sector
sizes. Putting down that many parallel encryptors gives far more
throughput than is actually needed, and is a waste of logic. But I think
it was OK for software, where the intermediate storage isn't usually an
issue, as much of the calculation can be done "in-place".

I haven't looked at XCB in detail yet, but both AES and GF-multiply
(GHASH) operations can be scaled to run at similar speeds; it's just a
question of getting the balance right.

Colin.
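
To put rough numbers on the engine-count trade-off described above, here is a small back-of-the-envelope sketch. It is not taken from the EME paper or from any real design. It assumes an EME-style mode needs 2m + 1 AES calls for an m-block sector (which is consistent with the figure of 65 for a 32-block, 512-byte sector), that the bottom row cannot start until the whole top row has finished, and that each engine can hold one finished block in its own output register. The function name `eme_schedule` and its fields are hypothetical.

```python
from math import ceil


def eme_schedule(blocks: int, engines: int) -> dict:
    """Toy model of an EME-style two-pass mode on parallel AES engines.

    Assumptions (illustrative only, not from the EME specification):
      * 2 * blocks + 1 AES calls per sector (top row, one middle call,
        bottom row);
      * the bottom row starts only after the whole top row is done;
      * each engine holds one result in its output register, and any
        further top-row results must be parked in registers or RAM.
    """
    if blocks < 1 or engines < 1:
        raise ValueError("blocks and engines must be positive")

    # Number of sequential batches needed to push one row through.
    batches_per_row = ceil(blocks / engines)

    return {
        "total_aes_calls": 2 * blocks + 1,
        # Latency in units of one AES operation: top row, middle, bottom row.
        "latency_in_aes_ops": 2 * batches_per_row + 1,
        # Top-row outputs that do not fit in the engines' own registers.
        "buffered_blocks": max(0, blocks - engines),
    }


if __name__ == "__main__":
    for n in (4, 8, 32):
        print(n, eme_schedule(32, n))
```

Under these assumptions, 32 engines on a 32-block sector need no extra buffering, while 4 engines have to park 28 intermediate blocks before the bottom row can begin, which is the storage cost mentioned above.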