> >>EME* essentially does two passes of ECB mode AES, plus three extra AES calls
> This means that a parallel implementation can perform all of the ECB mode
> AES calls in the top and bottom rows (a latency of 2), plus two of the
> extra AES operations in the leftmost column (Figure 2 of the EME*
> paper). That gives a total delay of 4 AES operations, plus change. For
> XCB, the question boils down to how parallelizable the GHash operation
> is. Because it is a hash function, it is basically sequential, isn't
> it? This would give the advantage to EME*, unless there is a parallel
> version of the function h in XCB.
>
> For a sequential implementation, the relative speed depends on the speed
> of AES vs. a GHash block operation. A GHash step can be implemented
> faster in HW, can't it? That would make sequential XCB faster. How
> about the royalties?

From what I remember of looking at EME (with my hardware hat on), it was fine
if you were prepared to put down 32 (or more) parallel AES encryptors. Ideally
you would put down 65 so you could pipeline the whole thing nicely. With any
smaller number, such as 4 or 8, you end up having to store lots of
intermediate results in pipeline registers or local RAM before you can start
the bottom row.

It simply didn't scale well in hardware for the lower throughputs (media
transfer rates) which are typical of disk drives, or for larger sector sizes.
Putting down that many parallel encryptors will give far more throughput than
is actually needed, and is a waste of logic.
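
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch.
It is only a toy model, not a description of any real design: the function
name eme_schedule, the default of one serial AES call between the rows (chosen
so that 32 blocks give the 65 calls mentioned above), and the rule that a
core's output register can hold one top-row result for free are all
assumptions made for illustration.

```python
from math import ceil


def eme_schedule(m_blocks, cores, extra_serial_calls=1):
    """Toy model of a two-pass (top row / bottom row) AES schedule.

    Returns (latency, buffered):
      latency  - elapsed time in units of one AES operation
      buffered - top-row results that must be parked in registers or RAM
                 before the bottom row can start

    Assumptions (hypothetical, for illustration only):
      * every call within a row is independent, so `cores` run at once;
      * the bottom row cannot start until the whole top row is done;
      * each core can hold one result at its output at no extra cost.
    """
    row_steps = ceil(m_blocks / cores)        # AES time slots per row
    latency = 2 * row_steps + extra_serial_calls
    buffered = max(0, m_blocks - cores)       # results with nowhere to sit
    return latency, buffered


if __name__ == "__main__":
    # A 512-byte sector is 32 AES blocks of 16 bytes each.
    for cores in (4, 8, 32):
        latency, buffered = eme_schedule(32, cores)
        print(f"{cores:2d} cores: latency {latency:2d} AES ops, "
              f"{buffered:2d} blocks buffered")
```

Under these assumptions, 32 cores need no extra storage, while 4 or 8 cores
have to park most of the sector before the second pass can begin.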

But I think it was OK for software, where the intermediate storage isn't
usually an issue, as much of the calculation could be done "in-place".

I haven't looked at XCB in detail yet, but both AES and GF-multiply (GHASH)
operations can be scaled to run at similar speeds; it's just a question of
getting the balance right.
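
As a rough illustration of what scaling the GF-multiply could look like, the
sketch below uses the GHASH bit convention from GCM; whether the function h in
XCB matches it exactly is an assumption here. It shows a bit-serial
GF(2^128) multiply, a serial hash chain, and a two-lane variant that keeps two
multipliers busy by using H^2. The function names are made up for this
example, and the two-lane split is a generic polynomial-evaluation trick, not
something taken from the XCB specification.

```python
# Reduction constant for GF(2^128) in GCM's bit ordering
# (the most significant bit of the integer is the x^0 coefficient).
R = 0xE1 << 120


def gf_mult(x, y):
    """Bit-serial multiply in GF(2^128), one bit of x per iteration.

    Hardware could process several bits of x per clock to trade
    area for speed; this loop is the fully serial case.
    """
    z, v = 0, y
    for i in range(127, -1, -1):
        if (x >> i) & 1:
            z ^= v
        # Multiply v by x, reducing when a bit falls off the end.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash_serial(h, blocks):
    """Horner-style chain: each step depends on the previous one."""
    y = 0
    for blk in blocks:
        y = gf_mult(y ^ blk, h)
    return y


def ghash_two_lane(h, blocks):
    """Same result using two independent chains (even block count only).

    Lane a handles blocks 1, 3, 5, ... and lane b handles 2, 4, 6, ...;
    both step by H^2, except lane b's last multiply, which uses H.
    """
    assert len(blocks) % 2 == 0
    h2 = gf_mult(h, h)
    a = b = 0
    pairs = len(blocks) // 2
    for j in range(pairs):
        last = j == pairs - 1
        a = gf_mult(a ^ blocks[2 * j], h2)
        b = gf_mult(b ^ blocks[2 * j + 1], h if last else h2)
    return a ^ b
```

The two lanes never read each other's state until the final XOR, so in
principle they could run on separate multipliers, at the cost of computing
H^2 once per key.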

Colin.
