>>[Shai] I proposed a different optimization with the same performance 
>>characteristics a few weeks ago (see end of message 
>>http://grouper.ieee.org/groups/1619/email/msg00528.html) which drew 
>>distinctively non-enthusiastic reactions.

Shai's proposal was using two keys K2a and K2b, and

T := K2a * compartment_index  xor  K2b * 2^{block_index}

Assuming the soon to be standard 4 KB sectors, block_index is 0...255,
but sector sizes up to 16 KB are also used in the practice, allowing a
block_index up to 1023. It was not obvious for the first sight, that
the range does not matter, independent on the thousand bit long
2^{block_index}, Shai's scheme works sequentially: we need one shift, a
bit test and half of the time an XOR with a constant.

Compare it to the original proposal, where we need a table with up to 11
pre-computed 128-bit values, and find the least significant 1-bit of
block_index, to index the table, and XOR it to the previous T (always).
Setting up the table requires one shift, bit-test, conditional XOR and
one more XOR. It looks like the overall number of operations is smaller
in the original proposal (no shifts), and the delay between cipher
blocks is also smaller. If there is a single cycle long shift operation
in HW, than there is no significant difference in the worst case
performance, but Shai's last proposal saves expensive secure RAM and
allows simpler HW designs.

After these reflections, I propose changing the P1619 (disk) standard in
the tweak part and the key length fields, accordingly!

Laszlo

> -------- Original Message --------
> Subject: Re: Can LRW be optimized for multiple sector keys?
> From: Shai Halevi <[EMAIL PROTECTED]>
> Date: Mon, January 16, 2006 10:24 am
> To: SISWG <[EMAIL PROTECTED]>
> 
> Mart, which document are you referring to? The last draft (dated Dec-20
> 2005) doesn't have section 5.2.1 and doesn't describe any optimizations.
> 
> As for your question, the current spec defines the tweak as
> 
>    i = some-number + block-position-in-sector
> 
> where some-number is something which is computed once per sector (which
> is a power of two, at least the number of blocks in a sector), and
> block-position-in-sector is the index of the block within the sector,
> starting at zero. The standard trick for optimizing the computation
> of K2*i, is to prepare a table T[] (example for 32-block sectors):
> 
>    T[0] = K2
>    T[1] = 3*K2
>    T[2] = 7*K2
>    T[3] = 15*K2
>    T[4] = 31*K2
> 
> Then, you initially compute some-number*K2 using a full multiplication.
> Once you have i*K2, computing (i+1)*K2 is done as follows:
> 
> 1. Let s = index of the leftmost '1' bit in i+1 (must be between 0 and 4)
> 2. compute (i+1)*K2 as i*K2 xor T[s]
> 
> For the case where each value of K2 is used only for one sector (as
> in your question), you can compute T in an on-line fashion by setting
> 
>    T[0] = K2, and
>    T[s+1] = 2*T[s] xor K2
> 
>  > It has to be noted, that XEX as specified in
>  > http://grouper.ieee.org/groups/1619/email/msg00610.html does not suffer
>  > these drawbacks and it seems that XEX is considerably more efficient
>  > than LRW in multiple key mode.
> 
> The update in XEX is somewhat more efficient than in LRW, but not really
> "considerably more efficient". I proposed a different optimization with
> the same performance characteristics a few weeks ago (see end of message
> http://grouper.ieee.org/groups/1619/email/msg00528.html) which drew
> distinctively non-enthusiastic reactions. People appear to think that
> the difference in performance is not large enough to bother with this
> change.
> 
> -- Shai
> 
> Mart Somermaa wrote:
> > Suppose we have a key derivation function f, that, given a global key GK 
> > and sector index i, generates a unique sector key K_i
> >     f(GK, i) = K_i
> > that will be used to key e.g. LRW-AES.
> > 
> > In this case, LRW key scope is only a single sector (generally 32 
> > blocks). Hence, the table based optimisation for multiplication in 
> > GF(2^128) do not work -- multiplication table (as described in LRW draft 
> > section 5.1) scope is also only 32 blocks. Also, the tweak increment 
> > optimisation (ibid. 5.2.1) is useless for the same reason.
> > 
> > Are there ways to optimize multiplication in GF(2^128) even when key 
> > scope is a single sector?
> > 
> > It has to be noted, that XEX as specified in
> > http://grouper.ieee.org/groups/1619/email/msg00610.html does not suffer 
> > these drawbacks and it seems that XEX is considerably more efficient 
> > than LRW in multiple key mode.

Reply via email to