Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-29 Thread Ard Biesheuvel
On 29 December 2016 at 02:23, Herbert Xu wrote:
> On Wed, Dec 28, 2016 at 07:50:44PM +, Ard Biesheuvel wrote:
>>
>> So about this chunksize, is it ever expected to assume other values
>> than 1 (for stream ciphers) or the block size (for block ciphers)?
>> Having block size, IV size *and* chun

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Herbert Xu
On Wed, Dec 28, 2016 at 07:50:44PM +, Ard Biesheuvel wrote:
>
> So about this chunksize, is it ever expected to assume other values
> than 1 (for stream ciphers) or the block size (for block ciphers)?
> Having block size, IV size *and* chunk size fields may be confusing to
> some already, so i

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Ard Biesheuvel
On 28 December 2016 at 09:23, Herbert Xu wrote:
> On Wed, Dec 28, 2016 at 09:19:32AM +, Ard Biesheuvel wrote:
>>
>> Ok, so that implies a field in the skcipher algo struct then, rather than
>> some definition internal to the driver?
>
> Oh yes it should definitely be visible to other crypto A

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Herbert Xu
On Wed, Dec 28, 2016 at 09:19:32AM +, Ard Biesheuvel wrote:
>
> Ok, so that implies a field in the skcipher algo struct then, rather than
> some definition internal to the driver?

Oh yes it should definitely be visible to other crypto API drivers and algorithms. It's just that we don't want

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Ard Biesheuvel
> On 28 Dec 2016, at 09:10, Herbert Xu wrote:
>
>> On Tue, Dec 27, 2016 at 06:35:46PM +, Ard Biesheuvel wrote:
>>
>> OK, I will try to hack something up.
>>
>> One thing to keep in mind though is that stacked chaining modes should
>> present the data with the same granularity for optimal p

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-28 Thread Herbert Xu
On Tue, Dec 27, 2016 at 06:35:46PM +, Ard Biesheuvel wrote:
>
> OK, I will try to hack something up.
>
> One thing to keep in mind though is that stacked chaining modes should
> present the data with the same granularity for optimal performance.
> E.g., xts(ecb(aes)) should pass 8 blocks at a

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-27 Thread Ard Biesheuvel
On 27 December 2016 at 08:57, Herbert Xu wrote:
> On Fri, Dec 09, 2016 at 01:47:26PM +, Ard Biesheuvel wrote:
>> The bit-sliced NEON implementation of AES only performs optimally if
>> it can process 8 blocks of input in parallel. This is due to the nature
>> of bit slicing, where the n-th bit

Re: [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-27 Thread Herbert Xu
On Fri, Dec 09, 2016 at 01:47:26PM +, Ard Biesheuvel wrote:
> The bit-sliced NEON implementation of AES only performs optimally if
> it can process 8 blocks of input in parallel. This is due to the nature
> of bit slicing, where the n-th bit of each byte of AES state of each input
> block is co

[PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can

2016-12-09 Thread Ard Biesheuvel
The bit-sliced NEON implementation of AES only performs optimally if it can process 8 blocks of input in parallel. This is due to the nature of bit slicing, where the n-th bit of each byte of AES state of each input block is collected into NEON register 'n', for registers q0 - q7. This implies tha
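
For readers unfamiliar with bit slicing, the layout described above can be illustrated with a small portable C sketch. This is not the NEON code from the patch: the function names (bitslice8, unbitslice8), the flat array layout, and the exact bit ordering inside each plane are assumptions made for illustration only. Each 16-byte "plane" stands in for one of the registers q0 - q7, and plane n collects bit n of every byte of all 8 input blocks.

```c
#include <stdint.h>
#include <string.h>

#define BS_BLOCKS     8   /* blocks processed together */
#define BS_BLOCK_SIZE 16  /* AES block size in bytes */

/*
 * Hypothetical helper: gather bit n of byte i of block b into
 * bit b of planes[n * 16 + i]. Plane n plays the role of NEON
 * register qn in the description above (layout is an assumption).
 */
static void bitslice8(const uint8_t *in, uint8_t *planes)
{
	memset(planes, 0, 8 * BS_BLOCK_SIZE);
	for (int b = 0; b < BS_BLOCKS; b++)
		for (int i = 0; i < BS_BLOCK_SIZE; i++)
			for (int n = 0; n < 8; n++) {
				unsigned int bit = (in[b * BS_BLOCK_SIZE + i] >> n) & 1u;
				planes[n * BS_BLOCK_SIZE + i] |= (uint8_t)(bit << b);
			}
}

/* Hypothetical inverse of bitslice8(): scatter the planes back into blocks. */
static void unbitslice8(const uint8_t *planes, uint8_t *out)
{
	memset(out, 0, BS_BLOCKS * BS_BLOCK_SIZE);
	for (int b = 0; b < BS_BLOCKS; b++)
		for (int i = 0; i < BS_BLOCK_SIZE; i++)
			for (int n = 0; n < 8; n++) {
				unsigned int bit = (planes[n * BS_BLOCK_SIZE + i] >> b) & 1u;
				out[b * BS_BLOCK_SIZE + i] |= (uint8_t)(bit << n);
			}
}
```

In this sketch the planes always have room for 8 blocks. With fewer blocks of input, the unused bit positions in each plane are simply zero, so the same amount of work is done for less output, which is consistent with the observation that the implementation only performs optimally with 8 blocks at a time.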