> CTR doesn't have the same splitting up of the input data to speed the
> triggering of the intrinsic that GCM has. The need to split data is
> such a narrow situation as users don't typically use 1, 10 or 100MB
> data sizes.
In the case of Cipher usage outside of TLS, users may not know that they're
interacting with a system that uses cryptography at all. My use-case is a
library which stores data into one of several backends, and depending on
whether the feature is enabled (based on the level of trust of the systems
involved), guarantees that the data is encrypted before it's sent. Users
provide a byte-producer along the lines of "interface Content { void
writeTo(OutputStream out) throws IOException; }". This interface may be used
elsewhere to write unencrypted bytes to a filesystem, for example, but the idea
is that it stores arbitrary data. It's entirely reasonable that an
implementation may get an InputStream in some way, and invoke
inputStream.transferTo(out). In most cases, InputStream.transferTo isn't
overridden, and produces nice 8KiB chunks. However, it's also reasonable for
the InputStream to be a ByteArrayInputStream in cases where the bytes are
readily available and already buffered on the heap. In that case, transferTo
invokes a single write containing the full content. To be clear,
the primary use-case isn't copying from one stream to another. In most cases
the caller will transform the input before storing it, which often requires the
full contents to be buffered, for instance reading a CSV file and converting it
to an Excel workbook before storing the result.
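
For illustration, here is a minimal sketch of the two cases described above.
It is not our production code: the class name, RecordingOutputStream, and the
streamed/buffered helpers are made up for this example, and the Content
interface is just the shape quoted earlier. It records the size of each write
that reaches the OutputStream, which is where an encrypting stream would sit.
The exact sizes depend on the JDK release, so the sketch prints them rather
than assuming them.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

final class TransferToDemo {

    /** The byte-producer shape described in the message. */
    interface Content {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Hypothetical helper: records the length of every write it receives. */
    static final class RecordingOutputStream extends OutputStream {
        final List<Integer> writeSizes = new ArrayList<>();

        @Override
        public void write(int b) {
            writeSizes.add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeSizes.add(len);
        }
    }

    /** Content backed by a stream that does not override transferTo. */
    static Content streamed(byte[] data) {
        return out -> {
            // FilterInputStream inherits the default InputStream.transferTo,
            // which copies through a small fixed-size buffer.
            try (InputStream in = new FilterInputStream(new ByteArrayInputStream(data)) {}) {
                in.transferTo(out);
            }
        };
    }

    /** Content backed directly by bytes that are already on the heap. */
    static Content buffered(byte[] data) {
        // ByteArrayInputStream overrides transferTo and hands over its
        // backing array in large writes.
        return out -> new ByteArrayInputStream(data).transferTo(out);
    }

    /** Runs the content against a recording stream and returns the write sizes. */
    static List<Integer> writeSizes(Content content) throws IOException {
        RecordingOutputStream out = new RecordingOutputStream();
        content.writeTo(out);
        return out.writeSizes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros
        List<Integer> chunked = writeSizes(streamed(data));
        List<Integer> direct = writeSizes(buffered(data));
        System.out.println("default transferTo: " + chunked.size()
                + " writes, first of " + chunked.get(0) + " bytes");
        System.out.println("ByteArrayInputStream: " + direct.size()
                + " writes, first of " + direct.get(0) + " bytes");
    }
}
```

A library that wraps the OutputStream in encryption sees exactly these write
sizes, without the caller knowing a Cipher is involved at all.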
> Are you using a particular application where you are seeing the
> performance drop off clearly, besides a benchmark?
Yes! The benchmark is the result of investigation into a performance problem in
production, where we confirmed our profiling results, implemented a fix, and
confirmed the benchmarking improvements were relevant in the original
production scenario.
Carter Kozak