Nested parallel for loops are non-trivial; even OpenMP doesn't get them right.
Nim's threadpool doesn't have the tools to deal with them.
You can use Weave for this; see the matrix transposition example in its
README.
I have noticed that multiple CPUs are not being used. Each process is spawned
and seems to wait until it has finished before the next one starts, so only one
CPU is in use at a time. Is there something I should check in my
implementation to make sure that I'm not doing something wrong?
Actually, I could just put the inner loop into its own function, which should be
fine.
I'd really suggest trying to use something like
[https://github.com/mratsim/weave](https://github.com/mratsim/weave) :)
I am trying to parallelize kernel matrix calculations using threadpool:
```nim
proc calculateKernelMatrix*(K: AbstractKernel, data: Matrix[F]): Matrix[F] =
  let n = int64(ncol(data))
  var mat = Matrix[F](data: newSeq[F](n*n), dim: @[n, n])
  for j in 0..
```
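The loop body in the question is cut off, so this is only a guess: if each `FlowVar` is read with `^` (or `sync()` is called) right after every `spawn`, each task blocks before the next one is queued, which would match the one-CPU-at-a-time behavior described. Below is a minimal sketch of the alternative, following the "put the inner loop into its own function" idea. It assumes a Gaussian (RBF) kernel and points stored row-major in a flat `seq[float]`. `fillRow`, `kernelMatrix`, `Buf` and `gamma` are hypothetical names, not the poster's `AbstractKernel`/`Matrix` API. Each outer iteration is spawned as one task that writes into its own slice of the output, and `sync()` is called once after the loop. It uses `std/threadpool`, which needs `--threads:on` before Nim 2.0 and is deprecated in newer releases.

```nim
import std/[threadpool, math]

type Buf = ptr UncheckedArray[float]  # raw view, so spawn copies only a pointer

proc fillRow(x: Buf, n, d, j: int, gamma: float, output: Buf) =
  ## Hypothetical helper: writes row j of the n x n Gaussian kernel matrix
  ## into `output`. `x` holds n points of dimension d, stored row-major.
  for i in 0 ..< n:
    var d2 = 0.0
    for k in 0 ..< d:
      let diff = x[i*d + k] - x[j*d + k]
      d2 += diff * diff
    output[j*n + i] = exp(-gamma * d2)

proc kernelMatrix(data: seq[float], n, d: int, gamma: float): seq[float] =
  ## Hypothetical helper: returns the n x n kernel matrix as a flat seq.
  doAssert n > 0 and d > 0 and data.len == n * d
  result = newSeq[float](n * n)
  let x = cast[Buf](data[0].unsafeAddr)
  let output = cast[Buf](result[0].addr)
  # Queue every row first; the tasks write to disjoint slots of `output`.
  for j in 0 ..< n:
    spawn fillRow(x, n, d, j, gamma, output)
  # Block only once, after all tasks have been queued.
  sync()
```

Passing raw pointers keeps `spawn` from copying the whole data set for every task. Because a kernel matrix is symmetric, computing only the entries with `i >= j` would roughly halve the work; the sketch keeps the full loop for clarity.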