[Numpy-discussion] np.dot yields a different result when computed in two pieces

2023-07-25 Thread Junyan Xu
Hello everyone, I asked this question on Stack Overflow a few days ago (https://stackoverflow.com/questions/76707696/np-dot-yields-a-different-result-when-computed-in-two-pieces) but still haven't received an answer. Long story short, I discovered (in certain common settings like Google Colab) that the

[Numpy-discussion] Re: np.dot yields a different result when computed in two pieces

2023-07-25 Thread Robert Kern
Accelerated BLAS operations are accelerated precisely by taking these opportunities to rearrange the computations, not just (or primarily) by parallelism. They are very finely tuned kernels that use the fine details of the CPU/FPU to pipeline instructions (which might be SIMD) and optimize memory m
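
To illustrate the point about rearranged computations, here is a minimal pure-Python sketch. It is not how any particular BLAS library is implemented. The helper names dot_sequential and dot_blocked and the lane count of 4 are assumptions made up for this example. It only shows that floating-point addition is not associative, so accumulating the same products in a different order can change the last few bits of the result.

```python
import math
import random

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False under IEEE 754 double precision


def dot_sequential(a, b):
    """Accumulate the products left to right in one running sum."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_blocked(a, b, lanes=4):
    """Accumulate into `lanes` independent partial sums, then combine them.

    This loosely mimics a SIMD-style kernel (hypothetical, for illustration):
    the same products are added, but in a different order.
    """
    partial = [0.0] * lanes
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % lanes] += x * y
    return sum(partial)


rng = random.Random(0)  # fixed seed so the inputs are reproducible
a = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
b = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

seq = dot_sequential(a, b)
blk = dot_blocked(a, b)
# math.fsum adds the (already rounded) products without further rounding error,
# so it serves as a reference for the summation step only.
ref = math.fsum(x * y for x, y in zip(a, b))

print(seq, blk, ref)
print("difference between orderings:", abs(seq - blk))
```

The two orderings agree to many digits but need not be bit-for-bit identical. For that reason, results of this kind are usually compared with a tolerance (for example math.isclose or np.allclose) rather than with ==.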