Hello,
I have an application that streams results from the STDOUT of compute nodes
back to the parent for more processing. I've encountered an issue where
lines longer than 1024 bytes (including the newline) appear to get
interleaved with one another.
In this first case, I get what I expect: starting 1000 tasks that each
emit a 1024-byte string (including the newline) results in 1000 strings of
1024 bytes:
> srun --ntasks=1000 perl -E 'say 1 x 1023' | perl -nE 'say length' | sort | uniq -c
1000 1024
When I increase the line length to 1025 bytes, it appears to exceed some
buffer size, resulting in lines of various sizes.
> srun --ntasks=1000 perl -E 'say 1 x 1024' | perl -nE 'say length' | sort | uniq -c
802 1
2 10241
92 1025
3 11265
2 12289
...
The documentation for --output states that STDOUT from the tasks is line
buffered. Does my observation contradict that statement, or is this
corruption happening elsewhere? Is this an issue with writes longer than
1024 bytes not being atomic?
It is interesting to note that the -l/--label option changes the behavior.
It appears to synchronize the output (the line sizes become
deterministic), but it still splits lines longer than 1024 bytes:
> srun -l --ntasks=1000 perl -E 'say 1 x 1024' | perl -nE 'say length' | sort | uniq -c
1000 1030
1000 6
Is there a way to coax srun into keeping long lines intact while streaming
results from the nodes?
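One workaround I can think of (a sketch only, and it gives up live
streaming to the parent's stdout, which may rule it out for my use case)
is to route each task's output to its own file using --output's filename
pattern specifiers:

```shell
# Possible workaround (sketch, untested at scale): write each task's
# stdout to a separate file via srun's --output filename pattern
# (%j = job id, %t = task id), so lines from different tasks can never
# be interleaved; then concatenate after the job completes.
srun --ntasks=1000 --output=out.%j.%t perl -E 'say 1 x 1024'
cat out.*.* | perl -nE 'say length' | sort | uniq -c
```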
Thanks,
- Dan Boorstein