I don't know if this is currently possible, but it would be nice to have. It seems I could, in theory, speed up the execution of this code.

```nim
proc segsieve(Kmax: uint, KB: int) =       # for Kn resgroups|bytes in segment
  let Ks = KB                              # make default seg size immutable
  parallel:                                # perform SSoZ in parallel
    for r in 0..rescnt-1:                  # for each residue track number 'r'
      let nextp_row = r * pcnt             # set the 'nextp' table row address
      let seg_row = r * Ks                 # set the 'seg' memory row address
      spawn residue_sieve(nextp_row, seg_row, Kmax, Ks, r)  # do sieve for row 'r'
  sync()                                   # wait for all row threads to finish
  for i in 0..rescnt-1:                    # update 'primecnt' with the count of
    primecnt += cnts[i]                    # segment primes for each 'seg' row
```
Here `sync()` makes the code that follows it wait until all the threads have finished executing. It should be theoretically possible to speed up overall execution by having each thread asynchronously put its `cnt` value into a thread-safe FIFO queue, from which the values are extracted and added to `primecnt`. Since the number of `cnt` values is known in advance (`rescnt` of them), `primecnt` can be updated as the values become available, until `rescnt` of them have been added. Is this possible now? Could it be faster?
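To make the idea concrete, here is a minimal sketch of what I have in mind, written with the built-in `Channel[T]` and `createThread` instead of `parallel`/`spawn`. It is not the real sieve: `residueSieve` is a hypothetical stand-in that just reports `r + 1` as its count, `rescnt = 8` is an arbitrary value, and `results` is a name I made up for the queue. I am assuming it is compiled with threads enabled (`--threads:on`).

```nim
# Sketch only: compile with  nim c --threads:on example.nim
const rescnt = 8                      # assumed number of residue tracks

var results: Channel[uint]            # FIFO queue shared by all worker threads

proc residueSieve(r: int) {.thread.} =
  # Stand-in for the real per-row sieve: pretend row 'r' found r + 1 primes.
  let cnt = uint(r + 1)
  results.send(cnt)                   # hand the count over as soon as it is ready

proc segsieve(): uint =
  var workers: array[rescnt, Thread[int]]
  results.open()
  for r in 0 ..< rescnt:              # start one thread per residue track
    createThread(workers[r], residueSieve, r)
  var primecnt = 0'u
  for _ in 0 ..< rescnt:              # exactly 'rescnt' counts are expected
    primecnt += results.recv()        # blocks only until the next count arrives
  joinThreads(workers)                # all counts are in; let the threads exit
  results.close()
  result = primecnt

echo segsieve()                       # 36 with the stand-in counts (1 + 2 + ... + 8)
```

The counts are added in whatever order the threads finish, so no separate `cnts` array or final summing loop is needed. I would not expect a large gain from this alone, since the total time is still bounded by the slowest thread and the final loop is only `rescnt` additions, but I have not measured it.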