On 11/30/2010 10:16 PM, Paul Eggert wrote:

> Invoke MAX_MERGE(total, level) with level == 15.
> 2 << level yields 65536, and 65536 * 65536 overflows to zero.

I managed to reproduce this bug on a (faked) host with 32768 processors,
using a command like this:

  seq 1000000000 | sort --parallel=32768 -S 10G

The result was a floating point exception (actually, a division by zero)
and 'sort' crashed. However, the bug is timing dependent and is very hard
to reproduce. I tried many more times to reproduce it, and they all failed.
This proved to my satisfaction that it is a real bug, though, so I pushed
the following patch.

From 1561c2b228d93a049e527824e14ad4fe8c256b52 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Wed, 1 Dec 2010 21:50:00 -0800
Subject: [PATCH] sort: fix bug on 64-bit hosts with at least 32768 processors

* src/sort.c (MAX_MERGE): Avoid integer overflow when on a machine
with (say) 32-bit int and 64-bit size_t and when level == 15.
Without this fix, on such a machine with 32768 or more processors,
the level computation could overflow on large input, and this
would result in division by zero.
---
 src/sort.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 1aa1eb4..5c368cd 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -107,7 +107,7 @@ struct rlimit { size_t rlim_cur; };
 /* Maximum number of lines to merge every time a NODE is taken from
    the MERGE_QUEUE. Node is at LEVEL in the binary merge tree,
    and is responsible for merging TOTAL lines. */
-#define MAX_MERGE(total, level) ((total) / ((2 << level) * (2 << level)) + 1)
+#define MAX_MERGE(total, level) (((total) >> (2 * ((level) + 1))) + 1)

 /* Heuristic value for the number of lines for which it is worth
    creating a subthread, during an internal merge sort, on a machine
--
1.7.2