On 24 Mar 00, at 15:44, [EMAIL PROTECTED] wrote:
> I suspect for LL tests in the ~10M range, this happy medium may be as
> 'small' as 1-2MB. Are PC systems with L2 caches in this size range
> available? If so, how much of a premium does one pay for the extra cache?
Ernst, your Mlucas program has a similar "memory footprint" to
Prime95. My Alpha 21164-533 has 2MB L3 cache, yet cache effects start
to become noticeable around exponent 5 million, and are dominant well
before exponent 10 million. I think you'd need 4 MB cache in order to
be able to run an LL test on an exponent in the region of 10 million
whilst containing most of the memory accesses to the cache.
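As a rough back-of-the-envelope check on those cache sizes, the sketch below estimates the size of the main FFT data array for an LL test. It assumes each double-precision word carries about 20 bits of the number being squared; that figure and the function name are illustrative assumptions, not the actual parameters of Prime95 or Mlucas, and real programs also need room for sin/cos tables and other scratch data.

```python
BYTES_PER_DOUBLE = 8

def fft_array_mib(exponent, bits_per_word=20):
    """Approximate size in MiB of the FFT data array for testing 2**exponent - 1.

    bits_per_word is an assumed average; real programs vary it with FFT length.
    """
    words = -(-exponent // bits_per_word)  # ceiling division
    return words * BYTES_PER_DOUBLE / 2**20

for p in (5_000_000, 10_000_000):
    print(f"exponent {p:>10,}: ~{fft_array_mib(p):.1f} MiB")
```

Under that assumption the data array alone comes to about 1.9 MiB at exponent 5 million and about 3.8 MiB at 10 million, which is consistent with a 2MB cache starting to struggle around 5 million and roughly 4 MB being needed near 10 million.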
The AMD K6-3 supported L3 cache, and Socket 7 motherboards with 2MB of
cache were available. But the K6 never did support SMP and the K6-3 is now
discontinued - AMD now run only the K6-2 as the "budget" line and the
Athlon as the "performance" line.
Of the Intel processors, only PPro and Xeon have ever been available
with caches bigger than 512K - PPro to 1MB and Xeon in 1MB and 2MB
variants. You pay a frightening premium for larger caches - about
double the price for the processor for each doubling of cache size.
Given the relatively small increase in performance, this premium is
very hard to justify. You can build four complete Pentium III systems
for the price of just one Xeon CPU with 2MB cache running at the same
speed.
More realistic options would appear to be to use systems using Rambus
memory technology (which has several times the throughput of SDRAM,
though at a significant cost penalty) and/or the Intel 840 chipset
(which I understand has dual memory busses, giving each CPU
simultaneously full speed access to memory locations which happen not
to be being accessed by the other processor at the same time). Compaq
have been building multiprocessor server systems using similar
technology for some time; these do not appear to suffer from
throughput throttling to any significant extent when multiple copies
of Prime95/NTPrime/mprime are run on them. But they aren't cheap!
Incidentally, with core/bus speed ratios as large as they are with the
fastest processors available today, memory bandwidth bottlenecks are
possible with uniprocessor systems too. In particular note that
systems based on the Intel 820 chipset using a memory translation hub
to enable affordable SDRAM to be used instead of Rambus memory will
probably run the memory at only 100 MHz even if the CPU is
running with a 133 MHz FSB. Such systems also appear to be _very_
particular about the memory being used - the parameters in the SPD
chip must be totally correct, since the MTH prevents the BIOS from
determining the correct parameters for accessing the RAM directly.
The upshot is that the system believes that it has no RAM fitted,
even though the memory may work perfectly well in another (non-i820)
board. My advice would be not to buy an i820-based motherboard unless
you're prepared to buy Rambus memory to fit in it - currently Rambus
is about five times the price of SDRAM.
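To put rough numbers on the 100 MHz versus 133 MHz point: assuming a 64-bit (8-byte) SDRAM data path and one transfer per clock, theoretical peak bandwidth is simply clock rate times bus width. The helper name below is hypothetical, and sustained bandwidth in practice is well below these peaks.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bytes=8):
    """Theoretical peak in MB/s, assuming one transfer per clock
    on a bus_bytes-wide data path (8 bytes is an assumption)."""
    return clock_mhz * bus_bytes

for mhz in (100, 133):
    print(f"{mhz} MHz memory: {peak_bandwidth_mb_s(mhz)} MB/s peak")

# Fraction of the 133 MHz peak lost when the memory only runs at 100 MHz.
shortfall = 1 - peak_bandwidth_mb_s(100) / peak_bandwidth_mb_s(133)
print(f"memory at 100 MHz gives up about {shortfall:.0%} of the 133 MHz peak")
```

On these assumptions, memory held at 100 MHz behind a 133 MHz FSB leaves roughly a quarter of the peak bandwidth unused.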
BTW I have two dual-processor systems, both using Supermicro P6DBx
motherboards (single memory bus, BX chipset). One has PII-350s
fitted, the other has a pair of PIII-450s. When running LL tests, the
350 MHz system slows down about 7% when both processors are running
compared with the speed with one processor idle; the 450 MHz system
slows down about 15%. You get full performance on one LL test if the
other processor is doing something which requires CPU time but little or
no memory access e.g. trial factoring, or Stage 1 of an ECM run on a
small exponent. This is true for both Win NT and Linux. Win 9x does not
support SMP, so the second processor doesn't cause memory bus
congestion, but you don't get benefit from its CPU cycles either!
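The slowdown figures can be turned into combined throughput. The sketch below assumes "slows down about 7%" means each test runs at 93% of its single-processor speed; if it instead means 7% longer per iteration, the results differ slightly. The function name is illustrative.

```python
def combined_throughput(slowdown, cpus=2):
    """Total LL throughput in units of one unloaded CPU, assuming each test
    runs at (1 - slowdown) of its solo speed when all CPUs are busy."""
    return cpus * (1.0 - slowdown)

for label, slowdown in (("dual PII-350", 0.07), ("dual PIII-450", 0.15)):
    print(f"{label}: {combined_throughput(slowdown):.2f}x one CPU")
```

Under that reading the 350 MHz system delivers about 1.86 times the throughput of a single processor and the 450 MHz system about 1.70 times, so the second CPU is still well worth using despite the shared memory bus.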
Regards
Brian Beesley