Niels Möller <ni...@lysator.liu.se> writes: > See patch below. Makes both toom22 and toom32 recurse only to themselves > (or to schoolbook), and use provided scratch space, without any > additional allocations. Passes make check (with asserts enabled) on my > machine, but I probably don't hit the case marked with a FIXME in > mpn_mul.
Now I've run some new benchmarks, using the improved speed program. "speed.old" is current gmp head, "speed" is with the toom32 changes. When I look at these numbers, I see a modest speedup, about 1%. As I understand it, the reason we have a speedup is that we (i) avoid recursive calls to mpn_mul, and (ii) allocate all temporary storage up front, and I think it's nice that that's at all measurable. Regards, /Niels $ ./speed.old -p 10000000 -c -s 35-52 mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70 overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70 35 3278.95 #2807.57 n/a 36 3232.10 #2809.00 3074.14 37 3253.73 #2914.72 3093.68 38 3243.21 2907.72 #2885.04 39 3231.25 n/a #2924.85 40 3163.23 n/a #2853.63 41 3155.46 n/a #2802.04 42 3123.59 n/a #2718.50 43 3106.55 n/a #2862.91 44 3007.33 n/a #2808.01 45 2975.00 n/a #2832.13 46 #2920.29 n/a 2942.15 47 #2872.80 n/a 2883.77 48 #2775.10 n/a 2868.21 49 #2726.65 n/a 3150.99 50 #2639.82 n/a 3118.55 51 #2551.59 n/a 3069.26 52 #2446.09 n/a n/a $ ./speed -p 10000000 -c -s 35-52 mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70 overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70 35 3278.95 #2807.24 n/a 36 3229.47 #2806.18 3049.43 37 3251.88 #2921.26 3076.06 38 3243.21 2908.94 #2844.26 39 3225.47 n/a #2889.04 40 3161.26 n/a #2832.61 41 3166.19 n/a #2795.00 42 3125.16 n/a #2690.38 43 3090.94 n/a #2828.44 44 3008.59 n/a #2789.46 45 2975.01 n/a #2804.18 46 2920.30 n/a #2909.20 47 2872.79 n/a #2851.71 48 #2775.11 n/a 2829.49 49 #2727.72 n/a 3092.84 50 #2641.13 n/a 3063.21 51 #2553.35 n/a 3023.80 52 #2454.06 n/a n/a $ ./speed.old -p 10000000 -c -s 50-74 mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100 overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100 50 6601.16 #4972.21 n/a 51 6596.33 #5035.81 5434.71 52 6539.30 #5097.57 5259.36 53 6539.55 5316.81 #5245.75 54 6547.79 5300.64 #5228.22 55 6533.80 #5251.83 5304.58 56 6463.24 n/a #5149.18 57 6447.48 n/a #5094.77 58 6406.94 n/a #5033.60 59 6388.55 n/a #5095.43 60 6262.57 n/a #4966.43 61 6222.77 n/a #5121.12 62 6173.48 n/a #5091.35 63 6137.99 n/a #5111.19 64 6010.73 n/a #5167.84 65 5968.52 n/a #5120.25 66 5891.06 n/a #5097.39 67 5822.02 n/a #5217.12 68 5677.20 n/a #5275.71 69 5606.83 n/a #5308.16 70 5513.04 n/a #5325.46 71 5406.09 n/a #5292.37 72 5238.58 n/a #5181.20 73 #5145.21 n/a 5479.67 74 #5045.51 n/a n/a $ ./speed -p 10000000 -c -s 50-74 mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100 overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100 50 6610.22 #4979.09 n/a 51 6600.22 #5028.20 5427.28 52 6531.08 #5081.27 5149.80 53 6540.07 5382.98 #5211.13 54 6540.78 5272.30 #5268.33 55 6526.27 #5259.12 5261.60 56 6443.64 n/a #5164.87 57 6448.62 n/a #5095.84 58 6402.06 n/a #5185.73 59 6385.01 n/a #5070.71 60 6269.03 n/a #4970.77 61 6226.69 n/a #5067.00 62 6177.27 n/a #5084.48 63 6135.12 n/a #5082.11 64 6011.92 n/a #5149.51 65 5965.79 n/a #5126.55 66 5872.83 n/a #5065.02 67 5822.76 n/a #5199.71 68 5679.58 n/a #5247.53 69 5605.92 n/a #5284.30 70 5502.04 n/a #5299.45 71 5407.54 n/a #5154.35 72 5234.21 n/a #5088.36 73 #5150.88 n/a 5423.00 74 #5074.65 n/a n/a $ ./speed.old -p 10000000 -c -s 75-112 mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150 overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150 75 14750.47 #10053.81 n/a 76 14641.86 #10059.59 10762.58 77 15032.23 #10094.08 10836.42 78 14649.34 #10139.68 10609.55 79 14706.27 #10228.62 10607.27 80 14558.66 #10087.28 10422.71 81 14899.93 #10425.90 10445.55 82 14561.86 10458.30 #10373.80 83 14571.35 10699.90 #10201.05 84 14432.08 n/a #10393.73 85 14396.36 n/a #10143.03 86 14331.65 n/a #9722.61 87 14307.18 n/a #9772.08 88 14185.09 n/a #9974.81 89 14144.53 n/a #9641.15 90 14090.81 n/a #9836.56 91 14048.55 n/a #9672.05 92 13864.52 n/a #9613.13 93 13796.34 n/a #9755.47 94 13741.17 n/a #9754.20 95 13649.14 n/a #9869.11 96 13455.40 n/a #9730.26 97 13377.38 n/a #10239.68 98 13291.98 n/a #10126.57 99 13172.15 n/a #10204.89 100 12980.91 n/a #10222.67 101 12879.49 n/a #10230.03 102 12771.23 n/a #10366.77 103 12632.98 n/a #10552.21 104 12413.33 n/a #10699.01 105 12282.08 n/a #10563.21 106 12132.77 n/a #10579.57 107 11968.49 n/a #10730.71 108 11760.53 n/a #10685.40 109 11626.50 n/a #10899.59 110 11426.35 n/a #10910.19 111 11315.81 n/a #10787.85 112 #11047.85 n/a n/a $ ./speed -p 10000000 -c -s 75-112 mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150 overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150 75 14722.42 #10045.48 n/a 76 14681.34 #10059.76 10623.85 77 14688.72 #10108.38 10724.26 78 14665.32 #10128.87 10420.29 79 14711.07 #10193.76 10505.84 80 14555.44 #10088.39 10311.07 81 14575.39 10426.33 #10356.28 82 14566.95 10398.25 #10047.75 83 14523.21 10600.77 #10152.63 84 14421.79 n/a #10126.45 85 14411.42 n/a #10102.81 86 14345.01 n/a #9629.07 87 14331.22 n/a #9589.82 88 14177.51 n/a #9695.10 89 14135.56 n/a #9580.52 90 14085.55 n/a #9403.96 91 13987.50 n/a #9585.55 92 13839.73 n/a #9534.73 93 13800.50 n/a #9606.04 94 13732.90 n/a #9741.92 95 13646.09 n/a #9729.62 96 13445.50 n/a #9624.99 97 13390.72 n/a #10149.86 98 13271.28 n/a #10037.47 99 13158.93 n/a #10170.95 100 12995.47 n/a #10140.38 101 12884.30 n/a #10169.73 102 12771.51 n/a #10100.37 103 12618.64 n/a #10445.72 104 12414.24 n/a #10428.90 105 12280.04 n/a #10340.37 106 12144.41 n/a #10619.04 107 11977.16 n/a #10446.06 108 11761.08 n/a #10389.61 109 11646.30 n/a #10809.78 110 11433.62 n/a #10787.13 111 11278.82 n/a #10589.46 112 #11028.89 n/a n/a -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel