On 12/11/18 9:53 AM, 臧琳 wrote:
> And I just did quick test, that using the constant _num_buckets for modulo,
> gcc issues several instructions instead of idiv, and the speed up at about
> 12%, while using 65536 for _number_buckets speedup at 20%.

OK.  So GCC's constant optimization is an improvement over what we
have already.

There is something that I do not understand.  You said that GCC didn't
know that _num_buckets was constant.  In that case, how did GCC know
not to use a divide instruction when you tried 65536?

In any case, if we really do care so much about this, I would have
thought that the best solution would be to use 65537 as the table size
because there is a nice way to calculate n % 65537::

    unsigned mod_m(unsigned n) {
      // 65536 == -1 (mod 65537), so n == n % 65536 - n / 65536 (mod 65537).
      // Assumes a 32-bit unsigned, so n / 65536 is at most 65535.
      unsigned tmp = n % 65536;
      tmp -= n / 65536;
      if (tmp >= 65537) // overflow: the subtraction wrapped below zero
        tmp += 65537;
      return tmp;
    }

It's very difficult to prove that using a non-prime table size won't
impact the performance on some systems; using only a few bits of the
address isn't worth the risk, IMO.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
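
For anyone who wants to check the n % 65537 reduction in the message above, here is a minimal sketch that compares it against the plain % operator. The names mod_65537, mod_65537_ref and check_mod_65537 are illustrative only and do not come from the HotSpot sources; the sketch assumes 32-bit inputs (uint32_t) and samples the input range with a stride rather than testing every value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: let the compiler reduce modulo the constant 65537. */
static uint32_t mod_65537_ref(uint32_t n) {
    return n % UINT32_C(65537);
}

/* Reduction using 65536 == -1 (mod 65537), for 32-bit n only. */
static uint32_t mod_65537(uint32_t n) {
    uint32_t lo = n & 0xFFFFu;  /* n % 65536 */
    uint32_t hi = n >> 16;      /* n / 65536, at most 65535 */
    uint32_t r = lo - hi;       /* wraps modulo 2^32 when hi > lo */
    if (r >= 65537u)            /* wrapped: the true value is r + 65537 */
        r += 65537u;
    return r;
}

/* Hypothetical helper: asserts agreement on edge cases and a strided sample. */
static void check_mod_65537(void) {
    static const uint32_t edge[] = {
        0u, 1u, 65535u, 65536u, 65537u, 65538u, 0x7FFFFFFFu, 0xFFFFFFFFu
    };
    for (size_t i = 0; i < sizeof edge / sizeof edge[0]; i++)
        assert(mod_65537(edge[i]) == mod_65537_ref(edge[i]));

    /* 64-bit counter so the loop terminates at the top of the 32-bit range. */
    for (uint64_t n = 0; n <= UINT32_MAX; n += 9973u)
        assert(mod_65537((uint32_t)n) == mod_65537_ref((uint32_t)n));
}
```

Calling check_mod_65537() from a small test program exercises both versions. Whether the hand-written form is actually faster than what GCC emits for a constant modulus would still need to be measured on the target system.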