List, how can we test whether the Studio compiler and the target platform support fast instructions for POPCNT and LZCNT? There is a rumor (see the forwarded mail below) that SPARC has an accelerated instruction for POPCNT.
Olga

---------- Forwarded message ----------
From: Andy Dougherty <[email protected]>
Date: Fri, Oct 1, 2010 at 8:23 PM
Subject: Re: Configure probe for POPCNT, LZCNT, and fast integer multiply machine instructions
To: karl williamson <[email protected]>
Cc: Perl5 Porters <[email protected]>

On Fri, 1 Oct 2010, karl williamson wrote:

> A number of utf8 operations on ASCII platforms can be made significantly
> faster if there were a function to return the number of 1 bits in the operand.
>
> There are a number of algorithms out there that do this with various degrees
> of cleverness and speed. The best are O(log 2). There is a trick that can
> speed up one of those if there is a fast integer multiply machine instruction
> and the compiler uses it. The web says that gcc when multiplying by a
> constant always uses shifts, masks, and adds, which renders that trick worse
> than not using it. But fastest of all is to use a population count machine
> instruction, if available. Newer Intel and AMD processors, and IIRC old CDCs,
> and I don't know what else have such a thing. I'm wondering if it would be
> possible for somehow the code to be told if these two things are available, so
> any of several population count implementations could be chosen.
>
> Similarly the number of leading zeroes (or its complement) in an operand is
> sometimes useful and faster than some of our implementations, if available.

(I believe that SPARC and Itanium also have such an instruction.)

I'm reasonably confident we could detect such things at Configure time,
at least for a reasonable range of systems. For example, for gcc, we
could simply check for the __builtin_popcount() function. Depending on
the gcc version, architecture and the command-line flags (e.g. gcc-4.4
with -msse4.2 for x86) this would just work.

The problem is that perl is often built on one set of hardware but then
distributed to run on other systems. Distributors are usually reluctant
to include such processor-specific flags, though an end-user compiling
perl for him- or herself certainly could.

We could, I suppose, go the route of compiling a series of different
functions, do runtime CPU identification, and dynamically call the
appropriate runtime functions, but that would be a lot more work, and
might not even end up saving any time in the long run.

--
    Andy Dougherty          [email protected]

--
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     [email protected]   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`
_______________________________________________
tools-compilers mailing list
[email protected]
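
As a rough illustration of the kind of check Andy describes, the sketch below uses the gcc builtins when __GNUC__ is defined and otherwise falls back to portable C. It is only a sketch under assumptions: the names popcount32, clz32, bitops_selfcheck and HAVE_BUILTIN_BITOPS are made up for this example, unsigned int is assumed to be 32 bits wide, and whether a given Studio release accepts these builtins is not something this thread establishes. Without flags such as -msse4.2 the builtin may well compile to a library call or a bit-twiddling sequence rather than a single POPCNT instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable population count: sums bits in parallel, then uses the
 * integer-multiply trick to add up the four per-byte counts. */
static unsigned popcount32_portable(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (unsigned)((uint32_t)(x * 0x01010101u) >> 24);
}

/* Portable count of leading zero bits; returns 32 for x == 0. */
static unsigned clz32_portable(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

#if defined(__GNUC__)
/* HAVE_BUILTIN_BITOPS is a hypothetical macro name for this example. */
#define HAVE_BUILTIN_BITOPS 1
static unsigned popcount32(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
static unsigned clz32(uint32_t x)
{
    /* __builtin_clz is undefined for 0, so handle that case here. */
    return x ? (unsigned)__builtin_clz(x) : 32u;
}
#else
#define HAVE_BUILTIN_BITOPS 0
static unsigned popcount32(uint32_t x) { return popcount32_portable(x); }
static unsigned clz32(uint32_t x) { return clz32_portable(x); }
#endif

/* Returns 1 if the selected implementations agree with the portable
 * ones on a few sample values, 0 otherwise. */
static int bitops_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xFFu, 0x80000000u, 0xDEADBEEFu, 0xFFFFFFFFu
    };
    size_t i;

    /* Assumption of this sketch: the builtins take a 32-bit unsigned int. */
    assert(sizeof(unsigned int) == sizeof(uint32_t));

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (popcount32(samples[i]) != popcount32_portable(samples[i]))
            return 0;
        if (clz32(samples[i]) != clz32_portable(samples[i]))
            return 0;
    }
    return 1;
}
```

A Configure-style probe could wrap this in a main() that returns 0 only when bitops_selfcheck() succeeds. That would confirm that the builtins compile and give correct answers with the chosen flags, not that they map to a hardware instruction on the machine where perl eventually runs.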
