http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58897
Bug ID: 58897 Summary: Improve 128/64 division Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target: x86_64-linux-gnu typedef unsigned __int128 ui; ui f(ui a, unsigned long b){ return a/b; } is compiled to a library call to __udivti3, which is implemented as a rather long loop. However, it seems to me that 2 calls to divq should do it (and sometimes only 1 if we have range information on the result). Ideally the following would eventually compile to just mul+div, but that's probably too complicated for now. unsigned long prod(unsigned long a, unsigned long b, unsigned long m){ if (a >= m || b >= m) __builtin_unreachable (); return ((unsigned __int128) a * b) % m; }