http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58897

            Bug ID: 58897
           Summary: Improve 128/64 division
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
            Target: x86_64-linux-gnu

typedef unsigned __int128 ui;
ui f(ui a, unsigned long b){
  return a/b;
}

is compiled to a library call to __udivti3, which is implemented as a rather
long loop. However, it seems to me that 2 calls to divq should do it (and
sometimes only 1 if we have range information on the result).


Ideally the following would eventually compile to just mul+div, but that's
probably too complicated for now.

unsigned long prod(unsigned long a, unsigned long b, unsigned long m){
  if (a >= m || b >= m) __builtin_unreachable ();
  return ((unsigned __int128) a * b) % m;
}

Reply via email to