Hi all,

So here goes my first "RFC" on the value range pass
for NIR that I'm attacking as part of my GSoC.
"RFC" as it is not nearly done, and should be treated
as a "this is how I imagine it ending up" instead of
"close to functioning implementation".
For that reason, whoever feels like looking at it should
not dig to much into the details(it doesn't even compile).

I'm going off on a two day trip for a Foo Fighters concert
the coming days, so I won't be back online until Friday.
Thought I could post what I got, and let it soak.

The pass is, as far as possible, written along the lines
of the paper "Constant Propagation with Conditional Branches".
Parts are based on work Connor had already done.
(Some of the functions used and the ssa-def worklist, ++).
Some adjustments have been, and will need to be, made.
A couple noteable changes I've done that's inspired by LLVM:
   - Use undefined and overdefind instead of top and bottom.
   - Process "overdefined" values first.
      - Accomplished by inserting overdefined values at
        the head of the worklist (should improve run-time)

So first, the things I'm unsure of:

How to handle other instructions than alu's, phi's and consts.
We should probably handle uniform loads or others?
I haven't really dug to much into this yet, but are there some
that we actually want to handle, and not just set to "overdefined"?

There are multiple ways of tackling the initialization of
the lattice_entries. We can initialize all the values up front,
or we can tackle them as we stumble upon them. An argument for
scrolling through all the variables first is so that we get
constants marked before getting into the analysis. An argument
for taking them on-the-fly is that it can lead to less analysis
of stuff that is in unreachable blocks.
I haven't really decided upon a solution, but I expect it might
be a case of "try it, and see what works best".

If one should do simultaneous analysis and rewriting.
Arguments can be made for both.
Most compelling reason to rewrite on-the-fly is to handle
useless instructions and things that we know the solution
of even though it is not a constant value.
Min, max, "true && x" and "false || x", etc. The last two
are probably handled by opt_algebraic, or at least should be.

How to propagate ranges implied by conditional branches into
the then and else branch of i.e ifs. Is there a place in
NIR where I can get inspiration for how to insert diverging
ssa-values with corresponding phi's?

Long story short: I need to insert a self-assignment of
an ssa value in the then and else branch of an if
(what "ABCD: Array Bounds Checks on Demand" calls a pi-node).
This will obviously lead to the variable having two paths to the
convergence-point of the control flow, and so I also need to
insert a phi node. The way this is done in "ABCD: ..."
(as far as I've understood, I've only given it one thorough reading),
is to do this before ssa is generated. That way they get away
from the whole issue of rewriting uses and inserting
phi-nodes in an already fixed up ssa-representation.

Some things that are "less than stellar" or completely wrong:
   - Handling ranges implied by ifs
   - Memory allocation and pointers (I'm still used to Java :/ )
   - Anything to do with rewriting the ssa's
   - This little section from the LLVM SCCP pass

      /// ResolvedUndefsIn - While solving the dataflow for a
function, we assume
      /// that branches on undef values cannot reach any of their successors.
      /// However, this is not a safe assumption.  After we solve dataflow, this
      /// method should be use to handle this.  If this returns true, the solver
      /// should be rerun.
        bool ResolvedUndefsIn(Function &F);

      I have not yet decided on how I want to add branches.
      A solution is to add all directions, all the time,
      and rely on Connors pass to remove the blocks.
      This avoids the case mentioned of not branching on undef.
      Might not give as good a solution though?

   - Loops are not "avoided" correctly. The checks are wrong.
       I've kinda put these on pause until I get more
       experience with NIR and loops when writing the loop
       analysis pass I've just started on.
   - Correct handling of vectors.
   - Checking number of components. This is more complicated than
       the check I have currently implemented, as far as I know.
       Might want to make a "num_components_for_ssa_def" thing
       along the lines of Rob's patch for doing the same for
       sources and destinations.
   - I will need to decide on a consistant naming of functions.
       Right now there are both "component_is_false"
       and "is_entry_true". I'm leaning towards the last option,
       as that is more of a question. Any opinions are welcome.

Below you can find the code as it was today.
There's a lot of comments with my thoughts and aha's
spread around, so I think it should be possible to get an
understanding of how I imagine things getting solved.
I haven't yet set up a git-mirror but will get to that in the weekend.
I sent it as is, instead of sending a patch, as I'm in a
"need for sleep" and my git tree is... dirty. Sorry for that.
When I get closer to finished I'll post a compiling series of patches.
If there's any questions feel free to ask.

PS: I'll be writing a new blog post on Friday explaining the
pass in more details, and also a "state of the nation"-post
for updating peeps on how far I've gotten. If you're
interested in the pass but feel the code's a bit hard to follow
or understand that blog post might just help, but that's for Friday.



#include "nir.h"
#include "nir_ssa_def_worklist.h"
#include "nir_block_worklist.h"
#include "nir_constant_expressions.h"

/*
 * This pass implements an extension of
 * "Constant Propagation with Conditional Branches" by Wegman and Zadeck
 * that also handles value ranges. This is usefull as a lot of shaders
 * have min/max expressions that can be eliminated, or conditionals
 * that we can prove to be false or true due to previously applied
 * restrictions on range. Value range propagation is a superset
 * of tracking constant values, and therefore this pass eliminates the need for
 * a separate constant propagation pass. This pass is optimistic, which means
 * we assume all variables are constant (or have restricted range) and
disprove it.
 * A pessimistic algorithm would assume all values where
undeterminable, and then
 * propagate expressions we know to be constant through the program.
 *
 * We don't preserve the range perfectly for a variable, as we combine
 * two ranges for a variable into a range of [min(low1, low2),
max(high1, high2)]
 * Doing that would complicate the pass a lot and is therefore not implemented.
 *
 * The lattice types are:
 * undefined: Variable may be constant or range-restricted (not yet processed)
 * constant: Value is determined to be constant
 * range: Value is determined to be range-restricted
 * overdefined: We cannot guarantee the value is constant or range-restricted
 *
 * There is one interesting situation: When we can eliminate dead code,
 * but this does not end in a constant value. Examples are things like
 * min(sin(x), y) where y > 2. We know sin(x) is dead code, but the result
 * is not a constant but instead an ssa-def with a range. We mark this in
 * the range-state so that we can eliminate it after the pass is done.
 * This probably means that the pass should be rerun if we resolve one of
 * these, as we will then have simplified the program, and new ranges may
 * be resolved.
 *
 * We extend the lattice so that constant entries are changed to inclusive
 * ranges for each vector component. The join rules are:
 *
 * undefined join undefined = undefined
 * undefined join overdefined = overdefined
 * overdefined join overdefined = overdefined
 * [low, high] join overdefined = overdefined
 * [low, high] join undefined = [low, high]
 * [low1, high1] join [low2, high2] = [min(low1, low2), max(high1, high2)]
 *
 * These rules are however only general pessimistic rules.
 * There may situations where we still can determine parts of the range of
 * the variable, even though it has an overdefined input (max, min, sat, abs).
 * There is also the case with boolean operations like AND and OR. These
 * can be determined even if we know only one of the operators.
 *
 * When the pass is done all "undefined" values should be determined as
 * either const, range, or overdefined. (Except for in blocks that are
 * marked as unreachable)
 */


#define IS_FLOAT_CONSTANT(const_value, operator, operand, num_components)    \
   ((num_components == 4) ?                                                  \
      const_value.f[0] operator operand &&                                   \
      const_value.f[1] operator operand &&                                   \
      const_value.f[2] operator operand &&                                   \
      const_value.f[3] operator operand :                                    \
      ((num_components == 3) ?                                               \
         const_value.f[0] operator operand &&                                \
         const_value.f[1] operator operand &&                                \
         const_value.f[2] operator operand :                                 \
         ((num_components == 2) ?                                            \
            const_value.f[0] operator operand &&                             \
            const_value.f[1] operator operand :                              \
            ((num_components == 1) ?                                         \
               const_value.f[0] operator operand :                           \
               false))))

typedef enum {
   undefined,
   range,
   constant,
   overdefined
} lattice_type;

typedef struct {
   /*
    * Is this entry float, unsigned or something else?
    * This will be interesting for nir_load_const_instr, as that is typeless.
    * Something smart needs to be done to handle that.
    * Or we may completely ignore it and mark it as invalid.
    * This should probably be handled in the initialization function
    * and then to never be touched again after that.
    */
   nir_alu_type range_type;

   nir_const_value low;
   nir_const_value high;

   /* What type of lattice is this */
   lattice_type type;

   /*
    * Whether we can remove the expression itself and replace
    * it with one of its operands. This is intended to be used
    * for things like min(a, b) where a < 4 and b > 5. We know
    * that the expression will choose a, but it is not constant
    * so we cannot mark it as such. Instead we want to keep it
    * as a range type, and replace it at the end of the pass.
    */
   bool can_be_predetermined;

   // The ssa_def associated with the lattice_entry
   nir_ssa_def *ssa_def;                        // XXX: Might not be
necessary but could make things less cluttered
   boolean is_initialized;                      // XXX: Should not be
necessary, but lets keep it here ffr
} lattice_entry;

typedef struct {
   // An array of lattice_antries for all the ssa_defs
   lattice_entry *entries;

   // Corresponds to SSA Work in the original paper
   nir_ssa_def_worklist *ssa_worklist;

   // Work list of blocks, corresponding to the papers Flow work list
   nir_block_worklist *block_worklist;

   // Keep track of which blocks are reachable
   BITSET_WORD *reachable_blocks;

   nir_function_impl *impl;
   void *mem_ctx;
} value_range_state;

static nir_const_value
get_type_max(nir_alu_type type, unsigned num_components)
{
   nir_const_value value;
   for (unsigned i = 0; i < num_components; i++) {
      switch (type) {
      case nir_type_float:
         value.f[i] = INFINITY;
         break;
      case nir_type_int:
         value.i[i] = INT32_MAX;
         break;
      case nir_type_bool:
      case nir_type_unsigned:
         value.u[i] = UINT32_MAX;
         break;
      }
   }

   return value;
}

static nir_const_value
get_type_min(nir_alu_type type, unsigned num_components)
{
   nir_const_value value;
   for (unsigned i = 0; i < num_components; i++) {
      switch (type) {
      case nir_type_float:
         value.f[i] = -INFINITY;
         break;
      case nir_type_int:
         value.i[i] = INT32_MIN;
         break;
      case nir_type_bool:
      case nir_type_unsigned:
         value.u[i] = 0;
         break;
      }
   }

   return value;
}

/*
 * Gets the lattice entry for the given ssa def
 */
static lattice_entry *
get_lattice_entry(nir_ssa_def *value, value_range_state *state)
{
   lattice_entry *entry = &state->entries[value->index];
   return entry;
}

static void
initialize_entry(nir_ssa_def *def, value_range_state *state)
{
   lattice_entry *entry = get_lattice_entry(def, state);

   entry->ssa_def = def;
   entry->can_be_predetermined = false;

   switch (def->parent_instr->type)
   case nir_load_const_instr: {
      nir_load_const_instr *instr = nir_instr_as_load_const(def.parent_instr);
      entry->type = constant;
      entry->low = instr->value;
      entry->high = instr->value;
      entry->range_type = nir_type_invalid;
      return;
   }
   case nir_alu_instr: {
      nir_alu_instr *instr = nir_instr_as_alu(def.parent_instr);
      entry->type = undefined;
      entry->low = get_type_min(entry->range_type, def->num_components);
      entry->high = get_type_max(entry->range_type, def->num_components);
      entry->range_type = nir_op_infos[instr->op].output_type;
      return;
   }
   case nir_phi_instr: {
      nir_phi_instr *instr = nir_instr_as_phi(def.parent_instr);
      entry->type = undefined;
      entry->range_type = nir_type_invalid; // XXX: Are phi's also
typeless? Should check this up closely
      return;
   }
   default: {
      entry->type = overdefined;
      entry->range_type = nir_type_invalid;
      return;
   }
}

static void
initialize_block(nir_block *block, void *state) {
   nir_foreach_instr(block, instr) {
      nir_foreach_ssa_def(instr, initialize_entry, block);
   }
}

static nir_const_value
per_component_max(nir_const_value src0, nir_const_value src1,
                  nir_alu_type type, unsigned num_components)
{
   nir_const_value value;
   for (unsigned i = 0; i < num_components; i++) {
      switch (type) {
      case nir_type_float:
         value.f[i] = MAX2(src0.f[i], src1.f[i]);
         break;
      case nir_type_int:
         value.i[i] = MAX2(src0.i[i], src1.i[i]);
         break;
      case nir_type_bool:
      case nir_type_unsigned:
         value.u[i] = MAX2(src0.u[i], src1.u[i]);
         break;
      }
   }

   return value;
}

static nir_const_value
per_component_min(nir_const_value src0, nir_const_value src1,
                  nir_alu_type type, unsigned num_components)
{
   nir_const_value value;
   for (unsigned i = 0; i < num_components; i++) {
      switch (type) {
      case nir_type_float:
         value.f[i] = MIN2(src0.f[i], src1.f[i]);
         break;
      case nir_type_int:
         value.i[i] = MIN2(src0.i[i], src1.i[i]);
         break;
      case nir_type_bool:
      case nir_type_unsigned:
         value.u[i] = MIN2(src0.u[i], src1.u[i]);
         break;
      }
   }

   return value;
}

static bool
component_is_true(lattice_entry *entry, unsigned component)
{
   /*
    * XXX: This check may be removed if the function is
    * never used for anything but is_entry_true.
    * It already checks if the entry s overdefined.
    */
   if (entry->type == overdefined)
      return false;

   switch (entry->range_type) {
   case nir_type_int:
      return entry->low.i[component] > 0 ||
             entry->high.i[component] < 0;
   case nir_type_unsigned:
   case nir_type_bool:
      return entry->low.u[component] > 0;
   case nir_type_float:
      return entry->low.f[component] > 0.0f ||
             entry->high.f[component] < 0.0f;
   }

   return false;
}

static bool
is_entry_true(lattice_entry *entry)
{
   bool is_true = true;

   if (entry->type == overdefined)
         return false;

   for (int i = 0; i < entry->ssa_def->num_components; i++)
      is_true = is_true && component_is_true(entry, i);

   return is_true;
}

static bool
component_is_false(lattice_entry *entry, unsigned component)
{
   if (entry->type == overdefined)
      return false;

   switch (entry->range_type) {
   case nir_type_int:
      return entry->low.i[component] == 0;
   case nir_type_unsigned:
   case nir_type_bool:
      return entry->low.u[component] == 0;
   case nir_type_float:
      return entry->low.f[component] == 0.0f;
   }

   return false;
}

static bool
is_entry_false(lattice_entry *entry)
{
   bool is_false = true;

   if (entry->type == overdefined)
         return false;

   for (int i = 0; i < entry->ssa_def->num_components; i++)
      is_false = is_false && component_is_false(entry, i);

   return is_false;
}

static bool
is_entry_overdefined(lattice_entry *entry)
{
   if (entry->type == overdefined)
      return true;

   boolean overdef = true;

   for (unsigned i = 0; i < entry.ssa_def->num_components; i++) {
      switch (entry->range_type) {
      case nir_type_int:
         overdef &= entry->low.i[i] == INT32_MIN &&
                    entry->high.i[i] == INT32_MAX;
         break;
      case nir_type_bool:
      case nir_type_unsigned:
         overdef &= entry->low.u[i] == 0 &&
                    entry->high.u[i] == UINT32_MAX;
         break;
      case nir_type_float:
         overdef &= entry->low.f[i] == -INFINITY &&
                    entry->high.f[i] == INFINITY;
         break;
      }
   }

   return overdef;
}

static bool
is_entry_constant(lattice_entry *entry)
{
   if (entry->type == constant)
      return true;

   for (unsigned i = 0; i < entry.ssa_def->num_components; i++) {
      if (entry->low.u[i] != entry->high.u[i])
         return false;
   }

   entry->type = constant;
   return true;
}

/*
 * XXX: This is not yet complete. It was intended to return
 * whether or not this was a change in status. This information
 * can be used by the main loop to decide whether or not to add
 * the instructions in the block to the ssa-def worklist or not.
 * That way we avoid having the check in the main loop, making it
 * smaller to the eye and hopefully less confusing.
 */
static bool
mark_block_reachable(nir_block *block, value_range_state *state)
{
   BITSET_SET(state->reachable_blocks, block->index);
}

static bool
is_block_reachable(nir_block *block, value_range_state *state)
{
   return BITSET_TEST(state->reachable_blocks, block->index);
}

/*
 * Returns true if this is a change in status of the entry.
 * This is used to simplify the check for whether or not
 * users of this entry should be added to the work-list.
 *
 * XXX: This functionality is probably not yet taken
 * advantage of (note to self to check this)
 */
static bool
set_as_overdefined(lattice_entry *entry, nir_alu_type type)
{
   if (entry->type == overdefined)
      return false;

   entry->type = overdefined;

   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
      switch (type) {
      case nir_type_float:
         entry->low.f[i] = -INFINITY;
         entry->high.f[i] = INFINITY;
         break;
      case nir_type_int:
         entry->low.i[i] = INT32_MIN;
         entry->high.i[i] = INT32_MAX;
         break;
      case nir_type_bool:
      case nir_type_unsigned:
         entry->low.u[i] = 0;
         entry->high.u[i] = UINT32_MAX;
         break;
      default:
         break;
      }
   }

   return true;
}

static inline void
set_range_float_value(lattice_entry *entry, float value, boolean low)
{
   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
      if (low) {
         entry->low.f[i] = value;
      } else {
         entry->high.f[i] = value;
      }
   }
}

static inline void
set_range_float_constant(lattice_entry *entry, float value)
{
   set_range_float_value(entry, value, true);
   set_range_float_value(entry, value, false);
   entry->type = constant;
}

/*
 * This could probably just as well be folded into the evaluate_alu function.
 */
static void
evaluate_constant_alu(nir_alu_instr *instr, value_range_state *state)
{
   lattice_entry *entry = get_lattice_entry(instr.dest.dest.ssa);
   lattice_entry *src[4];
   nir_const_value const_src[4];

   for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
      src[i] = get_lattice_entry(instr->src[i].src.ssa,
                                 instr->instr.block, state);
      const_src[i] = src[i].low;
   }

   nir_const_value dest =
         nir_eval_const_opcode(instr->op,
                               instr->dest.dest.ssa.num_components,
                               const_src);

   entry->type = constant;
   entry->low = dest;
   entry->high = dest;
   entry->range_type = nir_op_infos[instr->op].output_type;   // XXX:
Should figure out where we should set type
}

/*
 * XXX: I'm hoping to replace this with macros of the form
 * that I initially wrote for the glsl sibling of this pass.
 */
static boolean
is_range_float_less_than(lattice_entry *entry, float f)
{
   boolean less = true;
   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
      less = less && (entry->high.f[i] < f);
   }
   return less;
}

/*
 * XXX: I'm hoping to replace this with macros of the form
 * that I initially wrote for the glsl sibling of this pass.
 */
static boolean
is_range_float_larger_than(lattice_entry *entry, float f)
{
   boolean higher = true;
   for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
      higher = higher && (entry->low.f[i] > f);
   }
   return higher;
}

static void
evaluate_alu_instr(nir_alu_instr *alu, value_range_state *state)
{
   lattice_entry *entry = get_lattice_entry(alu.dest.dest.ssa, state);
   lattice_entry *src = lattice_entry[4];
   boolean all_constant = true;

   for (unsigned i = 0; i < nir_op_infos[alu->op].num_inputs; i++) {
      src[i] = get_lattice_entry(alu->src[i].src.ssa, state);
      all_constant = all_constant && is_lattice_entry_constant(src[i]);
   }

   if (all_constant) {
      evaluate_constant_instruction(alu, state);
      continue;
   }

   switch(nir_op_infos[alu->op].output_type)
   case nir_op_fabs: {
      set_range_float_value(entry, 0.0f, true);
      entry->type = range;
      break;
   }
   case nir_op_fsat: {
      if (is_range_float_less_than(src[0], 0.0f)) {
         set_range_float_constant(entry, 0.0f);
         break;
      }

      if (is_range_float_larger_than(src[0], 1.0f)) {
         set_range_float_constant(entry, 1.0f);
         break;
      }

      set_range_float_value(entry, 0.0f, true);
      set_range_float_value(entry, 1.0f, false);
      entry->type = range;
      break;
   }
   case nir_op_fcos:
   case nir_op_fsin:
   case nir_op_fsign: {
      set_range_float_value(entry, -1.0f, true);
      set_range_float_value(entry, 1.0f, false);
      entry->type = range;
      break;
   }
   case nir_op_fexp:
   case nir_op_fexp2:
   case nir_op_fpow:
   case nir_op_fsqrt:
   case nir_op_frsq: {
      set_range_float_value(entry, 0.0f, true);
      entry->type = range;
      break;
   }
   case nir_op_fneg: {
      entry->high = src[0].low;
      entry->low = src[0].high;
      entry->type = src[0].type;
      break;
   }

   /*
    * I've been trying to solve this in some kind of automagical way
    * but there are so many special cases that we might just want to
    * implement all of them "the boring way" as we will probably be
    * able to do something "smart" for most of the opcodes.
    *
    * What may be interesting is to split things in two: One function
    * that handles the ranges that are applied from the operation
    * itself, like sat, sin, cos, sign, +++. A second function to
    * handle the stuff that we calculate or analyze out of the code.
    * This can be useful for sources that are located in a loop, as
    * I don't feel like tackling loops, at least not right now, but
    * marking values that are sure to be saturated as limited between
    * 0 and 1 might still prove useful for the places it is used.
    */
}

/*
 * We may want to return a bool here to signal if
 * there was a change in this phi's value, as that
 * will simplify adding things to work lists. If the
 * phi has changed we need to add its uses to the ssa-def
 * worklist, but if not we can skip it. This might also
 * be interesting to do for the other instruction types,
 * as that should reduce the amount of instructions we re-analyze.
 */
static void
evaluate_phi_instr(nir_phi_instr *phi, value_range_state *state)
{
   lattice_entry *entry = get_lattice_entry(phi.dest.ssa, state);
   bool overdefined = false;
   bool first_range = true;

   nir_const_value low;
   nir_const_value high;

   lattice_entry *src_entry;
   nir_foreach_phi_src(phi, src) {
      src_entry = get_lattice_entry(src->src.ssa, state);

      /*
       * If the block the source is in is not reachable we should not
       * add it to the total phi value as it may never be executed.
       * If it will it will eventually be marked executable,
       * the ssa-defs in the block, along with the phi's, will be processed,
       * and therefore this phi will be revisited, and so will be
       * resolved correctly. This is also the case if the source is
       * marked as undefined. (As they should be if they come from
       * a unreachable block)
       *
       * This could be redundant apart from one special condition:
       * If the source is a constant that we have initially set as
       * constant when initializing the entries. Therefore we will need
       * to check also if the block is reachable, not only if the
       * instruction is set as overdefined.
       */
      if (BITSET_TEST(state->reachable_blocks, src->pred->index) == false ||
          src_entry->type == undefined)
         continue;

      /*
       * If one of the sources is overdefined then we can't compute a
       * a valid range, and so we should mark it as overdefined and
       * bail out before doing more.
       */
      if (src_entry->type == overdefined) {
         overdefined = true;
         break;
      }

      if (src_entry->type == range || src_entry->type == constant) {
         if (first_range) {
            first_range = false;
            for (unsigned i = 0; i < entry->ssa_def->num_components; i++) {
               low.u[i] = src_entry->low.u[i];
               high.u[i] = src_entry->high.u[i];
            }
         } else {
            low = per_component_min(low, src_entry.low, entry->range_type,
                                    entry->ssa_def->num_components);
            high = per_component_max(high, src_entry.high, entry->range_type,
                                     entry->ssa_def->num_components);
         }
      }
   }
}

static void
evaluate_load_const_instr(nir_load_const_instr *instr,
                          value_range_state *state)
{
   lattice_entry *entry = get_lattice_entry(instr.def, state);

   /*
    * If this is already initialized then the value wont have
    * changed so there is no reason to do anything.
    * This is a question of how we want to handle the initialization
    * of the values.
    */
   if (entry->is_initialized)
      return;

   entry->is_initialized = true;
   entry->range_type = nir_type_invalid;
   entry->low = instr->value;
   entry->high = instr->value;
   entry->type = constant;
   entry->ssa_def = instr->def;
}

static void
evaluate_block(nir_block *block, value_range_state *state)
{
   /* Evaluate the possible range of each ssa-def in the block. */
   nir_instr *instr;
   nir_foreach_instr_safe(block, instr) {

      switch (instr.type)
      case nir_instr_type_alu: {
         nir_alu_instr *alu = nir_instr_as_alu(instr);
         assert(alu->dest.dest.is_ssa);

         evaluate_alu_instr(alu, state);
      }
      case nir_instr_type_phi: {
         nir_phi_instr *phi = nir_instr_as_phi(instr);
         assert(phi->dest.is_ssa);

         evaluate_phi_instr(phi, state);
      }
      case nir_instr_type_load_const: {
         nir_load_const_instr *lc = nir_instr_as_load_const(instr);

         evaluate_load_const_instr(lc, state);
      }
      default: {
         /*
          * XXX: we need to handle other users than alu's probably.
          * We can, for now, set that use as overdefined and bail.
          * Uniform loads may be of interest?
          */
      }
   }
}

static void
mark_block_overdefined(nir_block *block, value_range_state *state)
{
   /*
    * For each instruction in block mark the instruction
    * as overdefined and add to the start of the ssa-def list.
    *
    * This will not be perfectly correct; we should not mark
    * values as overdefined if they have a modifier like sat or abs,
    * or it is some special instruction like sin, cos etc that limits range.
    */
   nir_instr *instr;
   nir_foreach_instr_safe(block, instr) {

      switch (instr.type)
      case nir_instr_type_alu: {
         nir_alu_instr *alu = nir_instr_as_alu(instr);
         assert(alu->dest.dest.is_ssa);

         lattice_entry *entry = get_lattice_entry(alu.dest.dest.ssa, state);

         // Set as overdefined. If this is a change in state, add to worklist.
         if (set_as_overdefined(entry, entry->range_type))
            nir_ssa_def_worklist_push_head(state->ssa_worklist,
alu->dest.dest.ssa);

         // XXX: It is possible here to still set the range at least partly.
         // Some of the instructions, like sat and abs have defined ranges even
         // though they are in a loop.
      }
      case nir_instr_type_phi: {
         nir_phi_instr *phi = nir_instr_as_phi(instr);
         assert(phi->dest.is_ssa);

         lattice_entry *entry = get_lattice_entry(phi.dest.ssa, state);

         // Set as overdefined. If this is a change in state, add to worklist.
         if (set_as_overdefined(entry, entry->range_type))
            nir_ssa_def_worklist_push_head(state->ssa_worklist, phi->dest.ssa);

      }
      default: {
               // XXX: we need to handle other users than alu's probably.
               // We can, for now, set that use as overdefined and bail.
      }
   }
}


/*
 * Coordinates finding the uses of the ssa_def corresponding
 * to the entry and sticking them in the ssa_worklist.
 * Should be run on every lattice_entry that we change the
 * lattice_information of. The evaluate_xxx functions can return
 * bool if they change anything, thereby making it easy to decide
 * if we should run this function or not.
 */
static void
coordinate_uses(lattice_entry *entry, value_range_state *state)
{
   struct set_entry *set_entry;
   set_foreach(entry->ssa_def->uses, entry) {
      nir_instr *instr = (nir_instr *)set_entry.key;

      /*
       * Also, if the block has not been marked as reachable yet then
       * don't bother adding the use to the list. That will be done when
       * the block has been determined to be reachable.
       */
      if (is_block_reachable(instr->block, state) == false)
         continue;

      switch (instr.type)
      case nir_instr_type_alu: {
         nir_alu_instr *alu = nir_instr_as_alu(instr);
         assert(alu->dest.dest.is_ssa);

         /*
          * If it is overdefined we want to push it to head of the list.
          * That way we will push those down the tree fast,
          * so we don't process them multiple times.
          */
         if (entry->type == overdefined) {
            nir_ssa_def_worklist_push_head(state->ssa_worklist,
                                           alu->dest.dest.ssa);
         } else {
            nir_ssa_def_worklist_push_tail(state->ssa_worklist,
                                           alu->dest.dest.ssa);
         }
      }
      case nir_instr_type_phi: {
         nir_phi_instr *phi = nir_instr_as_phi(instr);
         assert(phi->dest.is_ssa);

         if (entry->type == overdefined) {
            nir_ssa_def_worklist_push_head(state->ssa_worklist,
                                           phi->dest.ssa);
         } else {
            nir_ssa_def_worklist_push_tail(state->ssa_worklist,
                                           phi->dest.ssa);
         }
      }
      default: {

         // XXX: we need to handle other users than alu's probably.
         // We can, for now, set that use as overdefined and bail.
      }
   }

   set_foreach(entry.ssa_def->if_uses, entry) {
      /*
       * If the condition was used for an if then we should
       * do something about the if to "push our range" into the
       * then and else branch of the if by inserting a copy
       * in each of the blocks where we apply the range implied by the
if-statement.
       * The then branch would have the range that would cause true,
       * and the potential else branch would have the range that
       * would cause a false to be set.
       *
       * How the conditional statement looks is something completely
       * different, and how we should handle this in a nice and
       * controlled manner is beyond my comprehension with the
       * level of coffee that has currently been consumed.
       *
       * We should also make sure we add one, the other, or both branches to
       * the block worklist, as is implied by the if-statement.
       * It could be done here, but we might just as well do it in the
       * main loop where we loop over the block list, as we will
       * still need to handle other cases there.
       */

      is_entry_true();
      is_entry_false();
   }
}

static bool
nir_opt_value_range_impl(nir_function_impl *impl)
{
   /*
    * We might want to run a pass to insert "pi-nodes" into the
    * ssa-tree before running the pass. This is esentially just
    * a self-assignment x2 = x1 that is used to have something
    * to "store" the range implied by things like if's.
    * This will also lead to a need of inserting more phi-nodes,
    * as one gets variables that diverge and then converge.
    *
    * x1 = ....; [-unlimited, unlimited]
    * if (x1 < 7)
    *    x2 = x1; [-unlimited, 7]
    *    |
    *    |
    * else
    *    x3 = x1; [7, unlimited]
    *    |
    *    |
    * x4 = phi(x2, x3);
    *
    */

   /*
    * Start with initializing things.
    * Mark the first block as executable.
    * Then we should run a pass over all lattice entries,
    * marking all of them as undefined.
    * All blocks should be marked as unreachable.
    */


   value_range_state *state;
   lattice_entry *entries = rzalloc_array(state.mem_ctx, struct lattice_entry,
                                          impl.ssa_alloc);

   state->impl = impl;
   state->entries = entries;
   nir_block_worklist_init(state->block_worklist, impl->num_blocks,
state->mem_ctx);
   nir_ssa_def_worklist_init(state->ssa_worklist, impl->ssa_alloc,
state->mem_ctx);

   nir_foreach_block(impl, initialize_block, state);

   nir_block_worklist_push_head(state->block_worklist, impl->start_block);

   /*
    * XXX: initialize the unreachable block bitset.
    */

   /*
    * Process the work lists until they are empty.
    */
   while (state->block_worklist->count > 0 ||
          state->ssa_worklist->count > 0) {

      /*
       * Process the instruction work list
       * If this should go before the block list or not
       * can be debated, but that's easy enough to change.
       * Probably the block work list should go first, as
       * is done in the paper (XXX: Is it?)
       *
       * This probably doesn't do things exactly like in the paper.
       * Instead of storing the instructions that have changed, and
       * going through each user we are instead adding to the list
       * all the users of instructions that have changed. In practice
       * there is no difference, apart from looping across each use is
       * moved out to a separate function.
       *
       * Overdefined values are added to the list at the head, while
       * others are added to the tail. That way we converge all the
       * overdefined instructions faster, and so will reduce the runtime
       * of the pass by not calculating ranges that will later be
       * determined to be overdefined. Since we should not treat an
       * overdefined instruction as endlessly overdefined this may
       * actually prove to be a wrong assumption. This should
       * be checked further, but it should not hurt and so is
       * probably fine to leave as is for now.
       */
      while (state->ssa_worklist->count > 0) {
         /*
          * All instructions in the list are here because
          * we got new information about the range of an operand.
          * If the instruction is overdefined we don't need to process
          * it as it has reached the "lowest status", and therefore
          * there should be no way it can be elevated again.
          * Should probably ensure that this logic actually ends up being true?
          * It may not, for things like "and", "or", or "min, max".
          * If it is range or constant, process the value.
          */
         nir_ssa_def *def;
         nir_ssa_def_worklist_pop_head(state.ssa_worklist, def);

         /*
          * If this belongs in a loop we don't want to do anything.
          * Setting the instruction to overdefined will be
          * handled when the loop is added to the block worklist.
          * XXX: This check is probably all wrong. Need to do someting
          * more complicated to detect this (note to self)
          */
         if (def->parent_instr->block->cf_node.type == nir_cf_node_loop)
            continue;

         lattice_entry *entry = get_lattice_entry(def, state);
         if (!entry->type == overdefined) {
            /*
             * Run the evaluate_expression function.
             * Can add a helper here, that checks if it is alu,
             * phi, const, or other, or we can have that switch-case here.
             */
         }

         /*
          * If the result of the evaluation differs from what it was initially
          * (if we have found anything new) add all uses of the expression to
          * the ssa work list.
          *
          * This should be done with the coordinate_uses function.
          *
          * If it is a  conditional statement we should add
          * the edge that the condition "points to". If the condition is set
          * to indeterminate we should add all the edges going out from the
          * conditional statement to the block work list. This may be done
          * here, in the coordinate_uses function, or somewhere else.
          * XXX: Note to self, figure out where we want to "solve" branches.
          */
      }

      /*
       * Process the basic block work list.
       */
      while (state->block_worklist->count > 0) {
         nir_block *block;
         nir_block_worklist_pop_head(state->block_worklist, block);

         /* XXX: If block is already reachable then we should revisit the
          * phi-nodes of the block, as we have come into this block
          * from a new direction. Or maybe we shouldn't, as that should
          * have already been done by the updates of the corresponding
          * sources, so we should not do anything
          *
          * Since we have our "coordinate_uses" function that also
          * handles phi functions we can just skip the block if it is
          * already set as reachable, as the phi's will get automagically
          * added to the ssa-def-worklist as they are users of the defs.
          */
         if (is_block_reachable(block, state))
            continue;

         /* Block has been determined to be reachable, mark it */
         mark_block_reachable(block, state);

         /* XXX: We don't yet handle loops. Just mark as overdefined
          * This is not perfectly correct. We want to mark as overdefined
          * everything except constants, or saturates, or other crazy stuff
          * that we for sure know at least part of the range of. We might
          * want to split the range detection in two parts. One that
          * "calculates" the ranges, and one that sets the ranges based on
          * the ranges that are implied by the operation. This in practice
          * means that we have a separate function for setting ranges based
          * on the operations inherit range, or also if it is a constant,
          * and one function for doing the "calculation" on the stuff that
          * we want to calculate the ranges of. This is something that should
          * be investigated and thoroughly thought through.
          */
         if (block->cf_node.type == nir_cf_node_loop) {
            mark_block_overdefined(block, state);
            continue;
         }

         /* XXX: Evaluate all phi's and expressions of the block. */
         evaluate_block();

         /*
          * If this block has only one successor then add it to the worklist.
          */
         if ((block->successors[0] != NULL) &&
             (block->successors[1] == NULL)) {
            nir_block_worklist_push_tail(state->block_worklist,
                                         block->successors[0]);
            continue;
         }

         // XXX: This should never occur, so can be removed
         // its left here for future reference
         if ((block->successors[0] == NULL) &&
             (block->successors[1] != NULL)) {
            nir_block_worklist_push_tail(state->block_worklist,
                                         block->successors[1]);
            continue;
         }

         /*
          * Evaluate conditional for which block to visit, or both
          * and add the reachable ones to the block worklist.
          */
         nir_if *if_statement = nir_block_get_following_if(block);

         if (if_statement) {
            nir_ssa_def *condition = if_statement->condition.ssa;
            lattice_entry *entry = get_lattice_entry(condition, state);

            if (is_entry_true(entry)) {
               // XXX: Need to check if this is if, loop, or block.
               // But it should be a block, shouldn't it?
               block_worklist_add_block(if_statement->then_list.head);
               continue;
            }

            if (is_entry_false(entry)) {
               // XXX: Need to check if this is if, loop, or block.
               // But it should be a block, shouldn't it?
               block_worklist_add_block(if_statement->else_list.head);
               continue;
            }

            if (is_entry_overdefined(entry)) {
               block_worklist_add_block(if_statement->then_list.head);
               block_worklist_add_block(if_statement->else_list.head);
               continue;
            }

            assert(get_lattice_entry(condition, state)->type != undefined);
         }

         /*
          * If the above failed we have ended up in a block that is either
          * the last cf_node, or it is an endless loop. The case with
          * the block being the last node is easy enough to test for,
          * but how we're gonna deal with an endless loop is a totally
          * different question. Probably endless loops are not really
          * all that common, and probably shouldn't occur?
          */
         if (nir_cf_node_is_last(&block->cf_node)) {
            /*
             * This is the last node. This probably doesn't mean that
             * the pass is done with its job of analyzing.
             */
         }
      }
   }

   /*
    * We can now traverse over blocks and delete those that
    * are still marked as unreachable. If we delete a basic block
    * we need to first rewrite the phi's that use the results from
    * the BB. This is tricky. One could also traverse all the phi's
    * and check for sources that are still undefined. This might
    * however not detect constants that are unused, if we decide
    * to initialize all the constants before running the pass.
    * This may however not be without issues.
    * The following is an excerpt from LLVM SCCP:
    *
    *  "ResolvedUndefsIn - While solving the dataflow for a function, we assume
    *   that branches on undef values cannot reach any of their successors.
    *   However, this is not a safe assumption.  After we solve dataflow, this
    *   method should be use to handle this.  If this returns true, the solver
    *   should be rerun. "
    */

   /*
    * Remove the block if it is possible.
    * Just make sure we rewrite the phi's first.
    *
    * XXX: We shouldn't actually do this. We should leave
    * it in a state where it is simple for Connors dead
    * control flow pass to find that it is dead and let that
    * fix it up for us. This in practice means leaving the
    * condition for the if statement as a boolean true/false.
    */
   nir_foreach_block(state->impl, remove_block_if_possible, state);

   /*
    * Check for all values that are proved to be constant,
    * and replace them with their constant instruction counterpart.
    */
   for (unsigned i = 0; i < state->impl->ssa_alloc; i++) {
      lattice_entry *entry = state.entries[i];
      if (entry->type == constant) {
         /*
          * This entry has been determined to be constant.
          * Let's solve it and replace its uses with the new
          * ssa_def that we have made.
          */
      }
      if (entry->can_be_predetermined) {
         /*
          * We have found that this entry can be predetermined.
          * However it is not constant. This calls for a bit more
          * difficult solving of the expression. Probably.
          * This needs some more thought on how to handle it.
          * Things like min/max with ranges that do not intersect
          * may end up here. Also things that can be determined due to sat,
          * and things that are known to be useless.
          */
      }
   }

   /*
    * It might be interesting to do simultaneous updating of ssa and analysis.
    * That might prove to be hard or infeasible. Another option is to
    * mark min/max instructions that we know the direction of to be
    * constant, but not constant as in value but rather solved. We
    * can then check if the expression is actually constant, or if
    * it is instead just "possible to determine", and then do the
    * elimination of the dead max instruction and rewriting the
    * uses and src'es after the pass has been run.
    */
}

bool
nir_opt_value_range(nir_shader *shader)
{
   bool progress = false;
   nir_foreach_overload(shader, overload) {
      if (overload->impl && nir_opt_value_range_impl(overload->impl))
         progress = true;
   }

   return progress;
}
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to