Planned vectorization enhancements for 4.2:

1. Recognize reduction patterns (Dorit).
Some computations have specialized target support and can be vectorized more efficiently if the computation idiom is recognized and vectorized as a whole. This is especially true of idioms that involve multiple types: unless the entire pattern is recognized, multiple types require packing/unpacking of vector elements. Examples of such patterns are summation into a result wider than the arguments ("widening sum"), dot product, sum of absolute differences, and more.
This project will include:
(1) a pattern recognition engine, to be used for patterns that the vectorizer can benefit from;
(2) functions to recognize reduction patterns;
(3) an extension of the current reduction support to handle reduction patterns;
(4) more patterns that are not related to reduction, e.g. saturation.
* Delivery Date: Stage 1 of 4.2. Most of the above is already implemented, and most of that is already in autovect-branch.
* Benefits: More loops vectorized.

2. Vectorize interleaved data (Ira).
Currently the vectorizer supports only computations with stride 1 (consecutive data elements). Some important computations access data with a stride other than 1. One example is complex data with the real and imaginary parts interleaved, where the stride is 2. We want to extend the vectorizer to support these computations. For that we will also need to introduce new tree-codes/optabs.
* Delivery Date: Stage 2 of 4.2.
* Benefits: More loops vectorized.

3. Vectorize in the presence of multiple data types (Dorit).
Currently the vectorizer supports only loops that operate on a single data type. In particular, the vectorizer doesn't support type casts, which in vectorized form require packing/unpacking of data elements between vectors. We want to extend the vectorizer to handle type conversions. This will require introducing some of the new tree-codes/optabs we discussed last year.
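As a rough illustration of the kind of loop this item is about, the sketch below shows two scalar loops whose bodies contain a type cast. The function names and the 128-bit vector width mentioned in the comments are assumptions made for this example only; the plan itself does not prescribe them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: 16-bit inputs, 32-bit outputs.
   Assuming 128-bit vectors, one vector holds 8 int16_t elements but only
   4 int32_t elements, so a vectorized form has to unpack each input
   vector into two output vectors. */
void widen_copy(int32_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int32_t)src[i];   /* the cast is the type conversion */
}

/* Hypothetical example of the opposite direction: under the same
   assumption, two vectors of int32_t are packed into one vector of
   int16_t. Values outside the int16_t range are converted in an
   implementation-defined way. */
void narrow_copy(int16_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)src[i];
}
```

Both loops operate on two element sizes at once, which is the case a single-data-type vectorizer cannot handle.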
* Delivery Date: Stage 2 of 4.2.
* Benefits: More loops vectorized.

We are not sure when the rest of the items will be ready, or whether they'll make it for 4.2, but they are high on our todo list:

4. Vectorization of induction (Dorit).
The vectorizer currently doesn't support vectorization of induction, e.g. a[i] = i. We want to extend the vectorizer to handle such computations. We already have some of the required steps implemented as part of the reduction support.
* Delivery Date: unknown
* Benefits: More loops vectorized.

5. Versioning for aliasing (Dorit/Ira).
It is often difficult or impossible to prove that two data-references in the loop don't overlap (e.g. when they are accessed using pointers). It is still possible to vectorize such loops using runtime dependence checks, much like the runtime alignment checks that were recently committed to mainline. That is, use loop versioning and guard the vectorized version with a runtime aliasing test.
* Delivery Date: unknown
* Benefits: More loops vectorized.

6. Cost model (Dorit/Ira).
We are currently vectorizing whenever we can. This can hurt performance, for example if the loop is very short, because of the overheads involved in vectorization (e.g. alignment handling, loop peeling, and epilog code for reduction). We also need the cost model to decide how to vectorize. For example, there are different ways we can handle alignment (versioning, peeling, misaligned vector accesses). We want to estimate the costs involved in vectorization, and decide on that basis whether to vectorize, and how.
* Delivery Date: unknown
* Benefits: Improved performance when vectorizing.

7. Misaligned stores (Dorit/Ira).
We currently don't handle misaligned stores. Instead we peel the loop to force the alignment of the store. This works only for one misaligned store; if there's more than one misaligned store and we can't prove that all the stores in the loop have the same misalignment, we can't vectorize the loop.
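The limitation just described can be illustrated with a small scalar loop. This is only a sketch: the function name is hypothetical, and the comments assume 16-byte vectors and that both arrays start at the same alignment.

```c
#include <stddef.h>

/* Hypothetical example with two stores per iteration.
   Assuming a and b start at the same 16-byte alignment, the addresses
   &a[i] and &b[i + 1] always differ by one float in their misalignment.
   Peeling a few iterations can align one of the two stores, but never
   both at once. The caller must provide n + 1 elements in b. */
void two_stores(float *a, float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i]     = 1.0f;   /* aligned when i is a multiple of 4 */
        b[i + 1] = 2.0f;   /* aligned when i + 1 is a multiple of 4 */
    }
}
```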
We want to add the capability to vectorize misaligned stores.
* Delivery Date: unknown
* Benefits: More loops vectorized.

Personnel
* Dorit Nuzman
* Ira Rosen

Dependencies
None.

Modifications Required
All modifications are local to the vectorizer pass, except for adding new tree-codes and optabs for the new patterns and misaligned stores.

dorit