
Tobias Grosser wrote:
Sure. The same holds for the polyhedral representation in Graphite. It should also allow us to compose and generalize arbitrary transformations on the polyhedral data structures.
However, I do not remember that we specified any 2d+1 encoding in our data structures. So at the moment we are more general, and as you might remember from early discussions, it is very important to me to keep this until we have a discussion where we agree to trade off generality for performance.
Yes, this is a good thing to do. We should keep this generality as much as possible. Some parts of the API may require stronger assumptions, however.
Ideally we would have an example, a compile-time problem, and a discussion where we agree that adding an assumption like 2d+1 will improve algorithmic complexity and that this improvement is required in real-world code. This information should then be part of graphite-poly.h.
It is probably best to limit the assumption/restriction to a subset of the API, and to document it in the header files.
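For reference, here is a minimal sketch of what a 2d+1 encoding would assume for a statement at depth two: the schedule vector interleaves constant "beta" levels with the loop iterators, (b0, i, b1, j, b2). The beta values and the print_schedule helper below are made up for illustration; they are not Graphite data structures.

/* Illustrative sketch of the 2d+1 form for a statement S nested in
   two loops: its schedule vector is (beta[0], i, beta[1], j, beta[2]).  */
#include <stdio.h>

#define DEPTH 2

/* Betas chosen arbitrarily: S is the first statement at every level.  */
static const int beta[DEPTH + 1] = { 0, 0, 0 };

static void
print_schedule (int i, int j)
{
  printf ("S(%d,%d) -> (%d, %d, %d, %d, %d)\n",
          i, j, beta[0], i, beta[1], j, beta[2]);
}

int
main (void)
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      print_schedule (i, j);
  return 0;
}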
Sure. The idea I have of Graphite is that we have a set of transformations that all analyze GPOLY and try to improve it. However, "improve" does not go in only one direction. For example, there might be passes trading dependencies for memory usage: they reduce memory usage but add dependencies, or remove dependencies at the cost of more memory.
Yes, like the expansion/contraction tradeoff.
I would like to have a clever scheduler (or some database) deciding which passes should be applied. E.g. first we trade some memory for fewer dependencies, then we try to expose more parallelism, maybe do some interchange to enable fusion and fission, and finally parallelize everything.
To allow this free ordering we have to ensure that every pass is able to analyze GPOLY to see whether it can apply, e.g. interchange. So, for example, after strip mining it might still be useful to do some more transformations to enable vectorization. If the vectorization pass cannot analyze the GPOLY once strip mining has been applied, this will not work.
This is a good argument.
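As a toy illustration of deciding applicability purely from the polyhedral information, the following sketch checks interchange legality of a 2-deep nest from per-dependence direction vectors. The dependence struct and the check are assumptions made for the example, not Graphite code.

/* A pass decides applicability from the polyhedral data only: here a toy
   interchange check inspects direction vectors instead of GIMPLE.
   Swapping the two levels is rejected if it would turn a lexicographically
   positive dependence into a negative one.  */
#include <stdbool.h>
#include <stdio.h>

struct dependence { int dir[2]; };  /* direction per loop level */

static bool
interchange_is_legal (const struct dependence *deps, int n)
{
  for (int k = 0; k < n; k++)
    {
      int d0 = deps[k].dir[1];  /* levels swapped */
      int d1 = deps[k].dir[0];
      /* The swapped vector must stay lexicographically non-negative.  */
      if (d0 < 0 || (d0 == 0 && d1 < 0))
        return false;
    }
  return true;
}

int
main (void)
{
  struct dependence deps[] = { { { 1, -1 } }, { { 0, 1 } } };
  printf ("interchange legal: %s\n",
          interchange_is_legal (deps, 2) ? "yes" : "no");
  return 0;
}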
So what I want is:
1. We have one representation, GPOLY, on which we do our optimizations.
2. Every optimization pass works only on GPOLY.
   2.1 Every pass has to be able to read GPOLY and to generate GPOLY.
   2.2 Every pass has to decide only by looking at and analyzing GPOLY whether it can be applied.
3. Optimizing a program is applying a lot of passes on GPOLY to generate better scattering and/or better data accesses.
4. Graphite will read GPOLY and, following the description in GPOLY, generate the GIMPLE code.
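A hypothetical sketch of that workflow (the gpoly and gpoly_pass types and the toy passes below are invented for illustration and do not correspond to Graphite's API):

/* Every pass reads and produces GPOLY (2.1), decides applicability by
   analyzing GPOLY only (2.2), many passes are applied in sequence (3),
   and code generation happens once at the end (4).  */
#include <stdbool.h>
#include <stdio.h>

struct gpoly              /* stands in for the real polyhedral representation */
{
  bool has_neg_dependence;
  bool strip_mined;
};

struct gpoly_pass
{
  const char *name;
  bool (*can_apply) (const struct gpoly *);  /* 2.2: analysis of GPOLY only */
  void (*apply) (struct gpoly *);            /* 2.1: GPOLY in, GPOLY out */
};

static bool interchange_ok (const struct gpoly *g) { return !g->has_neg_dependence; }
static void interchange (struct gpoly *g) { (void) g; /* nothing to update in this toy model */ }
static bool strip_mine_ok (const struct gpoly *g) { return !g->strip_mined; }
static void strip_mine (struct gpoly *g) { g->strip_mined = true; }

int
main (void)
{
  struct gpoly g = { false, false };
  const struct gpoly_pass passes[] = {
    { "interchange", interchange_ok, interchange },
    { "strip-mine", strip_mine_ok, strip_mine },
  };
  const int n = (int) (sizeof passes / sizeof passes[0]);

  for (int k = 0; k < n; k++)                /* 3: apply many passes */
    if (passes[k].can_apply (&g))
      {
        printf ("applying %s\n", passes[k].name);
        passes[k].apply (&g);
      }

  printf ("generating GIMPLE from the final GPOLY\n");  /* 4 */
  return 0;
}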
It might be possible that this will not work or is too slow. If there is a limitation in GPOLY, the solution I see is to extend GPOLY officially (e.g. to handle arbitrary conditions, allow changing the data accesses, ...).
Sounds good.
Of course, analyzing GPOLY does not necessarily imply recomputing the dependence polyhedra.
But I agree it should always be possible to re-analyze dependence polyhedra.
Without an exact Z-polyhedral envelope, we'll never guarantee that we do not lose key information...
Unfortunately, strip-mining will ALWAYS be one of the worst transformations in terms of information loss on a rational envelope, whatever encoding you can think of. The only workaround is to delay it as much as possible. For example, Pluto does not effectively strip-mine until the whole transformation heuristic (finding hyperplanes and fusion/distribution options) has completed.
Parametric tiling approaches remove strip-mining from the polyhedral representation altogether, but they have limitations; see Renganarayanan et al. (multidimensional tiling for the price of one, PLDI'07, SC'07) and Hartono et al. (ICS'09). We can discuss it later; this is still too incomplete.
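To make the information-loss point concrete, here is a hand-written sketch of strip-mining by a block size of 4 (the loop, bounds and block size are made up): the new constraint 4*ii <= i <= 4*ii + 3 puts a non-unit coefficient on a variable, which is exactly the situation a rational envelope handles poorly, unlike a large constant that only appears in the constant part of a constraint.

/* Strip-mining i by a block size of 4.  The interesting part is the
   shape of the constraints added to the polyhedral representation:
   4*ii <= i <= 4*ii + 3, i.e. a coefficient on a variable.  */
#include <stdio.h>

#define N 10
#define B 4

int
main (void)
{
  /* Original loop: for (i = 0; i < N; i++) S(i);  */
  for (int ii = 0; ii * B < N; ii++)                      /* block loop */
    for (int i = B * ii; i <= B * ii + B - 1 && i < N; i++) /* point loop */
      printf ("S(%d)\n", i);
  return 0;
}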
I thought the mail I was referring to was about putting restrictions on parameters in the context of CLooG. As we know that a parameter "k" is of type unsigned integer, we know that the constraint 0 <= k <= MAX_INT is always true. By adding this constraint to the context we allow CLooG to remove some conditions, as the generated code can assume that k is within these limits.
This should be quite good in general: MAX_INT-like constants only occur in the affine constant part, not as coefficients of variables. They allow cutting branches in complex unions of polyhedra and simplify the computation, without adding overhead or information loss. If those constants occur as coefficients, as in the strip-mining case, it hurts a lot (performance-wise and precision-wise).
If I understood your comments correctly, you warned that adding large constants like MAX_INT might be bad for compile-time performance. That is something I did not understand; as adding the constraint to the context improves the code generated by CLooG, I would like to keep adding these constraints as long as compile time allows.
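As a hand-written illustration (not actual CLooG output; the parameter and statement names are invented): once the context records 0 <= k, a guard that only restates that fact can disappear from the generated code.

/* Suppose one statement's domain is { i : 0 <= i < k } and another is
   only executed when k >= 0.  Without any knowledge about k, the
   scanning code has to keep the guard on k; with 0 <= k in the context,
   the guard is provably true and can be dropped.  */
#include <stdio.h>

static void S1 (long i) { printf ("S1(%ld)\n", i); }
static void S2 (void)   { printf ("S2\n"); }

/* k unconstrained: the condition on k has to be emitted.  */
static void
scan_without_context (long k)
{
  for (long i = 0; i < k; i++)
    S1 (i);
  if (k >= 0)
    S2 ();
}

/* Context contains 0 <= k: the condition is always true.  */
static void
scan_with_context (unsigned long k)
{
  for (unsigned long i = 0; i < k; i++)
    S1 ((long) i);
  S2 ();
}

int
main (void)
{
  scan_without_context (3);
  scan_with_context (3);
  return 0;
}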
Albert