
On Mon, 2009-07-13 at 12:07 +0200, Albert Cohen wrote:
Tobias Grosser wrote:
On Fri, 2009-07-10 at 23:57 +0200, Albert Cohen wrote:
Hi Tobias.
Tobias Grosser wrote:
Stop. Do I understand this correctly: you do not want to check dependencies after strip mining? I am very concerned about that, as I thought one of the benefits of working on the polytope model is that we can schedule transformations in an arbitrary order. So it is required to reanalyze strip-mined loops. This is also necessary for the dependency checking we use for autopar.
As presented in Austin in November, it is much preferable to follow the transformation composition approach of Girbal and Vasilache that does not require intermediate steps of dependence analysis and legality checking.
You are referring to "Facilitating the search for compositions of program transformations"?
Yes. The 2d+1 encoding and the additional invariants/rules allow us to compose and generalize classical transformations on a polyhedral representation that may have diverged arbitrarily far from the original syntax tree.
Sure. The same with the polyhedral representation in Graphite. It also should allow us to compose and generalize arbitrary transformations on the polyhedral data structures.
However, I do not remember that we specified any 2d+1 encoding in our data structures. So at the moment we are more general, and as you might remember from early discussions it is very important for me to keep this generality until we have a discussion where we agree to trade it off for performance. Ideally we would have an example, a compile time problem, and a discussion where we agree that adding an assumption like 2d+1 will improve algorithmic complexity and that this improvement is required for real world code. This information should then become part of graphite-poly.h.
(...)
Konrad is working on Step 2.1 (enhancing strip-mine to update the dependence info). Then, if you agree, he will help restructure the API such that dependence computation and dependence checking are ALWAYS called explicitly, rather than implicitly in the transformation primitives.
Sure. I would like us to be even able to call them after any single transformation. A loop like the following would be nice for the optimizations:
while (select_and_apply_transform (scop))
  check_dependencies ();
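A slightly expanded sketch of such a driver (all names below are placeholders, not existing Graphite API; the point is only that the dependence check sits outside the transformation primitives and can be compiled out of the production compiler):

    /* Apply one polyhedral transformation at a time and, in checking
       builds only, verify the dependences after every single step.  */
    static void
    apply_poly_transforms (scop_p scop)
    {
      while (select_and_apply_transform (scop))
        {
    #ifdef ENABLE_GRAPHITE_CHECKING
          /* Expensive: recompute the dependence polyhedra and make the
             nightly testers fail loudly if the last step was illegal.  */
          if (!check_dependencies (scop))
            internal_error ("graphite: illegal transformation applied");
    #endif
        }
    }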
Yes. Skipping some checks requires stronger technology to guarantee that transformation steps will be legal or will eventually be legal/checked, or even corrected/completed via enabling transformations (see Vasilache's PACT'07 paper).
As a concrete, more original example: together with Razya and Konrad, we are thinking about enabling parloops and interchange with an automatic privatization "corrective step". We would ignore memory-based dependences when looking for interchange combinations that improve both parloops and autovect (using an extended cost model), then check which ones were eventually violated in the selected transformation sequence, and eventually privatize only those scalars/arrays that need to be privatized to effectively remove those memory-based dependences.
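To make the corrective privatization idea concrete, here is a small, made-up C example (not from the discussion): the shared scalar t carries memory-based (anti/output) dependences across iterations of i, which block parallelization; giving each iteration its own copy removes exactly those dependences and leaves only the true data flow.

    /* Before: the single scalar t is reused by every iteration of i,
       creating memory-based dependences that forbid running the i loop
       in parallel.  */
    void
    before (int n, int m, int A[n][m], int B[n][m], int C[m])
    {
      int t;
      for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
          {
            t = A[i][j] * 2;
            B[i][j] = t + C[j];
          }
    }

    /* After corrective privatization: each iteration owns its copy of t,
       so only data-flow dependences remain and the i loop can be handed
       to parloops.  */
    void
    after (int n, int m, int A[n][m], int B[n][m], int C[m])
    {
      for (int i = 0; i < n; i++)      /* parallelizable */
        for (int j = 0; j < m; j++)
          {
            int t = A[i][j] * 2;
            B[i][j] = t + C[j];
          }
    }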
That sounds interesting. We also thought about memory optimizations, so I would like the poly_drs to be not read-only, but read/write. This means that we can analyze the data accesses in the polyhedral model, change them directly in the polyhedral representation of the data accesses, and Graphite will take care of creating the required GIMPLE code.
This should allow a lot of nice optimizations, like reducing the memory footprint if only a part of an array is written to, or allowing more parallelism by trading memory for fewer dependencies.
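As an illustration of the kind of data-access rewriting meant here (hypothetical, not an existing Graphite transformation): when only one element of a temporary array is live at a time, rewriting the accesses lets the array be contracted to a scalar, which shrinks the memory footprint.

    /* Before: tmp is a full array although each tmp[j] is dead right
       after it is consumed in the same iteration.  */
    void
    before (int n, int a[n], int b[n])
    {
      int tmp[n];
      for (int j = 0; j < n; j++)
        {
          tmp[j] = a[j] * a[j];
          b[j] = tmp[j] + 1;
        }
    }

    /* After rewriting the data accesses: the array is contracted to a
       scalar, reducing the footprint from n integers to one.  */
    void
    after (int n, int a[n], int b[n])
    {
      for (int j = 0; j < n; j++)
        {
          int tmp = a[j] * a[j];
          b[j] = tmp + 1;
        }
    }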
I know this will be expensive and we want to disable this in the production compiler. But I want us to be able to use this loop for debugging.
Sure.
I totally agree that it is useful to avoid polyhedral computations, as they tend to be expensive. However, I still want to be able to do CHECK and ANALYSIS at every single step, as it will show us the assumptions we made to optimize compile time. If we make any assumptions (which might be necessary), I would like them not to be hidden by avoiding the dependency ANALYSIS; I want our (nightly) tests to fail.
This will allow an analysis/discussion of the assumptions we made. Maybe we can avoid them; if not, we know we have to document them. Every assumption has to be written down in a well-known place so that future developers will not be surprised.
OK. But running the ANALYSIS itself after transformation makes little sense to me, even in debugging mode. Can you explain why?
Sure. The idea I have of Graphite is that we have a set of transformations that all analyze GPOLY and try to improve it. However, "improve" does not go in only one direction. For example there might be passes trading dependencies for memory usage: they improve memory usage but add dependencies, or they remove dependencies but increase memory usage.
I would like to have a clever scheduler (or some database) deciding which passes should be applied. E.g. first we trade some memory for fewer dependencies, then we try to expose more parallelism, maybe do some interchange to enable fusion and fission, and finally parallelize all the stuff.
To allow this free ordering we have to ensure that every pass is able to analyze GPOLY to see whether it can apply, e.g., an interchange. So, e.g., after strip mining it might still be useful to do some more transformations to allow vectorization. If the vectorization pass cannot analyze GPOLY after strip mining has been applied, this will not work.
So what I want is:
1. We have one representation, GPOLY, on which we do our optimizations.
2. Every optimization pass works only on GPOLY.
   2.1 Every pass has to be able to read GPOLY and to generate GPOLY.
   2.2 Every pass has to decide only by looking at and analyzing GPOLY whether it can be applied (see the sketch below).
3. Optimizing a program means applying a lot of PASSES on GPOLY to generate better scattering and/or better data accesses.
4. Graphite will read GPOLY and, following the description in GPOLY, generate the GIMPLE code.
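A minimal sketch of what such a pass interface could look like under rules 2.1 and 2.2; all type and function names here are hypothetical, not existing Graphite code:

    /* Hypothetical interface for a GPOLY-only optimization pass.  A pass
       may inspect nothing but the polyhedral representation to decide
       whether it applies (rule 2.2) and may produce nothing but a new
       GPOLY (rule 2.1).  */
    struct gpoly_pass
    {
      const char *name;

      /* Analyze GPOLY and return true if the pass can be applied.  */
      bool (*can_apply) (scop_p scop);

      /* Rewrite the scattering and/or the data accesses of GPOLY.
         Return true if something changed.  */
      bool (*apply) (scop_p scop);
    };

    /* The scheduler (or database) mentioned above would walk a table of
       such passes, consult its cost model, and call apply () on the most
       promising candidate until no pass improves GPOLY further.  */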
It might be possible that this will not work or is too slow. If there is a limitation in GPOLY, the solution I see is to extend GPOLY officially (e.g. to handle arbitrary conditions, to allow changing the data accesses, ...).
Another thing I do not understand is why you are often referring to "large constants". I thought complexity problems mainly appear because of a high number of dimensions, not because of high values in an (in)equality. Can you explain this to me?
As mentioned several times, big constants tend to break the "structure" of ILP problems or other polyhedral operations, leading to combinatorial explosions. This is well known even in the case of simple algorithms like Fourier-Motzkin. They also increase the risk of memory overflow.
OK. I definitely believe you, but as this seems to become one of the important points for Graphite performance I would like to understand it well. Maybe we can also write a Wiki page to explain which constructs should be avoided. We already learned that NNC_Polyhedron is bad, but what else is dangerous?
Here I am still not getting the problem. I looked again into Fourier-Motzkin and at first glance I could not find anything in its complexity that depends on the values in the (in)equalities. Any pointer or example that might help me understand this problem would be really helpful.
FM blows up when it generates too many inequalities. To avoid this, people generally favor eliminations that generate 0 or 1 inequality first.
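A small worked example of that blow-up (standard Fourier-Motzkin behaviour; the bounds are made up for illustration): eliminating a variable that appears in p lower bounds and q upper bounds replaces those p + q inequalities by p * q combined ones.

    Eliminate x from a system with p = 3 lower and q = 2 upper bounds:

      x >= a1,  x >= a2,  x >= a3
      x <= b1,  x <= b2

    Every (lower, upper) pair yields one new inequality ai <= bj, so the
    5 constraints become 3 * 2 = 6.  In the worst case a system of n
    inequalities grows to about (n/2)^2 after a single elimination, which
    is why eliminations producing 0 or 1 new inequalities are done first.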
Sure. I understand that the number of dimensions and/or (in)equalities is a limitation. What I do not understand is why the values in an (in)equality are problematic, i.e. why
1 > i might be easier to handle than 99999999 > i
Or did I get you wrong and you did not say anything about this?
Eventually, the bigger the constant the higher the loss of information when relaxing to rational polyhedra.
OK. However, if we had not added the additional restrictions, the information would never have been available in the first place. Or are we losing other information by adding these restrictions?
Sorry, I missed that part. Can you re-explain?
The mail I was referring to was about putting restrictions on parameters in the context of CLooG. As we know that a parameter "k" is of type unsigned integer, we know that the constraint 0 <= k <= MAX_INT is always true. By adding this constraint to the context we allow CLooG to remove some conditions, as the generated code can assume that k stays within these limits.
If I got your comments right, you warned that adding large constants like MAX_INT might be bad for compile time performance. That is something I did not understand, and as adding the constraint to the context improves the code generated by CLooG, I would like to keep adding these constraints as long as compile time allows.
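A purely illustrative example of the effect meant here (the code shapes are made up, not actual CLooG output): a statement whose domain carries the constraint k >= 0 needs a parametric guard unless the context already implies it.

    /* Statement domain: { i | 0 <= i < n and k >= 0 }.
       With an empty context, the guard on the parameter stays:  */
    if (k >= 0)
      for (i = 0; i < n; i++)
        S1 (i);

    /* With 0 <= k <= MAX_INT added to the context, the constraint
       k >= 0 is redundant and the guard can be dropped:  */
    for (i = 0; i < n; i++)
      S1 (i);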
All of this is very informal/intuitive of course. There are some/few theoretical results about this, but Roberto and other experts are better placed than myself to answer it!
@Roberto: Can you give us some hints on what is really expensive and what we should avoid when constructing polyhedra in the PPL?
Tobi