
On Mon, 2009-07-13 at 12:07 +0200, Albert Cohen wrote:
Tobias Grosser wrote:
On Fri, 2009-07-10 at 23:57 +0200, Albert Cohen wrote:
Hi Tobias.
Tobias Grosser wrote:
Stop. Do I understand this correctly: you do not want to check dependencies after strip mining? I am very concerned about that, as I thought one of the benefits of working on the polytope model is that we can schedule transformations in an arbitrary order. So it is required to reanalyze strip-mined loops. This is also necessary for the dependency checking we use for autopar.
As presented in Austin in November, it is much preferable to follow the transformation composition approach of Girbal and Vasilache that does not require intermediate steps of dependence analysis and legality checking.
You are referring to "Facilitating the search for compositions of program transformations"?
Yes. The 2d+1 encoding and the additional invariants/rules allow us to compose and generalize classical transformations on a polyhedral representation that may have diverged arbitrarily far from the original syntax tree.
Sure. The same with the polyhedral representation in Graphite. It also should allow us to compose and generalize arbitrary transformations on the polyhedral data structures.
However, I do not remember that we specified any 2d+1 encoding in our data structures. So at the moment we are more general, and as you might remember from early discussions it is very important for me to keep this generality until we have a discussion where we agree to trade it off for performance. Ideally we would have an example, a compile time problem, and a discussion where we agree that adding an assumption like 2d+1 will improve algorithmic complexity and that this improvement is required for real world code. This information should then become part of graphite-poly.h.
(...)
Konrad is working on Step 2.1 (enhancing strip-mine to update the dependence info). Then, if you agree, he will help restructure the API such that dependence computation and dependence checking are ALWAYS called explicitly, rather than implicitly in the transformation primitives.
Sure. I would like us to be even able to call them after any single transformation. A loop like the following would be nice for the optimizations:
while (select_and_apply_transform (scop))
  check_dependencies ();
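A slightly expanded sketch of such a driver (all names below are placeholders, not existing Graphite API; the point is only that the dependence check sits outside the transformation primitives and can be compiled out of the production compiler):

    /* Apply one polyhedral transformation at a time and, in checking
       builds only, verify the dependences after every single step.  */
    static void
    apply_poly_transforms (scop_p scop)
    {
      while (select_and_apply_transform (scop))
        {
    #ifdef ENABLE_GRAPHITE_CHECKING
          /* Expensive: recompute the dependence polyhedra and make the
             nightly testers fail loudly if the last step was illegal.  */
          if (!check_dependencies (scop))
            internal_error ("graphite: illegal transformation applied");
    #endif
        }
    }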
Yes. Skipping some checks requires stronger technology to guarantee that transformation steps will be legal or will eventually be legal/checked, or even corrected/completed via enabling transformations (see Vasilache's PACT'07 paper).
As a concrete, more original example: together with Razya and Konrad, we are thinking about enabling parloops and interchange with an automatic privatization "corrective step". We would ignore memory-based dependences when looking for interchange combinations that improve both parloops and autovect (using an extended cost model), then check which ones were eventually violated in the selected transformation sequence, and eventually privatize only those scalars/arrays that need to be privatized to effectively remove those memory-based dependences.
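To make the corrective privatization idea concrete, here is a small, made-up C example (not from the discussion): the shared scalar t carries memory-based (anti/output) dependences across iterations of i, which block parallelization; giving each iteration its own copy removes exactly those dependences and leaves only the true data flow.

    /* Before: the single scalar t is reused by every iteration of i,
       creating memory-based dependences that forbid running the i loop
       in parallel.  */
    void
    before (int n, int m, int A[n][m], int B[n][m], int C[m])
    {
      int t;
      for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
          {
            t = A[i][j] * 2;
            B[i][j] = t + C[j];
          }
    }

    /* After corrective privatization: each iteration owns its copy of t,
       so only data-flow dependences remain and the i loop can be handed
       to parloops.  */
    void
    after (int n, int m, int A[n][m], int B[n][m], int C[m])
    {
      for (int i = 0; i < n; i++)      /* parallelizable */
        for (int j = 0; j < m; j++)
          {
            int t = A[i][j] * 2;
            B[i][j] = t + C[j];
          }
    }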
That sounds interesting. We also thought about memory optimizations, so I would like the poly_drs to be not read-only, but read/write. This means that we can analyze the data accesses in the polyhedral model, change them directly in the polyhedral representation of the data accesses, and Graphite will take care of creating the required GIMPLE code.
This should allow a lot of nice optimizations, like reducing the memory footprint if only a part of an array is written to, or allowing more parallelism by trading memory for fewer dependencies.
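As an illustration of the kind of data-access rewriting meant here (hypothetical, not an existing Graphite transformation): when only one element of a temporary array is live at a time, rewriting the accesses lets the array be contracted to a scalar, which shrinks the memory footprint.

    /* Before: tmp is a full array although each tmp[j] is dead right
       after it is consumed in the same iteration.  */
    void
    before (int n, int a[n], int b[n])
    {
      int tmp[n];
      for (int j = 0; j < n; j++)
        {
          tmp[j] = a[j] * a[j];
          b[j] = tmp[j] + 1;
        }
    }

    /* After rewriting the data accesses: the array is contracted to a
       scalar, reducing the footprint from n integers to one.  */
    void
    after (int n, int a[n], int b[n])
    {
      for (int j = 0; j < n; j++)
        {
          int tmp = a[j] * a[j];
          b[j] = tmp + 1;
        }
    }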
I know this will be expensive and we want to disable this in the production compiler. But I want us to be able to use this loop for debugging.
Sure.
I totally agree that it is useful to avoid polyhedral computations, as they tend to be expensive. However, I still want to be able to do CHECK and ANALYSIS at every single step, as it will show us the assumptions we made to optimize compile time. If we make any assumptions (which might be necessary), I would like them not to be hidden by avoiding the dependency ANALYSIS; I want our (nightly) tests to fail.
This will allow an analysis/discussion of the assumptions we made. Maybe we can avoid them; if not, we know we have to document them. Every assumption has to be written down in a well-known place so that future developers will not be surprised.
OK. But running the ANALYSIS itself after transformation makes little sense to me, even in debugging mode. Can you explain why?
Sure. The idea I have of Graphite is that we have a set of transformations that all analyze GPOLY and try to improve it. However, "improve" does not go in only one direction. For example there might be passes trading dependencies for memory usage: they improve memory usage but add dependencies, or they remove dependencies but increase memory usage.
I would like to have a clever scheduler (or some database) deciding which passes should be applied. E.g. first we trade some memory for fewer dependencies, then we try to expose more parallelism, maybe do some interchange to enable fusion and fission, and finally parallelize all the stuff.
To allow this free ordering we have to ensure that every pass is able to analyze GPOLY to see whether it can apply, e.g., an interchange. So, e.g., after strip mining it might still be useful to do some more transformations to allow vectorization. If the vectorization pass cannot analyze GPOLY after strip mining has been applied, this will not work.
So what I want is:
1. We have one representation, GPOLY, on which we do our optimizations.
2. Every optimization pass works only on GPOLY.
   2.1 Every pass has to be able to read GPOLY and to generate GPOLY.
   2.2 Every pass has to decide only by looking at and analyzing GPOLY whether it can be applied (see the sketch below).
3. Optimizing a program means applying a lot of PASSES on GPOLY to generate better scattering and/or better data accesses.
4. Graphite will read GPOLY and, following the description in GPOLY, generate the GIMPLE code.
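A minimal sketch of what such a pass interface could look like under rules 2.1 and 2.2; all type and function names here are hypothetical, not existing Graphite code:

    /* Hypothetical interface for a GPOLY-only optimization pass.  A pass
       may inspect nothing but the polyhedral representation to decide
       whether it applies (rule 2.2) and may produce nothing but a new
       GPOLY (rule 2.1).  */
    struct gpoly_pass
    {
      const char *name;

      /* Analyze GPOLY and return true if the pass can be applied.  */
      bool (*can_apply) (scop_p scop);

      /* Rewrite the scattering and/or the data accesses of GPOLY.
         Return true if something changed.  */
      bool (*apply) (scop_p scop);
    };

    /* The scheduler (or database) mentioned above would walk a table of
       such passes, consult its cost model, and call apply () on the most
       promising candidate until no pass improves GPOLY further.  */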
It might be possible that this will not work or is too slow. If there is a limitation in GPOLY, the solution I see is to extend GPOLY officially (e.g. to handle arbitrary conditions, to allow changing the data accesses, ...).
Another thing I do not understand is why you are often referring to "large constants". I thought complexity problems mainly appear because of a high number of dimensions, not because of high values in an (in)equality. Can you explain this to me?
As mentioned several times, big constants tend to break the "structure" of ILP problems or other polyhedral operations, leading to combinatorial explosions. This is well known even in the case of simple algorithms like Fourier-Motzkin. They also increase the risk of memory overflow.
OK. I definitely believe you, but as this seems to become one of the important points for Graphite performance I would like to understand it well. Maybe we can also write a Wiki page to explain which constructs should be avoided. We already learned that NNC_Polyhedron is bad, but what else is dangerous?
Here I am still not getting the problem. I looked again into Fourier-Motzkin and at first glance I could not find anything in its complexity that depends on the values in the (in)equalities. Any pointer or example that might help me understand this problem would be really helpful.
FM blows up when it generates too many inequalities. To avoid this, people generally favor eliminations that generate 0 or 1 inequality first.
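A small worked example of that blow-up (standard Fourier-Motzkin behaviour; the bounds are made up for illustration): eliminating a variable that appears in p lower bounds and q upper bounds replaces those p + q inequalities by p * q combined ones.

    Eliminate x from a system with p = 3 lower and q = 2 upper bounds:

      x >= a1,  x >= a2,  x >= a3
      x <= b1,  x <= b2

    Every (lower, upper) pair yields one new inequality ai <= bj, so the
    5 constraints become 3 * 2 = 6.  In the worst case a system of n
    inequalities grows to about (n/2)^2 after a single elimination, which
    is why eliminations producing 0 or 1 new inequalities are done first.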
Sure. I understand that the number of dimensions and/or (in)equalities is a limitation. What I do not understand is why the values in an (in)equality are problematic, i.e. why
1 > i might be easier to handle than 99999999 > i
Or did I get you wrong and you did not say anything about this?
Eventually, the bigger the constant the higher the loss of information when relaxing to rational polyhedra.
OK. However, if we had not added the additional restrictions, the information would never have been available in the first place. Or are we losing other information by adding these restrictions?
Sorry, I missed that part. Can you re-explain?
The mail I was referring to was about putting restrictions on parameters in the context of CLooG. As we know that a parameter "k" is of type unsigned integer, we know that the constraint 0 <= k <= MAX_INT is always true. By adding this constraint to the context we allow CLooG to remove some conditions, as the generated code can assume that k stays within these limits.
If I got your comments right, you warned that adding large constants like MAX_INT might be bad for compile time performance. That is something I did not understand, and as adding the constraint to the context improves the code generated by CLooG, I would like to keep adding these constraints as long as compile time allows.
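A purely illustrative example of the effect meant here (the code shapes are made up, not actual CLooG output): a statement whose domain carries the constraint k >= 0 needs a parametric guard unless the context already implies it.

    /* Statement domain: { i | 0 <= i < n and k >= 0 }.
       With an empty context, the guard on the parameter stays:  */
    if (k >= 0)
      for (i = 0; i < n; i++)
        S1 (i);

    /* With 0 <= k <= MAX_INT added to the context, the constraint
       k >= 0 is redundant and the guard can be dropped:  */
    for (i = 0; i < n; i++)
      S1 (i);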
All of this is very informal/intuitive of course. There are some/few theoretical results about this, but Roberto and other experts are better placed than myself to answer it!
@Roberto: Can you give us some hints on what is really expensive and what we should avoid when constructing polyhedra in the PPL?
Tobi