Hi Enea,

I have installed the development version of PPL and configured it with thread safety on. It seems to work just as you described, but I am not getting the speedups I expected. To demonstrate the issue, I have included a sample program below. It creates a user-specified number of threads, and each thread intersects two NNC_Polyhedron objects a user-specified number of times. For timing comparisons, the test program also has a code path that does not call PPL at all and instead computes logarithms.

#include <ppl.hh>
#include "Thread_Pool_defs.hh"

#include <cmath>       // std::log
#include <cstdlib>     // std::atoi
#include <functional>  // std::function, std::bind
#include <iostream>    // std::cerr

using namespace Parma_Polyhedra_Library;
namespace Parma_Polyhedra_Library {using IO_Operators::operator<<;}
using namespace std;

// Runs RepCount iterations of either the PPL workload (TestPPL == true)
// or a pure floating-point workload used as a timing baseline.
void TestIntersections(int RepCount, bool TestPPL) {
  double x = 10.0;
  double b = 0.0;
  for (int i = 0; i != RepCount; i++) {
    if (!TestPPL) {
      // Baseline path: no PPL calls, only logarithms.
      x += i;
      b += log(x);
    } else {
      Variable x0(0);Variable x1(1);Variable x2(2);Variable x3(3);Variable x4(4);
      Variable x5(5);Variable x6(6);Variable x7(7);Variable x8(8);Variable x9(9);

      Constraint_System cs1;
      cs1.insert(x8-x9==0);cs1.insert(x2-x9>=0);cs1.insert(x3-x9>=0);
      cs1.insert(x4-x9>=0);cs1.insert(x5-x9>=0);cs1.insert(x1-x9>=0);
      cs1.insert(x6-x9>=0);cs1.insert(x7-x9>=0);cs1.insert(x0-x9>=0);
      NNC_Polyhedron ph1(cs1);

      Constraint_System cs2;
      cs2.insert(x7-x9==0);cs2.insert(x2+x3-x8-x9>=0);cs2.insert(x1+x2-x8-x9>=0);
      cs2.insert(x3+x4-x8-x9>=0);cs2.insert(x0-x8>=0);cs2.insert(x5+x6-x8-x9>=0);
      cs2.insert(x6-x8>=0);cs2.insert(x0+x1-x8-x9>=0);cs2.insert(x4+x5-x8-x9>=0);
      NNC_Polyhedron ph2(cs2);

      // Intersect a polyhedron built from cs1 with ph2, then force the
      // result to be minimized and its affine dimension to be computed.
      NNC_Polyhedron ph3(cs1);
      ph3.add_constraints(ph2.minimized_constraints());
      ph3.minimized_constraints();
      ph3.affine_dimension();
    }
  }
}

int main(int argc, char* argv[]) {
  if (argc < 4) {
    cerr << "usage: " << argv[0] << " <threads> <reps> <test_ppl: 0|1>" << endl;
    return 1;
  }
  int TotalProcessCount = atoi(argv[1]);
  int RepCount = atoi(argv[2]);
  bool TestPPL = atoi(argv[3]) != 0;

  typedef std::function<void()> work_type;
  Thread_Pool<work_type> thread_pool(TotalProcessCount);

  // Submit one identical work item per thread.
  for (int i = 0; i != TotalProcessCount; i++) {
    work_type work = std::bind(TestIntersections, RepCount, TestPPL);
    thread_pool.submit(make_threadable(work));
  }
  thread_pool.finalize();
  return 0;
}

This is how I compiled:

g++ -std=c++11 -pthread file_name.cpp -l:libtcmalloc_minimal.so.4.2.6 -lppl -lgmpxx -lgmp

I tested this on a new machine with 44 cores and hyperthreading (std::thread::hardware_concurrency() returns 88), running with RepCount = 10,000 and TestPPL = true. Here are the timings:

#threads,real time (from time)
1,0m0.925s
5,0m1.820s
10,0m3.041s
20,0m3.758s
40,0m6.775s

By way of comparison, here are the timings for RepCount = 50,000,000 and TestPPL = false:

#threads,real time (from time)
1,0m1.767s
5,0m1.854s
10,0m2.012s
20,0m2.139s
40,0m2.206s

Since every thread does the same amount of work, I would expect 40 threads to take about as long as 1 thread given sufficient hardware, though I know that is not quite realistic. Am I doing something incorrectly in the PPL code branch that causes it to slow down so much as the number of threads increases? I am not very experienced with parallel C++ programming, so please forgive me if I am doing something foolish. Thanks so much for all of the help.
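To put numbers on the gap, here is a rough sketch that turns the timings above into throughput figures. It is only illustrative: the names Timing, throughput_speedup and efficiency are made up for this message, and it assumes the 1-thread run is a fair baseline even though the "real" time also includes process startup and thread pool setup. Because each thread repeats the same RepCount iterations, the total work grows with the thread count, so the throughput relative to one thread is threads * t1 / tN.

```cpp
#include <vector>

// One row of the timing tables: thread count and wall-clock seconds.
struct Timing {
  int threads;
  double seconds;
};

// Every thread performs the same number of iterations, so total work grows
// with the thread count. Throughput relative to the baseline run is
// threads * t_base / t_run; the ideal value equals the thread count.
// Assumes base is the 1-thread measurement.
double throughput_speedup(const Timing& base, const Timing& run) {
  return run.threads * base.seconds / run.seconds;
}

// Fraction of the ideal speedup actually achieved (1.0 would be perfect).
double efficiency(const Timing& base, const Timing& run) {
  return throughput_speedup(base, run) / run.threads;
}

// Timings copied from the two tables above (real time reported by `time`).
const std::vector<Timing> ppl_timings = {
    {1, 0.925}, {5, 1.820}, {10, 3.041}, {20, 3.758}, {40, 6.775}};
const std::vector<Timing> log_timings = {
    {1, 1.767}, {5, 1.854}, {10, 2.012}, {20, 2.139}, {40, 2.206}};
```

Calling throughput_speedup(ppl_timings[0], ppl_timings[4]) gives roughly 5.5 for the PPL path at 40 threads, while the same call on log_timings gives roughly 32, so the logarithm path scales close to the thread count and the PPL path does not.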

Best,

Jeff

On Sat, Oct 8, 2016 at 4:15 AM, Enea Zaffanella <zaffanella@cs.unipr.it> wrote:

> Hello John.
>
> On 10/07/2016 06:54 PM, John Paulson wrote:
>
>> Hi Enea and Roberto,
>>
>> Thank you very much for all of the work you have done on threading with PPL. I have a couple of questions about the current status of the thread safety. A project I am currently working on needs to have multiple threads working simultaneously on distinct PPL objects. Is that possible with the new thread-safe version?
>
> Yes, it is possible.
>
> If the PPL objects are distinct, so that there is no concurrent access to the *same* object, then things should not be difficult (e.g., no need at all for synchronization). You can have a look at the ppl_lcdd and ppl_lpsol demos or at the recently added tests in tests/Polyhedron/threadsafe*.cc
>
>> I made an implementation of my current project where I fork processes instead of using multiple threads. However, if I fork and have two processes manipulating distinct PPL objects, I do not get a speedup by a factor of two like I would expect. Is there any way to get that type of speedup?
>
> If you use many processes, then the PPL should be working fine "as is" (i.e., with thread-safety off).
> As for the missing speedup, I doubt it has something to do with the library:
> there should be something else going on, but I have no information to guess what it could be.
>
> Cheers,
> Enea.
>
>> Best,
>>
>> John C. Paulson
>> _______________________________________________
>> PPL-devel mailing list
>> PPL-devel@cs.unipr.it
>> http://www.cs.unipr.it/mailman/listinfo/ppl-devel