Local spark scheduling is bad

I've been trying to improve the performance of parallel programs because by default it appears to be terrible...

In particular I've been using a modified version of the icfp_2000 ray tracer. It has been modified to be trivially parallelisable, so it is reasonable to expect decent performance when parallelising it. It renders a row at a time and, within each row, a pixel at a time. The render_rows predicate has two independent calls in a conjunction: render_row and a recursive call to render_rows. Because these calls are independent, their results must be merged afterwards (we use concatenation of cords), so render_rows is not tail-recursive. By making this conjunction a parallel conjunction we can easily parallelise the program. The code can be found at progs/icfp2000_par_pbone within the benchmarks CVS module.

When running this program with MERCURY_OPTIONS="-P4" in a parallel grade it performs only marginally better than the sequential version. Moreover, we can show that an improvement this small can come from the parallel mark phase in the garbage collector (which is enabled in all parallel grades). Performance continues to improve as we increase --max-contexts-per-thread, which allows for more parallelism by scheduling more computations on the global spark queue.

Graph showing wall time for icfp2000 with varying values for --max-contexts-per-thread

The above graph shows boxplots of the wall time (from 10 samples) of the icfp_2000 ray tracer as we vary the value of --max-contexts-per-thread. The first boxplot shows the same program compiled for sequential execution. Each subsequent plot doubles the value of --max-contexts-per-thread, starting from the default of two.

                              mean    standard deviation
main_asmfast-gc               85.23   0.39
main_asmfast-gc-par_p4_c2     76.76   0.19
main_asmfast-gc-par_p4_c4     73.74   0.32
main_asmfast-gc-par_p4_c8     70.08   1.04
main_asmfast-gc-par_p4_c16    66.41   0.51
main_asmfast-gc-par_p4_c32    63.35   1.20
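
To put "marginally better" into numbers, the means in the table can be turned into speedups over the sequential build. The following is only a quick arithmetic sketch in Python, not part of the benchmark itself; the helper name speedups and the efficiency figure (speedup divided by the four threads requested with -P4) are my own additions for illustration.

```python
# Mean wall times copied from the table above (same units as the table).
mean_wall_time = {
    "main_asmfast-gc": 85.23,  # sequential baseline
    "main_asmfast-gc-par_p4_c2": 76.76,
    "main_asmfast-gc-par_p4_c4": 73.74,
    "main_asmfast-gc-par_p4_c8": 70.08,
    "main_asmfast-gc-par_p4_c16": 66.41,
    "main_asmfast-gc-par_p4_c32": 63.35,
}

THREADS = 4  # assumption: -P4 means four worker threads


def speedups(times, baseline_key):
    """Return baseline_time / time for every entry except the baseline."""
    base = times[baseline_key]
    return {
        name: base / t
        for name, t in times.items()
        if name != baseline_key
    }


result = speedups(mean_wall_time, "main_asmfast-gc")
for name, s in result.items():
    # Efficiency is the fraction of an ideal 4x speedup actually achieved.
    print(f"{name}: speedup {s:.2f}x, efficiency {s / THREADS:.0%}")
```

Even the best configuration here (c32) is only about 1.35 times faster than the sequential program on four threads, far short of what a trivially parallelisable program should manage.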
Attachment: icfp2000_max-contexts-per-thread.png (6.73 KB)