mightymandel v16

GPU-based Mandelbrot set explorer


See also Results for overall test suite timings.



cat /proc/cpuinfo | grep model\ name


free -h | grep ^Mem: | ( read header total rest && echo "${total}" )


lspci | grep VGA


glxinfo | grep OpenGL\ version

VRAM: check your hardware documentation or driver-specific tools (mightymandel prints the available memory on startup with --verbose info, which might be a clue)
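The commands above can be combined into one small helper. The following is a minimal sketch, not part of mightymandel: the function name hardware_summary, the RAM label and the "unknown" fallback are assumptions made here for illustration.

```shell
# Sketch: print a hardware summary using the commands listed above.
# hardware_summary is a hypothetical helper, not something mightymandel ships.
# Each field falls back to "unknown" when its tool is missing or prints nothing.
hardware_summary() {
  cpu=$(grep -m 1 'model name' /proc/cpuinfo 2>/dev/null | sed 's/^[^:]*: *//')
  ram=$(free -h 2>/dev/null | awk '/^Mem:/ { print $2 }')
  # Drop the PCI slot and the device class, keep the device name.
  gpu=$(lspci 2>/dev/null | grep VGA | head -n 1 | sed 's/^[^ ]* [^:]*: *//')
  drv=$(glxinfo 2>/dev/null | grep 'OpenGL version' | head -n 1 | sed 's/^[^:]*: *//')
  printf 'CPU: %s\n' "${cpu:-unknown}"
  printf 'RAM: %s\n' "${ram:-unknown}"
  printf 'GPU: %s\n' "${gpu:-unknown}"
  printf 'DRV: %s\n' "${drv:+OpenGL }${drv:-unknown}"
}

hardware_summary
```

VRAM is left out on purpose, since there is no portable command for it.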


CPU: AMD Athlon(tm) II X4 640 Processor (4 cores at 3.3GHz)


GPU: NVIDIA Corporation GF116 [GeForce GTX 550 Ti]

DRV: OpenGL 4.4.0 NVIDIA 340.65



CPU: Intel(R) Core(TM)2 Duo CPU P7550 (2 cores at 2.26GHz)


GPU: NVIDIA Corporation G98M [GeForce G 105M]

DRV: OpenGL 3.3.0 NVIDIA 340.65

Hardware Comparison

Parameter file:


Image size: 1280x720


machine      mode   time
frappuccino  de     0.973s
frappuccino  no-de  0.976s
latte        de     9.528s
latte        no-de  7.646s

GPU Step Iterations

Increasing FP___STEP_ITERS reduces running time but can make the system laggy. Here are some timings for mightymandel v15-7-gb9e218e (recompiled after each change).

step   fp32    fp64     fpxx
  64  0.879   9.982  392.396
 128  0.753   8.313  328.182
 256  0.754   7.661  306.599
 512  0.749   7.469  285.568
1024  0.779   7.262  279.090
2048  0.839   7.209        -

All times are in seconds. Timings are the average of three runs (fp32, fp64) or a single run (fpxx). The average of three runs with --overhead (0.375) has been subtracted from each timing. The real wall-clock time reported by time was used. Run-time options:

time ./src/mightymandel --one-shot --verbose fatal "${file}"
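
The bookkeeping described above (average several runs, then subtract the measured overhead) is simple enough to script. A minimal sketch follows; net_mean is a hypothetical helper name, and the wall-clock values are assumed to have been collected separately with time.

```shell
# Sketch: mean of several wall-clock timings minus a fixed overhead.
# net_mean is a hypothetical helper, not part of mightymandel.
# Usage: net_mean OVERHEAD T1 [T2 ...]   (all values in seconds)
net_mean() {
  overhead=$1
  shift
  # Refuse to average an empty list of timings.
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk -v overhead="$overhead" '
    { sum += $1; n += 1 }
    END { printf "%.3f\n", sum / n - overhead }'
}

# Example with made-up timings and the 0.375s overhead quoted above.
net_mean 0.375 1.0 1.1 1.2   # prints 0.725
```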

Benchmarking parameter files:


At 512, fpxx was lagging a bit, and by 1024 it was lagging severely, so 2048 was not tested. Beyond 512, fp32 times start to increase, presumably because the view region has a low average iteration count for exterior pixels. Benchmarks were performed on frappuccino.

With the values fixed in config.glsl, instead of using uniform variables, the table becomes:

step   fp32   fp64     fpxx
 256      -      -  292.500
 512  0.720      -        -
1024      -  6.885        -


Slicing Comparison

Choosing the right --slice value for your available video memory is important. If it is too low, mightymandel may need more video memory than is available, and the OS would have to swap data between video memory and system memory. If it is too high, mightymandel does more work than necessary coordinating the calculation.

Command line for benchmarks below:

time ./src/mightymandel --one-shot ./examples/mm/fpxx.mm --glitch \
--geometry 1280x720 --size 7680x4320 --slice "${slice}"

slice  time   vram
0      abort
1      230.1  1944
2      70.3   972
3      57.6   729
4      67.2   668
5      83.0   653
6      error

Time is in seconds (real wall-clock time elapsed), vram is allocated video memory in MB. Some slice values failed:

  • abort: more than 2GB is needed in a single allocation, which overflows OpenGL's signed 32-bit size type and becomes negative.
  • error: the height (4320) is not a multiple of the slice factor (64).

Hardware: frappuccino
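
Both failure modes above can be checked before starting a long render. The sketch below assumes the slice factor is 2 raised to the --slice value, which is inferred from the error for slice 6 (factor 64) rather than taken from the mightymandel source. The function names and the 3 GiB request are hypothetical.

```shell
# Sketch: pre-flight checks for the two failures in the table above.
# Assumption: slice factor = 2^slice (slice 6 reported "slice factor 64").

# Succeed when the dimension is a whole multiple of the slice factor.
divides_evenly() {
  dimension=$1
  factor=$((1 << $2))
  [ $((dimension % factor)) -eq 0 ]
}

# Reinterpret a byte count as a signed 32-bit integer, the way a 32-bit
# signed size type would store it; 2^31 and above come out negative.
as_int32() {
  low=$(($1 & 4294967295))
  if [ "$low" -ge 2147483648 ]; then
    low=$((low - 4294967296))
  fi
  echo "$low"
}

# 4320 is the height from the --size used above.
# Slices 1 to 5 pass; slice 6 fails because 4320 = 32 * 135.
for slice in 1 2 3 4 5 6; do
  if divides_evenly 4320 "$slice"; then
    echo "slice ${slice}: ok"
  else
    echo "slice ${slice}: height 4320 is not a multiple of $((1 << slice))"
  fi
done

# A hypothetical 3 GiB request turns negative once stored as 32-bit signed.
as_int32 3221225472   # prints -1073741824
```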

Progressive Rendering Comparison

In --interactive mode, you can increase the --slice value to get a quick lo-fi preview image, which is later refined into the final hi-fi image.

Command line for benchmarks below:

time ./src/mightymandel ./examples/mm/implementation-comparison.mm \
--glitch --slice "${slice}"

slice  preview  complete  one-shot
0      7.972    13.704    12.561
1      2.991    15.017    12.950
2      1.259    22.689    19.951

Time is in seconds (real wall-clock time elapsed). Preview is the time taken until the first pixels visibly escape, complete is the time taken for the image to finish rendering, and one-shot is the time taken in --one-shot mode (preview and complete timings are in --interactive mode).

Hardware: frappuccino
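
The trade-off in the table above is easier to see as ratios against slice 0. This is a small sketch using awk on the preview and complete columns; the tradeoff function name and the output wording are illustrative choices, not mightymandel output.

```shell
# Rows copied from the table above: slice, preview, complete (seconds).
rows='0 7.972 13.704
1 2.991 15.017
2 1.259 22.689'

# Compare each row against the first row (slice 0) as the baseline:
# how much sooner the preview appears, and how much longer completion takes.
tradeoff() {
  printf '%s\n' "$rows" | awk '
    NR == 1 { base_preview = $2; base_complete = $3 }
    { printf "slice %d: preview %.2fx faster, complete %+.1f%%\n",
             $1, base_preview / $2, ($3 / base_complete - 1) * 100 }'
}

tradeoff
```

On these numbers, slice 1 shows a preview about 2.67 times sooner for roughly 9.6% more total time, while slice 2 shows it about 6.33 times sooner for roughly 65.6% more.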

Implementation Comparison

Hardware: frappuccino


impl        description
mm-de-0     mightymandel --one-shot --glitch --max-glitch 0
mm-no-de-0  mightymandel --one-shot --glitch --max-glitch 0 --no-de
mm-de       mightymandel --one-shot --glitch
mm-no-de    mightymandel --one-shot --glitch --no-de
kf-auto     Kalles Fraktaler 2.7.3 running in wine32 (16 threads on CPU)
kf-1ref     as above but with max references set to 1 (default is 69)

Parameter files:


Image size: 1280x720


impl        time
kf-auto     35.064s
kf-1ref     4.948s
mm-de-0     15.220s
mm-no-de-0  10.758s
mm-de       13.475s
mm-no-de    9.036s

Timings before de-inversion of control:

impl      time
mm-de     15.778s
mm-no-de  11.451s

Timings before GPU Step Iterations were optimized:

impl      time
mm-de     16.672s
mm-no-de  12.417s

Timings before reference point finding was improved (no early glitch escape):

impl      time
mm-de     21.110s
mm-no-de  15.583s

Timings before reference point finding was simplified:

impl      time
mm-de     34.614s
mm-no-de  28.403s

Timings before automatic glitch correction was added (so 1-ref point only):

impl      time
mm-de     12.063s
mm-no-de  8.239s