See also Results for overall test suite timings.

Hardware

CPU:

cat /proc/cpuinfo | grep model\ name

RAM:

free -h | grep ^Mem: | ( read header total rest && echo "${total}" )

GPU:

lspci | grep VGA

DRV:

glxinfo | grep OpenGL\ version

VRAM: check your hardware documentation or driver-specific tools (mightymandel prints the available memory on startup with --verbose info, which may give a clue)

frappuccino

CPU: AMD Athlon(tm) II X4 640 Processor (4 cores at 3.3GHz)

RAM: 8GB

GPU: NVIDIA Corporation GF116 [GeForce GTX 550 Ti]

DRV: OpenGL 4.4.0 NVIDIA 340.65

VRAM: 1GB

latte

CPU: Intel(R) Core(TM)2 Duo CPU P7550 (2 cores at 2.26GHz)

RAM: 4GB

GPU: NVIDIA Corporation G98M [GeForce G 105M]

DRV: OpenGL 3.3.0 NVIDIA 340.65

Hardware Comparison

Parameter file:

examples/mm/fp32-large-minibrot.mm

Image size: 1280x720

Timings:

machine	mode	time
frappuccino	de	0.973s
frappuccino	no-de	0.976s
latte	de	9.528s
latte	no-de	7.646s

GPU Step Iterations

Increasing FP___STEP_ITERS generally reduces running time but can make the system laggy. Here are some timings for mightymandel v15-7-gb9e218e (recompiled after each change).

step	fp32	fp64	fpxx
64	0.879	9.982	392.396
128	0.753	8.313	328.182
256	0.754	7.661	306.599
512	0.749	7.469	285.568
1024	0.779	7.262	279.090
2048	0.839	7.209

All times are in seconds. Timings are the average of three runs (fp32, fp64) or a single run (fpxx), with the average of three --overhead runs (0.375) subtracted. The real wall-clock time reported by time was used. Run-time options:

time ./src/mightymandel --one-shot --verbose fatal "${file}"
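
The averaging and overhead subtraction described above can be sketched as a small shell helper. This is an illustration only: net_time is a hypothetical name, not part of mightymandel, and the run times in the example call are placeholders rather than measurements. Only the 0.375 overhead figure comes from the text.

```shell
# net_time OVERHEAD T1 [T2 ...]
# Prints the mean of the measured times minus the overhead, to 3 decimals.
# Hypothetical helper; not part of mightymandel.
net_time() {
  local overhead=$1
  shift
  printf '%s\n' "$@" |
    awk -v o="$overhead" '{ sum += $1 } END { printf "%.3f\n", sum / NR - o }'
}

# Placeholder run times (not measurements); 0.375 is the overhead quoted above.
net_time 0.375 1.0 1.5 2.0
```

Passing a single time gives the one-run case used for fpxx.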

Benchmarking parameter files:

examples/mm/fp32-large-minibrot.mm
examples/mm/fp64-large-minibrot.mm
examples/mm/fpxx-large-minibrot.mm

At 512, fpxx was lagging a bit, and by 1024 it was lagging severely, so 2048 was not tested. Beyond 512, the fp32 time starts to increase, presumably because the view region has a low average iteration count for exterior pixels. Benchmarks were performed on frappuccino.

With the values fixed in config.glsl, instead of using uniform variables, the table becomes:

step	fp32	fp64	fpxx
256			292.500
512	0.720
1024		6.885

See: FP32_STEP_ITERS, FP64_STEP_ITERS, FPXX_STEP_ITERS.

Slicing Comparison

Choosing the right --slice value for your available video memory is important. If it is too low, mightymandel might exceed the available video memory, and the OS would have to swap data between video memory and system memory. If it is too high, mightymandel does more work than necessary coordinating the calculation process.

Command line for benchmarks below:

time ./src/mightymandel --one-shot ./examples/mm/fpxx.mm --glitch \
--geometry 1280x720 --size 7680x4320 --slice "${slice}"

slice	time	vram
0	abort
1	230.1	1944
2	70.3	972
3	57.6	729
4	67.2	668
5	83.0	653
6	error

Time is in seconds (real wall-clock time elapsed), vram is allocated video memory in MB. Some slice values failed:

abort: more than 2GB is needed in a single allocation, which overflows OpenGL's signed 32-bit size type and becomes negative.

error: height is not a multiple of slice factor 64
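
Both failure cases can be checked before starting a long render. The sketch below assumes that the slice factor is 2 raised to the --slice value (inferred from the "slice factor 64" message for --slice 6) and that the size limit is the largest signed 32-bit integer. The helper names are hypothetical and not part of mightymandel.

```shell
# Hypothetical helpers; not part of mightymandel.

# slice_factor SLICE: prints 2^SLICE (assumption: the factor doubles per level).
slice_factor() {
  echo $((1 << $1))
}

# slice_divides DIM SLICE: succeeds when DIM is a multiple of the slice factor.
slice_divides() {
  [ $(($1 % (1 << $2))) -eq 0 ]
}

# fits_int32 BYTES: succeeds when BYTES fits in a signed 32-bit integer.
fits_int32() {
  [ "$1" -le 2147483647 ]
}

# Check each slice value from the table against the 4320-pixel image height.
for slice in 0 1 2 3 4 5 6; do
  if slice_divides 4320 "$slice"; then
    echo "slice ${slice}: ok (factor $(slice_factor "$slice"))"
  else
    echo "slice ${slice}: height is not a multiple of $(slice_factor "$slice")"
  fi
done
```

For the 7680x4320 render above, only --slice 6 fails the divisibility check, which matches the table. The --slice 0 failure is the separate allocation-size problem that fits_int32 models.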

Hardware: frappuccino

Progressive Rendering Comparison

In --interactive mode, you can increase the --slice value to get a quick lofi preview image, which later refines into the final hifi image.

Command line for benchmarks below:

time ./src/mightymandel ./examples/mm/implementation-comparison.mm \
--glitch --slice "${slice}"

slice	preview	complete	one-shot
0	7.972	13.704	12.561
1	2.991	15.017	12.950
2	1.259	22.689	19.951

Time is in seconds (real wall-clock time elapsed). Preview is the time taken until the first pixels visibly escape, complete is the time taken for the image to finish rendering, and one-shot is the time taken in --one-shot mode (preview and complete timings are in --interactive mode).
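
The trade-off can be quantified directly from the table. The sketch below divides one timing by another using awk; ratio is a hypothetical helper name introduced for illustration, and the input values are taken from the table above.

```shell
# ratio A B: prints A / B to two decimal places.
# Hypothetical helper; not part of mightymandel.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Preview speed-up going from --slice 0 to --slice 2.
ratio 7.972 1.259

# Slowdown of the complete render going from --slice 0 to --slice 2.
ratio 22.689 13.704
```

With these figures, --slice 2 shows the first pixels roughly six times sooner than --slice 0, while the complete render takes about 1.7 times as long.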

Hardware: frappuccino

Implementation Comparison

Hardware: frappuccino

Implementations:

impl	description
mm-de-0	mightymandel --one-shot --glitch --max-glitch 0
mm-no-de-0	mightymandel --one-shot --glitch --max-glitch 0 --no-de
mm-de	mightymandel --one-shot --glitch
mm-no-de	mightymandel --one-shot --glitch --no-de
kf-auto	Kalles Fraktaler 2.7.3 running in wine32 (16 threads on CPU)
kf-1ref	as above but with max references set to 1 (default is 69)

Parameter files:

examples/mm/implementation-comparison.mm
examples/kfr/implementation-comparison.kfr

Image size: 1280x720

Timings:

impl	time
kf-auto	35.064s
kf-1ref	4.948s
mm-de-0	15.220s
mm-no-de-0	10.758s
mm-de	13.475s
mm-no-de	9.036s

Timings before de-inversion of control:

impl	time
mm-de	15.778s
mm-no-de	11.451s

Timings before GPU Step Iterations were optimized:

impl	time
mm-de	16.672s
mm-no-de	12.417s

Timings before reference point finding was improved (no early glitch escape):

impl	time
mm-de	21.110s
mm-no-de	15.583s

Timings before reference point finding was simplified:

impl	time
mm-de	34.614s
mm-no-de	28.403s

Timings before automatic glitch correction was added (so 1-ref point only):

impl	time
mm-de	12.063s
mm-no-de	8.239s

mightymandel v16

GPU-based Mandelbrot set explorer
