See also Results for overall test suite timings.
CPU:
cat /proc/cpuinfo | grep model\ name
RAM:
free -h | grep ^Mem: | ( read header total rest && echo "${total}" )
GPU:
lspci | grep VGA
DRV:
glxinfo | grep OpenGL\ version
VRAM: check your hardware documentation or driver-specific tools (mightymandel prints the available memory on startup with --verbose info, which might be a clue)
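The probes above can be collected into one helper that prints the same CPU/RAM/GPU/DRV layout used on this page. This is only a sketch: hw_summary is a hypothetical name, the field trimming assumes the usual Linux output formats of these tools, and any probe whose tool is missing falls back to "unknown". VRAM is left out because it has no generic probe.

```sh
#!/bin/sh
# hw_summary: print CPU, RAM, GPU and driver lines (hypothetical helper).
# Each probe falls back to "unknown" when its tool or data source is missing.
hw_summary() {
  # First "model name" entry, text after the colon.
  cpu=$(grep -m1 'model name' /proc/cpuinfo 2>/dev/null | cut -d: -f2- | sed 's/^ *//')
  # Second column of the "Mem:" row is the total.
  ram=$(free -h 2>/dev/null | awk '/^Mem:/ { print $2; exit }')
  # First VGA device; the description follows the second colon.
  gpu=$(lspci 2>/dev/null | grep VGA | head -n 1 | cut -d: -f3- | sed 's/^ *//')
  # "OpenGL version string: ..." line, text after the colon.
  drv=$(glxinfo 2>/dev/null | grep 'OpenGL version' | head -n 1 | cut -d: -f2- | sed 's/^ *//')
  echo "CPU: ${cpu:-unknown}"
  echo "RAM: ${ram:-unknown}"
  echo "GPU: ${gpu:-unknown}"
  echo "DRV: ${drv:-unknown}"
}

hw_summary
```

The output is meant to be pasted next to benchmark results, with the VRAM line added by hand.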
CPU: AMD Athlon(tm) II X4 640 Processor (4 cores at 3.3GHz)
RAM: 8GB
GPU: NVIDIA Corporation GF116 [GeForce GTX 550 Ti]
DRV: OpenGL 4.4.0 NVIDIA 340.65
VRAM: 1GB
CPU: Intel(R) Core(TM)2 Duo CPU P7550 (2 cores at 2.26GHz)
RAM: 4GB
GPU: NVIDIA Corporation G98M [GeForce G 105M]
DRV: OpenGL 3.3.0 NVIDIA 340.65
Parameter file:
examples/mm/fp32-large-minibrot.mm
Image size: 1280x720
Timings:
machine | mode | time |
---|---|---|
frappuccino | de | 0.973s |
frappuccino | no-de | 0.976s |
latte | de | 9.528s |
latte | no-de | 7.646s |
Increasing FP___STEP_ITERS reduces running time but can make the system laggy. Here are some timings for mightymandel v15-7-gb9e218e (recompiled after each change).
step | fp32 | fp64 | fpxx |
---|---|---|---|
64 | 0.879 | 9.982 | 392.396 |
128 | 0.753 | 8.313 | 328.182 |
256 | 0.754 | 7.661 | 306.599 |
512 | 0.749 | 7.469 | 285.568 |
1024 | 0.779 | 7.262 | 279.090 |
2048 | 0.839 | 7.209 | |
All times are in seconds. Timings are the average of three runs (fp32, fp64) or a single run (fpxx), with the average of three runs of --overhead (0.375) subtracted. The real wall-clock time reported by time was used. Run-time options:
time ./src/mightymandel --one-shot --verbose fatal "${file}"
Benchmarking parameter files:
examples/mm/fp32-large-minibrot.mm examples/mm/fp64-large-minibrot.mm examples/mm/fpxx-large-minibrot.mm
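The averaging used for the table can be sketched in shell. This is a minimal sketch, not the script that produced the numbers: mean_minus_overhead is a hypothetical helper, and its arguments are assumed to be real wall-clock seconds copied by hand from the output of time.

```sh
#!/bin/sh
# mean_minus_overhead OVERHEAD T1 [T2 ...]   (hypothetical helper)
# Prints the mean of the timings T1.. minus OVERHEAD, to 3 decimal places.
mean_minus_overhead() {
  overhead="$1"
  shift
  # One timing per line; awk accumulates the sum and the count.
  printf '%s\n' "$@" | awk -v overhead="${overhead}" '
    { sum += $1; n += 1 }
    END { if (n > 0) printf "%.3f\n", sum / n - overhead }
  '
}
```

For example, with three made-up raw timings of 1.0 seconds and the measured overhead, mean_minus_overhead 0.375 1.0 1.0 1.0 prints 0.625.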
At 512, fpxx made the system lag a bit, and by 1024 it was lagging severely, so 2048 was not tested. Beyond 512, fp32 times start to increase, presumably because the view region has a low average iteration count for exterior pixels. Benchmarks were performed on frappuccino.
With the values fixed in config.glsl, instead of using uniform variables, the table becomes:
step | fp32 | fp64 | fpxx |
---|---|---|---|
256 | | | 292.500 |
512 | 0.720 | | |
1024 | | 6.885 | |
See: FP32_STEP_ITERS, FP64_STEP_ITERS, FPXX_STEP_ITERS.
Choosing the right --slice value for your available video memory is important. If it is too low, mightymandel might exceed the available video memory, and the OS would have to swap data between video memory and system memory. If it is too high, mightymandel does more work than necessary coordinating the calculation process.
Command line for benchmarks below:
time ./src/mightymandel --one-shot ./examples/mm/fpxx.mm --glitch \
  --geometry 1280x720 --size 7680x4320 --slice "${slice}"
slice | time | vram |
---|---|---|
0 | abort | |
1 | 230.1 | 1944 |
2 | 70.3 | 972 |
3 | 57.6 | 729 |
4 | 67.2 | 668 |
5 | 83.0 | 653 |
6 | error | |
Time is in seconds (real wall-clock time elapsed); vram is allocated video memory in MB. Some slice values failed: slice 0 aborted and slice 6 reported an error.
Hardware: frappuccino
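The trade-off in the table can be turned into a small selection rule: among the slice values whose allocated video memory fits a given budget, take the one with the lowest time. The sketch below assumes the table rows are fed on standard input in the same "slice | time | vram |" layout; best_slice is a hypothetical helper, and the measurements only apply to this parameter file and this hardware.

```sh
#!/bin/sh
# best_slice BUDGET_MB   (hypothetical helper)
# Reads "slice | time | vram |" rows on stdin and prints the slice value with
# the lowest time among rows whose vram is at most BUDGET_MB.
# Header, separator and failed rows (abort, error) are skipped.
best_slice() {
  awk -F'|' -v budget="$1" '
    {
      slice = $1; t = $2; vram = $3
      gsub(/ /, "", slice); gsub(/ /, "", t); gsub(/ /, "", vram)
      # Keep only rows where both time and vram are plain numbers.
      if (t !~ /^[0-9.]+$/ || vram !~ /^[0-9.]+$/) next
      # Drop rows that do not fit the memory budget.
      if (vram + 0 > budget + 0) next
      if (best == "" || t + 0 < best_time) { best = slice; best_time = t + 0 }
    }
    END { if (best != "") print best }
  '
}
```

With the table above as input, a budget of 1024 selects slice 3 and a budget of 700 selects slice 4. Nothing is printed when no row fits the budget.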
In --interactive mode, you can increase the --slice value to get a quick lo-fi preview image, which later refines into the final hi-fi image.
Command line for benchmarks below:
time ./src/mightymandel ./examples/mm/implementation-comparison.mm \
  --glitch --slice "${slice}"
slice | preview | complete | one-shot |
---|---|---|---|
0 | 7.972 | 13.704 | 12.561 |
1 | 2.991 | 15.017 | 12.950 |
2 | 1.259 | 22.689 | 19.951 |
Time is in seconds (real wall-clock time elapsed). Preview is the time taken until the first pixels visibly escape, complete is the time taken for the image to finish rendering, and one-shot is the time taken in --one-shot mode (preview and complete timings are in --interactive mode).
Hardware: frappuccino
Hardware: frappuccino
Implementations:
impl | description |
---|---|
mm-de-0 | mightymandel --one-shot --glitch --max-glitch 0 |
mm-no-de-0 | mightymandel --one-shot --glitch --max-glitch 0 --no-de |
mm-de | mightymandel --one-shot --glitch |
mm-no-de | mightymandel --one-shot --glitch --no-de |
kf-auto | Kalles Fraktaler 2.7.3 running in wine32 (16 threads on CPU) |
kf-1ref | as above but with max references set to 1 (default is 69) |
Parameter files:
examples/mm/implementation-comparison.mm examples/kfr/implementation-comparison.kfr
Image size: 1280x720
Timings:
impl | time |
---|---|
kf-auto | 35.064s |
kf-1ref | 4.948s |
mm-de-0 | 15.220s |
mm-no-de-0 | 10.758s |
mm-de | 13.475s |
mm-no-de | 9.036s |
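To compare implementations at a glance, each time can be expressed as a ratio to a chosen baseline row. The following is a sketch only: ratio_to is a hypothetical helper, the rows are assumed to arrive on standard input in the "impl | time |" layout used here, and the trailing "s" unit is stripped before dividing.

```sh
#!/bin/sh
# ratio_to BASELINE   (hypothetical helper)
# Reads "impl | time |" rows on stdin and prints "impl ratio" for each row,
# where ratio = time / time of the BASELINE row, to 2 decimal places.
# Exits with status 1 if the baseline row is not present.
ratio_to() {
  awk -F'|' -v base="$1" '
    {
      impl = $1; t = $2
      gsub(/ /, "", impl); gsub(/ /, "", t)
      sub(/s$/, "", t)                  # drop the unit, e.g. "9.036s"
      if (t !~ /^[0-9.]+$/) next        # skip header and separator rows
      n += 1; name[n] = impl; secs[n] = t + 0
      if (impl == base) base_secs = t + 0
    }
    END {
      if (base_secs <= 0) exit 1        # baseline not found
      for (i = 1; i <= n; i++) printf "%s %.2f\n", name[i], secs[i] / base_secs
    }
  '
}
```

Values above 1 are slower than the baseline and values below 1 are faster. Rows are printed in input order.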
Timings before de-inversion of control:
impl | time |
---|---|
mm-de | 15.778s |
mm-no-de | 11.451s |
Timings before GPU Step Iterations were optimized:
impl | time |
---|---|
mm-de | 16.672s |
mm-no-de | 12.417s |
Timings before reference point finding was improved (no early glitch escape):
impl | time |
---|---|
mm-de | 21.110s |
mm-no-de | 15.583s |
Timings before reference point finding was simplified:
impl | time |
---|---|
mm-de | 34.614s |
mm-no-de | 28.403s |
Timings before automatic glitch correction was added (so 1-ref point only):
impl | time |
---|---|
mm-de | 12.063s |
mm-no-de | 8.239s |