
Benchmark results for basic vector support using scalar loops at various levels of unrolling
Summary
All math functions with vector support were implemented via scalar loops at three levels of unrolling: 1 (i.e., no unrolling), 2, and 4.
- No unrolling was generally the worst.
- 4-unrolled was generally the best; where it was not, it was no worse than no unrolling.
- 2-unrolled generally fell between the two.
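To illustrate what "a scalar loop at a given level of unrolling" means, here is a minimal C sketch using element-wise addition as the simple math operation. The function names `vadd_u1`, `vadd_u2` and `vadd_u4` are hypothetical, and this is not the code that was actually benchmarked; it only shows the loop shape at each unroll factor.

```c
#include <stddef.h>

/* Hypothetical example: unroll factor 1 (no unrolling). */
void vadd_u1(double *out, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Unroll factor 2: two independent additions per iteration,
   plus a scalar tail for an odd leftover element. */
void vadd_u2(double *out, const double *a, const double *b, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
    }
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Unroll factor 4: four independent additions per iteration,
   plus a scalar tail for the remaining n % 4 elements. */
void vadd_u4(double *out, const double *a, const double *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The operations inside one unrolled iteration do not depend on each other, which gives a superscalar CPU room to execute them in parallel, and the loop counter is updated and tested once per 2 or 4 elements instead of once per element.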
Unrolling generally did not help for the more complex functions, such as sin and cos. Simple math, on the other hand, showed clear differences in favor of the 4-unrolled loops.
It is suspected that the more complex functions were not (or could not be) inlined into the loop, so the unrolled loops could not take proper advantage of superscalar execution and benefited only from reduced loop overhead, which was too small to show up in the numbers.
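The suspected effect can be sketched in C by modelling the non-inlined function as a call through a function pointer, which a compiler generally cannot inline unless it can see the concrete target. The names `vmap_u4`, `unary_fn` and `square_fn` are hypothetical; `square_fn` merely stands in for something like sin or cos.

```c
#include <stddef.h>

/* Hypothetical per-element function type; calling through a pointer
   models a function the compiler is assumed unable to inline. */
typedef double (*unary_fn)(double);

/* Hypothetical stand-in for a complex math function such as sin. */
double square_fn(double x) { return x * x; }

/* 4-unrolled map. Each step is still a full out-of-line call, so the
   compiler cannot schedule the work inside the four calls together the
   way it can with four inline additions. The saving is mainly that the
   loop counter is updated and tested once per 4 elements. */
void vmap_u4(double *out, const double *in, size_t n, unary_fn f) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = f(in[i]);
        out[i + 1] = f(in[i + 1]);
        out[i + 2] = f(in[i + 2]);
        out[i + 3] = f(in[i + 3]);
    }
    for (; i < n; i++)   /* scalar tail for the n % 4 leftover elements */
        out[i] = f(in[i]);
}
```

When the call dominates the per-element cost, removing three out of four counter updates and branches is a small relative gain, which is consistent with unrolling showing little effect for such functions.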
The 4-unrolled loops were kept across the board, in the hope that future compiler changes will allow for inlining and optimization.