"Video games released later in a console's life cycle often look better" is a claim frequently encountered on the internet. It rests on the inference that if visual fidelity improves between successive releases on a platform with fixed specifications, the improvement must come primarily from developers' growing familiarity with the platform.
While reductive statements of this sort are misinformation at worst and of limited merit at best, it is indisputable that developing for a known platform has marked advantages.
For example, a commonly used memory optimization technique is to partition data into blocks that, in the best case, fit into cache. The resulting lower latency of accessing these blocks allows for much faster computation. When someone asks me how to learn these techniques, I always refer them to the Basic Linear Algebra Subprograms (BLAS), specifically the dense matrix-matrix multiplication dgemm.
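To make the blocking idea concrete, here is a minimal sketch of a tiled matrix-matrix multiplication. It is not the dgemm implementation of any BLAS library; the function name blocked_matmul, the row-major square layout, and the runtime block parameter are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a BLAS routine): computes C += A * B for
// row-major n x n matrices, working on tiles of at most block x block
// elements so that the data touched by the inner loops stays small.
void blocked_matmul(std::size_t n, std::size_t block,
                    const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& c) {
    assert(block > 0);
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t kk = 0; kk < n; kk += block) {
            for (std::size_t jj = 0; jj < n; jj += block) {
                // Clamp the tile so n need not be a multiple of block.
                const std::size_t i_end = std::min(ii + block, n);
                const std::size_t k_end = std::min(kk + block, n);
                const std::size_t j_end = std::min(jj + block, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a_ik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The result is the same for every positive block value; only the memory access pattern changes, which is why the best value has to be found by measuring on the target platform.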
To obtain peak performance with most blocked algorithms, an optimal block size must be found, and that size is platform dependent. In C++, integer (non-type) template parameters are a useful abstraction for generating presets of block sizes at compile time:
#include <cstddef>  // size_t

template<size_t BLOCK_SIZE>
float reduction(size_t n, float* a){
    float result = 0.0f;
    // partition a into blocks of size BLOCK_SIZE
    // do computations, accumulating into result
    return result;
}
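As a sketch of how such presets might look in practice, the following fills in the skeleton above under the assumption that the reduction is a plain sum. The names blocked_sum, sum_preset_small and sum_preset_large, as well as the block sizes 64 and 1024, are hypothetical and not tuned for any particular machine.

```cpp
#include <cassert>
#include <cstddef>

// Assumed reduction: a sum over n floats, processed block by block.
// BLOCK_SIZE is a compile-time constant, so the inner loop has a fixed
// trip count that the compiler can unroll or vectorize.
template <std::size_t BLOCK_SIZE>
float blocked_sum(std::size_t n, const float* a) {
    static_assert(BLOCK_SIZE > 0, "BLOCK_SIZE must be positive");
    assert(a != nullptr || n == 0);
    float total = 0.0f;
    std::size_t i = 0;
    // Full blocks.
    for (; i + BLOCK_SIZE <= n; i += BLOCK_SIZE) {
        float partial = 0.0f;
        for (std::size_t j = 0; j < BLOCK_SIZE; ++j) {
            partial += a[i + j];
        }
        total += partial;
    }
    // Leftover elements that do not fill a whole block.
    for (; i < n; ++i) {
        total += a[i];
    }
    return total;
}

// Hypothetical presets: each one instantiates the template with a
// different block size chosen at compile time.
float sum_preset_small(std::size_t n, const float* a) { return blocked_sum<64>(n, a); }
float sum_preset_large(std::size_t n, const float* a) { return blocked_sum<1024>(n, a); }
```

Which preset performs best would have to be determined by benchmarking on the target platform; the template only makes it cheap to generate the candidates.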
From a programmer's perspective, BLOCK_SIZE cannot always be chosen freely. There may be a restriction, for instance that BLOCK_SIZE must be a positive power of two. In the past, such important nuances were usually signaled via comments or branching statements. C++20 constraints and concepts allow this to be expressed much more cleanly and with more elegant syntax:
#include <cstddef>  // size_t

// X is already a size_t, so only its value needs to be checked.
template<size_t X>
concept IsPositivePowerOfTwo = (X != 0) && ((X & (X - 1)) == 0);

template<size_t BLOCK_SIZE> requires IsPositivePowerOfTwo<BLOCK_SIZE>
float reduction(size_t n, float* a){
    float result = 0.0f;
    // partition a into blocks of size BLOCK_SIZE
    // do computations, accumulating into result
    return result;
}