This allows using a much smaller (1.5 KB) lookup table, in exchange for a small amount of extra work per frame.
The extra work (a few extra loads/mul/adds) is negligible, and can execute in parallel.
The reduction in cache misses almost certainly outweighs any added cost.
The table is generated at runtime, and takes less than 0.02ms on my computer.