This makes a tremendous (2x with SSE2, 3x with AVX2) difference on big
datasets on my system, but this may be hardware-dependent (e.g.
instruction cache sizes).
Naturally, this also results in somewhat larger code for the large-data
case (~75% larger).
This includes various minor things that didn't seem right or could be
improved, including:
- XXH3_state is documented to have a strict alignment requirement of 64
bytes, and thus came with a disclaimer not to use `new` because it
wouldn't be aligned correctly. It now has an `#align(64)` so that it
will.
- An _internal proc being marked #force_no_inline (every other one is
#force_inline)
- Unnecessarily casting the product of two u32s through u128 (and
ultimately truncating to u64 anyway)
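
As a rough illustration of the first and last points, here is a C analogue
(not the Odin code from this change). `demo_state`, `demo_state_new` and
`mul32_to_64` are hypothetical names, and the sketch assumes a C11 toolchain
that provides `aligned_alloc`. It shows a type carrying its own 64-byte
alignment so that a type-aware allocation is aligned correctly, and a
32x32-bit multiply that only needs a 64-bit result:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hash state that must be 64-byte aligned. */
typedef struct {
    alignas(64) uint64_t acc[8];
} demo_state;

/* The alignment is part of the type, so allocators can query it. */
static_assert(alignof(demo_state) == 64, "type carries its own alignment");

/* Allocate a state using the type's own alignment (size is a multiple of 64). */
static demo_state *demo_state_new(void) {
    return aligned_alloc(alignof(demo_state), sizeof(demo_state));
}

/* The product of two 32-bit values always fits in 64 bits,
 * so there is no need to widen through a 128-bit type first. */
static inline uint64_t mul32_to_64(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}
```

The largest possible product, `0xFFFFFFFF * 0xFFFFFFFF`, is
`0xFFFFFFFE00000001`, which is why the 64-bit result is exact.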
This uses compile-time target features to decide how large a SIMD vector
to use. It currently has checks for amd64/i386 to size its vectors for
SSE2/AVX2/AVX512 as necessary.
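
The same idea can be sketched in C (again an analogue, not the code in this
change) using the compiler's predefined target-feature macros. The names
`DEMO_VECTOR_BYTES`, `DEMO_LANES` and `demo_sum` are made up for the example,
and the 8-byte scalar fallback is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick a vector width in bytes from the target features enabled at compile time. */
#if defined(__AVX512F__)
#  define DEMO_VECTOR_BYTES 64
#elif defined(__AVX2__)
#  define DEMO_VECTOR_BYTES 32
#elif defined(__SSE2__) || defined(__x86_64__) || defined(_M_X64)
#  define DEMO_VECTOR_BYTES 16
#else
#  define DEMO_VECTOR_BYTES 8 /* assumed scalar fallback: one u64 at a time */
#endif

/* Number of 64-bit lanes that fit in one vector of the chosen width. */
enum { DEMO_LANES = DEMO_VECTOR_BYTES / sizeof(uint64_t) };

/* Sum n values, accumulating DEMO_LANES independent lanes per step so the
 * compiler can map the inner loop onto vectors of the chosen width. */
static uint64_t demo_sum(const uint64_t *v, size_t n) {
    uint64_t acc[DEMO_LANES] = {0};
    size_t i = 0;
    for (; i + DEMO_LANES <= n; i += DEMO_LANES)
        for (size_t lane = 0; lane < DEMO_LANES; lane++)
            acc[lane] += v[i + lane];

    uint64_t total = 0;
    for (size_t lane = 0; lane < DEMO_LANES; lane++)
        total += acc[lane];
    for (; i < n; i++) /* leftover elements that do not fill a vector */
        total += v[i];
    return total;
}
```

The result is the same whichever width is selected; only the chunk size of
the main loop changes.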
The generalized SIMD functions could also be useful for multiversioning
of the hash procs, to allow for run-time dispatch based on available CPU
features.
Added support for NSBitmapImageRep class.
Added the ability to set the contents of a CALayer.
I needed these to support a port of Handmade Hero, but they are of general use.
The fix was adding `is_constant = false;`
I also removed the unnecessary check regarding the first element of the
BitSet, since it's checked inside the loop, and also fixed a typo in the
message.
sdl.OpenAudioDevice was incorrectly using a bool instead of a c.int for its last parameter. To make the proc call more idiomatic and in line with other bindings, a new bit_set was introduced to be used in place of the constants.
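
To show why the parameter type matters, here is a small C sketch (not the
binding itself). The flag values mirror SDL2's `SDL_AUDIO_ALLOW_*` constants,
but the enum names and the helpers `received_allowed_changes` and
`through_bool` are hypothetical and stand in for the real call:

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror SDL2's SDL_AUDIO_ALLOW_* constants; the names are illustrative. */
enum {
    ALLOW_FREQUENCY_CHANGE = 0x1,
    ALLOW_FORMAT_CHANGE    = 0x2,
    ALLOW_CHANNELS_CHANGE  = 0x4,
    ALLOW_SAMPLES_CHANGE   = 0x8,
};

/* Hypothetical stand-in for the C side, which takes the mask as an int. */
static int received_allowed_changes(int allowed_changes) {
    return allowed_changes;
}

/* What a bool-typed parameter would pass along: any nonzero mask becomes 1. */
static int through_bool(int mask) {
    bool b = mask;
    return received_allowed_changes(b);
}
```

With a bool, a request for format and channel changes (`0x2 | 0x4`) arrives
as `1`, which is the frequency-change flag only. Passing the mask as an
integer, or as a bit_set backed by one, preserves every requested flag.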