The last address of "data" is not "cast(uintptr)raw_data(data)+cast(uintptr)size" but
"cast(uintptr)raw_data(data)+cast(uintptr)(size-1)".
The original assert would fail when for example the allocation size requested and the buddy allocator allignment were both 64.
Currently without this scoped event names are not displaying correctly when auto-tracing is enabled.
The buffer_destroy event, obviously, fails to be completed (as theres no buffer to write the end event to, and context_destroy should happen after all the buffers are destroyed so there's, again, no buffers to write to.
This makes a tremendous (2x with SSE2, 3x with AVX2) difference on big
datasets on my system, but this may be hardware-dependent (e.g.
instruction cache sizes).
Naturally, this also results in somewhat larger code for the large-data
case (~75% larger).
This includes various minor things that didn't seem right or could be
improved, including:
- XXH3_state is documented to have a strict alignment requirement of 64
bytes, and thus came with a disclaimer not to use `new` because it
wouldn't be aligned correctly. It now has an `#align(64)` so that it
will.
- An _internal proc being marked #force_no_inline (every other one is
#force_inline)
- Unnecessarily casting the product of two u32s through u128 (and
ultimately truncating to u64 anyway)
This uses compile-time features to decide how large of a SIMD vector to
use. It currently has checks for amd64/i386 to size its vectors for
SSE2/AVX2/AVX512 as necessary.
The generalized SIMD functions could also be useful for multiversioning
of the hash procs, to allow for run-time dispatch based on available CPU
features.
Added support for NSBitmapImageRep class.
Added ability to set contents to a CALayer.
I needed these to support a port of Handmade Hero, but they are of general use.