Currently, without this, scoped event names do not display correctly when auto-tracing is enabled.
The buffer_destroy event, obviously, fails to be completed (as there's no buffer to write the end event to), and context_destroy should happen after all the buffers are destroyed, so there's, again, no buffer to write to.
[nightly] docker container for linux arm
[nightly] removed setup-alpine for arm
[nightly] docker container for all linux + linux arm dep for upload
[nightly] x86 remove arm reference
[nightly] final fixes
- Fixes the code so SSEUp is grouped/skipped over properly (Fixes #5429)
- Fixes f16 vectors using garbage widths: the code called
  LLVMGetIntTypeWidth, but an f16 is not an integer type, so that
  query is not valid for it
This makes a tremendous (2x with SSE2, 3x with AVX2) difference on big
datasets on my system, but this may be hardware-dependent (e.g.
instruction cache sizes).
Naturally, this also results in somewhat larger code for the large-data
case (~75% larger).
This also covers various minor things that didn't seem right or could be
improved:
- XXH3_state is documented to have a strict alignment requirement of 64
bytes, and thus came with a disclaimer not to use `new` because it
wouldn't be aligned correctly. It now has an `#align(64)` so that it
will.
- An _internal proc was marked #force_no_inline (every other one is
  #force_inline)
- The product of two u32s was unnecessarily cast through u128 (and
  ultimately truncated to u64 anyway)
This uses compile-time features to decide how large of a SIMD vector to
use. It currently has checks for amd64/i386 to size its vectors for
SSE2/AVX2/AVX512 as necessary.
The generalized SIMD functions could also be useful for multiversioning
of the hash procs, to allow for run-time dispatch based on available CPU
features.