This makes a substantial difference on large inputs on my system (2x with
SSE2, 3x with AVX2), though the gains may be hardware-dependent (e.g.
instruction cache sizes).
Naturally, this also results in larger code for the large-data case
(roughly 75% larger).
This also covers various minor things that didn't seem right or could be
improved:
- `XXH3_state` is documented to have a strict alignment requirement of 64
  bytes, and thus came with a disclaimer not to use `new` because it
  wouldn't be aligned correctly. It now has an `#align(64)` so that it
  will.
- An `_internal` proc being marked `#force_no_inline` (every other one is
  `#force_inline`)
- Unnecessarily casting the product of two `u32`s through `u128` (and
  ultimately truncating to `u64` anyway); the alignment and multiplication
  fixes are sketched after this list
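For illustration, a minimal sketch of the alignment attribute and the narrower multiply. The names and fields here are abbreviated stand-ins, not the actual declarations in `core:hash/xxhash`:

```odin
package example

// Sketch only: the real XXH3_state has more fields; this just illustrates
// the #align(64) attribute that makes `new` return a correctly aligned state.
XXH3_state :: struct #align(64) {
	acc: [8]u64,
}

// Two u32s multiply into at most 64 bits, so widening to u64 is enough;
// the previous round-trip through u128 was truncated back to u64 anyway.
mul_32_to_64 :: #force_inline proc "contextless" (a, b: u32) -> u64 {
	return u64(a) * u64(b)
}
```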
This uses compile-time checks to decide how large a SIMD vector to use. It
currently checks amd64/i386 targets and sizes its vectors for
SSE2/AVX2/AVX512 as appropriate.
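As a rough illustration (not the actual selection logic), something along these lines picks a lane count at compile time. This assumes `intrinsics.has_target_feature` is available under `base:intrinsics` and can be evaluated in `when` blocks:

```odin
package example

import "base:intrinsics"

// Pick a 64-bit lane count from the enabled target features.
when ODIN_ARCH == .amd64 || ODIN_ARCH == .i386 {
	when intrinsics.has_target_feature("avx512f") {
		LANES :: 8 // 512-bit vectors
	} else when intrinsics.has_target_feature("avx2") {
		LANES :: 4 // 256-bit vectors
	} else {
		LANES :: 2 // SSE2 baseline, 128-bit vectors
	}
} else {
	LANES :: 2
}

// Generalized SIMD procs are written against this type and compile down to
// whichever width was selected above.
Accumulator :: #simd[LANES]u64
```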
The generalized SIMD functions could also be useful for multiversioning
of the hash procs, to allow for run-time dispatch based on available CPU
features.
Randomize the chunk size used with `update`.
The test run will print `Using user-selected seed {18109872483301276539,2000259725719371} for update size randomness.`
If a streaming test then fails, you can repeat it using:
`odin run . -define:RAND_STATE=18109872483301276539 -define:RAND_INC=2000259725719371`
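A hypothetical sketch of how those `-define` values could reach the test at compile time; the actual test harness may wire this up differently:

```odin
package example

// -define:RAND_STATE=... and -define:RAND_INC=... land here; the defaults
// apply when no seed is passed on the command line.
RAND_STATE :: #config(RAND_STATE, 0)
RAND_INC   :: #config(RAND_INC, 0)

// The {state, inc} pair seeds the generator that picks each `update` chunk
// size, so feeding the printed seed back in reproduces the same sequence.
```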
Test XXH32, XXH64, XXH3-64 and XXH3-128 for large inputs, with both all-at-once and streaming APIs.
`XXH32_create_state` and `XXH64_create_state` now implicitly call their "reset state" variants to simplify the streaming API to 3 steps (sketched after this list):
- create state / defer destroy
- update
- digest (finalize)
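A minimal sketch of that 3-step flow against `core:hash/xxhash`. The exact proc signatures are from memory and may differ slightly (e.g. `XXH64_update` and `XXH64_create_state` may return errors that should be checked):

```odin
package example

import "core:fmt"
import "core:hash/xxhash"

main :: proc() {
	data := make([]u8, 1 << 20) // 1 MiB of zeroes
	defer delete(data)

	// 1. create state / defer destroy (error handling elided for brevity)
	state, _ := xxhash.XXH64_create_state()
	defer xxhash.XXH64_destroy_state(state)

	// 2. update (a real test would feed randomly sized chunks)
	xxhash.XXH64_update(state, data)

	// 3. digest (finalize) -- must match the one-shot proc
	streamed := xxhash.XXH64_digest(state)
	assert(streamed == xxhash.XXH64(data))

	fmt.printf("XXH64: %16x\n", streamed)
}
```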
These are tested with buffers of 1, 2, 4, 8 and 16 megabytes of zeroes.
For all of them, the streaming and one-shot APIs return the same hashes, and those hashes match the output of the official xxhsum tool.
3778/3778 tests successful.