- Deprecate the u64/u32 implementation so we can use fewer nails and have an easier time maintaining and optimizing the package going forward. The remaining implementation still works on 32-bit targets; it's just a smidge less efficient.
- Use only 1 nail instead of 4. The tests now run 3.5% faster as a result.
Future optimizations may include a fully packed backing (no nails), using `intrinsics.overflow_*` to handle borrow and carry safely.
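
To make the trade-off concrete, here is a rough sketch of limb addition with one nail bit versus a fully packed layout. It is written in Rust purely for illustration and is not this package's code: the function names `add_with_nail` and `add_packed` are hypothetical, 64-bit limbs are an assumption, and Rust's `overflowing_add` stands in for the `intrinsics.overflow_*` calls mentioned above.

```rust
/// One nail bit: each u64 limb carries 63 payload bits (assumed layout).
const DIGIT_BITS: u32 = 63;
const MASK: u64 = (1u64 << DIGIT_BITS) - 1;

/// Adds two equal-length little-endian limb slices stored with one nail bit.
/// Inputs are assumed to be normalized (every limb <= MASK).
/// Returns the sum limbs and the final carry (0 or 1).
fn add_with_nail(a: &[u64], b: &[u64]) -> (Vec<u64>, u64) {
    let mut out = Vec::with_capacity(a.len());
    let mut carry = 0u64;
    for (&x, &y) in a.iter().zip(b.iter()) {
        // Cannot overflow: x, y <= 2^63 - 1 and carry <= 1,
        // so the sum is at most 2^64 - 1. The carry lands in the nail bit.
        let sum = x + y + carry;
        out.push(sum & MASK);
        carry = sum >> DIGIT_BITS;
    }
    (out, carry)
}

/// Adds two equal-length little-endian limb slices with no nails:
/// all 64 bits are payload, so the carry must come from overflow detection.
fn add_packed(a: &[u64], b: &[u64]) -> (Vec<u64>, u64) {
    let mut out = Vec::with_capacity(a.len());
    let mut carry = false;
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (s1, c1) = x.overflowing_add(y);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        out.push(s2);
        // At most one of the two additions can overflow for a given limb.
        carry = c1 || c2;
    }
    (out, carry as u64)
}
```

The nail version reads the carry straight out of the spare top bit, at the cost of one bit of storage per limb and a mask on every step. The packed version wastes no bits but needs two overflow checks per limb.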