Fixes #5128
p->builder is created in lb_begin_procedure_body, but that isn't called
if there is no body, and we were still calling dispose on it in that case.
Moved the dispose call into lb_end_procedure_body to match.
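
The create/dispose pairing can be modeled in a few lines. This is a minimal sketch with stand-in types, not the compiler's actual code: `Builder`, `Procedure`, `live_builders`, and the `*_sketch` functions are all hypothetical, and only the idea of disposing where the builder is created mirrors the description above.

```cpp
// Hypothetical stand-ins; the real backend holds an LLVM builder handle.
struct Builder { int unused; };

struct Procedure {
    Builder *builder = nullptr; // only set when a body is generated
    bool has_body = false;
};

static int live_builders = 0; // builders created but not yet disposed

void begin_procedure_body_sketch(Procedure *p) {
    p->builder = new Builder{};
    live_builders += 1;
}

void end_procedure_body_sketch(Procedure *p) {
    // Dispose here so it always pairs with the creation above.
    delete p->builder;
    p->builder = nullptr;
    live_builders -= 1;
}

void generate_procedure_sketch(Procedure *p) {
    if (p->has_body) {
        begin_procedure_body_sketch(p);
        end_procedure_body_sketch(p);
    }
    // No dispose here: a bodiless procedure never created a builder.
}
```

With the dispose inside the end-of-body step, a procedure without a body never touches a builder that was not created.
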
The _set_env procedure in core/os/os2/env_posix.odin was
incorrectly cloning the 'key' argument for 'cval' instead of
the 'value' argument. This resulted in set_env effectively
setting the environment variable's value to its own key.
This commit corrects the typo to use the 'value' argument.
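
To illustrate the corrected behaviour, here is a minimal C++ model of the logic. It is a sketch under assumptions: `set_env_sketch` is a hypothetical name, `std::string` copies stand in for the string clones done by the Odin procedure, and the call to POSIX `setenv` is assumed to be the underlying system interface.

```cpp
#include <stdlib.h>
#include <string>

// Hypothetical model of the corrected logic; the real procedure is Odin code
// in core/os/os2/env_posix.odin.
bool set_env_sketch(const std::string &key, const std::string &value) {
    std::string ckey = key;   // NUL-terminated copy of the key
    std::string cval = value; // the bug copied `key` here instead of `value`
    // POSIX setenv: a non-zero third argument overwrites an existing value.
    return setenv(ckey.c_str(), cval.c_str(), 1) == 0;
}
```

With the buggy version, reading the variable back would have returned the key's own text rather than the requested value.
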
The new reduce_add/reduce_mul procs perform the corresponding arithmetic
reduction in different orders than sequential order. These alternative
orders can often offer better SIMD hardware utilization.
Two different orders are added: pair-wise (operating on pairs of
adjacent elements) or bisection-wise (operating element-wise on the
first and last N/2 elements of the vector).
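
The two orders can be modeled with plain scalar loops. This is only a sketch of the combination order, not the SIMD implementation: the `*_model` names are hypothetical and the lane count `N` is assumed to be a power of two.

```cpp
#include <array>
#include <cstddef>

// Pair-wise: each round adds adjacent pairs, halving the live lane count.
template <std::size_t N>
float reduce_add_pairs_model(std::array<float, N> v) {
    for (std::size_t n = N; n > 1; n /= 2) {
        for (std::size_t i = 0; i < n / 2; i++) {
            v[i] = v[2 * i] + v[2 * i + 1];
        }
    }
    return v[0];
}

// Bisection-wise: each round adds the last half onto the first half,
// element by element.
template <std::size_t N>
float reduce_add_bisect_model(std::array<float, N> v) {
    for (std::size_t n = N; n > 1; n /= 2) {
        std::size_t half = n / 2;
        for (std::size_t i = 0; i < half; i++) {
            v[i] = v[i] + v[i + half];
        }
    }
    return v[0];
}
```

Both need log2(N) rounds instead of N-1 dependent additions. For floating-point inputs the different orders can round differently, so results may not match a sequential sum bit for bit.
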
Previously, the error message implied that these were different types:
```
W:/Scratch/scratch.odin(17:5) Error: Cannot compare expression, operator '==' not defined between the types 'Handle_Map($T=u32, $HT=u32, $Max=10000)' and 'Handle_Map($T=u32, $HT=u32, $Max=10000)'
if m == {} {
^~~~~~^
```
Now:
```
W:/Scratch/scratch.odin(20:5) Error: Cannot compare expression. Type 'Handle_Map($T=u32, $HT=u32, $Max=10000)' is not simply comparable, so operator '==' is not defined for it.
if m == {} {
^~~~~~^
```
The indices proc simply creates a vector where each lane contains its
own lane index. This can be useful for generating masks for loads
and stores at the beginning/end of slices, among other things.
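
As an illustration of the mask use case, the following scalar C++ sketch builds a tail mask by comparing lane indices against the number of elements left in a slice. The names `indices_model` and `tail_mask_model` and the width `LANES = 4` are assumptions made for the example, not part of the Odin API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t LANES = 4; // assumed vector width for this sketch

// Models a vector whose lane i holds the value i.
std::array<std::uint32_t, LANES> indices_model() {
    std::array<std::uint32_t, LANES> v{};
    for (std::size_t i = 0; i < LANES; i++) {
        v[i] = static_cast<std::uint32_t>(i);
    }
    return v;
}

// Lane i is active only while it still falls inside the remaining elements.
std::array<bool, LANES> tail_mask_model(std::size_t remaining) {
    std::array<bool, LANES> mask{};
    auto idx = indices_model();
    for (std::size_t i = 0; i < LANES; i++) {
        mask[i] = idx[i] < remaining;
    }
    return mask;
}
```

For a slice with three elements left, the first three lanes are active and the fourth is masked off, so a masked load or store would not touch memory past the end of the slice.
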
The new reduce_add/reduce_mul procs perform the corresponding arithmetic
reduction, in different orders than just "in sequential order". These
alternative orders can often be faster to calculate, as they can offer
better SIMD hardware utilization.
Two different orders are added for these: pair-wise (operating on
adjacent pairs of elements) or split-wise (operating element-wise on the
two halves of the vector).
This doesn't actually cover the *fastest* way for arbitrarily-sized
vectors. That would be an ordered reduction across the native vector
width, then reducing the resulting vector to a scalar in an appropriate
parallel fashion. I'd created an implementation of that, but it required
multiple procs and a fair bit more trickery than I was comfortable with
submitting to `core`, so it's not included yet. Maybe in the future.
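
For reference, the general shape of that approach can be sketched with scalar loops. This is not the unsubmitted implementation mentioned above, only a hypothetical model of the idea: `chunked_reduce_add_model` and `WIDTH = 4` are assumed names and values, and real code would use native vector operations in place of the inner lane loops.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t WIDTH = 4; // assumed native vector width

float chunked_reduce_add_model(const std::vector<float> &data) {
    std::array<float, WIDTH> acc{}; // one running sum per lane
    std::size_t i = 0;
    // Ordered pass: add each full chunk lane by lane into the accumulator.
    for (; i + WIDTH <= data.size(); i += WIDTH) {
        for (std::size_t lane = 0; lane < WIDTH; lane++) {
            acc[lane] += data[i + lane];
        }
    }
    // Leftover elements that do not fill a whole chunk.
    for (std::size_t lane = 0; i < data.size(); i++, lane++) {
        acc[lane] += data[i];
    }
    // Final step: bisection-style reduction of the accumulator to a scalar.
    for (std::size_t n = WIDTH; n > 1; n /= 2) {
        for (std::size_t lane = 0; lane < n / 2; lane++) {
            acc[lane] += acc[lane + n / 2];
        }
    }
    return acc[0];
}
```

Only the last step is a horizontal reduction; the bulk of the work is vertical adds across chunks.
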