[rlsw] Micro-optimizations, tighter pipeline and cleanup (#5673)

* auto generates all combinations of blending factors
This adds a macro system that generate a function for each possible combination of blending factors, resulting in 11*11 functions, hence 121.
This then allows for only one indirection and function call instead of two previously (assuming the first call was inlined).

* rename dispatch tables for consistency

* change blend funcs validity check
Simplifies the validation of blend functions.
Can allow `SW_SRC_ALPHA_SATURATE` as dst factor, but hey

* disables blending when it requires alpha and there is none

* review immediate rendering functions and attribute layout

* prevent state changes during immediate record

* reduce number of op for each vertex push + review primitive struct

* simplified draw functions

* review `sw_vertex_t`
removes `float screen[2]`; each step stores the transformed coordinates in `float coord[4]`.
This also simplifies vertex interpolation during triangle rasterization.

* reduces unnecessary interpolation costs during triangle rasterization + cleanup

* extends the simd color conversion to more cases

* affine interpolation per blocks

* long side check for each triangle line
My mistake in a previous commit

* style tweaks

* select the read function on texture load
This removes the per-pixel switch; it's slightly more efficient on my hardware, but probably a poor prediction
Should remain profitable or at worst the same

* use optionnal LUT for uint8_t -> float conversion

* sets internal the number of vertices post-clipping and the epsilon clipping + a little cleanup

* moves color conversion to math part

* prevents sampling if it's a depth texture that is bound
This commit is contained in:
Le Juez Victor
2026-03-19 23:39:52 +01:00
committed by GitHub
parent 04c5dc4493
commit 7ae46fb2e6

2349
src/external/rlsw.h vendored

File diff suppressed because it is too large Load Diff