Parse SUPPORT_ defines from src/config.h by their actual 0/1 values so CUSTOMIZE_BUILD exposes the correct defaults. Apply INCLUDE_EVERYTHING explicitly when registering dependent options.
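The idea behind the first change is to read each `SUPPORT_*` define's value rather than only noting that the line exists, so `#define SUPPORT_X 0` counts as disabled. The commit presumably does this in the CMake scripts, since `CUSTOMIZE_BUILD` is a CMake option. The C sketch below only illustrates the parsing rule, and `parse_support_define` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helper: returns 1 and stores the value if `line` is an active
 * "#define SUPPORT_<NAME> <0|1>" directive. Returns 0 for commented-out
 * lines, non-SUPPORT_ defines, and non-boolean values. */
static int parse_support_define(const char *line, char name[64], int *value)
{
    int v = 0;
    assert(line != NULL && name != NULL && value != NULL);

    /* The leading space skips indentation; "//#define" fails on the '#'. */
    if (sscanf(line, " #define %63s %d", name, &v) != 2) return 0;
    if (strncmp(name, "SUPPORT_", 8) != 0) return 0;
    if (v != 0 && v != 1) return 0;

    *value = v;  /* the value, not mere presence, decides the default */
    return 1;
}
```

In this sketch a commented-out line is rejected outright; how the build treats a define that is missing entirely is a separate decision.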
* fix warnings: goto label unused when SW_ENABLE_DEPTH_TEST is not defined
* comment out x coordinates that aren't used in SW_RASTER_TRIANGLE
* silence warnings: unused DrmModeConnector functions in rcore_drm.c when using GRAPHICS_API_OPENGL_SOFTWARE
* [rlsw] Add sw_rcp helper using Xtensa recip0.s for hot-path divisions
Adds a `sw_rcp(x)` inline reciprocal that on Xtensa (ESP32 / ESP32-S3
LX6/LX7) emits a `recip0.s` seed plus two Newton-Raphson refinement
steps, accurate to 1 ULP in ~7 instructions and kept entirely in FPU registers.
On every other target it expands to plain `1.0f/x`, so generated code
is byte-identical to before for non-Xtensa builds.
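A portable sketch of the seed-plus-Newton-Raphson arithmetic follows. It cannot use `recip0.s`, so it substitutes a common integer bit-manipulation seed (the magic constant is an assumption, not taken from rlsw). That seed is only accurate to a few percent, coarser than a hardware estimate, so this sketch runs three refinement steps instead of two. Each step computes r' = r * (2 - x*r), which roughly squares the relative error. `rcp_seed` and `rcp_newton` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical seed: subtracting the magnitude bits from a magic constant
 * negates the exponent and roughly approximates 1/|x|, then the sign is
 * restored. Valid for normal, finite, nonzero inputs of moderate magnitude. */
static inline float rcp_seed(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t sign = bits & 0x80000000u;
    bits = (0x7EF311C7u - (bits & 0x7FFFFFFFu)) | sign;
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Each Newton-Raphson step r = r * (2 - x*r) roughly squares the error. */
static inline float rcp_newton(float x)
{
    assert(x != 0.0f);
    float r = rcp_seed(x);
    r = r * (2.0f - x * r);
    r = r * (2.0f - x * r);
    r = r * (2.0f - x * r);  /* third step only because this seed is coarse */
    return r;
}
```

This models only the arithmetic; on non-Xtensa targets `sw_rcp` remains plain `1.0f/x`.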
Replaces the hot-path `1.0f/x` calls that were previously compiling to
the `__divsf3` software helper on Xtensa:
- perspective divide (1/w) in triangle clip-and-project (PCT and PC paths)
- line and point clip-and-project NDC conversion
- triangle span setup: dxRcp, blockLenRcp, wRcpA, wRcpB
- triangle scanline setup: h02Rcp, h01Rcp, h12Rcp
- axis-aligned quad: wRcp, hRcp
- line rasterizer: stepRcp
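These replacements follow one pattern: compute a reciprocal once, then multiply. For the perspective divide that looks roughly like the sketch below. `Vec4`, `rcp`, and `clip_to_ndc` are hypothetical names, `rcp` stands in for `sw_rcp`, and keeping 1/w in the w slot (for later perspective-correct interpolation) is an assumed convention, not necessarily what rlsw does.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* Stand-in for sw_rcp: plain division, as on non-Xtensa targets. */
static inline float rcp(float x) { return 1.0f / x; }

/* One reciprocal and three multiplies instead of three divisions by w. */
static Vec4 clip_to_ndc(Vec4 clip)
{
    assert(clip.w != 0.0f);  /* clipping is assumed to have removed w <= 0 */
    float wRcp = rcp(clip.w);
    Vec4 ndc = { clip.x * wRcp, clip.y * wRcp, clip.z * wRcp, wRcp };
    return ndc;
}
```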
Other `1.0f/x` uses (matrix translate/normalize, texture init `tx`/`ty`,
sw_matrix_rotate inverse-length) are not on the per-pixel hot path and
are left untouched.
Measured on ESP32-S3 @ 240 MHz, R5G6B5 240x240, textured 3D model:
contributes to a ~10-15% rasterization speedup.
Made-with: Cursor
* [rlsw] Use ESP-DSP for 4x4 matrix multiply and per-vertex MVP transform
Adds an opt-in ESP-DSP code path for ESP32 / ESP32-S3 builds. ESP-DSP is
ESP-IDF's official optimized math library and ships hand-vectorized
kernels that beat the scalar implementations on Xtensa.
Two integration points:
1. `sw_matrix_mul_rst` -> `dspm_mult_4x4x4_f32` for any 4x4*4x4 multiply
(used for MVP build, gluLookAt, push/multiply, etc.). rlsw stores
matrices column-major and ESP-DSP reads row-major; the comment on the
call site explains why the flat-buffer call still produces the
correct column-major product (transpose-of-transposes equivalence).
2. `sw_immediate_push_vertex` -> `dspm_mult_4x4x1_f32` for the per-vertex
clip-space transform. Because ESP-DSP expects a row-major matrix in
this case, a row-major copy `matMVP_rm[16]` is maintained alongside
`matMVP` and refreshed once per `isDirtyMVP` rebuild in
`sw_immediate_begin`. Cost is 16 scalar copies per matrix update,
amortized over thousands of vertices per frame.
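The layout argument can be checked on a host machine. In the sketch below, `mul_rowmajor` is a scalar stand-in for the row-major ESP-DSP kernels and does not call ESP-DSP. Reading a column-major buffer as row-major yields the transpose, so a row-major kernel given the operands in swapped order computes B^T A^T = (AB)^T, which is exactly AB when read back as column-major. Whether rlsw swaps operands at its call site or relies on its own multiply convention is not stated here. All helper names are hypothetical.

```c
#include <assert.h>

/* Element (row r, col c) of a column-major 4x4 buffer. */
#define CM(m, r, c) ((m)[(c) * 4 + (r)])

/* Scalar stand-in for a row-major kernel: out = P * Q, all buffers read
 * row-major, P is 4x4 and Q has n columns. `out` must not alias p or q. */
static void mul_rowmajor(const float *p, const float *q, float *out, int n)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < n; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++) s += p[i * 4 + k] * q[k * n + j];
            out[i * n + j] = s;
        }
}

/* Column-major C = A * B through the row-major kernel: swapping operands
 * gives B^T * A^T = (A * B)^T row-major, i.e. A * B column-major. */
static void mat4_mul_colmajor(const float *a, const float *b, float *c)
{
    mul_rowmajor(b, a, c, 4);
}

/* Row-major copy (16 scalar copies), refreshed only when the matrix changes,
 * so the per-vertex path can call the row-major kernel with n = 1. */
static void to_rowmajor(const float *cm, float *rm)
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++) rm[r * 4 + c] = CM(cm, r, c);
}
```

`to_rowmajor` plays the role described for `matMVP_rm`: its cost is paid once per matrix rebuild, not once per vertex.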
The ESP-DSP path is **opt-in** via `SW_USE_ESP_DSP`, so existing ESP-IDF
projects that don't depend on the `esp-dsp` component keep building unchanged.
A user enables it from CMakeLists.txt (or anywhere before including
rlgl.h):
```cmake
target_compile_definitions(${COMPONENT_LIB} PRIVATE SW_USE_ESP_DSP=1)
```
and adds the dependency to `idf_component.yml`:
```yaml
dependencies:
  espressif/esp-dsp: "^1.4.0"
```
Measured on ESP32-S3 @ 240 MHz, R5G6B5 240x240, textured 3D model:
contributes meaningfully to the overall frame-time improvement
(combined with sw_rcp).
Made-with: Cursor
* [rlsw] Add SW_TEXTURE_REPEAT_POT_FAST opt-in for POT bitmask wrap
Adds an opt-in compile-time flag that replaces the SW_REPEAT wrap chain
with a bitmask (`x & (size-1)`) for power-of-two textures. NPOT textures
keep using the original `sw_fract` / signed-modulo paths via a runtime
`(size & (size-1)) == 0` check, so SW_REPEAT remains correct for them.
Affects two samplers:
- `sw_texture_sample_nearest`: drops the `floorf` + multiply + cast for
POT textures in REPEAT mode (saves a software call on Xtensa).
- `sw_texture_sample_linear`: replaces the `(x % w + w) % w` two-step
modulo (a software divide on Xtensa) with a single bitwise AND for
POT textures in REPEAT mode. Two's-complement int wrap covers
negative coordinates correctly.
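For integer texel coordinates the two wrap paths agree whenever the size is a power of two, negative coordinates included: masking with `size - 1` keeps the low bits, which equal the value modulo `size`. A sketch with the runtime POT check described above; `wrap_repeat` is a hypothetical name, not the rlsw function.

```c
#include <assert.h>

/* Hypothetical REPEAT wrap for an integer texel coordinate x in a texture
 * dimension of `size` texels. */
static inline int wrap_repeat(int x, int size)
{
    assert(size > 0);
    if ((size & (size - 1)) == 0) {
        /* POT: keep the low bits. Converting to unsigned is modular, so this
         * is well-defined for negative x and equals x mod size. */
        return (int)((unsigned)x & (unsigned)(size - 1));
    }
    /* NPOT: two-step modulo, because C's % truncates toward zero. */
    return (x % size + size) % size;
}
```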
Off by default: for POT textures sampled with negative UVs, bitmask wrap
can differ from `sw_fract` wrap by one texel at the boundary. That is
imperceptible at typical resolutions but still a behavior change, so the
flag stays off and existing users get bit-for-bit identical output. Opt in
if you control your asset UVs and want the speedup:
```c
#define SW_TEXTURE_REPEAT_POT_FAST
```
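The one-texel difference for negative UVs can be reproduced with a UV just below zero. This sketch assumes the reference path takes a floor-based fractional part before scaling, while the fast path truncates `u * size` with a cast and then masks; both helpers are hypothetical, and rlsw's exact arithmetic may differ.

```c
#include <assert.h>

/* Hypothetical floor-based fractional part in [0, 1), computed without libm
 * (valid for |u| well below 2^31). */
static inline float fract_floor(float u)
{
    float f = u - (float)(int)u;       /* truncation toward zero */
    return (f < 0.0f) ? f + 1.0f : f;  /* shift negatives up into [0, 1) */
}

/* Nearest texel index through the fract-based REPEAT path. */
static inline int texel_fract(float u, int size)
{
    int x = (int)(fract_floor(u) * (float)size);
    return (x < size) ? x : size - 1;  /* guard against rounding up to size */
}

/* Nearest texel index through a truncate-then-mask path (POT size only). */
static inline int texel_mask(float u, int size)
{
    assert(size > 0 && (size & (size - 1)) == 0);
    return (int)((unsigned)(int)(u * (float)size) & (unsigned)(size - 1));
}
```

In this model, `u = 0.30` maps to texel 2 on both paths, while `u = -0.01` maps to texel 7 through `texel_fract` but to texel 0 through `texel_mask`.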
This addresses the long-standing TODO comment "If the textures are POT,
avoid the division for SW_REPEAT" in `sw_texture_sample_linear`.
Made-with: Cursor
* Adds a missing macro definition check
* Shifts __cplusplus check to a higher level for bool define
* Copy same change to rgestures.h and rlgl.h
* Fixes float comparison issues in raymath functions
* Revert "Fixes float comparison issues in raymath functions"
This reverts commit a266d0bbaa.
* Include resource preloads when building wasm examples with Zig.
List of resources derived from examples/Makefile.Web
* Move resource list to zon file to reduce build.zig bloat