* [rmodels] Fix glTF skinning when joints have non-joint parent nodes
Some glTF exporters (notably wow.export, but also various other DCC pipelines) place skin joints under intermediate non-joint transform nodes that carry part of the bind-pose offset. raylib's existing LoadBoneInfoGLTF and LoadModelAnimationsGLTF only inspected a joint's immediate parent and only sampled joint-local TRS, so any transform stored on an intermediate non-joint ancestor was silently dropped, producing exploded or stretched meshes at runtime.
Two surgical changes:
- LoadBoneInfoGLTF: walk the parent chain past any non-joint ancestors when looking up parentIndex, instead of comparing only against node.parent. Joints whose direct parent is a non-joint were previously treated as skeleton roots.
- LoadModelAnimationsGLTF: precompute a per-joint extOffset matrix that bakes in the static TRS contribution of any intermediate non-joint nodes between the joint and its nearest joint ancestor. Apply it to each frame's joint TRS before BuildPoseFromParentJoints so the per-frame world transforms match the bind-pose world transforms (LoadGLTF already used cgltf_node_transform_world for bindPose, so this aligns the two code paths).
The replaced root-only worldTransform adjustment is a strict subset of the new per-joint extOffset machinery, so it has been removed.
Spec-compliant files (the six skeletal-skinning .glb examples shipped with raylib) render bit-identically before and after; previously broken files (e.g. wow.export's babyoctopus.gltf) now match the reference rendering from f3d, the Khronos sample viewer, and three.js.
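The parent lookup described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `Node` and `FindParentJointIndex` are hypothetical names, and the real loader works on cgltf node structures rather than this reduced struct.

```c
#include <stddef.h>

// Hypothetical stand-in for a glTF node; only the parent link matters here
typedef struct Node {
    struct Node *parent;
} Node;

// Return the index in joints[] of the nearest ancestor that is a joint,
// or -1 when no ancestor is a joint (the node is a skeleton root)
static int FindParentJointIndex(const Node *node, Node *const *joints, int jointCount)
{
    for (const Node *ancestor = node->parent; ancestor != NULL; ancestor = ancestor->parent)
    {
        for (int i = 0; i < jointCount; i++)
        {
            if (joints[i] == ancestor) return i;
        }

        // Not a joint: keep climbing instead of stopping at the direct parent
    }

    return -1;
}
```

With a chain root joint -> helper node -> child joint, the child resolves to the root joint's index instead of being treated as a second skeleton root.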
* Resolve review: NULL-check joint offset allocation, fail fast
[rmodels] Address review feedback on glTF joint offset handling
Resolve the open review points raised on PR #5876 for the per-joint
extOffset precompute in LoadModelAnimationsGLTF.
* NULL check: validate the extOffset RL_MALLOC result before use, which
was previously missing.
* Fail-fast handling: detect the allocation failure at its source and
abort animation loading (log a warning, free resources, return NULL)
instead of propagating a NULL pointer into the per-frame loop.
* Brace formatting: expand the collapsed one-line joint-match check
(if (...) { isJoint = true; break; }) into raylib's standard Allman
brace style.
* Readability: break the deeply nested MatrixMultiply(MatrixMultiply(...))
into named nodeScale/nodeRotation/nodeTranslation/nodeTransform locals,
mirroring the existing S/R/T composition later in the function.
* Spacing/line breaks: add blank lines within the precompute block to
match the surrounding code style.
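A rough sketch of the fail-fast pattern from the list above, under stated assumptions: `Mat4`, `AllocFn`, `AllocJointOffsets` and `FailingAlloc` are hypothetical names introduced for illustration, and the real code uses RL_MALLOC and raylib's logging rather than an injected allocator and `fprintf`.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Mat4 { float m[16]; } Mat4;   // Hypothetical stand-in for raylib's Matrix
typedef void *(*AllocFn)(size_t size);        // Hypothetical hook so a failure can be simulated

// Allocate one offset matrix per joint, initialized to identity.
// Returns NULL right away if the allocation fails, so no caller
// ever enters a per-frame loop with a NULL offsets pointer.
static Mat4 *AllocJointOffsets(int jointCount, AllocFn alloc)
{
    Mat4 *offsets = (Mat4 *)alloc((size_t)jointCount*sizeof(Mat4));

    if (offsets == NULL)
    {
        fprintf(stderr, "WARNING: joint offset allocation failed, aborting animation load\n");
        return NULL;
    }

    for (int j = 0; j < jointCount; j++)
    {
        for (int i = 0; i < 16; i++) offsets[j].m[i] = ((i%5) == 0)? 1.0f : 0.0f;
    }

    return offsets;
}

// Allocator that always fails, used only to exercise the error path
static void *FailingAlloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Passing `malloc` as the allocator gives the normal path; passing `FailingAlloc` shows the early return.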
Nothing in CMake 3.25 is actually needed, so the minimum required version can be lowered. 3.22 is commonly available across many package managers.
For a list of the changes in `FetchContent`, see:
https://cmake.org/cmake/help/latest/module/FetchContent.html
* fix triangle and quad spans applying pixels out of bounds
* remove off by one errors on x/y LoopMax
* apply the RASTER_QUAD offset at the loop start so it increments correctly
* fix missing endif
* remove include guard to allow dyMin usage
* early exit if nothing to draw on a span
* incorporate dxStart into xSubstep to make xOffset calculate a single time
* remove ghost comment
* early exit for quads, with a float cast on the left and top distance calculation
* remove duplicate xLoopEnd
* Improve GetClipboardImage implementation under X11
* Remove code for creating new connection, handle selection in GLFW connection instead.
* `GetClipboardImage()`: Small fix to remove unnecessary boolean
* port glfw's behaviour on size 0 window creation to rgfw
* updates with change suggestions
* don't do the FLAG_FULLSCREEN_MODE check twice for no reason
---------
Co-authored-by: CrackedPixel <5776225+CrackedPixel@users.noreply.github.com>
The KEYBOARD-source veto added in #5439 drops face-button key
events that arrive with both AINPUT_SOURCE_KEYBOARD and
AINPUT_SOURCE_GAMEPAD set on the source bitmask. Confirmed
reproducible on GameSir X2 Type-C and 8BitDo Ultimate Bluetooth,
both reporting source 0x501 on every face-button key event.
This source-bit pattern is general AOSP behaviour since Android
3.2 (commit 6f2fba4 in frameworks/base, Feb 2011): EventHub adds
InputDeviceClass::KEYBOARD to any device whose evdev keyBitmask
claims gamepad buttons (BTN_JOYSTICK..BTN_DIGI), and
KeyboardInputMapper::getEventSource stamps the resulting
KEYBOARD|GAMEPAD source on every outgoing key event.
Use AndroidTranslateGamepadButton(keycode) as the discriminator
instead. Recognised gamepad keycodes route to the gamepad path;
unknown keycodes fall through to the keyboard handler.
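A small sketch of why the source bitmask cannot tell the two cases apart while the keycode can. The numeric values match the NDK's `<android/input.h>` and `<android/keycodes.h>` but are repeated under local names so the sketch builds without the NDK. `SourceVetoRejects` is an assumed form of the veto, and `TranslateGamepadButtonSketch` is a hypothetical stand-in for AndroidTranslateGamepadButton with an assumed "-1 means unknown" convention.

```c
#include <stdbool.h>
#include <stdint.h>

// Local copies of NDK constants (AINPUT_SOURCE_* and AKEYCODE_*)
#define SRC_KEYBOARD        0x00000101
#define SRC_GAMEPAD         0x00000401
#define KEYCODE_A           29
#define KEYCODE_BUTTON_A    96
#define KEYCODE_BUTTON_B    97
#define KEYCODE_BUTTON_X    99
#define KEYCODE_BUTTON_Y    100

// Assumed form of the veto: any event carrying the KEYBOARD bits is
// treated as keyboard input, even when the GAMEPAD bits are also set
static bool SourceVetoRejects(uint32_t source)
{
    return (source & SRC_KEYBOARD) == SRC_KEYBOARD;
}

// Hypothetical translation table: returns a button slot, or -1 if the
// keycode is not a recognised gamepad button
static int TranslateGamepadButtonSketch(int keycode)
{
    switch (keycode)
    {
        case KEYCODE_BUTTON_A: return 0;
        case KEYCODE_BUTTON_B: return 1;
        case KEYCODE_BUTTON_X: return 2;
        case KEYCODE_BUTTON_Y: return 3;
        default: return -1;
    }
}

// Keycode-based discriminator: recognised buttons go to the gamepad path,
// everything else falls through to the keyboard handler
static bool IsGamepadKeyEvent(int keycode)
{
    return TranslateGamepadButtonSketch(keycode) >= 0;
}
```

A face-button event with source 0x501 (KEYBOARD|GAMEPAD) is rejected by the source check but accepted by the keycode check, while an ordinary letter key is still left to the keyboard handler.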
Assisted-by: Claude:claude-opus-4-7
The audio_sound_multi example leaked memory every time the spacebar was pressed.
```
Direct leak of 576 byte(s) in 9 object(s) allocated from:
#0 0x758a41019447 in calloc (/usr/lib/liblsan.so.0+0x19447) (BuildId: 8ee115309adc591d231c961c43d245cfa68d9aa7)
#1 0x562dfbd2c4f3 in LoadAudioBuffer (/home/peter/raylib/examples/audio/audio_sound_multi+0xfa4f3) (BuildId: ea2a6f45d724abeccf904143a32012266f259f93)
```
This patch fixes that leak.
CMake now checks if -latomic is required for 64-bit atomics, and links
it if it's required. Miniaudio is the only thing in raylib that needs
it, so it's put behind SUPPORT_MODULE_RAUDIO.
Parse SUPPORT_ defines from src/config.h by their actual 0/1 values so CUSTOMIZE_BUILD exposes the correct defaults. Apply INCLUDE_EVERYTHING explicitly when registering dependent options.
* fix warnings: goto label not used outside of SW_ENABLE_DEPTH_TEST
* comment out x coordinates that aren't used in SW_RASTER_TRIANGLE
* silence warnings: unused DrmModeConnector functions in rcore_drm.c when using GRAPHICS_API_OPENGL_SOFTWARE
* [rlsw] Add sw_rcp helper using Xtensa recip0.s for hot-path divisions
Adds a `sw_rcp(x)` inline reciprocal that on Xtensa (ESP32 / ESP32-S3
LX6/LX7) emits a `recip0.s` seed plus two Newton-Raphson refinement
steps -- 1-ULP accurate in ~7 instructions, all in FPU registers.
On every other target it expands to plain `1.0f/x`, so generated code
is byte-identical to before for non-Xtensa builds.
Replaces the hot-path `1.0f/x` calls that were previously compiling to
the `__divsf3` software helper on Xtensa:
- perspective divide (1/w) in triangle clip-and-project (PCT and PC paths)
- line and point clip-and-project NDC conversion
- triangle span setup: dxRcp, blockLenRcp, wRcpA, wRcpB
- triangle scanline setup: h02Rcp, h01Rcp, h12Rcp
- axis-aligned quad: wRcp, hRcp
- line rasterizer: stepRcp
Other `1.0f/x` uses (matrix translate/normalize, texture init `tx`/`ty`,
sw_matrix_rotate inverse-length) are not on the per-pixel hot path and
are left untouched.
Measured on ESP32-S3 @ 240 MHz, R5G6B5 240x240, textured 3D model:
contributes to a ~10-15% rasterization speedup.
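For readers unfamiliar with the seed-plus-refinement scheme, here is a portable sketch of the same idea. It is not the Xtensa code from this change: the `recip0.s` seed is replaced by a crude bit-trick estimate (an assumption made only so the sketch runs anywhere), so the result is less accurate than the hardware path. `RcpSketch` is a hypothetical name, and the sketch is only meant for positive, normal floats.

```c
#include <stdint.h>
#include <string.h>

// Approximate 1/x with a rough seed followed by two Newton-Raphson steps.
// Each step r = r*(2 - x*r) roughly squares the relative error of r.
static inline float RcpSketch(float x)
{
    uint32_t bits;
    float r;

    // Crude seed: subtracting the float's bit pattern from a constant
    // negates the exponent and gives an estimate within a few percent
    memcpy(&bits, &x, sizeof(bits));
    bits = 0x7EF311C7u - bits;
    memcpy(&r, &bits, sizeof(r));

    r = r*(2.0f - x*r);     // First refinement
    r = r*(2.0f - x*r);     // Second refinement

    return r;
}
```

The refinement uses only multiplies and subtracts, which is why this shape avoids a software divide on targets without a hardware float divider.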
Made-with: Cursor
* [rlsw] Use ESP-DSP for 4x4 matrix multiply and per-vertex MVP transform
Adds an opt-in ESP-DSP code path for ESP32 / ESP32-S3 builds. ESP-DSP is
ESP-IDF's official optimized math library and ships hand-vectorized
kernels that beat the scalar implementations on Xtensa.
Two integration points:
1. `sw_matrix_mul_rst` -> `dspm_mult_4x4x4_f32` for any 4x4*4x4 multiply
(used for MVP build, gluLookAt, push/multiply, etc.). rlsw stores
matrices column-major and ESP-DSP reads row-major; the comment on the
call site explains why the flat-buffer call still produces the
correct column-major product (transpose-of-transposes equivalence).
2. `sw_immediate_push_vertex` -> `dspm_mult_4x4x1_f32` for the per-vertex
clip-space transform. Because ESP-DSP expects a row-major matrix in
this case, a row-major copy `matMVP_rm[16]` is maintained alongside
`matMVP` and refreshed once per `isDirtyMVP` rebuild in
`sw_immediate_begin`. Cost is 16 scalar copies per matrix update,
amortized over thousands of vertices per frame.
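The transpose equivalence mentioned in point 1 can be checked with a small standalone sketch. `MulRowMajor` below is a plain scalar stand-in for a row-major kernel, not the ESP-DSP routine, and all function names are hypothetical. The operand order used at the actual call site is not shown in this message; this sketch swaps the operands, which is one way to make the identity (A*B)^T = B^T*A^T work out.

```c
// Row-major 4x4 multiply: element (r,c) lives at index r*4 + c
static void MulRowMajor(const float *a, const float *b, float *out)
{
    for (int r = 0; r < 4; r++)
    {
        for (int c = 0; c < 4; c++)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) sum += a[r*4 + k]*b[k*4 + c];
            out[r*4 + c] = sum;
        }
    }
}

// Column-major reference multiply: element (r,c) lives at index c*4 + r
static void MulColMajor(const float *a, const float *b, float *out)
{
    for (int c = 0; c < 4; c++)
    {
        for (int r = 0; r < 4; r++)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) sum += b[c*4 + k]*a[k*4 + r];
            out[c*4 + r] = sum;
        }
    }
}

// Column-major product a*b computed with the row-major kernel.
// A row-major reader sees each column-major buffer as its transpose,
// so it computes B^T*A^T = (A*B)^T, and that result read back as
// column-major is exactly A*B. No explicit transposition is needed.
static void MulColMajorViaRowMajor(const float *a, const float *b, float *out)
{
    MulRowMajor(b, a, out);
}
```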
Detection is **opt-in** via `SW_USE_ESP_DSP` so existing ESP-IDF projects
that don't depend on the `esp-dsp` component keep building unchanged.
A user enables it from CMakeLists.txt (or anywhere before including
rlgl.h):
    target_compile_definitions(${COMPONENT_LIB} PRIVATE SW_USE_ESP_DSP=1)
and adds the dependency to `idf_component.yml`:
    espressif/esp-dsp: "^1.4.0"
Measured on ESP32-S3 @ 240 MHz, R5G6B5 240x240, textured 3D model:
contributes meaningfully to the overall frame-time improvement
(combined with sw_rcp).
Made-with: Cursor
* [rlsw] Add SW_TEXTURE_REPEAT_POT_FAST opt-in for POT bitmask wrap
Adds an opt-in compile-time flag that replaces the SW_REPEAT wrap chain
with a bitmask (`x & (size-1)`) for power-of-two textures. NPOT textures
keep using the original `sw_fract` / signed-modulo paths via a runtime
`(size & (size-1)) == 0` check, so SW_REPEAT remains correct for them.
Affects two samplers:
- `sw_texture_sample_nearest`: drops the `floorf` + multiply + cast for
POT textures in REPEAT mode (saves a software call on Xtensa).
- `sw_texture_sample_linear`: replaces the `(x % w + w) % w` two-step
modulo (a software divide on Xtensa) with a single bitwise AND for
POT textures in REPEAT mode. Two's-complement int wrap covers
negative coordinates correctly.
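The integer wrap used by the linear sampler can be illustrated with a short sketch. Function names are hypothetical and this only covers integer texel coordinates; the float `sw_fract` path of the nearest sampler is not reproduced here. It assumes two's-complement `int`, as the text above does.

```c
#include <stdbool.h>

// True when size is a power of two (same test as the runtime check above)
static bool IsPowerOfTwo(int size)
{
    return (size > 0) && ((size & (size - 1)) == 0);
}

// Two-step modulo wrap: valid for any positive size, costs two divisions
static int WrapRepeatModulo(int x, int size)
{
    return (x%size + size)%size;
}

// Bitmask wrap: only valid when size is a power of two
static int WrapRepeatMask(int x, int size)
{
    return x & (size - 1);
}

// Pick the cheap path for POT sizes and fall back to modulo for NPOT
static int WrapRepeat(int x, int size)
{
    return IsPowerOfTwo(size)? WrapRepeatMask(x, size) : WrapRepeatModulo(x, size);
}
```

For POT sizes both wraps agree on negative coordinates as well, for example -1 maps to size - 1 in either form.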
Off by default, so existing users get bit-for-bit identical output: for
POT textures sampled with negative UVs, bitmask wrap can differ from
`sw_fract` wrap by one texel at the boundary. That is imperceptible at
typical resolutions but technically a behavior change. Opt in if you
control your asset UVs and want the speedup:
    #define SW_TEXTURE_REPEAT_POT_FAST
This addresses the long-standing TODO comment "If the textures are POT,
avoid the division for SW_REPEAT" in `sw_texture_sample_linear`.
Made-with: Cursor