* SVE2 was actually disabled in fdfbbce; this issue is now fixed
- The macro __ARM_FEATURE_SVE is only defined when the compilation target enables SVE, e.g. -march=armv8-a+sve2
* Improves 8888 alpha-blending performance
- On in-order AArch64 processors, e.g. A520, SVE2 is now faster than NEON at the 128-bit vector width
- On out-of-order processors, NEON is still faster than SVE2 (we could improve this in the future), but the performance is improved from 3.0 to 3.6
* The 8888 -> RGB565 performance is also improved (from 7.4 to 9.3)
This re-adds the "opengles" renderer, updated from SDL2 to work on SDL3.
Fixes #15661.
Co-authored-by: Ryan C. Gordon <icculus@icculus.org>
Co-authored-by: Cameron Cawley <ccawley2011@gmail.com>
Previously, default audio on Apple platforms would duck other audio streams. This is unexpected, so we no longer do that by default. Use the hint SDL_HINT_AUDIO_DUCK_OTHERS to re-enable that behavior.
SVE/SVE2 is a new SIMD extension for AArch64. Compared to NEON, SVE/SVE2 brings the following benefits that are good for SDL projects:
- Lane predication: we don't have to treat the tail part of a stride separately when the width is not a multiple of the hardware vector size
- Although the performance is almost the same as NEON when the hardware vector size is 128 bits, when the hardware provides a longer vector size, e.g. 256, 512, ... 2048 bits, we can enjoy a large performance gain without modifying the source code or recompiling the library.
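
The two points above can be sketched in a few lines. This is an illustration only, not the code from this change: `demo_add_saturate_u8` is a hypothetical helper, and the saturating add is a stand-in for a real blend. The SVE path is compiled only when the compiler defines `__ARM_FEATURE_SVE`; otherwise a scalar loop is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: dst[i] = min(255, dst[i] + src[i]) for n bytes. */
static void demo_add_saturate_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__ARM_FEATURE_SVE)
    /* svcntb() is the number of bytes per vector on the running hardware,
       so the same binary adapts to 128-bit, 256-bit, ... implementations. */
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntb()) {
        /* The predicate is true only for lanes i .. n-1, so the final
           iteration masks off the tail; no separate tail loop is needed. */
        svbool_t pg = svwhilelt_b8_u64(i, (uint64_t)n);
        svuint8_t a = svld1_u8(pg, dst + i);
        svuint8_t b = svld1_u8(pg, src + i);
        svst1_u8(pg, dst + i, svqadd_u8(a, b));
    }
#else
    /* Scalar fallback for targets built without SVE. */
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + (unsigned)src[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
#endif
}
```

Calling `demo_add_saturate_u8(dst, src, 37)` handles all 37 bytes in one loop even though 37 is not a multiple of any vector size.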
The functional correctness is validated in a dedicated [qemu project](https://github.com/GorgonMeducer/aarch64_qemu_mac_template/tree/SDL-SVE2-Acceleration-Validation).
The performance is tested on [Radxa Orion 6 N](https://radxa.com/products/orion/o6n/), which provides 4x A720 and 4x A520 processors. Since the vector size is 128 bits, the same as NEON, the performance is almost the same as (or no worse than) the NEON acceleration.
Under the right conditions, this extension can result in smoother resizing when rendering with OpenGL. However, it is known to cause problems in certain cases, such as when handling presentation externally.
Gate it behind a hint, and disable it by default. Developers can selectively enable it when they verify that they meet the criteria for using it, and that it behaves correctly in their apps/games.
On some platforms, SDL_MemoryBarrierRelease() is defined to
SDL_CompilerBarrier(). If SDL_CompilerBarrier() is also defined to
the fallback spinlock acquire/release, then we will infinitely
recurse in SDL_UnlockSpinlock(). Avoid this by not unlocking the
temporary spinlock we create.
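
A minimal single-threaded sketch of the call chain described above, assuming the fallback definitions are layered this way. All `demo_*` names are hypothetical stand-ins, not SDL's actual code, and the lock is a plain int rather than a real atomic.

```c
#include <assert.h>

typedef int demo_spinlock;          /* hypothetical stand-in for a spinlock */
static int demo_unlock_calls = 0;   /* counts entries into demo_unlock() */

/* Single-threaded stand-in for an atomic test-and-set acquire. */
static void demo_lock(demo_spinlock *lock)
{
    assert(*lock == 0);
    *lock = 1;
}

/* Fallback barrier: acquire a temporary lock and deliberately never
   release it. Releasing it would call demo_unlock(), which calls
   demo_memory_barrier_release(), which calls this function again. */
static void demo_compiler_barrier(void)
{
    demo_spinlock tmp = 0;
    demo_lock(&tmp);
}

/* On the platforms in question, the release barrier is just the
   compiler barrier. */
static void demo_memory_barrier_release(void)
{
    demo_compiler_barrier();
}

static void demo_unlock(demo_spinlock *lock)
{
    demo_unlock_calls++;
    demo_memory_barrier_release();
    *lock = 0;
}
```

With this layering, one `demo_lock()` followed by one `demo_unlock()` enters `demo_unlock()` exactly once. The temporary lock is a local that goes out of scope, so leaving it locked has no lasting effect.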