Added SSE support for Quaternion operations (#97) (#98)

* Added SSE support for Quaternion operations (#97)

* Added SSE support for Quaternion operations

O2
| Function  | SSE          | NO SSE       |
===========================================
| Inverse   | 163 (0.89s)  | 165 (1.89s)  |
| NLerp     | 330 (1.70s)  | 330 (1.75s)  |
| Normalize | 169 (1.03s)  | 169 (1.06s)  |
| Dot       | 22 (1.15s)   | 23 (1.14s)   |
| DivF      | 23 (0.72s)   | 23 (0.82s)   |
| MulF      | 22 (0.75s)   | 22 (0.79s)   |
| Mul       | 24 (1.14s)   | 23 (1.24s)   |
| Sub       | 23 (1.17s)   | 37 (1.20s)   |
| Add       | 23 (1.20s)   | 24 (1.19s)   |

O0
| Function  | SSE          | NO SSE       |
===========================================
| Inverse   | 394 (1.62s)  | 430 (3.05s)  |
| NLerp     | 694 (2.71s)  | 1035 (4.81s) |
| Normalize | 374 (1.58s)  | 412 (2.95s)  |
| Dot       | 81 (1.83s)   | 23 (2.50s)   |
| DivF      | 61 (1.12s)   | 25 (2.37s)   |
| MulF      | 58 (1.09s)   | 23 (2.31s)   |
| Mul       | 94 (1.97s)   | 42 (2.88s)   |
| Sub       | 75 (1.83s)   | 23 (2.82s)   |
| Add       | 75 (1.81s)   | 23 (2.81s)   |

* Fixed quaternion multiplication

Old quaternion multiplication had a bug, this is a different approach.

* Added release notes and version for 1.9.0

Add fast vector normalization (#94)
2025-12-29 16:04:33 +00:00 · 2019-03-11 13:12:48 -05:00 · 2018-11-29 22:02:41 -08:00 · 2018-11-29 13:21:05 -06:00 · 2018-11-29 09:32:12 -08:00 · 2018-08-17 11:02:44 -07:00
6 changed files with 352 additions and 142 deletions
--- a/HandmadeMath.h
+++ b/HandmadeMath.h
@@ -1,5 +1,5 @@
 /*
-  HandmadeMath.h v1.5.0
+  HandmadeMath.h v1.9.0
  
  This is a single header file with a bunch of useful functions for game and
  graphics math operations.
@@ -66,108 +66,6 @@
  
  =============================================================================

-  Version History:
-      0.2 (*) Updated documentation
-          (*) Better C compliance
-          (*) Prefix all handmade math functions 
-          (*) Better operator overloading
-      0.2a
-          (*) Prefixed Macros
-      0.2b
-          (*) Disabled warning 4201 on MSVC as it is legal is C11
-          (*) Removed the f at the end of HMM_PI to get 64bit precision
-      0.3
-          (*) Added +=, -=, *=, /= for hmm_vec2, hmm_vec3, hmm_vec4
-      0.4
-          (*) SSE Optimized HMM_SqrtF
-          (*) SSE Optimized HMM_RSqrtF
-          (*) Removed CRT
-      0.5
-          (*) Added scalar multiplication and division for vectors
-              and matrices
-          (*) Added matrix subtraction and += for hmm_mat4
-          (*) Reconciled all headers and implementations
-          (*) Tidied up, and filled in a few missing operators
-      0.5.1
-          (*) Ensured column-major order for matrices throughout
-          (*) Fixed HMM_Translate producing row-major matrices
-      0.5.2
-          (*) Fixed SSE code in HMM_SqrtF
-          (*) Fixed SSE code in HMM_RSqrtF
-      0.6
-          (*) Added Unit testing
-          (*) Made HMM_Power faster
-          (*) Fixed possible efficiency problem with HMM_Normalize 
-          (*) RENAMED HMM_LengthSquareRoot to HMM_LengthSquared
-          (*) RENAMED HMM_RSqrtF to HMM_RSquareRootF
-          (*) RENAMED HMM_SqrtF to HMM_SquareRootF
-          (*) REMOVED Inner function (user should use Dot now)
-          (*) REMOVED HMM_FastInverseSquareRoot function declaration
-      0.7 
-          (*) REMOVED HMM_LengthSquared in HANDMADE_MATH_IMPLEMENTATION (should
-              use HMM_LengthSquaredVec3, or HANDMADE_MATH_CPP_MODE for function
-              overloaded version)
-          (*) REMOVED HMM_Length in HANDMADE_MATH_IMPLEMENTATION (should use
-              HMM_LengthVec3, HANDMADE_MATH_CPP_MODE for function
-              overloaded version)
-          (*) REMOVED HMM_Normalize in HANDMADE_MATH_IMPLEMENTATION (should use
-              HMM_NormalizeVec3, or HANDMADE_MATH_CPP_MODE for function
-              overloaded version)
-          (*) Added HMM_LengthSquaredVec2
-          (*) Added HMM_LengthSquaredVec4
-          (*) Addd HMM_LengthVec2
-          (*) Added HMM_LengthVec4
-          (*) Added HMM_NormalizeVec2
-          (*) Added HMM_NormalizeVec4
-     1.0
-          (*) Lots of testing!
-     1.1
-          (*) Quaternion support
-          (*) Added type hmm_quaternion
-          (*) Added HMM_Quaternion
-          (*) Added HMM_QuaternionV4
-          (*) Added HMM_AddQuaternion
-          (*) Added HMM_SubtractQuaternion
-          (*) Added HMM_MultiplyQuaternion
-          (*) Added HMM_MultiplyQuaternionF
-          (*) Added HMM_DivideQuaternionF
-          (*) Added HMM_InverseQuaternion
-          (*) Added HMM_DotQuaternion
-          (*) Added HMM_NormalizeQuaternion
-          (*) Added HMM_Slerp
-          (*) Added HMM_QuaternionToMat4
-          (*) Added HMM_QuaternionFromAxisAngle
-     1.1.1
-          (*) Resolved compiler warnings on gcc and g++
-     1.1.2
-          (*) Fixed invalid HMMDEF's in the function definitions
-     1.1.3
-          (*) Fixed compile error in C mode
-     1.1.4
-          (*) Fixed SSE being included on platforms that don't support it
-          (*) Fixed divide-by-zero errors when normalizing zero vectors.
-     1.1.5
-          (*) Add Width and Height to HMM_Vec2
-          (*) Made it so you can supply your own SqrtF 
-     1.2.0
-          (*) Added equality functions for HMM_Vec2, HMM_Vec3, and HMM_Vec4.
-              (*) Added HMM_EqualsVec2, HMM_EqualsVec3, and HMM_EqualsVec4
-              (*) Added C++ overloaded HMM_Equals for all three
-              (*) Added C++ == and != operators for all three
-          (*) SSE'd HMM_MultiplyMat4 (this is _WAY_ faster)
-          (*) SSE'd HMM_Transpose
-     1.3.0
-          (*) Remove need to #define HANDMADE_MATH_CPP_MODE
-     1.4.0
-          (*) Fixed bug when using HandmadeMath in C mode
-          (*) SSEd all vec4 operations          
-          (*) Removed all zero-ing
-     1.5.0
-          (*) Changed internal structure for better performance and inlining.
-          (*) As a result, HANDMADE_MATH_NO_INLINE has been removed and no
-              longer has any effect.
-          
-          
  LICENSE
  
  This software is in the public domain. Where that dedication is not
@@ -185,6 +83,7 @@
   Gingerbill (@TheGingerBill)
   Ben Visness (@bvisness) 
   Trinton Bullard (@Peliex_Dev)
+   @AntonDan
   
  Fixes:
   Jeroen van Rijn (@J_vanRijn)
@@ -312,6 +211,13 @@ typedef union hmm_vec2
    };

    float Elements[2];
+
+#ifdef __cplusplus
+    inline float &operator[](const int &Index)
+    {
+        return Elements[Index];
+    }
+#endif
 } hmm_vec2;

 typedef union hmm_vec3
@@ -356,6 +262,13 @@ typedef union hmm_vec3
    };

    float Elements[3];
+
+#ifdef __cplusplus
+    inline float &operator[](const int &Index)
+    {
+        return Elements[Index];
+    }
+#endif
 } hmm_vec3;

 typedef union hmm_vec4
@@ -413,6 +326,13 @@ typedef union hmm_vec4
 #ifdef HANDMADE_MATH__USE_SSE    
    __m128 InternalElementsSSE;
 #endif
+
+#ifdef __cplusplus
+    inline float &operator[](const int &Index)
+    {
+        return Elements[Index];
+    }
+#endif
 } hmm_vec4;

 typedef union hmm_mat4
@@ -420,8 +340,27 @@ typedef union hmm_mat4
    float Elements[4][4];
        
 #ifdef HANDMADE_MATH__USE_SSE
+    __m128 Columns[4];
+
+    // DEPRECATED. Our matrices are column-major, so this was named
+    // incorrectly. Use Columns instead.
    __m128 Rows[4];
 #endif
+
+#ifdef __cplusplus
+    inline hmm_vec4 operator[](const int &Index)
+    {
+        float* col = Elements[Index];
+
+        hmm_vec4 result;
+        result.Elements[0] = col[0];
+        result.Elements[1] = col[1];
+        result.Elements[2] = col[2];
+        result.Elements[3] = col[3];
+
+        return result;
+    }
+#endif
 } hmm_mat4;

 typedef union hmm_quaternion
@@ -441,6 +380,10 @@ typedef union hmm_quaternion
    };
    
    float Elements[4];
+
+#ifdef HANDMADE_MATH__USE_SSE    
+    __m128 InternalElementsSSE;
+#endif
 } hmm_quaternion;

 typedef int32_t hmm_bool;
@@ -1078,6 +1021,21 @@ HMM_INLINE hmm_vec4 HMM_NormalizeVec4(hmm_vec4 A)
    return (Result);
 }

+HMM_INLINE hmm_vec2 HMM_FastNormalizeVec2(hmm_vec2 A)
+{
+    return HMM_MultiplyVec2f(A, HMM_RSquareRootF(HMM_DotVec2(A, A)));
+}
+
+HMM_INLINE hmm_vec3 HMM_FastNormalizeVec3(hmm_vec3 A)
+{
+    return HMM_MultiplyVec3f(A, HMM_RSquareRootF(HMM_DotVec3(A, A)));
+}
+
+HMM_INLINE hmm_vec4 HMM_FastNormalizeVec4(hmm_vec4 A)
+{
+    return HMM_MultiplyVec4f(A, HMM_RSquareRootF(HMM_DotVec4(A, A)));
+}
+

 /*
 * SSE stuff
@@ -1087,10 +1045,10 @@ HMM_INLINE hmm_vec4 HMM_NormalizeVec4(hmm_vec4 A)
 HMM_INLINE __m128 HMM_LinearCombineSSE(__m128 Left, hmm_mat4 Right)
 {
    __m128 Result;
-    Result = _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0x00), Right.Rows[0]);
-    Result = _mm_add_ps(Result, _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0x55), Right.Rows[1]));
-    Result = _mm_add_ps(Result, _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0xaa), Right.Rows[2]));
-    Result = _mm_add_ps(Result, _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0xff), Right.Rows[3]));
+    Result = _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0x00), Right.Columns[0]);
+    Result = _mm_add_ps(Result, _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0x55), Right.Columns[1]));
+    Result = _mm_add_ps(Result, _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0xaa), Right.Columns[2]));
+    Result = _mm_add_ps(Result, _mm_mul_ps(_mm_shuffle_ps(Left, Left, 0xff), Right.Columns[3]));
    
    return (Result);
 }
@@ -1125,7 +1083,7 @@ HMM_INLINE hmm_mat4 HMM_Transpose(hmm_mat4 Matrix)
 {
    hmm_mat4 Result = Matrix;
    
-    _MM_TRANSPOSE4_PS(Result.Rows[0], Result.Rows[1], Result.Rows[2], Result.Rows[3]);
+    _MM_TRANSPOSE4_PS(Result.Columns[0], Result.Columns[1], Result.Columns[2], Result.Columns[3]);

    return (Result);
 }
@@ -1138,10 +1096,10 @@ HMM_INLINE hmm_mat4 HMM_AddMat4(hmm_mat4 Left, hmm_mat4 Right)
 {
    hmm_mat4 Result;

-    Result.Rows[0] = _mm_add_ps(Left.Rows[0], Right.Rows[0]);
-    Result.Rows[1] = _mm_add_ps(Left.Rows[1], Right.Rows[1]);
-    Result.Rows[2] = _mm_add_ps(Left.Rows[2], Right.Rows[2]);
-    Result.Rows[3] = _mm_add_ps(Left.Rows[3], Right.Rows[3]);    
+    Result.Columns[0] = _mm_add_ps(Left.Columns[0], Right.Columns[0]);
+    Result.Columns[1] = _mm_add_ps(Left.Columns[1], Right.Columns[1]);
+    Result.Columns[2] = _mm_add_ps(Left.Columns[2], Right.Columns[2]);
+    Result.Columns[3] = _mm_add_ps(Left.Columns[3], Right.Columns[3]);    

    return (Result);
 }
@@ -1154,10 +1112,10 @@ HMM_INLINE hmm_mat4 HMM_SubtractMat4(hmm_mat4 Left, hmm_mat4 Right)
 {
    hmm_mat4 Result;

-    Result.Rows[0] = _mm_sub_ps(Left.Rows[0], Right.Rows[0]);
-    Result.Rows[1] = _mm_sub_ps(Left.Rows[1], Right.Rows[1]);
-    Result.Rows[2] = _mm_sub_ps(Left.Rows[2], Right.Rows[2]);
-    Result.Rows[3] = _mm_sub_ps(Left.Rows[3], Right.Rows[3]);
+    Result.Columns[0] = _mm_sub_ps(Left.Columns[0], Right.Columns[0]);
+    Result.Columns[1] = _mm_sub_ps(Left.Columns[1], Right.Columns[1]);
+    Result.Columns[2] = _mm_sub_ps(Left.Columns[2], Right.Columns[2]);
+    Result.Columns[3] = _mm_sub_ps(Left.Columns[3], Right.Columns[3]);

    return (Result);
 }
@@ -1173,10 +1131,10 @@ HMM_INLINE hmm_mat4 HMM_MultiplyMat4f(hmm_mat4 Matrix, float Scalar)
    hmm_mat4 Result;

    __m128 SSEScalar = _mm_set1_ps(Scalar);
-    Result.Rows[0] = _mm_mul_ps(Matrix.Rows[0], SSEScalar);
-    Result.Rows[1] = _mm_mul_ps(Matrix.Rows[1], SSEScalar);
-    Result.Rows[2] = _mm_mul_ps(Matrix.Rows[2], SSEScalar);
-    Result.Rows[3] = _mm_mul_ps(Matrix.Rows[3], SSEScalar);
+    Result.Columns[0] = _mm_mul_ps(Matrix.Columns[0], SSEScalar);
+    Result.Columns[1] = _mm_mul_ps(Matrix.Columns[1], SSEScalar);
+    Result.Columns[2] = _mm_mul_ps(Matrix.Columns[2], SSEScalar);
+    Result.Columns[3] = _mm_mul_ps(Matrix.Columns[3], SSEScalar);

    return (Result);
 }
@@ -1192,10 +1150,10 @@ HMM_INLINE hmm_mat4 HMM_DivideMat4f(hmm_mat4 Matrix, float Scalar)
    hmm_mat4 Result;
    
    __m128 SSEScalar = _mm_set1_ps(Scalar);
-    Result.Rows[0] = _mm_div_ps(Matrix.Rows[0], SSEScalar);
-    Result.Rows[1] = _mm_div_ps(Matrix.Rows[1], SSEScalar);
-    Result.Rows[2] = _mm_div_ps(Matrix.Rows[2], SSEScalar);
-    Result.Rows[3] = _mm_div_ps(Matrix.Rows[3], SSEScalar);    
+    Result.Columns[0] = _mm_div_ps(Matrix.Columns[0], SSEScalar);
+    Result.Columns[1] = _mm_div_ps(Matrix.Columns[1], SSEScalar);
+    Result.Columns[2] = _mm_div_ps(Matrix.Columns[2], SSEScalar);
+    Result.Columns[3] = _mm_div_ps(Matrix.Columns[3], SSEScalar);    

    return (Result);
 }
@@ -1275,10 +1233,14 @@ HMM_INLINE hmm_quaternion HMM_Quaternion(float X, float Y, float Z, float W)
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+    Result.InternalElementsSSE = _mm_setr_ps(X, Y, Z, W);
+#else
    Result.X = X;
    Result.Y = Y;
    Result.Z = Z;
    Result.W = W;
+#endif

    return (Result);
 }
@@ -1287,10 +1249,14 @@ HMM_INLINE hmm_quaternion HMM_QuaternionV4(hmm_vec4 Vector)
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+    Result.InternalElementsSSE = Vector.InternalElementsSSE;
+#else
    Result.X = Vector.X;
    Result.Y = Vector.Y;
    Result.Z = Vector.Z;
    Result.W = Vector.W;
+#endif

    return (Result);
 }
@@ -1299,10 +1265,15 @@ HMM_INLINE hmm_quaternion HMM_AddQuaternion(hmm_quaternion Left, hmm_quaternion
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+    Result.InternalElementsSSE = _mm_add_ps(Left.InternalElementsSSE, Right.InternalElementsSSE);
+#else
+
    Result.X = Left.X + Right.X;
    Result.Y = Left.Y + Right.Y;
    Result.Z = Left.Z + Right.Z;
    Result.W = Left.W + Right.W;
+#endif

    return (Result);
 }
@@ -1311,10 +1282,15 @@ HMM_INLINE hmm_quaternion HMM_SubtractQuaternion(hmm_quaternion Left, hmm_quater
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+    Result.InternalElementsSSE = _mm_sub_ps(Left.InternalElementsSSE, Right.InternalElementsSSE);
+#else
+
    Result.X = Left.X - Right.X;
    Result.Y = Left.Y - Right.Y;
    Result.Z = Left.Z - Right.Z;
    Result.W = Left.W - Right.W;
+#endif

    return (Result);
 }
@@ -1323,10 +1299,28 @@ HMM_INLINE hmm_quaternion HMM_MultiplyQuaternion(hmm_quaternion Left, hmm_quater
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+        __m128 SSEResultOne = _mm_xor_ps(_mm_shuffle_ps(Left.InternalElementsSSE, Left.InternalElementsSSE, _MM_SHUFFLE(0, 0, 0, 0)), _mm_setr_ps(0.f, -0.f, 0.f, -0.f));
+        __m128 SSEResultTwo = _mm_shuffle_ps(Right.InternalElementsSSE, Right.InternalElementsSSE, _MM_SHUFFLE(0, 1, 2, 3));
+        __m128 SSEResultThree = _mm_mul_ps(SSEResultTwo, SSEResultOne);
+
+        SSEResultOne = _mm_xor_ps(_mm_shuffle_ps(Left.InternalElementsSSE, Left.InternalElementsSSE, _MM_SHUFFLE(1, 1, 1, 1)) , _mm_setr_ps(0.f, 0.f, -0.f, -0.f));
+        SSEResultTwo = _mm_shuffle_ps(Right.InternalElementsSSE, Right.InternalElementsSSE, _MM_SHUFFLE(1, 0, 3, 2));
+        SSEResultThree = _mm_add_ps(SSEResultThree, _mm_mul_ps(SSEResultTwo, SSEResultOne));
+
+        SSEResultOne = _mm_xor_ps(_mm_shuffle_ps(Left.InternalElementsSSE, Left.InternalElementsSSE, _MM_SHUFFLE(2, 2, 2, 2)), _mm_setr_ps(-0.f, 0.f, 0.f, -0.f));
+        SSEResultTwo = _mm_shuffle_ps(Right.InternalElementsSSE, Right.InternalElementsSSE, _MM_SHUFFLE(2, 3, 0, 1));
+        SSEResultThree = _mm_add_ps(SSEResultThree, _mm_mul_ps(SSEResultTwo, SSEResultOne));
+
+        SSEResultOne = _mm_shuffle_ps(Left.InternalElementsSSE, Left.InternalElementsSSE, _MM_SHUFFLE(3, 3, 3, 3));
+        SSEResultTwo = _mm_shuffle_ps(Right.InternalElementsSSE, Right.InternalElementsSSE, _MM_SHUFFLE(3, 2, 1, 0));
+        Result.InternalElementsSSE = _mm_add_ps(SSEResultThree, _mm_mul_ps(SSEResultTwo, SSEResultOne));
+#else
    Result.X = (Left.X * Right.W) + (Left.Y * Right.Z) - (Left.Z * Right.Y) + (Left.W * Right.X);
    Result.Y = (-Left.X * Right.Z) + (Left.Y * Right.W) + (Left.Z * Right.X) + (Left.W * Right.Y);
    Result.Z = (Left.X * Right.Y) - (Left.Y * Right.X) + (Left.Z * Right.W) + (Left.W * Right.Z);
    Result.W = (-Left.X * Right.X) - (Left.Y * Right.Y) - (Left.Z * Right.Z) + (Left.W * Right.W);
+#endif

    return (Result);
 }
@@ -1335,10 +1329,15 @@ HMM_INLINE hmm_quaternion HMM_MultiplyQuaternionF(hmm_quaternion Left, float Mul
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+    __m128 Scalar = _mm_set1_ps(Multiplicative);
+    Result.InternalElementsSSE = _mm_mul_ps(Left.InternalElementsSSE, Scalar);
+#else
    Result.X = Left.X * Multiplicative;
    Result.Y = Left.Y * Multiplicative;
    Result.Z = Left.Z * Multiplicative;
    Result.W = Left.W * Multiplicative;
+#endif

    return (Result);
 }
@@ -1347,10 +1346,15 @@ HMM_INLINE hmm_quaternion HMM_DivideQuaternionF(hmm_quaternion Left, float Divid
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+    __m128 Scalar = _mm_set1_ps(Dividend);
+    Result.InternalElementsSSE = _mm_div_ps(Left.InternalElementsSSE, Scalar);
+#else
    Result.X = Left.X / Dividend;
    Result.Y = Left.Y / Dividend;
    Result.Z = Left.Z / Dividend;
    Result.W = Left.W / Dividend;
+#endif

    return (Result);
 }
@@ -1359,7 +1363,18 @@ HMM_EXTERN hmm_quaternion HMM_InverseQuaternion(hmm_quaternion Left);

 HMM_INLINE float HMM_DotQuaternion(hmm_quaternion Left, hmm_quaternion Right)
 {
-    float Result = (Left.X * Right.X) + (Left.Y * Right.Y) + (Left.Z * Right.Z) + (Left.W * Right.W);
+    float Result;
+
+#ifdef HANDMADE_MATH__USE_SSE
+    __m128 SSEResultOne = _mm_mul_ps(Left.InternalElementsSSE, Right.InternalElementsSSE);
+    __m128 SSEResultTwo = _mm_shuffle_ps(SSEResultOne, SSEResultOne, _MM_SHUFFLE(2, 3, 0, 1));
+    SSEResultOne = _mm_add_ps(SSEResultOne, SSEResultTwo);
+    SSEResultTwo = _mm_shuffle_ps(SSEResultOne, SSEResultOne, _MM_SHUFFLE(0, 1, 2, 3));
+    SSEResultOne = _mm_add_ps(SSEResultOne, SSEResultTwo);
+    _mm_store_ss(&Result, SSEResultOne);
+#else
+    Result = (Left.X * Right.X) + (Left.Y * Right.Y) + (Left.Z * Right.Z) + (Left.W * Right.W);
+#endif

    return (Result);
 }
@@ -1378,11 +1393,18 @@ HMM_INLINE hmm_quaternion HMM_NLerp(hmm_quaternion Left, float Time, hmm_quatern
 {
    hmm_quaternion Result;

+#ifdef HANDMADE_MATH__USE_SSE
+    __m128 ScalarLeft = _mm_set1_ps(1.0f - Time);
+    __m128 ScalarRight = _mm_set1_ps(Time);
+    __m128 SSEResultOne = _mm_mul_ps(Left.InternalElementsSSE, ScalarLeft);
+    __m128 SSEResultTwo = _mm_mul_ps(Right.InternalElementsSSE, ScalarRight);
+    Result.InternalElementsSSE = _mm_add_ps(SSEResultOne, SSEResultTwo);
+#else
    Result.X = HMM_Lerp(Left.X, Time, Right.X);
    Result.Y = HMM_Lerp(Left.Y, Time, Right.Y);
    Result.Z = HMM_Lerp(Left.Z, Time, Right.Z);
    Result.W = HMM_Lerp(Left.W, Time, Right.W);
-
+#endif
    Result = HMM_NormalizeQuaternion(Result);

    return (Result);
@@ -1461,6 +1483,27 @@ HMM_INLINE hmm_vec4 HMM_Normalize(hmm_vec4 A)
    return (Result);
 }

+HMM_INLINE hmm_vec2 HMM_FastNormalize(hmm_vec2 A)
+{
+    hmm_vec2 Result = HMM_FastNormalizeVec2(A);
+
+    return (Result);
+}
+
+HMM_INLINE hmm_vec3 HMM_FastNormalize(hmm_vec3 A)
+{
+    hmm_vec3 Result = HMM_FastNormalizeVec3(A);
+
+    return (Result);
+}
+
+HMM_INLINE hmm_vec4 HMM_FastNormalize(hmm_vec4 A)
+{
+    hmm_vec4 Result = HMM_FastNormalizeVec4(A);
+
+    return (Result);
+}
+
 HMM_INLINE hmm_quaternion HMM_Normalize(hmm_quaternion A)
 {
    hmm_quaternion Result = HMM_NormalizeQuaternion(A);
@@ -2210,10 +2253,10 @@ hmm_mat4 HMM_MultiplyMat4(hmm_mat4 Left, hmm_mat4 Right)

 #ifdef HANDMADE_MATH__USE_SSE

-    Result.Rows[0] = HMM_LinearCombineSSE(Right.Rows[0], Left);
-    Result.Rows[1] = HMM_LinearCombineSSE(Right.Rows[1], Left);
-    Result.Rows[2] = HMM_LinearCombineSSE(Right.Rows[2], Left);
-    Result.Rows[3] = HMM_LinearCombineSSE(Right.Rows[3], Left);     
+    Result.Columns[0] = HMM_LinearCombineSSE(Right.Columns[0], Left);
+    Result.Columns[1] = HMM_LinearCombineSSE(Right.Columns[1], Left);
+    Result.Columns[2] = HMM_LinearCombineSSE(Right.Columns[2], Left);
+    Result.Columns[3] = HMM_LinearCombineSSE(Right.Columns[3], Left);     
    
 #else
    int Columns;
@@ -2334,14 +2377,17 @@ hmm_mat4 HMM_LookAt(hmm_vec3 Eye, hmm_vec3 Center, hmm_vec3 Up)
    Result.Elements[0][0] = S.X;
    Result.Elements[0][1] = U.X;
    Result.Elements[0][2] = -F.X;
+    Result.Elements[0][3] = 0.0f;

    Result.Elements[1][0] = S.Y;
    Result.Elements[1][1] = U.Y;
    Result.Elements[1][2] = -F.Y;
+    Result.Elements[1][3] = 0.0f;

    Result.Elements[2][0] = S.Z;
    Result.Elements[2][1] = U.Z;
    Result.Elements[2][2] = -F.Z;
+    Result.Elements[2][3] = 0.0f;

    Result.Elements[3][0] = -HMM_DotVec3(S, Eye);
    Result.Elements[3][1] = -HMM_DotVec3(U, Eye);
@@ -2366,10 +2412,7 @@ hmm_quaternion HMM_InverseQuaternion(hmm_quaternion Left)
    Norm = HMM_SquareRootF(HMM_DotQuaternion(Left, Left));
    NormSquared = Norm * Norm;

-    Result.X = Conjugate.X / NormSquared;
-    Result.Y = Conjugate.Y / NormSquared;
-    Result.Z = Conjugate.Z / NormSquared;
-    Result.W = Conjugate.W / NormSquared;
+    Result = HMM_DivideQuaternionF(Conjugate, NormSquared);

    return (Result);
 }
--- a/README.md
+++ b/README.md
@@ -10,6 +10,12 @@ To get started, go download [the latest release](https://github.com/HandmadeMath

 Version         | Changes        |
 ----------------|----------------|
+**1.9.0** | Added SSE versions of quaternion operations. |
+**1.8.0** | Added fast vector normalization routines that use fast inverse square roots.
+**1.7.1** | Changed operator[] to take a const ref int instead of an int.
+**1.7.0** | Renamed the 'Rows' member of hmm_mat4 to 'Columns'. Since our matrices are column-major, this should have been named 'Columns' from the start. 'Rows' is still present, but has been deprecated.
+**1.6.0** | Added array subscript operators for vector and matrix types in C++. This is provided as a convenience, but be aware that it may incur an extra function call in unoptimized builds.
+**1.5.1** | Fixed a bug with uninitialized elements in HMM_LookAt.
 **1.5.0** | Changed internal structure for better performance and inlining. As a result, `HANDMADE_MATH_NO_INLINE` has been removed and no longer has any effect.
 **1.4.0** | Fixed bug when using C mode. SSE'd all vec4 operations. Removed zeroing for better performance.
 **1.3.0** | Removed need to `#define HANDMADE_MATH_CPP_MODE`. C++ definitions are now included automatically in C++ environments.
--- a/test/categories/Initialization.h
+++ b/test/categories/Initialization.h
@@ -18,6 +18,10 @@ TEST(Initialization, Vectors)
    EXPECT_FLOAT_EQ(v2.Height, 2.0f);
    EXPECT_FLOAT_EQ(v2.Elements[0], 1.0f);
    EXPECT_FLOAT_EQ(v2.Elements[1], 2.0f);
+#ifdef __cplusplus
+    EXPECT_FLOAT_EQ(v2[0], 1.0f);
+    EXPECT_FLOAT_EQ(v2[1], 2.0f);
+#endif

    EXPECT_FLOAT_EQ(v2i.X, 1.0f);
    EXPECT_FLOAT_EQ(v2i.Y, 2.0f);
@@ -29,6 +33,10 @@ TEST(Initialization, Vectors)
    EXPECT_FLOAT_EQ(v2i.Height, 2.0f);
    EXPECT_FLOAT_EQ(v2i.Elements[0], 1.0f);
    EXPECT_FLOAT_EQ(v2i.Elements[1], 2.0f);
+#ifdef __cplusplus
+    EXPECT_FLOAT_EQ(v2i[0], 1.0f);
+    EXPECT_FLOAT_EQ(v2i[1], 2.0f);
+#endif

    //
    // Test vec3
@@ -56,6 +64,11 @@ TEST(Initialization, Vectors)
    EXPECT_FLOAT_EQ(v3.UV.Elements[1], 2.0f);
    EXPECT_FLOAT_EQ(v3.VW.Elements[0], 2.0f);
    EXPECT_FLOAT_EQ(v3.VW.Elements[1], 3.0f);
+#ifdef __cplusplus
+    EXPECT_FLOAT_EQ(v3[0], 1.0f);
+    EXPECT_FLOAT_EQ(v3[1], 2.0f);
+    EXPECT_FLOAT_EQ(v3[2], 3.0f);
+#endif

    EXPECT_FLOAT_EQ(v3i.X, 1.0f);
    EXPECT_FLOAT_EQ(v3i.Y, 2.0f);
@@ -77,6 +90,11 @@ TEST(Initialization, Vectors)
    EXPECT_FLOAT_EQ(v3i.UV.Elements[1], 2.0f);
    EXPECT_FLOAT_EQ(v3i.VW.Elements[0], 2.0f);
    EXPECT_FLOAT_EQ(v3i.VW.Elements[1], 3.0f);
+#ifdef __cplusplus
+    EXPECT_FLOAT_EQ(v3i[0], 1.0f);
+    EXPECT_FLOAT_EQ(v3i[1], 2.0f);
+    EXPECT_FLOAT_EQ(v3i[2], 3.0f);
+#endif

    //
    // Test vec4
@@ -107,6 +125,12 @@ TEST(Initialization, Vectors)
    EXPECT_FLOAT_EQ(v4.RGB.Elements[0], 1.0f);
    EXPECT_FLOAT_EQ(v4.RGB.Elements[1], 2.0f);
    EXPECT_FLOAT_EQ(v4.RGB.Elements[2], 3.0f);
+#ifdef __cplusplus
+    EXPECT_FLOAT_EQ(v4[0], 1.0f);
+    EXPECT_FLOAT_EQ(v4[1], 2.0f);
+    EXPECT_FLOAT_EQ(v4[2], 3.0f);
+    EXPECT_FLOAT_EQ(v4[3], 4.0f);
+#endif

    EXPECT_FLOAT_EQ(v4i.X, 1.0f);
    EXPECT_FLOAT_EQ(v4i.Y, 2.0f);
@@ -130,6 +154,12 @@ TEST(Initialization, Vectors)
    EXPECT_FLOAT_EQ(v4i.RGB.Elements[0], 1.0f);
    EXPECT_FLOAT_EQ(v4i.RGB.Elements[1], 2.0f);
    EXPECT_FLOAT_EQ(v4i.RGB.Elements[2], 3.0f);
+#ifdef __cplusplus
+    EXPECT_FLOAT_EQ(v4i[0], 1.0f);
+    EXPECT_FLOAT_EQ(v4i[1], 2.0f);
+    EXPECT_FLOAT_EQ(v4i[2], 3.0f);
+    EXPECT_FLOAT_EQ(v4i[3], 4.0f);
+#endif

    EXPECT_FLOAT_EQ(v4v.X, 1.0f);
    EXPECT_FLOAT_EQ(v4v.Y, 2.0f);
@@ -153,6 +183,12 @@ TEST(Initialization, Vectors)
    EXPECT_FLOAT_EQ(v4v.RGB.Elements[0], 1.0f);
    EXPECT_FLOAT_EQ(v4v.RGB.Elements[1], 2.0f);
    EXPECT_FLOAT_EQ(v4v.RGB.Elements[2], 3.0f);
+#ifdef __cplusplus
+    EXPECT_FLOAT_EQ(v4v[0], 1.0f);
+    EXPECT_FLOAT_EQ(v4v[1], 2.0f);
+    EXPECT_FLOAT_EQ(v4v[2], 3.0f);
+    EXPECT_FLOAT_EQ(v4v[3], 4.0f);
+#endif
 }

 TEST(Initialization, MatrixEmpty)
@@ -163,6 +199,9 @@ TEST(Initialization, MatrixEmpty)
        for (int Row = 0; Row < 4; ++Row)
        {
            EXPECT_FLOAT_EQ(m4.Elements[Column][Row], 0.0f);
+#ifdef __cplusplus
+            EXPECT_FLOAT_EQ(m4[Column][Row], 0.0f);
+#endif
        }
    }
 }
--- a/test/categories/SSE.h
+++ b/test/categories/SSE.h
@@ -8,10 +8,10 @@ TEST(SSE, LinearCombine)
    hmm_mat4 MatrixTwo = HMM_Mat4d(4.0f);
    hmm_mat4 Result;
    
-    Result.Rows[0] = HMM_LinearCombineSSE(MatrixOne.Rows[0], MatrixTwo);
-    Result.Rows[1] = HMM_LinearCombineSSE(MatrixOne.Rows[1], MatrixTwo);
-    Result.Rows[2] = HMM_LinearCombineSSE(MatrixOne.Rows[2], MatrixTwo);
-    Result.Rows[3] = HMM_LinearCombineSSE(MatrixOne.Rows[3], MatrixTwo);
+    Result.Columns[0] = HMM_LinearCombineSSE(MatrixOne.Columns[0], MatrixTwo);
+    Result.Columns[1] = HMM_LinearCombineSSE(MatrixOne.Columns[1], MatrixTwo);
+    Result.Columns[2] = HMM_LinearCombineSSE(MatrixOne.Columns[2], MatrixTwo);
+    Result.Columns[3] = HMM_LinearCombineSSE(MatrixOne.Columns[3], MatrixTwo);
    
    {
        EXPECT_FLOAT_EQ(Result.Elements[0][0], 8.0f);
@@ -24,13 +24,11 @@ TEST(SSE, LinearCombine)
        EXPECT_FLOAT_EQ(Result.Elements[1][2], 0.0f);
        EXPECT_FLOAT_EQ(Result.Elements[1][3], 0.0f);
                        
-                        
        EXPECT_FLOAT_EQ(Result.Elements[2][0], 0.0f);
        EXPECT_FLOAT_EQ(Result.Elements[2][1], 0.0f);                
        EXPECT_FLOAT_EQ(Result.Elements[2][2], 8.0f);
        EXPECT_FLOAT_EQ(Result.Elements[2][3], 0.0f);

-        
        EXPECT_FLOAT_EQ(Result.Elements[3][0], 0.0f);
        EXPECT_FLOAT_EQ(Result.Elements[3][1], 0.0f);                
        EXPECT_FLOAT_EQ(Result.Elements[3][2], 0.0f);
--- a/test/categories/Transformation.h
+++ b/test/categories/Transformation.h
@@ -51,3 +51,27 @@ TEST(Transformations, Scale)
    EXPECT_FLOAT_EQ(scaled.Z, 1.5f);
    EXPECT_FLOAT_EQ(scaled.W, 1.0f);
 }
+
+TEST(Transformations, LookAt)
+{
+    const float abs_error = 0.0001f;
+
+    hmm_mat4 result = HMM_LookAt(HMM_Vec3(1.0f, 0.0f, 0.0f), HMM_Vec3(0.0f, 2.0f, 1.0f), HMM_Vec3(2.0f, 1.0f, 1.0f));
+
+    EXPECT_NEAR(result.Elements[0][0], 0.169031f, abs_error);
+    EXPECT_NEAR(result.Elements[0][1], 0.897085f, abs_error);
+    EXPECT_NEAR(result.Elements[0][2], 0.408248f, abs_error);
+    EXPECT_FLOAT_EQ(result.Elements[0][3], 0.0f);
+    EXPECT_NEAR(result.Elements[1][0], 0.507093f, abs_error);
+    EXPECT_NEAR(result.Elements[1][1], 0.276026f, abs_error);
+    EXPECT_NEAR(result.Elements[1][2], -0.816497f, abs_error);
+    EXPECT_FLOAT_EQ(result.Elements[1][3], 0.0f);
+    EXPECT_NEAR(result.Elements[2][0], -0.845154f, abs_error);
+    EXPECT_NEAR(result.Elements[2][1], 0.345033f, abs_error);
+    EXPECT_NEAR(result.Elements[2][2], -0.408248f, abs_error);
+    EXPECT_FLOAT_EQ(result.Elements[2][3], 0.0f);
+    EXPECT_NEAR(result.Elements[3][0], -0.169031f, abs_error);
+    EXPECT_NEAR(result.Elements[3][1], -0.897085f, abs_error);
+    EXPECT_NEAR(result.Elements[3][2], -0.408248f, abs_error);
+    EXPECT_FLOAT_EQ(result.Elements[3][3], 1.0f);
+}
--- a/test/categories/VectorOps.h
+++ b/test/categories/VectorOps.h
@@ -134,6 +134,106 @@ TEST(VectorOps, NormalizeZero)
 #endif
 }

+TEST(VectorOps, FastNormalize)
+{
+    hmm_vec2 v2 = HMM_Vec2(1.0f, -2.0f);
+    hmm_vec3 v3 = HMM_Vec3(1.0f, -2.0f, 3.0f);
+    hmm_vec4 v4 = HMM_Vec4(1.0f, -2.0f, 3.0f, -1.0f);
+
+    {
+        hmm_vec2 result = HMM_FastNormalizeVec2(v2);
+        EXPECT_NEAR(HMM_LengthVec2(result), 1.0f, 0.001f);
+        EXPECT_GT(result.X, 0.0f);
+        EXPECT_LT(result.Y, 0.0f);
+    }
+    {
+        hmm_vec3 result = HMM_FastNormalizeVec3(v3);
+        EXPECT_NEAR(HMM_LengthVec3(result), 1.0f, 0.001f);
+        EXPECT_GT(result.X, 0.0f);
+        EXPECT_LT(result.Y, 0.0f);
+        EXPECT_GT(result.Z, 0.0f);
+    }
+    {
+        hmm_vec4 result = HMM_FastNormalizeVec4(v4);
+        EXPECT_NEAR(HMM_LengthVec4(result), 1.0f, 0.001f);
+        EXPECT_GT(result.X, 0.0f);
+        EXPECT_LT(result.Y, 0.0f);
+        EXPECT_GT(result.Z, 0.0f);
+        EXPECT_LT(result.W, 0.0f);
+    }
+
+#ifdef __cplusplus
+    {
+        hmm_vec2 result = HMM_FastNormalize(v2);
+        EXPECT_NEAR(HMM_LengthVec2(result), 1.0f, 0.001f);
+        EXPECT_GT(result.X, 0.0f);
+        EXPECT_LT(result.Y, 0.0f);
+    }
+    {
+        hmm_vec3 result = HMM_FastNormalize(v3);
+        EXPECT_NEAR(HMM_LengthVec3(result), 1.0f, 0.001f);
+        EXPECT_GT(result.X, 0.0f);
+        EXPECT_LT(result.Y, 0.0f);
+        EXPECT_GT(result.Z, 0.0f);
+    }
+    {
+        hmm_vec4 result = HMM_FastNormalize(v4);
+        EXPECT_NEAR(HMM_LengthVec4(result), 1.0f, 0.001f);
+        EXPECT_GT(result.X, 0.0f);
+        EXPECT_LT(result.Y, 0.0f);
+        EXPECT_GT(result.Z, 0.0f);
+        EXPECT_LT(result.W, 0.0f);
+    }
+#endif
+}
+
+TEST(VectorOps, FastNormalizeZero)
+{
+    hmm_vec2 v2 = HMM_Vec2(0.0f, 0.0f);
+    hmm_vec3 v3 = HMM_Vec3(0.0f, 0.0f, 0.0f);
+    hmm_vec4 v4 = HMM_Vec4(0.0f, 0.0f, 0.0f, 0.0f);
+
+    {
+        hmm_vec2 result = HMM_FastNormalizeVec2(v2);
+        EXPECT_FLOAT_EQ(result.X, 0.0f);
+        EXPECT_FLOAT_EQ(result.Y, 0.0f);
+    }
+    {
+        hmm_vec3 result = HMM_FastNormalizeVec3(v3);
+        EXPECT_FLOAT_EQ(result.X, 0.0f);
+        EXPECT_FLOAT_EQ(result.Y, 0.0f);
+        EXPECT_FLOAT_EQ(result.Z, 0.0f);
+    }
+    {
+        hmm_vec4 result = HMM_FastNormalizeVec4(v4);
+        EXPECT_FLOAT_EQ(result.X, 0.0f);
+        EXPECT_FLOAT_EQ(result.Y, 0.0f);
+        EXPECT_FLOAT_EQ(result.Z, 0.0f);
+        EXPECT_FLOAT_EQ(result.W, 0.0f);
+    }
+
+#ifdef __cplusplus
+    {
+        hmm_vec2 result = HMM_FastNormalize(v2);
+        EXPECT_FLOAT_EQ(result.X, 0.0f);
+        EXPECT_FLOAT_EQ(result.Y, 0.0f);
+    }
+    {
+        hmm_vec3 result = HMM_FastNormalize(v3);
+        EXPECT_FLOAT_EQ(result.X, 0.0f);
+        EXPECT_FLOAT_EQ(result.Y, 0.0f);
+        EXPECT_FLOAT_EQ(result.Z, 0.0f);
+    }
+    {
+        hmm_vec4 result = HMM_FastNormalize(v4);
+        EXPECT_FLOAT_EQ(result.X, 0.0f);
+        EXPECT_FLOAT_EQ(result.Y, 0.0f);
+        EXPECT_FLOAT_EQ(result.Z, 0.0f);
+        EXPECT_FLOAT_EQ(result.W, 0.0f);
+    }
+#endif
+}
+
 TEST(VectorOps, Cross)
 {
    hmm_vec3 v1 = HMM_Vec3(1.0f, 2.0f, 3.0f);
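
Reviewer note on the `FastNormalize` tests in this file: they compare the resulting length against 1.0f with a tolerance instead of exact equality, which fits a routine built on an approximate reciprocal square root. The following is a minimal standalone sketch of that idea, not the library's code. The names `vec3`, `rsqrt_approx` and `fast_normalize_vec3` are hypothetical stand-ins, the use of `_mm_rsqrt_ss` is an assumption about how an SSE reciprocal square root might be obtained, and the explicit zero-length guard is a choice made for this sketch only.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical stand-in for hmm_vec3; not the library's type. */
typedef struct { float X, Y, Z; } vec3;

/* Approximate 1/sqrt(x) for x > 0. */
static float rsqrt_approx(float x)
{
#if defined(__SSE__)
    /* Hardware estimate of the reciprocal square root (approximate). */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    /* Portable fallback: bit-level initial guess plus one Newton step. */
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);
#endif
}

/* Scale a by an approximate 1/|a|. The zero guard is this sketch's choice. */
static vec3 fast_normalize_vec3(vec3 a)
{
    vec3 result = {0.0f, 0.0f, 0.0f};
    float length_squared = a.X * a.X + a.Y * a.Y + a.Z * a.Z;

    if (length_squared > 0.0f)
    {
        float inverse_length = rsqrt_approx(length_squared);
        result.X = a.X * inverse_length;
        result.Y = a.Y * inverse_length;
        result.Z = a.Z * inverse_length;
    }

    return result;
}
```

Because the reciprocal square root is only an estimate, callers of a routine like this should compare lengths with a tolerance, as the tests in this diff do, and reach for the exact normalize when full precision matters.
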
Author	SHA1	Message	Date
Ben Visness	45c91702a9	Added SSE support for Quaternion operations (#97) (#98) * Added SSE support for Quaternion operations (#97) * Added SSE support for Quaternion operations * Fixed quaternion multiplication Old quaternion multiplication had a bug, this is a different approach. * Added release notes and version for 1.9.0	2019-03-11 13:12:48 -05:00
Ben Visness	f7c8e1f7d1	Add fast vector normalization (#94) * Add fast normalization routines * Update readme and remove version history from main file * Update version at top of file	2018-11-29 22:02:41 -08:00
Ben Visness	5ca1d58b36	Improve grammar/spelling	2018-11-29 13:21:05 -06:00
Zak Strange	5bf727dbd5	Removed copy in operator[] (#93) * Removed copy in operator[] * Updated version info	2018-11-29 09:32:12 -08:00
Ben Visness	295f6c476f	Rename Rows to Columns on hmm_mat4 (#91)	2018-08-17 11:02:44 -07:00
Ben Visness	e095aefaf7	Bump file version	2018-06-10 15:32:12 -04:00
Ben Visness	4e2f47db55	Add array subscript operators for all types (#88) * Add array subscript operators for all types * Taking the parameter for the operator[] as a reference. This should allow it to be inlined * I guess you can't do that. * Update version and readme	2018-06-10 15:26:48 -04:00
Ben Visness	bee0e0c569	WIP: Properly initialize all elements of LookAt matrix (#84) * Properly initialize all elements of LookAt matrix * Update version and readme * Add a test for LookAt good enough	2018-06-03 18:42:09 -05:00
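
Reviewer note on 45c91702a9: the SSE path of `HMM_DotQuaternion` multiplies the two registers lane by lane and then folds the four products together with two shuffle-and-add steps. The standalone sketch below walks through that pattern with the lane contents spelled out in comments. The `quat` type and `quat_dot` name are hypothetical stand-ins for illustration, and the scalar branch is only there so the sketch still compiles where `__SSE__` is not defined.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical stand-in for hmm_quaternion; not the library's type. */
typedef struct { float X, Y, Z, W; } quat;

static float quat_dot(quat left, quat right)
{
#if defined(__SSE__)
    __m128 a = _mm_setr_ps(left.X, left.Y, left.Z, left.W);
    __m128 b = _mm_setr_ps(right.X, right.Y, right.Z, right.W);

    /* Lane-wise products: [xx, yy, zz, ww] */
    __m128 products = _mm_mul_ps(a, b);

    /* Swap neighbours within each pair: [yy, xx, ww, zz] */
    __m128 swapped = _mm_shuffle_ps(products, products, _MM_SHUFFLE(2, 3, 0, 1));

    /* Pair sums: [xx+yy, xx+yy, zz+ww, zz+ww] */
    __m128 sums = _mm_add_ps(products, swapped);

    /* Reverse the lanes: [zz+ww, zz+ww, xx+yy, xx+yy] */
    __m128 reversed = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(0, 1, 2, 3));

    /* Every lane now holds xx+yy+zz+ww; read back lane 0. */
    sums = _mm_add_ps(sums, reversed);
    return _mm_cvtss_f32(sums);
#else
    /* Scalar fallback, same arithmetic as the non-SSE branch in the diff. */
    return (left.X * right.X) + (left.Y * right.Y) +
           (left.Z * right.Z) + (left.W * right.W);
#endif
}
```

For a single dot product the shuffles cost about as much as the scalar additions they replace, which may be part of why the O2 figures in the commit message are so close for `Dot`.
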