Microsoft Visual Studio 2017 Supports Intel AVX-512

Good news! I've been working on a library for exactly that purpose: https://github.com/t0rakka/mango

Check out the include/mango/math/ folder for C++ SIMD vector classes with overloaded operators. The underlying low-level code lives in the simd/ folder, which has implementations for several architectures, so it also works as a portable SIMD abstraction. The three levels (native -> simd -> math) interoperate seamlessly, so you can drop down a level whenever some super exotic instruction is needed.
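To make the native -> simd -> math layering concrete, here is a minimal sketch of how such a stack can be built. It is not mango's code: the namespaces, `f32x4`, `float32x4` and the `SKETCH_HAVE_SSE` macro are hypothetical. The native layer is SSE on x86 with a plain-array fallback elsewhere, and the math class exposes its native register so you can call an intrinsic directly when the wrapper doesn't cover it.

```cpp
// Hypothetical three-layer sketch (native -> simd -> math).
// Names are illustrative only and are not mango's actual API.
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// native layer: the raw per-architecture register type
namespace native {
#ifdef SKETCH_HAVE_SSE
using f32x4 = __m128;
#else
struct f32x4 { float v[4]; };   // scalar fallback for other targets
#endif
}

// simd layer: thin free functions, one implementation per architecture
namespace simd {
inline native::f32x4 set(float x, float y, float z, float w) {
#ifdef SKETCH_HAVE_SSE
    return _mm_setr_ps(x, y, z, w);   // lane 0 = x ... lane 3 = w
#else
    return native::f32x4{{x, y, z, w}};
#endif
}
inline native::f32x4 add(native::f32x4 a, native::f32x4 b) {
#ifdef SKETCH_HAVE_SSE
    return _mm_add_ps(a, b);
#else
    native::f32x4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
#endif
}
inline native::f32x4 mul(native::f32x4 a, native::f32x4 b) {
#ifdef SKETCH_HAVE_SSE
    return _mm_mul_ps(a, b);
#else
    native::f32x4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
#endif
}
inline float get(native::f32x4 a, int i) {
    assert(i >= 0 && i < 4);
#ifdef SKETCH_HAVE_SSE
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, a);
    return tmp[i];
#else
    return a.v[i];
#endif
}
}

// math layer: operator-overloaded class built only on the simd layer
namespace math {
struct float32x4 {
    native::f32x4 m;   // public on purpose: the escape hatch to native code
    explicit float32x4(native::f32x4 v) : m(v) {}
    float32x4(float x, float y, float z, float w) : m(simd::set(x, y, z, w)) {}
    float operator[](int i) const { return simd::get(m, i); }
};
inline float32x4 operator+(float32x4 a, float32x4 b) { return float32x4(simd::add(a.m, b.m)); }
inline float32x4 operator*(float32x4 a, float32x4 b) { return float32x4(simd::mul(a.m, b.m)); }
}
```

On x86 builds, something like `math::float32x4 d(_mm_max_ps(a.m, b.m));` drops to the native layer for a single SSE instruction and wraps the result straight back into the math type.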

I can't stress enough that these abstractions are designed to add no overhead whatsoever when using the primitives. The calling conventions, the parameter passing, everything has been single-mindedly tuned to be as efficient as possible. We don't want spilling: everything stays in registers as much as humanly possible, or at least nothing spills because of bad design choices on our part!
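One concrete piece of "efficient parameter passing" on Windows: under the default x64 calling convention, `__m128` arguments are passed through memory via a pointer, while MSVC's `__vectorcall` passes them (and small structs of up to four identical vector members) in XMM registers. The sketch below shows one way a wrapper library might wire this up. `SIMD_CALL`, `SIMD_INLINE`, `float4`, `madd`, `splat` and `lane0` are hypothetical names, not mango's, and the non-SSE path is plain scalar code.

```cpp
// Hypothetical sketch of register-friendly parameter passing.
// Macro and function names are illustrative, not taken from mango.
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86)) && !defined(_M_ARM64EC)
// MSVC x86/x64: __vectorcall puts vector arguments in XMM registers
#define SIMD_CALL __vectorcall
#define SIMD_INLINE __forceinline
#elif defined(__GNUC__)
// no keyword needed; the System V x86-64 ABI already uses XMM registers
#define SIMD_CALL
#define SIMD_INLINE inline __attribute__((always_inline))
#else
#define SIMD_CALL
#define SIMD_INLINE inline
#endif

struct float4 {
#ifdef SKETCH_HAVE_SSE
    __m128 m;      // single vector member: eligible for register passing
#else
    float m[4];
#endif
};

// Passed and returned by value (never by const reference), so the
// wrapper can stay in a register across the call boundary.
SIMD_INLINE float4 SIMD_CALL madd(float4 a, float4 b, float4 c) {
#ifdef SKETCH_HAVE_SSE
    return float4{_mm_add_ps(_mm_mul_ps(a.m, b.m), c.m)};
#else
    float4 r;
    for (int i = 0; i < 4; ++i) r.m[i] = a.m[i] * b.m[i] + c.m[i];
    return r;
#endif
}

SIMD_INLINE float4 SIMD_CALL splat(float s) {
#ifdef SKETCH_HAVE_SSE
    return float4{_mm_set1_ps(s)};
#else
    return float4{{s, s, s, s}};
#endif
}

SIMD_INLINE float SIMD_CALL lane0(float4 a) {
#ifdef SKETCH_HAVE_SSE
    return _mm_cvtss_f32(a.m);   // extract the lowest lane
#else
    return a.m[0];
#endif
}
```

Passing by const reference would force the argument to live in memory at any call the compiler doesn't inline, which is exactly the kind of avoidable spill the text is talking about.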

The low-level API can be described as "functional": nothing is passed by reference, functions never modify their arguments, and the result of a calculation is always returned by value.
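A minimal sketch of that value-in, value-out style, using a hypothetical `vec4` type with plain scalar lanes (`abs4`, `min4`, `max4`, `clamp4` and `splat4` are illustrative names, not mango functions). Because no argument is ever mutated, calls compose like ordinary arithmetic expressions and the inputs remain valid afterwards.

```cpp
// Hypothetical illustration of a "functional" SIMD-style API.
// vec4 and the functions below are illustrative, not mango's API.
#include <array>
#include <cassert>
#include <cmath>

struct vec4 { std::array<float, 4> v; };

// Every function takes its inputs by value and builds a fresh result.
inline vec4 splat4(float s) { return vec4{{s, s, s, s}}; }

inline vec4 abs4(vec4 a) {
    vec4 r{};
    for (int i = 0; i < 4; ++i) r.v[i] = std::fabs(a.v[i]);
    return r;
}

inline vec4 min4(vec4 a, vec4 b) {
    vec4 r{};
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] < b.v[i] ? a.v[i] : b.v[i];
    return r;
}

inline vec4 max4(vec4 a, vec4 b) {
    vec4 r{};
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] > b.v[i] ? a.v[i] : b.v[i];
    return r;
}

// Composition instead of in-place updates: clamp = max(min(a, hi), lo).
inline vec4 clamp4(vec4 a, vec4 lo, vec4 hi) { return max4(min4(a, hi), lo); }

// The style being avoided would look like: void clamp_inplace(vec4& a, ...);
```

For example, `clamp4(abs4(x), splat4(0.5f), splat4(2.0f))` reads as a single expression, and `x` still holds its original values afterwards.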

Here's some random code I wrote recently, so you can take a peek at what using the API looks like:

        // A pixel is covered when all three edge values are negative:
        // AND-ing them keeps the sign bit only if every one has it set.
        int32x4 coverageMask = (cx0 & cx1 & cx2) < zero;
        uint32 mask = coverageMask.mask();
        if (mask) // skip the whole block when no lane is covered
        {
            // Edge values double as (unnormalized) interpolation weights
            float32x4 c0 = convert<float32x4>(cx0);
            float32x4 c1 = convert<float32x4>(cx1);
            float32x4 c2 = convert<float32x4>(cx2);
            float32x4 w = 1.0f / (c0 * block.w[0] + c1 * block.w[1] + c2 * block.w[2]);

            // Depth test, restricted to covered lanes
            float32x4 depth = (c0 * block.depth[0] + c1 * block.depth[1] + c2 * block.depth[2]) * w;
            float32x4 depthMask = (depth < depthBuffer[0]) & reinterpret<float32x4>(coverageMask);
            int32x4 colorMask = reinterpret<int32x4>(depthMask);

            // Interpolate RGB and pack as 0x00RRGGBB
            float32x4 r = (c0 * block.color[0].xxxx + c1 * block.color[1].xxxx + c2 * block.color[2].xxxx) * w;
            float32x4 g = (c0 * block.color[0].yyyy + c1 * block.color[1].yyyy + c2 * block.color[2].yyyy) * w;
            float32x4 b = (c0 * block.color[0].zzzz + c1 * block.color[1].zzzz + c2 * block.color[2].zzzz) * w;
            int32x4 v0 = convert<int32x4>(r);
            int32x4 v1 = convert<int32x4>(g);
            int32x4 v2 = convert<int32x4>(b);
            int32x4 color = v2 | (v1 << 8) | (v0 << 16);

            // Masked writes: only lanes passing both tests are updated
            colorBuffer[0] = select(colorMask, color, colorBuffer[0]);
            depthBuffer[0] = select(depthMask, depth, depthBuffer[0]);
        }
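For anyone less used to masked SIMD code, here is a hypothetical scalar reading of what one lane of that snippet computes. The `Block` layout, `shadePixel`, and the assumption that colors are already in the 0..255 range are guesses for illustration only; the early returns play the role of the masks, and the float-to-int step assumes round-to-nearest, which may differ from what `convert<int32x4>` actually does.

```cpp
// Hypothetical scalar reading of ONE lane of the SIMD snippet.
// Block, shadePixel and their fields are illustrative assumptions.
#include <cassert>
#include <cmath>
#include <cstdint>

struct Block {
    float w[3];          // per-vertex term used to normalize the weights
    float depth[3];      // per-vertex depth
    float color[3][3];   // per-vertex RGB, assumed already scaled to 0..255
};

// Returns true if the pixel passed the coverage and depth tests and was written.
bool shadePixel(int32_t cx0, int32_t cx1, int32_t cx2, const Block& block,
                float& depthBuffer, uint32_t& colorBuffer) {
    // Sign bit survives the AND only if all three edge values are negative.
    if ((cx0 & cx1 & cx2) >= 0) return false;

    float c0 = float(cx0), c1 = float(cx1), c2 = float(cx2);
    float w = 1.0f / (c0 * block.w[0] + c1 * block.w[1] + c2 * block.w[2]);

    float depth = (c0 * block.depth[0] + c1 * block.depth[1] + c2 * block.depth[2]) * w;
    if (!(depth < depthBuffer)) return false;   // strict less-than depth test

    auto channel = [&](int k) {
        float v = (c0 * block.color[0][k] + c1 * block.color[1][k] + c2 * block.color[2][k]) * w;
        return uint32_t(std::lrint(v));         // assumes round-to-nearest
    };
    // 0x00RRGGBB: red in bits 16..23, green in 8..15, blue in 0..7
    colorBuffer = channel(2) | (channel(1) << 8) | (channel(0) << 16);
    depthBuffer = depth;
    return true;
}
```

The SIMD version does the same work for four pixels at once and replaces each `return false` with a lane mask fed to `select`, which is why it never branches per pixel.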