[NativeAOT-LLVM] Vector.IsHardwareAccelerated true? #2515

Open
jasonthorsness opened this issue Feb 11, 2024 · 4 comments

@jasonthorsness

I understand that most browsers support WebAssembly SIMD, and so does Emscripten.

I am seeing Vector.IsHardwareAccelerated return false from my app compiled with:

<PackageReference Include="Microsoft.DotNet.ILCompiler.LLVM; runtime.win-x64.Microsoft.DotNet.ILCompiler.LLVM" Version="9.0.0-*" />
dotnet publish -r browser-wasm -c Release /p:MSBuildEnableWorkloadResolver=false --self-contained /p:NativeDebugSymbols=false /p:EmccExtraArgs="-s EXPORTED_FUNCTIONS=""[_malloc,_Answer]"" -s EXPORTED_RUNTIME_METHODS=cwrap --post-js=run.js"

Is this expected at this time? Thanks!

@SingleAccretion

SingleAccretion commented Feb 11, 2024

> Is this expected at this time?

Yes, we don't support SIMD yet.

@jasonthorsness
Author

jasonthorsness commented Feb 11, 2024

Well, I tried the same code in Blazor AOT, which supposedly supports SIMD, and it's not any faster; they must not support these operations. On my system this takes 120 ms natively compiled, 2400 ms in Blazor, and 1600 ms in NativeAOT-LLVM:

    public class Class1
    {
        [UnmanagedCallersOnly(EntryPoint = "Alloc")]
        public static unsafe byte* Alloc(int length)
        {
            return (byte*)NativeMemory.AlignedAlloc((nuint)length, (nuint)Vector<byte>.Count);
        }

        [UnmanagedCallersOnly(EntryPoint = "Answer")]
        public static unsafe int Answer(byte* f, int l)
        {
            for (int i = 0; i < 10000; ++i)
            {
                for (byte* ptr = f; ptr != f + l; ptr += Vector<byte>.Count)
                {
                    (~Vector.LoadAligned(ptr)).StoreAligned(ptr);
                }
            }

            return Vector<byte>.Count + (Vector.IsHardwareAccelerated ? 100 : 1000);
        }
    }

Blazor used:

    <RunAOTCompilation>true</RunAOTCompilation>
    <WasmEnableSIMD>true</WasmEnableSIMD>

Would it be straightforward to link in a C or C++ file with SSE2 intrinsics and have Emscripten translate it? Any examples? (Sorry, this doesn't seem appropriate for an issue; I'm not sure where else to discuss or ask questions.)

@SingleAccretion

SingleAccretion commented Feb 12, 2024

> Would it be straightforward to link in a C or C++ file with SSE2 intrinsics and have Emscripten translate it? Any examples?

With NativeAOT-LLVM, you would first need to compile the native code into a native library. For the case of a single .c file, it can be as simple as:

; See https://emscripten.org/docs/porting/simd.html#compiling-simd-code-targeting-x86-sse-instruction-sets for SSE compatibility flags.
emcc -msimd128 -c lib.c -O2 -o lib.o

<NativeLibrary Include="lib.o" /> ; Statically linked code, use direct PInvoke to invoke it.

You do need to use a matching version of Emscripten, however.

https://learn.microsoft.com/en-us/aspnet/core/blazor/webassembly-native-dependencies?view=aspnetcore-8.0 is the documentation for how to do the same using the upstream toolchain - it supports compiling source files directly.
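
For the SSE2 case asked about above, a minimal sketch of what such a `lib.c` could look like is shown here. It assumes the Emscripten SSE compatibility flags described at the link above (`-msse2` in addition to `-msimd128`) and that `__SSE2__` is defined when they are passed. The function name `invert_bytes_sse2` is hypothetical and not part of the thread. When SSE2 is not enabled, the same file falls back to a plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; Emscripten lowers these to Wasm SIMD */
#endif

/* Hypothetical example function: inverts every byte of buf in place. */
void invert_bytes_sse2(uint8_t *buf, size_t length) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi8(-1);  /* all bits set */
    /* Process full 16-byte blocks with unaligned loads and stores. */
    for (; i + 16 <= length; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        _mm_storeu_si128((__m128i *)(buf + i), _mm_xor_si128(v, ones));
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE2 is unavailable. */
    for (; i < length; ++i) {
        buf[i] = (uint8_t)~buf[i];
    }
}
```

Under these assumptions the compile step would be something like `emcc -msimd128 -msse2 -c lib.c -O2 -o lib.o`; the rest of the linking setup stays the same.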

@jasonthorsness
Author

Just wanted to note that this works great: the same test as above, using the WASM SIMD functions directly, takes only ~330 ms, which seems expected. The natively compiled version is still more than twice as fast (likely because it gets to use 256-bit vectors on my machine instead of 128-bit), and the WASM SIMD version is more than 4 times faster than the Vector version.

In case anyone sees this, I just put this in my project file:

  <ItemGroup>
    <DirectPInvoke Include="lib" />
    <NativeLibrary Include="lib.o" />
  </ItemGroup>

  <Target Name="CompileNativeLibrary" BeforeTargets="BeforeBuild">
    <Exec Command="emcc -msimd128 -c lib.c -O2 -o lib.o" />
  </Target>

Then in the code:

        [LibraryImport("lib")]
        internal static unsafe partial void bar(byte* ptr, int n);

And for this test, the lib.c file is just this:

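#include <stddef.h>
#include <stdint.h>        // uint8_t
#include <wasm_simd128.h>  // WebAssembly SIMD intrinsics (build with -msimd128)

// Inverts every bit of the buffer in place, 16 bytes at a time.
// Bytes beyond the last full 16-byte vector are left unchanged.
void bar(uint8_t* ptr, int length) {
    v128_t* simd_ptr = (v128_t*)ptr;
    size_t num_vectors = length / sizeof(v128_t);
    v128_t ones = wasm_i32x4_splat(~0);  // all bits set
    for (size_t i = 0; i < num_vectors; ++i) {
        v128_t current_vector = wasm_v128_load(simd_ptr + i);
        v128_t inverted_vector = wasm_v128_xor(current_vector, ones);  // x ^ all-ones == ~x
        wasm_v128_store(simd_ptr + i, inverted_vector);
    }
}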
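
One thing to keep in mind with the loop above: it only processes whole 16-byte vectors, so if the length is not a multiple of 16 the last few bytes stay as they were. A small plain-C sketch of how that could be handled and checked follows. Both function names (`bar_tail`, `bar_reference`) are hypothetical and were not part of the test above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inverts only the bytes that a 16-byte vector loop
   over `length` bytes would skip, i.e. the last length % 16 bytes. */
void bar_tail(uint8_t *ptr, int length) {
    if (length <= 0) {
        return;
    }
    size_t done = ((size_t)length / 16) * 16;  /* bytes covered by full vectors */
    for (size_t i = done; i < (size_t)length; ++i) {
        ptr[i] = (uint8_t)~ptr[i];
    }
}

/* Hypothetical reference: inverts all bytes one at a time, useful for
   checking the SIMD output on a copy of the same input. */
void bar_reference(uint8_t *ptr, int length) {
    for (int i = 0; i < length; ++i) {
        ptr[i] = (uint8_t)~ptr[i];
    }
}
```

Calling `bar_tail` right after `bar` with the same arguments would cover the remaining bytes, and comparing against `bar_reference` on a copy of the input is a cheap way to confirm the two paths agree.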
