This issue is for tracking an experiment for a pure-Zig WebGPU implementation.
Tradeoffs of using Google's Dawn
Today Mach relies on Dawn (Google Chrome's WebGPU implementation) as its graphics abstraction layer. Specifically, we maintain a fork of Dawn and do a lot of heavy lifting in mach/gpu to translate all of Dawn's build configuration files to Zig's build system, maintain native system SDKs, etc.
This has several benefits:
- Dawn will be shipped with Google Chrome sometime in 2022 and will thus be one of, if not the most, battle-tested WebGPU implementations in existence.
- Google, Intel, and others throw significant resources behind Dawn and its shader compiler, Tint. Look at the commit history and you'll see there are multiple improvements daily and several engineers working on it.
- Dawn seems to be the most mature and actively developed WebGPU implementation today (this is debatable and just my personal opinion.)
- It is all C++/C/ObjC so we can compile it with the zig compiler. We already have cross-compilation working for macOS and Linux, and Windows is not far off.
There are drawbacks:
- On very modern laptops, Dawn currently takes 3-8 minutes to compile. We've managed to reduce this already, and have identified ways to reduce it further, but doing so is quite a chore. In some cases, we're not sure changes to eliminate dependencies will be accepted upstream, as Chrome gets them virtually for free.
- Dawn produces massive binaries, the static lib out of the box with our build setup is ~1GB in size. This increases link times. We've reduced it to ~50MB with omission of debug symbols, but still.
- Compilation on Windows is something we will solve, but is a real challenge. Dawn wanted to support Windows UWP apps in the past, so it depends on UWP headers and we need to patch these dependencies out. Dawn's shader compiler, Tint, targets HLSL and so still needs an HLSL compiler, DirectXShaderCompiler, which is a fork of LLVM and further adds to the chonkyness.
- Dawn needs to target the widest array of devices: OpenGL and OpenGL ES fallbacks, DirectX 11 in addition to DirectX 12, older versions of macOS / Metal, etc.
Overall, Dawn is a battle-tested production-worthy WebGPU implementation. There are good and bad aspects to that.
Tradeoffs of using gfx-rs/wgpu-native?
- Compared to Dawn, gfx-rs has much better compilation times: less than a minute compared to Dawn's 3-8 minutes on macOS. If you eliminate Dawn's dependencies on spirv-tools (which is perfectly doable on macOS) they are comparable; I'm unsure about other OSes.
- gfx-rs requires a full Rust toolchain, and complicates the cross compilation story significantly (need to manage Rust cross compilation toolchains, etc.)
- Binaries are available, but that's not much of an advantage (we could do the same with Dawn easily)
My assessment is that gfx-rs is a quite strong WebGPU implementation, likely to be on par with Dawn in the future, but overall compile times are still slow, cross compilation would be harder, and I do not want a hard dependency on a Rust toolchain.
The case for a pure-Zig WebGPU implementation
- Blazing fast compile times. That's a big one.
- Having a pure-Zig implementation would allow us to contribute to the WebGPU implementation more easily, fix bugs when they are present, etc. It's fun and pleasant to dive into Zig code.
- Having a pure-Zig implementation would open the door for someone to easily add Nintendo Switch and PS5 support using their native graphics APIs.
Of course, it cannot be overstated that this is still a massive undertaking. And so:
- We could start by targeting just D3D12, Vulkan, and Metal (no D3D11, OpenGL, or OpenGL ES fallbacks.) In theory, this would make our implementation simpler, lighter weight, easier to cross-compile, etc.
- We can in the short/medium-term still utilize Google's Tint shader compiler for WGSL->(SPIRV/HLSL/MSL), as the shader compiler does appear to be by far the most complex aspect of a functioning WebGPU implementation. We can also leverage Tint as a test bed to compare our own shader compiler against.
- Tint and Naga both aim for ultra widespread hardware compatibility, and so for e.g. DirectX and Metal backends they perform WGSL->HLSL->DXIL, WGSL->MSL->AIR. We could aim to skip this intermediate text representation and target DXIL (DirectX IL, a subset of LLVM IR) and AIR (Apple IR, also LLVM IR-like, though we'd need to do a bit of cleanroom reverse engineering to implement it) directly.
Lastly, we will still have Dawn as an option - potentially even with binary builds to work around the compilation speed issue - so that one can just flip a build switch and go between the pure-Zig or Dawn implementation.
How this will work
mach/gpu will expose a Zig WebGPU interface (similar to the std.mem.Allocator interface) into which various implementations can plug:
- In the case of Dawn, there will be a
- In the case of browsers, there will be a JS-backed implementation.
- In the case of our pure-Zig implementation, we will implement the interface directly.
Such an interface is useful for many reasons: one could implement a WebGPU interface that wraps another and provides API tracing/perf measurements, record/replay, serializing over a network, etc.
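To make the interface idea concrete, here is a minimal sketch of the pattern, assuming Zig 0.11 or newer for the pointer cast builtins. Every name in it (`Interface`, `DummyBackend`, `Tracer`, and a `createBuffer` that returns a plain `u32` handle) is hypothetical and chosen for illustration only; it is not the mach/gpu API, and a real WebGPU interface would have far more entry points. It shows a type-erased pointer plus vtable, the same shape std.mem.Allocator uses, and a wrapper implementation that counts calls before forwarding them to another implementation.

```zig
const std = @import("std");

/// Hypothetical, heavily reduced interface: a type-erased pointer plus a
/// vtable, the same shape std.mem.Allocator uses.
pub const Interface = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const VTable = struct {
        createBuffer: *const fn (ptr: *anyopaque, size: u64) u32,
    };

    /// Dispatches through the vtable to whichever implementation is plugged in.
    pub fn createBuffer(self: Interface, size: u64) u32 {
        return self.vtable.createBuffer(self.ptr, size);
    }
};

/// Stand-in for a native backend; it only hands out increasing buffer IDs.
pub const DummyBackend = struct {
    next_id: u32 = 1,

    pub fn interface(self: *DummyBackend) Interface {
        return .{ .ptr = self, .vtable = &vtable };
    }

    const vtable = Interface.VTable{ .createBuffer = createBuffer };

    fn createBuffer(ptr: *anyopaque, size: u64) u32 {
        const self: *DummyBackend = @ptrCast(@alignCast(ptr));
        _ = size; // a real backend would allocate GPU memory here
        const id = self.next_id;
        self.next_id += 1;
        return id;
    }
};

/// Wraps any Interface and records call statistics before forwarding.
pub const Tracer = struct {
    inner: Interface,
    calls: usize = 0,
    bytes_requested: u64 = 0,

    pub fn interface(self: *Tracer) Interface {
        return .{ .ptr = self, .vtable = &vtable };
    }

    const vtable = Interface.VTable{ .createBuffer = createBuffer };

    fn createBuffer(ptr: *anyopaque, size: u64) u32 {
        const self: *Tracer = @ptrCast(@alignCast(ptr));
        self.calls += 1;
        self.bytes_requested += size;
        return self.inner.createBuffer(size);
    }
};

pub fn main() void {
    var backend = DummyBackend{};
    var tracer = Tracer{ .inner = backend.interface() };

    // Application code only ever sees `Interface`, not the concrete types.
    const gpu = tracer.interface();
    const a = gpu.createBuffer(256);
    const b = gpu.createBuffer(1024);

    std.debug.print("buffers {d} and {d}: {d} calls, {d} bytes requested\n", .{
        a, b, tracer.calls, tracer.bytes_requested,
    });
}
```

Because `Tracer` both consumes and produces an `Interface`, wrappers like this can be stacked or removed without the calling code changing, which is what would make tracing, record/replay, or network serialization layers practical.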
- Begin toying around with implementing this for Metal, DirectX 12, and/or Vulkan. The sky is the limit here, really, so help is very welcome.
I've begun toying with a Metal implementation (not very far at all, just far enough to realize how large an undertaking this is) and will work on completing #1 so we have "something" in place.
It's very possible we learn this is too much work and/or not worth it. In such a case, most of it would be scrapped!