Hi yuz-ers! We’re very excited to offer you one of the biggest code rewrites in yuzu’s history: the Texture Cache Rewrite! It is now available to our Early Access members, so continue reading to learn more.
yuzu started as a fork of Citra, so Citra’s texture cache (or rasterizer cache, as it was called at the time) was used in the early days of yuzu. However, this cache only supported OpenGL, so one of the first efforts when adding support for Vulkan was to make the code more generic, helping in GPU emulation.
When this was being worked on, we were still learning how the Nintendo Switch’s GPU worked (we still are, but even more so then). Some design decisions made at the time stuck with the codebase, making things harder to change later. The cache was also easy to break with unrelated changes.
So out with the old, in with the new. The previous implementation was no longer sufficient, so Rodrigo started working on a complete rewrite from scratch. This includes but is not limited to:
In short: it fixes a lot of graphical bugs, improves performance, and is not limited to any hardware configuration or driver in use. Improvements for everyone, once all parts are finished.
Before we talk about performance, here are just a few examples of the rendering fixes you can expect to see with this release:
Shadows in Splatoon 2 are now rendered correctly, finally allowing us to admire the beautiful cityscape
Lighting and stencil shadow corruption is now fixed in Luigi's Mansion 3
Astral Chain no longer exhibits black texture corruption
Depth of field issues are gone in Animal Crossing: New Horizons
Xenoblade Chronicles 2 is free from vertex explosions on AMD Vulkan drivers
Texture swapping & flickering issues are fixed in all Xenoblade Chronicles games
Jumbotrons now display correctly in Super Smash Bros. Ultimate. Here’s an example running on the radeonsi Mesa OpenGL driver for Linux
yuzu now has Multisample anti-aliasing (MSAA) support, as shown in SONIC FORCES here
Slow rainfall fixed in The Legend of Zelda: Breath of the Wild (Requires High GPU accuracy)
Rune transportation renders just like native hardware in The Legend of Zelda: Breath of the Wild
On top of the rendering improvements, many games show a 10-30% improvement to framerate, with greatly improved frametime stability as demonstrated below:
Luigi’s Mansion 3 received some huge leaps in rendering accuracy *and* performance. Notice the frametime graph
Super Mario Odyssey
The Legend of Zelda: Breath of the Wild
Animal Crossing: New Horizons
Due to these changes, hardware lacking the VK_EXT_robustness2 extension will not produce the optimal experience. On Windows, this includes AMD graphics cards older than Vega (Polaris and older series) and all Intel iGPUs to date. You can check the current support here. Games requesting this extension on unsupported hardware may behave unpredictably or, in rare instances, crash. A fallback code path is being worked on. Make sure your drivers are up to date, as the GPU vendor may be able to add support in the future if the hardware allows it.
Bindless Texture support was expected to be added, but several difficulties emerged during development. One of the problems is the lack of native hardware support for ASTC texture decoding. If we used uncompressed textures, GPUs with less than 8GB of VRAM would not be able to load all the game assets, and if we recompressed them in another texture format to avoid this problem, image quality would degrade. True bindless texture support can be considered again in the future.
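To put the VRAM concern into numbers, here is a rough sketch of the arithmetic. It relies on the fact that every ASTC block occupies 128 bits regardless of its footprint, and it assumes decoded textures are stored as 8-bit RGBA. The function names and the 2048x2048 example size are ours, chosen for illustration, and are not taken from yuzu's code.

```python
import math

ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint


def astc_size(width, height, block_w, block_h):
    """Bytes for one mip level of a 2D ASTC texture."""
    blocks_x = math.ceil(width / block_w)
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


def rgba8_size(width, height):
    """Bytes for the same level decoded to 8-bit RGBA (4 bytes per texel)."""
    return width * height * 4


if __name__ == "__main__":
    w = h = 2048  # illustrative texture size, not measured from any game
    for bw, bh in [(4, 4), (6, 6), (8, 8)]:
        compressed = astc_size(w, h, bw, bh)
        decoded = rgba8_size(w, h)
        print(f"ASTC {bw}x{bh}: {compressed / 2**20:.2f} MiB -> "
              f"RGBA8 {decoded / 2**20:.2f} MiB ({decoded / compressed:.1f}x)")
```

Under these assumptions the decoded copy is 4 times larger for 4x4 blocks and 16 times larger for 8x8 blocks, which is why keeping every asset resident and uncompressed gets expensive quickly.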
Depth Stencil Blits are not implemented on Vulkan for devices that don’t offer native support (any AMD or Intel GPU).
Another complication that emerged during development is related to memory management. The idea was to release the Texture Cache Rewrite with what the team calls the Texture Reaper, a way to remove textures from VRAM that have not been used after some time. During testing this has been almost working in OpenGL, managing to run Luigi’s Mansion 3 in under 300MB of VRAM. Vulkan, on the other hand, received no benefit.
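The general idea of dropping textures that have gone unused for a while can be sketched as follows. This is only an illustration of the concept, not yuzu's implementation: the `TextureReaper` class, its methods, and the frame-count threshold are all hypothetical.

```python
class TextureReaper:
    """Hypothetical sketch: evicts textures unused for `max_idle_frames` frames."""

    def __init__(self, max_idle_frames):
        self.max_idle_frames = max_idle_frames
        self.frame = 0
        self.textures = {}  # texture id -> (size in bytes, last frame it was used)

    def touch(self, tex_id, size):
        """Record that a texture was used in the current frame."""
        self.textures[tex_id] = (size, self.frame)

    def end_frame(self):
        """Advance one frame and return the ids evicted for being idle too long."""
        self.frame += 1
        stale = [tex_id for tex_id, (_, last_used) in self.textures.items()
                 if self.frame - last_used > self.max_idle_frames]
        for tex_id in stale:
            del self.textures[tex_id]
        return stale

    def resident_bytes(self):
        """Total size of the textures still tracked as resident."""
        return sum(size for size, _ in self.textures.values())


if __name__ == "__main__":
    reaper = TextureReaper(max_idle_frames=2)
    reaper.touch("menu_background", 100)
    reaper.touch("player_model", 200)
    for _ in range(3):
        reaper.touch("player_model", 200)  # still drawn every frame
        print("evicted:", reaper.end_frame())
    print("resident bytes:", reaper.resident_bytes())
```

A real cache would also have to release the GPU allocation and handle a texture being requested again after eviction; the sketch only tracks the bookkeeping.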
Vulkan faces one main problem: memory becomes fragmented, yet each texture needs to be mapped to contiguous video memory. There’s no tolerance for fragmentation, so freeing blocks will not help at all if the next texture doesn’t fit in the newly emptied space. This will require the development of a VRAM defragmentation routine, work that can take quite some time. So we can say that today marks the day Project Texture Reaper starts.
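A toy model makes the fragmentation problem concrete. In this sketch a heap is a range of abstract units and each live texture is an `(offset, size)` pair; all names and sizes are made up for illustration, and the cost of actually moving textures is ignored.

```python
def free_ranges(heap_size, allocations):
    """Return the (offset, size) gaps between live allocations, sorted by offset."""
    gaps, cursor = [], 0
    for offset, size in sorted(allocations.values()):
        if offset > cursor:
            gaps.append((cursor, offset - cursor))
        cursor = offset + size
    if cursor < heap_size:
        gaps.append((cursor, heap_size - cursor))
    return gaps


def fits(heap_size, allocations, size):
    """A texture fits only if a single gap is at least as large as it."""
    return any(gap >= size for _, gap in free_ranges(heap_size, allocations))


def defragment(allocations):
    """Pack live allocations back to back from offset 0 (copy cost ignored here)."""
    packed, cursor = {}, 0
    for name, (_, size) in sorted(allocations.items(), key=lambda kv: kv[1][0]):
        packed[name] = (cursor, size)
        cursor += size
    return packed


if __name__ == "__main__":
    heap = 100
    # Four 25-unit textures filled the heap; the first and third were then freed.
    live = {"b": (25, 25), "d": (75, 25)}
    total_free = sum(size for _, size in free_ranges(heap, live))
    print("free units:", total_free)
    print("40-unit texture fits:", fits(heap, live, 40))
    live = defragment(live)
    print("fits after defragmenting:", fits(heap, live, 40))
```

Half of the heap is free before defragmenting, yet a 40-unit texture cannot be placed because no single gap is large enough. Once the live textures are packed together, the same request succeeds.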
A feature that will be added shortly afterwards is Accelerated Texture Decoding, which will handle any texture format via Compute Shaders, even formats the GPU doesn’t support natively. epicboy is working on the ASTC compute decoder.
The next project Rodrigo is working on is the Buffer Cache Rewrite. This work promises to solve more rendering issues (for example, font rendering problems) and seriously improve performance, especially on memory-bandwidth-starved hardware like integrated GPUs.