Progress Report July 2021

Written by GoldenX86 and Honghoa on August 12 2021

Bienvenidos, yuz-ers, to our latest monthly progress report! We have a lot to talk about this month, so buckle up, ‘cause this will be one good ride!

Project Hades

 Why Hades? Well here's why!

Kept you waiting, huh? After being in development for six months, and spanning almost 50,000 lines of new code, Project Hades has finally been released. This massive rewrite of the shader decompiler is brought to you by Rodrigo, Blinkhawk, and epicboy.

Fixing an innumerable amount of rendering bugs, reducing shader build times, improving compatibility, and increasing performance by over 30% for all GPU vendors, Hades is easily one of the biggest changes made to yuzu to date.

We have a dedicated article explaining the process in technical detail, so we will be focusing only on the end user changes and some recommendations to help you get the best experience out of this new feature that both Early Access and Mainline users can enjoy.

While we keep OpenGL as the default graphics API for compatibility reasons (outdated drivers won’t affect it as much, and it lets Nvidia Fermi GPU users run yuzu out of the box), we strongly recommend testing your games with the Vulkan API first. Vulkan performance and compatibility have improved significantly (especially if paired with the Texture Reaper, the GPU Cache Garbage Collector). Additionally, its rendering and shader build performance almost always beat OpenGL. This applies not only to AMD and Intel GPU users, but also to Nvidia users.

There is an exception, however. The Intel Linux Vulkan driver is not stable at the moment, but we’re investigating the cause of the issue. For now, Intel Linux users should stick to OpenGL.

 Integrated GPU users benefit the most from Hades

Hades implements a Pipeline Cache for both Vulkan and OpenGL, meaning that regardless of which API you are using, all shaders are now stored and reused the next time the game is started. This functions similarly to how the old OpenGL Shader Cache behaved. Needless to say, this means that all previous Shader Caches are no longer valid, and will be discarded if someone tries to use them.

The difference in terminology lies in the fact that the whole Graphics Pipeline is now stored, not just a specific set of Shader stages. An important detail: the OpenGL pipeline cache is not interchangeable with the Vulkan pipeline cache, and vice versa. Two separate sets of shaders are generated if you use both APIs.

Vulkan now also benefits from parallel shader building, meaning all CPU threads will be able to handle all upcoming shaders in a parallel fashion, instead of asynchronously, avoiding graphical issues and building faster. The end result is the shortest build times of all API and shader backends (more on this later). Thus, on a fresh game with no previously built cache, more CPU threads will provide a smoother experience, with no imposed limit. Someone should test running Super Smash Bros. Ultimate on a big server CPU!

First time gameplay has never been smoother!

All CPU threads, save for one, are used to build shaders. The remaining free thread either handles shader saving to the pipeline cache, or continues the rendering process, depending on whether all shaders have been dealt with at the moment. This decision was made not only to improve performance, but also to improve overall system response times while building several shaders simultaneously, and to prevent certain “gaming” laptops from overheating the CPU while keeping all threads busy.

Note that this is a hardware design flaw by the laptop vendors, not an issue with the emulator. The product should provide enough cooling performance to keep its components cool enough even at full demand, not just for reaching advertised turbo clock speeds in short bursts. (Writer note: basically, if you want good gaming performance and longevity, buy thicc laptops.)
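As a rough illustration of the "all threads save for one" rule, here is a minimal sketch. The function names and the fake `build_shader` step are hypothetical and are not taken from yuzu's code, and Python threads only illustrate the scheduling idea rather than real parallel compilation.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def shader_worker_count(cpu_threads: int) -> int:
    """Reserve one thread for saving/rendering, but keep at least one worker."""
    return max(1, cpu_threads - 1)


def build_shader(source: str) -> str:
    """Hypothetical stand-in for an expensive shader compilation."""
    return "compiled:" + source


def build_all(sources: List[str], cpu_threads: Optional[int] = None) -> List[str]:
    """Build every pending shader on the worker threads."""
    threads = cpu_threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=shader_worker_count(threads)) as pool:
        # map() returns results in input order, even though the work overlaps.
        return list(pool.map(build_shader, sources))
```

On an 8-thread CPU this sketch would use 7 workers, leaving one thread free for everything else.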

Now, not all games will perform or render the best in Vulkan; some will still show better results with OpenGL instead. For the old API, we have some changes too.

 When selecting OpenGL, new options show up!

We have introduced a new drop-down list in the Graphics settings. When using OpenGL, the Shader backend option replaces the device selection shown for Vulkan, giving three different options to choose from.

Out of the box, yuzu uses GLSL, the default backend for OpenGL. On well-performing OpenGL GPU drivers (only Nvidia and the Linux Mesa drivers as of now), it has the best performance, but also takes the longest time to build shaders, resulting in noticeable stuttering when accessing new areas or performing new attacks. This is the recommended option for Nvidia users with a previously built cache searching for the highest performance. While it has its uses in some edge cases, we recommend that Windows AMD and Intel users run Vulkan instead. Linux Mesa users don’t have this limitation and can enjoy GLSL without issues, thanks to far more mature drivers.

Next up is GLASM, what in the past was called Assembly Shaders. This is an Nvidia-only feature. It provides lower performance than GLSL or Vulkan, but the second-fastest shader build times, only behind Vulkan’s parallel shaders. We recommend that Nvidia users interested in using OpenGL first run games in GLASM in order to build their pipeline cache, and once done, move to GLSL to get the best performance without suffering the shader stuttering associated with GLSL. Any other GPU vendor will skip GLASM and default to GLSL.

Finally, SPIR-V, the default backend of Vulkan, which is a valid option in OpenGL since the release of version 4.6. Originally, we wanted OpenGL to use this backend, discontinuing support for GLSL.

Reality always hits back like the laws of thermodynamics, delaying the release of Hades for several months. Driver support for SPIR-V in Windows is very bad (especially for Nvidia), with only the Linux Mesa drivers having a correct and fast implementation. So we decided to keep the option as an experimental feature, focusing on the old GLSL and GLASM backends first. We plan to improve SPIR-V rendering and performance later. Ideally, SPIR-V in OpenGL should be a jack-of-all-trades, a mix of the performance of GLSL and the shader build times of GLASM.

So to ease our users’ decision on what to choose, here’s a chart of all possible options for the most common GPU vendors.

Another important change is in how GPU accuracy operates. In the past, certain games like Pokémon Sword & Shield required using High GPU accuracy to get the best performance. This is no longer the case. Now, Normal consistently produces the best performance, at a low cost in accuracy, while High produces better particle effects and lighting, at a low performance cost.

We removed values that should be enabled by default from the bottom left action buttons of the user interface, like Asynchronous GPU shaders and Multicore. In their place, users can now switch between Normal and High GPU accuracy while playing. A fast and easy way to test what’s better for each game, GPU vendor, and API.

 Old (top) vs new (bottom)

Thank you, Captain Obv — er, I mean, Captain Vortex

Communication is vital for any project, and it is essential that we make our configuration options even more explanatory than they already are.

We want to thank our fellow developer, Vortex, for the marvelous change of rewording the explanation of GLASM. This change was made in order to elaborate that it is, effectively, a shorthand for OpenGL Assembly Shaders.

 This is critical

Fear not, my fellow yuz-ers, for we always have the most serious and capable people doing only the best work for your benefit. Rest assured that if a similar situation were to arise again in the future, Vortex will have your back. I salute you, my dear friend, and pray that you may ennoble yuzu even further with your future contributions.

Graphical fixes

epicboy was very busy during the development of Hades, and continues to be busy after it was finished.

World 1-5 of Super Mario 3D World + Bowser's Fury used to crash when loading on systems with AMD and Intel GPUs running Vulkan. A depth image was being cleared as a regular colour image and, while OpenGL is totally fine with this, Vulkan is stricter, which led to a crash. By only clearing valid colour images, epicboy resolved the issue.

 Affected world in Super Mario 3D World + Bowser's Fury

As a way to limit the maximum framerate a dynamic FPS game can run at, epicboy implemented a multiplier based cap. So, for example, if a game natively runs at 30 FPS, but can be run without issues at 240 FPS, setting an FPS cap of 8x will limit the FPS unlimiter to that value. Ideal for high refresh displays!
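The arithmetic behind the multiplier is simple enough to sketch. The function names below are made up for illustration and do not mirror yuzu's actual limiter code.

```python
def fps_cap(native_fps: int, multiplier: int) -> int:
    """Maximum framerate allowed for a given native rate and cap multiplier."""
    return native_fps * multiplier


def limited_fps(achievable_fps: float, native_fps: int, multiplier: int) -> float:
    """Framerate after the cap: never above the cap, never above what the PC can do."""
    return min(achievable_fps, fps_cap(native_fps, multiplier))
```

With the values from the example above, `fps_cap(30, 8)` gives 240, so a machine able to push 300 FPS would be held at 240.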

To avoid confusion with the FPS unlimiter, the old Frame limit was renamed to Speed limit.

 You can find the new options here

From before the release of the texture cache rewrite, a regression has existed that caused users' screenshots to save in the wrong directory. Turns out a single directory separator was missing in the code. Now, screenshots will work correctly by either pressing the Ctrl + P hotkey or selecting the Tools > Capture Screenshot… menu option, and save in the selected folder.

epicboy also added support for taking screenshots in the Vulkan API, solving an old issue from way back when Vulkan was first implemented two years ago. How time flies…

Finally, before being dragged against his will to work on Hades, epicboy was working on improving the performance of our compute shader accelerated ASTC decoder. By reducing the size of the workgroup, making some code simplifications, moving some look up tables, and other changes, performance increased by 15% on average. Astral Chain and similar titles that madly love ASTC should see more consistent frametimes with this change.

Blinkhawk has also been working constantly lately, not only on Hades and several other improvements, but also on some top secret projects we will mention later.

Koei Tecmo games are usually quite special: they never fail to give our developers headaches thanks to… unique decisions the studio makes. It’s not an exaggeration to say that Project Hades’ main motivation was improving how these games run in yuzu.

One of the remaining issues with Hyrule Warriors: Age of Calamity, Fire Emblem: Three Houses, and similar Koei games was instability caused by running them in High GPU accuracy when loading specific levels. In Blink’s own words, a simple fix, and the problem was solved.

Another old regression introduced by the Buffer Cache Rewrite affected particles in games like The Legend of Zelda: Breath of the Wild and the rendering of the BowWow in The Legend of Zelda: Link’s Awakening, and caused vertex explosions in Unreal Engine 4 games like Yoshi’s Crafted World, BRAVELY DEFAULT 2, and similar titles. Tuning how high downloads are handled and not fully waiting for command buffers to finish solved these issues. To make the best out of this change, High GPU accuracy needs to be enabled.

High GPU Accuracy is recommended (The Legend of Zelda: Breath of the Wild)

When Blinkhawk introduced the new fence manager while working on improvements for Asynchronous GPU Emulation two years ago, some frame delays came with it, causing stuttering in gameplay even if the framerate counter showed a solid 30 or 60 FPS value. To counter this, Blink now pre-queues frames, providing a smooth gameplay experience, especially noticeable if the user’s hardware can’t sustain perfect performance constantly.

Smooth as butter (Xenoblade Chronicles Definitive Edition)

Rodrigo has also been hitting those keycaps without rest.

Hyrule Warriors: Age of Calamity suffered from very dark environments due to unprepared images that were used as render targets. When their dirty flags were not properly set, a desynchronization happened on the texture cache, causing the issue shown below. By correctly preparing such images, the game renders correctly.

That looks like the Dark World to me (Hyrule Warriors: Age of Calamity)

By optimizing shaders doing FMA operations, yuzu gains an extra 4% of performance overall.

By flipping the viewport in Y_NEGATE, Rodrigo matches in Vulkan the correct behaviour OpenGL has, resolving the “flipping” issues for the following games: Katana ZERO, UNDERTALE, DELTARUNE, Shantae, Fire Emblem: Shadow Dragon and the Blade of Light, and others.

Fire Emblem: Shadow Dragon and the Blade of Light

Xenoblade Chronicles 2 experienced crashes with the Vulkan Mesa drivers because they lack support for null buffers in transform feedback bindings. Rodrigo had to emulate this missing functionality in order to solve the crashing.

AMD Radeon Linux users may have noticed that The Legend of Zelda: Skyward Sword would run at very slow framerates in stable versions of the OpenGL Mesa drivers. This is caused by a driver level bottleneck resulting in very slow PBO (Pixel Buffer Object) downloads. While the current mesa-git has this bottleneck solved, a solution is needed until those fixes reach the stable release versions. By specifying the GL_CLIENT_STORAGE_BIT flag, an alternative faster path can be used, increasing performance from around 8 FPS to a solid 60 FPS. Mesa drivers are the best drivers.

Morph also contributed with some graphical fixes.

New Super Mario Bros. U Deluxe provides video tutorials accessed via the web applet. Prior to Morph’s fix, trying to access that list would only result in the game returning to the previous menu. With Nintendo CDN URLs now handled in the web applet, this section of the game can be accessed.

 Video playback is still a work in progress (New Super Mario Bros. U Deluxe)

Morph also solved a quite specific render issue affecting users with multiple displays. If two or more monitors were in use and the user started a game from any display besides the primary, black borders would appear in the rendering window. To solve this, Morph needed to tell Qt to create a dummy render widget.

Sonic Mania, in proper pixelated format

Newcomer yzct12345 arrived like a sonic boom, implementing critical improvements at impressive speeds!

By ignoring an invalid texture operation, an early crash affecting Pokémon: Let’s go, Eevee! & Pikachu! in Vulkan was solved. No more crashes when catching your first Pokémon. Gotta catch ’em all!

yzct12345’s work on optimizing UnswizzleTexture resulted in up to double the performance for video decoding, and it also improved general gameplay! This results in far smoother video playback and a considerable reduction of the CPU performance needed to get a pleasant gaming experience. Thanks!

toastUnlimited is our specialist in Linux testing and bug reporting. He noticed that the rune teleporting animation in The Legend of Zelda: Breath of the Wild wasn’t working correctly on the Iris and RadeonSI Mesa drivers, the default OpenGL drivers for recent Intel and AMD GPUs, respectively.

Thanks to instructions the Mesa driver team gave us on how to properly use BindImageTexture, toastUnlimited was able to implement the needed changes in yuzu, making the animation render correctly.

Well excuse me, Princess (The Legend of Zelda: Breath of the Wild)

K0bin arrived to give us a hand, fixing an important screw up we made.

Without full Resizable BAR support in the GPU and the rest of the system, PCI-Express exposes only a 256 MB window of video memory to the CPU at a time. yuzu uses this small portion of VRAM for allocating its staging buffer, but if the user has GPU intensive background applications, there may not be enough space to allow the allocation to happen, and yuzu would refuse to create a Vulkan instance, failing to boot any game. OpenGL is, as usual, excluded from this issue thanks to letting the GPU driver handle those allocations on its own.

K0bin fixes this issue by performing the allocation in system RAM if there isn’t enough free space. Many thanks!
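The fallback logic can be pictured with a tiny sketch. The heap labels and the function name are hypothetical, and a real Vulkan implementation would query memory heaps and types through the API instead of comparing two numbers.

```python
DEVICE_LOCAL_HOST_VISIBLE = "device_local_host_visible"  # the small CPU-visible VRAM window
SYSTEM_RAM = "system_ram"  # ordinary host memory


def choose_staging_heap(requested_bytes: int, free_window_bytes: int) -> str:
    """Prefer the CPU-visible VRAM window, fall back to system RAM when it is full."""
    if requested_bytes <= free_window_bytes:
        return DEVICE_LOCAL_HOST_VISIBLE
    # Not enough room in VRAM: allocate from system RAM instead of failing.
    return SYSTEM_RAM
```

The point of the design is that a busy GPU no longer turns a failed allocation into a failed boot.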

Stop This Sound!

Maide has been dedicated to improving the audio side of yuzu this month. By introducing some missing PCM formats, the missing audio present in Ys IX: Monstrum Nox was finally fixed.

PCM stands for Pulse-Code Modulation, an encoding technique used to represent analogue audio signals digitally. These signals have two basic properties: The sample rate (how many samples of the signal are taken per second), and the bit-depth (how many bits are used to represent the “loudness” of the signal’s samples at any given point in time).

In this PR, Maide introduced a number of methods to process formats that were missing in the current implementation — namely, the ability to decode PCM files encoded with a bit-depth of 8-bit, 32-bit, and also floating-point values. Since previously none of these formats were being decoded by yuzu, any audio file that made use of them was not being reproduced, causing this behaviour.
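To give an idea of what decoding these formats involves, here is a minimal sketch that converts raw PCM bytes into floating-point samples. The format labels, the little-endian layout, and the choice of signed 8-bit samples are assumptions made for the example, not details confirmed by the PR.

```python
import struct
from typing import List


def decode_pcm(data: bytes, fmt: str) -> List[float]:
    """Convert raw little-endian PCM bytes into floats in roughly [-1.0, 1.0]."""
    # (struct code, bytes per sample, scale) for each assumed format label.
    layouts = {
        "int8": ("b", 1, 128.0),
        "int16": ("h", 2, 32768.0),
        "int32": ("i", 4, 2147483648.0),
        "float32": ("f", 4, 1.0),  # already normalised, no scaling needed
    }
    if fmt not in layouts:
        raise ValueError("unsupported PCM format: " + fmt)
    code, size, scale = layouts[fmt]
    count = len(data) // size  # ignore any trailing partial sample
    samples = struct.unpack("<%d%s" % (count, code), data[: count * size])
    return [sample / scale for sample in samples]
```

Whatever the bit-depth, every format ends up in the same range, which is what lets the rest of an audio pipeline treat them uniformly.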

Tales of Vesperia was another title with sound problems, in which audio would be played monaurally through the left channel — far from a pleasant experience, as you can hear here:

This game in particular would request the available number of active channels and cap its output based on this information — in other words, the game would not output audio to more channels than what yuzu reported. Since yuzu was always returning a value of 1, the game ended up outputting all the audio into the left channel. Thus, this problem was fixed by reporting two channels as active instead of one, which is now mixed properly:

Not satisfied with just this, Maide also went on and changed the downmixing logic, which improved the audio in titles such as The Legend of Zelda: Link's Awakening, New Super Mario Bros. U, Disgaea 6: Defiance of Destiny, and Super Kirby Clash.

Simply put, downmixing refers to the process of combining multiple audio channels so it is possible to reproduce them in a system with a lower number of available audio channels. There is some mathematics involved here and there, but the general idea behind it is to balance the volume of these individual channels so that the resulting signal sounds centred.

In the case of the Nintendo Switch, many games report six audio channels as available to the system, even though they end up providing data for only two channels (stereo sound). Consequently, yuzu would think the games used all these channels and then “downmix” them to stereo, affecting the volumes of the left and right channel in the process, which would end up being much quieter than needed. While the math used by yuzu is valid when used to downmix six channels to two, it certainly was not a desirable effect in this case. Therefore, Maide changed the code to preserve the volume of these channels if the audio in a game is already stereo, which now reproduces with the correct levels.
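The difference between the two paths can be sketched as follows. The channel order and the mixing weights are illustrative assumptions, not the coefficients yuzu uses.

```python
from typing import Sequence, Tuple

# Hypothetical mixing weights that sum to 1.0 so the result cannot clip.
FRONT, CENTER, LFE, REAR = 0.6, 0.25, 0.05, 0.1


def downmix_surround(frame: Sequence[float]) -> Tuple[float, float]:
    """Fold six channels (FL, FR, C, LFE, RL, RR assumed) into stereo."""
    fl, fr, c, lfe, rl, rr = frame
    left = FRONT * fl + CENTER * c + LFE * lfe + REAR * rl
    right = FRONT * fr + CENTER * c + LFE * lfe + REAR * rr
    return (left, right)


def to_stereo(frame: Sequence[float], game_is_stereo: bool) -> Tuple[float, float]:
    """Pass real stereo through untouched, downmix only true surround audio."""
    if game_is_stereo:
        # Only the first two channels carry data, so keep their volume as-is.
        return (frame[0], frame[1])
    return downmix_surround(frame)
```

Feeding a full-volume stereo frame through `downmix_surround` scales it down to the front weight, which is the "much quieter than needed" effect described above, while the stereo path leaves it alone.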

On the same page of volume problems, there was also a bug with the volume in certain areas of Xenoblade Chronicles 2, where it would occasionally spike out of proportion. Maide tracked down the cause of this behaviour, and discovered that the gain of the samples sent by this game to yuzu had their values set as NaN.

A NaN is a type of placeholder used by computers to represent numeric values that can’t be defined otherwise — hence, the acronym Not a Number. Particularly, the problem at hand lies in the fact that, in order to operate over the samples of the audio signal, these gain values must be converted into integers. But in this process, the NaN values, in turn, become obscenely large positive and negative integer values.

As these samples were further processed by yuzu before sending them to your sound system, these gains would distort and cap the volume of the audio samples to their maximum or minimum value, causing this bug. To prevent this problem, Maide added a check that changes the gain value from NaN to zero in such cases, so that no error is propagated along the mixing.

This is mostly a workaround, as it still remains under investigation why the game is yielding these NaN values, but at the very least, it should help prevent a number of bleeding ears here and there.
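The check itself is tiny, and a sketch of the idea could look like this. The function names and the 16-bit clamp are assumptions for illustration. Note that Python would raise an exception when converting NaN to an integer, rather than producing the huge values described above, so the guard matters here too.

```python
import math


def sanitize_gain(gain: float) -> float:
    """Replace a NaN gain with zero so it cannot poison the mix."""
    return 0.0 if math.isnan(gain) else gain


def apply_gain(sample: int, gain: float) -> int:
    """Scale one sample and clamp it to the signed 16-bit range."""
    scaled = int(sample * sanitize_gain(gain))
    return max(-32768, min(32767, scaled))
```

A NaN gain now yields silence for that sample instead of a full-volume spike.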

Input changes

Along with some miscellaneous quality of life changes, german77 also changed the behaviour of recently enabled controllers in yuzu. In order to provide the most precise experience, sticks will now be auto-centred the moment the device is detected by yuzu. Surprisingly, the sticks on almost every game controller rest slightly off-centre, and if the dead zone value is small enough, users would experience slight drifting during gameplay. No drift in this emulator!

Thanks to internal changes on how settings are stored, the default values of mouse panning were affected. german77 restored the previous values and additionally solved an issue where the emulated stick would always look down if the user had multiple screens. No drift, not even with the mouse!

UI changes

A silent change that has the potential to improve performance considerably for users of old or low-end CPUs has been made by toastUnlimited. In the past, we recommended our users to manually select the Unsafe CPU accuracy option if their CPU lacked the FMA instruction set. This is not only confusing for users, as it required them to know if their specific CPU model was compatible with FMA, but also relied on communication channels and guides properly explaining this to as many people as possible. This, of course, resulted in several users not even knowing why games performed so poorly.

Additionally, it was later discovered that using the whole Unsafe preset can cause precision issues affecting things like the shape of the character hitboxes in Super Smash Bros. Ultimate. A better solution was needed.

In response, toastUnlimited implemented the all new Auto CPU accuracy setting! Enabled by default for all users, this setting determines the need to use the Unfuse FMA value automatically by reading if the FMA instruction set is supported by the CPU in use. It also sets other values, for example Faster ASIMD instructions, to boost the performance of 32-bit games. Auto CPU accuracy has the potential to more than triple the performance of users running old or very low end CPUs!
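The decision boils down to a feature check. In the sketch below, the class and field names are hypothetical, the FMA support flag is passed in instead of being read from the CPU, and always enabling the ASIMD option is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuAccuracySettings:
    unfuse_fma: bool    # split fused multiply-add when the host CPU lacks FMA
    faster_asimd: bool  # cheaper ASIMD handling, helps 32-bit games


def auto_cpu_accuracy(cpu_has_fma: bool) -> CpuAccuracySettings:
    """Pick settings from what the host CPU supports, with no user input."""
    return CpuAccuracySettings(unfuse_fma=not cpu_has_fma, faster_asimd=True)
```

Users with FMA-capable CPUs keep the accurate path, and everyone else gets the faster one without having to look up their CPU model.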

Thanks to work done by Morph, all six default Miis are now available to the user in games that request them.

 We're still far away from offering full Mii customization, but at least more options are available now (Mario Kart 8 Deluxe)

lioncash, our silent optimizer and code cleaner, found an issue with the new strings that report if a game is 32 or 64-bit. Languages that read from right to left had issues with the initial implementation, and translating this string was disabled. Now both issues are resolved!

After gathering more information on the behaviour of the FPS unlimiter, epicboy discovered that some games will crash when attempting to boot them unlimited. Simply force-enabling the limiter at each game boot is enough to solve the issue. Remember to unlock the framerate manually after you start a game!

vonchenplus is back with another nice addition! Certain game dumps contain several games inside them, and yuzu would default to only launching the first one in the list. This change makes the necessary modifications so all games are properly listed.

Command-Line Shenanigans

toastUnlimited performed a general update to the settings of the command-line version of yuzu, yuzu-cmd.

The previous implementation had many options that were originally carried over from Citra and later deprecated, as well as others that were not read properly from the ini file, or were read but not written into the ini file, etc. In other words, there were a lot of things wrong with it, and some updating was in order to properly synchronize everything back with the settings present in the main executable of yuzu.

Since toastUnlimited was already working on yuzu-cmd, he also went on and fixed some problems related to Vulkan on Linux. When this executable was launched, it wasn’t able to detect the window manager, and would proceed to exit instead of booting a game.

The cause behind this problem lies in the fact that we recently updated how the SDL external library is being fetched for our Linux binaries, which came with a dummy configuration file with invalid settings. toastUnlimited made it so that we manually include the correct generated configuration file for building SDL instead of this dummy, while also adding some new logging information to report when support for a window manager was not compiled.

In a follow-up PR, toastUnlimited also added support for the full-screen mode settings to yuzu-cmd. Concurrently, he also fixed a bug that caused yuzu to render with the wrong resolution when in full-screen.

Some time ago, we made it possible for yuzu to run in different full-screen modes, but these options were never added to the command-line version of our executable, which is addressed in this PR.

Technical Changes

Meanwhile, bunnei was busy improving the management of kernel objects, drastically reducing the amount of objects that kept dangling in memory after closing the emulation session.

A dangling object refers to a stale object in memory that still has references, even though the object is no longer in use.

In yuzu, kernel objects are implemented so that they keep track of themselves through a reference counter, which keeps each object alive for as long as it’s needed. In other words, whenever a process needs an object, the reference counter is increased, and conversely, the counter is decreased when the object isn’t needed any more.

Once this value reaches zero, the object is automatically deleted. Previously, yuzu wasn’t doing a great job at maintaining this reference counter, as these kernel objects can be used by more than just one process (i.e. the “owners”, who are responsible for freeing the resource once they’re done with it). In some cases, some of these owners weren’t properly freeing the object at all, which meant that the reference counter never reached zero, leaving this object “dangling” in memory, even though the information became basically useless at this point.

One of the many jobs of the kernel in the OS is to keep track of all the resources available in the system. For this reason, these dangling objects were a problem, as the kernel calculates the number of resources that can be spawned based partly on the number of active objects in memory. With dozens of different kernel objects being created thousands of times between emulation sessions, this easily saturated the amount of objects that could be spawned due to yuzu hitting the resource limits much earlier than expected. What’s more, since these objects stick around even after the emulation session is closed, this is a memory leak that would gradually increase for as long as the emulator is running a game — in other words, it would persist even if you stopped the emulation and started it again.
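A toy model shows how a single missing release keeps an object alive. The class, the `live_objects` registry, and the method names are invented for this sketch and are much simpler than yuzu's kernel code.

```python
live_objects = set()  # names of kernel objects that have not been destroyed yet


class KernelObject:
    def __init__(self, name: str) -> None:
        self.name = name
        self.ref_count = 1  # the creator holds the first reference
        live_objects.add(name)

    def open(self) -> None:
        """Another owner starts using the object."""
        self.ref_count += 1

    def close(self) -> None:
        """An owner is done; destroy the object when nobody is left."""
        self.ref_count -= 1
        if self.ref_count == 0:
            live_objects.discard(self.name)
```

If two owners open an object and only one of them calls `close()`, the counter stays at one and the entry never leaves `live_objects`, which is the dangling situation described above.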

bunnei took a long look at the problem and improved the situation, but there’s still ongoing work to make our implementation more robust and accurate.

Blinkhawk also had his share of bug-fixing work, as he revisited the texture cache code related to 1D-to-2D texture overlaps, which fixes problems in Monster Hunter Rise and both the trial and final versions of Monster Hunter Stories 2: Wings of Ruin.

Similarly to how two-dimensional textures are mapped to three dimensions, one-dimensional textures are a simple type of texture that is mapped as two-dimensional when rendered on the screen. The problem here lies in the fact that the GPU is unable to tell the difference between a one-dimensional texture and a two-dimensional texture with a height of one.

As such, it was necessary to add support for them, so that they can be processed correctly by our texture cache. With the changes in this PR, Blink made it so that they can be copied seamlessly, fixing this faulty behaviour.

From Zero to Hero (MONSTER HUNTER STORIES 2: WINGS OF RUIN Trial Version)

If you’re interested in a more technical explanation about textures and their types, we recommend reading the D3D11 documentation provided by Microsoft.

Future projects

Project A.R.T., the designated name for our revival of the resolution scaler, has started, and early results look very promising!

 Getting blue-shelled at 4K doesn't really help with anger management (Mario Kart 8 Deluxe)

There are many bugs to fix, optimizations to make, and tons of testing to do before we can confidently release this feature. So for now, know that the scaler is returning!

toastUnlimited started the preliminary work to get an operational Linux installer to accompany our current Windows one. This also means offering precompiled builds for both Mainline and Early Access. Once it is finished, Linux users will no longer need to be forced to build from the source (if they so prefer)!

That’s all folks! As a certain AI singer would say, thank you for your kind attention. See you next time!


Please consider supporting us on Patreon!
If you would like to contribute to this project, check out our GitHub!