Progress Report April 2022

Written by GoldenX86 and CaptV0rt3x on May 11 2022

Hello yuz-ers, the month of April has been amazing! We’ll discuss CPU and Kernel performance improvements, several GPU emulation changes, UI tweaks, and more!

Saving Princess Peach yet again

Continuing his work to better support the official GameCube/Wii and Nintendo 64 emulators (codenamed Hagi and Hovercraft respectively), byte[] has introduced several new PRs to further improve the compatibility of the titles included within Super Mario 3D All-Stars.

byte[] first implemented support for GLSL in Super Mario Sunshine, as not everyone can run Vulkan. This is achieved by adding support for indirect addressing in OpenGL.

This change doesn’t include support for GLASM at the moment, since our developers aren’t too fond of having to deal with NVIDIA assembly shader code. Imagine being asked to fix an issue in a car engine, and the only given tools for the job are a rock and a stick.

However, that was only half the battle. Proper OpenGL support for Super Mario Sunshine and Super Mario Galaxy required solving an old limitation we had with the aging API: broken Z scale inversion.

Most Switch games use either OpenGL, the popular free graphics API, or NVN, the proprietary NVIDIA API exclusive to the console. Arguably, NVN is much closer to OpenGL than Vulkan in how it operates.

The Tegra X1 GPU on the Switch is flexible enough to allow the coordinate system to be changed at the discretion of the game developer. While most games will behave closer to what OpenGL expects, with the Z-axis facing away from the camera, games emulated by Hagi and Hovercraft (which render using Vulkan, an API used by only a tiny handful of titles on the Switch) have the coordinates inverted and the Z-axis facing into the camera, the way Vulkan games would expect to natively render.

byte[]'s Z-axis diagram

This was not an issue if you wanted to play Super Mario Galaxy or Super Mario Sunshine in yuzu with yuzu’s Vulkan backend, as the behaviour matched with what the game expected. But if you tried to play using OpenGL instead, yuzu would not correctly interpret that the faces were flipped due to the Z scale inversion, and thus rendered only the back faces of objects.

The solution is very simple: flip the front faces when the Z-axis is inverted.
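
As a rough illustration of that rule, here is a minimal Python sketch. The `FrontFace` enum, the `effective_front_face` helper and the use of a negative `z_scale` as the "inverted" signal are all assumptions made for this example, not yuzu's actual code.

```python
from enum import Enum


class FrontFace(Enum):
    CLOCKWISE = "cw"
    COUNTER_CLOCKWISE = "ccw"


def effective_front_face(declared: FrontFace, z_scale: float) -> FrontFace:
    """Pick the winding the host API should treat as front-facing.

    Assumption for this sketch: a negative Z scale means the game asked
    for an inverted Z-axis.
    """
    if z_scale >= 0.0:
        # Z-axis not inverted: keep what the game declared.
        return declared
    # Z-axis inverted: swap which winding counts as the front face.
    if declared is FrontFace.CLOCKWISE:
        return FrontFace.COUNTER_CLOCKWISE
    return FrontFace.CLOCKWISE
```

With this in place, a game that declares clockwise front faces while using an inverted Z-axis would be rendered as if it had declared counter-clockwise ones, so the correct side of each triangle survives culling.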

Welcome to the Shadow Realm Resort (Super Mario Sunshine)

Next in line, you may have noticed that Super Mario Sunshine rendered with a black bar at the bottom. This is because the Wii and GC games natively use an aspect ratio different to the usual 16:9 we’re used to. Instead, the games render at a 5:3 aspect ratio. Super Mario Galaxy informs the system to explicitly crop the screen to its native resolution of 1920x1012, but Super Mario Sunshine does not, so yuzu previously did not attempt to crop the game, resulting in a conspicuous black bar at the bottom of the render.

Diagram of the cropping process

While the game proportions in Super Mario Sunshine, arguably, appear more correct with the black bar, that’s not how Nintendo intended the game to be played. For accuracy’s sake, byte[] interprets the game’s implicit crop request, which stretches the image to match the native 1920x1080 resolution of the Switch, both for Vulkan and for OpenGL.
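
To make the stretch concrete, the sketch below maps output rows back to rows of the cropped image. It borrows the 1920x1012 region mentioned above for Super Mario Galaxy as the example crop. The function names and the nearest-neighbour sampling are assumptions for illustration only.

```python
def vertical_stretch_factor(crop_height: int = 1012, out_height: int = 1080) -> float:
    """How much the cropped image is stretched vertically to fill the output."""
    return out_height / crop_height


def source_row_for_output_row(out_y: int, crop_height: int = 1012,
                              out_height: int = 1080) -> int:
    """Map an output row to the row of the cropped image it samples from.

    Uses nearest-neighbour (floor) sampling, which is a simplification.
    """
    if not 0 <= out_y < out_height:
        raise ValueError("output row is outside the output image")
    return min(crop_height - 1, out_y * crop_height // out_height)
```

Every output row now lands inside the cropped region, so the rows that used to form the black bar are never sampled.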

Do not adjust your set (Super Mario Sunshine)

In the previous report, we mentioned how S8D24 <-> ABGR8 texture conversions allow Super Mario Galaxy star bits to behave correctly. Well, it’s OpenGL’s turn to join the fun.

S8D24 to ABGR8 texture conversion diagram
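
One way to picture such a conversion is as a repacking of the same 32 bits per pixel: 8 bits of stencil plus 24 bits of depth on one side, four 8-bit colour channels on the other. The sketch below is only that picture. The byte order (depth split across R, G and B, stencil in A) is an assumption made for this example and may not match what the GPU or yuzu actually does.

```python
from typing import Tuple


def s8d24_to_abgr8(stencil: int, depth24: int) -> Tuple[int, int, int, int]:
    """Repack an 8-bit stencil and a 24-bit depth value into (r, g, b, a).

    Assumed layout: depth is split into three bytes (r = highest byte),
    and stencil becomes the alpha channel.
    """
    if not 0 <= stencil <= 0xFF:
        raise ValueError("stencil must fit in 8 bits")
    if not 0 <= depth24 <= 0xFFFFFF:
        raise ValueError("depth must fit in 24 bits")
    r = (depth24 >> 16) & 0xFF
    g = (depth24 >> 8) & 0xFF
    b = depth24 & 0xFF
    return r, g, b, stencil


def abgr8_to_s8d24(r: int, g: int, b: int, a: int) -> Tuple[int, int]:
    """Inverse of s8d24_to_abgr8 under the same assumed layout."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 0xFF:
            raise ValueError("each channel must fit in 8 bits")
    depth24 = (r << 16) | (g << 8) | b
    return a, depth24
```

Because no bits are dropped in either direction, converting back and forth returns the original values.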

We mentioned last month how Super Mario 64 had special requirements to start running on yuzu. Most games compile their code ahead-of-time (AOT), that is, before being shipped to you. The Operating System’s job is to execute that precompiled binary code, and then you’re playing games.

Super Mario 64, on the other hand, runs just-in-time (JIT), to make it easier to develop the Hovercraft emulator, and to allow reusing the same Hovercraft binary for different games. The Hovercraft emulator loads a native Nintendo 64 ROM of Super Mario 64, and then its JIT compiler takes the ROM and translates the original MIPS (the architecture of the Nintendo 64’s CPU) instructions into AArch64 (the Switch’s CPU architecture) instructions on the fly. Only then will the operating system execute the game code.

Ahead-of-time versus Just-in-time compilation diagram

This is similar to how yuzu translates AArch64 instructions into AMD64 instructions with the assistance of Dynarmic.
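
For readers who have never seen a JIT, here is a toy sketch of the idea in Python. It has nothing to do with MIPS, AArch64 or Dynarmic. The two-instruction "guest" language (`li`, `addi`) and the `ToyJit` class are invented for this example. What it does show is the general pattern: translate a block of guest code the first time it is needed, cache the result, and reuse it afterwards.

```python
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[str, int, int]  # (opcode, register index, immediate value)
HostBlock = Callable[[List[int]], None]


class ToyJit:
    """Translates made-up guest instructions into Python functions on demand."""

    def __init__(self, guest_blocks: Dict[int, List[Instruction]]) -> None:
        self.guest_blocks = guest_blocks      # guest code, keyed by address
        self.cache: Dict[int, HostBlock] = {}  # already translated blocks
        self.translations = 0                 # how many times we had to translate

    def _translate(self, address: int) -> HostBlock:
        # Emit host (Python) source code for every guest instruction.
        lines = ["def host_block(regs):"]
        for opcode, reg, imm in self.guest_blocks[address]:
            if opcode == "li":        # load immediate
                lines.append(f"    regs[{reg}] = {imm}")
            elif opcode == "addi":    # add immediate
                lines.append(f"    regs[{reg}] += {imm}")
            else:
                raise ValueError(f"unknown guest opcode: {opcode}")
        lines.append("    return None")
        namespace: dict = {}
        exec("\n".join(lines), namespace)  # "compile" the generated host code
        self.translations += 1
        return namespace["host_block"]

    def run(self, address: int, regs: List[int]) -> None:
        host_block = self.cache.get(address)
        if host_block is None:
            # First visit: translate just in time, then remember the result.
            host_block = self.cache[address] = self._translate(address)
        host_block(regs)
```

Running the same block twice only translates it once, which is where a JIT recovers most of the cost of translating at runtime.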

The JIT service, which is required to use JIT compilation on retail titles, is a functionality that yuzu didn’t have implemented, simply because no other game had ever needed it. Additionally, there were some obstacles to implementing it in a direct way, since it requires calling custom code supplied by the game, something which was never needed by any previous service implementation. So, some preliminary stubs aside, byte[] implemented the HLE JIT service to allow the Hovercraft emulator to function and Super Mario 64 to boot.

In a separate PR, byte[] adds documentation of how the JIT service interface operates. This should help other open source projects, if needed.

Of course, this wasn’t enough to get Super Mario 64 playable, as there were rendering issues to solve as well.

It’s never that simple… but let’s try to explain it simply. Nintendo Switch games bundle their own individual GPU driver with each game. This is done to increase compatibility: you don’t need to update every console in the world if a driver version has an issue.

For unknown reasons, either the Hovercraft emulator or the bundled GPU driver reports Vertex Buffers that are simply too large, especially when compared to what the game actually uses. Whether it’s an issue in the included emulator or just a driver bug, we can’t know for certain, but we do need to work around this problem.

Erroneous Vertex Buffer size diagram

So, instead of using the insane reported buffer size, byte[] says NO! and uses the backing memory size instead.
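
In spirit, the workaround looks something like the sketch below. This assumes the fix boils down to never trusting a size larger than the memory that is actually mapped behind the buffer. The function and parameter names are hypothetical.

```python
def effective_vertex_buffer_size(reported_size: int, backing_memory_size: int) -> int:
    """Return a buffer size that never exceeds the memory really backing it.

    Both arguments are byte counts. Names are made up for this sketch.
    """
    if reported_size < 0 or backing_memory_size < 0:
        raise ValueError("sizes must be non-negative")
    # If the reported size overshoots, fall back to the backing memory size.
    return min(reported_size, backing_memory_size)
```

A buffer that claims to be gigabytes long but only has a few kilobytes of memory behind it is then treated as a few kilobytes long.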

It's-a him! (Super Mario 64)

Performance on Vulkan is not stellar for now, but you can finally enjoy all 3 of the Super Mario 3D All-Stars games with both APIs.

Lastly, Morph implemented a fix to keep the web applet open in the foreground, as the Super Mario 3D All-Stars games require it or else they would crash a few minutes into gameplay.

General graphical fixes

Following up on last month’s NVFlinger rewrite, bunnei continued to track issues and bug reports. He fixed the reported issues and further cleaned up the code to improve code quality. See the code changes for the NVFlinger rewrite here.

Xenoblade Chronicles 2 and Hyrule Warriors: Age of Calamity would experience interesting issues which were caused by the new GPU Garbage Collector introduced as part of Project Y.F.C. We talked about those changes back in January.

As you can see in the top bar below, Xenoblade Chronicles 2 would use exorbitant amounts of VRAM in OpenGL. The bottom bar shows the result after the fixes were implemented.

Not the best way to test your whole VRAM (Xenoblade Chronicles 2)

Age of Calamity would display interesting graphics at random intervals:

This is why you don't blast Caramelldansen too hard (Hyrule Warriors: Age of Calamity)

Blinkhawk fixed the regressions and both games are back in business.

Oftentimes in emulation, when you fix one issue, another pops up. The cropping fix byte[] implemented for Super Mario 3D All-Stars had the lovely unintended side effect of breaking rendering for homebrew apps in Vulkan. Thankfully, Morph added the magic line to the code that solves this regression.

Skyline framework: Part 3

There has been important progress in getting the Skyline modding framework working. Here are the two links if you missed our previous progress reports on the subject.

tech-ticks has been quite busy working on the finishing touches. The latest changes include:

  • Better LayeredExeFs support, which results in easier mod distribution and self-updating capabilities.
  • Support for the SO_KEEPALIVE socket option, which allows the Skyline TCP logger to operate.
  • Implementation of DNS address resolution, which is required by plugins that use HTTPS requests.

We must mention that while Skyline kernel support is basically finished, bugs in yuzu’s codebase prevent proper operation of the modding framework. For example, due to underlying emulation issues, ARCropolis won’t work until Project Gaia is finished, and some of the changes previously mentioned need some fine-tuning on our part to function properly.

There’s yet more work to do, but we’re a lot closer. We can see the finish line!

UI improvements

Merry, the core developer of Dynarmic, made some changes to the add-ons game properties window, improving column widths.

Low resolution users will like this

The hotkeys configuration window also got some love, changing the minimum column width.

It's also great for GNOME users

Both changes are extremely beneficial for bloated or size-unoptimized desktop environments, like GNOME Shell.

Tachi107 fixed some embarrassing typos in our logging, and updated the About yuzu window to properly mention our new licence, GPLv3.0+. The + is there because we want to leave the door open for newer revisions.

Screenshot of the yuzu About box

Not stopping there, Tachi107 performed some cleanup and made improvements to Flatpak builds, including using the proper app ID, fixing some typos, and adding a launch parameter to make yuzu use the dedicated GPU by default on Linux instead of the integrated GPU.

Docteh has also helped considerably in improving yuzu’s UI.

With a bit of manual tinkering, they managed to bypass some Qt limitations in order to display more readable hyperlinks in our dark themes.

People seem to have forgotten what hyperlinks are for, just click them!

Thanks to a report from GillianMC in our Discord server, Docteh found out that some quirks in the Qt API caused the compatibility status of listed games to not be translated. The cause lies in QObject; you can find the specific details in the pull request’s description. Now the status is properly reported in the corresponding language.

Example in Spanish

Similarly, D-Pad directions also didn’t translate properly. The same suspect, once again. Someone, please send a warrant asking for the detention of Carmen Sandiego.

Example in French

Kernel and CPU emulation changes

Let’s begin with two changes that happened in March.

Our resident bunnei rabbit continued his work on rewriting yuzu’s kernel memory management to make it accurate to the latest system updates. This time, he tackled and revamped how the kernel code memory is mapped and unmapped.

Code memory support, in the context of the Switch, allows games and apps to load and unload smaller parts of their code on the fly. Thanks to these changes, Super Smash Bros. Ultimate no longer causes memory access issues while loading/unloading NROs, making the game stable for long play sessions.

bunnei also migrated slab heaps for the guest (Switch) kernel objects from host heap memory to emulated guest memory. With this change, yuzu’s memory layout now more closely matches the console’s.

A slab represents a contiguous piece of memory. A heap is a general term used for any memory that is allocated dynamically and randomly.

The slab heap is the space used to store guest kernel objects. By moving these away from the host (PC) heap memory (RAM) to emulated guest (Switch) memory, we can ensure that the kernel objects never go beyond the system limits and cause memory leaks on the host (PC).
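
To show why a slab heap puts a hard ceiling on allocations, here is a small Python sketch. The `SlabHeap` class and its methods are invented for illustration and are far simpler than a real kernel's implementation. The point is only that a fixed, contiguous region divided into equal slots cannot hand out more objects than it has slots.

```python
from typing import List, Optional, Set


class SlabHeap:
    """Fixed-capacity allocator: one contiguous region cut into equal slots."""

    def __init__(self, object_size: int, capacity: int) -> None:
        self.object_size = object_size
        self.memory = bytearray(object_size * capacity)  # the contiguous slab
        # Free slot indices, arranged so slot 0 is handed out first.
        self._free: List[int] = list(range(capacity - 1, -1, -1))
        self._used: Set[int] = set()

    def allocate(self) -> Optional[int]:
        """Return the byte offset of a free slot, or None when the slab is full."""
        if not self._free:
            return None  # limit reached: no silent growth on the host
        slot = self._free.pop()
        self._used.add(slot)
        return slot * self.object_size

    def free(self, offset: int) -> None:
        """Give a slot back so a later allocation can reuse it."""
        slot, remainder = divmod(offset, self.object_size)
        if remainder != 0 or slot not in self._used:
            raise ValueError("offset does not refer to an allocated slot")
        self._used.remove(slot)
        self._free.append(slot)
```

Contrast this with allocating each object from the host heap, where nothing stops the count from growing until the PC runs out of RAM.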

Thread local storage (TLS), the mechanism by which each thread in a given multithreaded process allocates storage for thread-specific data, was also rewritten, making it accurate to the latest HorizonOS behaviour.

With these changes, we have now completely fixed the kernel memory object leaks that affected a few games, but went largely unnoticed, due to the previous implementation allowing unlimited allocations.

Back to the list of April changes, bunnei also reimplemented how yuzu handled thread allocation for HLE service interfaces.

Services are system processes running in the background which wait for incoming requests. The Switch’s HorizonOS has various services that perform various tasks, e.g. Audio, Bluetooth, etc.

Previously, we used to allocate one host thread per HLE service interface because:

  • some service routines need to potentially wait a long time for completion, like network or filesystem access, and
  • we don’t support guest thread rescheduling from host threads.

A thread which is blocked will have to wait until the action that blocked it, such as I/O or simply sleeping for some amount of time, completes.

The issue with this approach was that since it’s the host (Windows or Linux) that schedules the service threads, yuzu could create weird behaviour, particularly on systems with hardware limitations, due to spawning one thread per service and the sheer number of service implementations we emulate.

With the rewrite, yuzu now has a single “default service thread” that is used for 99% of the service methods that are non-blocking. For the services that are time-sensitive and for those that need blocking, we still allow thread creation (e.g. Audio, BSD, FileSystem, nvdrv).

This brings down the service thread count from double digits to single digits, thus improving stability and consistency, especially on systems with fewer cores. Users with 4-thread CPUs (either 2 cores + HT/SMT, or 4 cores) should see performance and stability improvements in most games.
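
The shape of that design can be sketched with Python's standard `queue` and `threading` modules. Everything named here (`ServiceThreads`, `register_blocking_service`, the thread names) is hypothetical and much simpler than the real thing. The sketch just shows one shared worker for quick requests and opt-in dedicated workers for services that may block.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[], None]


class ServiceThreads:
    """Toy model: one shared worker plus dedicated workers on request."""

    def __init__(self) -> None:
        self._default_queue: queue.Queue = queue.Queue()
        self._default_thread = self._spawn("DefaultServiceThread", self._default_queue)
        # Only services that may block get their own queue and thread.
        self._dedicated: Dict[str, Tuple[queue.Queue, threading.Thread]] = {}

    @staticmethod
    def _spawn(name: str, requests: queue.Queue) -> threading.Thread:
        def serve() -> None:
            while True:
                handler: Optional[Handler] = requests.get()
                if handler is None:  # sentinel: stop this worker
                    return
                handler()

        thread = threading.Thread(target=serve, name=name, daemon=True)
        thread.start()
        return thread

    def register_blocking_service(self, service: str) -> None:
        requests: queue.Queue = queue.Queue()
        self._dedicated[service] = (requests, self._spawn(service + "Thread", requests))

    def submit(self, service: str, handler: Handler) -> None:
        entry = self._dedicated.get(service)
        target = entry[0] if entry else self._default_queue
        target.put(handler)

    @property
    def thread_count(self) -> int:
        return 1 + len(self._dedicated)

    def shutdown(self) -> None:
        """Let queued requests finish, then stop and join every worker."""
        workers = [(self._default_queue, self._default_thread), *self._dedicated.values()]
        for requests, _ in workers:
            requests.put(None)
        for _, thread in workers:
            thread.join()
```

With dozens of services registered, the host would still only schedule one thread plus however many services opted into a dedicated one.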

Another battle for proper shutdown behaviour is fought and won. yuzu currently does not emulate multi-process capabilities of the HorizonOS kernel, because it is not required to emulate any games. However, the multi-process APIs that are used by games still need to be managed in the way they expect. All HorizonOS services have a port (for both client and server) that is used as a channel of communication between the game process and service process. A session is opened for each communication interface for them both and they are managed by their respective kernel objects. When the game closes the client port, the service closes the server port, and everything is shut down.

The issue with our previous implementation was that yuzu wasn’t properly tracking all the KServerPort and KServerSession objects for each service. And because of this, the services weren’t properly getting closed, which in turn was causing further issues.

This originally worked fine, but regressed when we migrated guest kernel objects to emulated guest memory, as we mentioned previously. bunnei figured out the issue and quickly reimplemented how we track these kernel objects.

By having a single place where we can register/unregister open ports and sessions, we can now keep better track of these kernel objects. And by ensuring that they are closed when we tear down all services and the kernel, we get much better emulation shutdown behaviour.
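
A stripped-down version of that "single place" idea could look like the sketch below. The `KernelObject` and `KernelObjectRegistry` classes are made up for this example and only borrow the KServerPort/KServerSession names as labels.

```python
from typing import List


class KernelObject:
    """Stand-in for a port or session. Only tracks whether it was closed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


class KernelObjectRegistry:
    """Single place that knows about every open port and session."""

    def __init__(self) -> None:
        self._open: List[KernelObject] = []

    def register(self, obj: KernelObject) -> None:
        if obj not in self._open:
            self._open.append(obj)

    def unregister(self, obj: KernelObject) -> None:
        if obj in self._open:
            self._open.remove(obj)

    def open_count(self) -> int:
        return len(self._open)

    def close_all(self) -> None:
        """Called at teardown: close whatever is still registered."""
        for obj in self._open:
            obj.close()
        self._open.clear()
```

Anything a service forgot to close on its own is still caught at teardown, because the registry never lost track of it.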

Input changes and general bugfixes

If the user sets a very high DPI value for their mouse while using mouse panning, the cursor may be able to escape the rendering window. IamSanjid implemented the required fixes, including better centering timings to solve this issue. Thanks!

german77 has several fixes ready for us.

Let’s begin with an interesting one. yuzu’s screenshot capture feature allows an easy way to save moments at the resolution the scaler is currently set at. The hotkey for screenshot capture could be spammed, leading yuzu to a crash if several requests for screenshots were sent. This could be worsened if the rendering resolution was set to a high value. To solve this, yuzu now ignores new requests while a capture is being processed, and prints a warning in the log.
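
The guard itself is a classic pattern and can be sketched in a few lines of Python. The `ScreenshotCapture` class, its method names and the log message are assumptions for this example.

```python
import logging

logger = logging.getLogger("screenshot_sketch")


class ScreenshotCapture:
    """Accepts one capture at a time and drops requests that arrive mid-capture."""

    def __init__(self) -> None:
        self._in_progress = False

    def request(self) -> bool:
        """Return True if a capture was started, False if the request was ignored."""
        if self._in_progress:
            logger.warning("A screenshot is already being processed, ignoring request")
            return False
        self._in_progress = True
        return True

    def finish(self) -> None:
        """Mark the current capture as done so new requests are accepted again."""
        self._in_progress = False
```

Mashing the hotkey now results in one capture and a handful of warnings, instead of a pile of simultaneous high-resolution captures.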

There’s always room for improvements in emulation, as nothing is ever truly complete. This time, german77 focuses on inaccuracies found in our input emulation.

IsSixAxisSensorFusionEnabled is implemented by reverse engineering all Sixaxis functions, and it was verified by comparing with unit test homebrew results done on the Switch. This should potentially improve motion accuracy.

The HID service, in charge of handling input commands, among other things, used to operate by copying its assigned shared memory and reporting back the changes. This led to mismatches or delays in the input process, and could potentially make games read completely incorrect data.

Obviously this isn’t ideal, so german77 gets rid of the memory duplication and uses the ever-magical *pointers to access the shared memory directly instead. This can fix bugs affecting countless games, with the biggest example being the Pokémon: Let’s Go games having a hard time detecting controllers.
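
Python has no raw pointers, but `memoryview` gives a similar "look at the same bytes" behaviour, which is enough to show the difference between the two approaches. The reader classes below are invented for this sketch and simplify the old behaviour down to a one-time copy.

```python
class CopyingReader:
    """Old style (simplified): works on a private copy of the shared memory."""

    def __init__(self, shared: bytearray) -> None:
        self._snapshot = bytes(shared)  # copy taken once, goes stale afterwards

    def read(self, offset: int) -> int:
        return self._snapshot[offset]


class DirectReader:
    """New style: looks straight at the shared memory, no copy involved."""

    def __init__(self, shared: bytearray) -> None:
        self._view = memoryview(shared)  # a view, not a copy

    def read(self, offset: int) -> int:
        return self._view[offset]
```

If something writes to the shared buffer after both readers were created, only the direct reader sees the new value. The copying reader keeps returning whatever was there when it took its snapshot.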

Hotkey presses will now be triggered by using a queue. This has the benefit of not having to wait for the UI to respond, reducing their delay.

Analog sticks got some love, with several important changes in their mapping:

  • The default maximum range is now set to 95%, to ensure that games get to use the whole range. This change, for example, avoids characters walking when the stick is at certain angles in games like Pokémon Legends: Arceus.
  • The minimum range was lowered from 50% to 25%, providing greater precision, particularly for people trying to play racing games with a matching wheel.
  • Auto-center correction is stronger now, avoiding drifting without having to rely on stronger dead-zone values.
  • Individual axis values can be manually deleted now if buttons were mapped manually.
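
To see why a 95% range helps, consider a stick whose hardware never quite reaches 100% of its travel. A minimal sketch follows, assuming the range setting works as a simple divisor applied to the raw axis value. The `apply_range` function and its parameter names are hypothetical.

```python
def apply_range(raw: float, max_range: float = 0.95) -> float:
    """Scale a raw axis value in [-1, 1] so that max_range of travel counts as full tilt.

    Assumption for this sketch: the range setting is a plain divisor,
    followed by a clamp to the valid axis interval.
    """
    if max_range <= 0.0:
        raise ValueError("max_range must be positive")
    scaled = raw / max_range
    return max(-1.0, min(1.0, scaled))
```

A stick that physically tops out at 0.95 now reports a full 1.0 to the game, so the game no longer mistakes a full tilt for something slower.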

Previously, only Player 1 could automatically reconnect a controller by pressing a button. Other players could only do so when using a keyboard. german77’s pull request aims to solve that, allowing any of the remaining 7 players to reconnect their controller. No privileges for those higher in the hierarchy anymore.

This change is under testing at the time of writing, as it could potentially cause regressions. Be sure to use the status hovercard to check back in a few days!

Future projects

Project Y.F.C. is not far away from releasing the first of its two planned parts.

Project Gaia continues to progress slowly but surely, now causing some previously broken games to finally boot for the first time.

Minecraft and Mortal Kombat 11 are now booting!

That’s all folks! We’re still playing catch up with some kernel and CPU optimization changes, so expect a more extensive section next time. Thank you for the company, see you next month!


Please consider supporting us on Patreon!
If you would like to contribute to this project, check out our GitHub!