An endless cycle that was not: the story of the Holy Grail bug

Once upon a time there was a game for GBA called Hello Kitty Collection: Miracle Fashion Maker. It was a cute game based on the famous Sanrio Hello Kitty franchise and developed by Imagineer. But under the guise of a seemingly innocent name was an insidious problem. For some reason, this simple game did not run on any GBA emulator. But this alone would not be enough to call the problem a bug of the Holy Grail. Like all the bugs of the Holy Grail, this bug itself was completely confusing. The explanation was simple: at some point in the sequence of starting the game, it fell into a cycle from which it never exited , waiting for a certain value to be read from a memory that it does not exist . Although there are similar bugs in many games, for example, in the intro popularThe Legend of Zelda: The Minish Cap , they rely on special behavior caused by reading invalid memory addresses. But this cycle seemed to violate such behavior. Nevertheless, the game worked on real equipment. Moreover, the exact same bug occurred when loading a save into the Sonic Pinball Party after a cold reboot. Could the expectation of these invalid memory addresses be somehow erroneous? But if so, how?


But this is illegal, right?


Wait a minute - if you are trying to access invalid memory, then the game just needs to crash, right? An unresolved operation, segfault, or some other error should occur . Right?

Well it is more like Yes. But not really. At least not on the GBA.

In the architecture of the ARM processors that were used in GBA, this erroneous state is called data abort and occurs only when you try to access memory for which the memory manager has not assigned read permission 1 . When data abort occurs, the processor completes what it was doing and goes to the exception vectorassigned to data abort exceptions. Then the operating system can choose one of the solutions: kill the current process, assign a page fault memory , let the process cope with the situation, as some emulators JIT do it with “fastmem”, or perform some other actions.

How does the GBA handle data abort? The exception vector entry for data abort is located in the boot ROM of the GBA console (or, as it is also called, in the BIOS). If the GBA encounters data abort, then it tries to go to the DACS 2 handlerif it exists, otherwise blocking occurs. No commercial game has DACS handlers. So why is this game not freezing? Everything is very simple - GBA never generates data abort. It does not have a memory manager (MMU) (or even a memory protection unit, as in DS), so it just continues to work and reads out invalid memory.

The memory bus enters the scene.



What is invalid memory in general? How does she look? This is the main snag. This is a difficult situation: what the code reads depends heavily on what the CPU recently did, or, more precisely, what the memory bus did recently . In short, when accessing an invalid memory, the CPU reads what was the last on the memory bus. To understand what follows from this, you need to learn a little about the memory bus and how it works.

A memory bus is part of an electronic circuit that connects the CPU to all the platform's memory components. On the GBA, several devices are connected to the memory bus: working RAM, video memory, and the cartridge bus. When the CPU tries to access the memory, it tells the memory bus which address it needs access to, and then the component corresponding to that address is activated. Then the component places the value at this address on the bus, which may take several 3 cycles , and then the CPU can finally read the value from the bus. In the case of the GBA, if no equipment is associated with the address, then no value is written to the bus, and the CPU reads any value placed last on the bus. The situation can vary in different ways, for example, if the read was 16-bit, and the CPU tries to perform 32-bit read, but in general it will always be a value from the bus. Developers call this feature "open bus." Earlier, I wrote how it affects other games .

Well, it seems that everything doesn’t look so bad ... Right?


So you can just cache the last memory access? And then bring it back again? In the general case, this approach will work, but there are certain difficulties. First, you need to make sure that all memory access operations are in the correct order. This is more complicated than it sounds, because the CPU accesses the memory with each instruction to get the next instruction in the pipeline. And in fact, in the general case *, the memory stuck in the bus is the last instruction that was received. This simplifies the process, because you need to get only this last, pre-selected value. But since the last pre-selected value depends only on where we are currently executing from memory, it should always be the same. Even if the received address changes while it is invalid,you will always get the same memory.

Uh ... Stop. But this cycle exists, and it cannot be exited if this value is preselected. So what is going on? If he constantly receives the following instruction, then what happens between these operations? I tried to run such endless loops on test ROMs to check if, for example, the value could go bad. This can definitely happen if the value has not been updated recently, but the value is updated in each instruction, so it does not have time to get corrupted. My tests never left the loop. I did something different than in these games, although I recreated the cycle exactly. What did I do wrong?

Pokémon Emerald and ACE, occurring only on iron


Fast forward in time, in January 2020. The bug report at the Sonic Pinball Party at that time was about three and a half years old. In other emulators, he was known for many years. I have run out of working theories. At the end of this month, a user with the nickname merrpjoined the Discord community of the mGBA emulator and said that Pokémon Emerald has a new arbitrary code execution glitch (ACE) that works only on hardware. Moreover, this glitch will most likely be used by speedrunners, who may want to practice the emulator. Obviously, this bug has become an attractive target for fixing the error, although it would be better if I found out about it before version 0.8.0. I began to research the glitch and confirmed the observation of merrp that it only works on hardware. In all the emulators I tried, the game hung with a black screen. But merrp informed me that it hangs on reading from invalid memory in a loop, and I realized that most likely I could not fix the error in the near future. This is again the same bug.

This time, learning about looping functions gave me an edge. Thanks to the pokeemerald decompilation project, I could easily make targeted changes to the function to try to figure out how she managed to get out of the loop. A simplified version of this loop looks something like this:

uint16_t type = /* ... */;
for (int32_t i = 0; table[type][i] != 0xFFFF; ++i) {
	uint16_t value = table[type][i] & 0xFE00;
	if (value > 0x7E00) {
		break;
	}
	/* ... */
}

The loop performs a fairly simple task. There is a two-dimensional table of values. On each row of this column table, the typeloop first tries to determine if the value is a certain sentinel value. If so, the loop ends. Otherwise, it applies a mask to the value and checks if it is larger than the value being checked. Otherwise, it goes down the cycle. In a particular case of a glitch, the value typegoes beyond the boundaries of the table, which leads to the appearance of an invalid pointer. This means that when you try to accessiTo this element of this nonexistent column, we will always access invalid memory. Although the table offset increases with each iteration of the loop before returning to actual memory, it may need hundreds of millions of repetitions. Therefore, it is obvious that he does not. So how does a program get out of a loop?

To investigate this, I changed the cycle and looked at what would happen if I just instantly broke out of the cycle. Everything turned out to be quite simple: at this moment ACE worked on both the hardware and the emulator, and nothing hung. So instead, I tried to set the screen color to the value that the program reads when it exits the loop and freezes so that the color does not change. I recompiled the code and ran it on a real GBA. After a few seconds of freezing on a black screen, it became a gorgeous blue hue.


VERY BLUE

But the emulator still hung on a black screen. What value will he read if he read the previously received value? Instead, it became a dark turquoise.


Fu.

That is, the program, before it managed to get out of the cycle, most certainly passed it at least once. It also turned out that the time required to escape from the cycle on iron varies. This usually took 2 to 30 seconds. What is going on?

New working theory


Then I noticed the difference between my test ROM and the Pokémon Emerald when it hung. Pokémon played music. Sonic Pinball Party also played music. Hello Kitty didn't play music, but it gave me an idea. What happens if an interrupt occurs between prefetching and data loading? Does the program start prefetching the interrupt vector before accessing the invalid memory? I quickly created a layout for this situation in mGBA, turned on interrupts in the test ROM, and of course it got out of the loop. Then I tried the same test ROM on hardware and ... it didn’t get out of the loop. And so the theory came about. In the end, I realized something. I’m sure that you noticed an asterisk above, so yes, there can be one event between prefetching and accessing memory,but only if, between the prefetch and access to invalid memory, the memory bus sends a request not to the CPU, but to something else.

I said that the memory bus is controlled by the CPU. For the most part, this is true, but there is other important equipment that also has access to the memory bus bypassing the processor. This process is called direct memory access . I talked about DMA in a previous article , so now I will not go into the principles of its work. If you re-read the article, you may notice that I said that the main CPU pauses while DMA is running. This means that while DMA is running, the value on the bus will now be the last access to the DMA memory. This is mainly important if the DMA goes beyond the actual memory to an invalid region; however, it duplicates the last good value.

It has long been known that if you load invalid memory into DMA, you will get the last DMA value, but I implemented it in mGBA for a long time and already forgot about it. When I saw this in the access code for invalid memory when studying the bug, something clicked in my head. What if the DMA value lingers on the bus for one instruction? If the first instruction after DMA finishes loading the invalid memory before it gets the next value, then in theory this should lead to reloading the DMA value. Moreover, playing music in GBA typically uses DMA to transmit audio output. For the correct implementation of this, a tact-accurate emulator is required, which can block the CPU in the middle of the instruction execution, between the start of the instruction and memory access, and the GBA console emulation in the mGBA emulator is not tact-accurate.And this is something to me.recalls . Fortunately, I managed to get around this problem. The solution is imperfect, but I can now compare the expected CPU address for the instruction after DMA with the current CPU address on an invalid load and use a single address instead of the pre-selected value for this DMA value.

The long-awaited decision


I turned on the DMA operations for H-blank in the test ROM and synchronized them with V-blank so that the timings were stable, ran it on hardware, and ... this time it worked! The test ROM constantly exited the loop after the same number of iterations when the DMA value was read from the bus. I was right! For the correct implementation of this in mGBA, several attempts were required, but now the program exits the cycle with the same results as on the hardware. I finally got a shade of blue on mGBA. Hello Kitty has booted. Saving at the Sonic Pinball Party has earned.

I did it.

This was probably the longest time I spent on a single bug. For three years, I invested so much time in debugging it that I lost count, and I am sure that other developers also faced similar situations in their emulators. Without this insight, it could have taken me another year, or even more, but the black screen, on which nothing happened except for playing music, became that domino tile that led to the collapse of the whole problem.

Now that the solution is found, it can be implemented in other GBA emulators, putting an end to this bug. The bug will be fixed in mGBA 0.9.0, which, I hope, will be released this year, and has already been fixed in test builds. You can finally play Hello Kitty Collection: Miracle Fashion Maker. Unless, of course, you wish, it is not for me to judge you.

image

  1. If you try to execute memory that does not have execute permissions, this is called prefetch abort.
  2. DACS (short for Debugging and Communication System) is part of the GBA development kit.
  3. These idle cycles while reading from the bus are sometimes called wait states.

All Articles