Starting A Decomp Project
Matt Hurd / February 2023 (2720 Words, 16 Minutes)
Earlier this month, I finally decided to take the plunge and start working on a decomp of Winx Club for the Game Boy Advance. Not only that, I decided to approach it from a “matching” decompilation angle.
Background
What even is decomp?
Decomp is the process of taking assembly code, and converting it to C (or C++). In the case of matching decomp, the goal is to write code that when compiled produces exactly the same bytecode. There are many factors that can affect what assembly gets produced, the most prominent of which are the compiler is used, as well as the compiler flags. In the case of GBA compilers, we’re mainly looking at GCC (GNU Compiler Collection) vs ADS (Arm Developer Suite). For compiler flags the most important flag is the optimization level; decomp with -O0 (no optimization) is significantly easier than -O2 (most optimizations that aren’t trading off size for speed, at the risk of breaking undefined behavior).
For many programs that are written in C, the specifics of what assembly is produced isn’t too important. With “perfect” C code, the program should function the same despite what compiler or optimization is used. In practice this isn’t always the case. Small mistakes or oversights in code can cause compiler specific behavior to be a requirement for the code to function properly. Let’s look at a more concrete example.
The following C is code is a basic strlen
function used in Winx Club. It takes a C-string as input, and returns the length of it.
1
2
3
4
5
6
7
8
int strlen(char *a1)
{
int i;
for ( i = 0; *a1; ++i )
++a1;
return i;
}
This is what the compiler output looks like for varying compilers.
GCC 13.2 with -O0
GCC 13.2 with -O2
Winx Club ROM
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
strlen:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-4], 0
jmp .L2
.L3:
add QWORD PTR [rbp-24], 1
add DWORD PTR [rbp-4], 1
.L2:
mov rax, QWORD PTR [rbp-24]
movzx eax, BYTE PTR [rax]
test al, al
jne .L3
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
1
2
3
4
5
6
7
8
9
10
11
12
strlen:
cmp BYTE PTR [rdi], 0
je .L3
sub rsp, 8
add rdi, 1
call strlen
add rsp, 8
add eax, 1
ret
.L3:
xor eax, eax
ret
1
2
3
4
5
6
7
8
9
10
11
12
13
14
sub_8000292
ldrb r2, [r0]
movs r1, #0
cmp r2, #0
beq _080002A4
_0800029A
adds r0, #1
ldrb r2, [r0]
adds r1, #1
cmp r2, #0
bne _0800029A
_080002A4
adds r0, r1, #0
bx lr
We can see that there are some stark differences between these outputs. Now for more complex functions, we can expect to see even bigger differences. The main concern with non-matching decomp (in my opinion) is that as you’re progressing, functions you create can leave registers or the stack in different states. If the original code makes assumptions about these states, there can be a drastic butterfly affect causing bugs to pop up later down the line. And even in the case where the state is left the exact same, there’s always the possibility of performance differences in the code causing odd stuff to happen when it comes to vsync loops.
With all of these considerations, I’ve decide to make this project about matching decomp. It’ll cause a lot more of a headache at the reverse engineering phase, but it completely eliminates any post-compilation debugging.
Okay, but why Winx Club
Many years ago, I made a Tool Assisted Speedrun (TAS) of this game:
The backstory behind it is pretty great. Three great glitch hunters (AKheon, Rocmox, and Kazooie) found that this is an incredibly glitchy game. Some highlights:
- Pausing while climbing a ledge or while jumping over an obstacle completely disables collision
- Pausing on the same frame that you transition between maps causes only the map to change, but you and all objects stay at the same coordinates.
- With objects loaded on the wrong map, they are able to execute code from the new map.
- Partially uninitialized objects are stored at the top left corner of the map (0,0)
- Pausing and unpausing the game several time causes the game to crash (tile memory overflow)
My personal favorite and most perplexing glitch is caused by interacting with a boss early. By going out of bounds, you’re able to skip past some cutscene triggers that initialize bosses, and get to them while they’re not full set up. Any form of interaction (hitting them, touching them, getting hit by them) causes weird effects. Sometimes the game crashes, sometimes it hangs, and sometimes we get some very unique VRAM corruption. This level of inconsistency makes me very hopeful that we can manage to hit some Arbitrary Code Execution (ACE) in this game. For those unfamiliar with ACE, or specifically ACE in games, here’s a great explanation video of Ocarina of Time’s ACE by GlitchesAndStuff:
Why decomp?
Much more experienced reverse engineers might be able to figure out exactly what’s going on without needing to go for full decomp, but I only have hobbyist experience. I’ve dug into the situation extensively with IDA and a debugger, but I’ve reached some limits of what I’m able to identify just by looking at assembly. And honestly, it seems like a really cool project to work on.
The beginning
I started my journey by looking into existing Game Boy Advance game decomps. The most mature and influential ones are all of the Pokemon Reverse Engineer Team’s (pret). Their discord is the place to go for any sort of decomp. Zelda Decompliation discord is a close second for decomp in general, but considering The Minish Cap is the only Zelda GBA game, there’s less there.
Searching through the pret discord, there’s some information on how to start a decomp project. I’ll try to go through my initial process here.
Information gathering
The very first thing we need to know is which compiler we’re using. The vast majority of GBA games were built using GNU, which has a very robust and commonly-used toolchain. The alternative is ADS, which is not commonly used for GBA and has very little tooling. Unfortunately, it appears that Winx Club is in the minority and is built using ADSv1.2. Normmatt from pret has a gist here that has a list of games that are likely compiled with ARM ADS 1.2. Great.
ADS1.2 is old, it was released in 2001. I can’t actually find any evidence of it ever being available for Linux. I was lucky to be able to acquire a copy of it for Windows, but that means our entire toolchain is going to be Windows-based. This is unfortunate, as the pret GBA decomps primarily are made with Linux in mind.
Creating the foundation
There are multiple ways to start, but I decided to start with having some sanity check in place to know if everything is working. There will be many steps ahead that require massive refactors, and our main guiding light will be whether or not our built ROM is actually matching. For this, I’m actually just going to copy and modify pret’s Fire Red buildsystem. The main thing is the Makefile (which will need some heavy modifications for Windows). We’ll need to compare the sha1sum of our built rom with the baserom.gba
. As long as we’re not making any changes to the game, these should always be matching.
Let’s briefly take a look at what our expected build process is going to be:
- Compilation
- Take .c input files, and compile them into .s assembly files
- Assembly
- Take .s assembly files, and assemble them into .o object files.
- Linking
- Take .o object files and order them into a .elf
- Finishing up
- Take the .elf and convert it to .gba
Let’s start working backwards. The actual conversion process from .elf
to .gba
isn’t too important here, we can just steal borrow pret’s process. Using the tools
from FireRed, as well as this Makefile target:
1
2
3
4
$(ROM): %.gba: %.elf
$(OBJCOPY) -bin -output build/objcopy $<
cp build/objcopy/.text $@
$(GBAFIX) $@ -p -t"$(TITLE)" -c$(GAME_CODE) -m$(MAKER_CODE) -r$(GAME_REVISION) --silent
We can move from .elf
to .gba
. Sorted.
Linking
This is the biggest shift from GCC to ADS. Those familiar with GCC will know of the ld_script.ld
file. For ADS, we’re looking at a scatter_script.txt
. I couldn’t find really any reasonable resources for it outside of simply reading the manual. Hope you like reading dev tool PDFs from 2001, because we’re going to be in there a lot. To save some pain, here’s a very basic scatter_script.txt
that will get us started:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
LOAD_ROM_2 0x02000000
{
EWRAM 0x02000000 0x40000
{
ewram.o (+RW)
}
}
LOAD_ROM_3 0x03000000
{
IWRAM 0x03000000 0x8000
{
iwram.o (+RW)
}
}
LOAD_ROM_1 0x08000000
{
.text 0x08000000
{
crt0.o (+RO, +FIRST)
}
.text2 0x08000210
{
data.o
}
}
This will give us a very basic pipeline. We’ll need to assemble objects for both ewram and iwram, as well as supply the crt0 assembly. For now, we’re going to just leave the rest of the data (and code) as one object file. Remember we’re just setting up our foundations for now. If you want to read on what ewram and iwram are, I highly recommend gbatek. This documentation is an absolute lifesaver.
Assembly
For now, let’s just set up some basic iwram.s and ewram.s files according to the spec. We’ll place them in data/
:
1
2
3
4
5
6
AREA data, DATA
GLOBAL ewram
ewram
SPACE 0x40000
END
1
2
3
4
5
6
AREA data, DATA
GLOBAL iwram
iwram
SPACE 0x8000
END
Now we just need to extract out crt0.s
, and create a data.s
. To keep this set up short, we’ll want to avoid any actual dissassembly. Let’s just manually set all of the bytes in these files to test our build pipeline. When I set up my project, I ended up going for disassembly first, and it was a massive headache. I’ll leave it as an exercise for the reader to write a script to extract the ROM in this way. In practice, the output .s
files will look something like this:
1
2
3
4
_08000000
DCB 0x3E, 0x00, 0x00, 0xEA, 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A
DCB 0x84, 0xE4, 0x09, 0xAD, 0x11, 0x24, 0x8B, 0x98, 0xC0, 0x81, 0x7F, 0x21, 0xA3, 0x52, 0xBE, 0x19
...
Compliation
Compilation will be a topic for another day, as this is already a pretty lengthy post.
Tying it all together
We can use a “short” Makefile
to verify that our pipeline is working:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# Default variables
GAME_REVISION ?= 0
GAME_REGION ?= USA
DEBUG ?= 0
COMPARE ?= 1
# For gbafix
MAKER_CODE := A4
# Version
BUILD_NAME := winxclub
TITLE := WINXCLUB
GAME_CODE := BWIE
ASM_SUBDIR = src
DATA_ASM_SUBDIR = data
AS := tools/ADSv1_2/bin/armasm.exe
LD := tools/ADSv1_2/bin/armlink.exe
OBJCOPY := tools/ADSv1_2/Bin/fromelf.exe
GBAFIX := tools/gbafix/gbafix.exe
PREPROC := tools/preproc/preproc.exe
ROM := $(BUILD_NAME).gba
ELF := $(ROM:%.gba=%.elf)
ASFLAGS := -CPU arm7tdmi -LIttleend -apcs "/interwork" -I asminclude -I include
LDFLAGS = -noremove
LDSCRIPT := scatter_script.txt
ASM_SRCS := $(wildcard $(ASM_SUBDIR)/*.s $(ASM_SUBDIR)/*/*.s $(ASM_SUBDIR)/*/*/*.s)
ASM_OBJS := $(patsubst $(ASM_SUBDIR)/%.s,$(ASM_BUILDDIR)/%.o,$(ASM_SRCS))
DATA_ASM_SRCS := $(wildcard $(DATA_ASM_SUBDIR)/*.s)
DATA_ASM_OBJS := $(patsubst $(DATA_ASM_SUBDIR)/%.s,$(DATA_ASM_BUILDDIR)/%.o,$(DATA_ASM_SRCS))
ASM_BUILDDIR = $(OBJ_DIR)/$(ASM_SUBDIR)
DATA_ASM_BUILDDIR = $(OBJ_DIR)/$(DATA_ASM_SUBDIR)
OBJS := $(ASM_OBJS) $(DATA_ASM_OBJS)
.PHONY: all
MAKEFLAGS += --no-print-directory
# Secondary expansion is required for dependency variables in object rules.
.SECONDEXPANSION:
# Clear the default suffixes
.SUFFIXES:
# Don't delete intermediate files
.SECONDARY:
# Delete files that weren't built properly
.DELETE_ON_ERROR:
all: $(ROM)
coreutils sha1sum -c $(BUILD_NAME).sha1
$(DATA_ASM_BUILDDIR)/%.o: $(DATA_ASM_SUBDIR)/%.s $$(data_dep)
$(PREPROC) $< charmap.txt > $(DATA_ASM_BUILDDIR)/$*.p.i
$(AS) $(ASFLAGS) -o $@ $<
$(ASM_BUILDDIR)/%.o: $(ASM_SUBDIR)/%.s $$(asm_dep)
$(AS) $(ASFLAGS) -o $@ $<
$(ELF): compile-partial-c $(OBJS) scatter_script.txt
$(LD) $(LDFLAGS) -scatter $(LDSCRIPT) -Output $@ $(OBJS) tools/agbcc/lib/libgcc.a tools/agbcc/lib/libc.a
$(ROM): %.gba: %.elf
$(OBJCOPY) -bin -output build/objcopy $<
cp build/objcopy/.text $@
$(GBAFIX) $@ -p -t"$(TITLE)" -c$(GAME_CODE) -m$(MAKER_CODE) -r$(GAME_REVISION) --silent
If everything was done properly, we should get an OK
saying that our shas are matching!
Next steps
Now we just need to do the actual disassembly! And then follow it up with decompilation! There’s a lot of work ahead of us.