home

Starting A Decomp Project

decomp adsv1.2

Earlier this month, I finally decided to take the plunge and start working on a decomp of Winx Club for the Game Boy Advance. Not only that, I decided to approach it from a “matching” decompilation angle.

Background

What even is decomp?

Decomp is the process of taking assembly code, and converting it to C (or C++). In the case of matching decomp, the goal is to write code that when compiled produces exactly the same bytecode. There are many factors that can affect what assembly gets produced, the most prominent of which are the compiler is used, as well as the compiler flags. In the case of GBA compilers, we’re mainly looking at GCC (GNU Compiler Collection) vs ADS (Arm Developer Suite). For compiler flags the most important flag is the optimization level; decomp with -O0 (no optimization) is significantly easier than -O2 (most optimizations that aren’t trading off size for speed, at the risk of breaking undefined behavior).

For many programs that are written in C, the specifics of what assembly is produced isn’t too important. With “perfect” C code, the program should function the same despite what compiler or optimization is used. In practice this isn’t always the case. Small mistakes or oversights in code can cause compiler specific behavior to be a requirement for the code to function properly. Let’s look at a more concrete example. The following C is code is a basic strlen function used in Winx Club. It takes a C-string as input, and returns the length of it.

1
2
3
4
5
6
7
8
int strlen(char *a1)
{
  int i;

  for ( i = 0; *a1; ++i )
    ++a1;
  return i;
}

This is what the compiler output looks like for varying compilers.

GCC 13.2 with -O0

GCC 13.2 with -O2

Winx Club ROM

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
strlen:
    push    rbp  
    mov     rbp, rsp  
    mov     QWORD PTR [rbp-24], rdi  
    mov     DWORD PTR [rbp-4], 0  
    jmp     .L2  
.L3:  
    add     QWORD PTR [rbp-24], 1  
    add     DWORD PTR [rbp-4], 1  
.L2:  
    mov     rax, QWORD PTR [rbp-24]  
    movzx   eax, BYTE PTR [rax]  
    test    al, al  
    jne     .L3  
    mov     eax, DWORD PTR [rbp-4]  
    pop     rbp  
    ret  
1
2
3
4
5
6
7
8
9
10
11
12
strlen:  
    cmp     BYTE PTR [rdi], 0  
    je      .L3  
    sub     rsp, 8  
    add     rdi, 1  
    call    strlen  
    add     rsp, 8  
    add     eax, 1  
    ret  
.L3:  
    xor     eax, eax  
    ret
1
2
3
4
5
6
7
8
9
10
11
12
13
14
sub_8000292  
    ldrb r2, [r0]  
    movs r1, #0  
    cmp r2, #0  
    beq _080002A4  
_0800029A  
    adds r0, #1  
    ldrb r2, [r0]  
    adds r1, #1  
    cmp r2, #0  
    bne _0800029A  
_080002A4  
    adds r0, r1, #0  
    bx lr

We can see that there are some stark differences between these outputs. Now for more complex functions, we can expect to see even bigger differences. The main concern with non-matching decomp (in my opinion) is that as you’re progressing, functions you create can leave registers or the stack in different states. If the original code makes assumptions about these states, there can be a drastic butterfly affect causing bugs to pop up later down the line. And even in the case where the state is left the exact same, there’s always the possibility of performance differences in the code causing odd stuff to happen when it comes to vsync loops.

With all of these considerations, I’ve decide to make this project about matching decomp. It’ll cause a lot more of a headache at the reverse engineering phase, but it completely eliminates any post-compilation debugging.

Okay, but why Winx Club

Many years ago, I made a Tool Assisted Speedrun (TAS) of this game:

The backstory behind it is pretty great. Three great glitch hunters (AKheon, Rocmox, and Kazooie) found that this is an incredibly glitchy game. Some highlights:

My personal favorite and most perplexing glitch is caused by interacting with a boss early. By going out of bounds, you’re able to skip past some cutscene triggers that initialize bosses, and get to them while they’re not full set up. Any form of interaction (hitting them, touching them, getting hit by them) causes weird effects. Sometimes the game crashes, sometimes it hangs, and sometimes we get some very unique VRAM corruption. This level of inconsistency makes me very hopeful that we can manage to hit some Arbitrary Code Execution (ACE) in this game. For those unfamiliar with ACE, or specifically ACE in games, here’s a great explanation video of Ocarina of Time’s ACE by GlitchesAndStuff:

Why decomp?

Much more experienced reverse engineers might be able to figure out exactly what’s going on without needing to go for full decomp, but I only have hobbyist experience. I’ve dug into the situation extensively with IDA and a debugger, but I’ve reached some limits of what I’m able to identify just by looking at assembly. And honestly, it seems like a really cool project to work on.

The beginning

I started my journey by looking into existing Game Boy Advance game decomps. The most mature and influential ones are all of the Pokemon Reverse Engineer Team’s (pret). Their discord is the place to go for any sort of decomp. Zelda Decompliation discord is a close second for decomp in general, but considering The Minish Cap is the only Zelda GBA game, there’s less there.

Searching through the pret discord, there’s some information on how to start a decomp project. I’ll try to go through my initial process here.

Information gathering

The very first thing we need to know is which compiler we’re using. The vast majority of GBA games were built using GNU, which has a very robust and commonly-used toolchain. The alternative is ADS, which is not commonly used for GBA and has very little tooling. Unfortunately, it appears that Winx Club is in the minority and is built using ADSv1.2. Normmatt from pret has a gist here that has a list of games that are likely compiled with ARM ADS 1.2. Great.

ADS1.2 is old, it was released in 2001. I can’t actually find any evidence of it ever being available for Linux. I was lucky to be able to acquire a copy of it for Windows, but that means our entire toolchain is going to be Windows-based. This is unfortunate, as the pret GBA decomps primarily are made with Linux in mind.

Creating the foundation

There are multiple ways to start, but I decided to start with having some sanity check in place to know if everything is working. There will be many steps ahead that require massive refactors, and our main guiding light will be whether or not our built ROM is actually matching. For this, I’m actually just going to copy and modify pret’s Fire Red buildsystem. The main thing is the Makefile (which will need some heavy modifications for Windows). We’ll need to compare the sha1sum of our built rom with the baserom.gba. As long as we’re not making any changes to the game, these should always be matching.

Let’s briefly take a look at what our expected build process is going to be:

  1. Compilation
    • Take .c input files, and compile them into .s assembly files
  2. Assembly
    • Take .s assembly files, and assemble them into .o object files.
  3. Linking
    • Take .o object files and order them into a .elf
  4. Finishing up
    • Take the .elf and convert it to .gba

Let’s start working backwards. The actual conversion process from .elf to .gba isn’t too important here, we can just steal borrow pret’s process. Using the tools from FireRed, as well as this Makefile target:

1
2
3
4
$(ROM): %.gba: %.elf
	$(OBJCOPY) -bin -output build/objcopy $<
	cp build/objcopy/.text $@
	$(GBAFIX) $@ -p -t"$(TITLE)" -c$(GAME_CODE) -m$(MAKER_CODE) -r$(GAME_REVISION) --silent

We can move from .elf to .gba. Sorted.

Linking

This is the biggest shift from GCC to ADS. Those familiar with GCC will know of the ld_script.ld file. For ADS, we’re looking at a scatter_script.txt. I couldn’t find really any reasonable resources for it outside of simply reading the manual. Hope you like reading dev tool PDFs from 2001, because we’re going to be in there a lot. To save some pain, here’s a very basic scatter_script.txt that will get us started:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
LOAD_ROM_2 0x02000000
{
    EWRAM 0x02000000 0x40000
    {
        ewram.o (+RW)
    }
}
LOAD_ROM_3 0x03000000
{
    IWRAM 0x03000000 0x8000
    {
        iwram.o (+RW)
    }
}

LOAD_ROM_1 0x08000000
{
    .text 0x08000000
    {
        crt0.o (+RO, +FIRST)
    }

    .text2 0x08000210
    {
        data.o
    }
}

This will give us a very basic pipeline. We’ll need to assemble objects for both ewram and iwram, as well as supply the crt0 assembly. For now, we’re going to just leave the rest of the data (and code) as one object file. Remember we’re just setting up our foundations for now. If you want to read on what ewram and iwram are, I highly recommend gbatek. This documentation is an absolute lifesaver.

Assembly

For now, let’s just set up some basic iwram.s and ewram.s files according to the spec. We’ll place them in data/:

1
2
3
4
5
6
	AREA data, DATA

	GLOBAL ewram
ewram
	SPACE 0x40000
	END
1
2
3
4
5
6
	AREA data, DATA

	GLOBAL iwram
iwram
	SPACE 0x8000
	END

Now we just need to extract out crt0.s, and create a data.s. To keep this set up short, we’ll want to avoid any actual dissassembly. Let’s just manually set all of the bytes in these files to test our build pipeline. When I set up my project, I ended up going for disassembly first, and it was a massive headache. I’ll leave it as an exercise for the reader to write a script to extract the ROM in this way. In practice, the output .s files will look something like this:

1
2
3
4
_08000000
	DCB 0x3E, 0x00, 0x00, 0xEA, 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A
	DCB 0x84, 0xE4, 0x09, 0xAD, 0x11, 0x24, 0x8B, 0x98, 0xC0, 0x81, 0x7F, 0x21, 0xA3, 0x52, 0xBE, 0x19
...

Compliation

Compilation will be a topic for another day, as this is already a pretty lengthy post.

Tying it all together

We can use a “short” Makefile to verify that our pipeline is working:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# Default variables
GAME_REVISION ?= 0
GAME_REGION   ?= USA
DEBUG         ?= 0
COMPARE       ?= 1

# For gbafix
MAKER_CODE := A4

# Version
BUILD_NAME := winxclub
TITLE      := WINXCLUB
GAME_CODE  := BWIE

ASM_SUBDIR = src
DATA_ASM_SUBDIR = data

AS	     := tools/ADSv1_2/bin/armasm.exe
LD	     := tools/ADSv1_2/bin/armlink.exe
OBJCOPY  := tools/ADSv1_2/Bin/fromelf.exe
GBAFIX   := tools/gbafix/gbafix.exe
PREPROC  := tools/preproc/preproc.exe

ROM 	 := $(BUILD_NAME).gba
ELF	     := $(ROM:%.gba=%.elf)
ASFLAGS  := -CPU arm7tdmi -LIttleend -apcs "/interwork" -I asminclude -I include
LDFLAGS   = -noremove
LDSCRIPT := scatter_script.txt

ASM_SRCS := $(wildcard $(ASM_SUBDIR)/*.s $(ASM_SUBDIR)/*/*.s $(ASM_SUBDIR)/*/*/*.s)
ASM_OBJS := $(patsubst $(ASM_SUBDIR)/%.s,$(ASM_BUILDDIR)/%.o,$(ASM_SRCS))

DATA_ASM_SRCS := $(wildcard $(DATA_ASM_SUBDIR)/*.s)
DATA_ASM_OBJS := $(patsubst $(DATA_ASM_SUBDIR)/%.s,$(DATA_ASM_BUILDDIR)/%.o,$(DATA_ASM_SRCS))

ASM_BUILDDIR = $(OBJ_DIR)/$(ASM_SUBDIR)
DATA_ASM_BUILDDIR = $(OBJ_DIR)/$(DATA_ASM_SUBDIR)

OBJS := $(ASM_OBJS) $(DATA_ASM_OBJS)


.PHONY: all

MAKEFLAGS += --no-print-directory
# Secondary expansion is required for dependency variables in object rules.
.SECONDEXPANSION:
# Clear the default suffixes
.SUFFIXES:
# Don't delete intermediate files
.SECONDARY:
# Delete files that weren't built properly
.DELETE_ON_ERROR:

all: $(ROM)
	coreutils sha1sum -c $(BUILD_NAME).sha1

$(DATA_ASM_BUILDDIR)/%.o: $(DATA_ASM_SUBDIR)/%.s $$(data_dep)
	$(PREPROC) $< charmap.txt > $(DATA_ASM_BUILDDIR)/$*.p.i
	$(AS) $(ASFLAGS) -o $@ $<

$(ASM_BUILDDIR)/%.o: $(ASM_SUBDIR)/%.s $$(asm_dep)
	$(AS) $(ASFLAGS) -o $@ $<

$(ELF): compile-partial-c $(OBJS) scatter_script.txt
	$(LD) $(LDFLAGS) -scatter $(LDSCRIPT) -Output $@ $(OBJS) tools/agbcc/lib/libgcc.a tools/agbcc/lib/libc.a

$(ROM): %.gba: %.elf
	$(OBJCOPY) -bin -output build/objcopy $<
	cp build/objcopy/.text $@
	$(GBAFIX) $@ -p -t"$(TITLE)" -c$(GAME_CODE) -m$(MAKER_CODE) -r$(GAME_REVISION) --silent

If everything was done properly, we should get an OK saying that our shas are matching!

Next steps

Now we just need to do the actual disassembly! And then follow it up with decompilation! There’s a lot of work ahead of us.

© 2024 Matt Hurd   •  Theme  Moonwalk