Disassembling Winx

In our last post, we explored starting our GBA decomp project and set up a basic repo for an ADS v1.2 game. Now it’s time to get some real code in here.

Introducing gbadisasm

At the time of writing, I know of three versions of the gbadisasm repo. The original is by camthesaxman. It was then forked by PikalaxALT (that fork fell behind), and PikalaxALT’s fork was later forked by jiangzhengwenjz. I’ll be trying out both jiang’s and cam’s versions to find which works best. Jiang’s is definitely the most recently updated, but since the two have diverged, it’s important to check both. Keep in mind that both were written for the GNU toolchain (GCC and GNU as), so their output won’t assemble under ADS as-is.

In hindsight, I absolutely should’ve forked this repo myself and made changes. At some point I might, but for this project I ended up just writing a series of Python scripts that modified the output to work for me.

Diving into Jiang’s gbadisasm

Jiang’s gbadisasm fork has the best documentation (README), so let’s move forward with it for now. Running it without any config file on our baserom.gba, we get:

        arm_func_start sub_08000000
sub_08000000: @ 0x08000000
        b _08000100
_08000004:
        .byte 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A
        .byte 0x84, 0xE4, 0x09, 0xAD, 0x11, 0x24, 0x8B, 0x98, 0xC0, 0x81, 0x7F, 0x21, 0xA3, 0x52, 0xBE, 0x19
        .byte 0x93, 0x09, 0xCE, 0x20, 0x10, 0x46, 0x4A, 0x4A, 0xF8, 0x27, 0x31, 0xEC, 0x58, 0xC7, 0xE8, 0x33
        .byte 0x82, 0xE3, 0xCE, 0xBF, 0x85, 0xF4, 0xDF, 0x94, 0xCE, 0x4B, 0x09, 0xC1, 0x94, 0x56, 0x8A, 0xC0
        .byte 0x13, 0x72, 0xA7, 0xFC, 0x9F, 0x84, 0x4D, 0x73, 0xA3, 0xCA, 0x9A, 0x61, 0x58, 0x97, 0xA3, 0x27
        .byte 0xFC, 0x03, 0x98, 0x76, 0x23, 0x1D, 0xC7, 0x61, 0x03, 0x04, 0xAE, 0x56, 0xBF, 0x38, 0x84, 0x00
        .byte 0x40, 0xA7, 0x0E, 0xFD, 0xFF, 0x52, 0xFE, 0x03, 0x6F, 0x95, 0x30, 0xF1, 0x97, 0xFB, 0xC0, 0x85
        .byte 0x60, 0xD6, 0x80, 0x25, 0xA9, 0x63, 0xBE, 0x03, 0x01, 0x4E, 0x38, 0xE2, 0xF9, 0xA2, 0x34, 0xFF
        .byte 0xBB, 0x3E, 0x03, 0x44, 0x78, 0x00, 0x90, 0xCB, 0x88, 0x11, 0x3A, 0x94, 0x65, 0xC0, 0x7C, 0x63
        .byte 0x87, 0xF0, 0x3C, 0xAF, 0xD6, 0x25, 0xE4, 0x8B, 0x38, 0x0A, 0xAC, 0x72, 0x21, 0xD4, 0xF8, 0x07
        .byte 0x57, 0x49, 0x4E, 0x58, 0x43, 0x4C, 0x55, 0x42, 0x00, 0x00, 0x00, 0x00, 0x42, 0x57, 0x49, 0x45
        .byte 0x41, 0x34, 0x96, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x49, 0x00, 0x00
        .byte 0x0E, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x06, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
_08000100:
        mov r0, #0x12
        msr cpsr_fc, r0
        ldr sp, _0800012C @ =0x03007FA0
        mov r0, #0x1f
        msr cpsr_fc, r0
        ldr sp, _08000128 @ =0x03007F00
        ldr r1, _08000130 @ =0x08000134
        mov lr, pc
        bx r1
        b _08000100
        .align 2, 0
_08000128: .4byte 0x03007F00
_0800012C: .4byte 0x03007FA0
_08000130: .4byte 0x08000134

That’s not a lot of code. In fact, it’s just the crt0: the startup code that sets up the stack pointers and jumps to the rest of the program. Also of note: these .<directive> lines (.byte, .align, .4byte) are GNU assembler syntax and don’t work with ADS. We’re going to have a lot of cleanup, but that’s what scripting is for. Now we need to find the rest of the code. It’s time to break out IDA.

Introducing IDA

IDA has been my go-to for years. I started using it on NexusForever, the WildStar private server project, because the other devs were using it, and it stuck. You could use Ghidra instead, but most of the GBA tooling I’ve found is built for IDA. For those unfamiliar with it, IDA is a disassembler used to explore programs and see what secrets are hiding in them.

Before we even open up IDA, let’s install some basic tooling that has been built to assist with GBA games. Laqieer’s ida_gba_stuff repo has what we need. GBA_Loader is going to be the most useful, but let’s copy everything over. The only thing to be aware of is that cfg/idagui.cfg will overwrite some IDA defaults and isn’t updated past IDA 7.0, so you likely won’t want to use it. Consider copying over the file extensions and adding GBA_ROM to the DEFAULT_FILE_FILTER block of your own cfg though.

With that set up, we’re ready to open up IDA. When loading a .gba file into IDA, it should now default to ARM7TDMI using GBA_Loader.py. Looking at IDA’s console, we see:

Detected file format: Game Boy Advance ROM: ARM7TDMI
  0. Creating a new segment  (00000000-00004000) ... ... OK
  1. Creating a new segment  (02000000-02040000) ... ... OK
  2. Creating a new segment  (03000000-03008000) ... ... OK
  3. Creating a new segment  (04000000-04000400) ... ... OK
  4. Creating a new segment  (05000000-05000400) ... ... OK
  5. Creating a new segment  (06000000-06018000) ... ... OK
  6. Creating a new segment  (07000000-07000400) ... ... OK
  7. Creating a new segment  (08000000-0A000000) ... ... OK

This is great. If you’ve been reading up on GBATek, you’ll recognize these address ranges: BIOS, EWRAM, IWRAM, I/O registers, palette RAM, VRAM, OAM, and ROM. Once IDA has finished its first auto-analysis pass, we end up with 1146 functions found. laqieer’s repo includes an incredibly helpful script that converts your IDA database (idb) into a .cfg file for gbadisasm. Let’s see how that works out.

  1. First, run the script through IDA: File -> Script File... -> select <ida_dir>/idc/export_gbadisasm_config.idc, then choose an output file.
  2. Re-run gbadisasm with the exported config: ./gbadisasm/gbadisasm baserom.gba -c output.cfg

And our output:

gbadisasm: disasm.c:604: analyze: Assertion `tmp_cnt == 1' failed.
Aborted

Oh. That’s not great. To debug this, I made some slight modifications to gbadisasm so it reports which function the error occurred in. In this case, IDA had marked a lot of data as code, so gbadisasm failed trying to disassemble it. In IDA, the best way to handle this is to create some new segments. Our last valid function appears to be sub_805582C, so we’ll change the main code segment to end after it. After creating a new “DATA” segment from 08055ABC to 0A000000, we can delete all of the functions within it.
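The fix above happens in IDA. As a quick sanity check before re-running gbadisasm, you can scan the exported .cfg for entries that still fall inside the new data range. This is a minimal sketch that assumes each function line has the form `<mode>_func 0x<address> <name>`, which matches the commented-out line in the hole-finding script later in this post. `parse_cfg`, `entries_in_range`, and the `sub_8060000` entry are hypothetical.

```python
from typing import List, Tuple

# Range of the new DATA segment created in IDA
DATA_START = 0x08055ABC
DATA_END = 0x0A000000

def parse_cfg(text: str) -> List[Tuple[str, int, str]]:
    """Parse '<mode>_func 0x<address> <name>' lines into (kind, address, name).

    Assumes that line format; any other line is ignored.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].endswith("_func"):
            name = parts[2] if len(parts) > 2 else ""
            entries.append((parts[0], int(parts[1], 16), name))
    return entries

def entries_in_range(entries, start: int = DATA_START, end: int = DATA_END):
    """Return the entries whose address falls in [start, end)."""
    return [entry for entry in entries if start <= entry[1] < end]

# Hypothetical excerpt: sub_8060000 stands in for a function IDA created inside the data
sample_cfg = """\
arm_func 0x8000000 start
thumb_func 0x805582C sub_805582C
thumb_func 0x8060000 sub_8060000
"""

for kind, address, name in entries_in_range(parse_cfg(sample_cfg)):
    print(f"still inside DATA: {kind} 0x{address:08X} {name}")
```

An empty result means the exported cfg no longer asks gbadisasm to decode anything in the data segment.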

Trying the export script and gbadisasm again, our output looks much better:

$ ./gbadisasm/gbadisasm baserom.gba -c test.cfg | wc -l
89003

Let’s save it to a file and check it out:

./gbadisasm/gbadisasm baserom.gba -c test.cfg > test.s

Analysing our disassembly

Scrolling through our disassembly, we’ve extracted a lot of code. We’ve also got some small blocks of data, a few medium ones, and some large ones. Let’s dive into some concrete examples. The first small block I see is here:

	thumb_func_start sub_8000210
sub_8000210: @ 0x08000210
	b _08000216
_08000212:
	adds r0, #1
	adds r1, #1
_08000216:
	ldrb r3, [r0]
	cmp r3, #0
	beq _08000228
	ldrb r2, [r1]
	cmp r2, #0
	beq _08000232
	cmp r3, r2
	beq _08000212
	b _08000232
_08000228:
	ldrb r0, [r1]
	cmp r0, #0
	bne _08000232
	movs r0, #1
	bx lr
_08000232:
	movs r0, #0
	bx lr
_08000236:
	.byte 0x30, 0xB4, 0x19, 0xE0, 0x00, 0x23, 0xC4, 0x56, 0x23, 0x1C
	.byte 0x61, 0x3B, 0x19, 0x2B, 0x00, 0xD8, 0x20, 0x3C, 0x25, 0x06, 0x00, 0x23, 0xCC, 0x56, 0x2D, 0x16
	.byte 0x23, 0x1C, 0x61, 0x3B, 0x19, 0x2B, 0x00, 0xD8, 0x20, 0x3C, 0x23, 0x06, 0x1B, 0x16, 0x9D, 0x42
	.byte 0x02, 0xD0, 0x58, 0x1B, 0x30, 0xBC, 0x70, 0x47, 0x01, 0x30, 0x01, 0x31, 0x01, 0x3A, 0x03, 0x78
	.byte 0x00, 0x2B, 0x05, 0xD0, 0x0B, 0x78, 0x00, 0x2B, 0x02, 0xD0, 0x00, 0x2A, 0xDD, 0xD1, 0x01, 0xE0
	.byte 0x00, 0x2A, 0x01, 0xD1, 0x00, 0x20, 0xED, 0xE7, 0x00, 0x23, 0xC9, 0x56, 0xC0, 0x56, 0x08, 0x1A
	.byte 0xE8, 0xE7

	non_word_aligned_thumb_func_start sub_8000292
sub_8000292: @ 0x08000292
	ldrb r2, [r0]
	movs r1, #0
	cmp r2, #0
	beq _080002A4
_0800029A:
	adds r0, #1
	ldrb r2, [r0]
	adds r1, #1
	cmp r2, #0
	bne _0800029A
_080002A4:
	adds r0, r1, #0
	bx lr

Jumping to it in IDA, we can see that this looks like some real code that’s being missed:

[Screenshot: IDA]
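One quick way to double-check a block like this is to decode a couple of halfwords by hand. The sketch below is a rough, hypothetical heuristic, not something gbadisasm or IDA does: it treats a block as a likely Thumb function if it starts with a PUSH and contains a `bx lr` (0x4770). Applied to the 0x08000236 bytes from the listing above, the first halfword decodes to `push {r4, r5}`, and the matching `pop {r4, r5}` and `bx lr` sit at offsets 46 and 48. Leaf functions that never push, such as sub_8000292, would slip past this check, so treat it as a hint only.

```python
import struct
from typing import List, Optional

BX_LR = 0x4770  # Thumb `bx lr`

def halfwords(block: bytes) -> List[int]:
    """Split a byte blob into little-endian 16-bit Thumb halfwords (a trailing odd byte is ignored)."""
    count = len(block) // 2
    return list(struct.unpack(f"<{count}H", block[: count * 2]))

def decode_push(hw: int) -> Optional[List[str]]:
    """Decode Thumb PUSH (1011 010R rrrrrrrr); return its register list, or None if hw isn't a PUSH."""
    if (hw & 0xFE00) != 0xB400:
        return None
    regs = [f"r{i}" for i in range(8) if hw & (1 << i)]
    if hw & 0x100:  # R bit set: LR is pushed as well
        regs.append("lr")
    return regs

def looks_like_thumb_function(block: bytes) -> bool:
    """Heuristic: the block opens with a PUSH and contains at least one `bx lr`."""
    hws = halfwords(block)
    return bool(hws) and decode_push(hws[0]) is not None and BX_LR in hws

# The .byte block at 0x08000236, copied from the gbadisasm listing
block_08000236 = bytes.fromhex(
    "30 B4 19 E0 00 23 C4 56 23 1C "
    "61 3B 19 2B 00 D8 20 3C 25 06 00 23 CC 56 2D 16 "
    "23 1C 61 3B 19 2B 00 D8 20 3C 23 06 1B 16 9D 42 "
    "02 D0 58 1B 30 BC 70 47 01 30 01 31 01 3A 03 78 "
    "00 2B 05 D0 0B 78 00 2B 02 D0 00 2A DD D1 01 E0 "
    "00 2A 01 D1 00 20 ED E7 00 23 C9 56 C0 56 08 1A "
    "E8 E7"
)

print(decode_push(halfwords(block_08000236)[0]))
print(looks_like_thumb_function(block_08000236))
```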

Looking a bit further down, we see a different case, where there’s a block of data in the middle of a function:

_0800FBD2:
	ldr r1, _0800FD3C @ =0x03003E84
	movs r3, #0
	ldr r5, [r1]
	adds r1, r0, #0
	movs r2, #0x48
	adds r0, r5, #0
	bl sub_800529A
	adds r6, r0, #0
	ldr r0, [r0, #4]
	lsls r0, r0, #0x16
	lsrs r0, r0, #0x1c
	cmp r0, #6
	bhs _0800FC58
	add r3, pc, #0x8
	ldrb r3, [r3, r0]
	lsls r3, r3, #1
	add pc, r3
	movs r0, r0
	lsls r0, r6, #0xc
	asrs r3, r0, #8
	movs r1, #0x32
	bl sub_8005106
	adds r1, r0, #0
	movs r3, #0
	movs r2, #0
	movs r0, #0x6c
	bl sub_803DA80
	adds r5, r0, #0
	beq _0800FC18
	adds r0, r5, #0
	bl sub_800FAB0
_0800FC18:
	str r5, [r4]
	b _0800FC5C
_0800FC1C:
	.byte 0xF5, 0xF7, 0x73, 0xFA
	.byte 0x01, 0x1C, 0x00, 0x23, 0x00, 0x22, 0x78, 0x20, 0x2D, 0xF0, 0x2A, 0xFF, 0x05, 0x1C, 0x02, 0xD0
	.byte 0x28, 0x1C, 0xFF, 0xF7, 0x17, 0xFB, 0x25, 0x60, 0x10, 0xE0, 0xF5, 0xF7, 0x64, 0xFA, 0x01, 0x1C
	.byte 0x00, 0x23, 0x00, 0x22, 0x8C, 0x20, 0x2D, 0xF0, 0x1B, 0xFF, 0x05, 0x1C, 0x02, 0xD0, 0x28, 0x1C
	.byte 0xFF, 0xF7, 0x6C, 0xFD, 0x25, 0x60, 0x01, 0xE0
_0800FC58:
	movs r0, #0
	str r0, [r4]

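Looking in IDA, we can see that this is actually part of a jump table. The ldrb / lsls / add pc, r3 sequence indexes a table of byte offsets, and some of the cases land inside the .byte block: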

[Screenshot: IDA]

According to gbadisasm’s source code, it doesn’t handle jump tables laid out this way, so we’re going to have to figure this one out ourselves.
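You can work out the table by hand. The sketch below assumes the standard ARMv4T rules. `add r3, pc, #imm` uses the instruction address + 4, rounded down to a word. `add pc, r3` reads PC as its own address + 4. A BL target is assembled from two 11-bit halves. The `cmp r0, #6` bound means six one-byte entries. gbadisasm decoded those bytes as `lsls r0, r6, #0xc`, `asrs r3, r0, #8`, and `movs r1, #0x32`, so re-encoding those instructions recovers the table. The helper names are hypothetical. The addresses were counted by hand from the listing above (bl is 4 bytes, everything else is 2), so double-check them in IDA.

```python
import struct
from typing import List

def thumb_add_pc_imm(insn_addr: int, imm: int) -> int:
    """Value produced by Thumb `add rd, pc, #imm` at insn_addr: Align(insn_addr + 4, 4) + imm."""
    return ((insn_addr + 4) & ~3) + imm

def byte_table_targets(add_pc_addr: int, table: bytes) -> List[int]:
    """Case targets for `ldrb r3, [r3, r0]; lsls r3, r3, #1; add pc, r3`.

    `add pc, r3` reads PC as its own address + 4; each entry counts halfwords from there.
    """
    base = add_pc_addr + 4
    return [base + 2 * entry for entry in table]

def decode_thumb_bl(addr: int, raw: bytes) -> int:
    """Target of the two-halfword Thumb BL stored little-endian in raw at addr."""
    first, second = struct.unpack("<2H", raw)
    if first >> 11 != 0b11110 or second >> 11 != 0b11111:
        raise ValueError("not a Thumb BL pair")
    offset_hi = first & 0x7FF
    if offset_hi & 0x400:  # sign-extend the 11-bit upper offset
        offset_hi -= 0x800
    return addr + 4 + (offset_hi << 12) + ((second & 0x7FF) << 1)

ADD_R3_PC = 0x0800FBEE  # add r3, pc, #0x8
ADD_PC_R3 = 0x0800FBF4  # add pc, r3
table_addr = thumb_add_pc_imm(ADD_R3_PC, 0x8)  # 0x0800FBF8, after one halfword of padding

# Re-encode the three "instructions" gbadisasm printed to get the six table bytes back:
# lsls r0, r6, #0xc / asrs r3, r0, #8 / movs r1, #0x32
table = struct.pack("<3H", 0x0330, 0x1203, 0x2132)

for case, target in enumerate(byte_table_targets(ADD_PC_R3, table)):
    print(f"case {case}: 0x{target:08X}")

# The first four bytes at case 3's target decode as a call
print(f"0x0800FC1C: bl 0x{decode_thumb_bl(0x0800FC1C, bytes([0xF5, 0xF7, 0x73, 0xFA])):08X}")
```

If this decoding is right, cases 0 and 4 go to _0800FC58 and _0800FC5C, cases 1 and 2 share the code at 0x0800FBFE, and cases 3 and 5 start inside the .byte block at 0x0800FC1C and 0x0800FC3A. Both of those begin with a BL to 0x08005106, the same sub_8005106 called at 0x0800FBFE. That strongly suggests the whole block is code gbadisasm never followed.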

The final big block is at 0x0803E2A0. In IDA, it’s just pure data, so let’s leave it as is for now. It will have to be split out and broken up later, but for now it’s not a concern. Today, let’s tackle getting all of our functions defined.

Defining all of the functions

First, let’s get a list of all of the potential holes in our code. Disclaimer: all of the following scripts were made as one-offs, so they’re not pretty.

import re

# Local labels look like "_08000236:"; capture the address without its leading 0
LABEL_RE = re.compile(r"^_0?([0-9A-Fa-f]+):")

def handle_found(mode, address):
    # Original naive implementation: emit a gbadisasm .cfg line directly
    # print(f"{mode}_func 0x{address} sub_{address}")
    if mode == "thumb":
        print(address)

current_address = None
byte_count = 0
current_mode = None
with open("code.s", "r") as f:
    for line in f:
        # Track whether the enclosing function is ARM or Thumb
        if 'arm_' in line:
            current_mode = "arm"
        elif 'thumb_' in line:
            current_mode = "thumb"
        label = LABEL_RE.match(line)
        if label:
            current_address = label.group(1)
            byte_count = 0
        elif line.startswith('\t.byte '):
            byte_count += 1
        else:
            byte_count = 0
        # Report a label once 3 consecutive .byte lines follow it
        if byte_count == 3:
            handle_found(current_mode, current_address)

In this script, we’re specifically targeting thumb functions. We only report a block once it reaches 3 consecutive lines of .byte data, to increase the chance we’re actually hitting code rather than data. This gives us a list of 236 addresses that might be functions. Now let’s move back over to IDA and try to define them all. IDA’s Python scripting is very useful here.

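import ida_funcs

# Candidate addresses printed by the previous script (list truncated)
funcs = ["8000236",
"800088C",
...]
for func in funcs:
    ea = int(func, 16)
    # add_func returns False if IDA couldn't create a function at this address
    if not ida_funcs.add_func(ea):
        print("Couldn't create a function at 0x%08X" % ea)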

Now we can go through the .cfg -> .s cycle again, and we should have more functions defined. Of course there are still some remaining gaps, so let’s just repeat this process until we’re no longer making any progress. After repeating this process a few times, we’re now sitting at 1341 functions defined. Each cycle now is only getting us a few new functions, so let’s move on. We can always define any functions manually in IDA in the future.
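To decide when another cycle isn’t worth it, it helps to count what each export actually added. This is a minimal sketch that assumes the same `<mode>_func 0x<address> <name>` cfg line format. The function names and cfg contents are hypothetical (the two new addresses come from the candidate list above).

```python
from typing import List, Set, Tuple

def function_set(cfg_text: str) -> Set[Tuple[str, int]]:
    """Collect (mode, address) pairs from '<mode>_func 0x<address> <name>' lines."""
    funcs = set()
    for line in cfg_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].endswith("_func"):
            funcs.add((parts[0], int(parts[1], 16)))
    return funcs

def new_functions(previous_cfg: str, current_cfg: str) -> List[Tuple[str, int]]:
    """Functions in the current export that weren't in the previous one, sorted by address."""
    added = function_set(current_cfg) - function_set(previous_cfg)
    return sorted(added, key=lambda func: func[1])

# Hypothetical exports from two consecutive cycles
previous = "thumb_func 0x8000210 sub_8000210\n"
current = previous + "thumb_func 0x8000236 sub_8000236\nthumb_func 0x800088C sub_800088C\n"

added = new_functions(previous, current)
print(f"{len(added)} new functions")
for mode, address in added:
    print(f"  {mode} 0x{address:08X}")
```

In practice, both strings would come from reading the previous and current .cfg exports. Once the count drops to a handful per cycle, it’s time to move on.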

Putting it all together

We’ve now got one massive .s file, so we’re going to want to do a sanity check to see if we can build with it. We can split out the code from 0x8000000 to 0x8000210 into asm/crt0.s, and the rest into asm/code.s. With a small tweak to our scatter_script.txt (changing data.o to code.o), we should be good to go. Let’s run make.

$ tools/ADSv1_2/bin/armasm.exe -CPU arm7tdmi -LIttleend -apcs "/interwork" -I asminclude -I include -o build/winxclub/asm/crt0.o asm/crt0.s
"asm/crt0.s", line 2: Error: A1163E: Unknown opcode
    2 00000000  arm_func_start start
"asm/crt0.s", line 3: Error: A1167E: Invalid line start
    3 00000000 start: @ 0x08000000
"asm/crt0.s", line 4: Error: A1105E: Area directive missing
    4 00000000  b _08000100
"asm/crt0.s", line 4: Warning: A1088W: Faking declaration of area AREA |$$$$$$$|
    4 00000000  b _08000100
"asm/crt0.s", line 5: Error: A1167E: Invalid line start
    5 00000004 _08000004:
"asm/crt0.s", line 6: Error: A1137E: Unexpected characters at end of line
    6 00000004  .byte 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A
"asm/crt0.s", line 7: Error: A1137E: Unexpected characters at end of line
    7 00000004  .byte 0x84, 0xE4, 0x09, 0xAD, 0x11, 0x24, 0x8B, 0x98, 0xC0, 0x81, 0x7F, 0x21, 0xA3, 0x52, 0xBE, 0x19
"asm/crt0.s", line 8: Error: A1137E: Unexpected characters at end of line
    8 00000004  .byte 0x93, 0x09, 0xCE, 0x20, 0x10, 0x46, 0x4A, 0x4A, 0xF8, 0x27, 0x31, 0xEC, 0x58, 0xC7, 0xE8, 0x33
"asm/crt0.s", line 9: Error: A1137E: Unexpected characters at end of line
    9 00000004  .byte 0x82, 0xE3, 0xCE, 0xBF, 0x85, 0xF4, 0xDF, 0x94, 0xCE, 0x4B, 0x09, 0xC1, 0x94, 0x56, 0x8A, 0xC0
"asm/crt0.s", line 10: Error: A1137E: Unexpected characters at end of line
   10 00000004  .byte 0x13, 0x72, 0xA7, 0xFC, 0x9F, 0x84, 0x4D, 0x73, 0xA3, 0xCA, 0x9A, 0x61, 0x58, 0x97, 0xA3, 0x27
"asm/crt0.s", line 11: Error: A1137E: Unexpected characters at end of line
   11 00000004  .byte 0xFC, 0x03, 0x98, 0x76, 0x23, 0x1D, 0xC7, 0x61, 0x03, 0x04, 0xAE, 0x56, 0xBF, 0x38, 0x84, 0x00
"asm/crt0.s", line 12: Error: A1137E: Unexpected characters at end of line
   12 00000004  .byte 0x40, 0xA7, 0x0E, 0xFD, 0xFF, 0x52, 0xFE, 0x03, 0x6F, 0x95, 0x30, 0xF1, 0x97, 0xFB, 0xC0, 0x85
"asm/crt0.s", line 13: Error: A1137E: Unexpected characters at end of line
   13 00000004  .byte 0x60, 0xD6, 0x80, 0x25, 0xA9, 0x63, 0xBE, 0x03, 0x01, 0x4E, 0x38, 0xE2, 0xF9, 0xA2, 0x34, 0xFF
"asm/crt0.s", line 14: Error: A1137E: Unexpected characters at end of line
   14 00000004  .byte 0xBB, 0x3E, 0x03, 0x44, 0x78, 0x00, 0x90, 0xCB, 0x88, 0x11, 0x3A, 0x94, 0x65, 0xC0, 0x7C, 0x63
"asm/crt0.s", line 15: Error: A1137E: Unexpected characters at end of line
   15 00000004  .byte 0x87, 0xF0, 0x3C, 0xAF, 0xD6, 0x25, 0xE4, 0x8B, 0x38, 0x0A, 0xAC, 0x72, 0x21, 0xD4, 0xF8, 0x07
"asm/crt0.s", line 16: Error: A1137E: Unexpected characters at end of line
   16 00000004  .byte 0x57, 0x49, 0x4E, 0x58, 0x43, 0x4C, 0x55, 0x42, 0x00, 0x00, 0x00, 0x00, 0x42, 0x57, 0x49, 0x45
"asm/crt0.s", line 17: Error: A1137E: Unexpected characters at end of line
   17 00000004  .byte 0x41, 0x34, 0x96, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x49, 0x00, 0x00
"asm/crt0.s", line 18: Error: A1137E: Unexpected characters at end of line
   18 00000004  .byte 0x0E, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 19: Error: A1137E: Unexpected characters at end of line
   19 00000004  .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 20: Error: A1137E: Unexpected characters at end of line
   20 00000004  .byte 0x06, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 21: Error: A1137E: Unexpected characters at end of line
   21 00000004  .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 22: Error: A1167E: Invalid line start
   22 00000004 _08000100:
"asm/crt0.s", line 25: Error: A1154E: Unexpected operand
   25 0000000c  ldr sp, _0800012C @ =0x03007FA0
"asm/crt0.s", line 28: Error: A1154E: Unexpected operand
   28 00000018  ldr sp, _08000128 @ =0x03007F00
"asm/crt0.s", line 29: Error: A1154E: Unexpected operand
   29 0000001c  ldr r1, _08000130 @ =sub_8000134
"asm/crt0.s", line 33: Error: A1137E: Unexpected characters at end of line
   33 0000002c  .align 2, 0
"asm/crt0.s", line 34: Error: A1167E: Invalid line start
   34 0000002c _08000128: .4byte 0x03007F00
"asm/crt0.s", line 35: Error: A1167E: Invalid line start
   35 0000002c _0800012C: .4byte 0x03007FA0
"asm/crt0.s", line 36: Error: A1167E: Invalid line start
   36 0000002c _08000130: .4byte sub_8000134
"asm/crt0.s", line 38: Error: A1163E: Unknown opcode
   38 0000002c  arm_func_start sub_8000134
"asm/crt0.s", line 39: Error: A1167E: Invalid line start
   39 0000002c sub_8000134: @ 0x08000134
"asm/crt0.s", line 40: Error: A1154E: Unexpected operand
   40 0000002c  add r8, pc, #0xC4 @ =0x08000200
"asm/crt0.s", line 41: Error: A1163E: Unknown opcode
   41 0000002c  ldm r8, {r0, r1}
"asm/crt0.s", line 45: Error: A1167E: Invalid line start
   45 00000038 _08000148:
"asm/crt0.s", line 48: Error: A1163E: Unknown opcode
   48 00000040  ldm r0!, {r4, r5, r6}
"asm/crt0.s", line 58: Error: A1167E: Invalid line start
   58 00000064 _08000178:
"asm/crt0.s", line 60: Error: A1163E: Unknown opcode
   60 00000068  ldmhs r4!, {r2, r3, r7, ip}
"asm/crt0.s", line 61: Error: A1163E: Unknown opcode
   61 00000068  stmhs r5!, {r2, r3, r7, ip}
"asm/crt0.s", line 63: Error: A1163E: Unknown opcode
   63 0000006c  lsls r6, r6, #0x1d
"asm/crt0.s", line 64: Error: A1163E: Unknown opcode
   64 0000006c  ldmhs r4!, {r2, r3}
"asm/crt0.s", line 65: Error: A1163E: Unknown opcode
   65 0000006c  stmhs r5!, {r2, r3}
"asm/crt0.s", line 69: Error: A1167E: Invalid line start
   69 00000078 _080001A0:
"asm/crt0.s", line 79: Error: A1167E: Invalid line start
   79 0000009c _080001C4:
"asm/crt0.s", line 82: Error: A1163E: Unknown opcode
   82 000000a4  ldm r2!, {r4, r5}
"asm/crt0.s", line 88: Error: A1167E: Invalid line start
   88 000000b8 _080001E4:
"asm/crt0.s", line 90: Error: A1163E: Unknown opcode
   90 000000bc  stmhs r4!, {r0, r6, r7, fp}
"asm/crt0.s", line 92: Error: A1163E: Unknown opcode
   92 000000c0  lsls r5, r5, #0x1d
"asm/crt0.s", line 93: Error: A1163E: Unknown opcode
   93 000000c0  stmhs r4!, {r0, r6}
"asm/crt0.s", line 96: Error: A1167E: Invalid line start
   96 000000c8 _08000200:
"asm/crt0.s", line 97: Error: A1137E: Unexpected characters at end of line
   97 000000c8  .byte 0x78, 0x15, 0x05, 0x00, 0xA8, 0x15, 0x05, 0x00, 0xA8, 0x15, 0x05, 0x00, 0xC8, 0x15, 0x05, 0x00
"asm/crt0.s", line 98: Warning: A1313W: Missing END directive at end of file
   98 000000c8
49 Errors, 2 Warnings
make: *** [Makefile:209: build/winxclub/asm/crt0.o] Error 1

Oh no. Let’s break down all of these errors.

“asm/crt0.s”, line 2: Error: A1163E: Unknown opcode
2 00000000 arm_func_start start

Looks like we’re missing an include for some macros.

“asm/crt0.s”, line 3: Error: A1167E: Invalid line start
3 00000000 start: @ 0x08000000

A GNU vs. ADS difference: labels don’t end with : in ADS.

“asm/crt0.s”, line 4: Error: A1105E: Area directive missing
4 00000000 b _08000100

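Looks like we need to include an `AREA text, CODE` directive. Like other directives, it has to be indented, since armasm treats anything starting in column 0 as a label.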

“asm/crt0.s”, line 6: Error: A1137E: Unexpected characters at end of line
6 00000004 .byte 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A

As mentioned earlier, GNU-style . directives don’t exist in ADS, so .byte needs to become DCB.

“asm/crt0.s”, line 25: Error: A1154E: Unexpected operand
25 0000000c ldr sp, _0800012C @ =0x03007FA0

ADS doesn’t use @ for comments, so we’ll need to replace each @ with a ; instead.

“asm/crt0.s”, line 41: Error: A1163E: Unknown opcode
41 0000002c ldm r8, {r0, r1}

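Unknown opcode is going to be a recurring theme here. ADS 1.2 predates ARM’s unified assembler syntax, which is what gbadisasm emits, so it wants the older spellings: LDMIA instead of ldm, or LDRCCB instead of ldrblo.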

“asm/crt0.s”, line 98: Warning: A1313W: Missing END directive at end of file
98 000000c8

We’ve just got to add END at the end of each file, no big deal.
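Of these fixes, the INCLUDE, AREA, and END lines apply once per file rather than once per line. Here is a minimal sketch of adding them around an already-converted file body. It assumes the `INCLUDE asm/macros.inc` path used later in this post and an area named `text`, and `wrap_for_ads` is a hypothetical helper. It checks for an existing header and footer so that it can be re-run safely after every export.

```python
from typing import Iterable, List

# Per-file lines armasm needs; indented because armasm reads anything in column 0 as a label
HEADER = ["\tINCLUDE asm/macros.inc", "\tAREA text, CODE"]
FOOTER = ["\tEND"]

def wrap_for_ads(body: Iterable[str]) -> List[str]:
    """Add the INCLUDE/AREA header and END footer unless they are already present."""
    lines = [line.rstrip("\n") for line in body]
    if lines[: len(HEADER)] != HEADER:
        lines = HEADER + lines
    if lines[-1].strip() != "END":
        lines = lines + FOOTER
    return lines

# A few crt0 lines after the per-line fixes (colon removed, @ turned into ;@)
body = ["\tarm_func_start start", "start ;@ 0x08000000", "\tb _08000100"]
wrapped = wrap_for_ads(body)
print("\n".join(wrapped))
```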

Resolving these errors

First, we need a file with the macro definitions. Here are FireRed’s macros, ported over to ADS:

	MACRO
$l	arm_func_start $name
	ALIGN 2, 0
	GLOBAL $name
	CODE32
	MEND

	MACRO
$l	arm_func_end $name
	SIZE $name, .-$name
	MEND

	MACRO
$l	thumb_func_start $name
	ALIGN 2, 0
	GLOBAL $name
	CODE16
	MEND

	MACRO
$l	non_word_aligned_thumb_func_start $name
	GLOBAL $name
	CODE16
	MEND

	MACRO
$l	thumb_func_end $name
	SIZE $name, .-$name
	MEND

	MACRO
$l	mov32 $reg, $addr
	IF $addr == 0x08800000
		movs $reg, #0x88
		lsls $reg, $reg, #0x14
	ELSE
		ldr $reg, =#$addr
	ENDIF
	MEND
	END

We can now just include this at the beginning of our .s files:

	INCLUDE asm/macros.inc

In hindsight, I absolutely should’ve modified gbadisasm for this. I’ve written a series of scripts that I used to fix these issues up, but I think it would be worthwhile to just fork it instead and fix the core issue. As an example, this is the script I wrote to fix our opcodes.

import re
import fileinput

# Per-mode regex rewrites from gbadisasm's GNU/UAL syntax to ADS 1.2 syntax
replacements = {
    "arm": {
        "ldrblo": "LDRCCB",
        "ldrbhs": "LDRCSB",
        "ldm ": "LDMIA ",
        "stm ": "STMIA ",
        "stmhs ": "STMHSIA ",
        "stmmi ": "STMMIIA ",
        "ldmhs ": "LDMHSIA ",
        "ldmmi ": "LDMMIIA ",
        "rrxne r0, r0": "MOVNE r0, r0",
        r"subs(\w\w) ": r"sub\1s ",
        r"\.align 2, 0": "ALIGN",
    },
    "thumb": {
        r"rsbs r(\d), r(\d), #0": r"NEGS r\1, r\2",
        "svc #": "SWI ",
        r"\.align 2, 0": "ALIGN",
        r"muls(.*), r\d": r"muls\1",
    }
}

# Literal rewrites applied to every line: comments, label colons, data directives
rep_all = {
    " @": " ;@",
    ":": "",
    ".4byte": "DCDU",
    ".byte": "DCB"
}

condition_translation = {
    "lo": "cc",
    "hs": "cs"
}

def handle_pop_push(line):
    # ARM-mode push/pop -> STMFD/LDMFD SP!, {...}
    extract = r"(pop|push)(\w\w)? (.*)"
    result = re.search(extract, line)
    if not result:
        return line
    op, condition, end = result.groups()
    prefix = "LDM" if op == 'pop' else "STM"
    return f"\t{prefix}{condition.upper() if condition else ''}FD SP!, {end}"

def handle_log_shift(line):
    # ARM-mode shifts -> MOV{cond}{S} rd, rm, <shift> op
    # e.g. lsls r6, r6, #0x1d -> MOVS r6, r6, LSL #0x1d (\w+ also accepts r10-r12)
    extract = r"(l|a)s(r|l)([bs]?)(\w\w)? (\w+), (\w+), (#?.*)"
    result = re.search(extract, line)
    if not result:
        return line
    t, direction, s_flag, condition, r1, r2, op = result.groups()
    if condition and condition in condition_translation:
        condition = condition_translation[condition]
    return f"\tMOV{condition.upper() if condition else ''}{s_flag.upper()} {r1}, {r2}, {t.upper()}S{direction.upper()} {op}"

def handle_ldr_str(line):
    # ARM-mode ldr/str: put the condition before the size suffix (e.g. ldrshne -> LDRNESH)
    extract = r"(ldr|str)(b?)(\w\w)?(\w\w)? (.*)"
    result = re.search(extract, line)
    if not result:
        return line
    op, b, sh, condition, end = result.groups()
    return(f"\t{op.upper() if op else ''}{condition.upper() if condition else ''}{sh.upper() if sh else ''}{b.upper() if b else ''} {end}")

current_mode = None
# Rewrite the files named on the command line in place, keeping a .bak copy of each
for line in fileinput.input(inplace=True, backup='.bak'):
    # Track whether we're inside an ARM or Thumb function
    if 'arm_' in line:
        current_mode = "arm"
    elif 'thumb_' in line:
        current_mode = "thumb"
    for old, new in rep_all.items():
        line = line.replace(old, new)
    # Only indented lines are instructions/directives; labels start in column 0
    if line.startswith('\t') and current_mode is not None:
        for pattern, replacement in replacements[current_mode].items():
            line = re.sub(pattern, replacement, line)
        if current_mode == 'arm':
            line = handle_pop_push(line)
            line = handle_log_shift(line)
            line = handle_ldr_str(line)
    # With inplace=True, print() writes back into the file being edited
    print(line.rstrip())

To summarize all of the changes we have to make to our .s files:

  1. Include a file of macros for func definitions
  2. Replace all @ comments with ;
  3. Remove : at the end of label definitions
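  4. Replace .align 2, 0 with ALIGN (armasm’s ALIGN takes a byte boundary and defaults to 4, whereas GNU’s .align 2 on ARM means 2^2 bytes)
  5. Replace .4byte with DCDU
  6. Replace .byte with DCB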
  7. Replace a bunch of opcodes with their ADS equivalents
  8. Change the lo and hs conditions to cc and cs

While our scripts can handle all of this, the approach is exceptionally brittle. Every time we export from .cfg to .s, we have to remember to run them. We could build a pipeline for it, but that’s really a band-aid over the core problem. Regardless, with enough massaging, we can get a matching ROM.
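Whether a pipeline or a pile of scripts produces the ROM, “matching” means byte-for-byte identical to baserom.gba. Here is a minimal sketch of that check. It also reports the first differing ROM address, so you know where to start looking. `first_mismatch` and `describe` are hypothetical helpers, and the built ROM’s path in the commented usage is an assumption.

```python
import hashlib
from typing import Optional

ROM_BASE = 0x08000000  # cartridge ROM is mapped here on the GBA

def first_mismatch(built: bytes, base: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if the two ROMs are identical."""
    for offset, (a, b) in enumerate(zip(built, base)):
        if a != b:
            return offset
    if len(built) != len(base):
        return min(len(built), len(base))  # one ROM is a prefix of the other
    return None

def describe(built: bytes, base: bytes) -> str:
    """Human-readable verdict, with the SHA-1 on a match or the first bad address otherwise."""
    offset = first_mismatch(built, base)
    if offset is None:
        return f"matching (sha1 {hashlib.sha1(built).hexdigest()})"
    return f"first difference at 0x{ROM_BASE + offset:08X}"

# Hypothetical usage; "build/winxclub.gba" is an assumed output path
# with open("baserom.gba", "rb") as base_f, open("build/winxclub.gba", "rb") as built_f:
#     print(describe(built_f.read(), base_f.read()))

# Sample bytes for illustration
print(describe(b"\x3e\x00\x00\xea", b"\x3e\x00\x00\xea"))
print(describe(b"\x3e\x00\x00\xea", b"\x3e\x00\x01\xea"))
```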

Conclusion

With a bit of jank, we were able to take our .gba file, extract a ton of code, and assemble it back into a matching ROM.
We’re still left with quite a few extra blocks of data that we’re going to need to sort out. Plus we need to set up our compilation process.

© 2024 Matt Hurd