Disassembling Winx
Matt Hurd / March 2023 (6268 Words, 35 Minutes)
In our last post, we explored starting our GBA decomp project and set up a basic repo for an ADS v1.2 game. Now it’s time to get some real code in here.
Introducing gbadisasm
At the time of writing, I know of three versions of the gbadisasm repo. The original is by camthesaxman. It was forked by PikalaxALT (that fork has since fallen behind), and PikalaxALT’s fork was later forked by jiangzhengwenjz. I’ll be trying both jiang’s fork and cam’s original to find which works best. Jiang’s is definitely the most recently updated, but since the two have diverged, it’s important to check both. Keep in mind that both were built with the GCC toolchain (GNU assembler syntax) in mind rather than ADS, so yeah…
In hindsight, I absolutely should’ve forked this repo myself and made changes. At some point I might, but for this project I ended up just writing a series of Python scripts that modified the output to work for me.
Diving into Jiang’s gbadisasm
Jiang’s gbadisasm fork has the best documentation (README), so let’s move forward with it for now. Running it without any config file on our baserom.gba, we get:
arm_func_start sub_08000000
sub_08000000: @ 0x08000000
b _08000100
_08000004:
.byte 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A
.byte 0x84, 0xE4, 0x09, 0xAD, 0x11, 0x24, 0x8B, 0x98, 0xC0, 0x81, 0x7F, 0x21, 0xA3, 0x52, 0xBE, 0x19
.byte 0x93, 0x09, 0xCE, 0x20, 0x10, 0x46, 0x4A, 0x4A, 0xF8, 0x27, 0x31, 0xEC, 0x58, 0xC7, 0xE8, 0x33
.byte 0x82, 0xE3, 0xCE, 0xBF, 0x85, 0xF4, 0xDF, 0x94, 0xCE, 0x4B, 0x09, 0xC1, 0x94, 0x56, 0x8A, 0xC0
.byte 0x13, 0x72, 0xA7, 0xFC, 0x9F, 0x84, 0x4D, 0x73, 0xA3, 0xCA, 0x9A, 0x61, 0x58, 0x97, 0xA3, 0x27
.byte 0xFC, 0x03, 0x98, 0x76, 0x23, 0x1D, 0xC7, 0x61, 0x03, 0x04, 0xAE, 0x56, 0xBF, 0x38, 0x84, 0x00
.byte 0x40, 0xA7, 0x0E, 0xFD, 0xFF, 0x52, 0xFE, 0x03, 0x6F, 0x95, 0x30, 0xF1, 0x97, 0xFB, 0xC0, 0x85
.byte 0x60, 0xD6, 0x80, 0x25, 0xA9, 0x63, 0xBE, 0x03, 0x01, 0x4E, 0x38, 0xE2, 0xF9, 0xA2, 0x34, 0xFF
.byte 0xBB, 0x3E, 0x03, 0x44, 0x78, 0x00, 0x90, 0xCB, 0x88, 0x11, 0x3A, 0x94, 0x65, 0xC0, 0x7C, 0x63
.byte 0x87, 0xF0, 0x3C, 0xAF, 0xD6, 0x25, 0xE4, 0x8B, 0x38, 0x0A, 0xAC, 0x72, 0x21, 0xD4, 0xF8, 0x07
.byte 0x57, 0x49, 0x4E, 0x58, 0x43, 0x4C, 0x55, 0x42, 0x00, 0x00, 0x00, 0x00, 0x42, 0x57, 0x49, 0x45
.byte 0x41, 0x34, 0x96, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x49, 0x00, 0x00
.byte 0x0E, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x06, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
_08000100:
mov r0, #0x12
msr cpsr_fc, r0
ldr sp, _0800012C @ =0x03007FA0
mov r0, #0x1f
msr cpsr_fc, r0
ldr sp, _08000128 @ =0x03007F00
ldr r1, _08000130 @ =0x08000134
mov lr, pc
bx r1
b _08000100
.align 2, 0
_08000128: .4byte 0x03007F00
_0800012C: .4byte 0x03007FA0
_08000130: .4byte 0x08000134
That’s not a lot of code. In fact, it’s just the crt0. Also of note: these .<directive> lines (.byte, .align, .4byte) are GNU assembler syntax and don’t work with ADS. We’re going to have a lot of cleanup, but that’s what scripting is for. Now we need to find the rest of the code, so it’s time to break out IDA.
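The big .byte block at _08000004 isn’t crt0 code at all: it’s the cartridge header every GBA ROM carries between 0x04 and 0xBF. Per GBATek, that’s the Nintendo logo followed by the game title at 0xA0, the game code at 0xAC, the maker code at 0xB0, a fixed 0x96 at 0xB2, and a complement checksum at 0xBD. Here’s a quick sketch for pulling those out of baserom.gba as a sanity check (parse_gba_header is just a name I picked; the offsets and checksum formula come from GBATek):

```python
def parse_gba_header(rom: bytes) -> dict:
    """Extract a few GBA cartridge header fields (offsets per GBATek)."""
    if len(rom) < 0xC0:
        raise ValueError("ROM too small to contain a cartridge header")
    title = rom[0xA0:0xAC].rstrip(b"\x00").decode("ascii")
    game_code = rom[0xAC:0xB0].decode("ascii")
    maker_code = rom[0xB0:0xB2].decode("ascii")
    # Complement check: chk = -(sum of bytes 0xA0..0xBC) - 0x19, truncated to 8 bits
    computed = (-sum(rom[0xA0:0xBD]) - 0x19) & 0xFF
    return {
        "title": title,
        "game_code": game_code,
        "maker_code": maker_code,
        "fixed_value_ok": rom[0xB2] == 0x96,
        "checksum_ok": rom[0xBD] == computed,
    }
```

Feeding it the header bytes from the listing above (for example, parse_gba_header(open("baserom.gba", "rb").read(0xC0))) should report the title WINXCLUB, game code BWIE, maker code A4, and a valid checksum.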
Introducing IDA
IDA has been my go-to for years. I started using it for the Wildstar private server project NexusForever since the other devs were using it, and it stuck. You could also use Ghidra, but most of the GBA tooling I’ve found is built for IDA. For those unfamiliar with it, IDA is a disassembler used to explore compiled programs and see what secrets are hiding in them.
Before we even open IDA, let’s install some existing tooling built to assist with GBA games. Laqieer’s ida_gba_stuff repo has what we need. GBA_Loader is going to be the most useful, but let’s copy everything over. The one thing to be aware of is that cfg/idagui.cfg overwrites some IDA defaults and hasn’t been updated past IDA 7.0, so you likely won’t want to use it as-is. Instead, consider copying over its file extensions and adding GBA_ROM to the DEFAULT_FILE_FILTER block of your own cfg.
With that set up, we’re ready to open IDA. When loading a .gba file, IDA should now default to ARM7TDMI using GBA_Loader.py. Looking at IDA’s console, we see:
Detected file format: Game Boy Advance ROM: ARM7TDMI
0. Creating a new segment (00000000-00004000) ... ... OK
1. Creating a new segment (02000000-02040000) ... ... OK
2. Creating a new segment (03000000-03008000) ... ... OK
3. Creating a new segment (04000000-04000400) ... ... OK
4. Creating a new segment (05000000-05000400) ... ... OK
5. Creating a new segment (06000000-06018000) ... ... OK
6. Creating a new segment (07000000-07000400) ... ... OK
7. Creating a new segment (08000000-0A000000) ... ... OK
This is great. If you’ve been reading up on GBATek, you’ll recognize these address ranges.
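Those eight segments are the GBA memory map: BIOS, on-board work RAM (EWRAM), on-chip work RAM (IWRAM), I/O registers, palette RAM, VRAM, OAM, and the Game Pak ROM. While reading disassembly, it helps to classify raw constants against them; for example, the two stack pointers crt0 loads, 0x03007FA0 and 0x03007F00, both land in IWRAM. Here’s a small sketch using exactly the ranges GBA_Loader printed (the short region names are my own labels):

```python
from typing import Optional

# (start, end_exclusive, name), taken from the segments GBA_Loader created
GBA_REGIONS = [
    (0x00000000, 0x00004000, "BIOS"),
    (0x02000000, 0x02040000, "EWRAM"),
    (0x03000000, 0x03008000, "IWRAM"),
    (0x04000000, 0x04000400, "I/O"),
    (0x05000000, 0x05000400, "Palette"),
    (0x06000000, 0x06018000, "VRAM"),
    (0x07000000, 0x07000400, "OAM"),
    (0x08000000, 0x0A000000, "ROM"),
]

def region_for(address: int) -> Optional[str]:
    """Return the name of the memory region containing address, or None."""
    for start, end, name in GBA_REGIONS:
        if start <= address < end:
            return name
    return None
```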
Once IDA has finished its first auto-analysis pass, we end up with 1146 functions found. Laqieer’s repo also includes an incredibly helpful script that converts your IDA database (idb) into a .cfg file for gbadisasm. Let’s see how that works out.
- First, run the script through IDA: File -> Script File..., select <ida_dir>/idc/export_gbadisasm_config.idc, and choose an output file.
- Re-run gbadisasm with that config:
./gbadisasm/gbadisasm baserom.gba -c output.cfg
And our output:
gbadisasm: disasm.c:604: analyze: Assertion `tmp_cnt == 1' failed.
Aborted
Oh. That’s not great. To debug this, I made some slight modifications to gbadisasm so it reports which function the error occurred in. In this case, IDA had marked a lot of data as code, so gbadisasm failed trying to disassemble it. In IDA, the best way to handle this is to create some new segments. Our last valid function looks to be sub_805582C, so we’ll change the main code segment to end after it. After creating a new “DATA” segment from 08055ABC to 0A000000, we can delete all of the functions within it.
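If you’d rather script the split than click through IDA’s segment dialogs, here’s a rough IDAPython sketch. It assumes the IDA 7.x module names (ida_segment, ida_funcs, idautils) and hardcodes the 0x08055ABC boundary from above, so treat it as a starting point and double-check the calls against your IDA version. The range logic lives in a plain helper so it can be checked outside IDA.

```python
try:
    import ida_funcs
    import ida_segment
    import idautils
except ImportError:  # outside IDA: only functions_in_range is usable
    ida_funcs = ida_segment = idautils = None

ROM_START = 0x08000000
DATA_START = 0x08055ABC  # code ends after sub_805582C; data from here on
ROM_END = 0x0A000000

def functions_in_range(func_starts, start, end):
    """Return the function start addresses inside [start, end)."""
    return [ea for ea in func_starts if start <= ea < end]

def split_off_data_segment():
    # Delete every function IDA created inside what is really data
    for ea in functions_in_range(list(idautils.Functions()), DATA_START, ROM_END):
        ida_funcs.del_func(ea)
    # Shrink the ROM code segment (keeping its bytes), then cover the rest as DATA
    ida_segment.set_segm_end(ROM_START, DATA_START, ida_segment.SEGMOD_KEEP)
    ida_segment.add_segm(0, DATA_START, ROM_END, "DATA", "DATA")

if ida_segment is not None:
    split_off_data_segment()
```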
Trying the export script and gbadisasm again, our output looks much better:
$ ./gbadisasm/gbadisasm baserom.gba -c test.cfg | wc -l
89003
Let’s save it to a file and check it out:
./gbadisasm/gbadisasm baserom.gba -c test.cfg > test.s
Analysing our disassembly
Scrolling through our disassembly, we’ve extracted a lot of code. There are also some small blocks of data left over, a few medium ones, and some large ones. Let’s dive into some concrete examples. The first small block I see is here:
thumb_func_start sub_8000210
sub_8000210: @ 0x08000210
b _08000216
_08000212:
adds r0, #1
adds r1, #1
_08000216:
ldrb r3, [r0]
cmp r3, #0
beq _08000228
ldrb r2, [r1]
cmp r2, #0
beq _08000232
cmp r3, r2
beq _08000212
b _08000232
_08000228:
ldrb r0, [r1]
cmp r0, #0
bne _08000232
movs r0, #1
bx lr
_08000232:
movs r0, #0
bx lr
_08000236:
.byte 0x30, 0xB4, 0x19, 0xE0, 0x00, 0x23, 0xC4, 0x56, 0x23, 0x1C
.byte 0x61, 0x3B, 0x19, 0x2B, 0x00, 0xD8, 0x20, 0x3C, 0x25, 0x06, 0x00, 0x23, 0xCC, 0x56, 0x2D, 0x16
.byte 0x23, 0x1C, 0x61, 0x3B, 0x19, 0x2B, 0x00, 0xD8, 0x20, 0x3C, 0x23, 0x06, 0x1B, 0x16, 0x9D, 0x42
.byte 0x02, 0xD0, 0x58, 0x1B, 0x30, 0xBC, 0x70, 0x47, 0x01, 0x30, 0x01, 0x31, 0x01, 0x3A, 0x03, 0x78
.byte 0x00, 0x2B, 0x05, 0xD0, 0x0B, 0x78, 0x00, 0x2B, 0x02, 0xD0, 0x00, 0x2A, 0xDD, 0xD1, 0x01, 0xE0
.byte 0x00, 0x2A, 0x01, 0xD1, 0x00, 0x20, 0xED, 0xE7, 0x00, 0x23, 0xC9, 0x56, 0xC0, 0x56, 0x08, 0x1A
.byte 0xE8, 0xE7
non_word_aligned_thumb_func_start sub_8000292
sub_8000292: @ 0x08000292
ldrb r2, [r0]
movs r1, #0
cmp r2, #0
beq _080002A4
_0800029A:
adds r0, #1
ldrb r2, [r0]
adds r1, #1
cmp r2, #0
bne _0800029A
_080002A4:
adds r0, r1, #0
bx lr
Jumping to it in IDA, we can see that this looks like real code that’s being missed.
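One quick way to convince yourself it’s code: the first two bytes at 0x8000236 are 0x30, 0xB4. Read as a little-endian halfword that’s 0xB430, which is the Thumb encoding of push {r4, r5} (1011 010R followed by an 8-bit register list, where R adds lr), a very typical function prologue. A minimal decoder for just that one instruction form (the function name is hypothetical):

```python
from typing import List, Optional

def decode_thumb_push(lo: int, hi: int) -> Optional[List[str]]:
    """Decode two little-endian bytes as a Thumb PUSH; return its registers or None."""
    halfword = lo | (hi << 8)
    # PUSH encoding: 1011 010R rrrrrrrr (R = also push lr)
    if (halfword & 0xFE00) != 0xB400:
        return None
    regs = [f"r{i}" for i in range(8) if halfword & (1 << i)]
    if halfword & 0x0100:
        regs.append("lr")
    return regs
```

Not every function opens with a push, though: sub_8000210 above is a leaf function that doesn’t, so this works as a hint rather than a rule.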
Looking a bit further down, we see a different case, where there’s a block of data in the middle of a function:
_0800FBD2:
ldr r1, _0800FD3C @ =0x03003E84
movs r3, #0
ldr r5, [r1]
adds r1, r0, #0
movs r2, #0x48
adds r0, r5, #0
bl sub_800529A
adds r6, r0, #0
ldr r0, [r0, #4]
lsls r0, r0, #0x16
lsrs r0, r0, #0x1c
cmp r0, #6
bhs _0800FC58
add r3, pc, #0x8
ldrb r3, [r3, r0]
lsls r3, r3, #1
add pc, r3
movs r0, r0
lsls r0, r6, #0xc
asrs r3, r0, #8
movs r1, #0x32
bl sub_8005106
adds r1, r0, #0
movs r3, #0
movs r2, #0
movs r0, #0x6c
bl sub_803DA80
adds r5, r0, #0
beq _0800FC18
adds r0, r5, #0
bl sub_800FAB0
_0800FC18:
str r5, [r4]
b _0800FC5C
_0800FC1C:
.byte 0xF5, 0xF7, 0x73, 0xFA
.byte 0x01, 0x1C, 0x00, 0x23, 0x00, 0x22, 0x78, 0x20, 0x2D, 0xF0, 0x2A, 0xFF, 0x05, 0x1C, 0x02, 0xD0
.byte 0x28, 0x1C, 0xFF, 0xF7, 0x17, 0xFB, 0x25, 0x60, 0x10, 0xE0, 0xF5, 0xF7, 0x64, 0xFA, 0x01, 0x1C
.byte 0x00, 0x23, 0x00, 0x22, 0x8C, 0x20, 0x2D, 0xF0, 0x1B, 0xFF, 0x05, 0x1C, 0x02, 0xD0, 0x28, 0x1C
.byte 0xFF, 0xF7, 0x6C, 0xFD, 0x25, 0x60, 0x01, 0xE0
_0800FC58:
movs r0, #0
str r0, [r4]
Looking in IDA, we can see this is actually part of a jump table.
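The pattern is a compact byte-offset switch: add r3, pc, #0x8 points r3 at a table of one-byte entries, ldrb picks entry r0, lsls doubles it, and add pc, r3 jumps forward by that many bytes. Two Thumb details matter when resolving it by hand: add rd, pc, #imm uses the instruction’s address plus 4 rounded down to a word boundary, while add pc, r3 reads pc as the instruction’s address plus 4 with no rounding. Here’s a sketch that resolves such a table from the ROM bytes; the function name and parameters are my own:

```python
from typing import List

ROM_BASE = 0x08000000

def resolve_byte_jump_table(rom: bytes, add_r3_addr: int, imm: int,
                            add_pc_addr: int, count: int) -> List[int]:
    """Resolve a Thumb switch of the form
        add r3, pc, #imm ; ldrb r3, [r3, r0] ; lsls r3, r3, #1 ; add pc, r3
    and return the absolute target address of each of the `count` entries.
    """
    # `add rd, pc, #imm`: (instruction address + 4) rounded down to a word, plus imm
    table_addr = ((add_r3_addr + 4) & ~3) + imm
    # `add pc, r3`: pc reads as instruction address + 4, no rounding
    base = add_pc_addr + 4
    offset = table_addr - ROM_BASE
    entries = rom[offset:offset + count]
    return [base + (entry << 1) for entry in entries]
```

By my count, add r3, pc, #0x8 sits at 0x0800FBEE and add pc, r3 at 0x0800FBF4, and cmp r0, #6 bounds the table at six entries. Re-encoding the bogus instructions gbadisasm printed after add pc, r3 (movs r0, r0 is 0x0000 padding; lsls r0, r6, #0xc is 0x0330, asrs r3, r0, #8 is 0x1203 and movs r1, #0x32 is 0x2132) gives the table bytes 0x30, 0x03, 0x03, 0x12, 0x32, 0x21, which resolve to 0x0800FC58, 0x0800FBFE (twice), 0x0800FC1C, 0x0800FC5C and 0x0800FC3A. Three of those are labels gbadisasm already references, a good sign that the _0800FC1C block is really case code rather than data.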
Looking at gbadisasm’s source code, it doesn’t handle jump tables that are defined this way. We’re going to have to figure this one out ourselves.
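The bytes themselves also give it away. 0xF5, 0xF7, 0x73, 0xFA at the start of the _0800FC1C block is a Thumb BL. On ARMv4T, BL is a pair of halfwords: the first (top bits 11110) holds the high 11 bits of the offset and the second (top bits 11111) the low 11 bits, and the target is the first halfword’s address + 4 + (sign-extended high << 12) + (low << 1). A small decoder sketch (the name is mine):

```python
def decode_thumb_bl(addr: int, raw: bytes) -> int:
    """Return the target of the 4-byte Thumb BL pair starting at addr."""
    first = raw[0] | (raw[1] << 8)
    second = raw[2] | (raw[3] << 8)
    if (first >> 11) != 0b11110 or (second >> 11) != 0b11111:
        raise ValueError("not a Thumb BL pair")
    high = first & 0x7FF
    if high & 0x400:  # sign-extend the 11-bit high part
        high -= 0x800
    low = second & 0x7FF
    return addr + 4 + (high << 12) + (low << 1)
```

By this decoding, the first two BLs in that block target 0x08005106 and 0x0803DA80, i.e. sub_8005106 and sub_803DA80, the same pair of calls the case just above it makes.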
The final big block is at 0x803E2A0. Looking in IDA, this is just pure data, so let’s leave it as is for now. It will have to be split out and broken up later, but it’s not a concern yet.
Today, let’s tackle getting all of our functions defined.
Defining all of the functions
First, let’s get a list of all of the potential holes in our code. Disclaimer: all of the following scripts were made to be one-offs, so they’re not pretty.
def handle_found(mode, address):
    # Original naive implementation:
    # print(f"{mode}_func 0x{address} sub_{address}")
    if mode == "thumb":
        print(address)

current_address = None
byte_count = 0
current_mode = None
with open("code.s", 'r') as f:
    for line in f:
        # Track whether we're currently inside an ARM or Thumb function
        if 'arm_' in line:
            current_mode = "arm"
        elif 'thumb_' in line:
            current_mode = "thumb"
        if line.startswith('_'):
            # Local labels look like "_08000236:"; keep just "8000236"
            current_address = line.split(':', 1)[0][2:]
            byte_count = 0
        elif line.strip().startswith('.byte '):
            byte_count += 1
        else:
            byte_count = 0
        # Three consecutive .byte lines after a label: probably missed code
        if byte_count == 3:
            handle_found(current_mode, current_address)
In this script, we’re specifically targeting Thumb functions. We look for at least 3 consecutive lines of bytes to increase the chance we’re actually hitting code rather than data. This gives us a list of 236 addresses that might be functions. Now let’s move back over to IDA and try to define them all; IDA’s Python scripting is very useful here.
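import ida_funcs

# Candidate addresses printed by the gap-finding script (list truncated here)
funcs = ["8000236",
         "800088C",
         # ...
         ]
for func in funcs:
    # add_func returns False if IDA can't create a function at that address
    if not ida_funcs.add_func(int(func, 16)):
        print(f"Could not define a function at 0x{func}")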
Now we can go through the .cfg -> .s cycle again, and we should have more functions defined. Of course, there are still some remaining gaps, so let’s repeat this process until we’re no longer making any progress. After a few cycles, we’re now sitting at 1341 functions defined. Each cycle now only gets us a few new functions, so let’s move on; we can always define any remaining functions manually in IDA later.
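To tell when another cycle stops paying off, it’s handy to count the function markers in each fresh disassembly. A rough sketch, assuming every function in gbadisasm’s output starts with an arm_func_start, thumb_func_start or non_word_aligned_thumb_func_start line, as in the listings above (the helper names are mine; the command line is the one used earlier):

```python
import re
import subprocess

# A function-start macro line, e.g. "\tthumb_func_start sub_8000210"
FUNC_START = re.compile(
    r"^\s*(?:arm|thumb|non_word_aligned_thumb)_func_start\s+\S+", re.MULTILINE
)

def count_functions(asm_text: str) -> int:
    """Count function-start macros in gbadisasm output."""
    return len(FUNC_START.findall(asm_text))

def disassemble(rom: str = "baserom.gba", cfg: str = "test.cfg") -> str:
    """Run gbadisasm with the IDA-exported config and return its output."""
    result = subprocess.run(
        ["./gbadisasm/gbadisasm", rom, "-c", cfg],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

After each IDA export, compare count_functions(disassemble()) against the previous run; once the delta drops to a handful, it’s time to move on.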
Putting it all together
We’ve now got one massive .s file, so we’re going to want to do a sanity check to see whether we can build with it. We can split out the code from 0x8000000 to 0x8000210 into asm/crt0.s, and the rest into asm/code.s. With a small tweak to our scatter_script.txt (changing data.o to code.o), we should be good to go. Let’s run make.
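Since we’ll be regenerating the disassembly, the split is worth scripting. A minimal sketch, assuming the first function after crt0 is the thumb_func_start sub_8000210 line from our listing and that the full output lives in test.s (the helper names are hypothetical):

```python
from pathlib import Path
from typing import List, Tuple

def split_at_marker(lines: List[str], marker: str) -> Tuple[List[str], List[str]]:
    """Split lines just before the first line whose stripped text equals marker."""
    for i, line in enumerate(lines):
        if line.strip() == marker:
            return lines[:i], lines[i:]
    raise ValueError(f"marker not found: {marker!r}")

def split_disassembly(src: str = "test.s") -> None:
    """Write everything before sub_8000210 to asm/crt0.s and the rest to asm/code.s."""
    lines = Path(src).read_text().splitlines(keepends=True)
    crt0, code = split_at_marker(lines, "thumb_func_start sub_8000210")
    Path("asm").mkdir(exist_ok=True)
    Path("asm/crt0.s").write_text("".join(crt0))
    Path("asm/code.s").write_text("".join(code))
```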
$ tools/ADSv1_2/bin/armasm.exe -CPU arm7tdmi -LIttleend -apcs "/interwork" -I asminclude -I include -o build/winxclub/asm/crt0.o asm/crt0.s
"asm/crt0.s", line 2: Error: A1163E: Unknown opcode
2 00000000 arm_func_start start
"asm/crt0.s", line 3: Error: A1167E: Invalid line start
3 00000000 start: @ 0x08000000
"asm/crt0.s", line 4: Error: A1105E: Area directive missing
4 00000000 b _08000100
"asm/crt0.s", line 4: Warning: A1088W: Faking declaration of area AREA |$$$$$$$|
4 00000000 b _08000100
"asm/crt0.s", line 5: Error: A1167E: Invalid line start
5 00000004 _08000004:
"asm/crt0.s", line 6: Error: A1137E: Unexpected characters at end of line
6 00000004 .byte 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A
"asm/crt0.s", line 7: Error: A1137E: Unexpected characters at end of line
7 00000004 .byte 0x84, 0xE4, 0x09, 0xAD, 0x11, 0x24, 0x8B, 0x98, 0xC0, 0x81, 0x7F, 0x21, 0xA3, 0x52, 0xBE, 0x19
"asm/crt0.s", line 8: Error: A1137E: Unexpected characters at end of line
8 00000004 .byte 0x93, 0x09, 0xCE, 0x20, 0x10, 0x46, 0x4A, 0x4A, 0xF8, 0x27, 0x31, 0xEC, 0x58, 0xC7, 0xE8, 0x33
"asm/crt0.s", line 9: Error: A1137E: Unexpected characters at end of line
9 00000004 .byte 0x82, 0xE3, 0xCE, 0xBF, 0x85, 0xF4, 0xDF, 0x94, 0xCE, 0x4B, 0x09, 0xC1, 0x94, 0x56, 0x8A, 0xC0
"asm/crt0.s", line 10: Error: A1137E: Unexpected characters at end of line
10 00000004 .byte 0x13, 0x72, 0xA7, 0xFC, 0x9F, 0x84, 0x4D, 0x73, 0xA3, 0xCA, 0x9A, 0x61, 0x58, 0x97, 0xA3, 0x27
"asm/crt0.s", line 11: Error: A1137E: Unexpected characters at end of line
11 00000004 .byte 0xFC, 0x03, 0x98, 0x76, 0x23, 0x1D, 0xC7, 0x61, 0x03, 0x04, 0xAE, 0x56, 0xBF, 0x38, 0x84, 0x00
"asm/crt0.s", line 12: Error: A1137E: Unexpected characters at end of line
12 00000004 .byte 0x40, 0xA7, 0x0E, 0xFD, 0xFF, 0x52, 0xFE, 0x03, 0x6F, 0x95, 0x30, 0xF1, 0x97, 0xFB, 0xC0, 0x85
"asm/crt0.s", line 13: Error: A1137E: Unexpected characters at end of line
13 00000004 .byte 0x60, 0xD6, 0x80, 0x25, 0xA9, 0x63, 0xBE, 0x03, 0x01, 0x4E, 0x38, 0xE2, 0xF9, 0xA2, 0x34, 0xFF
"asm/crt0.s", line 14: Error: A1137E: Unexpected characters at end of line
14 00000004 .byte 0xBB, 0x3E, 0x03, 0x44, 0x78, 0x00, 0x90, 0xCB, 0x88, 0x11, 0x3A, 0x94, 0x65, 0xC0, 0x7C, 0x63
"asm/crt0.s", line 15: Error: A1137E: Unexpected characters at end of line
15 00000004 .byte 0x87, 0xF0, 0x3C, 0xAF, 0xD6, 0x25, 0xE4, 0x8B, 0x38, 0x0A, 0xAC, 0x72, 0x21, 0xD4, 0xF8, 0x07
"asm/crt0.s", line 16: Error: A1137E: Unexpected characters at end of line
16 00000004 .byte 0x57, 0x49, 0x4E, 0x58, 0x43, 0x4C, 0x55, 0x42, 0x00, 0x00, 0x00, 0x00, 0x42, 0x57, 0x49, 0x45
"asm/crt0.s", line 17: Error: A1137E: Unexpected characters at end of line
17 00000004 .byte 0x41, 0x34, 0x96, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x49, 0x00, 0x00
"asm/crt0.s", line 18: Error: A1137E: Unexpected characters at end of line
18 00000004 .byte 0x0E, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 19: Error: A1137E: Unexpected characters at end of line
19 00000004 .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 20: Error: A1137E: Unexpected characters at end of line
20 00000004 .byte 0x06, 0x00, 0x00, 0xEA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 21: Error: A1137E: Unexpected characters at end of line
21 00000004 .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
"asm/crt0.s", line 22: Error: A1167E: Invalid line start
22 00000004 _08000100:
"asm/crt0.s", line 25: Error: A1154E: Unexpected operand
25 0000000c ldr sp, _0800012C @ =0x03007FA0
"asm/crt0.s", line 28: Error: A1154E: Unexpected operand
28 00000018 ldr sp, _08000128 @ =0x03007F00
"asm/crt0.s", line 29: Error: A1154E: Unexpected operand
29 0000001c ldr r1, _08000130 @ =sub_8000134
"asm/crt0.s", line 33: Error: A1137E: Unexpected characters at end of line
33 0000002c .align 2, 0
"asm/crt0.s", line 34: Error: A1167E: Invalid line start
34 0000002c _08000128: .4byte 0x03007F00
"asm/crt0.s", line 35: Error: A1167E: Invalid line start
35 0000002c _0800012C: .4byte 0x03007FA0
"asm/crt0.s", line 36: Error: A1167E: Invalid line start
36 0000002c _08000130: .4byte sub_8000134
"asm/crt0.s", line 38: Error: A1163E: Unknown opcode
38 0000002c arm_func_start sub_8000134
"asm/crt0.s", line 39: Error: A1167E: Invalid line start
39 0000002c sub_8000134: @ 0x08000134
"asm/crt0.s", line 40: Error: A1154E: Unexpected operand
40 0000002c add r8, pc, #0xC4 @ =0x08000200
"asm/crt0.s", line 41: Error: A1163E: Unknown opcode
41 0000002c ldm r8, {r0, r1}
"asm/crt0.s", line 45: Error: A1167E: Invalid line start
45 00000038 _08000148:
"asm/crt0.s", line 48: Error: A1163E: Unknown opcode
48 00000040 ldm r0!, {r4, r5, r6}
"asm/crt0.s", line 58: Error: A1167E: Invalid line start
58 00000064 _08000178:
"asm/crt0.s", line 60: Error: A1163E: Unknown opcode
60 00000068 ldmhs r4!, {r2, r3, r7, ip}
"asm/crt0.s", line 61: Error: A1163E: Unknown opcode
61 00000068 stmhs r5!, {r2, r3, r7, ip}
"asm/crt0.s", line 63: Error: A1163E: Unknown opcode
63 0000006c lsls r6, r6, #0x1d
"asm/crt0.s", line 64: Error: A1163E: Unknown opcode
64 0000006c ldmhs r4!, {r2, r3}
"asm/crt0.s", line 65: Error: A1163E: Unknown opcode
65 0000006c stmhs r5!, {r2, r3}
"asm/crt0.s", line 69: Error: A1167E: Invalid line start
69 00000078 _080001A0:
"asm/crt0.s", line 79: Error: A1167E: Invalid line start
79 0000009c _080001C4:
"asm/crt0.s", line 82: Error: A1163E: Unknown opcode
82 000000a4 ldm r2!, {r4, r5}
"asm/crt0.s", line 88: Error: A1167E: Invalid line start
88 000000b8 _080001E4:
"asm/crt0.s", line 90: Error: A1163E: Unknown opcode
90 000000bc stmhs r4!, {r0, r6, r7, fp}
"asm/crt0.s", line 92: Error: A1163E: Unknown opcode
92 000000c0 lsls r5, r5, #0x1d
"asm/crt0.s", line 93: Error: A1163E: Unknown opcode
93 000000c0 stmhs r4!, {r0, r6}
"asm/crt0.s", line 96: Error: A1167E: Invalid line start
96 000000c8 _08000200:
"asm/crt0.s", line 97: Error: A1137E: Unexpected characters at end of line
97 000000c8 .byte 0x78, 0x15, 0x05, 0x00, 0xA8, 0x15, 0x05, 0x00, 0xA8, 0x15, 0x05, 0x00, 0xC8, 0x15, 0x05, 0x00
"asm/crt0.s", line 98: Warning: A1313W: Missing END directive at end of file
98 000000c8
49 Errors, 2 Warnings
make: *** [Makefile:209: build/winxclub/asm/crt0.o] Error 1
Oh no. Let’s break down all of these errors.
“asm/crt0.s”, line 2: Error: A1163E: Unknown opcode
2 00000000 arm_func_start start
arm_func_start is a macro, not an instruction, so it looks like we’re missing an include for some macros.
“asm/crt0.s”, line 3: Error: A1167E: Invalid line start
3 00000000 start: @ 0x08000000
A GNU-to-ADS difference: labels don’t end with : in ADS.
“asm/crt0.s”, line 4: Error: A1105E: Area directive missing
4 00000000 b _08000100
Looks like we need to include an ` AREA text, CODE` directive. The leading whitespace matters: armasm treats anything starting in the first column as a label.
“asm/crt0.s”, line 6: Error: A1137E: Unexpected characters at end of line
6 00000004 .byte 0x24, 0xFF, 0xAE, 0x51, 0x69, 0x9A, 0xA2, 0x21, 0x3D, 0x84, 0x82, 0x0A
As mentioned earlier, GNU’s dot directives (.byte, .4byte, .align) don’t exist in ADS. We’ll need to change .byte to DCB.
“asm/crt0.s”, line 25: Error: A1154E: Unexpected operand
25 0000000c ldr sp, _0800012C @ =0x03007FA0
Looks like ADS doesn’t use @ for comments, so we’ll need to replace them with ;.
“asm/crt0.s”, line 41: Error: A1163E: Unknown opcode
41 0000002c ldm r8, {r0, r1}
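Unknown opcode is going to be a recurring theme here. ADS v1.2 predates ARM’s Unified Assembler Language (UAL), so it expects the older syntax: ldm has to be written LDMIA, ldrblo becomes LDRCCB, and an ARM-mode lsls becomes a MOVS with an LSL shift.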
“asm/crt0.s”, line 98: Warning: A1313W: Missing END directive at end of file
98 000000c8
We’ve just got to add END at the end of each file, no big deal.
Resolving these errors
First, we’ve got to add a file containing the macro definitions. Here’s the macro file from FireRed, ported over to ADS:
        MACRO
$l      arm_func_start $name
        ALIGN 4                 ; GAS ".align 2" means 2^2 = 4 bytes; armasm ALIGN takes bytes
        GLOBAL $name
        CODE32
        MEND

        MACRO
$l      arm_func_end $name
        SIZE $name, .-$name
        MEND

        MACRO
$l      thumb_func_start $name
        ALIGN 4                 ; word-align, as GAS ".align 2, 0" does
        GLOBAL $name
        CODE16
        MEND

        MACRO
$l      non_word_aligned_thumb_func_start $name
        GLOBAL $name
        CODE16
        MEND

        MACRO
$l      thumb_func_end $name
        SIZE $name, .-$name
        MEND

        MACRO
$l      mov32 $reg, $addr
        IF $addr == 0x08800000
        movs $reg, #0x88
        lsls $reg, $reg, #0x14  ; 0x88 << 20 == 0x08800000
        ELSE
        ldr $reg, =#$addr
        ENDIF
        MEND

        END
We can now just include this at the beginning of our .s files:
    INCLUDE asm/macros.inc
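Every file also needs an AREA directive at the top and END at the bottom, per the error breakdown above, so it’s convenient to add all three at once. A minimal sketch (the helper names are mine; the AREA line is the one suggested earlier). The directives are indented because armasm treats anything starting in column 1 as a label:

```python
from pathlib import Path

HEADER = [
    "    INCLUDE asm/macros.inc",
    "    AREA text, CODE",
]
FOOTER = ["    END"]

def wrap_for_armasm(body: str) -> str:
    """Prepend the macro include and AREA directive, append END (idempotent)."""
    lines = body.rstrip("\n").splitlines()
    if lines[:len(HEADER)] != HEADER:
        lines = HEADER + lines
    if lines[-len(FOOTER):] != FOOTER:
        lines = lines + FOOTER
    return "\n".join(lines) + "\n"

def wrap_file(path: str) -> None:
    """Rewrite an .s file in place with the ADS header and footer."""
    p = Path(path)
    p.write_text(wrap_for_armasm(p.read_text()))
```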
In hindsight, I absolutely should’ve modified gbadisasm for this. I wrote a series of scripts to fix these issues up, but I think it would be worthwhile to fork gbadisasm instead and fix the core issue. As an example, here’s the script I wrote to fix our opcodes:
import re
import fileinput

# Mode-specific rewrites: GNU/UAL spelling (regex) -> ADS (pre-UAL) spelling
replacements = {
    "arm": {
        "ldrblo": "LDRCCB",
        "ldrbhs": "LDRCSB",
        "ldm ": "LDMIA ",
        "stm ": "STMIA ",
        "stmhs ": "STMHSIA ",
        "stmmi ": "STMMIIA ",
        "ldmhs ": "LDMHSIA ",
        "ldmmi ": "LDMMIIA ",
        # rrx is an alias for mov with an RRX shift; keep the shift so the encoding matches
        "rrxne r0, r0": "MOVNE r0, r0, RRX",
        r"subs(\w\w) ": r"sub\1s ",  # e.g. subshs -> subhss (condition before S)
        ".align 2, 0": "ALIGN",
    },
    "thumb": {
        r"rsbs r(\d), r(\d), #0": r"NEGS r\1, r\2",
        "svc #": "SWI ",
        ".align 2, 0": "ALIGN",
        r"muls(.*), r\d": r"muls\1",  # drop the repeated destination operand
    },
}

# Rewrites applied to every line regardless of mode
rep_all = {
    " @": " ;@",      # GNU comments -> ADS comments
    ":": "",          # ADS labels have no trailing colon
    ".4byte": "DCDU",
    ".byte": "DCB",
}

condition_translation = {
    "lo": "cc",
    "hs": "cs",
}


def handle_pop_push(line):
    """ARM-mode push/pop -> STM{cond}FD / LDM{cond}FD SP!."""
    extract = r"(pop|push)(\w\w)? (.*)"
    result = re.search(extract, line)
    if not result:
        return line
    op, condition, end = result.groups()
    prefix = "LDM" if op == 'pop' else "STM"
    return f"\t{prefix}{condition.upper() if condition else ''}FD SP!, {end}"


def handle_log_shift(line):
    """ARM-mode lsl/lsr/asr -> MOV{cond}{S} rd, rm, <shift> <amount>."""
    extract = r"(l|a)s(r|l)([bs]?)(\w\w)? (\S\S), (\S\S), (#?.*)"
    result = re.search(extract, line)
    if not result:
        return line
    t, direction, b, condition, r1, r2, op = result.groups()
    if condition and condition in condition_translation:
        condition = condition_translation[condition]
    return (f"\tMOV{condition.upper() if condition else ''}{b.upper() if b else ''} "
            f"{r1}, {r2}, {t.upper()}S{direction.upper()} {op}")


def handle_ldr_str(line):
    """ARM-mode ldr/str: move the condition code ahead of the size suffix."""
    extract = r"(ldr|str)(b?)(\w\w)?(\w\w)? (.*)"
    result = re.search(extract, line)
    if not result:
        return line
    op, b, sh, condition, end = result.groups()
    return (f"\t{op.upper() if op else ''}{condition.upper() if condition else ''}"
            f"{sh.upper() if sh else ''}{b.upper() if b else ''} {end}")


current_mode = None
# Rewrites the files named on the command line in place, keeping a .bak copy
for line in fileinput.input(inplace=True, backup='.bak'):
    if 'arm_' in line:
        current_mode = "arm"
    elif 'thumb_' in line:
        current_mode = "thumb"
    for x in rep_all:
        line = re.sub(x, rep_all[x], line)
    # Instructions and directives are indented (tab or spaces)
    if line.startswith((' ', '\t')):
        if current_mode is None:
            print(line.rstrip())
            continue
        for pattern in replacements[current_mode]:
            line = re.sub(pattern, replacements[current_mode][pattern], line)
        if current_mode == 'arm':
            line = handle_pop_push(line)
            line = handle_log_shift(line)
            line = handle_ldr_str(line)
        print(line.rstrip())
    else:
        print(line.rstrip())
To summarize all of the changes we have to make to our .s files:
- Include a file of macros for function definitions
- Add an AREA directive at the top and END at the bottom
- Replace all @ comments with ;
- Remove the : at the end of label definitions
- Replace .align with ALIGN
- Replace .4byte with DCDU
- Replace .byte with DCB
- Replace a bunch of opcodes with their ADS equivalents
- Change the lo and hs conditions to cc and cs
While yes, our scripts can handle this, it’s exceptionally brittle. Every time we export from .cfg to .s, we’re going to have to remember to run our scripts. We could make a pipeline for it, but that’s really a band-aid over the core problem. Regardless, with enough massaging, we can get a matching ROM created.
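Whatever the pipeline ends up looking like, its last step is always the same question: does the output match? Here’s a small comparison sketch that prints each file’s SHA-1 and the first differing address, which is usually enough to find the offending instruction in IDA (the helper names are mine):

```python
import hashlib
from typing import Optional

ROM_BASE = 0x08000000

def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the first offset where a and b differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One file is a prefix of the other: they diverge where the shorter ends
        return min(len(a), len(b))
    return None

def compare_roms(base_path: str, built_path: str) -> None:
    with open(base_path, "rb") as f:
        base = f.read()
    with open(built_path, "rb") as f:
        built = f.read()
    print("base :", hashlib.sha1(base).hexdigest())
    print("built:", hashlib.sha1(built).hexdigest())
    offset = first_mismatch(base, built)
    if offset is None:
        print("ROMs match")
    else:
        print(f"first difference at offset 0x{offset:X} (address 0x{ROM_BASE + offset:08X})")
```

For example, compare_roms("baserom.gba", "build/winxclub.gba"), where the built ROM’s path is a placeholder that depends on your Makefile.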
Conclusion
With a bit of jank, we were able to take our .gba file and extract a ton of code. We’re also able to assemble it back into a matching ROM.
We’re still left with quite a few extra blocks of data that we’re going to need to sort out. Plus we need to set up our compilation process.