Misc Winx Cleanup
Matt Hurd / April 2023 (1499 Words, 9 Minutes)
A bit of time has passed, and I’ve forgotten to document my exact process for some cleanup tasks. I’m going to try to rapid-fire some of them here. They may not be in the order I actually did them, but hopefully this can provide some guidance for massaging gbadisasm.
Cleaning up literal pools
Now that most of the code is being extracted, it’ll drastically improve readability if we define our literal pools correctly. Literal pools are small blocks of constants that the compiler places at the end of functions. These will often include addresses, so it’s good to use .4byte or DCDU for them.
def handle_found(start_address, end_address):
    # Both arguments are hex strings taken from label names.
    start = int(start_address, 16)
    end = int(end_address, 16)
    # Each literal pool entry is one 4-byte word.
    count = (end - start) // 4
    print(f"pool_label {hex(start)} {count}")

start_address = None
in_pool = False

with open("code.s", 'r') as f:
    for line in f:
        if line.startswith('_'):
            # Local label such as _0800E920: take the 8 hex digits after '_'.
            if in_pool:
                in_pool = False
                handle_found(start_address, line[1:9])
            start_address = line[1:9]
        elif line.startswith('sub_'):
            # Function label: take the 8 hex digits after 'sub_'.
            if in_pool:
                in_pool = False
                handle_found(start_address, line[4:12])
            start_address = line[4:12]
        elif line.startswith(' .byte ') and len(line) > 20:
            # A long row of raw bytes after a label is treated as a pool.
            in_pool = True
This code looks for sections at the end of functions that have a series of byte declarations. Defining them in the .cfg will make sure that they’re actually written as .4byte, so we can extract addresses from them later.
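To make the difference concrete, here is a minimal sketch of how four consecutive bytes from a .byte row combine into one little-endian word, which is what a .4byte entry shows directly. The helper name `bytes_to_words` and the sample bytes are made up for illustration; this is not part of gbadisasm.

```python
import struct

def bytes_to_words(raw):
    """Decode raw pool bytes into a list of little-endian 32-bit words."""
    if len(raw) % 4 != 0:
        raise ValueError("pool length must be a multiple of 4")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw))

# Hypothetical pool contents, as they would appear in a row of .byte values.
pool = bytes([0x38, 0x34, 0x00, 0x03, 0x00, 0x02, 0x00, 0x04])

for word in bytes_to_words(pool):
    print(f"\t.4byte 0x{word:08X}")
```

Once the words are visible as whole values, it becomes much easier to spot the ones that are addresses.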
Splitting out data
I’d known for a while that there was a big block of data in my code. This isn’t a problem, but it’s much better to actually extract the data into binary files that can be referenced from code. This data block goes from 0x0803E2A0 to 0x0803EF1C. I’d initially thought it was larger, but it’s only about 3196 bytes long. Still, that’s long enough that it makes sense to extract it.
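As a quick sanity check on those numbers, the size of the block and its offset within the ROM file can be computed directly. This is only a sketch: the constant and function names are mine, and it assumes the ROM is mapped at 0x08000000, which is the base the extraction script in this section subtracts.

```python
ROM_BASE = 0x08000000    # address where the cartridge ROM is mapped
DATA_START = 0x0803E2A0
DATA_END = 0x0803EF1C    # treated as exclusive

def rom_offset(address):
    """Convert a ROM address into a byte offset within the .gba file."""
    return address - ROM_BASE

size = DATA_END - DATA_START
print(size)                          # 3196
print(hex(rom_offset(DATA_START)))   # 0x3e2a0
```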
First, I wanted to find all of the potential files. We can do this by looking for all references to data within this range.
import re

# Collect every address in the data block that is referenced from code.
used = set()
for name in ["code.s"]:
    with open(f"./{name}", 'r') as f:
        for line in f:
            if '_0' in line:
                for m in re.finditer('_0', line):
                    pos = m.start()
                    # Skip label definitions at the start of a line.
                    if pos == 0:
                        continue
                    val = int(line[pos + 1: pos + 9], 16)
                    # The end address is exclusive, so no empty file is produced.
                    if val >= 0x0803E2A0 and val < 0x0803EF1C:
                        used.add(val)

addrs = sorted(used)
print(" AREA data, DATA")
with open("repo/baserom.gba", mode='rb') as baserom:
    for idx, x in enumerate(addrs):
        # Each chunk runs up to the next referenced address or the block end.
        if idx == len(addrs) - 1:
            end = 0x0803EF1C
        else:
            end = addrs[idx + 1]
        length = end - x
        out_filename = f"{x - 0x08000000:#010X}.bin"
        print(f" GLOBAL gUnknown_{x:08X}")
        print(f"gUnknown_{x:08X}")
        print(f" INCBIN data/ripped/{out_filename}")
        # ROM addresses start at 0x08000000, so subtract that for the file offset.
        baserom.seek(x - 0x08000000)
        with open(f"repo/data/ripped/{out_filename}", 'wb') as output_file:
            output_file.write(baserom.read(length))
print(" END")
print()
This prints the contents of a data.s file that we can use. It also extracts the binary data from our ROM, creating a .bin file for each of the referenced addresses.
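The splitting rule itself is simple enough to isolate: each chunk starts at a referenced address and runs until the next referenced address, or until the end of the block for the last one. Here is a minimal sketch of just that step, with a hypothetical helper `split_ranges` and made-up reference addresses.

```python
def split_ranges(addresses, region_end):
    """Return (start, length) pairs covering from each address to the next."""
    starts = sorted(set(addresses))
    ranges = []
    for idx, start in enumerate(starts):
        # The last chunk is bounded by the end of the region.
        end = starts[idx + 1] if idx + 1 < len(starts) else region_end
        ranges.append((start, end - start))
    return ranges

# Hypothetical referenced addresses inside the data block.
refs = [0x0803E300, 0x0803E2A0, 0x0803EE00, 0x0803E300]
for start, length in split_ranges(refs, 0x0803EF1C):
    print(f"{start:08X} {length:#x}")
```

One thing to keep in mind with this approach: any bytes before the first referenced address are not covered by any chunk, so it is worth checking that the lowest reference is the start of the block.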
We also need to update our code to use the new names. Note: this code updates the file in place. I used a few different techniques throughout these scripts…
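import re
import fileinput

# With inplace=True, anything printed is written back into the input file.
for line in fileinput.input(inplace=True, backup='.bak'):
    # Collect the values first so replacements don't shift match positions.
    found = set()
    for m in re.finditer('_0', line):
        pos = m.start()
        # Skip label definitions at the start of a line.
        if pos == 0:
            continue
        val = int(line[pos + 1: pos + 9], 16)
        if val >= 0x0803E2A0 and val < 0x0803EF1C:
            found.add(val)
    for val in found:
        line = line.replace(f"_{val:08X}", f"gUnknown_{val:08X}")
    print(line.rstrip())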
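If you would rather test the renaming on strings before touching any files, the same idea can be written as a pure function with `re.sub` and a callback. This is an alternative sketch rather than what I ran: the `LABEL` pattern and `rename_data_labels` are names I made up, and the pattern assumes labels are an underscore followed by exactly eight hex digits.

```python
import re

DATA_START = 0x0803E2A0
DATA_END = 0x0803EF1C  # exclusive

# '_' plus eight hex digits, not preceded by a letter or digit, so that
# sub_XXXXXXXX and already-renamed gUnknown_XXXXXXXX labels are left alone.
LABEL = re.compile(r"(?<![0-9A-Za-z])_([0-9A-Fa-f]{8})(?![0-9A-Fa-f])")

def rename_data_labels(line):
    """Rename references to data-block labels; leave everything else as is."""
    def repl(match):
        if match.start() == 0:
            return match.group(0)  # label definition at the start of the line
        value = int(match.group(1), 16)
        if DATA_START <= value < DATA_END:
            return f"gUnknown_{value:08X}"
        return match.group(0)
    return LABEL.sub(repl, line)

print(rename_data_labels("_0800E920: .4byte _0803E2A0"))
```

Because already-renamed labels no longer match the pattern, running this twice over the same line gives the same result as running it once.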
Finding IWRAM addresses
IWRAM is the Internal Working RAM. Its key properties are that it is 32 KB in size and sits on a 32-bit bus. In practice, this is the RAM that we’ll see absolute pointers to. For example, here’s a random literal pool:
_0800E920: .4byte 0x04000200
_0800E924: .4byte 0x03003438
_0800E928: .4byte 0x03003E94
_0800E92C: .4byte 0x03003EB8
_0800E930: .4byte 0x03003EA4
_0800E934: .4byte 0x03007FF8
_0800E938: .4byte 0x03003E98
All of these 0x0300xxxx addresses point into IWRAM. Extracting them out into their own labels will drastically improve readability, and help us document code later on. Thankfully, we can reuse a good portion of our data splitting code above:
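import re
import fileinput

IWRAM_START = 0x03000000
IWRAM_END = 0x03008000  # exclusive; IWRAM is 32 KB

# A set avoids duplicate labels when an address is referenced more than once.
iwram_addresses = set()
for line in fileinput.input(inplace=True, backup='.bak'):
    # Collect the values first so replacements don't shift match positions.
    found = set()
    for m in re.finditer('_0', line):
        pos = m.start()
        if pos == 0:
            continue
        val = int(line[pos + 1: pos + 9], 16)
        if IWRAM_START <= val < IWRAM_END:
            found.add(val)
    for val in found:
        line = line.replace(f"_{val:08X}", f"gUnknown_{val:08X}")
    iwram_addresses |= found
    print(line.rstrip())

# fileinput has restored stdout by now, so this prints the iwram.s contents.
prev = None
print(" AREA data, DATA")
# Sort so that the gap between consecutive labels is never negative.
for addr in sorted(iwram_addresses):
    if prev is not None:
        # Each label reserves one byte (DCB), so pad the rest of the gap.
        print(f"\tSPACE {addr-prev-1:#x}")
    print()
    print(f"\tGLOBAL gUnknown_{addr:08X}")
    print(f"gUnknown_{addr:08X}")
    print(f"\tDCB 0x00")
    prev = addr
print(" END")
print()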
This updates our code to use labels instead of absolute values, and outputs an iwram.s file.
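As a quick check on the example pool from earlier in this section, here is a small sketch that filters pool values down to the ones that actually fall inside IWRAM. The names `is_iwram`, `IWRAM_START`, and `IWRAM_SIZE` are mine; the range follows from the 32 KB size starting at 0x03000000.

```python
IWRAM_START = 0x03000000
IWRAM_SIZE = 32 * 1024  # 32 KB, so the last valid address is 0x03007FFF

def is_iwram(address):
    """Return True if the address falls inside internal working RAM."""
    return IWRAM_START <= address < IWRAM_START + IWRAM_SIZE

# The values from the example literal pool.
pool = [
    0x04000200, 0x03003438, 0x03003E94, 0x03003EB8,
    0x03003EA4, 0x03007FF8, 0x03003E98,
]

labels = [f"gUnknown_{addr:08X}" for addr in sorted(set(pool)) if is_iwram(addr)]
for label in labels:
    print(label)
```

The first entry, 0x04000200, is filtered out: it sits in the I/O register area rather than IWRAM, so it should not get an IWRAM label.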