Misc Winx Cleanup

decomp adsv1.2

A bit of time has passed, and I’ve forgotten to document my exact process for some cleanup tasks. I’m going to try to rapid-fire some of them here. They may not be in the order I actually did them, but hopefully this can provide some guidance for massaging gbadisasm.

Cleaning up literal pools

Now that most of the code is being extracted, it’ll drastically improve readability if we can define our literal pools correctly. Literal pools are small blocks of constants that the compiler places at the end of functions. These will often include addresses, so it’s good to use .4byte or DCDU for them.

def handle_found(start_address, end_address):
	# Addresses arrive as hex strings sliced from the labels on either side of the pool.
	start = int(start_address, 16)
	end = int(end_address, 16)
	# Each pool entry is one 4-byte word.
	count = (end - start) // 4
	print(f"pool_label {hex(start)} {count}")

start_address = None
in_pool = False
with open("code.s", 'r') as f:
	for line in f:
		if line.startswith('_'):
			# Local label: line[2:9] takes the 7 characters after the "_0" prefix.
			if in_pool:
				in_pool = False
				handle_found(start_address, line[2:9])
			start_address = line[2:9]
		elif line.startswith('sub_'):
			# Function label: line[4:11] takes the 7 characters after "sub_".
			if in_pool:
				in_pool = False
				handle_found(start_address, line[4:11])
			start_address = line[4:11]
		elif line.startswith('\t.byte ') and len(line) > 20:
			# A long .byte line following a label is treated as pool data.
			in_pool = True

This code looks for sections at the end of functions that have a series of byte declarations. Defining them in the .cfg will make sure that they’re actually written as .4byte, so we can extract addresses from them later.
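As a rough illustration of why the word form is more useful than raw bytes: the GBA's CPU is little-endian, so four consecutive .byte values form one 32-bit word with the first byte as the low byte. This is only a sketch and not part of gbadisasm; the helper name bytes_to_word_directive is made up for the example.

```python
import struct

def bytes_to_word_directive(raw: bytes) -> str:
    """Format 4 little-endian bytes as a single .4byte directive."""
    if len(raw) != 4:
        raise ValueError("a literal pool entry is exactly 4 bytes")
    # "<I" = little-endian unsigned 32-bit integer
    (value,) = struct.unpack("<I", raw)
    return f".4byte 0x{value:08X}"

# The bytes 38 34 00 03, in ROM order, are the word 0x03003438.
print(bytes_to_word_directive(bytes([0x38, 0x34, 0x00, 0x03])))
```

Once the value is printed as a whole word, it is easy to recognise as an address and to search for later.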

Splitting out data

I’d known for a while that there was a big block of data in my code. This isn’t a problem, but it’s much better to actually extract the data into binary files that can be referenced from code. This data block goes from 0x0803E2A0 to 0x0803EF1C. I’d initially thought it was larger, but it’s only 3196 bytes long. That’s still long enough that it makes sense to extract it.
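The size and the matching offsets into the ROM file follow directly from those two addresses. A small sketch of the arithmetic, assuming the end address is exclusive and that the ROM image is mapped at 0x08000000 (the names ROM_BASE and rom_offset are mine):

```python
ROM_BASE = 0x08000000  # GBA cartridge ROM is mapped starting here

DATA_START = 0x0803E2A0
DATA_END = 0x0803EF1C  # exclusive: first address after the block

def rom_offset(address: int) -> int:
    """Convert a ROM address to a byte offset into the ROM image."""
    return address - ROM_BASE

size = DATA_END - DATA_START
print(f"size: {size} bytes ({size:#x})")
print(f"file offsets: {rom_offset(DATA_START):#x} to {rom_offset(DATA_END):#x}")
```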

First, I wanted to find all of the potential files. We can do this by looking for all references to data within this range.

import re

# The data block. DATA_END is exclusive, matching the rename script below.
DATA_START = 0x0803E2A0
DATA_END = 0x0803EF1C
ROM_BASE = 0x08000000

used = set()

for name in ["code.s"]:
    with open(f"./{name}", 'r') as f:
        for line in f:
            for m in re.finditer('_0', line):
                pos = m.start()
                # Skip label definitions at the start of a line.
                if pos == 0:
                    continue
                val = int(line[pos + 1: pos + 9], 16)
                if DATA_START <= val < DATA_END:
                    used.add(val)
addrs = sorted(used)

print("\tAREA data, DATA")
with open("repo/baserom.gba", mode='rb') as baserom:
    for idx, x in enumerate(addrs):
        # Each file runs up to the next referenced address, or to the end of the block.
        if idx == len(addrs) - 1:
            end = DATA_END
        else:
            end = addrs[idx + 1]
        length = end - x
        out_filename = f"{x - ROM_BASE:#010X}.bin"
        print(f"    GLOBAL gUnknown_{x:08X}")
        print(f"gUnknown_{x:08X}")
        print(f"    INCBIN data/ripped/{out_filename}")
        baserom.seek(x - ROM_BASE)
        with open(f"repo/data/ripped/{out_filename}", 'wb') as output_file:
            output_file.write(baserom.read(length))
print("\tEND")
print()

This prints the contents of a data.s file that we can use, and also extracts the binary files from our ROM: it writes one .bin file for each of the referenced addresses.
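To make the naming scheme concrete, here is a minimal sketch that builds the three data.s lines for a single address using the same format strings as the script. The function name data_entry is hypothetical.

```python
ROM_BASE = 0x08000000

def data_entry(address: int) -> str:
    """Build the three data.s lines emitted for one referenced address."""
    # "#010X" pads to 10 characters including the prefix, and the
    # uppercase X also uppercases the prefix ("0X", not "0x").
    out_filename = f"{address - ROM_BASE:#010X}.bin"
    return "\n".join([
        f"    GLOBAL gUnknown_{address:08X}",
        f"gUnknown_{address:08X}",
        f"    INCBIN data/ripped/{out_filename}",
    ])

print(data_entry(0x0803E2A0))
```

Note that the labels use the full ROM address while the file names use the offset into the ROM, so gUnknown_0803E2A0 ends up backed by 0X0003E2A0.bin.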

We also need to update our code to use the new names. Note: This code updates the file in place, keeping the original as a .bak copy. I used a few different techniques throughout these scripts…

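import re
import fileinput

DATA_START = 0x0803E2A0
DATA_END = 0x0803EF1C

# An 8-digit address label such as _0803E2A0. It must follow a non-identifier
# character, which skips label definitions at column 0 and names that already
# carry a prefix (so re-running the script does not rename twice).
LABEL = re.compile(r'(?<=\W)_(0[0-9A-Fa-f]{7})')

def rename(m):
    val = int(m.group(1), 16)
    if DATA_START <= val < DATA_END:
        return f"gUnknown_{val:08X}"
    return m.group(0)

# With inplace=True, print() writes into the file currently being read.
# Substituting per match avoids reusing match positions after the line changed.
for line in fileinput.input(inplace=True, backup='.bak'):
    print(LABEL.sub(rename, line.rstrip()))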

Finding IWRAM addresses

IWRAM is the Internal Working RAM. Its key properties are that it is 32 KB in size and sits on a 32-bit bus. In practice, this is the RAM that we’ll see absolute pointers to. For example, here’s a random literal pool:

_0800E920: .4byte 0x04000200
_0800E924: .4byte 0x03003438
_0800E928: .4byte 0x03003E94
_0800E92C: .4byte 0x03003EB8
_0800E930: .4byte 0x03003EA4
_0800E934: .4byte 0x03007FF8
_0800E938: .4byte 0x03003E98
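As a quick way to sort values like these by region, here is a small sketch based on the standard GBA memory map. The table and the region_of helper are my own illustration, not something the disassembler provides.

```python
# Address ranges from the GBA memory map (end addresses are inclusive).
REGIONS = [
    ("EWRAM", 0x02000000, 0x0203FFFF),
    ("IWRAM", 0x03000000, 0x03007FFF),
    ("I/O registers", 0x04000000, 0x040003FE),
    ("ROM", 0x08000000, 0x09FFFFFF),
]

def region_of(value: int) -> str:
    """Return the name of the region containing value, or "unknown"."""
    for name, start, end in REGIONS:
        if start <= value <= end:
            return name
    return "unknown"

pool = [0x04000200, 0x03003438, 0x03003E94, 0x03003EB8,
        0x03003EA4, 0x03007FF8, 0x03003E98]
for value in pool:
    print(f"0x{value:08X}: {region_of(value)}")
```

The first entry, 0x04000200, lands in the I/O register range (it is the interrupt enable register); the other six all fall inside the 32 KB of IWRAM.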

All of these 0x0300xxxx addresses point into IWRAM. Extracting them out into their own labels will drastically improve readability, and help us document code later on. Thankfully, we can reuse a good portion of our data splitting code above:

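import re
import fileinput

# IWRAM spans 0x03000000-0x03007FFF, so the exclusive end is 0x03008000.
IWRAM_START = 0x03000000
IWRAM_END = 0x03008000

# An 8-digit address label that follows a non-identifier character.
LABEL = re.compile(r'(?<=\W)_(0[0-9A-Fa-f]{7})')

iwram_addresses = set()

def rename(m):
    val = int(m.group(1), 16)
    if IWRAM_START <= val < IWRAM_END:
        iwram_addresses.add(val)
        return f"gUnknown_{val:08X}"
    return m.group(0)

for line in fileinput.input(inplace=True, backup='.bak'):
    print(LABEL.sub(rename, line.rstrip()))

# fileinput restores stdout once the loop ends, so the rest is printed normally.
# Sort and de-duplicate so the gaps between labels are never negative.
prev = None
print("\tAREA data, DATA")
for addr in sorted(iwram_addresses):
    if prev is not None:
        # Each label reserves one byte (DCB), so pad the rest of the gap.
        print(f"\tSPACE {addr-prev-1:#x}")
        print()
    print(f"\tGLOBAL gUnknown_{addr:08X}")
    print(f"gUnknown_{addr:08X}")
    print("\tDCB 0x00")
    prev = addr
print("\tEND")
print()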

This updates our code to use labels instead of absolute values, and outputs an iwram.s file.
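The SPACE values in that file are just the gaps between consecutive labels, minus the single byte each label reserves with DCB. A minimal sketch of that arithmetic, assuming the addresses are sorted and de-duplicated first (iwram_layout is a hypothetical helper name):

```python
def iwram_layout(addresses):
    """Yield (address, gap) pairs, where gap is the SPACE size needed
    before the label when every label reserves a single byte."""
    prev = None
    for addr in sorted(set(addresses)):
        gap = 0 if prev is None else addr - prev - 1
        yield addr, gap
        prev = addr

for addr, gap in iwram_layout([0x03003E94, 0x03003438, 0x03003E98]):
    print(f"gUnknown_{addr:08X}: SPACE {gap:#x} before it")
```

For example, 0x03003E94 follows 0x03003438 by 0xA5C bytes, so after the one reserved byte the padding is 0xA5B.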

© 2024 Matt Hurd   •  Theme  Moonwalk