Let The Decomp Begin

Abahbob / April 2023 (2741 Words, 16 Minutes)

decomp adsv1.2

So we have a build pipeline, and we have our code split out into files. Let’s see if we can’t get a single matching function to celebrate.

Setting up the compiler

Determining the compiler

I’ve realized that I never actually talked about how to identify which compiler was used. I already know that Winx Club was compiled using ADSv1.2, but how do we figure that out in the first place? Introducing: Signatures. IDA has a great technology called FLIRT which lets people create signatures for identifying common library functions. Nintendo released a GBA SDK which included a common core library, and thankfully laqieer’s ida_gba_stuff repo includes signatures for it. We can apply these in IDA and see how many matches we get.

After applying these signatures in IDA, we see that there are very few matches. Since these signatures were made for GCC, that pretty much rules out GCC. I’m honestly not sure how to identify the ADS version outside of actually trying to match code. When trying out ADSv1.0.1 from archive.org, I kept hitting the same mismatch: two opcodes that the compiler always emitted differently from the ROM. Swapping to ADSv1.2 solved it perfectly. Hopefully there’s a better way to identify the version in the future.

Determining compiler flags

Let’s start with the most glaring flags: Optimization. According to the ADS docs, this is what each of the optimization levels does:

-O0 Turns off all optimization, except some simple source transformations. This is the default optimization level if debug tables are generated with -g. It gives the best possible debug view and the lowest level of optimization.
-O1 Turns off optimizations that seriously degrade the debug view. If used with -g, this option gives a satisfactory debug view with good code density.
-O2 Generates fully optimized code. If used with -g, the debug view might be less satisfactory because the mapping of object code to source code is not always clear. This is the default optimization level if debug tables are not generated.

-Ospace This option optimizes to reduce image size at the expense of a possible increase in execution time. For example, large structure copies are done by out-of-line function calls instead of inline code. Use this option if code size is more critical than performance. This is the default.
-Otime This option optimizes to reduce execution time at the possible expense of a larger image. Use this option if execution time is more critical than code size. For example, it compiles:
    while (expression) body;
as:
    if (expression) {
        do body;
        while (expression);
    }
If you specify neither -Otime or -Ospace, the compiler uses -Ospace. You can compile time-critical parts of your code with -Otime, and the rest with -Ospace. You must not specify both -Otime and -Ospace in the same compiler invocation.
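
To make the -Otime example concrete, here is a minimal C sketch of the same transformation written out by hand. The function names are made up for illustration and nothing here comes from the game. Both forms compute the same result. The rotated form typically costs one branch per iteration (the conditional branch at the bottom) instead of a conditional branch at the top plus a jump back, at the price of duplicating the test.

```c
#include <stddef.h>

/* Straightforward form: the condition is tested at the top of every
   iteration. (Hypothetical example function.) */
size_t length_while(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Rotated form, following the shape the docs give for -Otime: one guard
   test up front, then the condition is tested at the bottom of the loop. */
size_t length_rotated(const char *s)
{
    size_t n = 0;
    if (s[n] != '\0') {
        do
            n++;
        while (s[n] != '\0');
    }
    return n;
}
```

This is worth keeping in mind when matching: if the ROM has the guard-plus-do-while shape for something that reads naturally as a while loop, it may be a hint about -Otime rather than about how the original source was written.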

The most notorious case of -O0 being used is in Super Mario 64. MVG did an amazing video covering it. When in doubt, assume -O2. It’ll be pretty clear once we actually start trying to match code if this is correct. There are also a handful of other optimization flags, but we’ll only have to look into them if we’re having issues with matching.

Picking our first file

To start, I wanted to try a file that was actually working towards my goal of understanding some enemy behaviors. In hindsight, I’m pretty sure this file doesn’t do what I thought it did, but oh well. We’re starting with split_80239EC.s. We’ll be starting with the simplest function here, named ModifyPlayerHealth.

	non_word_aligned_thumb_func_start ModifyPlayerHealth
ModifyPlayerHealth ;@ 0x08023A3A
	push {r3, lr}
	ldrb r2, [r0]
	adds r2, r2, r1
	cmp r1, #0
	bge _08023A4A
	ldrb r3, [r0, #0xe]
	cmp r3, #0
	bne _08023A6A
_08023A4A
	cmp r2, #0
	bge _08023A54
	movs r2, #0
	strb r2, [r0]
	b _08023A60
_08023A54
	ldrb r3, [r0, #3]
	cmp r3, r2
	bge _08023A5E
	strb r3, [r0]
	b _08023A60
_08023A5E
	strb r2, [r0]
_08023A60
	ldr r0, _08023CF0 ;@ =0x030034F8
	lsrs r1, r1, #0x1f
	ldr r0, [r0]
	bl sub_80244C6
_08023A6A
	add sp, #4
	pop {r3}
	bx r3

IDA’s initial pseudocode output gives us this:

unsigned __int8 *__fastcall sub_8023A3A(unsigned __int8 *result, int a2)
{
  int v2; // r2
  int v3; // r3

  v2 = *result + a2;
  if ( a2 >= 0 || !result[14] )
  {
    if ( v2 >= 0 )
    {
      v3 = result[3];
      if ( v3 >= v2 )
        *result = v2;
      else
        *result = v3;
    }
    else
    {
      *result = 0;
    }
    return (unsigned __int8 *)sub_80244C6(dword_30034F8, (unsigned int)a2 >> 31);
  }
  return result;
}
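
One detail in that pseudocode is worth spelling out. The second argument of the call, (unsigned int)a2 >> 31, is IDA’s rendering of lsrs r1, r1, #0x1f: a logical shift right by 31 throws away everything except the sign bit. Here is a minimal sketch of the idiom, assuming a 32-bit int as on the GBA. The helper name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when delta is negative and 0 otherwise, by shifting the sign
   bit down to bit 0. Converting to an unsigned type first makes this a
   logical shift, which is well defined in C. (Hypothetical helper name.) */
unsigned int is_negative(int32_t delta)
{
    return (uint32_t)delta >> 31;
}
```

So whatever sub_80244C6 does, its second argument is effectively a flag saying whether the change passed in was negative.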

Running this through tcc, here’s a comparison of the output.

ROM:

ModifyPlayerHealth ;@ 0x08023A3A
        push {r3, lr}
        ldrb r2, [r0]
        adds r2, r2, r1
        cmp r1, #0
        bge _08023A4A
        ldrb r3, [r0, #0xe]
        cmp r3, #0
        bne _08023A6A
_08023A4A
        cmp r2, #0
        bge _08023A54
        movs r2, #0
        strb r2, [r0]
        b _08023A60
_08023A54
        ldrb r3, [r0, #3]
        cmp r3, r2
        bge _08023A5E
        strb r3, [r0]
        b _08023A60
_08023A5E
        strb r2, [r0]
_08023A60
        ldr r0, _08023CF0 ;@ =0x030034F8
        lsrs r1, r1, #0x1f
        ldr r0, [r0]
        bl sub_80244C6
_08023A6A
        add sp, #4
        pop {r3}
        bx r3

tcc output:

sub_8023A3A PROC
        PUSH     {r7,lr}
        LDRB     r2,[r0,#0]
        ADD      r2,r2,r1
        CMP      r1,#0
        BGE      |L1.16|
        LDRB     r3,[r0,#0xe]
        CMP      r3,#0
        BNE      |L1.42|
|L1.16|
        CMP      r2,#0
        BLT      |L1.44|
        LDRB     r3,[r0,#3]
        CMP      r3,r2
        BLT      |L1.30|
|L1.26|
        STRB     r2,[r0,#0]
        B        |L1.32|
|L1.30|
        STRB     r3,[r0,#0]
|L1.32|
        LDR      r0,|L1.48|
        LSR      r1,r1,#31
        LDR      r0,[r0,#0]  ; gUnknown_030034F8
        BL       sub_80244C6
|L1.42|
        POP      {r7,pc}
|L1.44|
        MOV      r2,#0
        B        |L1.26|
|L1.48| DATA
        DCD      gUnknown_030034F8
        ENDP

It looks pretty darn close already, which is a good sign that our compiler flags are right. Getting to matching code involves a lot of trial and error. Experts know plenty of patterns that work and ways to massage the code, but we’re not experts. So for now, we’re going to poke and prod and see what works.

In my journey to figure out how to make things work, I’ve actually had a lot of success with ChatGPT and other LLMs. It takes a ton of prompt engineering to get them to cooperate, but when done properly, I’ve had more success than with IDA’s output. At the very least, they give you alternative code to feed to the compiler so you can see how it handles it. For now, we’ll be trying to fix this function by hand.

The first difference we can see in the assembly is that the ROM uses bge _08023A54, whereas our output has BLT |L1.44|. This is an easy fix: we just need to swap the branches of the inner conditional so the negative case comes first.

if ( v2 >= 0 ) {
    v3 = result[3];
    if ( v3 >= v2 ) {
        *result = v2;
    } else {
        *result = v3;
    }
} else {
    *result = 0;
}

becomes

if ( v2 < 0 ) {
    *result = 0;
} else {
    v3 = result[3];
    if ( v3 >= v2 ) {
        *result = v2;
    } else {
        *result = v3;
    }
}

Now our assembly looks like this:

sub_8023A3A PROC
        PUSH     {r7,lr}
        LDRB     r2,[r0,#0]
        ADD      r2,r2,r1
        CMP      r1,#0
        BGE      |L1.16|
        LDRB     r3,[r0,#0xe]
        CMP      r3,#0
        BNE      |L1.46|
|L1.16|
        CMP      r2,#0
        BGE      |L1.24|
        MOV      r2,#0
        B        |L1.30|
|L1.24|
        LDRB     r3,[r0,#3]
        CMP      r3,r2
        BLT      |L1.34|
|L1.30|
        STRB     r2,[r0,#0]
        B        |L1.36|
|L1.34|
        STRB     r3,[r0,#0]
|L1.36|
        LDR      r0,|L1.48|
        LSR      r1,r1,#31
        LDR      r0,[r0,#0]  ; gUnknown_030034F8
        BL       sub_80244C6
|L1.46|
        POP      {r7,pc}
|L1.48| DATA
        DCD      gUnknown_030034F8
        ENDP

With a bit more massaging of this code, we get our first matching function:

unsigned char *ModifyPlayerHealth(unsigned char* r0, int r1) {
    signed int r2 = *r0 + r1;

    if (r1 >= 0 || !r0[14]) {
        if (r2 < 0) {
            *r0 = 0;
        }
        else if (r0[3] < r2) {
            *r0 = r0[3];
        } else {
            *r0 = r2;
        }
        return sub_80244C6(gUnknown_030034F8, (unsigned int)r1 >> 31);
    }
    return r0;
}

Now, r0 is most likely not actually just a byte pointer. In reality, it’s most likely a pointer to a struct. But for now we can just leave it as is so we don’t have to do a massive deep-dive in IDA. Now we just need to get this all compiling.
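
As a rough idea of where this could end up, here is a sketch of what a struct version might look like. Only the three offsets (0x0, 0x3 and 0xE) come from the disassembly. The struct name, every field name, and the meaning I’ve guessed for each field are assumptions, and the call to sub_80244C6 is left out entirely. This is modern C for a desktop compiler, meant for reasoning about behavior, not something to feed to tcc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: only the offsets are taken from the disassembly;
   all names are guesses. */
struct Player {
    unsigned char health;       /* offset 0x0: read and written */
    unsigned char unk1[2];      /* offsets 0x1-0x2: unknown */
    unsigned char maxHealth;    /* offset 0x3: upper bound for health */
    unsigned char unk4[10];     /* offsets 0x4-0xD: unknown */
    unsigned char invulnerable; /* offset 0xE: nonzero blocks negative deltas */
};

/* Fail the build if the guessed layout drifts from the observed offsets. */
static_assert(offsetof(struct Player, health) == 0x0, "health offset");
static_assert(offsetof(struct Player, maxHealth) == 0x3, "maxHealth offset");
static_assert(offsetof(struct Player, invulnerable) == 0xE, "flag offset");

/* Behavioral model of the clamping only. Returns 1 if health was updated,
   0 if the change was ignored. */
int apply_health_delta(struct Player *p, int delta)
{
    int next = p->health + delta;

    if (delta < 0 && p->invulnerable)
        return 0;
    if (next < 0)
        next = 0;
    else if (p->maxHealth < next)
        next = p->maxHealth;
    p->health = (unsigned char)next;
    return 1;
}
```

The static_assert lines are the useful part of the pattern: once real field names start replacing the unk arrays, they catch any edit that accidentally shifts a known offset.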

Stubbing out the other functions

While we could try to decomp the rest of the functions in the file, let’s try just using inline ASM so we can have a compiling file. This is pretty common within decomp projects when you’re unable to produce a perfect match with code for the moment.

Here come the issues

I bashed my head against this for a while before deciding to check the docs. Ideally, we’d just be able to inline the asm with a __asm{} block, but ADS has some severe limitations. Let’s take a look at the manual.

The inline assemblers allow restricted access to the physical registers. It is illegal to write to pc. Only branches using B and BL are allowed. In addition, it is inadvisable to intermix inline assembler instructions that use physical registers and complex C or C++ expressions.

Alright, this will pose a problem in the future, but we’re not writing to pc in any of the functions in this file.

The LDR Rn, =expression pseudo-instruction is not supported. Use MOV Rn, expression instead (this can generate a load from a literal pool).

Well, that pseudo-instruction is exactly how we want to load from our literal pools. Maybe the assembler will convert it to an LDR for us?

You should not modify the stack. This is not necessary because the compiler will stack and restore any working registers as required automatically. It is not allowed to explicitly stack and restore work registers.

Ah, this poses an issue. While we could in theory set up the stack beforehand, I don’t know how to do it. None of my attempts have worked at all. For me, this has been an absolute blocker. Someone more versed in mixing assembly with C might be able to manage it, but it’s outside my skillset.

On top of all of this, I don’t have a clear path forward on generating the literal pool. If we inline ASM, the compiler won’t have enough information to generate the literal pool the way we need it. This is a serious blocker. In an ideal world, I’d just push through the rest of the file with matching decomp. This file ends with a pretty beefy function though, and I haven’t gotten close to a good match for it. IDA’s pseudocode output for it is 109 lines long and has two do-while loops.

I’m going to have to come back to this later. It needs some serious thought about how I want to handle these sorts of situations.