home

Figuring Out Jump Tables

Abahbob / March 2023 (1910 Words, 11 Minutes)

decomp adsv1.2

Last time, we managed to extract a good bit of code from our game. We were able to use some scripts to manually define functions in IDA that allowed us to get some extra coverage, but we’re still missing a good chunk of code. The main offender at the moment is jump tables. gbadisasm does have support for GCC’s jump tables, but not ADS’s. Let’s dive into it.

Jump Tables
Fixing it
- Result

Jump Tables

What is a jump table?

In this context, a jump table is used by the compiler for a C switch statement. Different compilers have their own implementations of switch statements. Let’s see an example of ADS’s. This will be a good time to actually test out our tools. First, we’ll make a C file that uses a switch statment. Note that the switch needs to have enough cases, otherwise the compiler will optimize it away.

void switch_example(int action) {
   switch (action) {
      case 0:  printf("0\n"); break;
      case 1:  printf("1\n"); break;
      case 2:  printf("2\n"); break;
      case 3:  printf("3\n"); break;
      case 4:  printf("4\n"); break;
      default: printf("Default action\n"); break;
   }
}

ARM

Running it through armcc with repo\tools\ADSv1_2\Bin\armcc.exe -O2 -S "c testing\test.c", we get:

performAction PROC
        CMP      r0,#4
        ADDLS    pc,pc,r0,LSL #2
        B        |L1.72|
        B        |L1.32|
        B        |L1.40|
        B        |L1.48|
        B        |L1.56|
        B        |L1.64|
|L1.32|
        ADR      r0,|L1.80|
        B        |L1.76|
|L1.40|
        ADR      r0,|L1.84|
        B        |L1.76|
|L1.48|
        ADR      r0,|L1.88|
        B        |L1.76|
|L1.56|
        ADR      r0,|L1.92|
        B        |L1.76|
|L1.64|
        ADR      r0,|L1.96|
        B        |L1.76|
|L1.72|
        ADR      r0,|L1.100|
|L1.76|
        B        _printf
|L1.80|
        DCB      "0\n\0\0"
|L1.84|
        DCB      "1\n\0\0"
|L1.88|
        DCB      "2\n\0\0"
|L1.92|
        DCB      "3\n\0\0"
|L1.96|
        DCB      "4\n\0\0"
|L1.100|
        DCB      "Defa"
        DCB      "ult "
        DCB      "acti"
        DCB      "on\n\0"
        ENDP

This is the general way that the arm compiler will handle jump tables. It seems like arm jumptables only work for switch statements in which the integer is incrementing by 1. Otherwise, it will find some other way to compile it. This is actually a pretty convenient assembly for us to work with. Essentially there is a comparison done against the operand, and the program counter is incremented based on which value it’s equal to. So if case 0 is matched, the program counter is not incremented at all. If case 5 is matched against, the program counter is incremented by 5 << 2 (10). In that case, the next instruction is B |L1.64|. The key here is that we get a list of all of the branches which is quite easy to navigate.

Here’s what we’re going to need to do.

Find ADDLS pc,pc,r0,LSL #2 instructions.
Find the previous instruction to find the number of cases in the jump table.
Read the next instructions and we'll get code locations.

Actually, since we have direct branches to these labels, this should already be being handled by gbadisasm. We’ll have to take a look at what the Thumb compiler spits out and see if that’s the source of our headache.

Thumb

If we instead pass our C code through tcc, we get:

performAction PROC
        PUSH     {r7,lr}
        CMP      r0,#5
        BCS      |L1.42|
        ADR      r3,|L1.16|
        LDRB     r3,[r3,r0]
        LSL      r3,r3,#1
        ADD      pc,r3
        DCW      0000
|L1.16| DATA
        DCB      0x03,0x05
        DCB      0x07,0x09
        DCB      0x0b,00
|L1.22|
        ADR      r0,|L1.72|
        B        |L1.44|
|L1.26|
        ADR      r0,|L1.72| + 4
        B        |L1.44|
|L1.30|
        ADR      r0,|L1.72| + 8
        B        |L1.44|
|L1.34|
        ADR      r0,|L1.72| + 12
        B        |L1.44|
|L1.38|
        ADR      r0,|L1.72| + 16
        B        |L1.44|
|L1.42|
        ADR      r0,|L1.72| + 20
|L1.44|
        BL       _printf
        POP      {r7,pc}
        ENDP

This is a bit less straightforward, so let’s look at IDA’s annotations for it:

IDA

It looks like Thumb’s definition is a bit more complicated, using a small datablock for how much to add to PC, rather than adding to PC and then branching to a label. Notably, even though these labels are present in the compiler’s output, they’re not being directly referenced. This means once the code is linked and made into the .gba, that information will be lost. This is most likely where our issue lies.

Fixing it

Once again, we should’ve just updated gbadisasm to handle this. Alas, it was quicker for me to write something in python to handle it. Here’s the idapython script:

import idautils
import ida_funcs


def find_jumps(si: ida_nalt.switch_info_t) -> list:
    jtable = []
    e_size = si.get_jtable_element_size()

    for num in range(0, si.get_jtable_size()):
        jtable.append(int.from_bytes(ida_bytes.get_bytes(si.jumps + (num * e_size), e_size), 'little') * 2 + si.elbase)

    return jtable


def find_jump_tables():
    for func_ea in idautils.Functions():
        func = ida_funcs.get_func(func_ea)
        if not func:
            continue

        for head in idautils.FuncItems(func_ea):
            insn = ida_ua.insn_t()
            if ida_ua.decode_insn(insn, head) > 0:
                si = ida_nalt.get_switch_info(insn.ea)
                if si and si.jumps != ida_idaapi.BADADDR:
                    jtable = find_jumps(si)

                    insn = ida_ua.insn_t()
                    inslen = ida_ua.decode_insn(insn, si.jumps - 2)
                    print(f"# jmp table at {hex(si.jumps)}")
                    if inslen > 0 and insn.get_canon_mnem() == 'MOV':
                        print(f"data_label {hex(si.jumps - 2)}")
                    e_size = si.get_jtable_element_size()
                    for num in range(0, si.get_jtable_size()):
                        print(f"data_label {hex(si.jumps + num * e_size)}")
                    for jmp in jtable:
                        print(f"thumb_label {hex(jmp)} loc_{jmp:x}")
                    print()

def main():
    find_jump_tables()

if __name__ == "__main__":
    main()

Here, we’re relying heavily on IDA to do our work for us. Here’s a brief explanation of what it does

Imports IDA Pro API modules for program analysis.
Iterates through all functions to find switches
Extracts jump table addresses from switches
For each jump table, export labels that we can add to our .cfg file.
- We’re explicitly marking the small datablock as data, and re-adding in the labels for each of the jumps as thumb_labels.

Now if we add these entries to our .cfg, we should get some better output.

$ ./gbadisasm/gbadisasm baserom.gba -c code.cfg
Segmentation fault

Classic.
I honestly cannot remember exactly what the issue was here. For debugging, I modified gbadisasm to give more output while processing. There was an entry that it wasn’t happy about processing, so I remove it and our problems were solved.

Result

Taking a look at our initial failing function sub_8000948, this is how the jump table looks now:

...
	cmp r0, #0xa
	bhs _080009B8
	add r3, pc, #0x4 @ =_0800097C
	ldrb r3, [r3, r0]
	lsls r3, r3, #1
	add pc, r3
_0800097C:
	.byte 0x04
_0800097D:
	.byte 0x20
_0800097E:
	.byte 0x1D
_0800097F:
	.byte 0x1D
_08000980:
	.byte 0x1D
_08000981:
	.byte 0x1D
_08000982:
	.byte 0x1D
_08000983:
	.byte 0x45
_08000984:
	.byte 0x50
_08000985:
	.byte 0x71
loc_8000986:
	ldr r0, [r4, #0x48]
	adds r0, #4
	str r0, [r4, #0x48]
	ldr r0, [r4, #8]
	ldr r1, [r4, #0xc]
	adds r0, r0, r1
	asrs r1, r0, #0x1f
	movs r2, #0x10
	str r0, [r4, #8]
	bl j__swap_4A2D8
	mov r2, pc
...

Much better.