Figuring Out Jump Tables
Matt Hurd / March 2023 (1910 Words, 11 Minutes)
Last time, we managed to extract a good bit of code from our game. We were able to use some scripts to manually define functions in IDA that allowed us to get some extra coverage, but we’re still missing a good chunk of code. The main offender at the moment is jump tables. gbadisasm
does have support for GCC’s jump tables, but not ADS’s. Let’s dive into it.
Jump Tables
What is a jump table?
In this context, a jump table is used by the compiler for a C switch statement. Different compilers have their own implementations of switch statements. Let’s see an example of ADS’s. This will be a good time to actually test out our tools. First, we’ll make a C file that uses a switch statment. Note that the switch needs to have enough cases, otherwise the compiler will optimize it away.
1
2
3
4
5
6
7
8
9
10
void switch_example(int action) {
switch (action) {
case 0: printf("0\n"); break;
case 1: printf("1\n"); break;
case 2: printf("2\n"); break;
case 3: printf("3\n"); break;
case 4: printf("4\n"); break;
default: printf("Default action\n"); break;
}
}
ARM
Running it through armcc
with repo\tools\ADSv1_2\Bin\armcc.exe -O2 -S "c testing\test.c"
, we get:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
performAction PROC
CMP r0,#4
ADDLS pc,pc,r0,LSL #2
B |L1.72|
B |L1.32|
B |L1.40|
B |L1.48|
B |L1.56|
B |L1.64|
|L1.32|
ADR r0,|L1.80|
B |L1.76|
|L1.40|
ADR r0,|L1.84|
B |L1.76|
|L1.48|
ADR r0,|L1.88|
B |L1.76|
|L1.56|
ADR r0,|L1.92|
B |L1.76|
|L1.64|
ADR r0,|L1.96|
B |L1.76|
|L1.72|
ADR r0,|L1.100|
|L1.76|
B _printf
|L1.80|
DCB "0\n\0\0"
|L1.84|
DCB "1\n\0\0"
|L1.88|
DCB "2\n\0\0"
|L1.92|
DCB "3\n\0\0"
|L1.96|
DCB "4\n\0\0"
|L1.100|
DCB "Defa"
DCB "ult "
DCB "acti"
DCB "on\n\0"
ENDP
This is the general way that the arm compiler will handle jump tables. It seems like arm jumptables only work for switch statements in which the integer is incrementing by 1. Otherwise, it will find some other way to compile it. This is actually a pretty convenient assembly for us to work with. Essentially there is a comparison done against the operand, and the program counter is incremented based on which value it’s equal to. So if case 0
is matched, the program counter is not incremented at all. If case 5
is matched against, the program counter is incremented by 5 << 2
(10). In that case, the next instruction is B |L1.64|
. The key here is that we get a list of all of the branches which is quite easy to navigate.
Here’s what we’re going to need to do.
- Find
ADDLS pc,pc,r0,LSL #2
instructions. - Find the previous instruction to find the number of cases in the jump table.
- Read the next
instructions and we'll get code locations.
Actually, since we have direct branches to these labels, this should already be being handled by gbadisasm
. We’ll have to take a look at what the Thumb compiler spits out and see if that’s the source of our headache.
Thumb
If we instead pass our C code through tcc
, we get:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
performAction PROC
PUSH {r7,lr}
CMP r0,#5
BCS |L1.42|
ADR r3,|L1.16|
LDRB r3,[r3,r0]
LSL r3,r3,#1
ADD pc,r3
DCW 0000
|L1.16| DATA
DCB 0x03,0x05
DCB 0x07,0x09
DCB 0x0b,00
|L1.22|
ADR r0,|L1.72|
B |L1.44|
|L1.26|
ADR r0,|L1.72| + 4
B |L1.44|
|L1.30|
ADR r0,|L1.72| + 8
B |L1.44|
|L1.34|
ADR r0,|L1.72| + 12
B |L1.44|
|L1.38|
ADR r0,|L1.72| + 16
B |L1.44|
|L1.42|
ADR r0,|L1.72| + 20
|L1.44|
BL _printf
POP {r7,pc}
ENDP
This is a bit less straightforward, so let’s look at IDA’s annotations for it:
It looks like Thumb’s definition is a bit more complicated, using a small datablock for how much to add to PC, rather than adding to PC and then branching to a label. Notably, even though these labels are present in the compiler’s output, they’re not being directly referenced. This means once the code is linked and made into the .gba
, that information will be lost. This is most likely where our issue lies.
Fixing it
Once again, we should’ve just updated gbadisasm to handle this. Alas, it was quicker for me to write something in python to handle it. Here’s the idapython script:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
import idautils
import ida_funcs
def find_jumps(si: ida_nalt.switch_info_t) -> list:
jtable = []
e_size = si.get_jtable_element_size()
for num in range(0, si.get_jtable_size()):
jtable.append(int.from_bytes(ida_bytes.get_bytes(si.jumps + (num * e_size), e_size), 'little') * 2 + si.elbase)
return jtable
def find_jump_tables():
for func_ea in idautils.Functions():
func = ida_funcs.get_func(func_ea)
if not func:
continue
for head in idautils.FuncItems(func_ea):
insn = ida_ua.insn_t()
if ida_ua.decode_insn(insn, head) > 0:
si = ida_nalt.get_switch_info(insn.ea)
if si and si.jumps != ida_idaapi.BADADDR:
jtable = find_jumps(si)
insn = ida_ua.insn_t()
inslen = ida_ua.decode_insn(insn, si.jumps - 2)
print(f"# jmp table at {hex(si.jumps)}")
if inslen > 0 and insn.get_canon_mnem() == 'MOV':
print(f"data_label {hex(si.jumps - 2)}")
e_size = si.get_jtable_element_size()
for num in range(0, si.get_jtable_size()):
print(f"data_label {hex(si.jumps + num * e_size)}")
for jmp in jtable:
print(f"thumb_label {hex(jmp)} loc_{jmp:x}")
print()
def main():
find_jump_tables()
if __name__ == "__main__":
main()
Here, we’re relying heavily on IDA to do our work for us. Here’s a brief explanation of what it does
- Imports IDA Pro API modules for program analysis.
- Iterates through all functions to find switches
- Extracts jump table addresses from switches
- For each jump table, export labels that we can add to our
.cfg
file.- We’re explicitly marking the small datablock as data, and re-adding in the labels for each of the jumps as
thumb_label
s.
- We’re explicitly marking the small datablock as data, and re-adding in the labels for each of the jumps as
Now if we add these entries to our .cfg
, we should get some better output.
1
2
$ ./gbadisasm/gbadisasm baserom.gba -c code.cfg
Segmentation fault
Classic.
I honestly cannot remember exactly what the issue was here. For debugging, I modified gbadisasm to give more output while processing. There was an entry that it wasn’t happy about processing, so I remove it and our problems were solved.
Result
Taking a look at our initial failing function sub_8000948
, this is how the jump table looks now:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
...
cmp r0, #0xa
bhs _080009B8
add r3, pc, #0x4 @ =_0800097C
ldrb r3, [r3, r0]
lsls r3, r3, #1
add pc, r3
_0800097C:
.byte 0x04
_0800097D:
.byte 0x20
_0800097E:
.byte 0x1D
_0800097F:
.byte 0x1D
_08000980:
.byte 0x1D
_08000981:
.byte 0x1D
_08000982:
.byte 0x1D
_08000983:
.byte 0x45
_08000984:
.byte 0x50
_08000985:
.byte 0x71
loc_8000986:
ldr r0, [r4, #0x48]
adds r0, #4
str r0, [r4, #0x48]
ldr r0, [r4, #8]
ldr r1, [r4, #0xc]
adds r0, r0, r1
asrs r1, r0, #0x1f
movs r2, #0x10
str r0, [r4, #8]
bl j__swap_4A2D8
mov r2, pc
...
Much better.