UIUCTF 2022 - CPSC
Categories: re
2022-08-03
by not_really
CPSC
Love Haskell? Ocaml? Then you’ll love this.
author: richyliu, Surg
Love Haskell? Ocaml? Then you’ll love this.
author: richyliu, Surg
This program takes a “password”, mixes it in some way, and compares it with the flag. Changing a single character at the beginning or end changes the entire thing, so it’s not encrypted a character at a time.
$ ./cpsc
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: abcd
Mixed: 32c39cbe
Wrong.
$ ./cpsc
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: abce
Mixed: 79ba0518
Wrong.
According to the strings in the binary, this was made with cpc (“Continuation Passing C”). The description says “Love Haskell? Ocaml? Then you’ll love this.”
Example input
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: uiuctf{not_a_real_flag}
Mixed: eb3d2ef730f54cc4da121eb0d4d1a2c92c36581095bab8
Wrong.
Intro
I freaked out at first when reading the description because I had a bad experience with a previous functional language challenge, “hardskull” from CSAW2021. If I recall, the strings weren’t in ASCII format in memory, were constantly moving around, and the control flow changed by changing a single character. This made it really hard to use any tricks other than just understanding the code.
Like almost all the rev challenges I solve, I try to leave all the thinking to Z3 and do just the reversing part. If I can replicate this program in Python at least, it should be possible to have Z3 solve the problem.
Decompiling, sort of
Skipping past the input reading function, the first function references two other functions, which in turn reference 2-3 other functions:
I generated a graph of xrefs to get an idea of what was happening. Naming convention is just adding on an A/B/C for the first/second/third function call or O for being the only function call.
Not terribly bad. Time to start looking through the functions to get an idea of what they’re doing.
As it shows in the graph, AA is obviously the flag checking function:
That memcmp
function compares the “Mixed” input with this string:
It seems to be the same no matter what you type, so this must be the final string to match. This also tells us the length of the input: 43 characters.
Logging with GDB
Next, I wanted to log all the functions being called so I could see if there were any obvious patterns. For this I usually use gdb python since it’s easier in my opinion to read and set breakpoints than clicking through the breakpoint UI of IDA or trying to use the poorly documented IDC.
import gdb
#gdbpy start
offAddress = 0
def setBreakEvent(callback, addr, inst=None):
class GdbBreakpoint(gdb.Breakpoint):
def __init__(self, callback, addr, inst):
global offAddress
super(GdbBreakpoint, self).__init__("*" + hex(addr + offAddress), type=gdb.BP_BREAKPOINT, internal=False)
self.callback = callback
self.inst = inst
def stop(self):
pos = readReg("rip")
res = None
if inst == None:
res = self.callback(pos)
else:
res = self.callback(inst, pos)
if res == True:
return res
else:
return False
GdbBreakpoint(callback, addr, inst)
def getEntryPoint():
ENTRY_POINT_STR = "\tEntry point: "
fileInfo = gdb.execute("info file", to_string=True)
infoLines = fileInfo.splitlines()
for line in infoLines:
if line.startswith(ENTRY_POINT_STR):
entryPoint = int(line[len(ENTRY_POINT_STR):],0)
return entryPoint
return None
def readReg(reg):
gdbValue = gdb.selected_frame().read_register(reg)
return int(gdbValue.cast(gdb.lookup_type("long")))
#gdbpy end
def logCall(text, addr):
print(text)
return False
gdb.execute("set print addr off")
gdb.execute("set pagination off")
gdb.execute("set disassembly-flavor intel")
gdb.execute(f"file cpsc", False)
fileAddress = getEntryPoint()
gdb.execute("starti")
offAddress = getEntryPoint() - fileAddress
setBreakEvent(logCall, 0x2d10, "a")
setBreakEvent(logCall, 0x2870, "aa")
setBreakEvent(logCall, 0x2e80, "ab")
setBreakEvent(logCall, 0x2970, "aba")
setBreakEvent(logCall, 0x34a0, "abb")
setBreakEvent(logCall, 0x2fa0, "abba")
setBreakEvent(logCall, 0x3120, "abbaa")
setBreakEvent(logCall, 0x2980, "abbaaa")
setBreakEvent(logCall, 0x3270, "abbaab")
setBreakEvent(logCall, 0x2940, "abbaaba")
setBreakEvent(logCall, 0x2a60, "abbaabb")
setBreakEvent(logCall, 0x3620, "abbaabbo")
setBreakEvent(logCall, 0x2c20, "abbaabboa")
setBreakEvent(logCall, 0x2830, "abbaabbob")
setBreakEvent(logCall, 0x2990, "abbaabboc")
setBreakEvent(logCall, 0x3850, "abbaabboco")
setBreakEvent(logCall, 0x2b40, "abbaabbocoa")
setBreakEvent(logCall, 0x2860, "abbaabbococ")
gdb.execute("c")
Let’s see what happens if we type test
.
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: test
a
ab
abb
abb
abb
abba
abb
abbaa
abbaab
abbaabb
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabboc
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
abba
abb
abb
abba
abb
abbaa
abbaab
abbaabb
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabboc
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
abbaa
abbaab
abbaabb
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabboc
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
aba
aa
Mixed: 8d368fad
Wrong.
And what about tesu
?
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: tesu
a
ab
abb
abb
abb
abba
...
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
aba
aa
Mixed: c64f160b
Wrong.
I’m not going to paste it all again, but it’s the exact same output as test
. It turns out the control flow only depends on string length but not on string content.
That’s great if I only wanted to know what functions were being called, but it doesn’t tell me what the argument values were. I realized a pattern in the code that probably represented arguments.
I manually went through and edited the breakpoint logs to have values so I could get a better idea of what was happening. Here’s the new script output for test
(data is a pointer to input memory):
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: test
a(data[0], 4, 43, 0, data[2192])
ab(data[0], 0, 4)
abb(data[0], 4, 1)
abb(data[0], 2, 2)
abb(data[0], 1, 4)
abba(1, 1, 1, data[0], 2)
abb(data[1], 1, 5)
abbaa(1, 1, 1, data[0], 2)
abbaab_mix(data[0], data[0], 1, data[1], 1, 0, 2)
abbaabb_mul1(data[0], 2, 2)
abbaabbo_xortext1(94, 0, 0, data[0], 2, 0, 2)
abbaabbob_encbyte()
abbaabboa(0, 0, data[0], 2, 2, 0, 253)
abbaabbo_xortext1(253, 1, 0, data[0], 2, 0, 2)
abbaabbob_encbyte()
abbaabboa(1, 0, data[0], 2, 2, 0, 171)
abbaabbo_xortext1(171, 2, 0, data[0], 2, 0, 2)
abbaabboc_mul2(171, 0, data[0], 2, 2)
abbaabboco_xortext2(102, 0, data[0], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(0, data[0], 2, 2, 148)
abbaabboco_xortext2(148, 1, data[0], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(1, data[0], 2, 2, 123)
abbaabboco_xortext2(123, 2, data[0], 2, 2)
abbaabbococ()
abbaaba()
abbaaa()
abba(2, 2, 2, data[0], 1)
abb(data[2], 2, 3)
abb(data[2], 1, 6)
abba(1, 1, 1, data[2], 3)
abb(data[3], 1, 7)
abbaa(1, 1, 1, data[2], 3)
abbaab_mix(data[2], data[2], 1, data[3], 1, 2, 3)
abbaabb_mul1(data[2], 2, 3)
abbaabbo_xortext1(141, 0, 0, data[2], 2, 0, 3)
abbaabbob_encbyte()
abbaabboa(0, 0, data[2], 2, 3, 0, 130)
abbaabbo_xortext1(130, 1, 0, data[2], 2, 0, 3)
abbaabbob_encbyte()
abbaabboa(1, 0, data[2], 2, 3, 0, 154)
abbaabbo_xortext1(154, 2, 0, data[2], 2, 0, 3)
abbaabboc_mul2(154, 0, data[2], 2, 3)
abbaabboco_xortext2(153, 0, data[2], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(0, data[2], 2, 2, 70)
abbaabboco_xortext2(70, 1, data[2], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(1, data[2], 2, 2, 79)
abbaabboco_xortext2(79, 2, data[2], 2, 2)
abbaabbococ()
abbaaba()
abbaaa()
abbaa(2, 2, 2, data[0], 1)
abbaab_mix(data[0], data[0], 2, data[2], 2, 0, 1)
abbaabb_mul1(data[0], 4, 1)
abbaabbo_xortext1(47, 0, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(0, 0, data[0], 4, 1, data[2], 141)
abbaabbo_xortext1(141, 1, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(1, 0, data[0], 4, 1, data[2], 29)
abbaabbo_xortext1(29, 2, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(2, 0, data[0], 4, 1, data[2], 41)
abbaabbo_xortext1(41, 3, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(3, 0, data[0], 4, 1, data[2], 41)
abbaabbo_xortext1(41, 4, 0, data[0], 4, data[2], 1)
abbaabboc_mul2(41, 0, data[0], 4, 1)
abbaabboco_xortext2(51, 0, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(0, data[0], 4, 4, 173)
abbaabboco_xortext2(173, 1, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(1, data[0], 4, 4, 143)
abbaabboco_xortext2(143, 2, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(2, data[0], 4, 4, 54)
abbaabboco_xortext2(54, 3, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(3, data[0], 4, 4, 141)
abbaabboco_xortext2(141, 4, data[0], 4, 4)
abbaabbococ()
abbaaba()
abbaaa()
aba()
aa(data[0], 4, 43, 0, data[2192])
Mixed: 8d368fad
Wrong.
This makes it a bit easier to understand what’s going on. You might notice I also renamed some things in this script. Here’s what I found about the functions from a mix of debugging and reading code.
ABBAAB_mix(dst, srcB, lenB, srcA, lenA, ?, ?)
Reorders the input buffer in memory (srcA
and srcB
are pointers somewhere in the input.) It first makes a new array. It takes the byte at srcA
and copies it to the array and then subtracts 1 from lenA
. Then it adds the byte at srcB
at subtracts 1 from lenB
. It keeps switching between these two until lenA
and lenB
are 0. It then reverses this array and overwrites whatever is at dst
. For the implementation, see the scramble
function in the Python simulator code below.
ABBAABB_mul1(?, ?, value)
Returns (value * 0x2f) & 0xff
. This value is passed into the first argument of ABBAABBO
.
ABBAABBO_xortext1(value, off, ?, ptr, ?, ?, ?)
Returns value ^ *(ptr+off)
. This value is passed into ABBAABBOB
.
ABBAABBOA_setchar(off, ?, ptr, ?, ?, ?, value)
Sets *(ptr+off) = value
.
ABBAABBOB_encbyte(value)
“Encrypts” the value. This is a 1:1 conversion from one byte value to another. For example, 1 becomes 195, 2 becomes 133, etc. This value is passed into the fifth argument of ABBAABBOC
or ABBAABBOCOA
.
ABBAABBOC_mul2(?, ?, ?, ?, value)
Returns (value * 0x33) & 0xff
. This value is passed into the first argument of ABBAABBOCO
.
ABBAABBOCO_xortext2(value, off0, ptr, ?, off1)
Returns value ^ *(ptr + (off1 - 1 - off0))
. This value is passed into ABBAABBOB
.
ABBAABBOCOA_setchar(off0, ptr, off1, ?, value)
Sets *(ptr + (off1 - 1 - off0)) = value
.
Writing the simulator
At this point I have no idea what’s going on, but I start to write an emulator in Python anyways. As long as I can get a simulation, it should be pretty easy to port it over to z3. To build the simulator, I decided to just parse the output from GDB because it sounded easier than trying to understand the rest of the decompiled code (this made me feel awful inside, but it’s a CTF so it’s fine.)
Normally the program calculates a value and then passes it onto the next function. There is always only one variable in each function call. Any other argument is constant no matter the input or I don’t read it in my script. Because of this, I can use a single temp variable to represent the output from the last function and not worry about actually passing values between functions.
import sys
class SimSim:
def loadText(self, text):
self.text = bytearray(text.encode("utf-8"))
def start(self, lines):
self.lines = lines
self.lineIdx = 0
self.tmp0 = 0
while True:
self.step()
if self.lineIdx >= len(self.lines):
break
def scramble(self, startx, countx, starty, county):
newList = []
actualStartx = startx
actualCountx = countx
actualCounty = county
while True:
if county > 0:
newList.append(self.text[starty])
starty += 1
county -= 1
if countx > 0:
newList.append(self.text[startx])
startx += 1
countx -= 1
if countx == 0 and county == 0:
break
newList.reverse()
for i in range(0, actualCountx + actualCounty):
self.text[i + actualStartx] = newList[i]
# get nth argument (regular number)
def getnarg(self, idx):
line = self.lines[self.lineIdx]
inText = line[line.find("(")+1:line.find(")")]
inTextSpl = inText.split(",")[idx]
return int(inTextSpl)
# get nth pointer argument (data[XXX])
def getparg(self, idx):
line = self.lines[self.lineIdx]
inText = line[line.find("(")+1:line.find(")")].replace(" ", "")
inTextSpl = inText.split(",")[idx]
return int(inTextSpl[5:-1])
def encrypt(self, num):
esi = num
ecx = num
ecx <<= 6
esi ^= ecx
ecx = esi
ecx = (ecx >> 8 << 8) | ((ecx & 0xff) >> 7)
ecx ^= esi
esi = ecx << 1
ecx ^= esi
return ecx & 0xff
def step(self):
line = self.lines[self.lineIdx]
if line.startswith("abbaab_mix("):
cx = self.getnarg(2)
cy = self.getnarg(4)
sx = self.getparg(1)
sy = self.getparg(3)
self.scramble(sx, cx, sy, cy)
elif line.startswith("abbaabb_mul1("):
val = self.getnarg(2)
self.tmp0 = (val * 0x2f) & 0xff
elif line.startswith("abbaabbo_xortext1("):
a1 = self.getnarg(1)
a4 = self.getnarg(4)
if a1 == a4:
self.lineIdx += 1
return
a3 = self.getparg(3)
txt = self.text[a3 + a1]
self.tmp0 ^= txt
elif line.startswith("abbaabbob_encbyte("):
self.tmp0 = self.encrypt(self.tmp0)
elif line.startswith("abbaabboa("):
a2 = self.getparg(2)
a0 = self.getnarg(0)
self.text[a2 + a0] = self.tmp0
elif line.startswith("abbaabboc_mul2("):
a4 = self.getnarg(4)
self.tmp0 = (a4 * 0x33) & 0xff
elif line.startswith("abbaabboco_xortext2("):
a2 = self.getparg(2)
a4 = self.getnarg(4)
a1 = self.getnarg(1)
if (a4 - 1 - a1) >= 0:
self.tmp0 = self.tmp0 ^ self.text[a2 + (a4 - 1 - a1)]
elif line.startswith("abbaabbocoa_setchar("):
a1 = self.getparg(1)
a2 = self.getnarg(2)
a0 = self.getnarg(0)
if (a2 - 1 - a0) >= 0:
self.text[a1 + (a2 - 1 - a0)] = self.tmp0
self.lineIdx += 1
sim = SimSim()
sim.loadText(sys.argv[1])
# output from gdb script
with open("cpsc_debug_output.txt") as file:
lines = file.readlines()
lines = [line.rstrip() for line in lines]
sim.start(lines)
print(bytes(sim.text).hex())
And testing it on test
again:
$ python3 sim_cpsc.py test
8d368fad
Good, it matches what the simulator does. I made sure it wasn’t a fluke by typing something else (of the same length) in the real cpsc program and comparing with the Python script without updating the cpsc_debug_output.txt
file.
$ ./cpsc
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: butt
Mixed: c8ac38f0
Wrong.
$ python3 sim_cpsc.py butt
c8ac38f0
Nice, it’s working.
Z3
Now it’s time to get z3 solving it. In a previous challenge where I wrote a simulator (dicecraft), I made minimal changes to my simulator to get it working with z3. However in this one it’s a bit tougher. We have a mix function and a encode byte function which z3 can’t easily handle. I decided it might be easier to unroll everything to make it easier for z3 to do its job. I had my teammate sera help with z3 since I was struggling.
Encrypt function
As I said before, the encrypt function takes a one byte input and produces a one byte output. We couldn’t find anything about making a lookup table with Z3, so we generated a huge chain of if statements.
obf_table = [0, 195, 133, 70, 12, 207, 137, 74, 24, 219, 157, 94, 20, 215, 145, 82, 48, 243, 181, 118, 60, 255, 185, 122, 40, 235, 173, 110, 36, 231, 161, 98, 96, 163, 229, 38, 108, 175, 233, 42, 120, 187, 253, 62, 116, 183, 241, 50, 80, 147, 213, 22, 92, 159, 217, 26, 72, 139, 205, 14, 68, 135, 193, 2, 192, 3, 69, 134, 204, 15, 73, 138, 216, 27, 93, 158, 212, 23, 81, 146, 240, 51, 117, 182, 252, 63, 121, 186, 232, 43, 109, 174, 228, 39, 97, 162, 160, 99, 37, 230, 172, 111, 41, 234, 184, 123, 61, 254, 180, 119, 49, 242, 144, 83, 21, 214, 156, 95, 25, 218, 136, 75, 13, 206, 132, 71, 1, 194, 131, 64, 6, 197, 143, 76, 10, 201, 155, 88, 30, 221, 151, 84, 18, 209, 179, 112, 54, 245, 191, 124, 58, 249, 171, 104, 46, 237, 167, 100, 34, 225, 227, 32, 102, 165, 239, 44, 106, 169, 251, 56, 126, 189, 247, 52, 114, 177, 211, 16, 86, 149, 223, 28, 90, 153, 203, 8, 78, 141, 199, 4, 66, 129, 67, 128, 198, 5, 79, 140, 202, 9, 91, 152, 222, 29, 87, 148, 210, 17, 115, 176, 246, 53, 127, 188, 250, 57, 107, 168, 238, 45, 103, 164, 226, 33, 35, 224, 166, 101, 47, 236, 170, 105, 59, 248, 190, 125, 55, 244, 178, 113, 19, 208, 150, 85, 31, 220, 154, 89, 11, 200, 142, 77, 7, 196, 130, 65]
st = f"c254 = If(inp == 254, BitVecVal({obf_table[254]}, 32), BitVecVal({obf_table[255]}, 32))\n"
for i in range(253, -1, -1):
st += f"c{i} = If(inp == {i}, BitVecVal({obf_table[i]}, 32), c{i + 1})\n"
print(st)
Pretty terrible, but hey, who cares if it gets the job done.
Scramble function
This one is a bit tougher to deal with since it involves moving the values around in the array. Sera realized that the indices are always constant and we could scramble the indices of the array instead of the actually array elements. Then at the end, we would just scramble the comparison array into the same order.
Variables “over time”
After doing these two things, z3 still wouldn’t solve anything. Sera suggested that assigning a value to the same variable multiple times might be a bad idea, and to instead suffix each variable with the “time value” or “assignment index”. Originally we would have something like this:
tmp0 = 12
tmp0 = data[0] ^ tmp0
And instead we would use this:
tmp0_0 = 12
tmp0_1 = data0_0 ^ tmp0_0
(This probably wasn’t necessary as we’ll see below.)
After all this, I still couldn’t get z3 to solve on my Windows machine. No matter how long I waited, it didn’t do anything. Giving up, I pasted it onto Discord and worked on different challenges, hoping someone would see something I missed. And then I saw this:
The script had been working the whole time and my z3 just didn’t want to work. I switched over to my Linux machine which was able to solve it in about 35 seconds 🤦.
The amazing part about solving this challenge is that I still didn’t understand how it worked, but just being able to simulate it in Python is good enough. Would it have been faster for me to read into it a little more? Probably. But who cares, it’s a CTF. All that matters at the end of the day is that it was solved.
$ ./cpsc --- M I X E R --- Welcome to the Mixer. Please enter the password: uiuctf{n41tv3_func7iona1_pr0gr4mm1ng_1n_C!} Mixed: e338e9cc0199e8c24b43760f2277cf56f9b7ddff343aaf116fe26cafca4538cfb9c26477e377d19a301e13 Correct! That's the flag!
Files:
logcpsc.py GDB logging script
simcpsc.py Python simulation
cpsc_debug_output.txt Result of GDB logging script for 43 character input
generate_encrypt.py Generate cpsc_solve.py
encrypt function
generate_run.py Generate cpsc_solve.py
go function
cpsc_solve.py Z3 solve script