UIUCTF 2022 - CPSC

Categories: re
2022-08-03
by not_really

CPSC

Love Haskell? Ocaml? Then you’ll love this.

author: richyliu, Surg

This program takes a “password”, mixes it in some way, and compares it with the flag. Changing a single character at the beginning or end changes the entire thing, so it’s not encrypted a character at a time.

$ ./cpsc
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: abcd
Mixed: 32c39cbe
Wrong.
$ ./cpsc
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: abce
Mixed: 79ba0518
Wrong.

According to the strings in the binary, this was made with cpc (“Continuation Passing C”). The description says “Love Haskell? Ocaml? Then you’ll love this.”

Example input

--- M I X E R ---
Welcome to the Mixer.
Please enter the password: uiuctf{not_a_real_flag}
Mixed: eb3d2ef730f54cc4da121eb0d4d1a2c92c36581095bab8
Wrong.

Intro

I freaked out at first when reading the description because I had a bad experience with a previous functional language challenge, “hardskull” from CSAW2021. If I recall, the strings weren’t in ASCII format in memory, were constantly moving around, and the control flow changed by changing a single character. This made it really hard to use any tricks other than just understanding the code.

Like almost all the rev challenges I solve, I try to leave all the thinking to Z3 and do just the reversing part. If I can replicate this program in Python at least, it should be possible to have Z3 solve the problem.

Decompiling, sort of

Skipping past the input reading function, the first function references two other functions, which in turn reference 2-3 other functions:

image-20220804230149732

I generated a graph of xrefs to get an idea of what was happening. Naming convention is just adding on an A/B/C for the first/second/third function call or O for being the only function call.

image-20220804230307765

Not terribly bad. Time to start looking through the functions to get an idea of what they’re doing.

As it shows in the graph, AA is obviously the flag checking function:

image-20220804230635735

That memcmp function compares the “Mixed” input with this string:

image-20220804230846293

It seems to be the same no matter what you type, so this must be the final string to match. This also tells us the length of the input: 43 characters.

Logging with GDB

Next, I wanted to log all the functions being called so I could see if there were any obvious patterns. For this I usually use gdb python since it’s easier in my opinion to read and set breakpoints than clicking through the breakpoint UI of IDA or trying to use the poorly documented IDC.

import gdb

#gdbpy start
offAddress = 0

def setBreakEvent(callback, addr, inst=None):
    class GdbBreakpoint(gdb.Breakpoint):
        def __init__(self, callback, addr, inst):
            global offAddress
            super(GdbBreakpoint, self).__init__("*" + hex(addr + offAddress), type=gdb.BP_BREAKPOINT, internal=False)
            self.callback = callback
            self.inst = inst
            
        def stop(self):
            pos = readReg("rip")
            res = None
            if inst == None:
                res = self.callback(pos)
            else:
                res = self.callback(inst, pos)
            
            if res == True:
                return res
            else:
                return False
            
    GdbBreakpoint(callback, addr, inst)

def getEntryPoint():
    ENTRY_POINT_STR = "\tEntry point: "
    fileInfo = gdb.execute("info file", to_string=True)
    infoLines = fileInfo.splitlines()
    for line in infoLines:
        if line.startswith(ENTRY_POINT_STR):
            entryPoint = int(line[len(ENTRY_POINT_STR):],0)
            return entryPoint
    
    return None

def readReg(reg):
    gdbValue = gdb.selected_frame().read_register(reg)
    return int(gdbValue.cast(gdb.lookup_type("long")))
#gdbpy end

def logCall(text, addr):
    print(text)
    return False

gdb.execute("set print addr off")
gdb.execute("set pagination off")
gdb.execute("set disassembly-flavor intel")

gdb.execute(f"file cpsc", False)
fileAddress = getEntryPoint()
gdb.execute("starti")
offAddress = getEntryPoint() - fileAddress

setBreakEvent(logCall, 0x2d10, "a")
setBreakEvent(logCall, 0x2870, "aa")
setBreakEvent(logCall, 0x2e80, "ab")
setBreakEvent(logCall, 0x2970, "aba")
setBreakEvent(logCall, 0x34a0, "abb")
setBreakEvent(logCall, 0x2fa0, "abba")
setBreakEvent(logCall, 0x3120, "abbaa")
setBreakEvent(logCall, 0x2980, "abbaaa")
setBreakEvent(logCall, 0x3270, "abbaab")
setBreakEvent(logCall, 0x2940, "abbaaba")
setBreakEvent(logCall, 0x2a60, "abbaabb")
setBreakEvent(logCall, 0x3620, "abbaabbo")
setBreakEvent(logCall, 0x2c20, "abbaabboa")
setBreakEvent(logCall, 0x2830, "abbaabbob")
setBreakEvent(logCall, 0x2990, "abbaabboc")
setBreakEvent(logCall, 0x3850, "abbaabboco")
setBreakEvent(logCall, 0x2b40, "abbaabbocoa")
setBreakEvent(logCall, 0x2860, "abbaabbococ")

gdb.execute("c")

Let’s see what happens if we type test.

--- M I X E R ---
Welcome to the Mixer.
Please enter the password: test
a
ab
abb
abb
abb
abba
abb
abbaa
abbaab
abbaabb
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabboc
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
abba
abb
abb
abba
abb
abbaa
abbaab
abbaabb
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabboc
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
abbaa
abbaab
abbaabb
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabbob
abbaabboa
abbaabbo
abbaabboc
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
aba
aa
Mixed: 8d368fad
Wrong.

And what about tesu?

--- M I X E R ---
Welcome to the Mixer.
Please enter the password: tesu
a
ab
abb
abb
abb
abba
...
abbaabbob
abbaabbocoa
abbaabboco
abbaabbococ
abbaaba
abbaaa
aba
aa
Mixed: c64f160b
Wrong.

I’m not going to paste it all again, but it’s the exact same output as test. It turns out the control flow only depends on string length but not on string content.

That’s great if I only wanted to know what functions were being called, but it doesn’t tell me what the argument values were. I realized a pattern in the code that probably represented arguments.

image-20220806180248101

I manually went through and edited the breakpoint logs to have values so I could get a better idea of what was happening. Here’s the new script output for test (data is a pointer to input memory):

--- M I X E R ---
Welcome to the Mixer.
Please enter the password: test
a(data[0], 4, 43, 0, data[2192])
ab(data[0], 0, 4)
abb(data[0], 4, 1)
abb(data[0], 2, 2)
abb(data[0], 1, 4)
abba(1, 1, 1, data[0], 2)
abb(data[1], 1, 5)
abbaa(1, 1, 1, data[0], 2)
abbaab_mix(data[0], data[0], 1, data[1], 1, 0, 2)
abbaabb_mul1(data[0], 2, 2)
abbaabbo_xortext1(94, 0, 0, data[0], 2, 0, 2)
abbaabbob_encbyte()
abbaabboa(0, 0, data[0], 2, 2, 0, 253)
abbaabbo_xortext1(253, 1, 0, data[0], 2, 0, 2)
abbaabbob_encbyte()
abbaabboa(1, 0, data[0], 2, 2, 0, 171)
abbaabbo_xortext1(171, 2, 0, data[0], 2, 0, 2)
abbaabboc_mul2(171, 0, data[0], 2, 2)
abbaabboco_xortext2(102, 0, data[0], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(0, data[0], 2, 2, 148)
abbaabboco_xortext2(148, 1, data[0], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(1, data[0], 2, 2, 123)
abbaabboco_xortext2(123, 2, data[0], 2, 2)
abbaabbococ()
abbaaba()
abbaaa()
abba(2, 2, 2, data[0], 1)
abb(data[2], 2, 3)
abb(data[2], 1, 6)
abba(1, 1, 1, data[2], 3)
abb(data[3], 1, 7)
abbaa(1, 1, 1, data[2], 3)
abbaab_mix(data[2], data[2], 1, data[3], 1, 2, 3)
abbaabb_mul1(data[2], 2, 3)
abbaabbo_xortext1(141, 0, 0, data[2], 2, 0, 3)
abbaabbob_encbyte()
abbaabboa(0, 0, data[2], 2, 3, 0, 130)
abbaabbo_xortext1(130, 1, 0, data[2], 2, 0, 3)
abbaabbob_encbyte()
abbaabboa(1, 0, data[2], 2, 3, 0, 154)
abbaabbo_xortext1(154, 2, 0, data[2], 2, 0, 3)
abbaabboc_mul2(154, 0, data[2], 2, 3)
abbaabboco_xortext2(153, 0, data[2], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(0, data[2], 2, 2, 70)
abbaabboco_xortext2(70, 1, data[2], 2, 2)
abbaabbob_encbyte()
abbaabbocoa_setchar(1, data[2], 2, 2, 79)
abbaabboco_xortext2(79, 2, data[2], 2, 2)
abbaabbococ()
abbaaba()
abbaaa()
abbaa(2, 2, 2, data[0], 1)
abbaab_mix(data[0], data[0], 2, data[2], 2, 0, 1)
abbaabb_mul1(data[0], 4, 1)
abbaabbo_xortext1(47, 0, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(0, 0, data[0], 4, 1, data[2], 141)
abbaabbo_xortext1(141, 1, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(1, 0, data[0], 4, 1, data[2], 29)
abbaabbo_xortext1(29, 2, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(2, 0, data[0], 4, 1, data[2], 41)
abbaabbo_xortext1(41, 3, 0, data[0], 4, data[2], 1)
abbaabbob_encbyte()
abbaabboa(3, 0, data[0], 4, 1, data[2], 41)
abbaabbo_xortext1(41, 4, 0, data[0], 4, data[2], 1)
abbaabboc_mul2(41, 0, data[0], 4, 1)
abbaabboco_xortext2(51, 0, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(0, data[0], 4, 4, 173)
abbaabboco_xortext2(173, 1, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(1, data[0], 4, 4, 143)
abbaabboco_xortext2(143, 2, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(2, data[0], 4, 4, 54)
abbaabboco_xortext2(54, 3, data[0], 4, 4)
abbaabbob_encbyte()
abbaabbocoa_setchar(3, data[0], 4, 4, 141)
abbaabboco_xortext2(141, 4, data[0], 4, 4)
abbaabbococ()
abbaaba()
abbaaa()
aba()
aa(data[0], 4, 43, 0, data[2192])
Mixed: 8d368fad
Wrong.

This makes it a bit easier to understand what’s going on. You might notice I also renamed some things in this script. Here’s what I found about the functions from a mix of debugging and reading code.

ABBAAB_mix(dst, srcB, lenB, srcA, lenA, ?, ?)

Reorders the input buffer in memory (srcA and srcB are pointers somewhere in the input.) It first makes a new array. It takes the byte at srcA and copies it to the array and then subtracts 1 from lenA. Then it adds the byte at srcB at subtracts 1 from lenB. It keeps switching between these two until lenA and lenB are 0. It then reverses this array and overwrites whatever is at dst. For the implementation, see the scramble function in the Python simulator code below.

ABBAABB_mul1(?, ?, value)

Returns (value * 0x2f) & 0xff. This value is passed into the first argument of ABBAABBO.

ABBAABBO_xortext1(value, off, ?, ptr, ?, ?, ?)

Returns value ^ *(ptr+off). This value is passed into ABBAABBOB.

ABBAABBOA_setchar(off, ?, ptr, ?, ?, ?, value)

Sets *(ptr+off) = value.

ABBAABBOB_encbyte(value)

“Encrypts” the value. This is a 1:1 conversion from one byte value to another. For example, 1 becomes 195, 2 becomes 133, etc. This value is passed into the fifth argument of ABBAABBOC or ABBAABBOCOA.

ABBAABBOC_mul2(?, ?, ?, ?, value)

Returns (value * 0x33) & 0xff. This value is passed into the first argument of ABBAABBOCO.

ABBAABBOCO_xortext2(value, off0, ptr, ?, off1)

Returns value ^ *(ptr + (off1 - 1 - off0)). This value is passed into ABBAABBOB.

ABBAABBOCOA_setchar(off0, ptr, off1, ?, value)

Sets *(ptr + (off1 - 1 - off0)) = value.

Writing the simulator

At this point I have no idea what’s going on, but I start to write an emulator in Python anyways. As long as I can get a simulation, it should be pretty easy to port it over to z3. To build the simulator, I decided to just parse the output from GDB because it sounded easier than trying to understand the rest of the decompiled code (this made me feel awful inside, but it’s a CTF so it’s fine.)

Normally the program calculates a value and then passes it onto the next function. There is always only one variable in each function call. Any other argument is constant no matter the input or I don’t read it in my script. Because of this, I can use a single temp variable to represent the output from the last function and not worry about actually passing values between functions.

import sys

class SimSim:
    def loadText(self, text):
        self.text = bytearray(text.encode("utf-8"))

    def start(self, lines):
        self.lines = lines
        self.lineIdx = 0
        self.tmp0 = 0

        while True:
            self.step()
            if self.lineIdx >= len(self.lines):
                break
            
        
    def scramble(self, startx, countx, starty, county):
        newList = []
        actualStartx = startx
        actualCountx = countx
        actualCounty = county
        while True:
            if county > 0:
                newList.append(self.text[starty])
                starty += 1
                county -= 1
            
            if countx > 0:
                newList.append(self.text[startx])
                startx += 1
                countx -= 1
                
            if countx == 0 and county == 0:
                break
        
        newList.reverse()
        
        for i in range(0, actualCountx + actualCounty):
            self.text[i + actualStartx] = newList[i]
    
    # get nth argument (regular number)
    def getnarg(self, idx):
        line = self.lines[self.lineIdx]
        inText = line[line.find("(")+1:line.find(")")]
        inTextSpl = inText.split(",")[idx]
        return int(inTextSpl)
    
    # get nth pointer argument (data[XXX])
    def getparg(self, idx):
        line = self.lines[self.lineIdx]
        inText = line[line.find("(")+1:line.find(")")].replace(" ", "")
        inTextSpl = inText.split(",")[idx]
        return int(inTextSpl[5:-1])
    
    def encrypt(self, num):
        esi = num
        ecx = num
        ecx <<= 6
        esi ^= ecx
        ecx = esi
        ecx = (ecx >> 8 << 8) | ((ecx & 0xff) >> 7)
        ecx ^= esi
        esi = ecx << 1
        ecx ^= esi
        return ecx & 0xff
    
    def step(self):
        line = self.lines[self.lineIdx]
        if line.startswith("abbaab_mix("):
            cx = self.getnarg(2)
            cy = self.getnarg(4)
            sx = self.getparg(1)
            sy = self.getparg(3)
            self.scramble(sx, cx, sy, cy)
        elif line.startswith("abbaabb_mul1("):
            val = self.getnarg(2)
            self.tmp0 = (val * 0x2f) & 0xff
        elif line.startswith("abbaabbo_xortext1("):
            a1 = self.getnarg(1)
            a4 = self.getnarg(4)
            if a1 == a4:
                self.lineIdx += 1
                return

            a3 = self.getparg(3)
            
            txt = self.text[a3 + a1]
            self.tmp0 ^= txt
        elif line.startswith("abbaabbob_encbyte("):
            self.tmp0 = self.encrypt(self.tmp0)
        elif line.startswith("abbaabboa("):
            a2 = self.getparg(2)
            a0 = self.getnarg(0)
            self.text[a2 + a0] = self.tmp0
        elif line.startswith("abbaabboc_mul2("):
            a4 = self.getnarg(4)
            self.tmp0 = (a4 * 0x33) & 0xff
        elif line.startswith("abbaabboco_xortext2("):
            a2 = self.getparg(2)
            a4 = self.getnarg(4)
            a1 = self.getnarg(1)
            if (a4 - 1 - a1) >= 0:
                self.tmp0 = self.tmp0 ^ self.text[a2 + (a4 - 1 - a1)]
            
        elif line.startswith("abbaabbocoa_setchar("):
            a1 = self.getparg(1)
            a2 = self.getnarg(2)
            a0 = self.getnarg(0)
            if (a2 - 1 - a0) >= 0:
                self.text[a1 + (a2 - 1 - a0)] = self.tmp0
        
        self.lineIdx += 1

sim = SimSim()
sim.loadText(sys.argv[1])

# output from gdb script
with open("cpsc_debug_output.txt") as file:
    lines = file.readlines()
    lines = [line.rstrip() for line in lines]

sim.start(lines)
print(bytes(sim.text).hex())

And testing it on test again:

$ python3 sim_cpsc.py test
8d368fad

Good, it matches what the simulator does. I made sure it wasn’t a fluke by typing something else (of the same length) in the real cpsc program and comparing with the Python script without updating the cpsc_debug_output.txt file.

$ ./cpsc
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: butt
Mixed: c8ac38f0
Wrong.

$ python3 sim_cpsc.py butt
c8ac38f0

Nice, it’s working.

Z3

Now it’s time to get z3 solving it. In a previous challenge where I wrote a simulator (dicecraft), I made minimal changes to my simulator to get it working with z3. However in this one it’s a bit tougher. We have a mix function and a encode byte function which z3 can’t easily handle. I decided it might be easier to unroll everything to make it easier for z3 to do its job. I had my teammate sera help with z3 since I was struggling.

Encrypt function

As I said before, the encrypt function takes a one byte input and produces a one byte output. We couldn’t find anything about making a lookup table with Z3, so we generated a huge chain of if statements.

obf_table = [0, 195, 133, 70, 12, 207, 137, 74, 24, 219, 157, 94, 20, 215, 145, 82, 48, 243, 181, 118, 60, 255, 185, 122, 40, 235, 173, 110, 36, 231, 161, 98, 96, 163, 229, 38, 108, 175, 233, 42, 120, 187, 253, 62, 116, 183, 241, 50, 80, 147, 213, 22, 92, 159, 217, 26, 72, 139, 205, 14, 68, 135, 193, 2, 192, 3, 69, 134, 204, 15, 73, 138, 216, 27, 93, 158, 212, 23, 81, 146, 240, 51, 117, 182, 252, 63, 121, 186, 232, 43, 109, 174, 228, 39, 97, 162, 160, 99, 37, 230, 172, 111, 41, 234, 184, 123, 61, 254, 180, 119, 49, 242, 144, 83, 21, 214, 156, 95, 25, 218, 136, 75, 13, 206, 132, 71, 1, 194, 131, 64, 6, 197, 143, 76, 10, 201, 155, 88, 30, 221, 151, 84, 18, 209, 179, 112, 54, 245, 191, 124, 58, 249, 171, 104, 46, 237, 167, 100, 34, 225, 227, 32, 102, 165, 239, 44, 106, 169, 251, 56, 126, 189, 247, 52, 114, 177, 211, 16, 86, 149, 223, 28, 90, 153, 203, 8, 78, 141, 199, 4, 66, 129, 67, 128, 198, 5, 79, 140, 202, 9, 91, 152, 222, 29, 87, 148, 210, 17, 115, 176, 246, 53, 127, 188, 250, 57, 107, 168, 238, 45, 103, 164, 226, 33, 35, 224, 166, 101, 47, 236, 170, 105, 59, 248, 190, 125, 55, 244, 178, 113, 19, 208, 150, 85, 31, 220, 154, 89, 11, 200, 142, 77, 7, 196, 130, 65]

st = f"c254 = If(inp == 254, BitVecVal({obf_table[254]}, 32), BitVecVal({obf_table[255]}, 32))\n"
for i in range(253, -1, -1):
    st += f"c{i} = If(inp == {i}, BitVecVal({obf_table[i]}, 32), c{i + 1})\n"

print(st)

Pretty terrible, but hey, who cares if it gets the job done.

Scramble function

This one is a bit tougher to deal with since it involves moving the values around in the array. Sera realized that the indices are always constant and we could scramble the indices of the array instead of the actually array elements. Then at the end, we would just scramble the comparison array into the same order.

Variables “over time”

After doing these two things, z3 still wouldn’t solve anything. Sera suggested that assigning a value to the same variable multiple times might be a bad idea, and to instead suffix each variable with the “time value” or “assignment index”. Originally we would have something like this:

tmp0 = 12
tmp0 = data[0] ^ tmp0

And instead we would use this:

tmp0_0 = 12
tmp0_1 = data0_0 ^ tmp0_0

(This probably wasn’t necessary as we’ll see below.)


After all this, I still couldn’t get z3 to solve on my Windows machine. No matter how long I waited, it didn’t do anything. Giving up, I pasted it onto Discord and worked on different challenges, hoping someone would see something I missed. And then I saw this:

image-20220807180544918

The script had been working the whole time and my z3 just didn’t want to work. I switched over to my Linux machine which was able to solve it in about 35 seconds 🤦.

The amazing part about solving this challenge is that I still didn’t understand how it worked, but just being able to simulate it in Python is good enough. Would it have been faster for me to read into it a little more? Probably. But who cares, it’s a CTF. All that matters at the end of the day is that it was solved.

$ ./cpsc
--- M I X E R ---
Welcome to the Mixer.
Please enter the password: uiuctf{n41tv3_func7iona1_pr0gr4mm1ng_1n_C!}
Mixed: e338e9cc0199e8c24b43760f2277cf56f9b7ddff343aaf116fe26cafca4538cfb9c26477e377d19a301e13
Correct! That's the flag!

Files:

logcpsc.py GDB logging script

simcpsc.py Python simulation

cpsc_debug_output.txt Result of GDB logging script for 43 character input

generate_encrypt.py Generate cpsc_solve.py encrypt function

generate_run.py Generate cpsc_solve.py go function

cpsc_solve.py Z3 solve script