The Rocky Road To Pwn - Part Three
- Part One: Format Print Basic and Elementary Format Print Attack
- Part Two: Relocation(ASLR) and Data Manipulation With Format Print
- Part Three: A real CTF challenge
In this final part of our Pwn trilogy, we will tackle an actual CTF challenge with limitations. Previously, we explored the format print function in the C standard library, discussed the format print attack, and covered Address Space Layout Randomization (ASLR). We also learned how to bypass ASLR. To wrap things up, we will demonstrate how to manipulate data at arbitrary memory addresses on the stack using the techniques from the previous parts.
The challenge is from SaplingCTF 2023, in which we participated as team Senioritis. We were the only ones to solve the last Pwn challenge, the final assignment of CPSC233. This post also serves as a write-up for the challenge. We highly recommend that readers review the previous parts of the Pwn trilogy before diving in. For those with a strong foundation and thorough mastery of the earlier parts of the trilogy, you will enjoy it.
Reconnaissance
The challenge includes an ELF binary and a C source code file. The C source code file exposes the programming logic of the binary executable. However, understanding lower system details like stack frame layout requires additional work. For instance, similar to the first and second parts, we could either statically disassemble the binary or set a runtime breakpoint under a debugger to analyze the stack frames. This is how we worked out the stack frame of the main function.
1 | ┌──────────────────────────────┐ |
In contrast to stack frames in the previous parts of our trilogy, we have an unusual octet above the stack pointer (RSP). This octet is reserved for the return address of the callee’s stack frame, which is also the address of the next instruction after the corresponding call instruction. We focus on the format print attack, so the callee is likely the format print function. We won’t delve deeply into this topic now but reserve it for later paragraphs. Given the stack frame layout, we can comment on the stack variables from the provided C source code and the stack offset related to the stack pointer (RSP) and the stack base (RBP).
1 | int main() { |
The challenge requires us to gain access to a shell using binary exploitation, as there is no hardcoded flag or loaded program in the virtual memory. In contrast to the win function in the second part of our trilogy, this challenge offers a memory page that is both writable and executable, allowing us to input hexadecimal-encoded bytes. A user-controllable format string feedback is also suitable for a format print attack. We aim to create and load a shellcode into the designated memory section, then take control of the program’s flow using a format print attack and divert it to our loaded shellcode.
The challenge author graciously provided the relocated stack address through the exam ID. Based on part two, the relocated executable address remains to bypass ASLR further. Additionally, we can load 12 consecutive bytes onto an aligned stack variable called name, partially covering the stack canary. This allows us to perform indirect reading and writing to arbitrary locations by loading the target address into name. Although the main function does not return, as it directly calls exit before returning, we could load the on-stack address of the return address of the printf – the proposed callee of the main function – and override it to the static shellcode address 0x2333000. However, the challenge author restricts the read_shellcode procedure call to loading only two hexadecimal bytes each time without any looping mechanism. Since building a shellcode with only two bytes is not easy, we had to somehow loop it.
The Looping Curse
In the 2016 film Doctor Strange, a curse traps Dormamu, a multi-dimensional ruler, in a time loop, compelling him to leave Earth alone. Similarly, we worshiped Dr.Strange and used his superpower to spell a looping curse to trap the challenge’s executable in a loop. The looping curse is necessary because, unlike the example in part two, the challenge’s program doesn’t loop by itself and exits after invoking the format print function. Therefore, a form of “magic” matters to create an artificial read-invoke loop.
In the previous section, we discussed the possibility of loading the on-stack address of the callee’s return address into the name variable and then modifying the return address during a format print call. In this scenario, the callee function would be printf itself, meaning that printf would modify its return address and transfer control to some other instructions instead of the next instruction that calls printf in main. To further explain, we disassembled the main function but focused on the instructions near the printf procedure call.
1 | 0000000000000a78 <main>: |
The main function calls the format print procedure at memory address 0x0ba7, and the return address for the printf function should point to the move instruction at address 0x0bac. Considering that relocations in ASLR align with the virtual memory page size, typically 4096, the least significant byte (0xac in 0x0bac) of the instruction address remains unchanged after relocation. Even though the relocated executable address is unknown, we can still modify the least significant byte of the return address in the printf stack frame to create a loop and redirect the control flow back to previous instructions. The destination instruction must be close enough, as we can only reliably modify the least significant byte of the return address. To jump to a distant address, like a long jump, we would need to further bypass ASLR and determine the relocated executable address.
Remember that the address we target for indirect reading and writing loads to the stack buffer is given by name, 32 bytes off the stack pointer (RSP). As part two discusses, the format print function uses index 10 for the positioned parameter referring to the effective pointer at name. We can use the half-half length modifier in combination with the positioned parameter to override the least significant byte, resulting in %10$hhn.
Additionally, the placeholder to alter data usually follows a placeholder with a field width specifier to specify the target value. For example, we want to redirect the control flow to the line where the program reads feedback. Therefore, we select the move instruction at 0x0b6d, which initiates the parameters before invoking the puts@plt procedure call. Under this ground, we aim to write 0x6d to the least significant byte of the return address of printf@plt‘s stack frame. Since 6dH is 109 in decimal, we would use %109c to specify the target value. Combining these, we end up with %109c%10$hhn as the spell of our looping curse.
The challenge targets a Linux-based x86_64 system for verifying our looping curse spell under a matching system. Like part two, the system comes with Python v3.12.4 and pwnlib 4.13.0. Thus, we could create the following proof-of-concept (PoC).
1 | #!/usr/bin/env python3 |
The PoC script begins by processing the output of the challenge and extracting the leaked stack address, which is provided by the challenge’s author and assists in bypassing ASLR. Subsequent operations reveal the stack pointer (RSP) and deduce the potential location of the return address for potential function calls. The magic spell is then applied as the feedback, with the target address already loaded onto the stack via the ‘name’ buffer. Ideally, the program should return to displaying the prompt to the standard output before requesting feedback. We can confirm this once the PoC script activates interactive mode.
1 | [skid.t@BlackArch playground]$ python ./loop-curse-poc.py |
When the script reaches the interactive stage, it forwards the prompt asking for course feedback from the program. This indicates success as the script has already provided the course feedback once, and the program didn’t exit but ask for the feedback again. We manually typed the spell three times for a solid confirmation, then followed an EOF. The program repeatedly asked for feedback and exited when it reached the EOF. Note that the long line with tailing C is the echoed feedback as the %109c placeholder prints 109 characters, whereas the leading 108 characters are empty spaces for padding purposes. The last character is the decoded value from the least significant byte of the source index register(RSI). With the excellent work of the looping curse, we could recurrently trigger the format print function with different payloads, which is essential for further exploitation steps.
The Two Pointers
The looping curse helps continuously deploy various format print payloads, preventing the program from exiting. Since the challenge only permits loading two shellcode bytes once, we could repeatedly loop and load the shellcode bytes until all have been loaded. To achieve this, it is necessary to offset the func pointer by two during each iteration of the looping process. One approach is to load the address of the func pointer and overwrite it with format print payloads, similar to overwriting the return address. However, the challenge restricts us from loading only 12 bytes on the stack, which is inefficient for two consecutive pointers. Since the first eight bytes must always point to the return address of the format print function, we must use other existing on-stack data to help load the shellcode.
Suppose an out-of-box pointer refers to the stack variable func. We could overwrite func with the cursed conversion specifier discussed in part two of our trilogy. The out-of-box pointer and the func pointer form a two-pointer pattern, where one on-stack pointer refers to another on-stack pointer. As an attacker, we could utilize the out-of-box pointer to control the func pointer, which allows us to indirectly read from and write to arbitrary virtual memory addresses. Even though the challenge provides a read_shellcode procedure call that eliminates the need to encode the shellcode in a format string, the two-pointer pattern helps in creating a control pointer of the func pointer when an out-of-box gift is not available.
1 | ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ |
To locate an out-of-box control pointer for the func function, we iteratively traversed the stack using the %{index}$016llX placeholder. In this placeholder, {index} is replaced by the actual position parameter, starting from 6 and increasing by one. This placeholder revealed an 8-byte memory thunk as a 16-character hexadecimal number. If any of these values matched the address of the func pointer, our small search script would stop and display the corresponding address, the value, and the index.
1 | r = process([exe.path]) |
Unfortunately, the search script failed with an EOFError preceding the message that the challenge program exited with SIGSEGV, which usually occurs when accessing uninitialized pages. This issue arises as our search script may cause the challenge program to read beyond the stack, meaning there’s no out-of-the-box control pointer for the func pointer.
1 | [*] '/home/skid.t/playground/cpsc233_final' |
We encountered a situation where there was no straightforward way to control the func pointer. However, there may be control pointers for other pointers on the stack that refer to somewhere else on the stack. These control pointers can modify the value of their partner pointer to arbitrary stack addresses. Given that the saved RBPs on the stack form a linked list, this manipulation is possible, providing enough pointer pairs.
The RBP Chain
In most x86_64 systems, architects prefer that each function save the caller’s stack base pointer (RBP) on the current stack frame. This design leads to a linked list (a chain) that starts with the RBP register and extends to the C runtime function that initializes the first stack frame.
1 | ┌──────────────────┐ ┌──────────────────┐ |
The RBP chain provides many pointer pairs, which we need. In this case, one pointer refers to another pointer, and the second one refers to a location on the stack. The first pointer acts as the control pointer for the second one, allowing us to partially overwrite the second pointer and direct it to arbitrary stack addresses. In our situation, we want to direct it to the func pointer. This way, we can modify the func pointer and load the entire shellcode. To do this, we updated the search script to look for pointer pairs instead of the exact control pointer for the func pointer.
1 | r = process([exe.path]) |
The updated search script now remembers values encountered and keeps a reverse mapping from the value to the corresponding index in the format string. When iterating to a new stack address, the script checks if the current address is in reverse mapping. If found, the script checks whether its value refers to a stack address. Both checks pass; the script records the pointer pair and adds it to the pointer_pairs list. Even though we only need one pointer pair to create the control pointer for the func pointer, the script yields three for redundancy. The pointer_pairs collects all the yielded pointer pairs for further steps.
1 | [+] Starting local process '/home/skid.t/playground/cpsc233_final': pid 889464 |
The script performed well in completing its mission. It scanned the process’s current system call stack and identified three potential pointer pairs. It’s important to note that the first pair consists of the exam_id and name, essential for the looping process. Therefore, we must exclude the first pair and proceed with the second one. The second pair appears to be an RBP chain, as expected. We’re now prepared to create the control pointer for the func pointer with the discovered pointer pair.
The Construction Zone
Welcome to the construction zone. Here, we use the pointer pairs that were previously discovered to construct the control pointer of the func pointer. Remember that we excluded the first pointer pair and picked the second one. The straightforward way to construct the control pointer for func is by partially overwriting the control pointer (the second one in the pointer pair) with its meta control pointer (the first in the pointer pair). This means that after partial overwriting, the value of the control pointer will be identical to the address of func.
The Straightforward Attempt
A brief Python snippet implements the straightforward idea, performing two overwritings in a single format string. The first overwriting involves the return address of the format print function, and the second overwriting targets the control pointer utilizing the meta control pointer. Recall that the n conversion specifier writes the number of up-to-now printed characters to the designated location. When performing two overwritings in a single format string, the former interference with the latter creates a challenge. To address this, we utilized the modular congruence trick discussed in part two of our trilogy, which ensures that both destinations receive the correct values.
1 | # setup the control pointer for func |
Furthermore, the Python code snippet dumps the overwritten control pointer and then checks its value after modification. This quality assurance step ensures everything works perfectly in the construction zone. Ideally, it should print log information showing the address of the func pointer. Unfortunately, an error occurred when running the script.
1 | Traceback (most recent call last): |
The EOFError occurred as the script was expecting to receive CPTR, which is our signal that the following sixteen characters make up the echoed value of the control pointer. It seems that the program exited, which closed the IO pipes and led to the EOFError. Something must be wrong since we could replicate the exact error by running the script multiple times. Therefore, we ran the script with a debug message to reveal the behind-the-scenes IO conversations, and luckily, we found a glitch.
1 | [skid.t@BlackArch playground]$ python ./ctrl_ptr_poc.py DEBUG |
The script sent a 25-byte feedback to the program, but the program only reads 24 bytes. As a result, the payload was cut off, causing subsequent issues. Therefore, we need to find a way to compress the size of the format string.
The Redemption Of Fantastic Manual
After carefully reading the manual of the format print functions and a few desperate cries at 4 am, we stumbled upon something interesting in the field width section.
Instead of a decimal digit string one may write “*” or “*m$” (for some decimal integer m) to specify that the field width is given in the next argument, or in the m-th argument, respectively, which must be of type int.
We were thrilled to discover that the field width parameter could be passed indirectly. We could swap out %55451c in the format string with something like %*xxc, where xx is a two-digit integer. This means we need to find a location to load 545451 as an integer on the stack. According to the manual, the indirect field width must be of type int; we just need to find a 4-byte buffer for the value. Remember the name buffer – even though it’s only 8 bytes long, the program reads 12 bytes, partially overlapping the stack canary. Considering that the main function exits without returning and we could call the exit syscall in our shellcode, it does not matter whether the stack canary is preserved or corrupted.
1 | ┌───────────────────────────────────┐ |
The stack canary is 40 bytes away from the stack pointer (RSP). Therefore, the appropriate position-parameter index should be 11. We use %*11$c as the placeholder in the format string %109c%10$hhn%*11$c%12$hn. This allows us to overwrite the control pointer with the least significant four bytes of the stack canary. When the format print function handles %*11$c, it only concerns the exact four bytes. The remaining four bytes of the stack canary are entirely ignored by the main function, the format print function, and us.
To load the specified value 55451 and overwrite the stack canary, we need to redirect the control flow back to the Enter your name stage. Unlike redirecting the control flow back to reading feedback, we must overwrite two bytes of the return address of ‘printf’ instead of only the least significant byte. Given this situation, we must consider bypassing the Address Space Layout Randomization (ASLR) entirely.
The Battle Of Relocation
In the hope of bypassing the ASLR entirely, we launched the battle of relocation. The objective of this effort is to uncover the relocated base address of the ELF segments. Any address from the ELF segments after relocation helps, as we can use basic arithmetic to calculate the relocated base address.
The first thing to note is the return address of the main function. However, the main function returns to the C runtime functions and belongs to the libc sections, which differ from the ELF segments. What about the return address of the format print function? Even if we overwrite it with the looping curse, we can still read and print it in the exact format string. With this idea in mind, another Python snippet can be used to uncover the relocated instruction addresses associated with the main function.
1 | payload = b'%109c%10$hhnBGN%10$sEND' |
The payload consists of the looping curse and an indirect reading. The looping curse entails an indirect writing that shares the control pointer with the indirect reading. Unlike direct readings, it is impossible to specify the output format of indirect readings, as raw bytes are printed until a null byte occurs. Therefore, we enclosed the indirect reading with a BGN-END pair to ensure that all bytes are collected without unwanted bytes.
1 | ...... |
The battle went well. The script successfully prompted the relocation of the ELF segments’ base addresses. After verifying the authenticity of the revealed base address, we were confident that we had bypassed ASLR and celebrated the victory.
Construction Complete
We have the compressed format string, a working idea, and the relocated base address of the ELF segments. Therefore, it’s time to complete the construction. To construct the func pointer’s control pointer, we redirect the control flow to name reading. But first, we must calculate the target address with relocation in mind.
1 | # redirect the control flow to name reading |
The address attribute of the ELF object exe represents the relocated base address. In pwnlib, the ELF object represents the relocated ELF segments. We could add the relocated base address to the static offset for any instructions to obtain their relocated addresses. Then, we utilized the relocated address to redirect the control flow to the name reading.
1 | # load the modular congruence of func's address to the 4-byte buffer |
When the control flow reached name reading, we loaded the value onto the stack to construct the control pointer of the func pointer. Since we didn’t want to lose control of the control pointer of the return address of the format print function, we had to load the exact raw address again into the name buffer ahead of the value for overwriting the control pointer to construct. As mentioned earlier, the value to overwrite the control pointer to construct is a modular congruence of the original value. After this, we were ready to build the control pointer of the func pointer with the 4-byte value that had just been loaded.
1 | # partially overwrite the control pointer with the 4-byte buffer |
We’re all set; it’s time to finish the construction. Overwriting the control pointer candidate with the compressed format string mentioned above should be painless. When finished, checking to reach a milestone is always good practice.
1 | # check the current value of the control pointer |
There we go—the construction succeeded. We now have the control pointer of the func pointer set up. It’s time for celebration.
1 | [*] Found stack pointer: 0x7ffdf1715ab0 |
While working in the construction zone, we tried a simple approach but accidentally exceeded the format string length limit. After carefully reading the manual for the format print function, we discovered that we could compress the format string by indirectly passing the field width specifier. We then modified our script to incorporate this idea and successfully identified the new base address of the ELF segments. Finally, we finished the construction, verified that the desired value had been loaded onto the control pointer of the func pointer, and ensured everything was working as expected. We were ready to load the shell code.
The Loading Zone
The construction zone handled most of the preparation work, so it was time for the loading zone. In the loading zone, we assembled the shellcode and loaded it into the designated location. With the preparation work from the construction zone, it was no longer hard to proceed.
The shellcraft module in the pwnlib simplifies assembling shellcode. In this example, we combined a sh with an exit to ensure that the shellcode launches an interactive shell and terminates the victim process without generating additional errors. Ahead of the sh, we used an echo to signal that the control flow is transferred to the shellcode. Since the shellcraft module only generates assembly code, we used the asm function to assemble it into machine code. The read_shellcode function reads two bytes at a time, so we added padding to the assembled shellcode to ensure its length is always even.
1 | # prepair and assemble shellcode |
The code snippet also displays the shellcode in assembly, providing a clear view of its structure. The shellcode comprises four parts: the echo, the shell, the exit, and the trailing nops. The trailing nops are necessary because the func pointer will eventually reference somewhere, and we need to ensure that writing to that location won’t cause any issues. The subsequent part of the script outputs the assembled shellcode in hexadecimal format, which is helpful for diagnostic purposes.
1 | [*] Shellcode in assembly: |
Similar to how we redirected the control flow to the name reading, we can redirect the control flow to shellcode reading by overwriting the least two significant bytes of the return address of the format print function. In contrast to the control flow redirection to the name reading, we already know the base address of the relocated ELF segments. Therefore, we can directly calculate the relocated address of the instructions related to the shellcode reading. We then assembled and delivered a customized looping curse spell.
1 | # prepair for payload uploads |
The most satisfying part of the loading process comes when loading the shellcode into the designated process. Since the read_shellcode function can only take two bytes at a time, we divided the shellcode into multiple chunks, each consisting of precisely two bytes. We loaded these shellcode chunks consecutively using the custom looping spell and partially overwriting the func pointer. As the read_shellcode function requires the shellcode chunks to be in hexadecimal format, we created a helper function called binstr_to_hexstr to convert raw binary strings into hexadecimal encoded strings. Finally, as mentioned earlier, the script prints the hexadecimal representation of the assembled shellcode.
1 | gef➤ hexdump byte --size 112 0x2333000 |
Please repeat after me: checking and ensuring we reach milestones is always good practice. In this case, we encountered a glitch. The assembled payload is supposed to have b8 2f as the 42nd and 43rd bytes, but it turned out to be 00 b8, which is unexpected. This discrepancy occurred because the modular_adjusted in the script might be zero. However, using a placeholder like %0c is against the rules outlined in the format print function’s manual.
An optional decimal digit string (with nonzero first digit) specifying a minimum field width.
When encountering a placeholder like %0c, the printf function won’t skip the placeholder as if there’s nothing to print. Instead, it will ignore the field width modifier and print one character, which can mess up the subsequent overwritings. To prevent this, we must check if the modular_adjusted is zero and intentionally skip the corresponding placeholder when constructing the format string.
1 | # upload payload |
It is a slight tweak to the code but a massive leap toward success. We dropped in the change, reran the script, and examined the loaded shellcode with the debugger. Everything appeared flawless.
1 | gef➤ hexdump byte --size 112 0x2333000 |
We successfully assembled the shellcode and loaded it into the target process. We encountered a glitch, but the solution wasn’t too complicated. The only thing left was handing over the control flow to the shellcode, which led to the last battle.
The Last Battle
In the final battle of part three of the Pwn trilogy, we want to run the shellcode. Similar to how we take over the control flow and form a loop, we aim to hand over the control flow to the shellcode by overwriting the return address of the format print function. This way, when the function returns, it will run the shellcode. Unlike partially overwriting the return address to form the loop, we need to overwrite all 8 bytes together this time.
1 | 64-bit return address in little-endian: |
We reconstructed the control pointer, which refers to the most significant 4 bytes (M4SB) of the return address, given that the equivalent pointer at the name buffer already refers to the least significant 4 bytes (L4SB) of the return address. When a placeholder in a format string uses the n conversion specifier without a field width specifier, it overwrites the entire 4-byte integer. Therefore, we can overwrite all 8 bytes of the return address at once with the control pointer and the name effective pointer.
1 | ┌────────────┬────────────┐ |
Since the target address 0x2333000 is too long to fit in the length-constrained format string, we load it into the 4-byte buffer, partially taken from the stack canary, following the name buffer. This allows us to compress the format string to an acceptable length, similar to how we compressed the format string when constructing the control pointer. We then worked out a snippet regarding this approach.
1 | # redirect the control flow to name reading |
We added two more lines at the end of the snippet. One waits for the shellcode to be successfully executed. Given the shellcode launches an interactive shell, the second line activates the alternative mode of the pwn script. We could then run common shell commands in the interactive mode.
1 | [*] '/home/skid.t/playground/cpsc233_final' |
There we go. We won the last battle and obtained the shell. We played a little with the shell and retrieved the flag, where we successfully completed the challenge. The Pwn trilogy concludes here. Considering all the efforts, it’s been such a rocky road to Pwn.
Conclusion
The final assignment for CPSC233 is a challenging and complex task in an entry-level CTF. It builds upon basic knowledge from university-level computer system courses and requires participants to go beyond. We had never considered delving into the C standard libraries and studying the internals of the format print functions, but we enjoyed the fantastic rabbit hole. In general, the challenge tests the participants’ ability to conduct research. We learned a lot during this process, which motivated us to share, resulting in the Pwn trilogy.
In the Pwn trilogy, we examine the lower system view of the format printf function. Then, we explore how to read data that is not intended to be read. Later, we delve into how to modify data using the format printf function. We also discuss Address Space Layout Randomization (ASLR) and methods to bypass it. Finally, we use a CTF challenge to summarize and practice all the skills discussed. We hope you enjoy the Pwn trilogy. Happy hacking.