September 05, 2016

Patching Binaries: Strings

Patching Binaries is a series of articles about extracting information from compiled programs and modifying their behavior. It focuses on the Mach-O executable format used on the Mac for the x86-64 architecture, but the techniques are similar for other formats.

The files used in this article can be found here.

One of the first things to look at in a binary, when trying to determine information about it, is the constant strings saved within it. In Mach-O binaries these can be found in section __cstring of segment __TEXT.

Compile the example program “test.cc” as “test” and run it:

% ./test
std::string example.
And a char* string, too.

To list a program’s strings there are at least two tools that can help: otool and strings.

The simplest is strings:

% strings -o -t x test
1f50 std::string example.
1f65 And a char* string, too.

It shows the strings and their hexadecimal offsets in the binary.
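
To make it concrete what such a tool does, here is a minimal Python sketch that scans raw bytes for runs of printable ASCII and reports their offsets. It is only an approximation of strings, not its actual implementation; the function name `find_strings`, the minimum length of 4 and the `sample` data are assumptions made for illustration.

```python
import re

def find_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for every run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Made-up stand-in for a binary: 16 zero bytes followed by the example strings.
sample = bytes(16) + b"std::string example.\x00And a char* string, too.\x00\n\x00"

for offset, text in find_strings(sample):
    # Print the offset in hexadecimal, followed by the string itself.
    print(f"{offset:x} {text}")
```

To try it on a real file, replace `sample` with the result of `open("test", "rb").read()`.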

A little more detail can be obtained with otool though:

% otool -s __TEXT __cstring test
test:
Contents of (__TEXT,__cstring) section
0000000100001f50   73 74 64 3a 3a 73 74 72 69 6e 67 20 65 78 61 6d
0000000100001f60   70 6c 65 2e 00 41 6e 64 20 61 20 63 68 61 72 2a
0000000100001f70   20 73 74 72 69 6e 67 2c 20 74 6f 6f 2e 00 0a 00

% otool -v -s __TEXT __cstring test
test:
Contents of (__TEXT,__cstring) section
0000000100001f50   std::string example.
0000000100001f65   And a char* string, too.
0000000100001f7e   \n

To understand the layout of the hex dump above, note that the strings are stored end-to-end:

0000000100001f50   73 74 64 3a 3a 73 74 72 69 6e 67 20 65 78 61 6d
0000000100001f60   70 6c 65 2e 00

0000000100001f65   41 6e 64 20 61 20 63 68 61 72 2a 20 73 74 72 69
0000000100001f75   6e 67 2c 20 74 6f 6f 2e 00
                    
0000000100001f7e   0a 00

Notice the zero byte for termination at the end of each string!
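
The same split can be reproduced in a few lines of Python. This is a sketch that takes the bytes from the otool dump above and cuts them at every zero byte; the helper name `split_cstrings` is made up for this example.

```python
# Address and contents of (__TEXT,__cstring), copied from the otool output.
SECTION_ADDR = 0x100001f50
section = bytes.fromhex(
    "73 74 64 3a 3a 73 74 72 69 6e 67 20 65 78 61 6d"
    "70 6c 65 2e 00 41 6e 64 20 61 20 63 68 61 72 2a"
    "20 73 74 72 69 6e 67 2c 20 74 6f 6f 2e 00 0a 00"
)

def split_cstrings(data: bytes, base: int):
    """Return (address, bytes) pairs, one per NUL-terminated string in data."""
    result = []
    start = 0
    while start < len(data):
        end = data.find(b"\x00", start)
        if end == -1:            # no terminator left: take the rest as-is
            end = len(data)
        result.append((base + start, data[start:end]))
        start = end + 1          # skip past the terminating zero byte
    return result

for address, raw in split_cstrings(section, SECTION_ADDR):
    print(f"{address:016x}   {raw!r}")
```

The three addresses it prints match the ones reported by otool -v.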

We now know that there are two “interesting” strings at offsets 0x1f50 and 0x1f65, which means that if we read the binary’s data at those positions we will find the strings.
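
Note that otool prints virtual addresses while strings prints file offsets. In this binary the two differ by 0x100000000. The sketch below converts one into the other and reads a string at a given offset. It assumes that the __TEXT segment is mapped at 0x100000000 and starts at file offset 0, which matches the output above but should be checked for other binaries; the helper names and the `fake_file` data are made up.

```python
# Assumption: __TEXT is mapped at this address and begins at file offset 0.
TEXT_VMADDR = 0x100000000
TEXT_FILEOFF = 0

def vmaddr_to_file_offset(addr: int) -> int:
    """Translate a virtual address inside __TEXT into an offset in the file."""
    return addr - TEXT_VMADDR + TEXT_FILEOFF

def read_cstring(data: bytes, offset: int) -> str:
    """Read the NUL-terminated ASCII string that starts at offset."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii")

# Made-up stand-in for the file: filler bytes up to 0x1f50, then the strings.
fake_file = bytes(0x1f50) + b"std::string example.\x00And a char* string, too.\x00\n\x00"

for address in (0x100001f50, 0x100001f65):
    offset = vmaddr_to_file_offset(address)
    print(hex(offset), read_cstring(fake_file, offset))
```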

The next natural step is to modify the strings. I wrote a Python script “patch.py” to patch a binary at an offset with a new string.

Let’s change the first string to become "Hello, World!":

% ./patch.py test 0x1f50 "Hello, World!       "
Patching "test" at offset 8016 with: "Hello, World!       " (20 bytes)
Read 15392 bytes
Size of string at 8016 is 20 bytes
Changing values at 8016 to 8036
Writing new data

% ./test
Hello, World!
And a char* string, too.

Voila!

Note that the string we patched is 20 bytes long, which means the value that overwrites it cannot be longer than that. To visually clear the rest of the old value we pad the new string with spaces at the end.
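
The core of such a patch can be sketched as follows. This is not the actual patch.py but a minimal illustration of the idea under the same constraint; the function name `patch_cstring` and the choice to pad with spaces automatically are assumptions.

```python
def patch_cstring(data: bytearray, offset: int, new_text: bytes) -> None:
    """Overwrite the NUL-terminated string at offset in place.

    The terminating zero byte is left untouched, so the replacement may not
    be longer than the existing string. Shorter text is padded with spaces.
    """
    end = data.index(b"\x00", offset)    # position of the terminator
    old_size = end - offset
    if len(new_text) > old_size:
        raise ValueError(f"new string is {len(new_text)} bytes, "
                         f"but only {old_size} bytes are available")
    # Same-length slice assignment keeps every later offset unchanged.
    data[offset:end] = new_text.ljust(old_size, b" ")

# Made-up stand-in for the section contents of the example binary.
data = bytearray(b"std::string example.\x00And a char* string, too.\x00")
patch_cstring(data, 0, b"Hello, World!")
print(bytes(data))
```

For a real file, read it into a bytearray with `open(path, "rb")`, call the function with the file offset, and write the result back with `open(path, "wb")`. Working on a copy of the binary is advisable.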

The contents of the binary are now changed to the following:

% otool -s __TEXT __cstring test
test:
Contents of (__TEXT,__cstring) section
0000000100001f50   48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 20 20 20
0000000100001f60   20 20 20 20 00 41 6e 64 20 61 20 63 68 61 72 2a
0000000100001f70   20 73 74 72 69 6e 67 2c 20 74 6f 6f 2e 00 0a 00

Notice the 7 spaces (byte 0x20) after our string.

As a second example compile “password.cc” into “password” and run it:

% ./password
Password: test
Invalid!
Password: password
Invalid!
Password: 1234
Invalid!
Password: ^C

Okay, so maybe we can’t just outright guess it. Let’s look at the strings in the binary instead for clues:

% otool -v -s __TEXT __cstring password
password:
Contents of (__TEXT,__cstring) section
0000000100001f28  Password:
0000000100001f33  AK9FJ31P
0000000100001f3c  You've entered the correct password!\n
0000000100001f62  Invalid!\n

One value looks curious: “AK9FJ31P”

% ./password
Password: AK9FJ31P
You've entered the correct password!

Keep in mind that this is a very simple example and nothing like a real-world program. It just illustrates the technique of finding constant strings in a binary, which can be used in different ways.

Lastly we will briefly see how the offsets to the constant strings relate to the machine code instructions of the executable. We will do this by using the debugger LLDB to find the spot where the password of the previous example is loaded into memory.

We start by running LLDB and breaking when main() is called:

% lldb ./password
(lldb) target create "./password"
Current executable set to './password' (x86_64).
(lldb) b main
Breakpoint 1: where = password`main, address = 0x0000000100000650
(lldb) r
Process 84336 launched: './password' (x86_64)
Process 84336 stopped
* thread #1: tid = 0x212d03, 0x0000000100000650 password`main, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000650 password`main
password`main:
->  0x100000650 <+0>:  pushq  %rbp
    0x100000651 <+1>:  movq   %rsp, %rbp
    0x100000654 <+4>:  subq   $0xd0, %rsp
    0x10000065b <+11>: xorl   %esi, %esi

Since we do not know the address of the instruction that loads the string we have to step through the program until we find it. To get a better overview we will disassemble main():

(lldb) dis
password`main:
->  0x100000650 <+0>:   pushq  %rbp
    0x100000651 <+1>:   movq   %rsp, %rbp
    0x100000654 <+4>:   subq   $0xd0, %rsp
    0x10000065b <+11>:  xorl   %esi, %esi
    0x10000065d <+13>:  movl   $0x18, %eax
......
    0x1000006fb <+171>: jmp    0x1000006d7               ; <+135>
    0x100000700 <+176>: jmp    0x100000705               ; <+181>
    0x100000705 <+181>: movq   0x18fc(%rip), %rdi        ; (void *)0x00007fff78ba32f8: std::__1::cout
    0x10000070c <+188>: leaq   0x1815(%rip), %rsi        ; "Password: "
    0x100000713 <+195>: callq  0x100000820               ; std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::operator<<<std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*)
    0x100000718 <+200>: movq   %rax, -0xa8(%rbp)
    0x10000071f <+207>: jmp    0x100000724               ; <+212>
    0x100000724 <+212>: movq   0x18d5(%rip), %rdi        ; (void *)0x00007fff78ba3250: std::__1::cin
......
    0x100000737 <+231>: movq   %rax, -0xb0(%rbp)
    0x10000073e <+238>: jmp    0x100000743               ; <+243>
    0x100000743 <+243>: leaq   0x17e9(%rip), %rax        ; "AK9FJ31P"
......

The excerpt above shows that the address of the "Password: " text is loaded at 0x10000070c, that std::cin is set up to read the input at 0x100000724, and that the address of the real password "AK9FJ31P", which the input is compared with, is loaded at 0x100000743. But let’s look closer at that last address:

(lldb) dis -b -c 1 -s 0x100000743
password`main:
    0x100000743 <+243>: 48 8d 05 e9 17 00 00  leaq   0x17e9(%rip), %rax        ; "AK9FJ31P"

We know that "AK9FJ31P" resides at offset 0x1f33, but the instruction uses 0x17e9. Why? To understand this it is necessary to know that it uses RIP-relative addressing: the displacement is added to the instruction pointer instead of being an absolute address. The leaq (load effective address) instruction sits at 0x743, and if we add that to 0x17e9 we get 0x1f2c. That’s still not the correct offset! However, while an instruction executes, %rip already points at the next instruction, and the byte dump shows that this leaq is 7 bytes long. Now the equation fits: 0x743 + 7 + 0x17e9 = 0x1f33!
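
The same arithmetic can be checked in Python with the instruction bytes from the LLDB output. This is a small sketch; the helper name `rip_relative_target` is made up, and it assumes that the 32-bit displacement is stored in the last four bytes of the instruction, which holds for this leaq but not for every instruction.

```python
import struct

def rip_relative_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Compute the address referenced by a RIP-relative instruction.

    Assumes the last four bytes hold a little-endian signed displacement.
    """
    (disp,) = struct.unpack("<i", insn_bytes[-4:])
    # %rip points at the next instruction, hence the added instruction length.
    return insn_addr + len(insn_bytes) + disp

# Bytes of the leaq at 0x100000743, copied from "dis -b".
leaq = bytes.fromhex("48 8d 05 e9 17 00 00")
target = rip_relative_target(0x100000743, leaq)
print(hex(target))  # 0x100001f33
```

The result is the address otool reported for "AK9FJ31P".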

Note that we used an unstripped binary for this example, which means it still contains useful information such as the name of main(). However, a real-world binary will most likely be stripped of symbols.

Let’s remove the symbols:

% strip password

Now when we try to set a breakpoint on main() it fails because the symbol is no longer known:

% lldb ./password
(lldb) target create "./password"
Current executable set to './password' (x86_64).
(lldb) b main
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.

Instead we will run the program until it asks for the password, then interrupt it and look at a backtrace to get our bearings:

(lldb) r
Process 86636 launched: './password' (x86_64)
Password: Process 86636 stopped
* thread #1: tid = 0x225927, 0x00007fff9218df8a libsystem_kernel.dylib`__read_nocancel + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff9218df8a libsystem_kernel.dylib`__read_nocancel + 10
libsystem_kernel.dylib`__read_nocancel:
->  0x7fff9218df8a <+10>: jae    0x7fff9218df94            ; <+20>
    0x7fff9218df8c <+12>: movq   %rax, %rdi
    0x7fff9218df8f <+15>: jmp    0x7fff921887cd            ; cerror_nocancel
    0x7fff9218df94 <+20>: retq
(lldb) bt
* thread #1: tid = 0x225927, 0x00007fff9218df8a libsystem_kernel.dylib`__read_nocancel + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff9218df8a libsystem_kernel.dylib`__read_nocancel + 10
    frame #1: 0x00007fff8d103155 libsystem_c.dylib`_sread + 16
    frame #2: 0x00007fff8d102769 libsystem_c.dylib`__srefill1 + 24
    frame #3: 0x00007fff8d102884 libsystem_c.dylib`__srget + 14
    frame #4: 0x00007fff8d0fe52b libsystem_c.dylib`getc + 52
    frame #5: 0x00007fff8dd38051 libc++.1.dylib`std::__1::__stdinbuf<char>::__getchar(bool) + 119
    frame #6: 0x00007fff8dd2ee4c libc++.1.dylib`std::__1::basic_istream<char, std::__1::char_traits<char> >::sentry::sentry(std::__1::basic_istream<char, std::__1::char_traits<char> >&, bool) + 200
    frame #7: 0x000000010000089e password`___lldb_unnamed_symbol3$$password + 46
    frame #8: 0x0000000100000737 password`___lldb_unnamed_symbol1$$password + 231
    frame #9: 0x00007fff9a14a5ad libdyld.dylib`start + 1

The backtrace shows two frames in our password executable. Taking a look at frame 8 shows the same location (offset +231) that we used previously to reason about the address of the constant string being loaded:

(lldb) f 8
frame #8: 0x0000000100000737 password`___lldb_unnamed_symbol1$$password + 231
password`___lldb_unnamed_symbol1$$password:
    0x100000737 <+231>: movq   %rax, -0xb0(%rbp)
    0x10000073e <+238>: jmp    0x100000743               ; <+243>
    0x100000743 <+243>: leaq   0x17e9(%rip), %rax        ; "AK9FJ31P"
    0x10000074a <+250>: leaq   -0x88(%rbp), %rcx
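
As an aside, a different way to find such a spot without symbols is to go in the opposite direction: start from the known string address and search the machine code for an instruction that refers to it. The sketch below is not used in this article and is only a rough illustration. It looks solely for the byte pattern 48 8d 05, which is leaq with a RIP-relative operand and %rax as destination, so loads into other registers are missed and the pattern can also occur by chance inside other instructions. The function name and the `code` bytes (three nop bytes followed by the leaq from above) are made up.

```python
import struct

LEA_RAX_RIP = bytes.fromhex("48 8d 05")   # leaq disp32(%rip), %rax
LEA_LENGTH = 7                            # 3 opcode bytes + 4 displacement bytes

def find_lea_references(code: bytes, code_addr: int, target: int):
    """Return addresses of leaq instructions in code that reference target."""
    hits = []
    pos = code.find(LEA_RAX_RIP)
    while pos != -1 and pos + LEA_LENGTH <= len(code):
        (disp,) = struct.unpack_from("<i", code, pos + 3)
        # RIP-relative: address of the next instruction plus displacement.
        if code_addr + pos + LEA_LENGTH + disp == target:
            hits.append(code_addr + pos)
        pos = code.find(LEA_RAX_RIP, pos + 1)
    return hits

# Made-up code fragment that starts at 0x100000740.
code = b"\x90" * 3 + bytes.fromhex("48 8d 05 e9 17 00 00")
for address in find_lea_references(code, 0x100000740, 0x100001f33):
    print(hex(address))
```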

Continue to read the second part about “jumping the fence” by employing jumps and bypassing instructions.

Posted with : Patching, x86-64, C++, Python, LLDB, Disassembling