This post is about an interesting race condition bug I ran into when working on a small feature improvement for poet a while ago that I thought was worth writing a blog post about.
In particular, I was improving the download-and-execute capability of poet
which, if you couldn’t tell, downloads a file from the internet and executes
it on the target. At the original time of writing, I didn’t know about
tempfile module and since I recently learned about it, I wanted
to integrate it into poet as it would be a significant improvement to
the original implementation. The initial patch looked like this.
1 2 3 4 5 6
This code downloads a file from the internet, writes it to a tempfile on
disk, sets the permissions to executable, executes it in a subprocess.
In testing this code, I observed some puzzling behavior: the file was
never actually getting executed because it was suddenly ceasing to
exist! I noticed though that when I used
subprocess.call() or used
.wait() on the
Popen(), it would work
fine, however I intentionally didn’t want the client to block while the
file executed its arbitrary payload, so I couldn’t use those functions.
The fact that the execution would work when the
Popen call waited for
the process and didn’t work otherwise suggests that there was something
going on between the time it took to execute the child and the time it took
with block to end and delete the file, which is
default behavior. More specifically, the
file must have been deleted at some point before the
exec syscall loaded
the file from disk into memory. Let’s take a look at the implementation of
subprocess.Popen() to see if we can gain some more insight:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
_execute_child() function is called by the
constructor and implements child process execution. There’s a lot of
code here, but key parts to notice here are the
os.fork() call which
creates the child process, and the relative lengths of the following
if blocks. The check
if self.pid == 0 contains the code for executing
the child process and is significantly more involved than the code
for handling the parent process.
From this, we can deduce that when the
subprocess.Popen() call executes
in my code, after forking, while the child is preparing to call
os.execve, the parent simply returns, and immediately exits the
block. This automatically invokes the
f.close() function which deletes
the temp file. By the time the child calls
os.execve, the file has been
deleted on disk. Oops.
I fixed this by adding the
delete=False argument to the
NamedTemporaryFile constructor to suppress the auto-delete functionality.
Of course this means that the downloaded files will have to be cleaned up
manually, but this allows the client to not block when executing the file
and have the code still be pretty clean.
Main takeaway here: don’t try to
NamedTemporaryFile as the
last statement in the tempfile’s