MoonGen is a high-speed scriptable packet generator. The whole load generator is controlled by a Lua script: all packets that are sent are crafted by a user-provided script. Thanks to the incredibly fast LuaJIT VM and the packet processing library DPDK, it can saturate a 10 GBit Ethernet link with 64 Byte packets while using only a single CPU core. MoonGen can achieve this rate even if each packet is modified by a Lua script. It does not rely on tricks like replaying the same buffer.
MoonGen can also receive packets, e.g. to check which packets are dropped by a system under test. As the reception is also fully under control of the user’s Lua script, it can be used to implement advanced test scripts. For example, two instances of MoonGen can establish a connection with each other. This setup can be used to benchmark middle-boxes like firewalls.
MoonGen focuses on four main points:
You can have a look at the slides from a recent talk or read a draft paper for a more detailed discussion of MoonGen’s internals.
MoonGen is basically a Lua wrapper around DPDK with utility functions for packet generation. Users write custom scripts for their experiments. It is recommended to make use of hard-coded setup-specific constants in your scripts. The script is the configuration; writing a complicated configuration interface for a script would be beside the point.
The following diagram shows the architecture and how multi-core support is handled.
Execution begins in the _master task_ that must be defined in the user’s script. This task configures queues and filters on the used NICs and then starts one or more _slave tasks_.
Note that Lua does not have any native support for multithreading. MoonGen therefore starts a new and completely independent LuaJIT VM for each thread. The new VMs receive serialized arguments: the function to execute and arguments like the queue to send packets from. Threads only share state through the underlying library.
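The idea of handing a task to an independent VM as serialized data can be sketched in a few lines. The Python snippet below is only an analogy under stated assumptions: it is not MoonGen's actual mechanism, and the names `load_task`, `serialize_call` and `run_in_fresh_vm` are hypothetical. It shows that only a function name and plain argument values cross the boundary, so no live objects are shared between the two sides.

```python
import json

def load_task(queue_id, packet_size):
    """Stand-in for a slave task; a real task would send packets."""
    return f"sending {packet_size}-byte packets on queue {queue_id}"

# The receiving side looks functions up by name, because a function
# object itself cannot be shared between independent interpreters.
TASKS = {"load_task": load_task}

def serialize_call(func_name, *args):
    """Master side: turn the call into a plain string."""
    return json.dumps({"func": func_name, "args": list(args)})

def run_in_fresh_vm(message):
    """Slave side: rebuild the call from the string and execute it."""
    call = json.loads(message)
    return TASKS[call["func"]](*call["args"])

message = serialize_call("load_task", 0, 64)
print(run_in_fresh_vm(message))
```

Anything that cannot be expressed as plain data (open handles, closures over local state) would have to be recreated inside the new VM, which is one reason shared state lives in the underlying library instead.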
The example script hello-world.lua shows how this threading model can be used to implement a typical load generation task. It implements a QoS test by sending two different types of packets and measures their throughput and latency. It does so by starting two packet generation tasks: one for the background traffic and one for the prioritized traffic. A third task is used to categorize and count the incoming packets.
Intel commodity NICs like the 82599, X540, and 82580 support timestamping in hardware for both transmitted and received packets. The NICs implement this to support the IEEE 1588 Precision Time Protocol (PTP), but this feature can be used to timestamp almost arbitrary UDP packets. The NICs achieve sub-microsecond precision and accuracy.
Precise control of inter-packet gaps is an important feature for reproducible tests. Bad rate control, e.g. generation of undesired micro-bursts, can affect the behavior of a device under test. However, software packet generators are usually bad at controlling the inter-packet gaps.
The following diagram illustrates how a typical software packet generator tries to control the packet rate.
It simply tries to wait for a specified time after sending a packet. Network APIs often abstract NICs in a way that indicates that the API pushes a packet to the NIC, so this technique might seem reasonable. However, NICs do not work that way. Sending a packet to the API merely places it in a queue in the main memory. It is now up to the NIC (which may or may not be notified by the API about the new packet immediately) to fetch and transmit the packet asynchronously at a convenient time.
This means that trying to push packets to a NIC is futile. This is especially important at rates above 1 GBit/s where nanosecond-level precision is required (length of a minimum-sized packet at 10 GBit/s: 67.2 nanoseconds). Sending a single packet requires at least two round trips across the PCIe bus: one to notify the NIC about the updated queue, one for the NIC to fetch the packet. Each PCIe operation introduces latencies and jitter in the nanosecond range.
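The 67.2 ns figure follows from the on-wire size of a minimum-sized Ethernet frame: 64 bytes of frame plus 20 bytes of per-packet overhead (7 bytes preamble, 1 byte start-of-frame delimiter, 12 bytes inter-frame gap). A short Python sketch of this arithmetic; the function and constant names are made up for illustration and are not part of MoonGen:

```python
# Per-packet overhead on the wire that is not part of the frame itself:
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 7 + 1 + 12

def wire_time_ns(frame_bytes, link_gbit_per_s):
    """Time a frame of frame_bytes (including CRC) occupies the link, in ns."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    # 1 Gbit/s corresponds to exactly 1 bit per nanosecond.
    return bits_on_wire / link_gbit_per_s

def max_packet_rate_mpps(frame_bytes, link_gbit_per_s):
    """Maximum packet rate at line rate, in million packets per second."""
    return 1e3 / wire_time_ns(frame_bytes, link_gbit_per_s)

print(wire_time_ns(64, 10))          # wire time of a 64-byte frame at 10 Gbit/s
print(max_packet_rate_mpps(64, 10))  # corresponding line-rate packet rate
```

At 10 GBit/s this gives roughly 14.88 million packets per second, so a timing error of only a few nanoseconds per packet is already a noticeable fraction of the packet spacing.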
Another problem with this approach is that the queues, and therefore batch processing, cannot be used. However, batch processing is an important technique to achieve line rate at high packet rates.
MoonGen therefore implements two ways to avoid this problem.
Intel 10 GbE NICs (82599 and X540) support rate control in hardware. This can be used to generate CBR or bursty traffic with precise inter-departure times.
The hardware supports only CBR traffic. Other traffic patterns, especially a Poisson distribution, are desirable.
The problem that software rate control faces is that it needs to generate an ’empty space’ on the wire. We circumvent this problem by sending bad packets in the space between packets instead of trying to send nothing. The following diagram illustrates this concept.
A bad packet is a packet that is not accepted by the DuT (device under test) and is filtered in hardware before it reaches the software. These packets are shaded in the figure above. MoonGen currently uses packets with an invalid CRC and, if necessary, an invalid length. All common NICs drop such packets immediately in hardware, as further processing of a corrupted packet is pointless. This does not affect the running software.
If the DuT’s NIC does not do this or if a hardware device is to be tested, then a switch can be used to remove these packets from the stream to generate ‘real’ space on the wire. The effects of the switch on the packet spacing need to be analyzed carefully, e.g. with MoonGen’s inter-arrival.lua example script.
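Filling gaps with bad packets comes down to a size calculation: a desired gap between two real packets corresponds to a number of byte times on the wire, and that many bytes must be occupied by filler. The following Python sketch draws exponentially distributed inter-departure times (a Poisson process) and converts each resulting gap to a byte count. It is an illustration under simplifying assumptions, not MoonGen's implementation: all names are hypothetical, and the fact that a single filler frame has a minimum on-wire size is ignored.

```python
import random

WIRE_OVERHEAD_BYTES = 20  # preamble + start-of-frame delimiter + inter-frame gap

def gap_to_filler_bytes(gap_ns, link_gbit_per_s):
    """Bytes of wire time that must be filled to create a gap of gap_ns."""
    # link_gbit_per_s bits pass per nanosecond; 8 bits per byte.
    return round(gap_ns * link_gbit_per_s / 8)

def poisson_gaps_ns(rate_mpps, packet_bytes, link_gbit_per_s, count, rng):
    """Idle gaps between packets for a Poisson process with mean rate rate_mpps."""
    mean_interval_ns = 1e3 / rate_mpps
    packet_time_ns = (packet_bytes + WIRE_OVERHEAD_BYTES) * 8 / link_gbit_per_s
    gaps = []
    for _ in range(count):
        # Exponentially distributed time from packet start to packet start.
        interval_ns = rng.expovariate(1.0 / mean_interval_ns)
        # The packet itself occupies part of the interval; the rest is the gap.
        # Clamping at zero slightly distorts the distribution at high load.
        gaps.append(max(0.0, interval_ns - packet_time_ns))
    return gaps

rng = random.Random(42)  # fixed seed so that runs are repeatable
for gap in poisson_gaps_ns(1.0, 64, 10, 5, rng):
    print(round(gap, 1), "ns ->", gap_to_filler_bytes(gap, 10), "bytes of filler")
```

A real generator would additionally have to split or merge filler so that every bad packet stays within the frame sizes the NIC can transmit, which limits how short a gap can be represented.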
MoonGen uses LuaDoc. However, the MoonGen build system does not yet auto-publish the generated documentation.
Note: You can also use the script bind-interfaces.sh to bind all currently unused NICs (no routing table entry in the system) to DPDK/MoonGen. build.sh calls this script automatically. Use deps/dpdk/tools/dpdk_nic_bind.py to unbind NICs from the DPDK driver.
MoonGen comes with examples in the examples folder which can be used as a basis for custom scripts.
./build/MoonGen ./examples/hello-world.lua 0 0
The two command line arguments are the transmission and reception ports. MoonGen prints all available ports on startup, so adjust this if necessary.
Basic functionality is available on all NICs supported by DPDK. Hardware timestamping is currently supported and tested on Intel 82599, X540 and 82580 chips. Hardware rate control is supported and tested on Intel 82599 and X540 chips.
SnabbSwitch is a framework for packet-processing in Lua. There are a few important differences:
The MoonGen team chose DPDK as the back end for the following reasons:
Note that this might change. Using DPDK also comes with disadvantages like its bloated build system and configuration.