STAC Asia - Capturing OPRA with ease
FMADIO had the pleasure of presenting at the STAC Asia event recently, talking about the capture and processing of the most challenging market data feed, OPRA, which carries >170 billion messages per day.
Capturing and processing any traffic relies on having the right architecture in place. We refer to this as “Capture, Process, Integrate”, and it starts with the hardware that copies data from the wire and aggregates it into the FMADIO 10G, 40G or 100G capture device. The FMADIO appliance must capture and store the traffic in real time, whilst also accurately timestamping each packet.
The traffic can then be filtered and extracted from the data store based on what processing is required and forwarded to those analytics processes, either locally on the FMADIO 10G, 40G or 100G capture appliance, or on a remote system.
This is all pretty standard for any packet analysis use case. What makes the OPRA use case challenging is the volume of data and the fact that multiple analysis processes often need to run simultaneously to meet the needs of different groups.
A description of OPRA, and some recent and projected volume figures, can be seen below. The rate of growth in the data volumes on OPRA adds to the challenge of building a system that will not only keep up with today’s volumes, but will scale to meet the volumes expected for years to come.
As an example of the way that OPRA data might be captured and processed, the diagram below shows the Capture, Process, Integrate flow for long-term storage of OPRA feeds. In this example, the traffic is captured to the FMADIO 10G, 40G or 100G capture appliance, then split into individual feeds using a BPF filter based on multicast address and port number. The traffic for each feed is then split into 1-minute PCAPs, which are named for easy indexing, compressed using one of a number of supported algorithms, and sent to a long-term data store over NFS.
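The split-and-name steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the appliance's implementation: the feed name, the multicast address `239.1.1.1`, port `45001` and the file naming pattern are all hypothetical. The filter string uses standard pcap-filter (BPF) syntax.

```python
from datetime import datetime, timezone

# Hypothetical feed table: name -> (multicast group, UDP port).
# These values are placeholders, not real OPRA addresses.
FEEDS = {"opra_01": ("239.1.1.1", 45001)}


def feed_bpf(group: str, port: int) -> str:
    """Build a BPF expression that matches one multicast feed."""
    return f"udp and dst host {group} and dst port {port}"


def minute_pcap_name(feed: str, ts_ns: int, ext: str = "pcap") -> str:
    """Name the 1-minute file that a packet with this nanosecond
    epoch timestamp belongs to (naming pattern is an assumption)."""
    minute_start = (ts_ns // 1_000_000_000) // 60 * 60  # floor to the minute
    stamp = datetime.fromtimestamp(minute_start, tz=timezone.utc)
    return f"{feed}_{stamp.strftime('%Y%m%d_%H%M')}.{ext}"


if __name__ == "__main__":
    for name, (group, port) in FEEDS.items():
        print(name, "->", feed_bpf(group, port))
    print(minute_pcap_name("opra_01", 1_700_000_000_000_000_000))
```

Putting the feed name and the UTC minute in the file name means the long-term store can be indexed by a simple directory listing, with no separate catalogue needed.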
The following graphs show the importance of multi-threading this process. In the graph below, we can see the performance of the process using different numbers of CPU cores to process different numbers of multicast groups. The numbers of CPUs are represented by different colours, and the numbers of multicast groups are shown as buckets along the X axis. On the Y axis, 100% represents real-time processing; anything above that shows that the process is falling behind real time.
The results are pretty clear: the processing time scales fairly linearly with both the number of available CPUs and the number of multicast groups being processed.
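To make the Y-axis measure concrete, here is a minimal sketch. It assumes the percentage is simply processing time divided by the duration of the captured traffic, which is our reading of the graph rather than a confirmed definition. The function names are hypothetical.

```python
def percent_of_real_time(processing_seconds: float, captured_seconds: float) -> float:
    """Processing time as a percentage of the capture duration.

    100.0 means the job takes exactly as long as the traffic lasted.
    (Assumed definition; the source only states what 100% represents.)
    """
    if captured_seconds <= 0:
        raise ValueError("captured_seconds must be positive")
    return 100.0 * processing_seconds / captured_seconds


def keeps_up(processing_seconds: float, captured_seconds: float) -> bool:
    """True when processing is at or better than real time."""
    return percent_of_real_time(processing_seconds, captured_seconds) <= 100.0


if __name__ == "__main__":
    # One minute of traffic processed in 45 s versus 90 s.
    print(percent_of_real_time(45, 60), keeps_up(45, 60))
    print(percent_of_real_time(90, 60), keeps_up(90, 60))
```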
The next graph shows the same data, but without the 1 CPU data, giving a more granular view of the results for higher CPU counts. In this case we can see that the performance still increases linearly (or near enough) with the number of CPUs, but less so with the number of multicast groups. We can also see that only with 48 CPUs, handling 2 groups per CPU, is there enough throughput to handle all 96 groups faster than real time.
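The "2 groups per CPU" figure is just 96 groups spread evenly over 48 workers. A minimal sketch of one way to do that spread is below, assuming a simple round-robin assignment. The source does not say how groups are actually scheduled, and the group labels are placeholders.

```python
def assign_groups(groups: list, n_workers: int) -> list:
    """Round-robin the multicast groups across n_workers buckets."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    buckets = [[] for _ in range(n_workers)]
    for i, group in enumerate(groups):
        buckets[i % n_workers].append(group)
    return buckets


if __name__ == "__main__":
    groups = [f"group_{n:02d}" for n in range(96)]  # placeholder labels
    for workers in (12, 24, 48):
        per_worker = {len(b) for b in assign_groups(groups, workers)}
        print(workers, "workers ->", per_worker, "groups each")
```

Each worker can then run the filter, split and compress steps for its own groups independently, which is what lets the job scale with core count.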
Next we can see what difference is made to the performance when we add compression into the workflow. The figures show performance for 96 multicast groups at different CPU counts, comparing uncompressed output against the zstd, gzip and lz4 algorithms. They show the relative CPU efficiency of lz4, though it would achieve a lower compression ratio than the other options.
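The speed-versus-ratio trade-off can be explored with a small harness like the one below. Note the substitution: zstd and lz4 are not in the Python standard library, so this sketch uses `zlib` (the DEFLATE algorithm that gzip uses) at a fast and a thorough level as a stand-in for "cheaper CPU, lower ratio" against "more CPU, higher ratio". The sample payload is synthetic, so the numbers it prints say nothing about real OPRA traffic.

```python
import time
import zlib


def measure(data: bytes, level: int) -> tuple:
    """Return (compression ratio, seconds taken) for one zlib level."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    if zlib.decompress(packed) != data:  # sanity check: lossless round trip
        raise RuntimeError("round trip failed")
    return len(data) / len(packed), elapsed


def sample_payload(records: int = 20000) -> bytes:
    """Synthetic, repetitive test data (not real market data)."""
    return b"".join(b"msg %06d price=101.25 size=10;" % i for i in range(records))


if __name__ == "__main__":
    data = sample_payload()
    for level in (1, 9):  # 1 = fastest, 9 = best compression
        ratio, seconds = measure(data, level)
        print(f"level {level}: ratio {ratio:.2f}, {seconds * 1000:.1f} ms")
```

The same harness shape works for zstd or lz4 by swapping in the compress and decompress calls from the corresponding third-party packages.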
And finally, just to prove that these tests were carried out on a single system, and that the multi-threading works well, here is an htop screenshot from the test appliance.
The testing shown above was carried out on a single FMADIO 100G 1U Gen2 appliance, which has 48 cores / 96 threads, and is the only single system that we know of that can capture and process the entire OPRA feed in this way. Having said that, we are close to releasing the FMADIO 200G Packet Capture Appliance, which has 200Gbps sustained capture, plus even bigger CPUs and more RAM for even more processing. So as data volumes on OPRA grow, FMADIO’s customers can be confident that they will be able to keep up.
If you’d like to know more about our products, please have a look at our Products page: https://www.fmad.io/products, and if you would like to discuss this or any other use case, please hit the chat window below or email support@fmad.io.