As the first GPU-native real-time stream & hyper-batch processing software, Plasma Engine™ leverages the thousands of cores on a GPU to give its users an enormous advantage over CPU-bound software like Apache Spark.
We’re all about speed at FASTDATA.io. After wowing customers with our benchmark in private demos, we thought it was time to publicly reveal the monumental advantages to processing data through Plasma Engine on NVIDIA GPUs.
Because we ported Apache Spark first, you can execute any Spark workload on Plasma Engine orders of magnitude faster, without any changes to your existing Apache Spark code.
How much faster is Plasma Engine compared to Apache Spark?
While every workload is different, this simple haversine benchmark compares Apache Spark and Plasma Engine running on the same system. Both tests run the same Spark SQL query.
As demonstrated below, Apache Spark processes 1.6 million rows per second, compared to Plasma Engine processing the same query at 371 million rows per second. That is 230x faster on the same system!
What’s the relevance of the haversine benchmark?
Haversine is a trigonometric function used by global telecoms, the transportation industry, and others to calculate the distance between two GPS coordinates, a common operation in many real-time big data processing use cases.
In this demo, we’re playing the role of a global telecom looking to calculate the cost of every call placed in our network. The data input will be a stream of GPS coordinates (latitude and longitude). We will then use a simple Spark SQL query to filter this stream and only output GPS coordinates that are over a certain threshold distance apart. Longer-distance calls may be billed at a higher rate.
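The filtering step described above can be sketched in plain Python (illustrative only, not Plasma Engine or Spark code; the function name, sample coordinates, and 100 km threshold here are our own):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 12742 * math.asin(math.sqrt(a))  # 12742 km ~ Earth's diameter

# Keep only calls whose endpoints are more than 100 km apart.
calls = [(37.77, -122.42, 40.71, -74.01),   # San Francisco -> New York
         (37.77, -122.42, 37.78, -122.41)]  # local call
long_distance = [c for c in calls if haversine_km(*c) > 100]
```

In the benchmark itself, the same filter is expressed declaratively as a Spark SQL query over the stream.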
We will run this benchmark on a public cloud instance with eight vCPUs, 61 GB RAM, and an NVIDIA GPU, in this case a Tesla V100 with 16 GB HBM2.
Here are the results for Apache Spark, which shows the number of rows processed per second:
Here are the results for Plasma Engine:
This is the same chart, but the flat yellow line you see at the bottom is Spark.
This chart shows Plasma Engine throughput at 6GB/s (that’s bytes, not bits):
Here are all of the charts together:
The benchmark results speak for themselves: Plasma Engine processed 371 million rows per second, while Apache Spark only processed 1.6 million rows per second.
That is more than two orders of magnitude faster on the same cloud instance.
What can this do for your business?
Processing data 230x faster not only allows your enterprise to increase its existing revenue streams and cut costs; more importantly, it creates opportunities for new lines of revenue previously deemed impossible or uneconomical.
Perhaps the best part for your business is that Plasma Engine’s high efficiency will reduce your total data processing costs by more than 50%, while cutting your power and space requirements in the data center by over 90%.
Finally, a 90% reduction in power usage means a 90% reduction of carbon output into our atmosphere.

No matter the industry -- Telecom, Retail, IoT, eCommerce, Finance, AdTech, Security, Energy & Utilities -- the ability to process data two orders of magnitude faster while cutting infrastructure costs has significant implications for every business.
Ready to go fast with Plasma Engine on your on-premise system or favorite cloud instance? It’s easy! Sign up today for a Test Flight POC to try our point-and-click demo or test your own workload.
Technical Benchmark Parameters
The benchmark runs Plasma Engine v1.1.1 against a call data record (CDR) CSV dataset with the following schema:
- lat1 & lon1 - location of the caller
- lat2 & lon2 - location of the call recipient
import org.apache.spark.sql.types.{FloatType, StructType}

val haversineSchema = new StructType()
  .add("lat1", FloatType)
  .add("lon1", FloatType)
  .add("lat2", FloatType)
  .add("lon2", FloatType)
The dataset is a single CSV file generated for a predefined number of calls; 1,000 symlinks to this file emulate a large streaming CSV dataset. Plasma Engine uses the same data in the Apache Arrow format. CDR example:
The Spark SQL query filters calls where the distance between the two coordinates is greater than 20,000:
select lat1 from stream
where (asin(pow((sin((lat2 - lat1) / cast(2 as float)) * sin((lat2 - lat1) / cast(2 as float))
                 + cos((lat1)) * cos((lat2))
                 * sin((lon2 - lon1) / cast(2 as float)) * sin((lon2 - lon1) / cast(2 as float))),
                cast(0.5 as float)))
       * cast(12742 as float)) > cast(20000 as float)
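As a sanity check, the query’s arithmetic can be reproduced outside Spark. This hypothetical helper (our own, not part of the benchmark) mirrors the SQL expression term for term; note that, like the query, it performs no degree-to-radian conversion, so it assumes radian inputs:

```python
import math

def query_distance(lat1, lon1, lat2, lon2):
    """Mirror of the SQL expression: asin(pow(a, 0.5)) * 12742, inputs in radians."""
    a = (math.sin((lat2 - lat1) / 2.0) * math.sin((lat2 - lat1) / 2.0)
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2.0) * math.sin((lon2 - lon1) / 2.0))
    return math.asin(math.pow(a, 0.5)) * 12742.0

# Antipodal points, the farthest apart possible, come out at ~20,015 km,
# so only near-antipodal coordinate pairs clear the 20,000 threshold.
```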
- AWS p3.2xlarge: an NVIDIA Tesla V100 GPU, 8 vCPUs (Broadwell Xeon), 61 GB of RAM, 10 Gbps NIC
- 16 GB driver memory
- 8 executors