ARM yourself with KumoMTA

  • September 19, 2023

The title of this piece was the topic of much debate.  There were so many gems to choose from.

"Run KumoMTA without spending an ARM and a leg"
"Use KumoMTA and do no h.ARM"
"ARM wrestling KumoMTA"
"Get a leg up with running on ARM"

If you have a better suggestion, let us know.  Maybe we will award the best suggestion with a free copy of KumoMTA. Nudge nudge wink wink. Just joking, KumoMTA is always free to download and use.

In any case, let's talk about ARM.

One of the things that KumoMTA is very good at is running in current OS environments. While some of our competitors struggle to keep up with new releases for recent OS updates, KumoMTA will run happily on almost any Linux distro. Rocky 9 is no problem, nor is Ubuntu 22 or RHEL 9, just to name a few. Aside from the Linux distro, the other factor to consider is the underlying system architecture. While the vast majority of servers are Intel X86 or AMD64 based, there is a growing number of requests to support the ARM-based servers that are often much cheaper to operate. And yes, of course, KumoMTA runs on those too.

Why is this important?

An important factor in selecting an operating system is understanding the hardware it will run on, and until recently the overwhelming favourite for most server applications was the Intel X86 or AMD64 architecture (sometimes referred to as x86/64). If you play around with embedded controllers or custom hardware, then you may be more familiar with ARM processors, but for the most part, popular commercial servers have used the Intel X86 architecture. Right now, a groundswell of change is happening: more ARM-based servers are being made available at very competitive rates, compelling system operators and network managers to reconsider which architecture they deploy on. Moving from x86/64 to ARM could save a significant amount of operating budget.

Just looking at the EC2 instance pricing on AWS will give you a rough ballpark of savings. In the images below, the t4g.2xlarge is the ARM instance at $0.2688/hour Linux on-demand pricing. The same size instance (8 vCPU, 32 GB RAM) in the x86/64 format, labelled t2.2xlarge, is $0.3712/hour. Over 24 hours the ARM instance saves $2.46. That is $897 per year, per instance, on EC2 costs alone, a saving of just under 28%. Apply that percentage to what you are paying now for public cloud and you will quickly realize the impact.
[Image: t4g-2xlarge]

[Image: t2-2xlarge]
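For anyone who wants to redo the sums with their own instance prices, here is a minimal sketch of the arithmetic above. The two hourly rates are the ones quoted in this post; the `savings` function name is made up for illustration, and on-demand prices change over time, so treat the output as a ballpark.

```shell
#!/bin/sh
# Hourly Linux on-demand prices quoted in this post (USD).
ARM_PRICE=0.2688   # t4g.2xlarge
X86_PRICE=0.3712   # t2.2xlarge

# savings ARM_PRICE X86_PRICE
# Prints the daily, yearly and percentage saving of the first price
# relative to the second. (Hypothetical helper, not part of KumoMTA.)
savings() {
    awk -v arm="$1" -v x86="$2" 'BEGIN {
        diff = x86 - arm
        printf "per day:  $%.2f\n", diff * 24
        printf "per year: $%.0f\n", diff * 24 * 365
        printf "percent:  %.1f%%\n", diff / x86 * 100
    }'
}

savings "$ARM_PRICE" "$X86_PRICE"
```

Swap in the prices for the instance types you actually run to see what the same comparison looks like for your own fleet.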

But wait, there's more!

We ran some performance tests on the two instances above.  These are effectively the same "size" but with different processors so any performance difference should be due entirely to the processor selection. 

Using 100,000 messages of 100 KB each as a test sample, the x86/64 instance sent at a rate of 3.663 million msgs/hour. The same test on the ARM instance ran at 3.804 million msgs/hour.

This is a 4% improvement in performance, which means cheaper, faster MTAs.  
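Putting the price and the throughput together gives a rough cost per million messages. The sketch below uses only the figures from this post; the function names are invented for illustration, and the result is simple division rather than a benchmark of its own.

```shell
#!/bin/sh
# Throughput measured in the test above, in million msgs/hour.
X86_MSGS=3.663
ARM_MSGS=3.804

# pct_gain OLD CUR -> percentage change from OLD to CUR, one decimal place
pct_gain() {
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - old) / old * 100 }'
}

# cost_per_million PRICE_PER_HOUR MILLION_MSGS_PER_HOUR -> USD per million messages
cost_per_million() {
    awk -v price="$1" -v rate="$2" 'BEGIN { printf "%.4f\n", price / rate }'
}

echo "ARM throughput gain: $(pct_gain "$X86_MSGS" "$ARM_MSGS")%"
echo "x86/64 cost per million msgs: \$$(cost_per_million 0.3712 "$X86_MSGS")"
echo "ARM cost per million msgs:    \$$(cost_per_million 0.2688 "$ARM_MSGS")"
```

With these inputs the gain works out to 3.8%, and the hourly instance cost per million messages is about $0.1013 on x86/64 against about $0.0707 on ARM.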

Have we got your attention yet?

Okay, so what is ARM again?

The Intel X86 (and AMD64) processor is so ubiquitous that it is hardly even noticed by many people. Still, if you look closely at a software release, it will always include a reference to the architecture it runs on. For instance, the Ubuntu 22 OS you might be running is likely named something like "Ubuntu-22.04-x86/64", indicating that it was built to run on an Intel or AMD 64-bit processor. By comparison, the same operating system has an ARM 64 version called something like "Ubuntu-22.04-ARM64", which is compiled to run on an ARM processor.
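If you are not sure which architecture a given server is, the system will tell you. On Linux, `uname -m` reports `x86_64` for Intel/AMD 64-bit and `aarch64` for 64-bit ARM, while Debian and Ubuntu packages use the labels `amd64` and `arm64`. The small sketch below maps one to the other; `arch_label` is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# arch_label MACHINE -> the label Debian/Ubuntu packages use for that machine
# type. (Hypothetical helper for illustration.)
arch_label() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *)             echo "unknown" ;;
    esac
}

# Report the architecture of the machine this script runs on.
echo "kernel reports: $(uname -m)"
echo "package label:  $(arch_label "$(uname -m)")"
```

On Debian and Ubuntu, `dpkg --print-architecture` prints the package label directly.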

The big difference is in the instruction set the processor uses. The Intel/AMD64 X86/64 processor is a CISC (Complex Instruction Set Computing) architecture. It has a large set of instructions, some of which carry out several low-level operations in a single instruction. ARM processors use a RISC (Reduced Instruction Set Computing) architecture, which relies on a smaller set of simpler instructions. RISC processors have been around for a long time but are typically used in smaller systems like your smartphone. Recently, they have become more popular, particularly in public cloud computing environments.

So what does this all have to do with KumoMTA?

KumoMTA was built to run efficiently in bare-metal and cloud environments. If you are managing a cluster of servers in a public cloud, you might be able to realize significant savings by moving from x86/64 to ARM-based servers. So of course, we built and tested KumoMTA on ARM to make sure it works.

One of the goals of the Kumo Corp team is to use advanced technologies to help customers be more efficient. Reducing server footprint is a key factor in selecting KumoMTA over other, less efficient products, but doing that with ARM processors can also reduce power demand, which helps everyone.

Building KumoMTA on Ubuntu 22 ARM64

OK, so this is probably why you clicked on this blog post in the first place. Deploying KumoMTA onto an ARM-based server is super easy and works exactly the same way as building from source on any other Linux distro.

First, set up your physical or cloud server with an ARM processor and ARM-compiled Operating System. If you don’t have one and want to try it, Azure has an Ubuntu 22-ARM64 image available to use.

Following the instructions in the documentation, clone the repo and get any required dependencies:

git clone https://github.com/KumoCorp/kumomta.git
cd kumomta
./get-deps.sh

And install Rust:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.profile
source ~/.cargo/env
rustc -V

Then build KumoMTA and install it:

cargo build --release
assets/build-deb.sh
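As an optional sanity check, you can confirm that what you just built really is an ARM binary. The sketch below reads the machine field straight out of the ELF header using `od`, so it needs no extra tools. The default path `target/release/kumod` is an assumption about where cargo places the KumoMTA daemon; adjust `KUMO_BIN` if your build output lives elsewhere. `elf_machine` is a made-up helper name.

```shell
#!/bin/sh
# elf_machine FILE -> architecture recorded in the ELF header of FILE.
# Bytes 18-19 of an ELF header hold e_machine, stored little-endian on
# both x86-64 and AArch64 Linux. (Hypothetical helper for illustration.)
elf_machine() {
    # od prints the two bytes as hex, e.g. " b7 00"; split them into $1 and $2.
    set -- $(od -An -tx1 -j18 -N2 "$1")
    case "$2$1" in
        003e) echo "x86_64" ;;
        00b7) echo "aarch64" ;;
        *)    echo "unknown" ;;
    esac
}

# Assumed location of the freshly built daemon; override via the environment.
KUMO_BIN="${KUMO_BIN:-target/release/kumod}"

if [ -f "$KUMO_BIN" ]; then
    echo "$KUMO_BIN is built for: $(elf_machine "$KUMO_BIN")"
else
    echo "no binary found at $KUMO_BIN"
fi
```

On an ARM64 server the check should report `aarch64`. If the `file` utility is installed, running it against the same binary gives an equivalent answer.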

After that, it is really about tuning your configuration and testing, but make sure you follow the instructions so you don't miss any steps.

If you are open to sharing your experience installing KumoMTA on ARM-based architecture, we would love to hear your story.