Exploring Ecosystem Partners for VMware Cloud on AWS: KernelCare


VMware Cloud on AWS helps enterprises deploy hybrid cloud service stacks and data centers. Its familiar vSphere operating environment makes it easy to migrate mission-critical services and applications. Building them on Linux yields versatility and a high ROI, but security and availability are harder to achieve. That’s because the Linux kernel needs constant patching, and that means downtime and waiting for reboot cycles. Live patching is a way of updating Linux kernels without stopping and restarting them. This article explains what live patching is, how KernelCare does it, and how the technology fits the operating model of VMware Cloud on AWS.

Enterprises seeking lower costs and elastic deployments turn to VMware’s hybrid cloud solution, VMware Cloud on AWS. VMware makes it easy to develop, scale, and move entire applications to and from the cloud, and takes on much of the DevOps heavy lifting needed to keep your service stack running smoothly.

But VMware’s responsibilities end where the ESXi VMs running your applications begin. With Linux VMs, uptime is degraded by the reboots needed to install kernel updates and patches. Clustering and high-availability configurations can make VM restarts invisible to service users. But, increasingly, enterprises looking to shave costs question the CapEx spent on redundant fail-over resources, and the OpEx of administering complex system topologies.

This is where KernelCare’s live patching technology comes into play.

Live Patching

Live patching is a way of updating a Linux kernel without stopping the environment in which it’s running, be it bare-metal or virtualized. MIT students developed the technique in 2008, going on to found Ksplice Inc., later bought by Oracle. Live patching emerged from the need to close the vulnerability gap, the time from when a kernel patch is available to when it’s installed.

To see why the gap exists, let’s look at some statistics.

  • In 2008, the Linux kernel was around 10 million lines of code. Today, it’s well over 24 million.
  • Between January 2016 and December 2018, researchers found 847 vulnerabilities in the Linux kernel.

Changes to kernel code mean frequent releases and updates. Every update results in downtime, because systems must be rebooted to load the new code into memory. The standard way to update a Linux kernel is with package management tools, such as yum or apt-get, followed by a restart of the OS. KernelCare patches running kernel code at the binary level, in memory, without a reboot. Users logged into the system see nothing; for them, the kernel never stops. System processes wait milliseconds while the kernel module suspends and restarts processor threads.
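To make the reboot dependency concrete, here is a minimal sketch that compares the running kernel with the newest kernel image on disk. It assumes kernel images are stored as /boot/vmlinuz-<version>, which is common but not universal, and needs_reboot is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch only: after a package-manager kernel update, the new kernel sits on
# disk while the old one keeps running until the next reboot.

# needs_reboot (hypothetical helper): succeeds when the two versions differ.
needs_reboot() {
    [ "$1" != "$2" ]
}

running="$(uname -r)"

# Assumption: images are named /boot/vmlinuz-<version>. Rescue images or
# unusual layouts can skew this simple version sort.
newest="$({ ls /boot/vmlinuz-* 2>/dev/null || true; } \
    | sed 's|.*/vmlinuz-||' | sort -V | tail -n 1)"

# Fall back to the running version if no images were found (e.g. containers).
newest="${newest:-$running}"

if needs_reboot "$running" "$newest"; then
    echo "running $running, newest on disk $newest: reboot needed to load it"
else
    echo "running kernel $running is the newest on disk"
fi
```

With the traditional workflow, the first message is what you see between installing a kernel package and the next maintenance window.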

How it works

KernelCare has three components:

  • Patch server: Stores patches for each kernel version and architecture type. It can be the central KernelCare server, or an on-premises one.
  • Agent: A small program that runs in the background, checking patch servers for patches at configurable intervals.
  • Kernel module: Performs the patching when instructed by the agent, handling the logic and mechanics of pausing and restarting a kernel’s processes.

Together, these orchestrate the patching process in four key steps, with the kernel paused for only milliseconds:

  1. Poll: The agent polls the patch server. When a patch is available, it is downloaded and passed to the kernel module.
  2. Freeze: The module takes over, suspending processing (except its own) to make an inventory of active processor threads.
  3. Load: The new binary code is loaded into privileged kernel space and affected entries in the thread inventory are remapped.
  4. Resume: The kernel resumes and processing continues.
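The idea behind the Load and Resume steps can be illustrated with a toy analogy in shell. This is not how KernelCare is implemented; it only shows the general principle of redirecting callers to a replacement function while the caller keeps running. All names here (handle_v1, handle_v2, active_impl, live_patch) are hypothetical.

```shell
#!/bin/sh
# Toy analogy only: shell functions stand in for kernel functions.

handle_v1() { echo "handled by v1 (vulnerable)"; }
handle_v2() { echo "handled by v2 (patched)"; }

# Callers never name an implementation directly; they go through this pointer.
active_impl=handle_v1

handle_request() { "$active_impl"; }

# live_patch: swap the implementation behind the pointer.
live_patch() {
    # Refuse to remap to something that does not exist.
    command -v "$1" >/dev/null 2>&1 || return 1
    # Freeze would happen here in a real system; the shell is single-threaded.
    active_impl="$1"    # Load + remap
    # Resume: callers continue and now reach the new code.
}

handle_request          # prints "handled by v1 (vulnerable)"
live_patch handle_v2
handle_request          # prints "handled by v2 (patched)"
```

The script itself is never restarted between the two calls, which is the property live patching provides for a running kernel.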

A team of kernel programmers responds to announcements of kernel vulnerabilities, and writes, tests and releases patches for them in the shortest time possible. In many cases this is much quicker than the main enterprise Linux vendors. This is because KernelCare’s sole focus is kernel security, not other kernel functionality or kernel ABIs (Application Binary Interfaces).

KernelCare patches are cumulative binary packages, custom-built for each supported kernel version, and GPG-signed for security. Patch updates are fully auditable, and can be selectively pre-tested, approved, or abandoned and rolled back.

The KernelCare Difference

There are two ways to build live kernel patches: incremental and monolithic.

Incremental patches build on one another, each altering the one before. Patch stacking lets vendors release small, targeted fixes quickly. But, over time, stacked patches can make a system unstable so that a full kernel upgrade becomes necessary. You must keep track of the patches you’ve installed, and in what order. If you don’t, you risk your system’s behavior changing in unpredictable ways.

At KernelCare, we build patches using a monolithic approach. Monolithic patches are stand-alone units that don’t depend on previous ones. Each new release replaces rather than adds to the existing patch. It means a more stable platform with longer server uptimes, often in the thousands-of-days range.

Installing KernelCare

Full documentation is here.

  1. Go to kernelcare.com/trial to get a free 30-day key.
  2. In a terminal, enter this command to download the installer script and run it as root:
    wget -qq -O - https://kernelcare.com/installer | sudo bash
  3. Register, using the key from step 1:
    sudo /usr/bin/kcarectl --register <key>

Once installed, the KernelCare agent will check for patches every four hours, by default. If there are any, it will download and install them, keeping your kernel safe without impacting service availability. There is nothing else to do, but if you need it, a command-line tool, kcarectl, lets you check the status and do manual operations. Here are some examples.


kcarectl --info # check status
kcarectl --patch-info # show all patch information
sudo kcarectl --auto-update # force instant refresh
sudo kcarectl --unregister # deregister license
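
If you want to run the status check from a monitoring script, a small wrapper can guard against the tool being absent. This is a minimal sketch: kc_status is a hypothetical helper name, and it relies only on the kcarectl --info option described in this article.

```shell
#!/bin/sh
# Sketch only: report KernelCare status, or fail clearly if the tool is missing.

# kc_status (hypothetical helper): wraps "kcarectl --info".
kc_status() {
    if ! command -v kcarectl >/dev/null 2>&1; then
        echo "kcarectl not found: is KernelCare installed?" >&2
        return 127
    fi
    kcarectl --info
}

# Example use in a monitoring job:
#   kc_status || echo "KernelCare status check failed" >&2
```

Returning a distinct exit code for the missing-tool case lets the calling job tell "not installed" apart from an error reported by kcarectl itself.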

KernelCare requires an HTTPS connection to two servers. If your servers can access the Internet, you can get patches from the KernelCare patch server, even with NAT. If a direct Internet connection is not available, a proxy can be used by setting these environment variables for the user account running KernelCare:


export http_proxy=http://proxy.domain.com:port
export https_proxy=http://proxy.domain.com:port
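
Because port in the lines above is a placeholder, it is easy to export an unusable value by mistake. The following sketch sets both variables from a host and a numeric port; set_kc_proxy is a hypothetical helper name, and proxy.example.com and 3128 are example values, not KernelCare defaults.

```shell
#!/bin/sh
# Sketch only: export both proxy variables after a basic sanity check.

# set_kc_proxy (hypothetical helper): $1 = proxy host, $2 = numeric port.
set_kc_proxy() {
    if [ -z "$1" ]; then
        echo "proxy host is required" >&2
        return 1
    fi
    case "$2" in
        ''|*[!0-9]*)
            echo "proxy port must be a number" >&2
            return 1
            ;;
    esac
    export http_proxy="http://$1:$2"
    export https_proxy="http://$1:$2"
}

# Example (hypothetical host and port):
#   set_kc_proxy proxy.example.com 3128
```

Run it in the environment of the account that runs KernelCare, for example from that account's shell profile.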

Summary

Hybrid cloud-based infrastructures make enterprises more agile and reactive. VMware Cloud on AWS takes away much of the overhead of deploying and maintaining enterprise-class data centers and application stacks.

Hundreds, perhaps thousands, of virtual Linux instances power the services that your business depends on. The Linux kernel is always changing, advancing in functionality and performance, hardening in security and stability. Updating it means restarts of all nodes running Linux.

Many applications can be updated without shutting them down. Until live patching was invented, the Linux kernel was not one of them. Now, with KernelCare, system administrators can avoid restarting instances when their Linux kernels need security patches. KernelCare offers a unified approach to routine kernel patch management, one that works the same with on-premises ESXi instances as it does under VMware Cloud on AWS.

Additional Resources