This article was co authored by Bill Shetti and Pierre Tessier.
For a demonstration of the solution discussed in this article, please see this video posted by Boskey Savla: https://youtu.be/B9JXSeVZ8MM
Monitoring is a cornerstone of the process of characterizing, analyzing, and optimizing applications. As an increasing number of applications either move to or are built atop container technologies, the challenges of monitoring these ephemeral workloads become a greater part of the operational burden for application developers and operators. Kubernetes is the focus of this blog due to the metrics which can be extracted from the platform itself, as well as its pervasive presence in the current container landscape.
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications that was originally designed by Google and now maintained by the Cloud Native Computing Foundation. Between deployments from the open-source codebase, as well as the commercial offerings based on that code, Kubernetes is becoming a dominant industry platform for containers in production. The items pertaining to extracting and analyzing metrics from the containers themselves may apply to other container ecosystems as well.
How is monitoring Kubernetes different?
Containerized systems such as Kubernetes environments present new monitoring challenges as compared to virtual machine-based, compute environments. These differences include:
- The ephemeral nature of containers
- An increasing density of objects, services, and metrics within a given node
- The focus moves to services, rather than machines
- The consumers of monitoring data have become more diverse
- Changes in the software development lifecycle
To address these factors, the sections below will focus on gathering metrics from two sets of sources. The first of these are the Kubernetes clusters themselves. In this way, information can be gathered which pertains to the arrangement and utilization of resources by the Kubernetes system(s) being monitored. Additionally, metrics gathered from the Kubernetes pods themselves will be discussed as well. These metrics will provide insight into how the services running within and between the individual containers are functioning. Both sets of data are required to gather a complete operational picture of a Kubernetes environment.
The solution proposed here for monitoring Kubernetes is a combination of multiple VMware Cloud services. Wavefront provides the metrics monitoring and alerting platform for both the Kubernetes cluster(s) and the application(s) deployed on them. The VMware Kubernetes Engine™ (VKE) service provides the Kubernetes clusters used to validate this approach.
VMware Kubernetes Engine™
VKE is an enterprise-grade Kubernetes-as-a-Service offering in the VMware Cloud Services portfolio that provides easy to use, secure, cost effective, and fully managed Kubernetes clusters. VKE enables users to run containerized applications without the cost and complexity of implementing and operating Kubernetes.
At the heart of VKE is the VMware Smart Cluster™. The Smart Cluster automates the selection of compute resources to constantly optimize resource usage, provide high availability, and reduce cost. This construct removes the need for educated guesses around cluster definition and sizing for optimal compute resources. It also enables the management of cost-effective, scalable Kubernetes clusters that are optimized for application requirements. These clusters provide built-in high availability with multi-master deployment, routine health checks and self-healing capabilities for Kubernetes clusters. VKE allows users to run applications in a highly available environment without manual infrastructure configuration and maintenance.
What is Wavefront?
Wavefront is a software-as-a-service (SaaS) platform for ingesting, storing, visualizing, and alerting on metrics data. In this context, a metric is a quantitative measure of a defined property at a point in time, which is used to track health or performance. By this definition, all ingested metrics are comprised of a label, a numeric value, and a timestamp. Metrics can be ingested in the Wavefront data format as well as other standard and well-defined formats. Once within the system, metrics can be queried, charted, and used as the basis for alerts.
Gathering Kubernetes Metrics in Wavefront
Wavefront currently includes an integration for ingesting platform level metrics from a Kubernetes cluster and its components. These metrics can be gathered at various levels such as the cluster, node, pod, and individual container. In order to begin gathering these metrics, the first step is to deploy a Wavefront proxy in the Kubernetes cluster. This proxy is deployed as a Kubernetes ReplicationController, which is a construct designed to ensure that a specified number of pod replicas are running at a given time. A YAML file defining this proxy in a Kubernetes environment is defined on Wavefront’s Kubernetes integration setup page, and can be located here. This definition can be used without any alternations on a newly created VKE cluster. After deploying the proxy, a service must be created in order for the proxy container to communicate with the Wavefront platform. The YAML definition can also be located on the integration’s setup page.
Finally, an instance of Heapster must be deployed as well in order to forward metrics from the Kubernetes cluster to the Wavefront proxy deployed previously. Heapster is a cluster-wide aggregator of monitoring and event data. It supports Kubernetes natively and works on all Kubernetes deployments. Heapster runs as a pod in the cluster, similar to how any other Kubernetes application would run. The Heapster YAML file contains a line of particular interest:
This line provides two important options; a chance to define a display name in Wavefront for the Kubernetes cluster being monitored, and the option to include Kubernetes labels as point tags on the metrics themselves. A prefix can also be added to the metrics forwarded by Heapster to aid in further differentiation and granularity when parsing the data in the Wavefront platform. This field could be used to signify a given line of business, project, application, or service. Details on the available deployment options can be found here.
Once this process has been completed, verify that metrics are being ingested by logging into Wavefront and navigating to the ‘Integrations’ page. From there, locate the ‘VMware Kubernetes Engine’ tile and click on it. Once on this screen, select the ‘Metrics’ view and a customer source (the clusterName value from the Wavefront proxy deployment). If the ‘Metrics Ingestion Rate PPS’ is positive, Wavefront is receiving metrics from the Kubernetes cluster.
Ingesting application metrics from containers
Once Kubernetes cluster metrics have begun to appear in Wavefront, the next step will be to gather metrics from the deployed Kubernetes objects. In this example, a sidecar methodology is used to forward application metrics from the Kubernetes containers within a pod to the in-cluster Wavefront proxy and eventually to the Wavefront cloud. A Pod may bound an application composed of multiple co-located containers that need to share resources. A sidecar container is one that enhances or extends the capability of an existing container and is deployed within the same Kubernetes pod. The Pod wraps these containers and their storage resources into a single entity. One of the reasons to use a sidecar type configuration is to avoid having to make any changes to the existing container images currently in use. The sidecar container (Telegraf is the Wavefront default for Kubernetes Integration) becomes a repeatable unit of YAML which can be reused across services with very minimal changes. This is depicted in figure 2 below.
In this case, the Telegraf agent essentially acts as an intermediary between the deployed container (in this example a MySQL instance) and the Wavefront proxy. In the diagram below, the sidecar proxy essentially acts as the ‘Agent’ attached to a Host (Kubernetes pod). In Kubernetes, all containers belonging to the same pod share the same network namespace, so these sidecars will be easily addressable to the services they monitor.
This is demonstrated in the Kubernetes YAML file below. In the YAML snippet, a portion of a configmap is defined and written into the mysql.conf configuration file. This configmap defines a set of metric and event data to collect from the MySQL instance running in this example container.
The linkages and specific input and output parameters required developed for this example will be explored in detail in a second post on this topic. The sidecar container image is then configured to push metrics to the Wavefront proxy at a predictable DNS name within the cluster. This may be achieved by deploying the Wavefront proxy with a specific service name (i.e. wavefront-proxy) and in a distinct Kubernetes cluster namespace, such as ‘monitoring’ or ‘Wavefront’ to ensure consistency across clusters. Standard Kubernetes services are assigned a DNS A record for a name of the form my-svc.my-namespace.svc.cluster.local.
The benefit of using sidecar containers to monitor distributed applications on Kubernetes is that the monitoring configuration for your services will be remain similar to the original application specification, so deployment is simple and sharing the same pod service discovery is straightforward and consistent.
This can be a challenge in Kubernetes environments because with a aggregation collector(s), the deployed containers may change dynamically and unpredictably. Configuration can be difficult, but using this architecture, your application metrics will always be monitored, as demonstrated above in figure 1.
By following the methodology outlined above and utilizing Wavefront integrations for additional services, it is possible to ingest both Kubernetes system level and application level metrics into Wavefront via the same Wavefront proxy. This allows for the use of common cluster naming and prefix tagging to correlate and compare metrics from these categories in order to better understand the operation, health, and performance of applications deployed within a Kubernetes environme