Aggregating Application Logs from Kubernetes Clusters using Fluentd to Log Intelligence

One of the key challenges in managing Kubernetes is observability: the ability of admins and developers to collect multiple data points and data sets from the cluster and analyze them to resolve issues.

Observability in Kubernetes uses cluster and application data from the following sources:

  • Monitoring metrics: Pulled from the cluster through cAdvisor, Metrics Server, and/or Prometheus, along with application data, all of which can be aggregated across clusters in Wavefront by VMware.
  • Logging data: Whether it's cluster logs or application log information such as syslog, these data sets are important for analysis.
  • Tracing data: Generally obtained with tools like Zipkin and Jaeger, tracing provides detailed flow information about the application.

In this blog, we show you how to aggregate application logging data from containers running on Kubernetes into VMware Log Intelligence.

In particular, we will look at how to configure, build, and deploy a Fluentd DaemonSet that collects application data and forwards it to Log Intelligence.

A DaemonSet, as defined in the Kubernetes documentation, is:

“A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.”

Prerequisites 

This write-up assumes the following:

  • Application logs are written to stdout from the containers. A great reference can be found in the Kubernetes documentation.
  • You have privileged access to install Fluentd DaemonSets into the “kube-system” namespace.

Privileged access may require different configurations on different platforms:

  • kops (open source Kubernetes installer and manager): if you are the installer, you will have admin access.
  • GKE: turn off the standard Fluentd DaemonSet preinstalled in the GKE cluster. Follow the instructions here.
  • VMware Cloud PKS: ensure you are running privileged clusters.
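One quick way to confirm the access described above is to ask the API server directly. The sketch below wraps the standard kubectl auth can-i check in a small helper; the function name can_install_daemonset is only an illustration, and the check covers RBAC permission to create DaemonSets in kube-system, not any platform-specific privileged-cluster setting.

```shell
# Ask the API server whether the current user may create DaemonSets
# in the kube-system namespace. kubectl prints "yes" or "no".
# The helper name is hypothetical; the kubectl command is standard.
can_install_daemonset() {
  kubectl auth can-i create daemonsets --namespace kube-system
}
```

Calling can_install_daemonset with your cluster credentials loaded should print yes before you continue.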

Application logs in Log Intelligence

Once configured and deployed, fluentd properly pulls data from individual containers in pods. These logs can be visualized and analyzed in Log Intelligence.

The following captures show logs from a simple Flask application called api_server running on a pod in a Kubernetes cluster.

[Figure: Application Data from Flask Container on Kubernetes (1)]

[Figure: Application Data from Flask Container on Kubernetes (2)]

As the captures above show, Log Intelligence is reading the Fluentd DaemonSet output and capturing both stdout and stderr from the application.

To get a better appreciation for what is being viewed in Log Intelligence, it's useful to view the container logs in Kubernetes.

Here is a sample output (in stdout) of logs from the api_server container:

As you can see, these logs were written to stdout, then picked up by Fluentd and forwarded to Log Intelligence. The log output is pushed into the Kubernetes cluster and managed by Kubernetes.
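To look at the same container output on your own cluster, a helper along these lines can be used. It is a minimal sketch: the function name show_pod_logs is hypothetical, and the pod name, namespace, and line count are values you supply for your own deployment.

```shell
# Print the most recent lines a pod wrote to stdout/stderr.
# $1: pod name, $2: namespace, $3: number of lines to show.
# All three are placeholders you fill in for your own cluster.
show_pod_logs() {
  kubectl logs "$1" --namespace "$2" --tail="$3"
}
```

For example, show_pod_logs my-pod default 20 would print the last 20 log lines of a pod named my-pod, where my-pod is a made-up name.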

As noted in Kubernetes documentation:

“Everything a containerized application writes to stdout and stderr is handled and redirected somewhere by a container engine. For example, the Docker container engine redirects those two streams to a logging driver, which is configured in Kubernetes to write to a file in json format.”
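To make the json format concrete, here is a rough illustration of one line as the Docker json-file logging driver writes it, with the stream field pulled out. The log message and timestamp are made-up sample data, and the sed expression is only a quick illustration, not a real JSON parser.

```shell
# One log line in the shape written by Docker's json-file driver.
# The message and timestamp are invented sample values.
line='{"log":"GET /api/health 200\n","stream":"stdout","time":"2018-10-01T12:00:00.000000000Z"}'

# Extract the "stream" field, which records whether the application
# wrote the line to stdout or stderr.
stream=$(printf '%s\n' "$line" | sed -n 's/.*"stream":"\([a-z]*\)".*/\1/p')
echo "$stream"   # prints: stdout
```

Fluentd reads lines of this shape, which is how both stdout and stderr entries can be told apart downstream.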

Once the application logs are successfully ingested into VMware Log Intelligence, you can use its features to accelerate troubleshooting without having to dive deep into each log stream. On the home screen, Recent Alerts shows the alerts triggered by the alert definitions you have configured. For example, you may want to be alerted when application logs contain an error. Below is an example of an alert being triggered over a period of time.


[Figure: Creating an alert]

Building, configuring, and deploying fluentd

Fluentd comes with standard DaemonSets. Here are a few that can be found in the Fluentd GitHub repository:

  • Elasticsearch
  • Syslog
  • GCS
  • S3
  • etc

What about Log Intelligence? Until an official VMware Log Intelligence daemonset is created, the following instructions will help create a fluentd daemonset using the fluentd syslog daemonset.

Create a docker image with the right configuration

The first step is to create a Docker image with the right configuration for Log Intelligence. This image will be used to deploy the DaemonSet.

We start with the Fluentd syslog Alpine container from the fluentd-kubernetes-daemonset repository.

First, clone the entire Git repo:

git clone https://github.com/fluent/fluentd-kubernetes-daemonset.git

Next, in the conf directory for the alpine-syslog image build, look for fluent.conf:

~/fluentd-kubernetes-daemonset/docker-image/v0.12/alpine-syslog/conf/fluent.conf

Next, modify this file as follows:

See in bold where we added the fluent-plugin-out-http-ext.

Run the following command in the same directory as the Dockerfile:

docker build -t lint-dset .

Next, tag and push the image to your favorite repository.

I’ve already created one, and it's available here:

gcr.io/learning-containers-187204/lint-dset
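If you would rather push your own build than use the image above, the tag-and-push step can be sketched as a small helper. The function name push_image and the registry path in the usage note are hypothetical; docker tag and docker push are the standard Docker commands.

```shell
# Tag a locally built image with a registry path and push it.
# $1: local image name (for example, the lint-dset image built earlier)
# $2: full target path in your registry (a value you supply)
push_image() {
  docker tag "$1" "$2" && docker push "$2"
}
```

For example, push_image lint-dset registry.example.com/my-team/lint-dset would publish the image under a made-up registry path. Substitute the registry you actually use and make sure you are logged in to it first.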

We have now built an image that takes two variables when deploying the Kubernetes DaemonSet, enabling connectivity to Log Intelligence.

Configuring and deploying the Fluentd Daemonset

In the same location where you cloned the Fluentd DaemonSet repository from GitHub, make a copy of fluentd-daemonset-syslog.yaml:

cp ~/fluentd-kubernetes-daemonset/fluentd-daemonset-syslog.yaml fluentd-daemonset-LINT.yaml

Modify fluentd-daemonset-LINT.yaml as follows:

Note the modifications to the yaml file above in bold.

Now simply create the DaemonSet:

kubectl create -f fluentd-daemonset-LINT.yaml

Ensure the fluentd daemonset is up:
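A check along these lines lists the DaemonSets and pods in kube-system so you can confirm the Fluentd pods are running on each node. This is a sketch: the function name check_fluentd_daemonset is hypothetical, and the exact DaemonSet name you see depends on the yaml you deployed.

```shell
# List DaemonSets and pods in kube-system. In the DaemonSet listing,
# the DESIRED and READY counts should match for the Fluentd entry,
# and the wide pod listing shows which node each pod landed on.
check_fluentd_daemonset() {
  kubectl get daemonsets --namespace kube-system
  kubectl get pods --namespace kube-system --output wide
}
```

Run check_fluentd_daemonset after the create command and look for one Fluentd pod per node in the Running state.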

Once the daemonset is up, check in Log Intelligence for your logs.

Click here for more information on Log Intelligence. 

About the Authors

Bahubali Shetti

Director of Public Cloud Solutions at VMware

Bahubali is the Director of Public Cloud Solutions for VMware Cloud Services at VMware. He leads a team of Cloud Architects evangelizing and developing solutions for improving public cloud operations (AWS/Azure/GCP). Bahubali was part of the initial team that developed and launched VMware Cloud Services. Prior to VMware, he was Director of Product Management at VCE (now Dell) for Cloud Management Products. Between 2011 and 2014, Bahubali led operations at Cumulus Networks, led AWS cloud operations at several startups, and headed an open source routing software project. Between 2008 and 2010, Bahubali led the cloud investment practice at Storm Ventures. He spent 9 years at Cisco in product management and business development. He holds an M.S. in Information Networking from Carnegie Mellon and a B.S. in Electrical Engineering from Rutgers Engineering.
