Advanced Application Logging: Aggregation and Forwarding with Fluentd

This article was originally posted here on September 10, 2018.

With the growth of the public cloud and the spread of cloud-native and DevOps principles, application logging has gone through a maturation phase. Thanks to open source projects like Logstash and Fluentd, the opportunities to improve logging while maintaining security and operational control have improved as well.

This article walks through the benefits of using Fluentd as both a node and an aggregator for an application deployed on Amazon EC2. The same approach applies to multi-cloud and hybrid-cloud deployments. Think about having a decentralized but standardized method for forwarding logs from nodes (application servers) to aggregators (jump box, management server, etc.) and ultimately into a central repository. Elasticsearch, Amazon S3, Google Stackdriver, Hadoop, and VMware Log Intelligence are a few examples of centralized log collection targets.

Fluentd is an open source data collector, backed by the Cloud Native Computing Foundation (CNCF), that provides a unified logging layer: it unifies data collection and consumption for better use and understanding of data. In this example, Fluentd acts as both a log collector and an aggregator. It is used to maintain security segmentation while forwarding logs (application and operating system) from nine servers associated with the Fit Cycle Application to four separate locations through a single management/jump box! Rather than cover all of the components of the application, I will provide a high-level overview and highlight how Fluentd is set up:

Fluentd Node Configuration

The first step is understanding how each application server or Fluentd Node is configured.

Input Configuration

# Input from Syslog
<source>
  @type syslog
  port 42185
  bind 127.0.0.1
  tag syslog
</source>

Output Configuration

# Log Forwarding and Local Copy
<match **>
  @type forward
  send_timeout 60s
  recover_wait 10s
  hard_timeout 60s
  <server>
    name mgmt1
    host 172.100.2.41
    port 24224
  </server>
  <secondary>
    @type file
    path /tmp/collectedm
  </secondary>
</match>

Notice the IP address listed in the <server> section; this is the local IP address of the management/jump box acting as the Fluentd aggregator within the VPC.
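The forward output also supports listing more than one <server> entry, so if you later stand up a second aggregator you can mark it as a standby for failover. This is a sketch, not part of my deployment; the mgmt2 name and host below are hypothetical:

# Log Forwarding with a hypothetical standby aggregator
<match **>
  @type forward
  send_timeout 60s
  recover_wait 10s
  hard_timeout 60s
  <server>
    name mgmt1
    host 172.100.2.41
    port 24224
  </server>
  <server>
    name mgmt2
    host 172.100.2.42   # hypothetical standby aggregator
    port 24224
    standby
  </server>
</match>

With this configuration, logs flow to mgmt1 and only fail over to mgmt2 if mgmt1 becomes unreachable.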

Fluentd Aggregator Configuration

Input Configuration

# Input from local Syslog
<source>
  @type syslog
  port 42185
  bind 127.0.0.1
  tag syslog
</source>
# Input from Nodes
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Note the two inputs: Syslog locally, and the forward protocol for logs sent by the existing nodes running Fluentd within the VPC.
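Since the aggregator's forward input binds to 0.0.0.0, you may also want to require authentication from the nodes. The forward plugin supports a <security> section with a shared key; a minimal sketch is below (the hostname and shared_key values are placeholders, and a matching <security> section must also be added to each node's forward output):

# Input from Nodes, with shared-key authentication (sketch)
<source>
  @type forward
  port 24224
  bind 0.0.0.0
  <security>
    self_hostname aggregator.example.local   # placeholder hostname
    shared_key change-me                     # placeholder shared secret
  </security>
</source>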

Output Configuration

The following configuration is broken down into sections based upon forwarding location.

Amazon S3

This is a simple addition to any Fluentd configuration, and the documentation can be found here. Below is the resulting output in Amazon S3.

Amazon S3 Overview
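The S3 output follows the standard fluent-plugin-s3 pattern (the plugin must be installed first, e.g. with td-agent-gem install fluent-plugin-s3). A minimal sketch is below; the credentials, bucket, and path are placeholders, not the values used in my deployment:

# Minimal S3 output sketch (placeholder credentials and bucket)
<match **>
  @type s3
  aws_key_id YOUR_AWS_KEY_ID        # placeholder
  aws_sec_key YOUR_AWS_SECRET_KEY   # placeholder
  s3_bucket your-log-bucket         # placeholder
  s3_region us-west-1
  path logs/
  <buffer tag,time>
    @type file
    path /tmp/fluentd/s3
    timekey 3600
    timekey_wait 10m
  </buffer>
</match>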

VMware Log Intelligence

In the following section I utilized the Fluentd out-http-ext plugin found on GitHub; it is also listed on the Fluentd plugin page found here. My peers published a blog a few months ago entitled “Using Fluentd to Send Logs from Any Cloud to VMware Log Intelligence” that is meant to give you a basic understanding of using Fluentd with application servers. I won't go too far into detail on this forwarder, but my configuration forwards logs to two separate instances of VMware Log Intelligence.

<store>
  @type http_ext
  endpoint_url          https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
  http_method           post
  serializer            json
  rate_limit_msec       200
  raise_on_error        true
  raise_on_http_failure true
  authentication        none
  use_ssl               true
  verify_ssl            false
  <headers>
    Authorization Bearer lZXGxe2hURIDXMlPvvryMlAA2aMzNtU8
    Content-Type application/json
    format syslog
    structure default
  </headers>
</store>
<store>
  @type http_ext
  endpoint_url          https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
  http_method           post
  serializer            json
  rate_limit_msec       200
  raise_on_error        true
  raise_on_http_failure true
  authentication        none
  use_ssl               true
  verify_ssl            false
  <headers>
    Authorization Bearer 2sIvyJ76Imh9dlHWYq98ol4CRe2ZC3vU
    Content-Type application/json
    format syslog
    structure default
  </headers>
</store>

Below is an example of the output as displayed in Log Intelligence:

Local File

The following example provides little to no value in my environment except for my own sanity! Notice I am writing to /tmp, and because I am a good systems administrator, that directory gets cleared on each reboot! Check out the Fluentd documentation for additional detail.

<store>
  @type file
  path /tmp/fluentd/local
  compress gzip
  <buffer>
    timekey 1d
    timekey_use_utc true
    timekey_wait 10m
  </buffer>
</store>

Below is an example of the /tmp directory after the output of logs to file:


Complete Output Configuration (Aggregator)

Fluentd supports copying logs to multiple locations in one simple process. The configuration example below uses the copy output plugin to combine the S3, VMware Log Intelligence, and local file outputs under a single match. Read more about the copy output plugin here.

# Output to S3, VMware Log Intelligence (2x) and Local File
<match **>
  @type copy
  <store>
    @type file
    path /tmp/fluentd/local
    compress gzip
    <buffer>
      timekey 1d
      timekey_use_utc true
      timekey_wait 10m
    </buffer>
  </store>
  <store>
    @type s3
    aws_key_id AKIAJGD3JBHWE2IFX65Q
    aws_sec_key FvjlY91mFWfCkbAtMpD301mYZfAdllS3aW8p/LcA
    s3_bucket fit-b-a-us-w1-00-m
    s3_region us-west-1
    path vpc-5726398/logs
    <buffer tag,time>
      @type file
      path /tmp/fluentd/s3
      timekey 3600 # 1 hour partition
      timekey_wait 10m
      timekey_use_utc true # use utc
      chunk_limit_size 256m
    </buffer>
  </store>
  <store>
    @type http_ext
    endpoint_url          https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
    http_method           post
    serializer            json
    rate_limit_msec       200
    raise_on_error        true
    raise_on_http_failure true
    authentication        none
    use_ssl               true
    verify_ssl            false
    <headers>
      Authorization Bearer lZXGxe2hIMIDXMlPvvryMlFF2aMzNtU8
      Content-Type application/json
      format syslog
      structure default
    </headers>
  </store>
  <store>
    @type http_ext
    endpoint_url          https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
    http_method           post
    serializer            json
    rate_limit_msec       200
    raise_on_error        true
    raise_on_http_failure true
    authentication        none
    use_ssl               true
    verify_ssl            false
    <headers>
      Authorization Bearer 2sIvyN76Imh9dlHWYqO5ol4LRe2ZC3vU
      Content-Type application/json
      format syslog
      structure default
    </headers>
  </store>
</match>

Fluentd is a powerful open source solution! In the example above, Fluentd is utilized to maintain security segmentation while forwarding logs (application and operating system) from nine servers associated with the Fit Cycle App to four separate locations through a single management/jump box!

If you have interest in logging for Kubernetes based applications, take a look at Bill Shetti’s blog found here.
