Using Fluentd to Send Logs from Any Cloud to VMware Log Intelligence

This post was co-authored by Chris McClanahan, Sr Technical Marketing Manager and Munishpal Makhija, Technical Staff.

Fluentd is an open source data collector that lets you unify data collection and consumption for better use and understanding of your data. The process of sending logs from any workload on any cloud or software-defined data center (SDDC) to Log Intelligence can seem unclear, so in this post some in-house VMware tech experts walk through it step by step using the ingestion API. In short, Fluentd is used alongside the ingestion API built into Log Intelligence to aggregate logs from any workload running on any cloud platform.

There are two methods by which logs can be ingested into Log Intelligence:

1. Using the Remote Data Collector: The Remote Data Collector (RDC) is a lightweight appliance deployed into your environment; you point all your logs at it for ingestion into Log Intelligence. The RDC compresses and encrypts the log data and sends it to Log Intelligence for analytics processing. For more information about the RDC, please see this blog.

2. Using the API Ingestion process: This process uses a RESTful API to ingest log data directly into Log Intelligence.

Read on for a step-by-step guide from Munishpal Makhija and Chris McClanahan on how to complete this process. They use Fluentd along with the ingestion API on an Ubuntu machine built in Azure cloud to push Apache access logs to Log Intelligence.

Steps

Deploy an Ubuntu machine in Azure and give it a public IP. I used Ubuntu 16.04 for this example. Normally you would have a single public IP that all your workloads talk out through, but for this example I am assigning a public IP to this specific workload. Make sure to enable HTTP and SSH.

Web App Screenshot


For information on deploying a Linux machine in Azure, see the following article.

Next you will SSH into the Ubuntu machine you deployed and run the following commands to install Fluentd:

  • More information on installing Fluentd can be found here.
  • sudo apt update
  • curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent3.sh | sh
  • sudo /usr/sbin/td-agent-gem install fluent-plugin-out-http-ext
  • sudo mkdir /tmp/log
  • sudo chmod -R 777 /tmp/log
  • sudo /etc/init.d/td-agent status

Note: the script above is the Trusty (14.04) installer; since this example uses Ubuntu 16.04, the matching Xenial script (install-ubuntu-xenial-td-agent3.sh) should also work.
Coding Screenshot


  • If you see the above message you have successfully installed Fluentd with the HTTP Output plugin.
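The chmod step above makes /tmp/log world-writable so the td-agent process can write its position files there. As a quick sanity check, a sketch like the following (using a temp directory as a stand-in for /tmp/log, since this is only an illustration) confirms the permissions took effect:

```shell
# Sanity-check sketch: confirm a directory is world-writable, as the
# "sudo chmod -R 777 /tmp/log" step intends. A temp dir stands in for /tmp/log.
d=$(mktemp -d)
chmod 777 "$d"
perms=$(stat -c '%a' "$d")
echo "$perms"    # 777 if the permissions took effect
```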

Next we need to install Apache by running the following command:

  • sudo apt install apache2
  • sudo chmod -R 645 /var/log/apache2

Now we need to configure the td-agent.conf file located in the /etc/td-agent folder. I am going to go through each configuration section and explain what it does. At the end I will give you the full configuration file for this example.

Filter Section:

  • <filter *.**>
    @type record_transformer
    <record>
    hostname ${hostname}
    </record>
    </filter>

This section adds a field to each log message sent to Log Intelligence through Fluentd. In this example I am adding the key-value pair hostname:value, where the value is the hostname of the machine that sent the log message.
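To see the effect of this filter, here is a minimal shell sketch; it is a simplified stand-in for the record_transformer plugin (real Fluentd events are structured records, not strings), showing a hostname field appended to a JSON log record:

```shell
# Simplified stand-in for the record_transformer filter (assumption: this
# models the idea only; it is not the plugin's actual implementation).
record='{"message":"GET / HTTP/1.1 200"}'
host=$(hostname)
# Drop the closing brace and append the hostname key-value pair.
enriched="${record%\}},\"hostname\":\"$host\"}"
echo "$enriched"
```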

Match Section:

  • <match>
    type http_ext
    endpoint_url https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
    http_method post
    serializer json
    rate_limit_msec 100
    raise_on_error false
    raise_on_http_failure true
    authentication none
    use_ssl true
    verify_ssl false
    <headers>
    Authorization Bearer <Insert Token from Log Intelligence>
    Content-Type application/json
    format syslog
    structure default
    </headers>
    </match>

This section is used to configure what Fluentd does with the log messages it receives from the sources (I will explain the sources a little later). This is the standard configuration Log Intelligence expects. The one line that needs modification is the Authorization Bearer header: for this you will need to log into your instance of Log Intelligence and create a new API token to use.

API Keys


Select the New API Key link to create a new API key for you to use.

New API Key


Enter a name for the key.

Generate API Key


A new key will be created. You will need to copy the key and put it in the match section above in the following location:

  • Authorization Bearer <Insert Token from Log Intelligence>
  • For this example – Authorization Bearer zXnekZxfzQSXeANPEJOM7cOBbcELbqUV
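If you want to verify the token independently of Fluentd, a sketch like the one below builds (but only prints, rather than sends) the curl equivalent of what the match section posts. The endpoint URL and Bearer header come from the configuration above; the JSON payload shape is an assumption for illustration only:

```shell
# Hedged sketch: print the curl request equivalent to the match section's POST.
# Remove the echo to actually send it. The payload shape is an assumption.
TOKEN="zXnekZxfzQSXeANPEJOM7cOBbcELbqUV"   # the example token from this post
URL="https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream"
cmd="curl -X POST '$URL' -H 'Authorization: Bearer $TOKEN' -H 'Content-Type: application/json' -d '{\"text\":\"hello from curl\"}'"
echo "$cmd"
```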

Source Section 1:

  • <source>
    @type http
    @id input_http
    port 8888
    </source>

This first source section tells the Fluentd agent to accept messages sent to it over HTTP. We will use this source a little later to test communication between this workload and Log Intelligence.

Source Section 2:

  • <source>
    @type tail
    format apache2
    path /var/log/apache2/access.log
    pos_file /tmp/log/access_log.pos
    tag apache
    </source>

This source section is the configuration that tells the Fluentd log collector to collect the Apache access log. You will notice we are using the tail type for this source. This is a standard plugin of the Fluentd client and will read any log file, much like using the command tail -f in Linux to watch logs in the console.
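The pos_file setting is what lets the tail plugin resume where it left off after a restart instead of re-reading the whole file. Here is a simplified shell sketch of that idea (an assumption-laden model, not the plugin's actual implementation, which also tracks the file's inode):

```shell
# Simplified model of the tail plugin's pos_file: remember the byte offset
# already read, then emit only lines appended after that point.
log=$(mktemp); pos=$(mktemp)
printf 'old entry 1\nold entry 2\n' >> "$log"
wc -c < "$log" | tr -d ' ' > "$pos"          # save current offset, like pos_file
printf 'new entry\n' >> "$log"               # a fresh access-log line arrives
new=$(tail -c "+$(( $(cat "$pos") + 1 ))" "$log")   # read only the new bytes
echo "$new"    # new entry
```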

Full td-agent.conf file:

  • <filter *.**>
    @type record_transformer
    <record>
    hostname ${hostname}
    </record>
    </filter>
  • <match>
    type http_ext
    endpoint_url         https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
    http_method           post
    serializer           json
    rate_limit_msec       100
    raise_on_error       false
    raise_on_http_failure true
    authentication       none
    use_ssl               true
    verify_ssl           false
    <headers>
    Authorization Bearer <Insert Token from Log Intelligence>
    Content-Type application/json
    format syslog
    structure default
    </headers>
    </match>
  • <source>
    @type http
    @id input_http
    port 8888
    </source>
  • <source>
    @type tail
    format apache2
    path /var/log/apache2/access.log
    pos_file /tmp/log/access_log.pos
    tag apache
    </source>

The above is the full td-agent.conf file configured for this example.

Now that we have completed the configuration of td-agent, we just need to restart the agent using the following command:

  • sudo /etc/init.d/td-agent restart

At this point Fluentd is set up on our workload and ready to send logs to Log Intelligence. Let's test this first through the Fluentd HTTP input method. Run a command like the following to send a message to Fluentd from the localhost (this uses the standard request format of the Fluentd in_http plugin listening on port 8888):

  • curl -X POST -d 'json={"message":"This is a test message from Fluentd HTTP"}' http://localhost:8888/test

Now, from Log Intelligence, navigate to the Log Explorer screen. Enter the full text of the message sent in the above curl command (This is a test message from Fluentd HTTP) and select the magnifying glass button to execute the query.

Log Intelligence Test Message


You should see a single result from the query, showing that you are sending data from your Azure virtual machine to Log Intelligence. If you expand the result, you will also notice that your filter to add the hostname to the log message has worked as expected.

Host Name Screenshot


Now that we know logs are flowing to Log Intelligence via the Fluentd agent, let's test that we are getting access logs from Apache. Since you installed Apache on your virtual machine and it has a public IP address, you should be able to reach the default Apache page from any browser using http://<Public IP>.

Apache2 Ubuntu Screenshot


If you get to this page then Apache is up and running and now we can look for a log entry from the access.log file on our Azure virtual machine.

From Log Intelligence, let's build a simple query that will find a log that came from our Apache server. In Log Explorer, clear the query you ran previously to test communication from Fluentd. First, we will build a simple query to pull all logs that include the hostname of our Azure virtual machine; in this example, the hostname is webb-app-ubuntu. Second, we will add another filter to this query to look for the text 404. Your query should look similar to the image below:

Search Blogs Screenshot


If you were to run this query, you would most likely not see any results, because we haven't yet triggered a 404 from our Apache server. Let's do that by entering the following in your browser:

  • http://<Public IP>/test

You should see a response in the browser that looks like:

Not Found Screenshot


This will have triggered a 404 error in the Apache access log file on your Apache server.

In Log Intelligence you can now rerun the query you created above. This time you will see a single entry that corresponds to the 404 error you triggered in the browser.

Log Access Screenshot


Finally, expanding that log entry will show that you are getting the access logs from your Apache server and that Fluentd is appending the hostname to all log entries coming from your Apache server.

Stream Screenshot


Summary: 

This was a short example of how easy it can be to use an open source log collector, such as Fluentd, to push logs directly to Log Intelligence using the ingestion API method. You can do this for any type of workload, on any cloud, with any application that writes to a log file. I hope you can use this method and find new and creative ways to push logs into Log Intelligence.

Request access to Log Intelligence today and try it for yourself!
