Syslog Producer For Apache Kafka

January 16, 2015

The Go Kafka Client (sponsored by CrowdStrike and developed by Big Data Open Source Security LLC) now includes a Syslog producer.

There are a few ways to get started with the Syslog Producer:

  • Docker image
  • From source
  • Vagrant

Check out the README for more detail: https://github.com/stealthly/go_kafka_client/tree/master/syslog

The syslog producer operates in two different modes; both produce to Apache Kafka.

Raw data

In this case you are basically just funneling the data to a Kafka topic without any alterations. The bytes that come into the server are the bytes written to the Kafka topic.

As a ProtoBuf with metadata

This metadata is set when the server starts and is added by the server before sending to Kafka. This ProtoBuf ends up being really useful downstream.

The fields for the LogLine ProtoBuf:

  • string line
  • string source
  • repeated tag
  • int64 logtypeid
  • repeated int64 timings

line – This is the log data coming into the syslog server, unaltered from what was received. It is a required field and is set by the syslog server.

source – This has whatever meaning you want to give it. It is meant to be a specific representation of the instance the data is coming from (e.g. i-59a059a8). It is an optional field set by passing in the value on the command line.

tag – This is a structure of key/value pairs. You can add as many key/value pairs as you want when starting the server (e.g. dc=dc1,floor=3,aisle=4,rack=2,u=3). It is an optional field set by passing the values in on the command line.

logtypeid – This field (which defaults to 0) can be used to conditionalize the parsing of the line field in your Kafka consumer (e.g. 1=rfc5424, 2=gosyslog, etc.). It is an optional field set by passing the value on the command line.
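On the consumer side, conditionalizing on logtypeid can be a simple dispatch. A hedged sketch, where the id-to-format mapping and the logFormat helper are hypothetical examples, not part of the client:

```go
package main

import "fmt"

// logFormat maps a logtypeid to a parser label. The mapping here is a
// hypothetical example; use whatever ids you assign on the command line.
func logFormat(logtypeid int64) string {
	switch logtypeid {
	case 1:
		return "rfc5424"
	case 2:
		return "gosyslog"
	default:
		return "raw" // 0 or unknown: treat the line as opaque bytes
	}
}

func main() {
	fmt.Println(logFormat(1))
	fmt.Println(logFormat(0))
}
```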

timings – This field is set twice: once when the message is received and once when it is produced to Kafka. The purpose of this field is that, as the message flows through the data pipeline (other producers and consumers), timings can continuously be appended. If your data pipelines are dynamic, you can use the tag field when reading/writing to know which timing in the list belongs to which component. This allows for end-to-end latency analysis of the message, covering the processing and transmission steps at each point in the data pipeline.

You can check out more details about the command line options in the README.

/*******************************************
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  Twitter: @allthingshadoop
********************************************/