Heka is an open source stream processing software system developed by Mozilla. Heka is a “Swiss Army Knife” type tool for data processing, useful for a wide variety of different tasks, such as:

- Loading and parsing log files from a file system.
- Accepting statsd type metrics data for aggregation and forwarding to upstream time series data stores such as Graphite or InfluxDB.
- Performing real time analysis, graphing, and anomaly detection on data flowing through the Heka pipeline.
- Delivering processed data to one or more persistent data stores.
The following resources are available to those who would like to ask questions, report problems, or learn more:

- The Heka project page on Github: https://github.com/mozilla-services/heka
- The Heka mailing list
- The #heka channel on irc.mozilla.org
Heka is a heavily plugin based system. There are five different types of Heka plugins:
Input plugins acquire data from the outside world and inject it into the Heka pipeline. They can do this by reading files from a file system, actively making network connections to acquire data from remote servers, listening on a network socket for external actors to push data in, launching processes on the local system to gather arbitrary data, or any other mechanism. They must be written in Go.
Decoder plugins convert data that comes in through the Input plugins to Heka’s internal Message data structure. Typically decoders are responsible for any parsing, deserializing, or extracting of structure from unstructured data that needs to happen. They can be written entirely in Go, or the core logic can be written in sandboxed Lua code.
Filter plugins are Heka’s processing engines. They are configured to receive messages matching certain specific characteristics (using Heka’s Message Matcher Syntax) and are able to perform arbitrary monitoring, aggregation, and/or processing of the data. Filters are also able to generate new messages that can be reinjected into the Heka pipeline, such as summary messages containing aggregate data, notification messages in cases where suspicious anomalies are detected, or circular buffer data messages that will show up as real time graphs in Heka’s dashboard. Filters can be written entirely in Go, or the core logic can be written in sandboxed Lua code. It is also possible to configure Heka to allow Lua filters to be dynamically injected into a running Heka instance without needing to reconfigure or restart the Heka process, or even to have shell access to the server on which Heka is running.
Encoder plugins are the inverse of Decoders. They generate arbitrary byte streams using data extracted from Heka Message structs. Encoders are embedded within Output plugins; Encoders handle the serialization, Outputs handle the details of interacting with the outside world. They can be written entirely in Go, or the core logic can be written in sandboxed Lua code.
Output plugins send data that has been serialized by an Encoder to some external destination. They handle all of the details of interacting with the network, filesystem, or any other outside resource. They are, like Filters, configured using Heka’s Message Matcher Syntax so they will only receive and deliver messages matching certain characteristics. They must be written in Go.
Information about developing plugins in Go can be found in the Extending Heka section. Details about using Lua sandboxes for Decoder, Filter, and Encoder plugins can be found in the Sandbox section.
The core of the Heka system is the hekad daemon. A single hekad process can be configured with any number of plugins, simultaneously performing a variety of data gathering, processing, and shipping tasks. Details on how to configure a hekad daemon are in the Configuring hekad section.
hekad releases are available on the Github project releases page. Binaries are available for Linux and OSX, with packages for Debian and RPM based distributions.
hekad requires a Go work environment to be set up for the binary to be built; this task is automated by the build process. The build script will override the Go environment for the shell window it is executed in. This creates an isolated environment that is intended specifically for building and developing Heka. The build script should be run every time a new shell is opened for Heka development to ensure the correct dependencies are found and being used. To create a working hekad binary for your platform you’ll need to install some prerequisites. Many of these are standard on modern Unix distributions and all are available for installation on Windows systems.
Prerequisites (all systems):
Prerequisites (Unix):
Prerequisites (Windows):
Check out the heka repository:
git clone https://github.com/mozilla-services/heka
Run build in the heka directory
cd heka
source build.sh   # Unix (or `. build.sh`; must be sourced to properly set up the environment)
build.bat         # Windows
You will now have a hekad binary in the build/heka/bin directory.
(Optional) Run the tests to ensure a functioning hekad.
ctest                # All, see note
# Or use the makefile target
make test            # Unix
mingw32-make test    # Windows
Note
In addition to the standard test build target, ctest can be called directly, providing much greater control over the tests being run and the generated output (see ctest --help). For example, ‘ctest -R pi’ will only run the pipeline unit test.
There are two build customization options that can be specified during the cmake generation process.
For example, to enable the benchmark tests in addition to the standard unit tests, type ‘cmake -DBENCHMARK=true ..’ in the build directory.
It is possible to extend hekad by writing input, decoder, filter, or output plugins in Go (see Extending Heka). Because Go only supports static linking of Go code, your plugins must be included with and registered into Heka at compile time. The build process supports this through the use of an optional cmake file {heka root}/cmake/plugin_loader.cmake. A cmake function, add_external_plugin, is provided; it takes the repository type (git, svn, or hg), the repository URL, the repository tag to fetch, and an optional list of sub-packages to be initialized.
add_external_plugin(git https://github.com/mozilla-services/heka-mozsvc-plugins 6fe574dbd32a21f5d5583608a9d2339925edd2a7)
add_external_plugin(git https://github.com/example/path <tag> util filepath)
add_external_plugin(git https://github.com/bellycard/heka-sns-input :local)

# The ':local' tag is a special case, it copies {heka root}/externals/{plugin_name} into the Go
# work environment every time `make` is run. When local development is complete, and the source
# is checked in, the value can simply be changed to the correct tag to make it 'live'.
# i.e. {heka root}/externals/heka-sns-input -> {heka root}/build/heka/src/github.com/bellycard/heka-sns-input
The preceding entry clones the heka-mozsvc-plugins git repository into the Go work environment, checks out SHA 6fe574dbd32a21f5d5583608a9d2339925edd2a7, and imports the package into hekad when make is run. By adding an init() function in your package you can make calls into pipeline.RegisterPlugin to register your plugins with Heka’s configuration system.
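For reference, the registration hook inside such a plugin package might look roughly like the following minimal Go sketch. The package and type names here (snsinput, SnsInput) are hypothetical placeholders, and the methods required by the actual Input plugin interface are omitted; see Extending Heka for the real interfaces.

// Hypothetical plugin package; only the registration hook is shown.
package snsinput

import "github.com/mozilla-services/heka/pipeline"

// SnsInput would implement Heka's Input plugin interface (Init, Run, etc.);
// those methods are omitted here for brevity.
type SnsInput struct{}

func init() {
	// Make the plugin available to Heka's TOML configuration system
	// under the name "SnsInput".
	pipeline.RegisterPlugin("SnsInput", func() interface{} {
		return new(SnsInput)
	})
}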
Installing packages on a system is generally the easiest way to deploy hekad. These packages can be easily created after following the above From Source directions:
1. Run cpack to build the appropriate package(s) for the current system:
cpack                   # All
# Or use the makefile target
make package            # Unix (no deb, see below)
make deb                # Unix (if dpkg is available, see below)
mingw32-make package    # Windows
The packages will be created in the build directory.
Note
You will need rpmbuild installed to build the rpms.
Note
For file name convention reasons, deb packages won’t be created by running cpack or make package, even on a Unix machine with dpkg installed. Instead, running source build.sh on such a machine will generate a Makefile with a separate ‘deb’ target, so you can run make deb to generate the appropriate deb package.
A brand new Heka installation is something of a blank canvas, full of promise but not actually interesting on its own. One of the challenges with a highly flexible tool like Heka is that newcomers can easily become overwhelmed by the wide assortment of features and options, making it difficult to understand exactly how to begin. This document will try to address this issue by taking readers through the process of configuring a hekad installation that demonstrates a number of Heka’s common use cases, hopefully providing enough context that users will be able to then adjust and extend the given examples to meet their own particular needs.
When we’re done our configuration will have Heka performing the following tasks:

- Accepting statsd type metrics data and aggregating it for delivery to Graphite and InfluxDB.
- Generating real time graphs of the aggregated stats data in Heka’s dashboard.
- Loading and parsing Nginx access logs and delivering the extracted HTTP request data to ElasticSearch.
- Performing anomaly detection on the HTTP response status data and sending email alerts when anomalies are detected.
But before we dig into that, let’s make sure everything is working by trying out a very simple setup.
One of the simplest Heka configurations possible is one that loads a single file from the local file system and then outputs the contents of that file to stdout. The following is an example of such a configuration:
[LogstreamerInput]
log_directory = "/var/log"
file_match = 'auth\.log'
[PayloadEncoder]
append_newlines = false
[LogOutput]
message_matcher = "TRUE"
encoder = "PayloadEncoder"
Heka is configured via one or more TOML format configuration files, each of which is comprised of one or more sections. The configuration above consists of three sections, the first of which specifies a LogstreamerInput, Heka’s primary mechanism for loading files from the local file system. This one is loading /var/log/auth.log, but you can change this to load any other file by editing the log_directory setting to point to the folder where the file lives and the file_match setting to a regular expression that uniquely matches the filename. Note the single quotes (‘auth\.log’) around the regular expression; this is TOML’s way of specifying a raw string, which means we don’t need to escape the regular expression’s backslashes like we would with a regular string enclosed by double quotes (“auth\\.log”).
In most real world cases a LogstreamerInput would include a decoder setting, which would parse the contents of the file to extract data from the text format and map them onto a Heka message schema. In this case, however, we stick with the default behavior, where Heka creates a new message for each line in the log file, storing the text of the log line as the payload of the Heka message.
The next two sections tell Heka what to do with the messages that the LogstreamerInput is generating. The LogOutput simply writes data out to the Heka process’s stdout. We set message_matcher = “TRUE” to specify that this output should capture every single message that flows through the Heka pipeline. The encoder setting tells Heka to use the PayloadEncoder that we’ve configured, which extracts the payload from each captured message and uses that as the raw data that the output will send.
To see whether or not you have a functional Heka system, you can create a file called sanity_check.toml and paste in the above configuration, adjusting the LogstreamerInput’s settings to point to another file if necessary. Then you can run Heka using hekad -config=/path/to/sanity_check.toml, and you should see the contents of the log file printed out to the console. If any new lines are written to the log file that you’re loading, Heka will notice and will write them out to stdout in real time.
Note that the LogstreamerInput keeps track of how far it has gotten in a particular file, so if you stop Heka using ctrl-c and then restart it you will not see the same data. Heka stores the current location in a “seekjournal” file, at /var/cache/hekad/logstreamer/LogstreamerInput by default. If you delete this file and then restart Heka you should see it load the entire file from the beginning again.
Congratulations! You’ve now successfully run Heka with a full, working configuration. But clearly there are much simpler tools to use if all you want to do is write the contents of a log file out to stdout. Now that we’ve got an initial success under our belt, let’s take a deeper dive into a much more complex Heka configuration that actually handles multiple real world use cases.
As mentioned above, Heka is configured using TOML configuration files. Most sections of the TOML configuration contain information relevant to one of Heka’s plugins, but there is one section entitled hekad which allows you to tweak a number of Heka’s global configuration options. In many cases the defaults for most of these options will suffice, and your configuration won’t need a hekad section at all. A few of the options are worth looking at here, however:
The maxprocs setting corresponds to Go’s GOMAXPROCS environment variable. It specifies how many CPU cores the hekad process will be allowed to use. The best choice for this setting depends on a number of factors such as the volume of data Heka will be processing, the number of cores on the machine on which Heka is running, and what other tasks the machine will be performing. For dedicated Heka aggregator machines, this should usually be equal to the number of CPU cores available, or perhaps the number of cores minus one, while for Heka processes running on otherwise busy boxes one or two is probably a better choice.
In addition to the location of the configuration files, there are two directories that are important to a running hekad process. The first of these is called the base_dir, which is a working directory where Heka will be storing information crucial to its functioning, such as seekjournal files to track current location in a log stream, or sandbox filter aggregation data that is meant to survive between Heka restarts. It is of course important that the user under which the hekad process is running has write access to the base_dir.
The second directory important to Heka’s functioning is called the share_dir. This is a place where Heka expects to find certain static resources that it needs, such as the HTML/javascript source code used by the dashboard output, or the source code to various Lua based plugins. The user owning the hekad process requires read access to this folder, but should not have write access.
It’s worth noting that while Heka defaults to expecting to find certain resources in the base_dir and/or the share_dir folders, it is nearly always possible to override the location of a particular resource on a case by case basis in the plugin configuration. For instance, the filename option in a SandboxFilter specifies the filesystem path to the Lua source code for that filter. If it is specified as a relative path, the path will be computed relative to the share_dir. If it is specified as an absolute path, the absolute path will be honored.
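For example, the following two SandboxFilter excerpts (the plugin names and the absolute path are hypothetical) would load their Lua source relative to the share_dir and from an explicit absolute path, respectively:

[relative_path_filter]
type = "SandboxFilter"
# resolved as {share_dir}/lua_filters/stat_graph.lua
filename = "lua_filters/stat_graph.lua"

[absolute_path_filter]
type = "SandboxFilter"
# absolute path, used as-is
filename = "/opt/heka/filters/my_custom_filter.lua"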
For our example, we’re going to keep the defaults for most global options, but we’ll bump the maxprocs setting from 1 to 2 so we can get at least some parallel behavior:
[hekad]
maxprocs = 2
Once we’ve got Heka’s global settings configured, we’re ready to start on the plugins. The first thing we’ll tackle is getting Heka set up to accept data from statsd clients. This involves two different plugins, a StatsdInput that accepts network connections and parses the received stats data, and a StatAccumInput that will accept the data gathered by the StatsdInput, perform the necessary aggregation, and periodically generate ‘statmetric’ messages containing the aggregated data.
The configuration for these plugins is quite simple:
[StatsdInput]
[StatAccumInput]
ticker_interval = 1
emit_in_fields = true
These two TOML sections tell Heka that it should include a StatsdInput and a StatAccumInput. The StatsdInput uses the default value for every configuration setting, while the StatAccumInput overrides the defaults for two of its settings. The ticker_interval = 1 setting means that the statmetric messages will be generated once every second instead of the default of once every five seconds, while the emit_in_fields = true setting means that the aggregated stats data will be embedded in the dynamic fields of the generated statmetric messages, in addition to the default of embedding the graphite text format in the message payload.
This probably seems pretty straightforward, but there are actually some subtleties hidden in there that are important to point out. First, it’s not immediately obvious, but there is an explicit connection between the two plugins. The StatsdInput has a stat_accum_name setting, which we didn’t need to set because it defaults to ‘StatAccumInput’. The following configuration is exactly equivalent:
[StatsdInput]
stat_accum_name = "StatAccumInput"
[StatAccumInput]
ticker_interval = 1
emit_in_fields = true
The next subtlety to note is that we’ve used a common piece of Heka config shorthand by embedding both the name and the type in the TOML section header. Heka lets you do this as a convenience if you don’t need to use a name that is separate from the type. This doesn’t have to be the case, it’s possible to give a plugin a different name, expressing the type inside the TOML section instead of in its header:
[statsd_input]
type = "StatsdInput"
stat_accum_name = "stat_accumulator"
[stat_accumulator]
type = "StatAccumInput"
ticker_interval = 1
emit_in_fields = true
The config above is ever so slightly different from the original two, because our plugins now have different name identifiers, but functionally the behavior is identical to the prior versions. Being able to separate a plugin name from its type is important in cases where you want more than one instance of the same plugin type. For instance, you’d use the following configuration if you wanted to have a second StatsdInput listening on port 8126 in addition to the default on port 8125:
[statsd_input_8125]
type = "StatsdInput"
stat_accum_name = "stat_accumulator"
[statsd_input_8126]
type = "StatsdInput"
stat_accum_name = "stat_accumulator"
address = "127.0.0.1:8126"
[stat_accumulator]
type = "StatAccumInput"
ticker_interval = 1
emit_in_fields = true
We don’t need two StatsdInputs for our example, however, so for simplicity we’ll go with the most concise configuration.
Collecting stats alone doesn’t provide much value; we want to be able to actually see the data that has been gathered. Statsd servers are typically used to aggregate incoming statistics and then periodically deliver the totals to an upstream time series database, usually Graphite, although InfluxDB is rapidly growing in popularity. For Heka to replace a standalone statsd server it needs to be able to do the same.
To understand how this will work, we need to step back a bit to look at how Heka handles message routing. First, data enters the Heka pipeline through an input plugin. Then it needs to be converted from its original raw format into a message object that Heka knows how to work with. Usually this is done with a decoder plugin, although in the statsd example above the StatAccumInput itself is instead periodically generating the statmetric messages.
After the data has been marshaled into one (or more) message(s), the message is handed to Heka’s internal message router. The message router will then iterate through all of the registered filter and output plugins to see which ones would like to process the message. Each filter and output provides a message matcher to specify which messages it would like to receive. The router hands each message to each message matcher, and if there’s a match then the matcher in turn hands the message to the plugin.
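To give a flavor of the matcher syntax, here are a few illustrative message_matcher settings (each would appear inside a filter or output section); the particular field and logger names are placeholders drawn from the examples later in this guide:

message_matcher = "Type == 'heka.statmetric'"                        # match on message Type only
message_matcher = "Type == 'nginx.access' && Fields[status] == 404"  # combine Type with a dynamic field
message_matcher = "Logger == 'nginx_access_logs' || Severity < 4"    # match on Logger or Severity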
To return to our example, we’ll start by setting up a CarbonOutput plugin that knows how to deliver messages to an upstream Graphite Carbon server. We’ll configure it to receive the statmetric messages generated by the StatAccumInput:
[CarbonOutput]
message_matcher = "Type == 'heka.statmetric'"
address = "mycarbonserver.example.com:2003"
protocol = "udp"
Any messages that pass through the router with a Type field equal to heka.statmetric (which is what the StatAccumInput emits by default) will be handed to this output, which will in turn deliver it over UDP to the specified carbon server address. This is simple, but it’s a fundamental concept. Nearly all communication within Heka happens using Heka message objects being passed through the message router and being matched against the registered matchers.
Okay, so that gets us talking to Graphite. What about InfluxDB? InfluxDB has an extension that allows it to support the graphite format, so we could use that and just set up a second CarbonOutput:
[carbon]
type = "CarbonOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "mycarbonserver.example.com:2003"
protocol = "udp"
[influx]
type = "CarbonOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "myinfluxserver.example.com:2003"
protocol = "udp"
A couple of things to note here. First, don’t get confused by the type = “CarbonOutput”, which is specifying the type of the plugin we are configuring, and the “Type” in message_matcher = “Type == ‘heka.statmetric’”, which is referring to the Type field of the messages that are passing through the Heka router. They’re both called “type”, but other than that they are unrelated.
Second, you’ll see that it’s fine to have more than one output (and/or filter, for that matter) plugin with identical message_matcher settings. The router doesn’t care; it will happily give the same message to both of them, and to any others that happen to match.
This will work, but it’d be nice to just use the InfluxDB native HTTP API. For this, we can instead use our handy HttpOutput:
[CarbonOutput]
message_matcher = "Type == 'heka.statmetric'"
address = "mycarbonserver.example.com:2003"
protocol = "udp"
[statmetric_influx_encoder]
type = "SandboxEncoder"
filename = "lua_encoders/statmetric_influx.lua"
[influx]
type = "HttpOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "http://myinfluxserver.example.com:8086/db/stats/series"
encoder = "statmetric_influx_encoder"
username = "influx_username"
password = "influx_password"
The HttpOutput configuration above will also capture statmetric messages, and will then deliver the data over HTTP to the specified address where InfluxDB is listening. But wait! What’s all that statmetric_influx_encoder stuff? I’m glad you asked...
We’ve already briefly mentioned how, on the way in, raw data needs to be converted into a standard message format that Heka’s router, filters, and outputs are able to process. Similarly, on the way out, data must be extracted from the standard message format and serialized into whatever format is required by the destination. This is typically achieved through the use of encoder plugins, which take Heka messages as input and generate as output raw bytes that an output plugin can send over the wire. The CarbonOutput doesn’t specify an encoder because it assumes that the Graphite data will be in the message payload, where the StatAccumInput puts it, but most outputs need an encoder to be specified so they know how to generate their data stream from the messages that are received.
In the InfluxDB example above, you can see that we’ve defined a statmetric_influx_encoder, of type SandboxEncoder. A “Sandbox” plugin is one where the core logic of the plugin is implemented in Lua and is run in a protected sandbox. Heka has support for SandboxDecoder, SandboxFilter, and SandboxEncoder plugins. In this instance, we’re using a SandboxEncoder implementation provided by Heka that knows how to extract data from the fields of a heka.statmetric message and use that data to generate JSON in a format that will be understood by InfluxDB (see StatMetric InfluxDB Encoder).
This separation of concerns between encoder and output plugins allows for a great deal of flexibility. It’s easy to write your own SandboxEncoder plugins to generate any format needed, allowing the same HttpOutput implementation to be used for multiple HTTP-based back ends, rather than needing a separate output plugin for each service. Also, the same encoder can be used with different outputs. If, for instance, we wanted to write the InfluxDB formatted data to a file system file for later processing, we could use the statmetric_influx encoder with a FileOutput to do so.
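As a sketch of that idea (the output name and file path below are hypothetical), such a configuration might look like:

[statmetric_influx_file]
type = "FileOutput"
message_matcher = "Type == 'heka.statmetric'"
path = "/var/log/heka/statmetric_influx.log"
encoder = "statmetric_influx_encoder"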
While both Graphite and InfluxDB provide mechanisms for displaying graphs of the stats data they receive, Heka is also able to provide graphs of this data directly. These graphs will be updated in real time, as the data is flowing through Heka, without the latency of the data store driven graphs. The following config snippet shows how this is done:
[stat_graph]
type = "SandboxFilter"
filename = "lua_filters/stat_graph.lua"
ticker_interval = 1
preserve_data = true
message_matcher = "Type == 'heka.statmetric'"
[stat_graph.config]
num_rows = 300
secs_per_row = 1
stats = "stats.counters.000000.count stats.counters.000001.count stats.counters.000002.count"
stat_labels = "counter_0 counter_1 counter_2"
preservation_version = 0
[DashboardOutput]
ticker_interval = 1
There’s a lot going on in just a short bit of configuration here, so let’s consider it one piece at a time to understand what’s happening. First, we’ve got a stat_graph config section, which is telling Heka to start up a SandboxFilter plugin, a filter plugin with the processing code implemented in Lua. The filename option points to a filter implementation that ships with Heka. This filter implementation knows how to extract data from statmetric messages and store that data in a circular buffer data structure. The preserve_data option tells Heka that all global data in this filter (the circular buffer data, in this case) should be flushed out to disk if Heka is shut down, so it can be reloaded again when Heka is restarted. And the ticker_interval option is specifying that our filter will be emitting an output message containing the cbuf data back into the router once every second. This message can then be consumed by other filters and/or outputs, such as our DashboardOutput which will use it to generate graphs (see next section).
After that we have a stat_graph.config section. This isn’t specifying a new plugin; this is nested configuration, a subsection of the outer stat_graph section. (Note that the section nesting is specified by the use of the stat_graph. prefix in the section name; the indentation helps readability, but has no impact on the semantics of the configuration.) The stat_graph section configures the SandboxFilter and tells it what Lua source code to use; the stat_graph.config section is passed in to the Lua source code for further customization of the filter’s behavior.
So what is contained in this nested configuration? The first two options, num_rows and secs_per_row, are configuring the circular buffer data structure that the filter will use to store the stats data. It can be helpful to think of circular buffer data structures as a spreadsheet. Our spreadsheet will have 300 rows, and each row will represent one second of accumulated data, so at any given time we will be holding five minutes worth of stats data in our filter. The next two options, stats and stat_labels, tell Heka which statistics we want to graph and provide shorter labels for use in the graph legend. Finally, the preservation_version option allows us to version our data structures. This is needed because our data structures might change. If you let this filter run for a while, gathering data, and then shut down Heka, the 300 rows of circular buffer data will be written to disk. If you then change the num_rows setting and try to restart Heka the filter will fail to start, because the 300 row size of the preserved data won’t match the new size that you’ve specified. In this case you would increment the preservation_version value from 0 to 1, which will tell Heka that the preserved data is no longer valid and the data structures should be created anew.
At this point it’s useful to notice that, while the SandboxFilter gathers the data that we’re interested in and packages it up in a format that’s useful for graphing, it doesn’t actually do any graphing. Instead, it periodically creates a message of type heka.sandbox-output, containing the current circular buffer data, and injects that message back into Heka’s message router. This is where the DashboardOutput that we’ve configured comes in.
Heka’s DashboardOutput is configured by default to listen for heka.sandbox-output messages (along with a few other message types, which we’ll ignore for now). When it receives a sandbox output message, it will examine the contents of the message, and if the message contains circular buffer data it will automatically generate a real time graph of that data.
By default, the dashboard UI is available by pointing a web browser at port 4352 of the machine where Heka is running. The first page you’ll see is the Health report, which provides an overview of the plugins that are configured, along with some information about how messages are flowing through the Heka pipeline:
In the page header is a Sandboxes link, which will take you to a listing of all of the running SandboxFilter plugins, along with a list of the outputs they emit. Clicking on this we can see our stat_graph filter and the Stats circular buffer (“CBUF”) output:
If you click on the filter name stat_graph, you’ll see a page showing detailed information about the performance of that plugin, including how many messages have been processed, the average amount of time a message matcher takes to match a message, the average amount of time spent processing a message, and more:
Finally, clicking on the Stats link will take us to the actual rendered output, a line graph that updates in real time, showing the values of the specific counter stats that we have specified in our stat_graph SandboxFilter configuration:
Other stats can be added to this graph by adjusting the stats and stat_labels values for our existing stat_graph filter config, although if we do so we’ll have to bump the preservation_version to tell Heka that the previous data structures are no longer valid. You can create multiple graphs by including additional SandboxFilter sections using the same stat_graph.lua source code.
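For instance, a second graph could be configured with another SandboxFilter section along these lines; the plugin name and the particular counters chosen below are placeholders:

[more_stats_graph]
type = "SandboxFilter"
filename = "lua_filters/stat_graph.lua"
ticker_interval = 1
preserve_data = true
message_matcher = "Type == 'heka.statmetric'"

[more_stats_graph.config]
num_rows = 300
secs_per_row = 1
stats = "stats.counters.000003.count stats.counters.000004.count"
stat_labels = "counter_3 counter_4"
preservation_version = 0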
It also should be mentioned that, while the stat_graph.lua filter we’ve been using only emits a single output graph, it is certainly possible for a single filter to generate multiple graphs. It’s also possible for SandboxFilters to emit other types of output, such as raw JSON data, which the DashboardOutput will happily serve as raw text. This can be very useful for generating ad-hoc API endpoints based on the data that Heka is processing. Dig in to our Sandbox documentation to learn more about writing your own Lua filters using our Sandbox API.
For our next trick, we’ll be loading an Nginx HTTP server’s access log files and extracting information about each HTTP request logged therein, storing it in a more structured manner in the fields of a Heka message. The first step is telling Heka where it can find the Nginx access log file. Except that the Nginx log typically isn’t just a single file; it’s a series of files subject to site specific rotation schemes. On the author’s Ubuntu-ish system, for instance, the /var/log/nginx directory looks like this, at the time of writing:
access.log
access.log.1
access.log.2.gz
access.log.3.gz
access.log.4.gz
access.log.5.gz
access.log.6.gz
access.log.7.gz
access.log.8.gz
access.log.9.gz
error.log
This is a common rotation scheme, but there are many others out there. And in cases where many domains are being hosted, there might be several sets of log files, one for each domain, each distinguished from the others by file and/or folder name. Luckily Heka’s Logstreamer Input provides a mechanism for handling all of these cases and more. The LogstreamerInput already has extensive documentation, so we won’t go into exhaustive detail here, instead we’ll show an example config that correctly handles the above case:
[nginx_access_logs]
type = "LogstreamerInput"
parser_type = "token"
decoder = "nginx_access_decoder"
log_directory = "/var/log/nginx"
file_match = 'access\.log\.?(?P<Index>\d+)?(.gz)?'
priority = ["^Index"]
The parser_type option above tells Heka that each record will be delimited by a one character token, in this case the default token \n (newline). If our files were delimited by a different character we could use a delimiter option to specify an alternate. (For log files where a single record spans multiple lines, we can use parser_type = “regexp” and then provide a regular expression that describes the record boundary.) The log_directory option tells where the files we’re interested in live. The file_match is a regular expression that matches all of the files comprising the log stream. In this case, they all must start with access.log, after which they can (optionally) be followed by a dot (.), then (optionally, again) one or more digits, then (optionally, one more time) a gzip extension (.gz). Any digits that are found are captured as the Index match group, and the priority option specifies that we use this Index value to determine the order of the files. The leading caret character (^) reverses the order of the priority, since in our case lower digits mean newer files.
The LogstreamerInput will use this configuration data to find all of the relevant files, then it will start working its way through the entire stream of files from oldest to newest, tracking its progress along the way. If Heka is stopped and restarted, it will pick up where it left off, even if that file was rotated during the time that Heka was down. When it gets to the end of the newest file, it will follow along, loading new lines as they’re added, and noticing when the file is rotated so it can hop forward to start loading the newer one.
Which then brings us to the decoder option. This tells Heka which decoder plugin the LogstreamerInput will be using to parse the loaded log files. The nginx_access_decoder configuration is as follows:
[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[nginx_access_decoder.config]
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
type = "nginx.access"
Some of this should be looking familiar by now. This is a SandboxDecoder, which means that it is a decoder plugin with the actual parsing logic implemented in Lua. The outer config section configures the SandboxDecoder itself, while the nested section provides additional config information that is passed in to the Lua code.
While it’s certainly possible to write your own custom Lua parsing code, in this case we are again using a plugin provided by Heka, specifically designed for parsing Nginx access logs. But Nginx doesn’t have a single access log format; the exact output is dynamically specified by a log_format directive in the Nginx configuration. Luckily Heka’s decoder is quite sophisticated; all you have to do to parse your access log output is copy the appropriate log_format directive out of the Nginx configuration file and paste it into the log_format option in your Heka decoder config, as above, and Heka will use the magic of LPEG to dynamically create a grammar that will extract the data from the log lines and store them in Heka message fields. Finally, the type option above lets you specify what the Type field should be set to on the messages generated by this decoder.
One common use case people are interested in is taking the data extracted from their HTTP server logs and sending it on to ElasticSearch, often so they can peruse that data using dashboards generated by the excellent dashboard creation tool Kibana. We’ve handled loading and parsing the information with our input and decoder configuration above, now let’s look at the other side with the following output and encoder settings:
[ESJsonEncoder]
es_index_from_timestamp = true
type_name = "%{Type}"
[ElasticSearchOutput]
server = "elasticsearch.example.com:9200"
message_matcher = "Type == 'nginx.access'"
encoder = "ESJsonEncoder"
flush_interval = 50
Working backwards, we’ll first look at the ElasticSearchOutput configuration. The server setting indicates where ElasticSearch is listening. The message_matcher tells us we’ll be catching messages with a Type value of nginx.access, which you’ll recall was set in the decoder configuration we discussed above. The flush_interval setting specifies that we’ll be batching our records in the output and flushing them out to ElasticSearch every 50 milliseconds.
Which leaves us with the encoder setting, and the corresponding ESJsonEncoder section. The ElasticSearchOutput uses ElasticSearch’s Bulk API to tell ElasticSearch how the documents should be indexed, which means that each document insert consists of a small JSON object satisfying the Bulk API followed by another JSON object containing the document itself. At the time of writing, Heka provides three encoders that will extract data from a Heka message and generate an appropriate Bulk API header: the ESJsonEncoder we use above, which generates a clean document schema based on the schema of the message that is being encoded; the ESLogstashV0Encoder, which uses the “v0” schema format defined by Logstash (specifically intended for HTTP request data, natively supported by Kibana); and the ESPayloadEncoder, which assumes that the message payload will already contain a fully formed JSON document ready for sending to ElasticSearch, and just prepends the necessary Bulk API segment.
In our ESJsonEncoder section, we’re mostly adhering to the default settings. By default, this encoder inserts documents into an ElasticSearch index based on the current date: heka-YYYY.MM.DD (spelled as heka-%{2006.01.02} in the config). The es_index_from_timestamp = true option tells Heka to use the timestamp from the message when determining the date to use for the index name, as opposed to the default behavior which uses the system clock’s current time as the basis. The type_name option tells Heka what ElasticSearch record type should be used for each record. This option supports interpolation of various values from the message object; in the example above the message’s Type field will be used as the ElasticSearch record type name.
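If you want to be explicit about the index name pattern rather than relying on the default, the encoder section can spell it out; a sketch (assuming the index option, with the documented default pattern written out) might look like:

[ESJsonEncoder]
index = "heka-%{2006.01.02}"
es_index_from_timestamp = true
type_name = "%{Type}"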
ElasticSearch and Kibana provide a number of nice tools for graphing and querying the HTTP request data that is being parsed from our Nginx logs but, as with the stats data above, it would be nice to get real time graphs of some of this data directly from Heka. As you might guess, Heka already provides plugins specifically for this purpose:
[http_status]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 1
preserve_data = true
message_matcher = "Type == 'nginx.access'"
[http_status.config]
sec_per_row = 1
rows = 1800
preservation_version = 0
As mentioned earlier, graphing in Heka is accomplished through the cooperation of a filter which emits messages containing circular buffer data, and the DashboardOutput which consumes those messages and displays the data on a graph. We already configured a DashboardOutput earlier, so now we just need to add a filter that catches the nginx.access messages and aggregates the data into a circular buffer.
Heka has a standard message format that it uses for data that represents a single HTTP request, used by the Nginx access log decoder that is parsing our log files. In this format, the status code of the HTTP response is stored in a dynamic message field called, simply, status. The above filter will create a circular buffer data structure to store these response status codes in 6 columns: 100s, 200s, 300s, 400s, 500s, and unknown. Similar to before, the nested configuration tells the filter how many rows of data to keep in the circular buffer and how many seconds of data each row should represent. It also gives us a preservation_version so we can flag when the data structures have changed.
Once we add this section to our configuration and restart hekad, we should be able to browse to the dashboard UI and be able to find a graph of the various response status categories that are extracted from our HTTP server logs.
We’re getting close to the end of our journey. All of the data that we want to gather is now flowing through Heka, being delivered to external data stores for off line processing and analytics, and being displayed in real time graphs by Heka’s dashboard. The only remaining behavior we’re going to activate is anomaly detection, and the generation of notifiers based on anomalous events being detected. We’ll start by looking at the anomaly detection piece.
We’ve already discussed how Heka uses a circular buffer library to track time series data and generate graphs in the dashboard. Well it turns out that the anomaly detection features that Heka provides make use of the same circular buffer library.
Under the hood, how it works is that you provide an “anomaly config”, which is a string that looks something like a programming function call. The anomaly config specifies which anomaly detection algorithm should be used. Algorithms currently supported by Heka are a standard deviation rate of change test, and both parametric (i.e. Gaussian) and non-parametric Mann-Whitney-Wilcoxon tests. Included in the anomaly config is information about which column in a circular buffer data structure we want to monitor for anomalous behavior. Later, the parsed anomaly config is passed in to the detection module’s detect function, along with a populated circular buffer data structure, and the circular buffer data will be analyzed using the specified algorithm.
Luckily, for our use cases, you don’t have to worry too much about all of the details of using the anomaly detection library, because the SandboxFilters we’ve been using have already taken care of the hard parts. All we need to do is create an anomaly config string and add that to our config sections. For instance, here’s an example of how we might monitor our HTTP response status codes:
[http_status]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 1
preserve_data = true
message_matcher = "Type == 'nginx.access'"
[http_status.config]
sec_per_row = 1
rows = 1800
preservation_version = 0
anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false) mww_nonparametric("HTTP Status", 5, 15, 10, 0.8)'
Everything is the same as our earlier configuration, except we’ve added an anomaly_config setting. There’s a lot in there, so we’ll examine it a piece at a time. The first thing to notice is that there are actually two anomaly configs specified. You can add as many as you’d like. They’re space delimited here for readability, but that’s not strictly necessary, the parentheses surrounding the config parameters are enough for Heka to identify them. Next we’ll dive into the configurations, each in turn.
The first anomaly configuration by itself looks like this:
roc("HTTP Status", 2, 15, 0, 1.5, true, false)
The roc portion tells us that this config is using the rate of change algorithm. Each algorithm has its own set of parameters, so the values inside the parentheses are those that are required for a rate of change calculation. The first argument is payload_name, which needs to correspond to the payload_name value used when the message is injected back into Heka’s message router, which is “HTTP Status” in the case of this filter.
The next argument is the circular buffer column that we should be watching. We’re specifying column 2 here, which a quick peek at the http_status.lua source code will show you is the column where we’re tracking 200 status codes. The next value specifies how many intervals (i.e. circular buffer rows) we should use in our analysis window. We’ve said 15, which means that we’ll be examining the rate of change between the values in two 15 second intervals. Specifically, we’ll be comparing the data in rows 2 through 16 to the data in rows 17 through 31 (we always throw out the current row because it might not yet be complete).
After that we specify the number of intervals to use in our historical analysis window. Our setting of 0 means we’re using the entire history, rows 32 through 1800. This is followed by the standard deviation threshold parameter, which we’ve set to 1.5. So, put together, we’re saying if the rate of change of the number of 200 status responses over the last two 15 second intervals is more than 1.5 standard deviations off from the rate of change over the 29 minutes before that, then an anomaly alert should be triggered.
The last two parameters here are boolean values. The first of these is whether or not an alert should be fired in the event that we stop receiving input data (we’re saying yes), the second whether or not an alert should be fired if we start receiving data again after a gap (we’re saying no).
That’s the first one, now let’s look at the second:
mww_nonparametric("HTTP Status", 5, 15, 10, 0.8)
The mww_nonparametric tells us, as you might guess, that this config will be using the Mann-Whitney-Wilcoxon non-parametric algorithm for these computations. This algorithm can be used to identify similarities (or differences) between multiple data sets, even when those data sets have a non-Gaussian distribution, such as cases where the set of data points is sparse.
The next argument tells us what column we’ll be looking at. In this case we’re using column 5, which is where we store the 500 range status responses, or server errors. After that is the number of intervals to use in an analysis window (15), followed by the number of analysis windows to compare (10). In this case, that means we’ll be examining the last 15 seconds, and comparing what we find there with the 10 prior 15 second windows, or the 150 previous seconds.
The final argument is called pstat, which is a floating point value between 0 and 1. This tells us what type of data changes we’re going to be looking for. Anything over a 0.5 means we’re looking for an increasing trend, anything below 0.5 means we’re looking for a decreasing trend. We’ve set this to 0.8, which is clearly in the increasing trend range.
So, taken together, this anomaly config means that we’re going to be watching the last 15 seconds to see whether there is an anomalous spike in server errors, compared to the 10 intervals immediately prior. If we do detect a sizable spike in server errors, we consider it an anomaly and an alert will be generated.
In this example, we’ve only specified anomaly detection on our HTTP response status monitoring, but the anomaly_config option is also available to the stat graph filter, so we could apply similar monitoring to any of the statsd data that is contained in our statmetric messages.
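For example (the algorithm parameters here are purely illustrative, not tuned recommendations, and the payload name “Stats” assumes the name under which the stat_graph filter injects its output, as seen in the dashboard earlier), the stat_graph nested config from before could grow an anomaly_config line like this:

[stat_graph.config]
num_rows = 300
secs_per_row = 1
stats = "stats.counters.000000.count stats.counters.000001.count stats.counters.000002.count"
stat_labels = "counter_0 counter_1 counter_2"
anomaly_config = 'roc("Stats", 1, 15, 0, 1.5, true, false)'
preservation_version = 0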
But what do we mean, exactly, when we say that detecting an anomaly will generate an alert? As with nearly everything else in Heka, what we’re really saying is that a message will be injected into the message router, which other filter and output plugins are then able to listen for and use as a trigger for action.
We won’t go into detail here, but along with the anomaly detection module Heka’s Lua environment provides an alert module that generates alert messages (with throttling, to make sure hundreds of alerts in rapid succession don’t actually generate hundreds of separate notifications) and an annotation module that causes the dashboard to apply annotations to the graphs based on our circular buffer data. Both the http status and stat graph filters make use of both of these, so if you specify anomaly configs for either of those filters, output graphs will be annotated and alert messages will be generated when anomalies are detected.
Alert messages aren’t of much use if they’re just flowing through Heka’s message router and nothing is listening for them, however. So let’s set up an SmtpOutput that will listen for the alert messages, sending emails when they come through:
[alert_smtp_encoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"
[SmtpOutput]
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert'"
encoder = "alert_smtp_encoder"
send_from = "heka@example.com"
send_to = ["alert_recipient@example.com"]
auth = "Plain"
user = "smtpuser"
password = "smtpassword"
host = "127.0.0.1:25"
First we specify an encoder, using a very simple encoder implementation provided by Heka which extracts the timestamp, hostname, logger, and payload from the message and emits those values in a text format. Then we add the output itself, listening for any alert messages that are emitted by any of our SandboxFilter plugins, using the encoder to format the message body, and sending an outgoing mail message through the SMTP server as specified by the other configuration options.
And that’s it! We’re now generating email notifiers from our anomaly detection alerts.
Here’s what our full config looks like if we put it all together into a single file:
[hekad]
maxprocs = 2
[StatsdInput]
[StatAccumInput]
ticker_interval = 1
emit_in_fields = true
[CarbonOutput]
message_matcher = "Type == 'heka.statmetric'"
address = "mycarbonserver.example.com:2003"
protocol = "udp"
[statmetric-influx-encoder]
type = "SandboxEncoder"
filename = "lua_encoders/statmetric_influx.lua"
[influx]
type = "HttpOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "http://myinfluxserver.example.com:8086/db/stats/series"
encoder = "statmetric-influx-encoder"
username = "influx_username"
password = "influx_password"
[stat_graph]
type = "SandboxFilter"
filename = "lua_filters/stat_graph.lua"
ticker_interval = 1
preserve_data = true
message_matcher = "Type == 'heka.statmetric'"
[stat_graph.config]
num_rows = 300
secs_per_row = 1
stats = "stats.counters.000000.count stats.counters.000001.count stats.counters.000002.count"
stat_labels = "counter_0 counter_1 counter_2"
preservation_version = 0
[DashboardOutput]
ticker_interval = 1
[nginx_access_logs]
type = "LogstreamerInput"
parser_type = "token"
decoder = "nginx_access_decoder"
log_directory = "/var/log/nginx"
file_match = 'access\.log\.?(?P<Index>\d+)?(.gz)?'
priority = ["^Index"]
[nginx_access_decoder]
type = "SandboxDecoder"
script_type = "lua"
filename = "lua_decoders/nginx_access.lua"
[nginx_access_decoder.config]
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
type = "nginx.access"
[ESJsonEncoder]
es_index_from_timestamp = true
type_name = "%{Type}"
[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access'"
encoder = "ESJsonEncoder"
flush_interval = 50
[http_status]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 1
preserve_data = true
message_matcher = "Type == 'nginx.access'"
[http_status.config]
sec_per_row = 1
rows = 1440
preservation_version = 0
anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false) mww_nonparametric("HTTP Status", 5, 15, 10, 0.8)'
[alert_smtp_encoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"
[SmtpOutput]
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert'"
encoder = "alert_smtp_encoder"
send_from = "heka@example.com"
send_to = ["alert_recipient@example.com"]
auth = "Plain"
user = "smtpuser"
password = "smtpassword"
host = "127.0.0.1:25"
This isn’t too terribly long, but even so it might be nice to break it up into smaller pieces. Heka supports the use of a directory instead of a single file for configuration; if you specify a directory all files ending with .toml will be merged together and loaded as a single configuration, which is preferable for more complex deployments.
This example is not in any way meant to be an exhaustive list of Heka’s features. Indeed, we’ve only just barely scratched the surface. Hopefully, though, it gives those of you who are new to Heka enough context to understand how the pieces fit together, and it can be used as a starting point for developing configurations that will meet your own needs. If you have questions or need assistance getting things going, please make use of the mailing list, or use an IRC client to come visit in the #heka channel on irc.mozilla.org.
A hekad configuration file specifies what inputs, decoders, filters, encoders, and outputs will be loaded. The configuration file is in TOML format. TOML looks very similar to INI configuration formats, but with slightly more rich data structures and nesting support.
If hekad’s config file is specified to be a directory, all contained files with a filename ending in ”.toml” will be loaded and merged into a single config. Files that don’t end with ”.toml” will be ignored. Merging will happen in alphabetical order, settings specified later in the merge sequence will win conflicts.
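For example, a deployment might (hypothetically) split its configuration across several files in a directory and point hekad at the directory instead of a single file; the file names below are arbitrary:

/etc/heka.d/00_hekad.toml       # [hekad] global settings
/etc/heka.d/10_inputs.toml      # LogstreamerInput, StatsdInput, etc.
/etc/heka.d/20_filters.toml     # SandboxFilter sections
/etc/heka.d/30_outputs.toml     # encoders and outputs

hekad -config=/etc/heka.d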
The config file is broken into sections, with each section representing a single instance of a plugin. The section name specifies the name of the plugin, and the “type” parameter specifies the plugin type; this must match one of the types registered via the pipeline.RegisterPlugin function. For example, the following section describes a plugin named “tcp:5565”, an instance of Heka’s plugin type “TcpInput”:
[tcp:5565]
type = "TcpInput"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
address = ":5565"
If you choose a plugin name that also happens to be a plugin type name, then you can omit the “type” parameter from the section and the specified name will be used as the type. Thus, the following section describes a plugin named “TcpInput”, also of type “TcpInput”:
[TcpInput]
address = ":5566"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
Note that it’s fine to have more than one instance of the same plugin type, as long as their configurations don’t interfere with each other.
Any values other than “type” in a section, such as “address” in the above examples, will be passed through to the plugin for internal configuration (see Plugin Configuration).
If a plugin fails to load during startup, hekad will exit at startup. When hekad is running, if a plugin should fail (due to connection loss, inability to write a file, etc.) then hekad will either shut down or restart the plugin if the plugin supports restarting. When a plugin is restarting, hekad will likely stop accepting messages until the plugin resumes operation (this applies only to filters/output plugins).
Plugins specify that they support restarting by implementing the Restarting interface (see Restarting Plugins). Plugins supporting Restarting can have their restarting behavior configured.
An internal diagnostic runner runs every 30 seconds to sweep the packs used for messages so that possible bugs in heka plugins can be reported and pinned down to a likely plugin(s) that failed to properly recycle the pack.
You can optionally declare a [hekad] section in your configuration file to configure some global options for the heka daemon.
Config:
Turn on CPU profiling of hekad; output is logged to the output_file.
The maximum number of times a message can be re-injected into the system. This is used to prevent infinite message loops from filter to filter; the default is 4.
The maximum number of messages that a sandbox filter’s ProcessMessage function can inject in a single call; the default is 1.
The maximum number of nanoseconds that a sandbox filter’s ProcessMessage function can consume in a single call before being terminated; the default is 100000.
The maximum number of messages that a sandbox filter’s TimerEvent function can inject in a single call; the default is 10.
A time duration string (e.g. “2s”, “2m”, “2h”) indicating how long a message pack can be ‘idle’ before it’s considered leaked by heka. If too many packs leak from a bug in a filter or output then heka will eventually halt. This setting indicates when that is considered to have occurred.
Enable multi-core usage; the default is 1 core. More cores will generally increase message throughput. Best performance is usually attained by setting this to 2 x (number of cores). This assumes each core is hyper-threaded.
Enable memory profiling; output is logged to the output_file.
Specify the maximum number of messages that can exist in the pool; the default is 100, which is usually sufficient and yields optimal performance.
Specify the buffer size for the input channel of the various Heka plugins. Defaults to 50, which is usually sufficient and yields optimal performance.
Base working directory Heka will use for persistent storage through process and server restarts. The hekad process must have read and write access to this directory. Defaults to /var/cache/hekad (or c:\var\cache\hekad on Windows).
Root path of Heka’s “share directory”, where Heka will expect to find certain resources it needs to consume. The hekad process should have read-only access to this directory. Defaults to /usr/share/heka (or c:\usr\share\heka on Windows).
New in version 0.6.
Specifies the denominator of the sample rate Heka will use when computing the time required to perform certain operations, such as for the ProtobufDecoder to decode a message, or the router to compare a message against a message matcher. Defaults to 1000, i.e. duration will be calculated for one message out of 1000.
New in version 0.6.
Optionally specify the location of a pidfile where the process id of the running hekad process will be written. The hekad process must have read and write access to the parent directory (which is not automatically created). On a successful exit the pidfile will be removed. If the path already exists the contained pid will be checked for a running process. If one is found, the current process will exit with an error.
[hekad]
maxprocs = 4
# Heka dashboard for internal metrics and time series graphs
[Dashboard]
type = "DashboardOutput"
address = ":4352"
ticker_interval = 15
# Email alerting for anomaly detection
[Alert]
type = "SmtpOutput"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert'"
send_from = "acme-alert@example.com"
send_to = ["admin@example.com"]
auth = "Plain"
user = "smtp-user"
password = "smtp-pass"
host = "mail.example.com:25"
encoder = "AlertEncoder"
# User friendly formatting of alert messages
[AlertEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"
# Nginx access log reader
[AcmeWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "CombinedNginxDecoder"
# Nginx access 'combined' log parser
[CombinedNginxDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[CombinedNginxDecoder.config]
user_agent_transform = true
user_agent_conditional = true
type = "combined"
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
# Collection and visualization of the HTTP status codes
[AcmeHTTPStatus]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'AcmeWebserver'"
# rate of change anomaly detection on column 1 (HTTP 200)
[AcmeHTTPStatus.config]
anomaly_config = 'roc("HTTP Status", 1, 15, 0, 1.5, true, false)'
If you wish to use environment variables in your config files as a way to configure values, you can simply use %ENV[VARIABLE_NAME] and the text will be replaced with the value of the environment variable VARIABLE_NAME.
Example:
[AMQPInput]
url = "amqp://%ENV[USER]:%ENV[PASSWORD]@rabbitmq/"
exchange = "testout"
exchangeType = "fanout"
Plugins that support being restarted have a set of options that govern how the restart is handled. If preferred, the plugin can be configured not to restart (in which case hekad will exit), to restart only a limited number of times (e.g. 100), or to keep attempting restarts forever.
Restart behavior is configured by adding a subsection named retries to the plugin’s config. A small amount of jitter will be added to the delay between restart attempts.
Config:
The longest jitter duration to add to the delay between restarts. Jitter up to 500ms by default is added to every delay to ensure more even restart attempts over time.
The longest delay between attempts to restart the plugin. Defaults to 30s (30 seconds).
The starting delay between restart attempts. This value will be the initial starting delay for the exponential back-off, and capped to be no larger than the max_delay. Defaults to 250ms.
Maximum number of times to attempt restarting the plugin before giving up and exiting the plugin. Use 0 for no retry attempts, and -1 to continue trying forever (note that this may cause hekad to halt indefinitely if the plugin cannot be restarted). Defaults to -1.
Example:
[AMQPOutput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
message_matcher = 'Logger == "TestWebserver"'
[AMQPOutput.retries]
max_delay = "30s"
delay = "250ms"
max_retries = 5
Connects to a remote AMQP broker (RabbitMQ) and retrieves messages from the specified queue. As AMQP is dynamically programmable, the broker topology needs to be specified in the plugin configuration.
Config:
An AMQP connection string formatted per the RabbitMQ URI Spec.
AMQP exchange name
AMQP exchange type (fanout, direct, topic, or headers).
Whether the exchange should be configured as a durable exchange. Defaults to non-durable.
Whether the exchange is deleted when all queues have finished and there is no publishing. Defaults to auto-delete.
The message routing key used to bind the queue to the exchange. Defaults to empty string.
How many messages to fetch at once before message acks are sent. See RabbitMQ performance measurements for help in tuning this number. Defaults to 2.
Name of the queue to consume from, an empty string will have the broker generate a name for the queue. Defaults to empty string.
Whether the queue is durable or not. Defaults to non-durable.
Whether the queue is exclusive (only one consumer allowed) or not. Defaults to non-exclusive.
Whether the queue is deleted when the last consumer un-subscribes. Defaults to auto-delete.
Allows specifying a TTL, in milliseconds, on the queue declaration for expiring messages. Defaults to undefined/infinite.
Decoder name used to transform a raw message body into a structured hekad message. Must be a decoder appropriate for the messages that come in from the exchange. If accepting messages that have been generated by an AMQPOutput in another Heka process then this should be a ProtobufDecoder instance.
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
New in version 0.6.
An optional sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if URL uses the AMQPS URI scheme. See Configuring TLS.
Since many of these parameters have sane defaults, a minimal configuration to consume serialized messages would look like:
[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
Or you might use a PayloadRegexDecoder to parse OSX syslog messages with the following:
[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
decoder = "logparser"
[logparser]
type = "MultiDecoder"
subs = ["logline", "leftovers"]
[logline]
type = "PayloadRegexDecoder"
MatchRegex = '\w+ \d+ \d+:\d+:\d+ \S+ (?P<Reporter>[^\[]+)\[(?P<Pid>\d+)](?P<Sandbox>[^:]+)?: (?P<Remaining>.*)'
[logline.MessageFields]
Type = "amqplogline"
Hostname = "myhost"
Reporter = "%Reporter%"
Remaining = "%Remaining%"
Logger = "%Logger%"
Payload = "%Remaining%"
[leftovers]
type = "PayloadRegexDecoder"
MatchRegex = '.*'
[leftovers.MessageFields]
Type = "drop"
Payload = ""
New in version 0.8.
The DockerLogInput plugin attaches to all containers running on a host and sends their log messages into the Heka pipeline. The plugin is based on Logspout by Jeff Lindsay. Messages will be populated as follows:
Config:
A Docker endpoint. Defaults to “unix:///var/run/docker.sock”.
The name of the decoder used to further transform the message into a structured hekad message. No default decoder is specified.
Example:
[nginx_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[nginx_log_decoder.config]
type = "nginx.access"
user_agent_transform = true
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
[DockerLogInput]
decoder = "nginx_log_decoder"
New in version 0.7.
A FilePollingInput periodically reads (unbuffered) the contents of the specified file and creates a Heka message with the file’s contents as the payload.
Config:
The absolute path to the file which the input should read.
How often, in seconds, the input should read the contents of the file.
The name of the decoder used to process the payload of the input.
Example:
[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"
HttpInput plugins intermittently poll remote HTTP URLs for data and populate message objects based on the results of the HTTP interactions. Messages will be populated as follows:
Uuid: Type 4 (random) UUID generated by Heka.
Timestamp: Time HTTP request is completed.
Type: heka.httpinput.data or heka.httpinput.error, depending on whether or not the request completed. (Note that a response returned with an HTTP error code is still considered complete and will generate type heka.httpinput.data.)
Hostname: Hostname of the machine on which Heka is running.
Payload: Entire contents of the HTTP response body.
Severity: HTTP response 200 uses the success_severity config value; all other results use the error_severity config value.
Logger: Fetched URL.
Fields[“Status”] (string): HTTP status string value (e.g. “200 OK”).
Fields[“StatusCode”] (int): HTTP status code integer value.
Fields[“ResponseSize”] (int): Value of HTTP Content-Length header.
Fields[“ResponseTime”] (float64): Time elapsed for the request, in seconds.
Fields[“Protocol”] (string): HTTP protocol used for the request (e.g. “HTTP/1.0”).
The Fields values above will only be populated in the event of a completed HTTP request. Also, it is possible to specify a decoder to further process the results of the HTTP response before injecting the message into the router.
Config:
A HTTP URL which this plugin will regularly poll for data. This option cannot be used with the urls option. No default URL is specified.
New in version 0.5.
An array of HTTP URLs which this plugin will regularly poll for data. This option cannot be used with the url option. No default URLs are specified.
New in version 0.5.
The HTTP method to use for the request. Defaults to “GET”.
New in version 0.5.
Subsection defining headers for the request. By default the User-Agent header is set to “Heka”.
New in version 0.5.
The request body (e.g. for an HTTP POST request). No default body is specified.
New in version 0.5.
The username for HTTP Basic Authentication. No default username is specified.
New in version 0.5.
The password for HTTP Basic Authentication. No default password is specified.
Time interval (in seconds) between attempts to poll for new data. Defaults to 10.
New in version 0.5.
Severity level of successful HTTP request. Defaults to 6 (information).
New in version 0.5.
Severity level for errors, unreachable connections, and non-200 responses to successful HTTP requests. Defaults to 1 (alert).
The name of the decoder used to further transform the response body text into a structured hekad message. No default decoder is specified.
Example:
[HttpInput]
url = "http://localhost:9876/"
ticker_interval = 5
success_severity = 6
error_severity = 1
decoder = "MyCustomJsonDecoder"
[HttpInput.headers]
user-agent = "MyCustomUserAgent"
New in version 0.5.
HttpListenInput plugins start a webserver listening on the specified address and port. If no decoder is specified, the request body will be used as the message payload. Messages will be populated as follows:
Uuid: Type 4 (random) UUID generated by Heka.
Timestamp: Time HTTP request is handled.
Type: heka.httpdata.request
Hostname: The remote network address of the requester.
Payload: Entire contents of the HTTP request body.
Severity: 6
Logger: HttpListenInput
Fields[“UserAgent”] (string): Request User-Agent header (e.g. “GitHub Hookshot dd0772a”).
Fields[“ContentType”] (string): Request Content-Type header (e.g. “application/x-www-form-urlencoded”).
Fields[“Protocol”] (string): HTTP protocol used for the request (e.g. “HTTP/1.0”).
Config:
An IP address:port on which this plugin will expose a HTTP server. Defaults to “127.0.0.1:8325”.
The name of the decoder used to further transform the request body text into a structured hekad message. No default decoder is specified.
New in version 0.7.
It is possible to inject arbitrary HTTP headers into each outgoing response by adding a TOML subsection entitled “headers” to your HttpListenInput config section, as sketched after the example below. All entries in the subsection must be a list of string values.
Example:
[HttpListenInput]
address = "0.0.0.0:8325"
New in version 0.5.
Tails a single log file, a sequential single log source, or multiple log sources of either a single logstream or multiple logstreams.
Config:
The hostname to use for the messages, by default this will be the machine’s qualified hostname. This can be set explicitly to ensure it’s the correct name in the event the machine has multiple interfaces/hostnames.
A time duration string (e.g. “2s”, “2m”, “2h”). Logfiles with a last modified time older than oldest_duration ago will not be included for parsing.
The directory to store the journal files in for tracking the location that has been read to thus far. By default this is stored under heka’s base directory.
The root directory to scan files from. This scan is recursive so it should be suitably restricted to the most specific directory this selection of logfiles will be matched under. The log_directory path will be prepended to the file_match.
During logfile rotation, or if the logfile is not originally present on the system, this interval is how often the existence of the logfile will be checked for. The default of 5 seconds is usually fine. This interval is in milliseconds.
Regular expression used to match files located under the log_directory. This regular expression has $ added to the end automatically if not already present, and log_directory as the prefix. WARNING: file_match should typically be delimited with single quotes, indicating use of a raw string, rather than double quotes, which require all backslashes to be escaped. For example, ‘access\.log’ will work as expected, but “access\.log” will not; you would need “access\\.log” to achieve the same result.
When using sequential logstreams, the priority is how to sort the logfiles in order from oldest to newest.
When using multiple logstreams, the differentiator is a set of strings that will be used in the naming of the logger, and portions that match a captured group from the file_match will have their matched value substituted in.
A set of translation mappings for matched groupings to the ints to use for sorting purposes.
A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the parsed data is available in the Heka message payload.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.
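As a sketch of the multiple-logstream case described above (the directory layout and differentiator values here are hypothetical), a configuration splitting per-host access logs into separate logstreams might look like:
[PerHostAccessLogs]
type = "LogstreamerInput"
log_directory = "/var/log/hosts"
file_match = '(?P<Host>[^/]+)/access\.log'
differentiator = ["nginx.access.", "Host"]
decoder = "CombinedNginxDecoder"
Each distinct Host capture then becomes its own logstream, with the Logger name built from the differentiator (e.g. nginx.access.web1).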
Executes one or more external programs on an interval, creating messages from the output. Supports a chain of commands, where stdout from each process will be piped into the stdin for the next process in the chain. In the event the program returns a non-zero exit code, ProcessInput will log that an error occurred.
Config:
The command is a structure that contains the full path to the binary, command line arguments, optional environment variables and an optional working directory (see below). ProcessInput expects the commands to be indexed by integers starting with 0, where 0 is the first process in the chain.
The number of seconds to wait between each run of command. Defaults to 15. A ticker_interval of 0 indicates that the command is run only once, and should only be used for long running processes that do not exit. If ticker_interval is set to 0 and the process exits, then the ProcessInput will exit, invoking the restart behavior (see Configuring Restarting Behavior).
If true, for each run of the process chain a message will be generated with the last command in the chain’s stdout as the payload. Defaults to true.
If true, for each run of the process chain a message will be generated with the last command in the chain’s stderr as the payload. Defaults to false.
Name of the decoder instance to send messages to. If omitted messages will be injected directly into Heka’s message router.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.
Timeout in seconds before any one of the commands in the chain is terminated.
Trim a single trailing newline character if one exists. Default is true.
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
cmd_config structure:
The full path to the binary that will be executed.
Command line arguments to pass into the executable.
Used to set environment variables before command is run. Default is nil, which uses the heka process’s environment.
Used to set the working directory of Bin. Default is “”, which uses the heka process’s working directory.
Example:
[DemoProcessInput]
type = "ProcessInput"
ticker_interval = 2
parser_type = "token"
delimiter = " "
stdout = true
stderr = false
trim = true
[DemoProcessInput.command.0]
bin = "/bin/cat"
args = ["../testsupport/process_input_pipes_test.txt"]
[DemoProcessInput.command.1]
bin = "/usr/bin/grep"
args = ["ignore"]
New in version 0.5.
The ProcessDirectoryInput periodically scans a filesystem directory looking for ProcessInput configuration files. The ProcessDirectoryInput will maintain a pool of running ProcessInputs based on the contents of this directory, refreshing the set of running inputs as needed with every rescan. This allows Heka administrators to manage a set of data collection processes for a running hekad server without restarting the server.
Each ProcessDirectoryInput has a process_dir configuration setting, which is the root folder of the tree where scheduled jobs are defined. It should contain exactly one nested level of subfolders, named with ASCII numeric characters indicating the interval, in seconds, between each process run. These numeric folders must contain TOML files which specify the details regarding which processes to run.
For example, a process_dir might look like this:
-/usr/share/heka/processes/
|-5
|- check_myserver_running.toml
|-61
|- cat_proc_mounts.toml
|- get_running_processes.toml
|-302
|- some_custom_query.toml
This indicates one process to be run every five seconds, two processes to be run every 61 seconds, and one process to be run every 302 seconds.
Note that ProcessDirectoryInput will ignore any files that are not nested exactly one level deep, are not in a folder named for an integer 0 or greater, or do not end with ‘.toml’. Each file which meets these criteria, such as those shown in the example above, should contain the TOML configuration for exactly one ProcessInput, matching that of a standalone ProcessInput with some restrictions.
If the specified process fails to run or the ProcessInput config fails for any other reason, ProcessDirectoryInput will log an error message and continue.
Config:
Amount of time, in seconds, between scans of the process_dir. Defaults to 300 (i.e. 5 minutes).
This is the root folder of the tree where the scheduled jobs are defined. Absolute paths will be honored, relative paths will be computed relative to Heka’s globally specified share_dir. Defaults to “processes” (i.e. “$share_dir/processes”).
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
Example:
[ProcessDirectoryInput]
process_dir = "/etc/hekad/processes.d"
ticker_interval = 120
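Each TOML file placed under the numeric interval folders holds a single ProcessInput configuration. As a hypothetical sketch (the file path, section name, and command are illustrative only), a file such as /etc/hekad/processes.d/61/cat_proc_mounts.toml might contain:
[cat_proc_mounts]
type = "ProcessInput"
stdout = true
stderr = false
[cat_proc_mounts.command.0]
bin = "/bin/cat"
args = ["/proc/mounts"]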
Provides an implementation of the StatAccumulator interface which other plugins can use to submit Stat objects for aggregation and roll-up. Accumulates these stats and then periodically emits a “stat metric” type message containing aggregated information about the stats received since the last generated message.
Config:
Specifies whether or not the aggregated stat information should be emitted in the message fields of the generated messages. Defaults to false. NOTE: At least one of ‘emit_in_payload’ or ‘emit_in_fields’ must be true or it will be considered a configuration error and the input won’t start.
Percent threshold to use for computing “upper_N%” type stat values. Defaults to 90.
Time interval (in seconds) between generated output messages. Defaults to 10.
String value to use for the Type value of the emitted stat messages. Defaults to “heka.statmetric”.
If set to true, then use the older format for namespacing counter stats, with rates recorded under stats.<counter_name> and absolute count recorded under stats_counts.<counter_name>. See statsd metric namespacing. Defaults to false.
Global prefix to use for sending stats to graphite. Defaults to “stats”.
Secondary prefix to use for namespacing counter metrics. Has no impact unless legacy_namespaces is set to false. Defaults to “counters”.
Secondary prefix to use for namespacing timer metrics. Defaults to “timers”.
Secondary prefix to use for namespacing gauge metrics. Defaults to “gauges”.
Prefix to use for the statsd numStats metric. Defaults to “statsd”.
Don’t emit values for inactive stats, instead of sending 0 (or, in the case of gauges, the previous value). Defaults to false.
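A minimal StatAccumInput section, emitting aggregated stats as message fields every 10 seconds, might look like the following sketch (emit_in_fields and ticker_interval are assumed to be the setting names for the options described above):
[StatAccumInput]
emit_in_fields = true
ticker_interval = 10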
Listens for statsd protocol counter, timer, or gauge messages on a UDP port, and generates Stat objects that are handed to a StatAccumulator for aggregation and processing.
Config:
An IP address:port on which this plugin will expose a statsd server. Defaults to “127.0.0.1:8125”.
Name of a StatAccumInput instance that this StatsdInput will use as its StatAccumulator for submitting received stat values. Defaults to “StatAccumInput”.
Size of the buffer used for reading messages from statsd. In some cases, when a client sends a large number of stats in a single message, this value may need to be increased. All over-length data will be truncated without raising an error. Defaults to 512.
Example:
[StatsdInput]
address = ":8125"
stat_accum_name = "custom_stat_accumulator"
Listens on a specific TCP address and port for messages. If the message is signed it is verified against the signer name and specified key version. If the signature is not valid the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.
Config:
An IP address:port on which this plugin will listen.
Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.
The hash key used to sign the message.
New in version 0.4.
A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.
New in version 0.5.
Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
Network value must be one of: “tcp”, “tcp4”, “tcp6”, “unix” or “unixpacket”.
New in version 0.6.
Specifies whether or not TCP keepalive should be used for established TCP connections. Defaults to false.
Time duration in seconds that a TCP connection will be maintained before keepalive probes start being sent. Defaults to 7200 (i.e. 2 hours).
Example:
[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
[TcpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[TcpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"
[TcpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"
Listens on a specific UDP address and port for messages. If the message is signed it is verified against the signer name and specified key version. If the signature is not valid the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.
Note
The UDP payload is not restricted to a single message; since the stream parser is being used, multiple messages can be sent in a single payload.
Config:
An IP address:port or Unix datagram socket file path on which this plugin will listen.
Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.
The hash key used to sign the message.
New in version 0.4.
A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.
New in version 0.5.
Network value must be one of: “udp”, “udp4”, “udp6”, or “unixgram”.
Example:
[UdpInput]
address = "127.0.0.1:4880"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
[UdpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[UdpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"
[UdpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"
New in version 0.6.
Parses the Apache access logs based on the Apache ‘LogFormat’ configuration directive. The Apache format specifiers are mapped onto the Nginx variable names where applicable e.g. %a -> remote_addr. This allows generic web filters and outputs to work with any HTTP server input.
Config:
The ‘LogFormat’ configuration directive from the apache2.conf. %t variables are converted to the number of nanoseconds since the Unix epoch and used to set the Timestamp on the message. http://httpd.apache.org/docs/2.4/mod/mod_log_config.html
Sets the message ‘Type’ header to the specified value
Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.
Always preserve the http_user_agent value if transform is enabled.
Only preserve the http_user_agent value if transform is enabled and fails.
Always preserve the original log line in the message payload.
Example Heka Configuration
[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/apache"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"
[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/apache_access.lua"
[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
# common log format
# log_format = '%h %l %u %t \"%r\" %>s %O'
# vhost_combined log format
# log_format = '%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
# referer log format
# log_format = '%{Referer}i -> %U'
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: combined
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserver
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"remote_user" value_string:"-"
    name:"http_x_forwarded_for" value_string:"-"
    name:"http_referer" value_string:"-"
    name:"body_bytes_sent" value_type:DOUBLE representation:"B" value_double:82
    name:"remote_addr" value_string:"62.195.113.219" representation:"ipv4"
    name:"status" value_type:DOUBLE value_double:200
    name:"request" value_string:"GET /v1/recovery_email/status HTTP/1.1"
    name:"user_agent_os" value_string:"FirefoxOS"
    name:"user_agent_browser" value_string:"Firefox"
    name:"user_agent_version" value_type:DOUBLE value_double:29
New in version 0.8.
Parses a payload containing JSON in the Graylog2 Extended Format specification. http://graylog2.org/resources/gelf/specification
Config:
Sets the message ‘Type’ header to the specified value
Always preserve the original log line in the message payload.
Example of Graylog2 Extended Format Log
{
"version": "1.1",
"host": "rogueethic.com",
"short_message": "This is a short message to identify what is going on.",
"full_message": "An entire backtrace\ncould\ngo\nhere",
"timestamp": 1385053862.3072,
"level": 1,
"_user_id": 9001,
"_some_info": "foo",
"_some_env_var": "bar"
}
Example Heka Configuration
[GELFLogInput]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = 'application\.gelf'
decoder = "GraylogDecoder"
[GraylogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/graylog_decoder.lua"
[GraylogDecoder.config]
type = "gelf"
payload_keep = true
New in version 0.6.
Decoder plugin that generates GeoIP data based on the IP address of a specified field. It uses the GeoIP Go project as a wrapper around MaxMind’s geoip-api-c library, and thus assumes you have the library downloaded and installed. Currently, only the GeoLiteCity database is supported, which you must also download and install yourself into a location to be referenced by the db_file config option. By default the database file is opened using “GEOIP_MEMORY_CACHE” mode. This setting is hard-coded into the wrapper’s geoip.go file. You will need to manually override that code if you want to specify one of the other modes listed here.
Note
Due to external dependencies, this plugin is not compiled in to the released Heka binaries. It will automatically be included in a source build if GeoIP.h is available in the include path during build time. The generated binary will then only work on machines with the appropriate GeoIP shared library (e.g. libGeoIP.so.1) installed.
Note
If you are using this with the ES output you will likely need to specify the raw_bytes_field option for the target_field specified. This is required to preserve the formatting of the JSON object.
Config:
The location of the GeoLiteCity.dat database. Defaults to “/var/cache/hekad/GeoLiteCity.dat”
The name of the field containing the IP address you want to derive the location for.
The name of the new field created by the decoder. The decoder will output a JSON object with the following elements:
latitude: string
longitude: string
location: [ float64, float64 ] (GeoJSON format intended for use as a geo_point for ES output; useful when using Kibana’s Bettermap panel)
coordinates: [ string, string ]
countrycode: string
countrycode3: string
region: string
city: string
postalcode: string
areacode: int
charset: int
continentalcode: string
[apache_geoip_decoder]
type = "GeoIpDecoder"
db_file="/etc/geoip/GeoLiteCity.dat"
source_ip_field="remote_host"
target_field="geoip"
This decoder plugin allows you to specify an ordered list of delegate decoders. The MultiDecoder will pass the PipelinePack to be decoded to each of the delegate decoders in turn until decode succeeds. In the case of failure to decode, MultiDecoder will return an error and recycle the message.
Config:
An ordered list of subdecoders to which the MultiDecoder will delegate. Each item in the list should specify another decoder configuration section by section name. Must contain at least one entry.
If true, the DecoderRunner will log the errors returned whenever a delegate decoder fails to decode a message. Defaults to false.
Specifies behavior the MultiDecoder should exhibit with regard to cascading through the listed decoders. Supports only two valid values: “first-wins” and “all”. With “first-wins”, each decoder will be tried in turn until there is a successful decoding, after which decoding will be stopped. With “all”, all listed decoders will be applied whether or not they succeed. In each case, decoding will only be considered to have failed if none of the sub-decoders succeed.
Here is a slightly contrived example where we have protocol buffer encoded messages coming in over a TCP connection, with each message containing a single nginx log line. Our MultiDecoder will run each message through two decoders, the first to deserialize the protocol buffer and the second to parse the log text:
[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "shipped-nginx-decoder"
[shipped-nginx-decoder]
type = "MultiDecoder"
subs = ['ProtobufDecoder', 'nginx-access-decoder']
cascade_strategy = "all"
log_sub_errors = true
[ProtobufDecoder]
[nginx-access-decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[nginx-access-decoder.config]
type = "combined"
user_agent_transform = true
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
New in version 0.7.
Parses a payload containing the contents of a /sys/block/$DISK/stat file (where $DISK is a disk identifier such as sda) into a Heka message struct. This also tries to obtain the TickerInterval of the input it received the data from, by extracting it from a message field named TickerInterval.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[DiskStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/sys/block/sda1/stat"
decoder = "DiskStatsDecoder"
[DiskStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_diskstats.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.diskstats
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"ReadsCompleted" value_type:DOUBLE value_double:"20123"
    name:"ReadsMerged" value_type:DOUBLE value_double:"11267"
    name:"SectorsRead" value_type:DOUBLE value_double:"1.094968e+06"
    name:"TimeReading" value_type:DOUBLE value_double:"45148"
    name:"WritesCompleted" value_type:DOUBLE value_double:"1278"
    name:"WritesMerged" value_type:DOUBLE value_double:"1278"
    name:"SectorsWritten" value_type:DOUBLE value_double:"206504"
    name:"TimeWriting" value_type:DOUBLE value_double:"3348"
    name:"TimeDoingIO" value_type:DOUBLE value_double:"4876"
    name:"WeightedTimeDoingIO" value_type:DOUBLE value_double:"48356"
    name:"NumIOInProgress" value_type:DOUBLE value_double:"3"
    name:"TickerInterval" value_type:DOUBLE value_double:"2"
    name:"FilePath" value_string:"/sys/block/sda/stat"
New in version 0.7.
Parses a payload containing the contents of a /proc/loadavg file into a Heka message.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[LoadAvg]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/loadavg"
decoder = "LoadAvgDecoder"
[LoadAvgDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_loadavg.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.loadavg
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"1MinAvg" value_type:DOUBLE value_double:"3.05"
    name:"5MinAvg" value_type:DOUBLE value_double:"1.21"
    name:"15MinAvg" value_type:DOUBLE value_double:"0.44"
    name:"NumProcesses" value_type:DOUBLE value_double:"11"
    name:"FilePath" value_string:"/proc/loadavg"
New in version 0.7.
Parses a payload containing the contents of a /proc/meminfo file into a Heka message.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"
[MemStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_memstats.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.memstats
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"MemTotal" value_type:DOUBLE representation:"kB" value_double:"4047616"
    name:"MemFree" value_type:DOUBLE representation:"kB" value_double:"3432216"
    name:"Buffers" value_type:DOUBLE representation:"kB" value_double:"82028"
    name:"Cached" value_type:DOUBLE representation:"kB" value_double:"368636"
    name:"FilePath" value_string:"/proc/meminfo"
The total available fields can be found in man procfs. All fields are of type double, and the representation is in kB (except for the HugePages fields). Here is a full list of fields available:
MemTotal, MemFree, Buffers, Cached, SwapCached, Active, Inactive, Active(anon), Inactive(anon), Active(file), Inactive(file), Unevictable, Mlocked, SwapTotal, SwapFree, Dirty, Writeback, AnonPages, Mapped, Shmem, Slab, SReclaimable, SUnreclaim, KernelStack, PageTables, NFS_Unstable, Bounce, WritebackTmp, CommitLimit, Committed_AS, VmallocTotal, VmallocUsed, VmallocChunk, HardwareCorrupted, AnonHugePages, HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, DirectMap4k, DirectMap2M, DirectMap1G.
Note that your available fields may have a slight variance depending on the system’s kernel version.
New in version 0.6.
Parses and transforms the MySQL slow query logs. Use mariadb_slow_query.lua to parse the MariaDB variant of the MySQL slow query logs.
Config:
Truncates the SQL payload to the specified number of bytes (not UTF-8 aware) and appends ”...”. If the value is nil no truncation is performed. A negative value will truncate the specified number of bytes from the end.
Example Heka Configuration
[Sync-1_5-SlowQuery]
type = "LogstreamerInput"
log_directory = "/var/log/mysql"
file_match = 'mysql-slow\.log'
parser_type = "regexp"
delimiter = "\n(# User@Host:)"
delimiter_location = "start"
decoder = "MySqlSlowQueryDecoder"
[MySqlSlowQueryDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/mysql_slow_query.lua"
[MySqlSlowQueryDecoder.config]
truncate_sql = 64
Example Heka Message
Timestamp: 2014-05-07 15:51:28 -0700 PDT
Type: mysql.slow-query
Hostname: 127.0.0.1
Pid: 0
UUID: 5324dd93-47df-485b-a88e-429f0fcd57d6
Logger: Sync-1_5-SlowQuery
Payload: /* [queryName=FIND_ITEMS] */ SELECT bso.userid, bso.collection, ...
EnvVersion:
Severity: 7
Fields:
    name:"Rows_examined" value_type:DOUBLE value_double:16458
    name:"Query_time" value_type:DOUBLE representation:"s" value_double:7.24966
    name:"Rows_sent" value_type:DOUBLE value_double:5001
    name:"Lock_time" value_type:DOUBLE representation:"s" value_double:0.047038
New in version 0.5.
Parses the Nginx access logs based on the Nginx ‘log_format’ configuration directive.
Config:
The ‘log_format’ configuration directive from the nginx.conf. The $time_local or $time_iso8601 variable is converted to the number of nanoseconds since the Unix epoch and used to set the Timestamp on the message. http://nginx.org/en/docs/http/ngx_http_log_module.html
Sets the message ‘Type’ header to the specified value
Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.
Always preserve the http_user_agent value if transform is enabled.
Only preserve the http_user_agent value if transform is enabled and fails.
Always preserve the original log line in the message payload.
Example Heka Configuration
[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"
[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: combined
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserver
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"remote_user" value_string:"-"
    name:"http_x_forwarded_for" value_string:"-"
    name:"http_referer" value_string:"-"
    name:"body_bytes_sent" value_type:DOUBLE representation:"B" value_double:82
    name:"remote_addr" value_string:"62.195.113.219" representation:"ipv4"
    name:"status" value_type:DOUBLE value_double:200
    name:"request" value_string:"GET /v1/recovery_email/status HTTP/1.1"
    name:"user_agent_os" value_string:"FirefoxOS"
    name:"user_agent_browser" value_string:"Firefox"
    name:"user_agent_version" value_type:DOUBLE value_double:29
New in version 0.6.
Parses the Nginx error logs based on the Nginx hard coded internal format.
Config:
The conversion actually happens on the Go side since there isn’t good TZ support here.
Example Heka Configuration
[TestWebserverError]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'error\.log'
decoder = "NginxErrorDecoder"
[NginxErrorDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_error.lua"
[NginxErrorDecoder.config]
tz = "America/Los_Angeles"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: nginx.error
Hostname: trink-x230
Pid: 16842
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserverError
Payload: using inherited sockets from "6;"
EnvVersion:
Severity: 5
Fields:
    name:"tid" value_type:DOUBLE value_double:0
    name:"connection" value_type:DOUBLE value_double:8878
Decoder plugin that accepts messages of a specified form and generates new outgoing messages from extracted data, effectively transforming one message format into another.
Note
The Go regular expression tester is an invaluable tool for constructing and debugging regular expressions to be used for parsing your input data.
Config:
Regular expression that must match for the decoder to process the message.
Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.
Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any captured in a regex in the message_matcher, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field i.e., Payload|json will create Fields[Payload] with a json representation (see Field Variables).
Interpolated values should be surrounded with % signs, for example:
[my_decoder.message_fields]
Type = "%Type%Decoded"
This will result in the new message’s Type being set to the old message’s Type with “Decoded” appended.
A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.
Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.
New in version 0.5.
If set to false, payloads that can not be matched against the regex will not be logged as errors. Defaults to true.
Example (Parsing Apache Combined Log Format):
[apache_transform_decoder]
type = "PayloadRegexDecoder"
match_regex = '^(?P<RemoteIP>\S+) \S+ \S+ \[(?P<Timestamp>[^\]]+)\] "(?P<Method>[A-Z]+) (?P<Url>[^\s]+)[^"]*" (?P<StatusCode>\d+) (?P<RequestSize>\d+) "(?P<Referer>[^"]*)" "(?P<Browser>[^"]*)"'
timestamp_layout = "02/Jan/2006:15:04:05 -0700"
# severities in this case would work only if a (?P<Severity>...) matching
# group was present in the regex, and the log file contained this information.
[apache_transform_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4
[apache_transform_decoder.message_fields]
Type = "ApacheLogfile"
Logger = "apache"
Url|uri = "%Url%"
Method = "%Method%"
Status = "%Status%"
RequestSize|B = "%RequestSize%"
Referer = "%Referer%"
Browser = "%Browser%"
This decoder plugin accepts XML blobs in the message payload and allows you to map parts of the XML into Field attributes of the pipeline pack message, using XPath syntax as implemented by the xmlpath library.
Config:
A subsection defining a capture name that maps to an XPath expression. Each expression can fetch a single value; if the expression does not resolve to a valid node in the XML blob, the capture group will be assigned an empty string value.
Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.
Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any captured in an XPath in the message_matcher, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field i.e., Payload|json will create Fields[Payload] with a json representation (see Field Variables).
Interpolated values should be surrounded with % signs, for example:
[my_decoder.message_fields]
Type = "%Type%Decoded"
This will result in the new message’s Type being set to the old message’s Type with “Decoded” appended.
A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. The default layout is ISO8601 - the same as Javascript. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.
Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.
Example:
[myxml_decoder]
type = "PayloadXmlDecoder"
[myxml_decoder.xpath_map]
Count = "/some/path/count"
Name = "/some/path/name"
Pid = "//pid"
Timestamp = "//timestamp"
Severity = "//severity"
[myxml_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4
[myxml_decoder.message_fields]
Pid = "%Pid%"
StatCount = "%Count%"
StatName = "%Name%"
Timestamp = "%Timestamp%"
PayloadXmlDecoder’s xpath_map config subsection supports XPath as implemented by the xmlpath library.
- All axes are supported (“child”, “following-sibling”, etc)
- All abbreviated forms are supported (”.”, “//”, etc)
- All node types except for namespace are supported
- Predicates are restricted to [N], [path], and [path=literal] forms
- Only a single predicate is supported per path step
- Richer expressions and namespaces are not supported
The ProtobufDecoder is used for Heka message objects that have been serialized into protocol buffers format. This is the format that Heka uses to communicate with other Heka instances, so one will always be included in your Heka configuration whether specified or not. The ProtobufDecoder has no configuration options.
The hekad protocol buffers message schema is defined in the message.proto file in the message package.
Example:
[ProtobufDecoder]
New in version 0.5.
Parses the rsyslog output using the string based configuration template.
Config:
The ‘template’ configuration string from rsyslog.conf. http://rsyslog-5-8-6-doc.neocities.org/rsyslog_conf_templates.html
If your rsyslog timestamp field in the template does not carry zone offset information, you may set an offset to be applied to your events here. Typically this would be used with the “Traditional” rsyslog formats.
Parsing is done by Go, supports values of “UTC”, “Local”, or a location name corresponding to a file in the IANA Time Zone database, e.g. “America/New_York”.
Example Heka Configuration
[RsyslogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/rsyslog.lua"
[RsyslogDecoder.config]
type = "RSYSLOG_TraditionalFileFormat"
template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
tz = "America/Los_Angeles"
Example Heka Message
Timestamp: 2014-02-10 12:58:58 -0800 PST
Type: RSYSLOG_TraditionalFileFormat
Hostname: trink-x230
Pid: 0
UUID: e0eef205-0b64-41e8-a307-5772b05e16c1
Logger: RsyslogInput
Payload: "imklog 5.8.6, log source = /proc/kmsg started."
EnvVersion:
Severity: 7
Fields:
    name:"programname" value_string:"kernel"
The SandboxDecoder provides an isolated execution environment for data parsing and complex transformations without the need to recompile Heka. See Sandbox.
Config:
Example
[sql_decoder]
type = "SandboxDecoder"
filename = "sql_decoder.lua"
New in version 0.5.
The ScribbleDecoder is a trivial decoder that makes it possible to set one or more static field values on every decoded message. It is often used in conjunction with another decoder (i.e. in a MultiDecoder w/ cascade_strategy set to “all”) to, for example, set the message type of every message to a specific custom value after the messages have been decoded from Protocol Buffers format. Note that this only supports setting the exact same value on every message; if any dynamic computation is required to determine what the value should be, or whether it should be applied to a specific message, a SandboxDecoder using the provided write_message API call should be used instead.
Config:
Subsection defining message fields to populate. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. host|ipv4 = “192.168.55.55” will create Fields[Host] containing an IPv4 address. Adding a representation string to a standard message header name will cause it to be added as a user defined field, i.e. Payload|json will create Fields[Payload] with a json representation (see Field Variables). Does not support Timestamp or Uuid.
Example (in MultiDecoder context)
[mytypedecoder]
type = "MultiDecoder"
subs = ["ProtobufDecoder", "mytype"]
cascade_strategy = "all"
log_sub_errors = true
[ProtobufDecoder]
[mytype]
type = "ScribbleDecoder"
[mytype.message_fields]
Type = "MyType"
New in version 0.4.
The StatsToFieldsDecoder will parse time series statistics data in the graphite message format and encode the data into the message fields, in the same format produced by a StatAccumInput plugin with the emit_in_fields value set to true. This is useful if you have externally generated graphite string data flowing through Heka that you’d like to process without having to roll your own string parsing code.
This decoder has no configuration options. It simply expects to be passed messages with statsd string data in the payload. Incorrect or malformed content will cause a decoding error, dropping the message.
The fields format only contains a single “timestamp” field, so any payloads containing multiple timestamps will end up generating a separate message for each timestamp. Extra messages will be a copy of the original message except a) the payload will be empty and b) the unique timestamp and related stats will be the only message fields.
Example:
[StatsToFieldsDecoder]
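For context, this decoder is typically attached to whatever input carries the graphite-format text. The wiring below is a hypothetical sketch (input name, address, and transport chosen purely for illustration):
[GraphiteLineInput]
type = "TcpInput"
address = "127.0.0.1:2003"
parser_type = "token"
decoder = "StatsToFieldsDecoder"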
There are some configuration options that are universally available to all Heka filter plugins. These will be consumed by Heka itself when Heka initializes the plugin and do not need to be handled by the plugin-specific initialization code.
Boolean expression; when it evaluates to true the message is passed to the filter for processing. Defaults to matching nothing. See: Message Matcher Syntax
The name of the message signer. If specified only messages with this signer are passed to the filter for processing.
Frequency (in seconds) that a timer event will be sent to the filter. Defaults to not sending timer events.
New in version 0.7.
Whether or not this plugin can exit without causing Heka to shutdown. Defaults to false for non-sandbox filters, and true for sandbox filters.
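These common options appear alongside the plugin-specific settings in any filter section. For example (the plugin type and matcher are illustrative, and can_exit is assumed to be the setting name for the shutdown behavior described above):
[ErrorCounter]
type = "CounterFilter"
message_matcher = "Severity <= 3"
ticker_interval = 30
can_exit = false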
New in version 0.5.
Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). The purpose is to recreate the view at a larger scope in each level of the aggregation i.e., host view -> datacenter view -> service level view.
Config:
Specifies whether or not this aggregator should generate cbuf deltas.
A list of anomaly detection specifications. If not specified no anomaly detection/alerting will be performed.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[TelemetryServerMetricsAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_aggregator.lua"
preserve_data = true
[TelemetryServerMetricsAggregator.config]
enable_delta = false
anomaly_config = 'roc("Request Statistics", 1, 15, 0, 1.5, true, false)'
preservation_version = 0
New in version 0.5.
Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). Each column from the source circular buffer will become its own graph. i.e., ‘Error Count’ will become a graph with each host being represented in a column.
Config:
Pre-allocates the number of host columns in the graph(s). If the number of active hosts exceed this value, the plugin will terminate.
The number of rows to keep from the original circular buffer. Storing all the data from all the hosts is not practical since you will most likely run into memory and output size restrictions (adjust the view down as necessary).
The amount of time a host has to be inactive before it can be replaced by a new host.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the max_hosts or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[TelemetryServerMetricsHostAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_host_aggregator.lua"
preserve_data = true
[TelemetryServerMetricsHostAggregator.config]
max_hosts = 5
rows = 60
host_expiration = 120
preservation_version = 0
Once per ticker interval a CounterFilter will generate a message of type heka.counter-output. The payload will contain text indicating the number of messages that matched the filter’s message_matcher value during that interval (i.e. it counts the messages the plugin received). Every ten intervals an extra message (also of type heka.counter-output) goes out, containing an aggregate count and average per second throughput of messages received.
Config:
Interval between generated counter messages, in seconds. Defaults to 5.
Example:
[CounterFilter]
message_matcher = "Type != 'heka.counter-output'"
New in version 0.7.
Graphs disk IO stats. It automatically converts the running totals of Writes and Reads into rates of the values. The time based fields are left as running totals of the amount of time doing IO. Expects to receive messages with disk IO data embedded in a particular set of message fields which matches what is generated by Linux Disk Stats Decoder: WritesCompleted, ReadsCompleted, SectorsWritten, SectorsRead, WritesMerged, ReadsMerged, TimeWriting, TimeReading, TimeDoingIO, WeightedTimeDoingIO, TickerInterval.
Config:
Sets the size of the sliding window, i.e., 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
anomaly_config(string) - (see Anomaly Detection Module)
Example Heka Configuration
[DiskStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/diskstats.lua"
preserve_data = true
message_matcher = "Type == 'stats.diskstats'"
New in version 0.5.
Calculates the most frequent items in a data stream.
Config:
The message variable name containing the items to be counted.
The maximum size of the sample set (higher will produce a more accurate list).
Used to reduce the long tail output by only outputting the higher frequency items.
Resets the list after the specified number of days (on the UTC day boundary). A value of 0 will never reset the list.
Example Heka Configuration
[FxaAuthServerFrequentIP]
type = "SandboxFilter"
filename = "lua_filters/frequent_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"
[FxaAuthServerFrequentIP.config]
message_variable = "Fields[remote_addr]"
max_items = 10000
min_output_weight = 100
reset_days = 1
New in version 0.6.
Graphs the Heka memory statistics using the heka.memstat message generated by pipeline/report.go.
Config:
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
Sets the size of each bucket (resolution in seconds) in the sliding window.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the rows or sec_per_row configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[HekaMemstat]
type = "SandboxFilter"
filename = "lua_filters/heka_memstat.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'heka.memstat'"
New in version 0.5.
Generates documentation for each unique message in a data stream. The output is a hierarchy of Logger, Type, EnvVersion, and a list of associated message field attributes including their counts (number in the brackets). This plugin is meant for data discovery/exploration and should not be left running on a production system.
Config:
<none>
Example Heka Configuration
[SyncMessageSchema]
type = "SandboxFilter"
filename = "lua_filters/heka_message_schema.lua"
ticker_interval = 60
preserve_data = false
message_matcher = "Logger =~ /^Sync/"
Example Output
New in version 0.7.
Monitors Heka’s process message failures by plugin.
Config:
A list of anomaly detection specifications. If not specified, a default of ‘mww_nonparametric(“DEFAULT”, 1, 5, 10, 0.7)’ is used. The “DEFAULT” settings are applied to any plugin without an explicit specification.
Example Heka Configuration
[HekaProcessMessageFailures]
type = "SandboxFilter"
filename = "lua_filters/heka_process_message_failures.lua"
ticker_interval = 60
preserve_data = false # the counts are reset on Heka restarts and the monitoring should be too.
message_matcher = "Type == 'heka.all-report'"
New in version 0.5.
Graphs HTTP status codes using the numeric Fields[status] variable collected from web server access logs.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[FxaAuthServerHTTPStatus]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"
[FxaAuthServerHTTPStatus.config]
sec_per_row = 60
rows = 1440
anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false) roc("HTTP Status", 4, 15, 0, 1.5, true, false) mww_nonparametric("HTTP Status", 5, 15, 10, 0.8)'
preservation_version = 0
New in version 0.7.
Graphs the load average and process count data. Expects to receive messages containing fields entitled 1MinAvg, 5MinAvg, 15MinAvg, and NumProcesses, such as those generated by the Linux Load Average Decoder.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[LoadAvgFilter]
type = "SandboxFilter"
filename = "lua_filters/loadavg.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.loadavg'"
New in version 0.7.
Graphs memory usage statistics. Expects to receive messages with memory usage data embedded in a specific set of message fields, which matches the messages generated by Linux Memory Stats Decoder: MemFree, Cached, Active, Inactive, VmallocUsed, Shmem, SwapCached.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[MemoryStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/memstats.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.memstats'"
New in version 0.6.
Graphs MySQL slow query data produced by the MySQL Slow Query Log Decoder.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[Sync-1_5-SlowQueries]
type = "SandboxFilter"
message_matcher = "Logger == 'Sync-1_5-SlowQuery'"
ticker_interval = 60
filename = "lua_filters/mysql_slow_query.lua"
[Sync-1_5-SlowQueries.config]
anomaly_config = 'mww_nonparametric("Statistics", 5, 15, 10, 0.8)'
preservation_version = 0
Filter plugin that accepts messages of a specified form and uses extracted message data to feed statsd-style numerical metrics in the form of Stat objects to a StatAccumulator.
Config:
Metric:
Subsection defining a single metric to be generated. Both the name and value fields for each metric support interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Logger’, ‘Payload’, or any dynamic field name) with the use of %% delimiters, so %Hostname% would be replaced by the message’s Hostname field, and %Foo% would be replaced by the first value of a dynamic field called “Foo”:
- type (string):
Metric type, supports “Counter”, “Timer”, “Gauge”.
- name (string):
Metric name, must be unique.
- value (string):
Expression representing the (possibly dynamic) value that the StatFilter should emit for each received message.
Name of a StatAccumInput instance that this StatFilter will use as its StatAccumulator for submitting generated stat values. Defaults to “StatAccumInput”.
Example:
[StatAccumInput]
ticker_interval = 5
[StatsdInput]
address = "127.0.0.1:29301"
[Hits]
type = "StatFilter"
message_matcher = 'Type == "ApacheLogfile"'
[Hits.Metric.bandwidth]
type = "Counter"
name = "httpd.bytes.%Hostname%"
value = "%Bytes%"
[Hits.Metric.method_counts]
type = "Counter"
name = "httpd.hits.%Method%.%Hostname%"
value = "1"
Note
StatFilter requires an available StatAccumInput to be running.
The sandbox filter provides an isolated execution environment for data analysis. Any output generated by the sandbox is injected into the payload of a new message for further processing or to be output.
Config:
Example:
[hekabench_counter]
type = "SandboxFilter"
message_matcher = "Type == 'hekabench'"
ticker_interval = 1
filename = "counter.lua"
preserve_data = true
profile = false
[hekabench_counter.config]
rows = 1440
sec_per_row = 60
The SandboxManagerFilter provides dynamic control (start/stop) of sandbox filters in a secure manner, without stopping the Heka daemon. Commands are sent to a SandboxManagerFilter using a signed Heka message. The intent is to have one manager per access control group, each with its own message signing key. Users in each group can submit a signed control message to manage any filters running under the associated manager. A signed message is not an enforced requirement, but it is highly recommended in order to restrict access to this functionality.
Config:
The directory where the filter configurations, code, and states are preserved. The directory can be unique or shared between sandbox managers since the filter names are unique per manager. Defaults to a directory in ${BASE_DIR}/sbxmgrs with a name generated from the plugin name.
The directory where ‘require’ will attempt to load the external Lua modules from. Defaults to ${SHARE_DIR}/lua_modules.
The maximum number of filters this manager can run.
New in version 0.5.
The number of bytes managed sandboxes are allowed to consume before being terminated (default 8MiB).
The number of instructions managed sandboxes are allowed to execute during the process_message/timer_event functions before being terminated (default 1M).
The number of bytes managed sandbox output buffers can hold before being terminated (default 63KiB). Warning: messages exceeding 64KiB will generate an error and be discarded by the standard output plugins (File, TCP, UDP) since they exceed the maximum message size.
Example
[OpsSandboxManager]
type = "SandboxManagerFilter"
message_signer = "ops"
# message_matcher = "Type == 'heka.control.sandbox'" # automatic default setting
max_filters = 100
New in version 0.7.
Converts stat values extracted from statmetric messages (see StatAccumInput) to circular buffer data and periodically emits messages containing this data to be graphed by a DashboardOutput. Note that this filter expects the stats data to be available in the message fields, so the StatAccumInput must be configured with emit_in_fields set to true for this filter to work correctly.
Config:
Title for the graph output generated by this filter.
The number of rows to store in our circular buffer. Each row represents one time interval.
The number of seconds in each circular buffer time interval.
Space separated list of stat names. Each specified stat will be expected to be found in the fields of the received statmetric messages, and will be extracted and inserted into its own column in the accumulated circular buffer.
Space separated list of header label names to use for the extracted stats. Must be in the same order as the specified stats. Any label longer than 15 characters will be truncated.
Anomaly detection configuration, see Anomaly Detection Module.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time any edits are made to your rows, sec_per_row, stats, or stat_labels values, or else Heka will fail to start because the preserved data will no longer match the filter’s data structure.
Example Heka Configuration
[stat-graph]
type = "SandboxFilter"
filename = "lua_filters/stat_graph.lua"
ticker_interval = 10
preserve_data = true
message_matcher = "Type == 'heka.statmetric'"
[stat-graph.config]
title = "Hits and Misses"
rows = 1440
sec_per_row = 10
stats = "stats.counters.hits.count stats.counters.misses.count"
stat_labels = "hits misses"
anomaly_config = 'roc("Hits and Misses", 1, 15, 0, 1.5, true, false) roc("Hits and Misses", 2, 15, 0, 1.5, true, false)'
preservation_version = 0
New in version 0.6.
Counts the number of unique items per day e.g. active daily users by uid.
Config:
The Heka message variable containing the item to be counted.
The graph title for the cbuf output.
Specifies whether or not this plugin should generate cbuf deltas. Deltas should be enabled when sharding is used; see: Circular Buffer Delta Aggregator.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[FxaActiveDailyUsers]
type = "SandboxFilter"
filename = "lua_filters/unique_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'FxaAuth' && Type == 'request.summary' && Fields[path] == '/v1/certificate/sign' && Fields[errno] == 0"
[FxaActiveDailyUsers.config]
message_variable = "Fields[uid]"
title = "Estimated Active Daily Users"
preservation_version = 0
New in version 0.6.
Produces more human readable alert messages.
Config:
<none>
Example Heka Configuration
[FxaAlert]
type = "SmtpOutput"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert' && Logger =~ /^Fxa/" || Type == 'heka.sandbox-terminated' && Fields[plugin] =~ /^Fxa/"
send_from = "heka@example.com"
send_to = ["alert@example.com"]
auth = "Plain"
user = "test"
password = "testpw"
host = "localhost:25"
encoder = "AlertEncoder"
[AlertEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"
Example Output
Timestamp: 2014-05-14T14:20:18Z
Hostname: ip-10-226-204-51
Plugin: FxaBrowserIdHTTPStatus
Alert: HTTP Status - algorithm: roc col: 1 msg: detected anomaly, standard deviation exceeds 1.5
New in version 0.8.
Extracts data from SandboxFilter circular buffer output messages and uses it to generate time series JSON structures that will be accepted by Librato’s POST API. It will keep track of the last time it has seen a particular message, keyed by filter name and output name. The first time it sees a new message, it will send data from all of the rows except the last one, which is possibly incomplete. For subsequent messages, the encoder will automatically extract data from all of the rows that have elapsed since the last message was received.
The SandboxEncoder preserve_data setting should be set to true when using this encoder, or else the list of received messages will be lost whenever Heka is restarted, possibly causing the same data rows to be sent to Librato multiple times.
Config:
<none>
Example Heka Configuration
[cbuf_librato_encoder]
type = "SandboxEncoder"
filename = "lua_encoders/cbuf_librato"
preserve_data = true
[librato]
type = "HttpOutput"
message_matcher = "Type == 'heka.sandbox-output && Fields[payload_type] == 'cbuf'"
encoder = "cbuf_librato_encoder"
address = "https://metrics-api.librato.com/v1/metrics"
username = "username@example.com"
password = "SECRET"
[librato.headers]
Content-Type = ["application/json"]
Example Output
{"gauges":[{"value":12,"measure_time":1410824950,"name":"HTTP_200","source":"thor"},{"value":1,"measure_time":1410824950,"name":"HTTP_300","source":"thor"},{"value":1,"measure_time":1410824950,"name":"HTTP_400","source":"thor"}]}
This encoder serializes a Heka message into a clean JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The JSON serialization is done by hand, without the use of Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.
Config:
Name of the ES index into which the messages will be inserted. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-%{Logger}-data’ would add the records to an ES index called ‘some.example.com-processname-data’. Defaults to ‘heka-%{2006.01.02}’.
Name of ES record type to create. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-stat’ would create an ES record with a type of ‘some.example.com-stat’. Defaults to ‘message’.
The ‘fields’ parameter specifies that only specific message data should be indexed into ElasticSearch. Available fields to choose are “Uuid”, “Timestamp”, “Type”, “Logger”, “Severity”, “Payload”, “EnvVersion”, “Pid”, “Hostname”, and “Fields” (where “Fields” causes the inclusion of any and all dynamically specified message fields). Defaults to including all of the supported message fields.
Format to use for timestamps in generated ES documents. Defaults to “2006-01-02T15:04:05.000Z”.
When generating the index name use the timestamp from the message instead of the current time. Defaults to false.
Allows you to optionally specify the document id for ES to use. Useful for overwriting existing ES documents. If the value specified is placed within %{}, it will be interpolated to its Field value. Default is to allow ES to auto-generate the id.
This specifies a set of fields which will be passed through to the encoded JSON output without any processing or escaping. This is useful for fields which contain embedded JSON objects to prevent the embedded JSON from being escaped as normal strings. Only supports dynamically specified message fields.
Example
[ESJsonEncoder]
index = "%{Type}-%{2006.01.02}"
es_index_from_timestamp = true
type_name = "%{Type}"
[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access'"
encoder = "ESJsonEncoder"
flush_interval = 50
This encoder serializes a Heka message into a JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The message JSON structure uses the original (i.e. “v0”) schema popularized by Logstash. Using this schema can aid integration with existing Logstash deployments. This schema also plays nicely with the default Logstash dashboard provided by Kibana.
The JSON serialization is done by hand, without using Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.
Config:
Name of the ES index into which the messages will be inserted. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-%{Logger}-data’ would add the records to an ES index called ‘some.example.com-processname-data’. Defaults to ‘logstash-%{2006.01.02}’.
Name of ES record type to create. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-stat’ would create an ES record with a type of ‘some.example.com-stat’. Defaults to ‘message’.
If false, the generated JSON’s @type value will match the ES record type specified in the type_name setting. If true, the message’s Type value will be used as the @type value instead. Defaults to false.
The ‘fields’ parameter specifies that only specific message data should be indexed into ElasticSearch. Available fields to choose are “Uuid”, “Timestamp”, “Type”, “Logger”, “Severity”, “Payload”, “EnvVersion”, “Pid”, “Hostname”, and “Fields” (where “Fields” causes the inclusion of any and all dynamically specified message fields). Defaults to including all of the supported message fields. The “Payload” field is sent to ElasticSearch as “@message”.
When generating the index name use the timestamp from the message instead of the current time. Defaults to false.
Allows you to optionally specify the document id for ES to use. Useful for overwriting existing ES documents. If the value specified is placed within %{}, it will be interpolated to its Field value. Default is to allow ES to auto-generate the id.
This specifies a set of fields which will be passed through to the encoded JSON output without any processing or escaping. This is useful for fields which contain embedded JSON objects to prevent the embedded JSON from being escaped as normal strings. Only supports dynamically specified message fields.
Example
[ESLogstashV0Encoder]
es_index_from_timestamp = true
type_name = "%{Type}"
[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access'"
encoder = "ESLogstashV0Encoder"
flush_interval = 50
Prepends ElasticSearch BulkAPI index JSON to a message payload.
Config:
String to use as the _index key’s value in the generated JSON. Supports field interpolation as described below.
String to use as the _type key’s value in the generated JSON. Supports field interpolation as described below.
String to use as the _id key’s value in the generated JSON. Supports field interpolation as described below.
If true, then any time interpolation (often used to generate the ElasticSearch index) will use the timestamp from the processed message rather than the system time.
Field interpolation:
Data from the current message can be interpolated into any of the string arguments listed above. A %{} enclosed field name will be replaced by the field value from the current message. Supported default field names are “Type”, “Hostname”, “Pid”, “UUID”, “Logger”, “EnvVersion”, and “Severity”. Any other values will be checked against the defined dynamic message fields. If no field matches, then a C strftime (on non-Windows platforms) or C89 strftime (on Windows) time substitution will be attempted.
Example Heka Configuration
[es_payload]
type = "SandboxEncoder"
filename = "lua_encoders/es_payload.lua"
[es_payload.config]
es_index_from_timestamp = true
index = "%{Logger}-%{%Y.%m.%d}"
type_name = "%{Type}-%{Hostname}"
[ElasticSearchOutput]
message_matcher = "Type == 'mytype'"
encoder = "es_payload"
Example Output
{"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}
{"json":"data","extracted":"from","message":"payload"}
The PayloadEncoder simply extracts the payload from the provided Heka message and converts it into a byte stream for delivery to an external resource.
Config:
Specifies whether or not a newline character (i.e. “\n”) will be appended to the captured message payload before serialization. Defaults to true.
Specifies whether a timestamp will be prepended to the captured message payload before serialization. Defaults to false.
If true, the prepended timestamp will be extracted from the message that is being processed. If false, the prepended timestamp will be generated by the system clock at the time of message processing. Defaults to true. This setting has no impact if prefix_ts is set to false.
Specifies the format that should be used for prepended timestamps, using Go’s standard time format specification strings. Defaults to [2006/Jan/02:15:04:05 -0700]. If the specified format string does not end with a space character, then a space will be inserted between the formatted timestamp and the payload.
Example
[PayloadEncoder]
append_newlines = false
prefix_ts = true
ts_format = "2006/01/02 3:04:05PM MST"
The ProtobufEncoder is used to serialize Heka message objects back into Heka’s standard protocol buffers format. This is the format that Heka uses to communicate with other Heka instances, so one will always be included in your Heka configuration using the default “ProtobufEncoder” name whether specified or not.
The hekad protocol buffers message schema is defined in the message.proto file in the message package.
Config:
<none>
Example:
[ProtobufEncoder]
The RstEncoder generates a reStructuredText rendering of a Heka message, including all fields and attributes. It is useful for debugging, especially when coupled with a LogOutput.
Config:
<none>
Example:
[RstEncoder]
[LogOutput]
message_matcher = "TRUE"
encoder = "RstEncoder"
The SandboxEncoder provides an isolated execution environment for converting messages into binary data without the need to recompile Heka. See Sandbox.
Config:
Example
[custom_json_encoder]
type = "SandboxEncoder"
filename = "path/to/custom_json_encoder.lua"
[custom_json_encoder.config]
msg_fields = ["field1", "field2"]
New in version 0.8.
Converts full Heka message contents to JSON for InfluxDB HTTP API. Includes all standard message fields and iterates through all of the dynamically specified fields, skipping any bytes fields or any fields explicitly omitted using the skip_fields config option.
Config:
String to use as the series key’s value in the generated JSON. Supports interpolation of field values from the processed message, using %{fieldname}. Any fieldname values of “Type”, “Payload”, “Hostname”, “Pid”, “Logger”, “Severity”, or “EnvVersion” will be extracted from the base message schema, any other values will be assumed to refer to a dynamic message field. Only the first value of the first instance of a dynamic message field can be used for series name interpolation. If the dynamic field doesn’t exist, the uninterpolated value will be left in the series name. Note that it is not possible to interpolate either the “Timestamp” or the “Uuid” message fields into the series name, those values will be interpreted as referring to dynamic message fields.
Space delimited set of fields that should not be included in the InfluxDB records being generated. Any fieldname values of “Type”, “Payload”, “Hostname”, “Pid”, “Logger”, “Severity”, or “EnvVersion” will be assumed to refer to the corresponding field from the base message schema, any other values will be assumed to refer to a dynamic message field.
Example Heka Configuration
[influxdb]
type = "SandboxEncoder"
filename = "lua_encoders/influxdb.lua"
[influxdb.config]
series = "heka.%{Logger}"
skip_fields = "Pid EnvVersion"
[InfluxOutput]
message_matcher = "Type == 'influxdb'"
encoder = "influxdb"
type = "HttpOutput"
address = "http://influxdbserver.example.com:8086/db/databasename/series"
username = "influx_username"
password = "influx_password"
Example Output
[{"points":[[1.409378221e+21,"log","test","systemName","TcpInput",5,1,"test"]],"name":"heka.MyLogger","columns":["Time","Type","Payload","Hostname","Logger","Severity","syslogfacility","programname"]}]
New in version 0.7.
Extracts data from message fields in heka.statmetric messages generated by a StatAccumInput and generates JSON suitable for use with InfluxDB’s HTTP API. StatAccumInput must be configured with emit_in_fields = true for this encoder to work correctly.
Config:
<none>
Example Heka Configuration
[statmetric-influx-encoder]
type = "SandboxEncoder"
filename = "lua_encoders/statmetric_influx.lua"
[influx]
type = "HttpOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "http://myinfluxserver.example.com:8086/db/stats/series"
encoder = "statmetric-influx-encoder"
username = "influx_username"
password = "influx_password"
Example Output
[{"points":[[1408404848,78271]],"name":"stats.counters.000000.rate","columns":["time","value"]},{"points":[[1408404848,78271]],"name":"stats.counters.000000.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count_ps","columns":["time","value"]},{"points":[[1408404848,1]],"name":"stats.timers.000001.lower","columns":["time","value"]},{"points":[[1408404848,1024]],"name":"stats.timers.000001.upper","columns":["time","value"]},{"points":[[1408404848,8937851]],"name":"stats.timers.000001.sum","columns":["time","value"]},{"points":[[1408404848,513.07985074627]],"name":"stats.timers.000001.mean","columns":["time","value"]},{"points":[[1408404848,461.72356167879]],"name":"stats.timers.000001.mean_90","columns":["time","value"]},{"points":[[1408404848,925]],"name":"stats.timers.000001.upper_90","columns":["time","value"]},{"points":[[1408404848,2]],"name":"stats.statsd.numStats","columns":["time","value"]}]
There are some configuration options that are universally available to all Heka output plugins. These will be consumed by Heka itself when Heka initializes the plugin and do not need to be handled by the plugin-specific initialization code.
Boolean expression, when evaluated to true passes the message to the output for processing. Defaults to matching nothing. See: Message Matcher Syntax
The name of the message signer. If specified, only messages with this signer are passed to the output for processing.
Frequency (in seconds) that a timer event will be sent to the output. Defaults to not sending timer events.
New in version 0.6.
Encoder to be used by the output. This should refer to the name of an encoder plugin section that is specified elsewhere in the TOML configuration. Messages can be encoded using the specified encoder by calling the OutputRunner’s Encode() method.
New in version 0.6.
Specifies whether or not Heka’s Stream Framing should be applied to the binary data returned from the OutputRunner’s Encode() method.
New in version 0.7.
Whether or not this plugin can exit without causing Heka to shutdown. Defaults to false.
Connects to a remote AMQP broker (RabbitMQ) and sends messages to the specified queue. The message is serialized if specified, otherwise only the raw payload of the message will be sent. As AMQP is dynamically programmable, the broker topology needs to be specified.
Config:
An AMQP connection string formatted per the RabbitMQ URI Spec.
AMQP exchange name
AMQP exchange type (fanout, direct, topic, or headers).
Whether the exchange should be configured as a durable exchange. Defaults to non-durable.
Whether the exchange is deleted when all queues have finished and there is no publishing. Defaults to auto-delete.
The message routing key used to bind the queue to the exchange. Defaults to empty string.
Whether published messages should be marked as persistent or transient. Defaults to non-persistent.
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
New in version 0.6.
MIME content type of the payload used in the AMQP header. Defaults to “application/hekad”.
Specifies which of the registered encoders should be used for converting Heka messages to binary data that is sent out over the AMQP connection. Defaults to the always available “ProtobufEncoder”.
Specifies whether or not the encoded data sent out over the TCP connection should be delimited by Heka’s Stream Framing. Defaults to true.
New in version 0.6.
An optional sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if URL uses the AMQPS URI scheme. See Configuring TLS.
Example (that sends log lines from the logger):
[AMQPOutput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
message_matcher = 'Logger == "TestWebserver"'
CarbonOutput plugins parse the “stat metric” messages generated by a StatAccumulator and write the extracted counter, timer, and gauge data out to a graphite compatible carbon daemon. Output is written over a TCP or UDP socket using the plaintext protocol.
Config:
An IP address:port to which this plugin will write. (default: “localhost:2003”)
New in version 0.5.
“tcp” or “udp” (default: “tcp”)
If set, keep the TCP connection open and reuse it until a failure; then retry (default: false)
Example:
[CarbonOutput]
message_matcher = "Type == 'heka.statmetric'"
address = "localhost:2003"
protocol = "udp"
Specialized output plugin that listens for certain Heka reporting message types and generates JSON data which is made available via HTTP for use in web based dashboards and health reports.
Config:
Specifies how often, in seconds, the dashboard files should be updated. Defaults to 5.
Defaults to “Type == ‘heka.all-report’ || Type == ‘heka.sandbox-output’ || Type == ‘heka.sandbox-terminated’”. Not recommended to change this unless you know what you’re doing.
An IP address:port on which we will serve output via HTTP. Defaults to “0.0.0.0:4352”.
File system directory into which the plugin will write data files and from which it will serve HTTP. The Heka process must have read / write access to this directory. Relative paths will be evaluated relative to the Heka base directory. Defaults to $(BASE_DIR)/dashboard.
File system directory where the Heka dashboard source code can be found. The Heka process must have read access to this directory. Relative paths will be evaluated relative to the Heka base directory. Defaults to ${SHARE_DIR}/dasher.
New in version 0.7.
It is possible to inject arbitrary HTTP headers into each outgoing response by adding a TOML subsection entitled “headers” to your DashboardOutput config section. All entries in the subsection must be a list of string values.
Example:
[DashboardOutput]
ticker_interval = 30
Output plugin that uses HTTP or UDP to insert records into an ElasticSearch database. Note that it is up to the specified encoder to both serialize the message into a JSON structure and to prepend that with the appropriate ElasticSearch BulkAPI indexing JSON. Usually this output is used in conjunction with an ElasticSearch-specific encoder plugin, such as ESJsonEncoder, ESLogstashV0Encoder, or ESPayloadEncoder.
Config:
Interval at which accumulated messages should be bulk indexed into ElasticSearch, in milliseconds. Defaults to 1000 (i.e. one second).
Number of messages that, if processed, will trigger them to be bulk indexed into ElasticSearch. Defaults to 10.
Time in milliseconds to wait for a response for each http post to ES. This may drop data as there is currently no retry. Default is 0 (no timeout).
Specifies whether or not reuse of established TCP connections to ElasticSearch should be disabled. Defaults to false, which means both HTTP keep-alive mode and TCP keep-alives are used. Set it to true to close each TCP connection after ‘flushing’ messages to ElasticSearch.
Example:
[ElasticSearchOutput]
message_matcher = "Type == 'sync.log'"
server = "http://es-server:9200"
flush_interval = 5000
flush_count = 10
encoder = "ESJsonEncoder"
Writes message data out to a file system.
Config:
Full path to the output file.
File permission for writing, specified as a string representation of an octal integer. Defaults to “644”.
Permissions to apply to directories created for FileOutput’s parent directory if it doesn’t exist. Must be a string representation of an octal integer. Defaults to “700”.
Interval at which accumulated file data should be written to disk, in milliseconds (default 1000, i.e. 1 second). Set to 0 to disable.
Number of messages to accumulate until file data should be written to disk (default 1, minimum 1).
Operator describing how the two parameters “flush_interval” and “flush_count” are combined. Allowed values are “AND” or “OR” (default is “AND”).
New in version 0.6.
Specifies whether or not the encoded data written to the file should be delimited by Heka’s Stream Framing. Defaults to true if a ProtobufEncoder is used, false otherwise.
Example:
[counter_file]
type = "FileOutput"
message_matcher = "Type == 'heka.counter-output'"
path = "/var/log/heka/counter-output.log"
prefix_ts = true
perm = "666"
flush_count = 100
flush_operator = "OR"
encoder = "PayloadEncoder"
New in version 0.6.
A very simple output plugin that uses HTTP GET, POST, or PUT requests to deliver data to an HTTP endpoint. When using POST or PUT request methods the encoded output will be uploaded as the request body. When using GET the encoded output will be ignored.
This output doesn’t support any request batching; each received message will generate an HTTP request. Batching can be achieved by use of a filter plugin that accumulates message data, periodically emitting a single message containing the batched, encoded HTTP request data in the payload. An HttpOutput can then be configured to capture these batch messages, using a PayloadEncoder to extract the message payload.
For now the HttpOutput only supports statically defined request parameters (URL, headers, auth, etc.). Future iterations will provide a mechanism for dynamically specifying these values on a per-message basis.
Config:
HTTP request method to use, must be one of GET, POST, or PUT. Defaults to POST.
If specified, HTTP Basic Auth will be used with the provided user name.
If specified, HTTP Basic Auth will be used with the provided password.
It is possible to inject arbitrary HTTP headers into each outgoing request by adding a TOML subsection entitled “headers” to your HttpOutput config section. All entries in the subsection must be a list of string values.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if an “https://” address is used. See Configuring TLS.
Example:
[PayloadEncoder]
[influxdb]
message_matcher = "Type == 'influx.formatted'"
address = "http://influxdb.example.com:8086/db/stats/series"
encoder = "PayloadEncoder"
username = "MyUserName"
password = "MyPassword"
Connects to an Irc server and sends messages to the specified Irc channels. Output is encoded using the specified encoder, and is expected to already be truncated to fit within the bounds of an Irc message before the plugin receives it.
Config:
A host:port of the irc server that Heka will connect to for sending output.
Irc nick used by Heka.
The Irc identity used to login with by Heka.
The password used to connect to the Irc server.
A list of Irc channels which every matching Heka message is sent to. If there is a space in the channel string, then the part after the space is expected to be a password for a protected irc channel.
The maximum amount of time (in seconds) to wait before timing out when connecting to, reading from, or writing to the Irc server. Defaults to 10.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
This is the maximum number of messages Heka will queue per Irc channel before discarding messages. There is also a queue of the same size that is used if all of the per-channel queues are full. These queues are used when Heka is unable to send a message to an Irc channel, such as when it hasn’t joined or has been disconnected. Defaults to 100.
Set this if you want Heka to automatically re-join an Irc channel after being kicked. If not set and Heka is kicked, it will never attempt to rejoin. Defaults to false.
How often (in seconds) Heka should send a message to the server. This is on a per message basis, not per channel. Defaults to 2.
How long to wait (in seconds) before reconnecting to the Irc server after being disconnected. Defaults to 3.
How long to wait (in seconds) before attempting to rejoin an Irc channel which is full. Defaults to 3.
The maximum number of times Heka will attempt to join an Irc channel before giving up. After attempts are exhausted, Heka will no longer attempt to join the channel. Defaults to 3.
Enable to see raw internal message events Heka is receiving from the server. Defaults to false.
Specifies which of the registered encoders should be used for converting Heka messages into what is sent to the irc channels.
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
Example:
[IrcOutput]
message_matcher = 'Type == "alert"'
encoder = "PayloadEncoder"
server = "irc.mozilla.org:6667"
nick = "heka_bot"
ident = "heka_ident"
channels = [ "#heka_bot_irc testkeypassword" ]
rejoin_on_kick = true
queue_size = 200
ticker_interval = 1
Logs messages to stdout using Go’s log package.
Config:
<none>
Example:
[counter_output]
type = "LogOutput"
message_matcher = "Type == 'heka.counter-output'"
encoder = "PayloadEncoder"
Specialized output plugin that listens for Nagios external command message types and delivers passive service check results to Nagios using either HTTP requests made to the Nagios cmd.cgi API or the use of the send_nsca binary. The message payload must consist of a state followed by a colon and then the message e.g., “OK:Service is functioning properly”. The valid states are: OK|WARNING|CRITICAL|UNKNOWN. Nagios must be configured with a service name that matches the Heka plugin instance name and the hostname where the plugin is running.
Config:
An HTTP URL to the Nagios cmd.cgi. Defaults to http://localhost/nagios/cgi-bin/cmd.cgi.
Username used to authenticate with the Nagios web interface. Defaults to empty string.
Password used to authenticate with the Nagios web interface. Defaults to empty string.
Specifies the amount of time, in seconds, to wait for a server’s response headers after fully writing the request. Defaults to 2.
Must match Nagios service’s service_description attribute. Defaults to the name of the output.
Must match the hostname of the server in nagios. Defaults to the Hostname attribute of the message.
New in version 0.5.
Use send_nsca program, as provided, rather than sending HTTP requests. Not supplying this value means HTTP will be used, and any other send_nsca_* settings will be ignored.
New in version 0.5.
Arguments to use with send_nsca, usually at least the nagios hostname, e.g. [“-H”, “nagios.somehost.com”]. Defaults to an empty list.
New in version 0.5.
Timeout for the send_nsca command, in seconds. Defaults to 5.
New in version 0.5.
Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.
New in version 0.5.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
Example configuration to output alerts from SandboxFilter plugins:
[NagiosOutput]
url = "http://localhost/nagios/cgi-bin/cmd.cgi"
username = "nagiosadmin"
password = "nagiospw"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'nagios-external-command' && Fields[payload_name] == 'PROCESS_SERVICE_CHECK_RESULT'"
Example Lua code to generate a Nagios alert:
inject_payload("nagios-external-command", "PROCESS_SERVICE_CHECK_RESULT", "OK:Alerts are working!")
New in version 0.5.
Outputs a Heka message in an email. The message subject is the plugin name and the message content is controlled by the payload_only setting. The primary purpose is for email alert notifications e.g., PagerDuty.
Config:
The email address of the sender. (default: “heka@localhost.localdomain”)
An array of email addresses where the output will be sent to.
Custom subject line of email. (default: “Heka [SmtpOutput]”)
SMTP host to send the email to (default: “127.0.0.1:25”)
SMTP authentication type: “none”, “Plain”, “CRAMMD5” (default: “none”)
SMTP user name
SMTP user password
Example:
[FxaAlert]
type = "SmtpOutput"
message_matcher = "((Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert') || Type == 'heka.sandbox-terminated') && Logger =~ /^Fxa/"
send_from = "heka@example.com"
send_to = ["alert@example.com"]
auth = "Plain"
user = "test"
password = "testpw"
host = "localhost:25"
encoder = "AlertEncoder"
Output plugin that delivers Heka message data to a listening TCP connection. Can be used to deliver messages from a local running Heka agent to a remote Heka instance set up as an aggregator and/or router, or to any other arbitrary listening TCP server that knows how to process the encoded data.
Config:
An IP address:port to which we will send our output data.
Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.
New in version 0.5.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
Specifies how often, in seconds, the output queue files are rolled. Defaults to 300.
New in version 0.6.
A local IP address to use as the source address for outgoing traffic to this destination. Cannot currently be combined with TLS connections.
Specifies which of the registered encoders should be used for converting Heka messages to binary data that is sent out over the TCP connection. Defaults to the always available “ProtobufEncoder”.
Specifies whether or not the encoded data sent out over the TCP connection should be delimited by Heka’s Stream Framing. Defaults to true if a ProtobufEncoder is used, false otherwise.
Specifies whether or not TCP keepalive should be used for established TCP connections. Defaults to false.
Time duration in seconds that a TCP connection will be maintained before keepalive probes start being sent. Defaults to 7200 (i.e. 2 hours).
Example:
[aggregator_output]
type = "TcpOutput"
address = "heka-aggregator.mydomain.com:55"
local_address = "127.0.0.1"
message_matcher = "Type != 'logfile' && Type != 'heka.counter-output' && Type != 'heka.all-report'"
New in version 0.7.
Output plugin that delivers Heka message data to a specified UDP or Unix datagram socket location.
Config:
Network type to use for communication. Must be one of “udp”, “udp4”, “udp6”, or “unixgram”. “unixgram” option only available on systems that support Unix datagram sockets. Defaults to “udp”.
Address to which we will be sending the data. Must be IP:port for net types of “udp”, “udp4”, or “udp6”. Must be a path to a Unix datagram socket file for net type “unixgram”.
Local address to use on the datagram packets being generated. Must be IP:port for net types of “udp”, “udp4”, or “udp6”. Must be a path to a Unix datagram socket file for net type “unixgram”.
Name of registered encoder plugin that will extract and/or serialized data from the Heka message.
Example:
[PayloadEncoder]
[UdpOutput]
address = "myserver.example.com:34567"
encoder = "PayloadEncoder"
WhisperOutput plugins parse the “statmetric” messages generated by a StatAccumulator and write the extracted counter, timer, and gauge data out to a graphite compatible whisper database file tree structure.
Config:
Path to the base directory where the whisper file tree will be written. Absolute paths will be honored, relative paths will be calculated relative to the Heka base directory. Defaults to “whisper” (i.e. “$(BASE_DIR)/whisper”).
Default aggregation method to use for each whisper output file. Supports the following values:
Default specification for new whisper db archives. Should be a sequence of 3-tuples, where each tuple describes a time interval’s storage policy: [<offset> <# of secs per datapoint> <# of datapoints>] (see whisper docs for more info). Defaults to:
[ [0, 60, 1440], [0, 900, 8], [0, 3600, 168], [0, 43200, 1456]]
The above defines four archive sections. The first uses 60 seconds for each of 1440 data points, which equals one day of retention. The second uses 15 minutes for each of 8 data points, for two hours of retention. The third uses one hour for each of 168 data points, or 7 days of retention. Finally, the fourth uses 12 hours for each of 1456 data points, representing two years of data.
Permission mask to be applied to folders created in the whisper database file tree. Must be a string representation of an octal integer. Defaults to “700”.
Example:
[WhisperOutput]
message_matcher = "Type == 'heka.statmetric'"
default_agg_method = 3
default_archive_info = [ [0, 30, 1440], [0, 900, 192], [0, 3600, 168], [0, 43200, 1456] ]
folder_perm = "755"
Heka can emit metrics about its internal state to either an outgoing Heka message (and, through the DashboardOutput, to a web dashboard) or to stdout. Sending SIGUSR1 to hekad on a UNIX system will write a plain text report to stdout. On Windows, you will need to send signal 10 to the hekad process using Powershell.
Sample text output
========[heka.all-report]========
inputRecycleChan:
InChanCapacity: 100
InChanLength: 99
injectRecycleChan:
InChanCapacity: 100
InChanLength: 98
Router:
InChanCapacity: 50
InChanLength: 0
ProcessMessageCount: 26
ProtobufDecoder-0:
InChanCapacity: 50
InChanLength: 0
ProtobufDecoder-1:
InChanCapacity: 50
InChanLength: 0
ProtobufDecoder-2:
InChanCapacity: 50
InChanLength: 0
ProtobufDecoder-3:
InChanCapacity: 50
InChanLength: 0
DecoderPool-ProtobufDecoder:
InChanCapacity: 4
InChanLength: 4
OpsSandboxManager:
InChanCapacity: 50
InChanLength: 0
MatchChanCapacity: 50
MatchChanLength: 0
MatchAvgDuration: 0
ProcessMessageCount: 0
hekabench_counter:
InChanCapacity: 50
InChanLength: 0
MatchChanCapacity: 50
MatchChanLength: 0
MatchAvgDuration: 445
ProcessMessageCount: 0
InjectMessageCount: 0
Memory: 20644
MaxMemory: 20644
MaxInstructions: 18
MaxOutput: 0
ProcessMessageAvgDuration: 0
TimerEventAvgDuration: 78532
LogOutput:
InChanCapacity: 50
InChanLength: 0
MatchChanCapacity: 50
MatchChanLength: 0
MatchAvgDuration: 406
DashboardOutput:
InChanCapacity: 50
InChanLength: 0
MatchChanCapacity: 50
MatchChanLength: 0
MatchAvgDuration: 336
========
To enable the HTTP interface, you will need to enable the dashboard output plugin, see DashboardOutput.
The core of the Heka engine is written in the Go programming language. Heka supports five different types of plugins (inputs, decoders, filters, encoders, and outputs), which are also written in Go. This document will try to provide enough information for developers to extend Heka by implementing their own custom plugins. It assumes a small amount of familiarity with Go, although any reasonably experienced programmer will probably be able to follow along with no trouble.
NOTE: Heka also supports the use of security sandboxed Lua code for implementing the core logic of decoder, filter, and encoder plugins. This document only covers the development of Go plugins. You can learn more about sandboxed plugins in the Sandbox section.
Each Heka plugin type performs a specific task: inputs receive input from the outside world and inject the data into the Heka pipeline, decoders turn binary data into Message objects that Heka can process, filters perform arbitrary processing of Heka message data, encoders serialize Heka messages into arbitrary byte streams, and outputs send data from Heka back to the outside world. Each specific plugin has some custom behaviour, but it also shares behaviour with every other plugin of that type. A UDPInput and a TCPInput listen on the network differently, and a LogstreamerInput (reading files off the file system) doesn’t listen on the network at all, but all of these inputs need to interact with the Heka system to access data structures, gain access to decoders to which we pass our incoming data, respond to shutdown and other system events, etc.
To support this, all Heka plugins except encoders actually consist of two parts: the plugin itself, and an accompanying “plugin runner”. Inputs have an InputRunner, decoders have a DecoderRunner, filters have a FilterRunner, and outputs have an OutputRunner. The plugin itself contains the plugin-specific behaviour, and is provided by the plugin developer. The plugin runner contains the shared (by type) behaviour, and is provided by Heka. When Heka starts a plugin, it a) creates and configures a plugin instance of the appropriate type, b) creates a plugin runner instance of the appropriate type (passing in the plugin), and c) calls the Start method of the plugin runner. Most plugin runners (all except decoders) then call the plugin’s Run method, passing themselves and an additional PluginHelper object in as arguments so the plugin code can use their exposed APIs to interact with the Heka system.
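To make the shape of this interaction concrete, the following is a minimal sketch of what an output plugin’s Run method might look like, assuming the OutputRunner and PluginHelper types from Heka’s pipeline package; the ExampleOutput type and its write method are invented for illustration and do not correspond to any specific Heka plugin.
// ExampleOutput is a hypothetical output plugin; write stands in for whatever
// delivery mechanism the plugin actually uses.
func (o *ExampleOutput) Run(or pipeline.OutputRunner, h pipeline.PluginHelper) error {
    // The runner's InChan() delivers each message pack routed to this output.
    for pack := range or.InChan() {
        // Serialize the message using whichever encoder is configured for
        // this output instance.
        outBytes, err := or.Encode(pack)
        if err != nil {
            or.LogError(err)
            pack.Recycle()
            continue
        }
        o.write(outBytes) // hypothetical delivery to the outside world
        pack.Recycle()    // return the pack to Heka's pool when finished
    }
    return nil
}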
For inputs, filters, and outputs, there’s a 1:1 correspondence between sections specified in the config file and running plugin instances. This is not the case for decoders and encoders, however. Decoder and encoder sections register possible configurations, but actual decoder and encoder instances aren’t created until they are used by input or output plugins.
Heka uses a slightly modified version of TOML as its configuration file format (see: Configuring hekad), and provides a simple mechanism through which plugins can integrate with the configuration loading system to initialize themselves from settings in hekad’s config file.
The minimal shared interface that a Heka plugin must implement in order to use the config system is (unsurprisingly) Plugin, defined in pipeline_runner.go:
type Plugin interface {
Init(config interface{}) error
}
During Heka initialization an instance of every plugin listed in the configuration file will be created. The TOML configuration for each plugin will be parsed and the resulting configuration object will be passed in to the above specified Init method. The argument is of type interface{}. By default the underlying type will be *pipeline.PluginConfig, a map object that provides config data as key/value pairs. There is also a way for plugins to specify a custom struct to be used instead of the generic PluginConfig type (see Custom Plugin Config Structs). In either case, the config object will be already loaded with values read in from the TOML file, which your plugin code can then use to initialize itself. The input, filter, and output plugins will then be started so they can begin processing messages. The decoder and encoder instances will be thrown away, with new ones created as needed when requested by input (for decoder) or output (for encoder) plugins.
As an example, imagine we’re writing a filter that will deliver messages to a specific output plugin, but only if they come from a list of approved hosts. Both ‘hosts’ and ‘output’ would be required in the plugin’s config section. Here’s one version of what the plugin definition and Init method might look like:
type HostFilter struct {
hosts map[string]bool
output string
}
// Extract hosts value from config and store it on the plugin instance.
func (f *HostFilter) Init(config interface{}) error {
var (
hostsConf interface{}
hosts []interface{}
host string
outputConf interface{}
ok bool
)
conf := config.(pipeline.PluginConfig)
if hostsConf, ok = conf["hosts"]; !ok {
return errors.New("No 'hosts' setting specified.")
}
if hosts, ok = hostsConf.([]interface{}); !ok {
return errors.New("'hosts' setting not a sequence.")
}
if outputConf, ok = conf["output"]; !ok {
return errors.New("No 'output' setting specified.")
}
if f.output, ok = outputConf.(string); !ok {
return errors.New("'output' setting not a string value.")
}
f.hosts = make(map[string]bool)
for _, h := range hosts {
if host, ok = h.(string); !ok {
return errors.New("Non-string host value.")
}
f.hosts[host] = true
}
return nil
}
(Note that this is a bit of a contrived example. In practice, you would generally route messages to specific outputs using the Message Matcher Syntax.)
In the event that your plugin fails to initialize properly at startup, hekad will exit. Once hekad is running, however, if the plugin should fail (perhaps because a network connection dropped, a file became unavailable, etc.), then the plugin will exit. If your plugin supports being restarted, then when it exits it will be recreated and restarted until it exhausts its max retry attempts, at which point it will exit for good and Heka will shut down unless the plugin is configured with can_exit set to true.
To add restart support to your plugin, implement the Restarting interface defined in the config.go file:
type Restarting interface {
CleanupForRestart()
}
A plugin that implements this interface will not trigger a Heka shutdown should it fail while hekad is running. The CleanupForRestart method will be called a single time, when the plugin’s main run method exits. Then the runner will repeatedly call the plugin’s Init method until it initializes successfully. The plugin will then resume running, unless it exits again, at which point the restart process will begin anew.
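As an illustration, a hypothetical plugin that holds a network connection might implement the interface as in the following sketch (the RemoteOutput type and its conn field are invented for this example):
// RemoteOutput is a hypothetical plugin that holds a network connection which
// should be released before the plugin is re-initialized.
type RemoteOutput struct {
    conn net.Conn
}

// CleanupForRestart satisfies the Restarting interface. It is called once,
// after the plugin's run method exits and before Init is called again.
func (o *RemoteOutput) CleanupForRestart() {
    if o.conn != nil {
        o.conn.Close()
        o.conn = nil
    }
}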
In simple cases it might be fine to get plugin configuration data as a generic map of keys and values, but if there are more than a couple of config settings then checking for, extracting, and validating the values quickly becomes a lot of work. Heka plugins can instead specify a schema struct for their configuration data, into which the TOML configuration will be decoded.
Plugins that wish to provide a custom configuration struct should implement the HasConfigStruct interface defined in the config.go file:
type HasConfigStruct interface {
ConfigStruct() interface{}
}
Any plugin that implements this method should return a struct that can act as the schema for the plugin configuration. Heka’s config loader will then try to decode the plugin’s TOML config into this struct. Note that this also gives you a way to specify default config values; you just populate your config struct as desired before returning it from the ConfigStruct method.
Let’s look at the code for Heka’s UdpOutput, which delivers messages to a UDP listener somewhere. The initialization code looks as follows:
// This is our plugin struct.
type UdpOutput struct {
    *UdpOutputConfig
    conn net.Conn
}

// This is our plugin's config struct
type UdpOutputConfig struct {
    // Network type ("udp", "udp4", "udp6", or "unixgram"). Needs to match the
    // input type.
    Net string
    // String representation of the address of the network connection to which
    // we will be sending out packets (e.g. "192.168.64.48:3336").
    Address string
    // Optional address to use as the local address for the connection.
    LocalAddress string `toml:"local_address"`
}

// Provides pipeline.HasConfigStruct interface.
func (o *UdpOutput) ConfigStruct() interface{} {
    return &UdpOutputConfig{
        Net: "udp",
    }
}

// Initialize UDP connection
func (o *UdpOutput) Init(config interface{}) (err error) {
    o.UdpOutputConfig = config.(*UdpOutputConfig) // assert we have the right config type
    if o.Net == "unixgram" {
        if runtime.GOOS == "windows" {
            return errors.New("Can't use Unix datagram sockets on Windows.")
        }
        var unixAddr, lAddr *net.UnixAddr
        unixAddr, err = net.ResolveUnixAddr(o.Net, o.Address)
        if err != nil {
            return fmt.Errorf("Error resolving unixgram address '%s': %s", o.Address,
                err.Error())
        }
        if o.LocalAddress != "" {
            lAddr, err = net.ResolveUnixAddr(o.Net, o.LocalAddress)
            if err != nil {
                return fmt.Errorf("Error resolving local unixgram address '%s': %s",
                    o.LocalAddress, err.Error())
            }
        }
        if o.conn, err = net.DialUnix(o.Net, lAddr, unixAddr); err != nil {
            return fmt.Errorf("Can't connect to '%s': %s", o.Address,
                err.Error())
        }
    } else {
        var udpAddr, lAddr *net.UDPAddr
        if udpAddr, err = net.ResolveUDPAddr(o.Net, o.Address); err != nil {
            return fmt.Errorf("Error resolving UDP address '%s': %s", o.Address,
                err.Error())
        }
        if o.LocalAddress != "" {
            lAddr, err = net.ResolveUDPAddr(o.Net, o.LocalAddress)
            if err != nil {
                return fmt.Errorf("Error resolving local UDP address '%s': %s",
                    o.LocalAddress, err.Error())
            }
        }
        if o.conn, err = net.DialUDP(o.Net, lAddr, udpAddr); err != nil {
            return fmt.Errorf("Can't connect to '%s': %s", o.Address,
                err.Error())
        }
    }
    return
}
In addition to specifying configuration options that are specific to your plugin, it is also possible to use the config struct to specify default values for the ticker_interval and message_matcher values that are available to all Filter and Output plugins. If a config struct contains a uint attribute called TickerInterval, that will be used as a default ticker interval value (in seconds) if none is supplied in the TOML. Similarly, if a config struct contains a string attribute called MessageMatcher, that will be used as the default message routing rule if none is specified in the configuration file.
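As a sketch of this (the SummaryFilter names here are hypothetical, not part of Heka), a filter that wants a one minute default tick and a default matcher might expose a config struct like this:
// Hypothetical filter config embedding the common defaults described above.
type SummaryFilterConfig struct {
    // Plugin specific setting.
    SummaryField string `toml:"summary_field"`
    // Default ticker_interval (in seconds) used if the TOML omits one.
    TickerInterval uint
    // Default message_matcher used if the TOML omits one.
    MessageMatcher string
}

func (f *SummaryFilter) ConfigStruct() interface{} {
    return &SummaryFilterConfig{
        SummaryField:   "Payload",
        TickerInterval: 60,
        MessageMatcher: "Type == 'summary.request'",
    }
}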
There is an optional configuration interface called WantsName. It provides a plugin access to its configured name before the runner has started. The SandboxFilter plugin uses the name to locate/load any preserved state before being run:
type WantsName interface {
    SetName(name string)
}
There is also a similar WantsPipelineConfig interface that can be used if a plugin needs access to the active PipelineConfig or GlobalConfigStruct values in the ConfigStruct or Init methods. (If these values are needed in the Run method they can be retrieved from the PluginRunner.):
type WantsPipelineConfig interface {
    SetPipelineConfig(pConfig *pipeline.PipelineConfig)
}
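A hypothetical filter that wants both pieces of information could simply store them for later use, as in this minimal sketch (the NamedFilter type is illustrative):
type NamedFilter struct {
    name    string
    pConfig *pipeline.PipelineConfig
}

// Called with the plugin's configured name before the runner starts.
func (f *NamedFilter) SetName(name string) {
    f.name = name
}

// Called with the active PipelineConfig so it is available to ConfigStruct and Init.
func (f *NamedFilter) SetPipelineConfig(pConfig *pipeline.PipelineConfig) {
    f.pConfig = pConfig
}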
Input plugins are responsible for acquiring data from the outside world and injecting this data into the Heka pipeline. An input might be passively listening for incoming network data or actively scanning external sources (either on the local machine or over a network). The input plugin interface is:
type Input interface {
    Run(ir InputRunner, h PluginHelper) (err error)
    Stop()
}
The Run method is called when Heka starts and, if all is functioning as intended, should not return until Heka is shut down. If a condition arises such that the input can not perform its intended activity it should return with an appropriate error, otherwise it should continue to run until a shutdown event is triggered by Heka calling the input’s Stop method, at which time any clean-up should be done and a clean shutdown should be indicated by returning a nil error.
Inside the Run method, an input has three primary responsibilities:
1. Acquire data from the outside world.
2. Use the acquired data to populate PipelinePack objects provided by Heka.
3. Pass the populated PipelinePack objects on to the next stage of the pipeline, usually a decoder or the message router.
The details of the first step are clearly entirely defined by the plugin’s intended input mechanism(s). Plugins can (and should!) spin up goroutines as needed to perform tasks such as listening on a network connection, making requests to external data sources, scanning machine resources and operational characteristics, reading files from a file system, etc.
For the second step, before you can populate a PipelinePack object you have to actually have one. You can get empty packs from a channel provided to you by the InputRunner. You get the channel itself by calling ir.InChan() and then pull a pack from the channel whenever you need one.
Often, populating a PipelinePack is as simple as storing the raw data that was retrieved from the outside world in the pack’s MsgBytes attribute. For efficiency’s sake, it’s best to write directly into the already allocated memory rather than overwriting the attribute with a []byte slice pointing to a new array. Overwriting the array is likely to lead to a lot of garbage collector churn.
The third step involves the input plugin deciding where next to pass the PipelinePack and then doing so. Once the MsgBytes attribute has been set the pack will typically be passed on to a decoder plugin, which will convert the raw bytes into a Message object, also an attribute of the PipelinePack. An input can gain access to the decoders that are available by calling PluginHelper.DecoderRunner, which can be used to access decoders by the name they have been registered as in the config. Each call to PluginHelper.DecoderRunner will spin up a new decoder in its own goroutine. It’s perfectly fine for an input to ask for multiple decoders; for instance the TcpInput creates one for each separate TCP connection. All decoders will be closed when Heka shuts down, but if a decoder will no longer be used (e.g. when a TCP connection is closed in the TcpInput example mentioned above) it’s a good idea to call PluginHelper.StopDecoderRunner to shut it down or else it will continue to consume system resources throughout the life of the Heka process.
It is up to the input to decide which decoder should be used. Once the decoder has been determined and fetched from the PluginHelper the input can call DecoderRunner.InChan() to fetch a DecoderRunner’s input channel upon which the PipelinePack can be placed.
Sometimes the input itself might wish to decode the data, rather than delegating that job to a separate decoder. In this case the input can directly populate the pack.Message and set the pack.Decoded value to true, as a decoder would do. Decoded messages are then injected into Heka’s routing system by calling InputRunner.Inject(pack). The message will then be delivered to the appropriate filter and output plugins.
One final important detail: if for any reason your input plugin should pull a PipelinePack off of the input channel and not end up passing it on to another step in the pipeline (i.e. to a decoder or to the router), you must call PipelinePack.Recycle() to free the pack up to be used again. Failure to do so will cause the PipelinePack pool to be depleted and will cause Heka to freeze.
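Putting the pieces together, a minimal input sketch might look like the following. Everything here other than the InputRunner, PluginHelper, DecoderRunner, and PipelinePack types is hypothetical (the ExampleInput struct, its channels, and the hard coded decoder name), and the exact DecoderRunner lookup signature varies between Heka versions, so treat this as an illustration of the flow rather than a drop-in implementation:
type ExampleInput struct {
    dataChan chan []byte // fed by a reader goroutine started in Init (not shown)
    stopChan chan bool   // closed by Stop() to signal shutdown
}

func (in *ExampleInput) Run(ir InputRunner, h PluginHelper) (err error) {
    // Assumption: the decoder name would normally come from this input's config.
    dRunner, ok := h.DecoderRunner("ProtobufDecoder")
    if !ok {
        return errors.New("no registered decoder: ProtobufDecoder")
    }
    packSupply := ir.InChan()
    for {
        select {
        case <-in.stopChan:
            return nil
        case data := <-in.dataChan:
            pack := <-packSupply
            // Reuse the pack's existing buffer to avoid GC churn; a real
            // input would also handle records larger than the buffer.
            pack.MsgBytes = pack.MsgBytes[:cap(pack.MsgBytes)]
            n := copy(pack.MsgBytes, data)
            pack.MsgBytes = pack.MsgBytes[:n]
            dRunner.InChan() <- pack
        }
    }
}

func (in *ExampleInput) Stop() {
    close(in.stopChan)
}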
Decoder plugins are responsible for converting raw bytes containing message data into actual Message struct objects that the Heka pipeline can process. As with inputs, the Decoder interface is quite simple:
type Decoder interface {
    Decode(pack *PipelinePack) (packs []*PipelinePack, err error)
}
There are two optional Decoder interfaces. The first provides the Decoder access to its DecoderRunner object when it is started:
type WantsDecoderRunner interface {
    SetDecoderRunner(dr DecoderRunner)
}
The second provides a notification to the Decoder when the DecoderRunner is exiting:
type WantsDecoderRunnerShutdown interface {
    Shutdown()
}
A decoder’s Decode method should extract the raw message data from pack.MsgBytes and attempt to deserialize this and use the contained information to populate the Message struct pointed to by the pack.Message attribute. Again, to minimize GC churn, take care to reuse the already allocated memory rather than creating new objects and overwriting the existing ones.
If the message bytes are decoded successfully then Decode should return a slice of PipelinePack pointers and a nil error value. The first item in the returned slice (i.e. packs[0]) should be the pack that was passed in to the method. If the decoding process produces more than one output pack, additional packs can be appended to the slice.
If decoding fails for any reason, then Decode should return a nil value for the PipelinePack slice, causing the message to be dropped with no further processing. Returning an appropriate error value will cause Heka to log an error message about the decoding failure.
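For illustration, here is a minimal sketch of a decoder that treats each record as a single line of text; the LineDecoder type is hypothetical, and the SetType/SetPayload calls assume the usual setter methods on the Message struct:
type LineDecoder struct{}

func (d *LineDecoder) Init(config interface{}) error {
    return nil
}

func (d *LineDecoder) Decode(pack *PipelinePack) (packs []*PipelinePack, err error) {
    if len(pack.MsgBytes) == 0 {
        // Returning a nil slice drops the message; the error is logged by Heka.
        return nil, errors.New("empty record")
    }
    pack.Message.SetType("plain.line")
    pack.Message.SetPayload(string(pack.MsgBytes))
    // The pack we received must be the first element of the returned slice.
    return []*PipelinePack{pack}, nil
}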
Filter plugins are the message processing engine of the Heka system. They are used to examine and process message contents, and trigger events based on those contents in real time as messages are flowing through the Heka system.
The filter plugin interface is just a single method:
type Filter interface {
    Run(r FilterRunner, h PluginHelper) (err error)
}
Like input plugins, filters have a Run method which accepts a runner and a helper, and which should not return until shutdown unless there’s an error condition. And like input plugins, filters should call runner.InChan() to gain access to the plugin’s input channel.
The similarities end there, however. A filter’s input channel provides pointers to PipelinePack objects, defined in pipeline_runner.go
The Pack contains a fully decoded Message object from which the filter can extract any desired information.
Upon processing a message, a filter plugin can perform any of three tasks:
1. Pass the message through, unchanged, to one or more other filter or output plugins.
2. Generate one or more new messages and deliver them to specific plugins or to the message router.
3. Collect or aggregate data from the message, to be reported at a later time (typically on a timed ticker).
To pass a message through unchanged, a filter can call PluginHelper.Filter() or PluginHelper.Output() to access a filter or output plugin, and then call that plugin’s Deliver() method, passing in the PipelinePack.
To generate new messages, your filter must call PluginHelper.PipelinePack(msgLoopCount int). The msgLoopCount value to be passed in should be obtained from the MsgLoopCount value on the PipelinePack that you’re already holding, or possibly zero if the new message is being triggered by a timed ticker instead of an incoming message. The PipelinePack method will either return a pack ready for you to populate or nil if the loop count is greater than the configured maximum value, as a safeguard against inadvertently creating infinite message loops.
Once a PipelinePack has been obtained, a filter plugin can populate its Message object. The pack can then be passed along to a specific plugin (or plugins) as above. Alternatively, the pack can be injected into the Heka message router queue, where it will be checked against all plugin message matchers, by passing it to the FilterRunner.Inject(pack *PipelinePack) method. Note that, again as a precaution against message looping, a plugin will not be allowed to inject a message which would get a positive response from that plugin’s own matcher.
Sometimes a filter will take a specific action triggered by a single incoming message. There are many cases, however, when a filter is merely collecting or aggregating data from the incoming messages, and instead will be sending out reports on the data that has been collected at specific intervals. Heka has built-in support for this use case. Any filter (or output) plugin can include a ticker_interval config setting (in seconds, integers only), which will automatically be extracted by Heka when the configuration is loaded. Then from within your plugin code you can call FilterRunner.Ticker() and you will get a channel (type <-chan time.Time) that will send a tick at the specified interval. Your plugin code can listen on the ticker channel and take action as needed.
Observant readers might have noticed that, unlike the Input interface, filters don’t need to implement a Stop method. Instead, Heka will communicate a shutdown event to filter plugins by closing the input channel from which the filter is receiving the PipelinePack objects. When this channel is closed, a filter should perform any necessary clean-up and then return from the Run method with a nil value to indicate a clean exit.
Finally, there is one very important point that all authors of filter plugins should keep in mind: if you are not passing your received PipelinePack object on to another filter or output plugin for further processing, then you must call PipelinePack.Recycle() to tell Heka that you are through with the pack. Failure to do so will cause Heka to not free up the packs for reuse, exhausting the supply and eventually causing the entire pipeline to freeze.
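A minimal sketch of a filter Run loop follows; the CountFilter type is hypothetical, and the setter calls used when building the summary message are assumptions for illustration rather than a prescribed recipe:
type CountFilter struct {
    count int64
}

func (f *CountFilter) Init(config interface{}) error {
    return nil
}

func (f *CountFilter) Run(fr FilterRunner, h PluginHelper) (err error) {
    inChan := fr.InChan()
    ticker := fr.Ticker()
    ok := true
    var pack *PipelinePack
    for ok {
        select {
        case pack, ok = <-inChan:
            if !ok {
                break // input channel closed: Heka is shutting down
            }
            f.count++
            pack.Recycle() // we are not passing this pack along
        case <-ticker:
            // Summary messages triggered by the ticker use a zero loop count.
            if sumPack := h.PipelinePack(0); sumPack != nil {
                sumPack.Message.SetType("countfilter.summary")
                sumPack.Message.SetPayload(fmt.Sprintf("%d messages", f.count))
                fr.Inject(sumPack)
            }
            f.count = 0
        }
    }
    return nil
}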
Encoder plugins are the inverse of decoders. They convert Message structs into raw bytes that can be delivered to the outside world. Some encoders will serialize an entire Message struct, such as the ProtobufEncoder which uses Heka’s native protocol buffers format. Other encoders extract data from the message and insert it into a different format such as plain text or JSON.
The Encoder interface consists of one method:
Encode(pack *PipelinePack) (output []byte, err error)
This method accepts a PipelinePack containing a populated message object and returns a byte slice containing the data that should be sent out, or an error if serialization fails for some reason. If the encoder wishes to swallow an input message without generating any output (such as for batching, or because the message contains no new data) then nil should be returned for both the output and the error.
Unlike the other plugin types, encoders don’t have a PluginRunner, nor do they run in their own goroutines. Outputs invoke encoders directly, by calling the Encode method exposed on the OutputRunner. This has the same signature as the Encoder interface’s Encode method, to which it will delegate. If use_framing is set to true in the output’s configuration, however, the OutputRunner will prepend Heka’s Stream Framing to the generated binary data.
Outputs can also directly access their encoder instance by calling OutputRunner.Encoder(). Encoders themselves don’t handle the stream framing, however, so it is recommended that outputs use the OutputRunner method instead.
Even though encoders don’t run in their own goroutines, it is possible that they might need to perform some clean up at shutdown time. If this is so, the encoder can implement the NeedsStopping interface:
type NeedsStopping interface {
    Stop()
}
And the Stop method will be called during the shutdown sequence.
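As an illustration, here is a minimal sketch of an encoder that emits the message payload followed by a newline and swallows messages with an empty payload (the NewlinePayloadEncoder name is hypothetical):
type NewlinePayloadEncoder struct{}

func (e *NewlinePayloadEncoder) Init(config interface{}) error {
    return nil
}

func (e *NewlinePayloadEncoder) Encode(pack *PipelinePack) (output []byte, err error) {
    payload := pack.Message.GetPayload()
    if payload == "" {
        return nil, nil // swallow the message: no output, no error
    }
    return append([]byte(payload), '\n'), nil
}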
Finally we come to the output plugins, which are responsible for receiving Heka messages and using them to generate interactions with the outside world. The Output interface is nearly identical to the Filter interface:
type Output interface {
    Run(or OutputRunner, h PluginHelper) (err error)
}
In fact, there are many ways in which filter and output plugins are similar. Like filters, outputs should call the InChan method on the provided runner to get an input channel, which will feed PipelinePack objects. Like filters, outputs should listen on this channel until it is closed, at which time they should perform any necessary clean-up and then return. And, like filters, any output plugin with a ticker_interval value in the configuration will use that value to create a ticker channel that can be accessed using the runner’s Ticker method. And, finally, outputs should also be sure to call PipelinePack.Recycle() when they finish with a pack so that Heka knows the pack is freed up for reuse.
The primary way that outputs differ from filters, of course, is that outputs need to serialize (or extract data from) the messages they receive and then send that data to an external destination. The serialization (or data extraction) should typically be performed by the output’s specified encoder plugin. The OutputRunner exposes the following methods to assist with this:
Encode(pack *PipelinePack) (output []byte, err error)
UsesFraming() bool
Encoder() (encoder Encoder)
The Encode method will use the specified encoder to convert the pack’s message to binary data, then if use_framing was set to true in the output’s configuration it will prepend Heka’s Stream Framing. The UsesFraming method will tell you whether or not use_framing was set to true. Finally, the Encoder method will return the actual encoder that was registered. This is useful to check to make sure that an encoder was actually registered, but generally you will want to use OutputRunner.Encode and not Encoder.Encode, since the latter will not honor the output’s use_framing specification.
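For illustration, a minimal output Run loop might look like the following; the WriterOutput type and its writer field are hypothetical, and the LogError call assumes the runner's standard error logging helper:
type WriterOutput struct {
    writer io.Writer // set up in Init (not shown)
}

func (o *WriterOutput) Run(or OutputRunner, h PluginHelper) (err error) {
    var outBytes []byte
    for pack := range or.InChan() {
        outBytes, err = or.Encode(pack) // honors the output's use_framing setting
        pack.Recycle()
        if err != nil {
            or.LogError(err)
            err = nil
            continue
        }
        if outBytes == nil {
            continue // the encoder swallowed this message
        }
        if _, err = o.writer.Write(outBytes); err != nil {
            return err
        }
    }
    return nil
}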
The last step you have to take after implementing your plugin is to register it with hekad so it can actually be configured and used. You do this by calling the pipeline package’s RegisterPlugin function:
func RegisterPlugin(name string, factory func() interface{})
The name value should be a unique identifier for your plugin, and it should end in one of “Input”, “Decoder”, “Filter”, “Encoder”, or “Output”, depending on the plugin type.
The factory value should be a function that returns an instance of your plugin, usually a pointer to a struct, where the pointer type implements the Plugin interface and the interface appropriate to its type (i.e. Input, Decoder, Filter, Encoder, or Output).
This sounds more complicated than it is. Here are some examples from Heka itself:
RegisterPlugin("UdpInput", func() interface{} {return new(UdpInput)})
RegisterPlugin("TcpInput", func() interface{} {return new(TcpInput)})
RegisterPlugin("ProtobufDecoder", func() interface{} {return new(ProtobufDecoder)})
RegisterPlugin("CounterFilter", func() interface{} {return new(CounterFilter)})
RegisterPlugin("StatFilter", func() interface{} {return new(StatFilter)})
RegisterPlugin("LogOutput", func() interface{} {return new(LogOutput)})
RegisterPlugin("FileOutput", func() interface{} {return new(FileOutput)})
It is recommended that RegisterPlugin calls be put in your Go package’s init() function so that you can simply import your package when building hekad and the package’s plugins will be registered and available for use in your Heka config file. This is made a bit easier if you use plugin_loader.cmake, see Building hekad with External Plugins.
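For example, a package providing the hypothetical WriterOutput sketched above might register it like this (the package name is illustrative):
package myplugins // hypothetical package name

import "github.com/mozilla-services/heka/pipeline"

func init() {
    pipeline.RegisterPlugin("WriterOutput", func() interface{} {
        return new(WriterOutput)
    })
}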
name (required, string) - Name of the field (key).
representation (optional, string) - Freeform metadata string where you can describe what the data in this field represents. This information might provide cues to assist with processing, labeling, or rendering of the data performed by downstream plugins or UI elements. Examples of common usage follow:
- Numeric value representation - In most cases it is the unit.
- count - It is a standard practice to use ‘count’ for raw values with no units.
- KiB
- mm
- String value representation - Ideally it should reference a formal specification but you are free to create your own vocabulary.
- date-time RFC 3339, section 5.6
- email RFC 5322, section 3.4.1
- hostname RFC 1034, section 3.1
- ipv4 RFC 2673, section 3.2
- ipv6 RFC 2373, section 2.2
- uri RFC 3986
- How the representation is/can be used
- data parsing and validation
- unit conversion i.e., B to KiB
- presentation i.e., graph labels, HTML links
value_* (optional, value_type) - Array of values, only one type will be active at a time.
Heka has some custom framing that can be used to delimit records when generating a stream of binary data. The entire structure encapsulating a single message consists of a one byte record separator, one byte representing the header length, a protobuf encoded message header, a one byte unit separator, and the binary record content (usually a protobuf encoded Heka message).
The header schema is as follows:
Clients interested in decoding a Heka stream will need to read the header length byte to determine the length of the header, extract the encoded header data and decode this into a Header structure using an appropriate protobuf library. From this they can then extract the length of the encoded message data, which can then be extracted from the data stream and processed and/or decoded as needed.
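As a rough sketch of how a client might walk one framed record in a buffer: the separator byte values shown here are assumptions, and the header decoding is left as a comment, so consult the framing and Header definitions in the Heka source for the authoritative constants and schema.
const (
    recordSeparator = 0x1e // assumed value of the record separator byte
    unitSeparator   = 0x1f // assumed value of the unit separator byte
)

// splitRecord returns the encoded header bytes and the remaining buffer
// contents for a buffer positioned at the start of a framed record.
func splitRecord(buf []byte) (header []byte, rest []byte, err error) {
    if len(buf) < 3 || buf[0] != recordSeparator {
        return nil, nil, errors.New("not positioned at a record separator")
    }
    headerLen := int(buf[1])
    headerEnd := 2 + headerLen
    if len(buf) <= headerEnd || buf[headerEnd] != unitSeparator {
        return nil, nil, errors.New("malformed record framing")
    }
    header = buf[2:headerEnd]
    // Decode `header` with a protobuf library to learn the message length,
    // then read exactly that many bytes following the unit separator.
    rest = buf[headerEnd+1:]
    return header, rest, nil
}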
Message matching is done by the hekad router to choose the appropriate filter(s) to run. Every filter whose matcher matches will get a copy of the message.
All message variables must be on the left hand side of the relational comparison.
See also
Sandboxes are Heka plugins that are implemented in a sandboxed scripting language. They provide a dynamic and isolated execution environment for data parsing, transformation, and analysis. They allow real time access to data in production without jeopardizing the integrity or performance of the monitoring infrastructure and do not require Heka to be recompiled. This broadens the audience that the data can be exposed to and facilitates new uses of the data (e.g. debugging, monitoring, dynamic provisioning, SLA analysis, intrusion detection, ad-hoc reporting, etc.).
- small - memory requirements are about 16 KiB for a basic sandbox
- fast - microsecond execution times
- stateful - ability to resume where it left off after a restart/reboot
- isolated - failures are contained and malfunctioning sandboxes are terminated
The Lua sandbox provides full access to the Lua language in a sandboxed environment under hekad that enforces configurable restrictions.
See also
process_message()
Called by Heka when a message is available to the sandbox. The instruction_limit configuration parameter is applied to this function call.
timer_event(ns)
Called by Heka when the ticker_interval expires. The instruction_limit configuration parameter is applied to this function call. This function is only required in SandboxFilters (SandboxDecoders do not support timer events).
See: https://github.com/mozilla-services/lua_sandbox/blob/master/docs/sandbox_api.md
require(libraryName)
add_to_payload(arg1, ..., argN)
Appends the arguments to the payload buffer for incremental construction of the final payload output (inject_payload finalizes the buffer and sends the message to Heka). This function is simply a rename of the generic sandbox output function to improve the readability of the plugin code.
read_config(variableName)
Provides access to the sandbox configuration variables.
read_message(variableName, fieldIndex, arrayIndex)
Provides access to the Heka message data. Note that both fieldIndex and arrayIndex are zero-based (i.e. the first element is 0) as opposed to Lua’s standard indexing, which is one-based.
write_message(variableName, value, representation, fieldIndex, arrayIndex)
New in version 0.5.
Decoders only. Mutates the specified field value on the message that is being decoded. The following message variables are writable:
- Uuid (accepts raw bytes or RFC4122 string representation)
- Type (string)
- Logger (string)
- Payload (string)
- EnvVersion (string)
- Hostname (string)
- Timestamp (number, or date/time parseable string representations)
- Severity (number or int-parseable string)
- Pid (number or int-parseable string)
- Fields[_name_] (field type determined by value type: bool, number, or string)
read_next_field()
Iterates through the message fields, returning the field contents or nil when the end is reached.
inject_payload(payload_type, payload_name, arg3, ..., argN)
Creates a new Heka message using the contents of the payload buffer (pre-populated with add_to_payload) combined with any additional payload_args passed to this function. The output buffer is cleared after the injection. The payload_type and payload_name arguments are two pieces of optional metadata. If specified, they will be included as fields in the injected message e.g., Fields[payload_type] == ‘csv’, Fields[payload_name] == ‘Android Usage Statistics’. The number of messages that may be injected by the process_message or timer_event functions are globally controlled by the hekad global configuration options; if these values are exceeded the sandbox will be terminated.
- Arguments
- payload_type (optional, default “txt” string) Describes the content type of the injected payload data.
- payload_name (optional, default “” string) Names the content to aid in downstream filtering.
- arg3 (optional) Same type restrictions as add_to_payload.
- ...
- argN
- Return
- none
inject_message(message_table)
Creates a new Heka protocol buffer message using the contents of the specified Lua table (overwriting whatever is in the output buffer). Notes about message fields:
- Timestamp is automatically generated if one is not provided. Nanoseconds since the UNIX epoch is the only valid format.
- UUID is automatically generated; anything provided by the user is ignored.
- Hostname and Logger are automatically set by the SandboxFilter and cannot be overridden.
- Type is prepended with "heka.sandbox." by the SandboxFilter to avoid data confusion/mis-representation.
- name=value i.e., foo="bar"; foo=1; foo=true
- name={array} i.e., foo={"b", "a", "r"}
{
    Uuid = "data", -- always ignored
    Logger = "nginx", -- ignored in the SandboxFilter
    Hostname = "bogus.mozilla.com", -- ignored in the SandboxFilter
    Timestamp = 1e9,
    Type = "TEST", -- will become "heka.sandbox.TEST" in the SandboxFilter
    Payload = "Test Payload",
    EnvVersion = "0.8",
    Pid = 1234,
    Severity = 6,
    Fields = {
        http_status = 200,
        request_size = {value=1413, representation="B"}
    }
}
function process_message()
    return 0
end

function timer_event(ns)
end
require "string"
total = 0 -- preserved between restarts since it is in global scope
local count = 0 -- local scope so this will not be preserved
function process_message()
total= total + 1
count = count + 1
return 0
end
function timer_event(ns)
count = 0
inject_payload("txt", "",
string.format("%d messages in the last minute; total=%d", count, total))
end
[demo_counter]
type = "SandboxFilter"
message_matcher = "Type == 'demo'"
ticker_interval = 60
filename = "counter.lua"
preserve_data = true
4. Extending the business logic (count the number of ‘demo’ events per minute per device)
require "string"
device_counters = {}
function process_message()
local device_name = read_message("Fields[DeviceName]")
if device_name == nil then
device_name = "_unknown_"
end
local dc = device_counters[device_name]
if dc == nil then
dc = {count = 1, total = 1}
device_counters[device_name] = dc
else
dc.count = dc.count + 1
dc.total = dc.total + 1
end
return 0
end
function timer_event(ns)
add_to_payload("#device_name\tcount\ttotal\n")
for k, v in pairs(device_counters) do
add_to_payload(string.format("%s\t%d\t%d\n", k, v.count, v.total))
v.count = 0
end
inject_payload()
end
The SandboxManagerFilter provides dynamic control (start/stop) of sandbox filters in a secure manner without stopping the Heka daemon. Commands are sent to a SandboxManagerFilter using a signed Heka message. The intent is to have one manager per access control group, each with its own message signing key. Users in each group can submit a signed control message to manage any filters running under the associated manager. A signed message is not an enforced requirement, but it is highly recommended in order to restrict access to this functionality.
The directory where the filter configurations, code, and states are preserved. The directory can be unique or shared between sandbox managers since the filter names are unique per manager. Defaults to a directory in ${BASE_DIR}/sbxmgrs with a name generated from the plugin name.
The directory where ‘require’ will attempt to load the external Lua modules from. Defaults to ${SHARE_DIR}/lua_modules.
The maximum number of filters this manager can run.
New in version 0.5.
The number of bytes managed sandboxes are allowed to consume before being terminated (default 8MiB).
The number of instructions managed sandboxes are allowed to execute during the process_message/timer_event functions before being terminated (default 1M).
The number of bytes managed sandbox output buffers can hold before being terminated (default 63KiB). Warning: messages exceeding 64KiB will generate an error and be discarded by the standard output plugins (File, TCP, UDP) since they exceed the maximum message size.
Example
[OpsSandboxManager]
type = "SandboxManagerFilter"
message_signer = "ops"
# message_matcher = "Type == 'heka.control.sandbox'" # automatic default setting
max_filters = 100
The sandbox manager control message is a regular Heka message with the following variables set to the specified values.
Starting a SandboxFilter
Stopping a SandboxFilter
Heka Sbmgr is a tool for managing (starting/stopping) sandbox filters by generating the control messages defined above.
Command Line Options
heka-sbmgr [-config config_file] [-action load|unload] [-filtername specified on unload] [-script sandbox script filename] [-scriptconfig sandbox script configuration filename]
Configuration Variables
ip_address (string): IP address of the Heka server.
use_tls (bool): Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.
tls (TlsConfig): A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
Example
ip_address = "127.0.0.1:5565"
use_tls = true
[signer]
name = "test"
hmac_hash = "md5"
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
version = 0
[tls]
cert_file = "heka.crt"
key_file = "heka.key"
client_auth = "NoClientCert"
prefer_server_ciphers = true
min_version = "TLS11"
Heka Sbmgrload is a test tool for starting/stopping a large number of sandboxes. The script and configuration are built into the tool and the filters will be named: CounterSandboxN where N is the instance number.
Command Line Options
heka-sbmgrload [-config config_file] [-action load|unload] [-num number of sandbox instances]
Configuration Variables (same as heka-sbmgr)
The SandboxManagerFilters are defined in the hekad configuration file and are created when hekad starts. The manager provides a location/namespace for SandboxFilters to run and controls access to this space via a signed Heka message. By associating a message_signer with the manager we can restrict who can load and unload the associated filters. Let’s start by configuring a SandboxManager for a specific set of users: platform developers. Choose a unique filter name, [PlatformDevs], and a signer name, “PlatformDevs”; in this case we will use the same name for each.
[PlatformDevs]
type = "SandboxManagerFilter"
message_signer = "PlatformDevs"
working_directory = "/var/heka/sandbox"
max_filters = 100
For this setup we will extend the current TCP input to handle our signed messages. The signer section consists of the signer name followed by an underscore and the key version number (the reason for this notation is to simply flatten the signer configuration structure into a single map). Multiple key versions are allowed to be active at the same time facilitating the rollout of new keys.
[TCP:5565]
type = "TcpInput"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
address = ":5565"
[TCP:5565.signer.PlatformDevs_0]
hmac_key = "Old Platform devs signing key"
[TCP:5565.signer.PlatformDevs_1]
hmac_key = "Platform devs signing key"
3. Configure the sandbox manager utility (sbmgr). The signer information must exactly match the values in the input configuration above otherwise the messages will be discarded. Save the file as PlatformDevs.toml.
ip_address = ":5565"
[signer]
name = "PlatformDevs"
hmac_hash = "md5"
hmac_key = "Platform devs signing key"
version = 1
require "circular_buffer"
data = circular_buffer.new(1440, 1, 60) -- message count per minute
local COUNT = data:set_header(1, "Messages", "count")
function process_message ()
local ts = read_message("Timestamp")
data:add(ts, COUNT, 1)
return 0
end
function timer_event(ns)
inject_payload("cbuf", "", data)
end
The only difference between a static and dynamic SandboxFilter configuration is the filename. In the dynamic configuration it can be left blank or left out entirely. The manager will assign the filter a unique system wide name, in this case, “PlatformDevs-Example”.
[Example]
type = "SandboxFilter"
message_matcher = "Type == 'Widget'"
ticker_interval = 60
filename = ""
preserve_data = false
sbmgr -action=load -config=PlatformDevs.toml -script=example.lua -scriptconfig=example.toml
If you are running the DashboardOutput the following links are available:
Otherwise
Note
A running filter cannot be ‘reloaded’; it must be unloaded and loaded again. During the unload/load process some data can be missed and gaps will be created. In the future we hope to remedy this, but for now it is a limitation of the dynamic sandbox.
sbmgr -action=unload -config=PlatformDevs.toml -filtername=Example
The SandboxDecoder provides an isolated execution environment for data parsing and complex transformations without the need to recompile Heka. See Sandbox.
Config:
Example
[sql_decoder]
type = "SandboxDecoder"
filename = "sql_decoder.lua"
Parses the Apache access logs based on the Apache ‘LogFormat’ configuration directive. The Apache format specifiers are mapped onto the Nginx variable names where applicable e.g. %a -> remote_addr. This allows generic web filters and outputs to work with any HTTP server input.
Config:
The ‘LogFormat’ configuration directive from the apache2.conf. %t variables are converted to the number of nanoseconds since the Unix epoch and used to set the Timestamp on the message. http://httpd.apache.org/docs/2.4/mod/mod_log_config.html
Sets the message ‘Type’ header to the specified value
Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.
Always preserve the http_user_agent value if transform is enabled.
Only preserve the http_user_agent value if transform is enabled and fails.
Always preserve the original log line in the message payload.
Example Heka Configuration
[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/apache"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"
[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/apache_access.lua"
[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
# common log format
# log_format = '%h %l %u %t \"%r\" %>s %O'
# vhost_combined log format
# log_format = '%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
# referer log format
# log_format = '%{Referer}i -> %U'
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: combined
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserver
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"remote_user" value_string:"-"
    name:"http_x_forwarded_for" value_string:"-"
    name:"http_referer" value_string:"-"
    name:"body_bytes_sent" value_type:DOUBLE representation:"B" value_double:82
    name:"remote_addr" value_string:"62.195.113.219" representation:"ipv4"
    name:"status" value_type:DOUBLE value_double:200
    name:"request" value_string:"GET /v1/recovery_email/status HTTP/1.1"
    name:"user_agent_os" value_string:"FirefoxOS"
    name:"user_agent_browser" value_string:"Firefox"
    name:"user_agent_version" value_type:DOUBLE value_double:29
Parses a payload containing JSON in the Graylog2 Extended Format specification. http://graylog2.org/resources/gelf/specification
Config:
Sets the message ‘Type’ header to the specified value
Always preserve the original log line in the message payload.
Example of Graylog2 Extended Format Log
{
"version": "1.1",
"host": "rogueethic.com",
"short_message": "This is a short message to identify what is going on.",
"full_message": "An entire backtrace\ncould\ngo\nhere",
"timestamp": 1385053862.3072,
"level": 1,
"_user_id": 9001,
"_some_info": "foo",
"_some_env_var": "bar"
}
Example Heka Configuration
[GELFLogInput]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = 'application\.gelf'
decoder = "GraylogDecoder"
[GraylogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/graylog_decoder.lua"
[GraylogDecoder.config]
type = "gelf"
payload_keep = true
Parses a payload containing the contents of a /sys/block/$DISK/stat file (where $DISK is a disk identifier such as sda) into a Heka message struct. This also tries to obtain the TickerInterval of the input it received the data from, by extracting it from a message field named TickerInterval.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[DiskStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/sys/block/sda1/stat"
decoder = "DiskStatsDecoder"
[DiskStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_diskstats.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.diskstats
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"ReadsCompleted" value_type:DOUBLE value_double:"20123"
    name:"ReadsMerged" value_type:DOUBLE value_double:"11267"
    name:"SectorsRead" value_type:DOUBLE value_double:"1.094968e+06"
    name:"TimeReading" value_type:DOUBLE value_double:"45148"
    name:"WritesCompleted" value_type:DOUBLE value_double:"1278"
    name:"WritesMerged" value_type:DOUBLE value_double:"1278"
    name:"SectorsWritten" value_type:DOUBLE value_double:"206504"
    name:"TimeWriting" value_type:DOUBLE value_double:"3348"
    name:"TimeDoingIO" value_type:DOUBLE value_double:"4876"
    name:"WeightedTimeDoingIO" value_type:DOUBLE value_double:"48356"
    name:"NumIOInProgress" value_type:DOUBLE value_double:"3"
    name:"TickerInterval" value_type:DOUBLE value_double:"2"
    name:"FilePath" value_string:"/sys/block/sda/stat"
Parses a payload containing the contents of a /proc/loadavg file into a Heka message.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[LoadAvg]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/loadavg"
decoder = "LoadAvgDecoder"
[LoadAvgDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_loadavg.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.loadavg
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"1MinAvg" value_type:DOUBLE value_double:"3.05"
    name:"5MinAvg" value_type:DOUBLE value_double:"1.21"
    name:"15MinAvg" value_type:DOUBLE value_double:"0.44"
    name:"NumProcesses" value_type:DOUBLE value_double:"11"
    name:"FilePath" value_string:"/proc/loadavg"
Parses a payload containing the contents of a /proc/meminfo file into a Heka message.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"
[MemStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_memstats.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.memstats
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"MemTotal" value_type:DOUBLE representation:"kB" value_double:"4047616"
    name:"MemFree" value_type:DOUBLE representation:"kB" value_double:"3432216"
    name:"Buffers" value_type:DOUBLE representation:"kB" value_double:"82028"
    name:"Cached" value_type:DOUBLE representation:"kB" value_double:"368636"
    name:"FilePath" value_string:"/proc/meminfo"
The total available fields can be found in man procfs. All fields are of type double, and the representation is in kB (except for the HugePages fields). Here is a full list of fields available:
MemTotal, MemFree, Buffers, Cached, SwapCached, Active, Inactive, Active(anon), Inactive(anon), Active(file), Inactive(file), Unevictable, Mlocked, SwapTotal, SwapFree, Dirty, Writeback, AnonPages, Mapped, Shmem, Slab, SReclaimable, SUnreclaim, KernelStack, PageTables, NFS_Unstable, Bounce, WritebackTmp, CommitLimit, Committed_AS, VmallocTotal, VmallocUsed, VmallocChunk, HardwareCorrupted, AnonHugePages, HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, DirectMap4k, DirectMap2M, DirectMap1G.
Note that your available fields may have a slight variance depending on the system’s kernel version.
Parses and transforms the MySQL slow query logs. Use mariadb_slow_query.lua to parse the MariaDB variant of the MySQL slow query logs.
Config:
Truncates the SQL payload to the specified number of bytes (not UTF-8 aware) and appends "...". If the value is nil no truncation is performed. A negative value will truncate the specified number of bytes from the end.
Example Heka Configuration
[Sync-1_5-SlowQuery]
type = "LogstreamerInput"
log_directory = "/var/log/mysql"
file_match = 'mysql-slow\.log'
parser_type = "regexp"
delimiter = "\n(# User@Host:)"
delimiter_location = "start"
decoder = "MySqlSlowQueryDecoder"
[MySqlSlowQueryDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/mysql_slow_query.lua"
[MySqlSlowQueryDecoder.config]
truncate_sql = 64
Example Heka Message
Timestamp: 2014-05-07 15:51:28 -0700 PDT
Type: mysql.slow-query
Hostname: 127.0.0.1
Pid: 0
UUID: 5324dd93-47df-485b-a88e-429f0fcd57d6
Logger: Sync-1_5-SlowQuery
Payload: /* [queryName=FIND_ITEMS] */ SELECT bso.userid, bso.collection, ...
EnvVersion:
Severity: 7
Fields:
    name:"Rows_examined" value_type:DOUBLE value_double:16458
    name:"Query_time" value_type:DOUBLE representation:"s" value_double:7.24966
    name:"Rows_sent" value_type:DOUBLE value_double:5001
    name:"Lock_time" value_type:DOUBLE representation:"s" value_double:0.047038
Parses the Nginx access logs based on the Nginx ‘log_format’ configuration directive.
Config:
The ‘log_format’ configuration directive from the nginx.conf. The $time_local or $time_iso8601 variable is converted to the number of nanoseconds since the Unix epoch and used to set the Timestamp on the message. http://nginx.org/en/docs/http/ngx_http_log_module.html
Sets the message ‘Type’ header to the specified value
Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.
Always preserve the http_user_agent value if transform is enabled.
Only preserve the http_user_agent value if transform is enabled and fails.
Always preserve the original log line in the message payload.
Example Heka Configuration
[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"
[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: combined
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserver
Payload:
EnvVersion:
Severity: 7
Fields:
    name:"remote_user" value_string:"-"
    name:"http_x_forwarded_for" value_string:"-"
    name:"http_referer" value_string:"-"
    name:"body_bytes_sent" value_type:DOUBLE representation:"B" value_double:82
    name:"remote_addr" value_string:"62.195.113.219" representation:"ipv4"
    name:"status" value_type:DOUBLE value_double:200
    name:"request" value_string:"GET /v1/recovery_email/status HTTP/1.1"
    name:"user_agent_os" value_string:"FirefoxOS"
    name:"user_agent_browser" value_string:"Firefox"
    name:"user_agent_version" value_type:DOUBLE value_double:29
Parses the Nginx error logs based on the Nginx hard coded internal format.
Config:
The conversion actually happens on the Go side since there isn’t good TZ support here.
Example Heka Configuration
[TestWebserverError]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'error\.log'
decoder = "NginxErrorDecoder"
[NginxErrorDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_error.lua"
[NginxErrorDecoder.config]
tz = "America/Los_Angeles"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: nginx.error
Hostname: trink-x230
Pid: 16842
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserverError
Payload: using inherited sockets from "6;"
EnvVersion:
Severity: 5
Fields:
    name:"tid" value_type:DOUBLE value_double:0
    name:"connection" value_type:DOUBLE value_double:8878
Parses the rsyslog output using the string based configuration template.
Config:
The ‘template’ configuration string from rsyslog.conf. http://rsyslog-5-8-6-doc.neocities.org/rsyslog_conf_templates.html
If your rsyslog timestamp field in the template does not carry zone offset information, you may set an offset to be applied to your events here. Typically this would be used with the “Traditional” rsyslog formats.
Parsing is done by Go, supports values of “UTC”, “Local”, or a location name corresponding to a file in the IANA Time Zone database, e.g. “America/New_York”.
Example Heka Configuration
[RsyslogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/rsyslog.lua"
[RsyslogDecoder.config]
type = "RSYSLOG_TraditionalFileFormat"
template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
tz = "America/Los_Angeles"
Example Heka Message
Timestamp: 2014-02-10 12:58:58 -0800 PST
Type: RSYSLOG_TraditionalFileFormat
Hostname: trink-x230
Pid: 0
UUID: e0eef205-0b64-41e8-a307-5772b05e16c1
Logger: RsyslogInput
Payload: "imklog 5.8.6, log source = /proc/kmsg started."
EnvVersion:
Severity: 7
Fields:
    name:"programname" value_string:"kernel"
Stores the last alert time in the global _LAST_ALERT so alert throttling will persist between restarts.
Queue an alert message to be sent.
Send an alert message.
Sends all queued alert messages as a single message.
Sets the minimum duration between alert event outputs.
Test to see if sending an alert at this time would be throttled.
Note
Use a zero timestamp to override message throttling.
Create an annotation in the global _ANNOTATIONS table.
Helper function to create an annotation table but not add it to the global list of annotations.
Concatenates an array of annotation tables to the specified key in the global _ANNOTATIONS table.
prune(name, ns)
- Arguments
- name (string) circular buffer payload name.
- ns (int64) current time in nanoseconds since the UNIX epoch.
- Return
- The json encoded list of annotations.
Entirely remove the payload name from the global _ANNOTATIONS table.
set_prune(name, ns_duration)
- Arguments
- name (string) circular buffer payload name.
- ns_duration (int64) time in nanoseconds the annotation should remain in the list.
- Return
- none
Parses the anomaly_config into a Lua table. If the configuration is invalid an error is thrown.
The configuration can specify any number of algorithm function calls (space delimited if desired, but they will also work back to back with no delimiter). This allows for analysis of multiple graphs, columns, and even specification of multiple algorithms per column.
Rate of change test
Only use this test on data with a normal (Gaussian http://en.wikipedia.org/wiki/Normal_distribution) distribution. It identifies rapid changes (spikes) in the data (increasing and decreasing) but ignores cyclic data that has a more gradual rise and fall. It is typically used for something like HTTP 200 status code analysis to detect a sudden increase/decrease in web traffic.
Quoted string containing the payload_name value used in the inject_payload function call. If the payload name contains a double quote it should be escaped as two double quotes in a row.
The circular buffer column to perform the analysis on.
The number of intervals in an analysis window.
The number of intervals in the historical analysis window (0 uses the full history). Must be greater than or equal to ‘win’.
The standard deviation threshold to trigger the anomaly.
Alert if data stops.
Alert if data starts.
e.g. roc("Output1", 1, 15, 0, 2, true, false)
Mann-Whitney-Wilcoxon test http://en.wikipedia.org/wiki/Mann-Whitney
Parametric
Only use this test on data with a normal (Gaussian http://en.wikipedia.org/wiki/Normal_distribution) distribution. It identifies more gradual changes in the data (increasing, decreasing, or any). It is typically used with something like server memory analysis where the values are more stable and gradual changes are interesting (e.g., memory leak).
Quoted string containing the payload_name value used in the inject_payload function call. If the payload name contains a double quote it should be escaped as two double quotes in a row.
The circular buffer column to perform the analysis on.
The number of intervals in an analysis window (should be at least 20).
The number of analysis windows to compare.
The pvalue threshold to trigger the prediction. http://en.wikipedia.org/wiki/P_value
(decreasing|increasing|any)
e.g. mww("Output1", 2, 60, 10, 0.0001, decreasing)
Non-parametric
This test can be used on data with a normal (Gaussian http://en.wikipedia.org/wiki/Normal_distribution) or non-normal (nonparametric http://en.wikipedia.org/wiki/Nonparametric_statistics) distribution. It identifies overlap/similarities between two data sets. It is typically used for something like detecting an increase in HTTP 500 status code errors.
Quoted string containing the payload_name value used in the inject_payload function call. If the payload name contains a double quote it should be escaped as two double quotes in a row.
The circular buffer column to perform the analysis on.
The number of intervals in an analysis window.
The number of analysis windows to compare.
Value between 0 and 1. Anything above 0.5 is an increasing trend anything below 0.5 is a decreasing trend. http://en.wikipedia.org/wiki/Mann-Whitney#.CF.81_statistic
e.g. mww_nonparametric("Output1", 2, 15, 10, 0.55)
Detects anomalies in the circular buffer data returning any error messages for alert generation and array of annotations for the graph.
bulkapi_index_json(index, type_name, id, ns)
Returns a simple JSON ‘index’ structure satisfying the ElasticSearch BulkAPI
- Arguments
- index (string or nil)
String to use as the _index key’s value in the generated JSON, or nil to omit the key. Supports field interpolation as described below.
- type_name (string or nil)
String to use as the _type key’s value in the generated JSON, or nil to omit the key. Supports field interpolation as described below.
- id (string or nil)
String to use as the _id key’s value in the generated JSON, or nil to omit the key. Supports field interpolation as described below.
- ns (number or nil)
Nanosecond timestamp to use for any strftime field interpolation into the above fields. Current system time will be used if nil.
Field interpolation
Data from the current message can be interpolated into any of the string arguments listed above. A %{} enclosed field name will be replaced by the field value from the current message. Supported default field names are “Type”, “Hostname”, “Pid”, “UUID”, “Logger”, “EnvVersion”, and “Severity”. Any other values will be checked against the defined dynamic message fields. If no field matches, then a C strftime (on non-Windows platforms) or C89 strftime (on Windows) time substitution will be attempted, using the nanosecond timestamp (if provided) or the system clock (if not).
- Return
- JSON string suitable for use as ElasticSearch BulkAPI index directive.
Read the LPeg reference
Do not use parentheses around function calls that take a single string argument.
-- prefer
lpeg.P"Literal"
-- instead of
lpeg.P("Literal")
local date_month = lpeg.P"0" * lpeg.R"19"
+ "1" * lpeg.R"02"
-- The exception: when grouping alternates together in a higher level grammar.
local log_grammar = (rfc3339 + iso8601) * log_severity * log_message
-- prefer
lpeg.digit
-- instead of
lpeg.R"09".
-- prefer
lpeg.digit * "Test"
-- instead of
lpeg.digit * lpeg.P"Test"
The sandbox filter provides an isolated execution environment for data analysis. Any output generated by the sandbox is injected into the payload of a new message for further processing or to be output.
Config:
Example:
[hekabench_counter]
type = "SandboxFilter"
message_matcher = "Type == 'hekabench'"
ticker_interval = 1
filename = "counter.lua"
preserve_data = true
profile = false
[hekabench_counter.config]
rows = 1440
sec_per_row = 60
Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). The purpose is to recreate the view at a larger scope in each level of the aggregation i.e., host view -> datacenter view -> service level view.
Config:
Specifies whether or not this aggregator should generate cbuf deltas.
A list of anomaly detection specifications. If not specified no anomaly detection/alerting will be performed.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[TelemetryServerMetricsAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_aggregator.lua"
preserve_data = true
[TelemetryServerMetricsAggregator.config]
enable_delta = false
anomaly_config = 'roc("Request Statistics", 1, 15, 0, 1.5, true, false)'
preservation_version = 0
Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). Each column from the source circular buffer will become its own graph. i.e., ‘Error Count’ will become a graph with each host being represented in a column.
Config:
Pre-allocates the number of host columns in the graph(s). If the number of active hosts exceed this value, the plugin will terminate.
The number of rows to keep from the original circular buffer. Storing all the data from all the hosts is not practical since you will most likely run into memory and output size restrictions (adjust the view down as necessary).
The amount of time a host has to be inactive before it can be replaced by a new host.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the max_hosts or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[TelemetryServerMetricsHostAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_host_aggregator.lua"
preserve_data = true
[TelemetryServerMetricsHostAggregator.config]
max_hosts = 5
rows = 60
host_expiration = 120
preservation_version = 0
Graphs disk IO stats. It automatically converts the running totals of Writes and Reads into rates of the values. The time based fields are left as running totals of the amount of time doing IO. Expects to receive messages with disk IO data embedded in a particular set of message fields which matches what is generated by Linux Disk Stats Decoder: WritesCompleted, ReadsCompleted, SectorsWritten, SectorsRead, WritesMerged, ReadsMerged, TimeWriting, TimeReading, TimeDoingIO, WeightedTimeDoingIO, TickerInterval.
Config:
Sets the size of the sliding window i.e., 1440 rows representing 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
anomaly_config(string) - (see Anomaly Detection Module)
Example Heka Configuration
[DiskStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/diskstats.lua"
preserve_data = true
message_matcher = "Type == 'stats.diskstats'"
Calculates the most frequent items in a data stream.
Config:
The message variable name containing the items to be counted.
The maximum size of the sample set (higher will produce a more accurate list).
Used to reduce the long tail output by only outputting the higher frequency items.
Resets the list after the specified number of days (on the UTC day boundary). A value of 0 will never reset the list.
Example Heka Configuration
[FxaAuthServerFrequentIP]
type = "SandboxFilter"
filename = "lua_filters/frequent_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"
[FxaAuthServerFrequentIP.config]
message_variable = "Fields[remote_addr]"
max_items = 10000
min_output_weight = 100
reset_days = 1
Graphs the Heka memory statistics using the heka.memstat message generated by pipeline/report.go.
Config:
Sets the size of the sliding window i.e., 1440 rows representing 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
Sets the size of each bucket (resolution in seconds) in the sliding window.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the rows or sec_per_row configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[HekaMemstat]
type = "SandboxFilter"
filename = "lua_filters/heka_memstat.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'heka.memstat'"
Generates documentation for each unique message in a data stream. The output is a hierarchy of Logger, Type, EnvVersion, and a list of associated message field attributes including their counts (number in the brackets). This plugin is meant for data discovery/exploration and should not be left running on a production system.
Config:
<none>
Example Heka Configuration
[SyncMessageSchema]
type = "SandboxFilter"
filename = "lua_filters/heka_message_schema.lua"
ticker_interval = 60
preserve_data = false
message_matcher = "Logger =~ /^Sync/"
Example Output
Monitors Heka’s process message failures by plugin.
Config:
A list of anomaly detection specifications. If not specified, a default of ‘mww_nonparametric(“DEFAULT”, 1, 5, 10, 0.7)’ is used. The “DEFAULT” settings are applied to any plugin without an explicit specification.
Example Heka Configuration
[HekaProcessMessageFailures]
type = "SandboxFilter"
filename = "lua_filters/heka_process_message_failures.lua"
ticker_interval = 60
preserve_data = false # the counts are reset on Heka restarts and the monitoring should be too.
message_matcher = "Type == 'heka.all-report'"
Graphs HTTP status codes using the numeric Fields[status] variable collected from web server access logs.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, e.g., 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[FxaAuthServerHTTPStatus]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"
[FxaAuthServerHTTPStatus.config]
sec_per_row = 60
rows = 1440
anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false) roc("HTTP Status", 4, 15, 0, 1.5, true, false) mww_nonparametric("HTTP Status", 5, 15, 10, 0.8)'
preservation_version = 0
Graphs the load average and process count data. Expects to receive messages containing fields entitled 1MinAvg, 5MinAvg, 15MinAvg, and NumProcesses, such as those generated by the Linux Load Average Decoder.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, e.g., 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[LoadAvgFilter]
type = "SandboxFilter"
filename = "lua_filters/loadavg.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.loadavg'"
Graphs memory usage statistics. Expects to receive messages with memory usage data embedded in a specific set of message fields, which matches the messages generated by Linux Memory Stats Decoder: MemFree, Cached, Active, Inactive, VmallocUsed, Shmem, SwapCached.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, e.g., 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[MemoryStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/memstats.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.memstats'"
Graphs MySQL slow query data produced by the MySQL Slow Query Log Decoder.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, e.g., 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[Sync-1_5-SlowQueries]
type = "SandboxFilter"
message_matcher = "Logger == 'Sync-1_5-SlowQuery'"
ticker_interval = 60
filename = "lua_filters/mysql_slow_query.lua"
[Sync-1_5-SlowQueries.config]
anomaly_config = 'mww_nonparametric("Statistics", 5, 15, 10, 0.8)'
preservation_version = 0
Converts stat values extracted from statmetric messages (see StatAccumInput) to circular buffer data and periodically emits messages containing this data to be graphed by a DashboardOutput. Note that this filter expects the stats data to be available in the message fields, so the StatAccumInput must be configured with emit_in_fields set to true for this filter to work correctly.
Config:
Title for the graph output generated by this filter.
The number of rows to store in our circular buffer. Each row represents one time interval.
The number of seconds in each circular buffer time interval.
Space separated list of stat names. Each specified stat will be expected to be found in the fields of the received statmetric messages, and will be extracted and inserted into its own column in the accumulated circular buffer.
Space separated list of header label names to use for the extracted stats. Must be in the same order as the specified stats. Any label longer than 15 characters will be truncated.
Anomaly detection configuration, see Anomaly Detection Module.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time any edits are made to your rows, sec_per_row, stats, or stat_labels values, or else Heka will fail to start because the preserved data will no longer match the filter’s data structure.
Example Heka Configuration
[stat-graph]
type = "SandboxFilter"
filename = "lua_filters/stat_graph.lua"
ticker_interval = 10
preserve_data = true
message_matcher = "Type == 'heka.statmetric'"
[stat-graph.config]
title = "Hits and Misses"
rows = 1440
sec_per_row = 10
stats = "stats.counters.hits.count stats.counters.misses.count"
stat_labels = "hits misses"
anomaly_config = 'roc("Hits and Misses", 1, 15, 0, 1.5, true, false) roc("Hits and Misses", 2, 15, 0, 1.5, true, false)'
preservation_version = 0
Counts the number of unique items per day e.g. active daily users by uid.
Config:
The Heka message variable containing the item to be counted.
The graph title for the cbuf output.
Specifies whether or not this plugin should generate cbuf deltas. Deltas should be enabled when sharding is used; see: Circular Buffer Delta Aggregator.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[FxaActiveDailyUsers]
type = "SandboxFilter"
filename = "lua_filters/unique_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'FxaAuth' && Type == 'request.summary' && Fields[path] == '/v1/certificate/sign' && Fields[errno] == 0"
[FxaActiveDailyUsers.config]
message_variable = "Fields[uid]"
title = "Estimated Active Daily Users"
preservation_version = 0
The SandboxEncoder provides an isolated execution environment for converting messages into binary data without the need to recompile Heka. See Sandbox.
Config:
Example
[custom_json_encoder]
type = "SandboxEncoder"
filename = "path/to/custom_json_encoder.lua"
[custom_json_encoder.config]
msg_fields = ["field1", "field2"]
Produces more human readable alert messages.
Config:
<none>
Example Heka Configuration
[FxaAlert]
type = "SmtpOutput"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert' && Logger =~ /^Fxa/" || Type == 'heka.sandbox-terminated' && Fields[plugin] =~ /^Fxa/"
send_from = "heka@example.com"
send_to = ["alert@example.com"]
auth = "Plain"
user = "test"
password = "testpw"
host = "localhost:25"
encoder = "AlertEncoder"
[AlertEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"
Example Output
Timestamp: 2014-05-14T14:20:18Z
Hostname:  ip-10-226-204-51
Plugin:    FxaBrowserIdHTTPStatus
Alert:     HTTP Status - algorithm: roc col: 1 msg: detected anomaly, standard deviation exceeds 1.5
Extracts data from SandboxFilter circular buffer output messages and uses it to generate time series JSON structures that will be accepted by Librato’s POST API. It keeps track of the last time it has seen a particular message, keyed by filter name and output name. The first time it sees a new message, it will send data from all of the rows except the last one, which is possibly incomplete. For subsequent messages, the encoder will automatically extract data from all of the rows that have elapsed since the last message was received.
The SandboxEncoder preserve_data setting should be set to true when using this encoder, or else the list of received messages will be lost whenever Heka is restarted, possibly causing the same data rows to be sent to Librato multiple times.
Config:
<none>
Example Heka Configuration
[cbuf_librato_encoder]
type = "SandboxEncoder"
filename = "lua_encoders/cbuf_librato"
preserve_data = true
[librato]
type = "HttpOutput"
message_matcher = "Type == 'heka.sandbox-output && Fields[payload_type] == 'cbuf'"
encoder = "cbuf_librato_encoder"
address = "https://metrics-api.librato.com/v1/metrics"
username = "username@example.com"
password = "SECRET"
[librato.headers]
Content-Type = ["application/json"]
Example Output
{"gauges":[{"value":12,"measure_time":1410824950,"name":"HTTP_200","source":"thor"},{"value":1,"measure_time":1410824950,"name":"HTTP_300","source":"thor"},{"value":1,"measure_time":1410824950,"name":"HTTP_400","source":"thor"}]}
Prepends ElasticSearch BulkAPI index JSON to a message payload.
Config:
String to use as the _index key’s value in the generated JSON. Supports field interpolation as described below.
String to use as the _type key’s value in the generated JSON. Supports field interpolation as described below.
String to use as the _id key’s value in the generated JSON. Supports field interpolation as described below.
If true, then any time interpolation (often used to generate the ElasticSearch index) will use the timestamp from the processed message rather than the system time.
Field interpolation:
Data from the current message can be interpolated into any of the string arguments listed above. A %{} enclosed field name will be replaced by the field value from the current message. Supported default field names are “Type”, “Hostname”, “Pid”, “UUID”, “Logger”, “EnvVersion”, and “Severity”. Any other values will be checked against the defined dynamic message fields. If no field matches, then a C strftime (on non-Windows platforms) or C89 strftime (on Windows) time substitution will be attempted.
Example Heka Configuration
[es_payload]
type = "SandboxEncoder"
filename = "lua_encoders/es_payload.lua"
[es_payload.config]
es_index_from_timestamp = true
index = "%{Logger}-%{%Y.%m.%d}"
type_name = "%{Type}-%{Hostname}"
[ElasticSearchOutput]
message_matcher = "Type == 'mytype'"
encoder = "es_payload"
Example Output
{"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}
{"json":"data","extracted":"from","message":"payload"}
Converts full Heka message contents to JSON for InfluxDB HTTP API. Includes all standard message fields and iterates through all of the dynamically specified fields, skipping any bytes fields or any fields explicitly omitted using the skip_fields config option.
Config:
String to use as the series key’s value in the generated JSON. Supports interpolation of field values from the processed message, using %{fieldname}. Any fieldname values of “Type”, “Payload”, “Hostname”, “Pid”, “Logger”, “Severity”, or “EnvVersion” will be extracted from the base message schema; any other values will be assumed to refer to a dynamic message field. Only the first value of the first instance of a dynamic message field can be used for series name interpolation. If the dynamic field doesn’t exist, the uninterpolated value will be left in the series name. Note that it is not possible to interpolate either the “Timestamp” or the “Uuid” message fields into the series name; those values will be interpreted as referring to dynamic message fields.
Space delimited set of fields that should not be included in the InfluxDB records being generated. Any fieldname values of “Type”, “Payload”, “Hostname”, “Pid”, “Logger”, “Severity”, or “EnvVersion” will be assumed to refer to the corresponding field from the base message schema; any other values will be assumed to refer to a dynamic message field.
Example Heka Configuration
[influxdb]
type = "SandboxEncoder"
filename = "lua_encoders/influxdb.lua"
[influxdb.config]
series = "heka.%{Logger}"
skip_fields = "Pid EnvVersion"
[InfluxOutput]
message_matcher = "Type == 'influxdb'"
encoder = "influxdb"
type = "HttpOutput"
address = "http://influxdbserver.example.com:8086/db/databasename/series"
username = "influx_username"
password = "influx_password"
Example Output
[{"points":[[1.409378221e+21,"log","test","systemName","TcpInput",5,1,"test"]],"name":"heka.MyLogger","columns":["Time","Type","Payload","Hostname","Logger","Severity","syslogfacility","programname"]}]
Extracts data from message fields in heka.statmetric messages generated by a StatAccumInput and generates JSON suitable for use with InfluxDB’s HTTP API. StatAccumInput must be configured with emit_in_fields = true for this encoder to work correctly.
Config:
<none>
Example Heka Configuration
[statmetric-influx-encoder]
type = "SandboxEncoder"
filename = "lua_encoders/statmetric_influx.lua"
[influx]
type = "HttpOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "http://myinfluxserver.example.com:8086/db/stats/series"
encoder = "statmetric-influx-encoder"
username = "influx_username"
password = "influx_password"
Example Output
[{"points":[[1408404848,78271]],"name":"stats.counters.000000.rate","columns":["time","value"]},{"points":[[1408404848,78271]],"name":"stats.counters.000000.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count_ps","columns":["time","value"]},{"points":[[1408404848,1]],"name":"stats.timers.000001.lower","columns":["time","value"]},{"points":[[1408404848,1024]],"name":"stats.timers.000001.upper","columns":["time","value"]},{"points":[[1408404848,8937851]],"name":"stats.timers.000001.sum","columns":["time","value"]},{"points":[[1408404848,513.07985074627]],"name":"stats.timers.000001.mean","columns":["time","value"]},{"points":[[1408404848,461.72356167879]],"name":"stats.timers.000001.mean_90","columns":["time","value"]},{"points":[[1408404848,925]],"name":"stats.timers.000001.upper_90","columns":["time","value"]},{"points":[[1408404848,2]],"name":"stats.statsd.numStats","columns":["time","value"]}]
Since decoders cannot be dynamically loaded, and because they stop Heka processing on fatal errors, they must be developed outside of a production environment. Most Lua decoders are LPeg based, as LPeg is the best way to parse and transform data within the sandbox. The alternatives are the built-in Lua pattern matcher or the JSON parser with a manual transformation.
Procure some sample data to be used as test input.
timestamp=time_t key1=data1 key2=data2
Configure a simple LogstreamerInput to deliver the data to your decoder.
[LogstreamerInput]
log_directory = "."
file_match = 'data\.log'
decoder = "SandboxDecoder"
Configure your test decoder.
[SandboxDecoder]
filename = "decoder.lua"
Configure the DashboardOutput for visibility into the decoder (performance, memory usage, messages processed/failed, etc.).
[DashboardOutput]
address = "127.0.0.1:4352"
ticker_interval = 10
working_directory = "dashboard"
static_directory = "/usr/share/heka/dasher"
Configure a LogOutput to display the generated messages.
[LogOutput]
message_matcher = "TRUE"
The decoder will receive a message from an input plugin. The input may have set some additional message headers, but the ‘Payload’ header contains the data for the decoder, which can be accessed with read_message("Payload"). The payload can be used to construct an entirely new message, multiple messages, or to modify any part of the existing message (see inject_message and write_message in the Lua Sandbox API). Message headers not modified by the decoder are left intact, and in the case of multiple message injections the initial message header values are duplicated for each message. A minimal decoder sketch for the sample data is shown below, after these steps.
Incrementally build and test your grammar using http://lpeg.trink.com.
Test match expressions using http://www.lua.org/cgi-bin/demo.
For data transformation, use the LPeg/Lua matcher links above. Something like a simple field remapping, i.e. msg.Hostname = json.host, can be verified in the LogOutput.
Run Heka with the test configuration.
Inspect/verify the messages written by LogOutput.
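For reference, here is a minimal sketch of a decoder for the key=value sample data above, using the built-in Lua pattern matcher rather than LPeg. The message Type and the handling of the fields are only illustrative of the sample input, not a real schema.

require "string"

-- Parse "timestamp=time_t key1=data1 key2=data2" style payloads into a new message.
function process_message()
    local payload = read_message("Payload")
    local fields = {}
    for k, v in string.gmatch(payload, "(%w+)=(%S+)") do
        fields[k] = v
    end
    if not fields.timestamp then
        return -1 -- reject the payload so the failure is visible in the dashboard
    end
    -- Construct an entirely new message; headers not set here are filled in by Heka.
    inject_message({Type = "demo.keyvalue", Fields = fields})
    return 0
end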
Since filters can be dynamically loaded it is recommended you develop them in production with live data.
OR
If you are developing the filter in conjunction with the decoder you can add it to the test configuration.
[SandboxFilter]
filename = "filter.lua"
Debugging
Watch for a dashboard sandbox termination report. The termination message provides the line number and cause of the failure. These are usually straightforward to correct and are commonly caused by a syntax error in the script or invalid assumptions about the data (e.g. cnt = cnt + read_message("Fields[counter]") will fail if the counter field doesn’t exist or is non-numeric due to an error in the data; see the defensive sketch after this list).
No termination report and the output does not match expectations. These are usually a little harder to debug.
- Check the Heka dashboard to make sure the router is sending messages to the plugin. If not, verify your message_matcher configuration.
- Visually review the plugin for errors. Are the message field names correct, was the result of the cjson.decode tested, are the output variables actually being assigned to and output/injected, etc.?
- Add a debug output message with the pertinent information.
require "string" require "table" local dbg = {} -- table.insert(dbg, string.format("Entering function x arg1: %s", arg1)) -- table.insert(dbg, "Exiting function x") inject_payload("txt", "debug", table.concat(dbg, "\n"))
- LAST RESORT: Move the filter out of production, turn on preservation, run the tests, stop Heka, and review the entire preserved state of the filter.
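As an illustration of guarding against the bad-data assumptions described above, here is a minimal defensive sketch (the field name, payload name, and counting logic are only examples):

cnt = 0 -- global, so it is preserved/restored when preserve_data = true

function process_message()
    -- tonumber() returns nil for a missing or non-numeric field instead of raising an error.
    local v = tonumber(read_message("Fields[counter]"))
    if not v then
        return -1 -- count the message as a failure rather than terminating the sandbox
    end
    cnt = cnt + v
    return 0
end

function timer_event(ns)
    inject_payload("txt", "counter_total", tostring(cnt))
end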
heka-flood is a Heka load test tool; it is capable of generating a large number of messages to exercise Heka using different protocols, message types, and error conditions.
Example:
heka-flood -config="/etc/flood.toml" -test="my_test_name"
test (object): Name of the test section (toml key) in the configuration file.
ip_address (string): IP address of the Heka server.
sender (string): tcp or udp
pprof_file (string): The name of the file to save the profiling data to.
encoder (string): protobuf or json
num_messages (int): The number of messages to be sent, 0 for infinite.
corrupt_percentage (float): The percentage of messages that will be randomly corrupted.
signed_percentage (float): The percentage of messages that will be signed.
variable_size_messages (bool): True, if a random selection of variable size messages is to be sent. False, if a single fixed message will be sent.
ascii_only (bool): True, if generated message payloads should only contain ASCII characters. False, if message payloads should contain arbitrary binary data. Defaults to false.
New in version 0.5.
Example
[default]
ip_address = "127.0.0.1:5565"
sender = "tcp"
pprof_file = ""
encoder = "protobuf"
num_messages = 0
corrupt_percentage = 0.0001
signed_percentage = 0.00011
variable_size_messages = true
[default.signer]
name = "test"
hmac_hash = "md5"
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
version = 0
New in version 0.5.
heka-inject is a Heka client that allows injecting arbitrary messages into the Heka pipeline. It can generate a message with specified message variables and values, which makes it useful for quickly testing plugins. heka-inject requires a TcpInput with protobuf support to be available.
Example:
heka-inject -payload="Test message with high severity." -severity=1
New in version 0.5.
A command-line utility for counting, viewing, filtering, and extracting Heka protobuf logs.
Example:
heka-cat -format=count -match="Fields[status] == 404" test.log
Output:
Input:test.log Offset:0 Match:Fields[status] == 404 Format:count Tail:false Output:
Processed: 1002646, matched: 15660 messages
Many input and output plugins that rely on TCP as the underlying transport for network communication also support the use of SSL/TLS encryption for their connections. Typically the TOML configuration for these plugins will support a boolean use_tls flag that specifies whether or not encryption should be used, and a tls sub-section that specifies the settings to be used for negotiating the TLS connections. If use_tls is not set to true, the tls section will be ignored.
Modeled after Go’s stdlib TLS configuration struct, the same configuration structure is used for both client and server connections, with some of the settings applying to a client’s configuration, some to a server’s, and some to both. In the descriptions of the TLS configuration settings below, each setting is marked as applying to the client, the server, or both.
Name of the server being requested. Included in the client handshake to support virtual hosting server environments.
Full filesystem path to the certificate file to be presented to the other side of the connection.
Full filesystem path to the specified certificate’s associated private key file.
Specifies the server’s policy for TLS client authentication. Must be one of the following values:
Defaults to “NoClientCert”.
List of cipher suites supported for TLS connections. Earlier suites in the list have priority over those following. Must only contain values from the following selection:
If omitted, the implementation’s default ordering will be used.
If true, TLS client connections will accept any certificate presented by the server and any host name in that certificate. This causes TLS to be susceptible to man-in-the-middle attacks and should only be used for testing. Defaults to false.
If true, a server will always favor the server’s specified cipher suite priority order over that requested by the client. Defaults to true.
If true, session resumption support as specified in RFC 5077 will be disabled.
Used by the TLS server to provide session resumption per RFC 5077. If left empty, it will be filled with random data before the first server handshake.
Specifies the minimum acceptable SSL/TLS version. Must be one of the following values:
Defaults to SSL30.
Specifies the maximum acceptable SSL/TLS version. Must be one of the following values:
Defaults to TLS12.
File for the server to authenticate the client TLS handshake. Any client certs received by the server must be chained to a CA found in this PEM file.
Has no effect when NoClientCert is set.
File for the client to authenticate the server TLS handshake. Any server certs received by the client must be chained to a CA found in this PEM file.
The following is a sample TcpInput configuration showing the use of TLS encryption.
[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
use_tls = true
[TcpInput.tls]
cert_file = "/usr/share/heka/tls/cert.pem"
key_file = "/usr/share/heka/tls/cert.key"
client_auth = "RequireAndVerifyClientCert"
prefer_server_ciphers = true
min_version = "TLS11"
This plugin detects anomalies in the data. When an anomaly is detected an alert is generated and the graph is visually annotated at the time of the alert. See dygraphs Annotations for the available annotation properties.
-- This Source Code Form is subject to the terms of the Mozilla Public
-- License, v. 2.0. If a copy of the MPL was not distributed with this
-- file, You can obtain one at http://mozilla.org/MPL/2.0/.
--[[
Collects the circular buffer delta output from multiple instances of an upstream
sandbox filter (the filters should all be the same version at least with respect
to their cbuf output). The purpose is to recreate the view at a larger scope in
each level of the aggregation i.e., host view -> datacenter view -> service
level view.
Config:
- enable_delta (bool, optional, default false)
Specifies whether or not this aggregator should generate cbuf deltas.
- anomaly_config(string) - (see :ref:`sandbox_anomaly_module`)
A list of anomaly detection specifications. If not specified no anomaly
detection/alerting will be performed.
- preservation_version (uint, optional, default 0)
If `preserve_data = true` is set in the SandboxFilter configuration, then
this value should be incremented every time the `enable_delta`
configuration is changed to prevent the plugin from failing to start
during data restoration.
*Example Heka Configuration*
.. code-block:: ini
[TelemetryServerMetricsAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_aggregator.lua"
preserve_data = true
[TelemetryServerMetricsAggregator.config]
enable_delta = false
anomaly_config = 'roc("Request Statistics", 1, 15, 0, 1.5, true, false)'
preservation_version = 0
--]]
_PRESERVATION_VERSION = read_config("preservation_version") or 0
local alert = require "alert"
local annotation = require "annotation"
local anomaly = require "anomaly"
local cbufd = require "cbufd"
require "circular_buffer"
local enable_delta = read_config("enable_delta") or false
local anomaly_config = anomaly.parse_config(read_config("anomaly_config"))
cbufs = {}
local function init_cbuf(payload_name, data)
local ok, h = pcall(cjson.decode, data.header)
if not ok then
return nil
end
local cb = circular_buffer.new(h.rows, h.columns, h.seconds_per_row, enable_delta)
for i,v in ipairs(h.column_info) do
cb:set_header(i, v.name, v.unit, v.aggregation)
end
annotation.set_prune(payload_name, h.rows * h.seconds_per_row * 1e9)
cbufs[payload_name] = cb
return cb
end
function process_message ()
local payload = read_message("Payload")
local payload_name = read_message("Fields[payload_name]") or ""
local data = cbufd.grammar:match(payload)
if not data then
return -1
end
local cb = cbufs[payload_name]
if not cb then
cb = init_cbuf(payload_name, data)
if not cb then
return -1
end
end
for i,v in ipairs(data) do
for col, value in ipairs(v) do
if value == value then -- NaN test, only aggregate numbers
local n, u, agg = cb:get_header(col)
if agg == "sum" then
cb:add(v.time, col, value)
elseif agg == "min" or agg == "max" then
cb:set(v.time, col, value)
end
end
end
end
return 0
end
function timer_event(ns)
for k,v in pairs(cbufs) do
if anomaly_config then
if not alert.throttled(ns) then
local msg, annos = anomaly.detect(ns, k, v, anomaly_config)
if msg then
alert.queue(ns, msg)
annotation.concat(k, annos)
end
end
inject_payload("cbuf", k, annotation.prune(k, ns), v)
else
inject_payload("cbuf", k, v)
end
if enable_delta then
inject_payload("cbufd", k, v:format("cbufd"))
end
end
alert.send_queue(ns)
end
Alters a date/time string in the JSON payload to be RFC3339 compliant. In this example the JSON is parsed, transformed and re-injected into the payload.
-- This Source Code Form is subject to the terms of the Mozilla Public
-- License, v. 2.0. If a copy of the MPL was not distributed with this
-- file, You can obtain one at http://mozilla.org/MPL/2.0/.
require "string"
require "cjson"
-- sample input {"name":"android_app_created","created_at":"2013-11-15 22:37:34.709739275"}
local date_pattern = '^(%d+-%d+-%d+) (%d+:%d+:%d+%.%d+)'
function process_message ()
local ok, json = pcall(cjson.decode, read_message("Payload"))
if not ok then
return -1
end
local d, t = string.match(json.created_at, date_pattern)
if d then
json.created_at = string.format("%sT%sZ", d, t)
inject_payload("json", "transformed timestamp", cjson.encode(json))
end
return 0
end
-- sample output
--2013/11/15 15:25:56 <
-- Timestamp: 2013-11-15 15:25:56.826184879 -0800 PST
-- Type: logfile
-- Hostname: trink-x230
-- Pid: 0
-- UUID: ef5de908-822a-4fe1-a564-ad3b5a9631c6
-- Logger: test.log
-- Payload: {"name":"android_app_created","created_at":"2013-11-15T22:37:34.709739275Z"}
--
-- EnvVersion: 0.8
-- Severity: 0
-- Fields: [name:"payload_type" value_type:STRING representation:"file-extension" value_string:"json" name:"payload_name" value_type:STRING representation:"" value_string:"transformed timestamp" ]
Alters a date/time string in the JSON payload to be RFC3339 compliant. In this example a search and replace is performed on the JSON text and re-injected into the payload.
-- This Source Code Form is subject to the terms of the Mozilla Public
-- License, v. 2.0. If a copy of the MPL was not distributed with this
-- file, You can obtain one at http://mozilla.org/MPL/2.0/.
require "cjson"
require "string"
-- sample input {"name":"android_app_created","created_at":"2013-11-15 22:37:34.709739275"}
local date_pattern = '("created_at":)"(%d+-%d+-%d+) (%d+:%d+:%d+%.%d+)"'
function process_message ()
local pl = read_message("Payload")
local json, cnt = string.gsub(pl, date_pattern, '%1"%2T%3Z"', 1)
if cnt == 0 then
return -1
end
inject_payload("json", "transformed timestamp S&R", cjson.encode(json))
return 0
end
-- sample output
--2013/11/18 09:20:41 <
-- Timestamp: 2013-11-18 09:20:41.252096692 -0800 PST
-- Type: logfile
-- Hostname: trink-x230
-- Pid: 0
-- UUID: e8298865-fbc2-422c-873e-1210ce8efd9f
-- Logger: test.log
-- Payload: {"name":"android_app_created","created_at":"2013-11-15T22:37:34.709739275Z"}
--
-- EnvVersion: 0.8
-- Severity: 0
-- Fields: [name:"payload_type" value_type:STRING representation:"file-extension" value_string:"json" name:"payload_name" value_type:STRING representation:"" value_string:"transformed timestamp" ]
These are the configuration options that are universally available to all Sandbox plugins. They are consumed by Heka when it initializes the plugin.
The language the sandbox is written in. Currently the only valid option is ‘lua’ which is the default.
The path to the sandbox code; if specified as a relative path it will be appended to Heka’s global share_dir.
True if the sandbox global data should be preserved/restored on plugin shutdown/startup. When true, this works in conjunction with a global Lua _PRESERVATION_VERSION variable which is examined during restoration; if the previous version does not match the current version the restoration will be aborted and the sandbox will start cleanly. _PRESERVATION_VERSION should be incremented any time an incompatible change is made to the global data schema. If no version is set the check will always succeed and a version of zero is assumed. A minimal sketch illustrating this interaction is shown after this list of options.
The number of bytes the sandbox is allowed to consume before being terminated (default 8MiB).
The number of instructions the sandbox is allowed to execute during the process_message/timer_event functions before being terminated (default 1M).
The number of bytes the sandbox output buffer can hold before being terminated (default 63KiB). Warning: messages exceeding 64KiB will generate an error and be discarded by the standard output plugins (File, TCP, UDP) since they exceed the maximum message size.
The directory where ‘require’ will attempt to load the external Lua modules from. Defaults to ${SHARE_DIR}/lua_modules.
A map of configuration variables available to the sandbox via read_config. The map consists of a string key with: string, bool, int64, or float64 values.
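To illustrate how these options fit together, here is a minimal sketch of a filter that keeps preserved global state, assuming preserve_data = true and a preservation_version value supplied via the config map (the names and counting logic are only illustrative):

require "string"
require "table"

-- Bump the config value whenever the schema of the preserved globals (here, the counts table) changes.
_PRESERVATION_VERSION = read_config("preservation_version") or 0

counts = {} -- global, so it is saved on shutdown and restored on startup

function process_message()
    local logger = read_message("Logger") or "unknown"
    counts[logger] = (counts[logger] or 0) + 1
    return 0
end

function timer_event(ns)
    local out = {}
    for k, v in pairs(counts) do
        table.insert(out, string.format("%s\t%d", k, v))
    end
    inject_payload("txt", "message_counts_by_logger", table.concat(out, "\n"))
end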
New in version 0.5.
The Logstreamer plugin scans, sorts, and reads logstreams in a sequential user-defined order, differentiating multiple logstreams found in a search based on a user-defined differentiator.
A “logstream” is a single, linear data stream that is spread across one or more sequential log files. For instance, an Apache or nginx server typically generates two logstreams for each domain: an access log and an error log. Each stream might be written to a single log file that is periodically truncated (ick!) or rotated (better), with some number of historical versions being kept (e.g. access-example.com.log, access-example.com.log.0, access-example.com.log.1, etc.). Or, better yet, the server might periodically create new timestamped files so that the ‘tip’ of the logstream jumps from file to file (e.g. access-example.com-2014.01.28.log, access-example.com-2014.01.27.log, access-example.com-2014.01.26.log, etc.). The job of Heka’s Logstreamer plugin is to understand the file naming and ordering conventions for a single type of logstream (e.g. “all of the nginx server’s domain access logs”), and to use that to watch the specified directories and load the right files in the right order. The plugin will also track its location in the stream so it can resume from where it left off after a restart, even in cases where the file may have rotated during the downtime.
To make it easier to parse multiple logstreams, the Logstreamer plugin can be specified a single time, with a single decoder, to handle all of the logstreams that should be parsed with it.
Given the flexibility of the Logstreamer, configuration can be more complex for the more advanced use-cases. We’ll start with the simplest use-case and work towards the most complex.
This is the basic use-case: a single logfile should be read, and the system may rotate or truncate it at some point (hopefully not truncation, though that condition is handled). Log rotation inherently carries the risk that some log lines may be missed if the program reading the log happens to die at exactly the moment the rotation is occurring.
An example of a single rotating logfile would be the case where you want to watch /var/log/system.log for all new entries. Here’s what the configuration for such a case looks like:
[syslog]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = 'system\.log'
Note
The file_match config value above is delimited with single quotes instead of double quotes (i.e. ‘system\.log’ vs. “system\.log”) because single quotes indicate raw strings that do not require backslashes to be escaped. If you use double quotes around your regular expressions you’ll need to escape backslashes by doubling them up, e.g. “system\\.log”.
We start with the highest directory to start scanning for files under, in this case /var/log. Then the files under that directory (recursively searching in sub-directories) are matched against the file_match.
The log_directory should be the most specific directory of files to match, to prevent excessive file scanning when locating file_match matches.
This use-case is similar to the single rotating logfile above except there are multiple separate files with the same policy.
An example of multiple single rotating logfiles would be a system that logs the access for each domain name to a separate access log. In this case, to differentiate them, we will need to indicate which part of the file_match identifies a separate logfile (using the domain name as the differentiator).
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = '(?P<DomainName>[^/]+)-access\.log'
differentiator = ["nginx.", "DomainName", ".access"]
Note that we included two strings in the differentiator that don’t correspond to a part in the file_match regular expression. These two parts will be included as is to create the logger name attached to each message. So a file:
/var/log/nginx/hekathings.com-access.log
will have all its messages in Heka with the logger name set to nginx.hekathings.com.access.
What happens if you have a log structure like this?
/var/log/nginx/access.log
/var/log/nginx/access.log.1
/var/log/nginx/access.log.2
/var/log/nginx/access.log.3
Or perhaps like this?
/var/log/nginx/2014/08/1.access.log
/var/log/nginx/2014/08/2.access.log
/var/log/nginx/2014/08/3.access.log
/var/log/nginx/2014/08/4.access.log
Or a combination of them?
/var/log/nginx/2014/08/access.log
/var/log/nginx/2014/08/access.log.1
/var/log/nginx/2014/08/access.log.2
/var/log/nginx/2014/08/access.log.3
(Hopefully your setup isn’t worse than any of these... but even if it is then Logstreamer can handle it.)
Handling a single access log that is sequential and rotated (the first example) can be tricky. The second case, where rotation doesn’t occur and new logfiles are written every day (with new months/years resulting in new directories), was previously quite difficult to handle. Both of these cases can be handled by the LogstreamerInput.
The other (fun) problem with the second case is that a raw string listing of the directory will place 11.access.log before 2.access.log, which is not good if you expect the logs to be in order.
Let’s look at the config for the first case, note that the numbers incrementing in this case represent the files getting older (the higher the number, the older the log data):
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log\.?(?P<Seq>\d*)'
priority = ["^Seq"]
When handling sequential logfiles in a logstream, we need to provide a list of matched parts from the file_match that will be used to sort the matching files from oldest to newest. By default, the numbers are sorted in ascending order (which properly reflects oldest first if the number represents the year, month, or day). To indicate that we should sort in descending order, we put a ^ in front of the matched part to sort on (Seq).
Here’s what a configuration for the second case looks like:
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = '(?P<Year>\d+)/(?P<Month>\d+)/(?P<Day>\d+)\.access\.log'
priority = ["Year", "Month", "Day"]
First we match the portions to be sorted on, and then we specify the priority of matched portions to sort with. In this case the lower numbers represent older data so none of them need to be prefixed with ^.
Finally, the last configuration is a mix of the prior two:
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = '(?P<Year>\d+)/(?P<Month>\d+)/access\.log\.?(?P<Seq>\d*)'
priority = ["Year", "Month", "^Seq"]
Same as before, except now we need to differentiate the sequential streams. We’re only introducing a single parameter here, one we’ve seen before, to handle the differentiation. Let’s take the last case from above and consider it a multiple sequential source.
Example directory layout:
/var/log/nginx/frank.com/2014/08/access.log
/var/log/nginx/frank.com/2014/08/access.log.1
/var/log/nginx/frank.com/2014/08/access.log.2
/var/log/nginx/frank.com/2014/08/access.log.3
/var/log/nginx/george.com/2014/08/access.log
/var/log/nginx/george.com/2014/08/access.log.1
/var/log/nginx/george.com/2014/08/access.log.2
/var/log/nginx/george.com/2014/08/access.log.3
/var/log/nginx/sally.com/2014/08/access.log
/var/log/nginx/sally.com/2014/08/access.log.1
/var/log/nginx/sally.com/2014/08/access.log.2
/var/log/nginx/sally.com/2014/08/access.log.3
In this case we have multiple sequential logfiles for each domain name that are incrementing in date along with rotation when a logfile gets too large (causing rotation of the file within the directory).
Configuration for this case:
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = '(?P<DomainName>[^/]+)/(?P<Year>\d+)/(?P<Month>\d+)/access\.log\.?(?P<Seq>\d*)'
priority = ["Year", "Month", "^Seq"]
differentiator = ["nginx-", "DomainName", "-access"]
As in the case for a non-sequential logfile, we supply a differentiator that will be used to file each sequential set of logfiles into a separate logstream.
See also
In the standard configurations above, the assumption has been that any part matched for sorting will be digit(s). This is because the Logstreamer by default will attempt to coerce a matched portion used for sorting into an integer in the event a mapping isn’t available. LogstreamerInput comes with several built-in mappings and allows you to define your own so that matched parts can be translated to integers for sorting purposes.
There are several special regex grouping names you can use that will indicate to the LogstreamerInput that a default mapping should be used:
Maps an English full month name, or its 3-letter abbreviation, to the appropriate integer.
Maps an English full day name, or its 3-letter abbreviation, to the appropriate integer.
If the last example above looked like this:
/var/log/nginx/frank.com/2014/Sep/access.log
/var/log/nginx/frank.com/2014/Oct/access.log.1
/var/log/nginx/frank.com/2014/Nov/access.log.2
/var/log/nginx/frank.com/2014/Dec/access.log.3
/var/log/nginx/sally.com/2014/Sep/access.log
/var/log/nginx/sally.com/2014/Oct/access.log.1
/var/log/nginx/sally.com/2014/Nov/access.log.2
/var/log/nginx/sally.com/2014/Dec/access.log.3
Using the default mappings would provide us a simple configuration:
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = '(?P<Domain>[^/]+)/(?P<Year>\d+)/(?P<MonthName>\w+)/access\.log\.?(?P<Seq>\d*)'
priority = ["Year", "MonthName", "^Seq"]
differentiator = ["nginx-", "Domain", "-access"]
LogstreamerInput will translate the 3-letter month names automatically before sorting (If used in the differentiator, you will still get the original matched string).
What if your logfiles (for reasons we won’t speculate about) happened to use Persian month names but Spanish day names, such that they looked like this?
/var/log/nginx/sally.com/2014/Hadukannas/lunes/access.log
/var/log/nginx/sally.com/2014/Turmar/miercoles/access.log
/var/log/nginx/sally.com/2014/Karmabatas/jueves/access.log
/var/log/nginx/sally.com/2014/Karbasiyas/sabado/access.log
It would be easier if the logging scheme just used month and day integers, but changing existing systems isn’t always an option, so let’s work with this somewhat odd scheme.
The first chunk of our configuration:
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = '(?P<Domain>[^/]+)/(?P<Year>\d+)/(?P<Month>[^/]+)/(?P<Day>[^/]+)/access\.log'
priority = ["Year", "Month", "Day"]
differentiator = ["nginx-", "Domain", "-access"]
Now to supply the important mapping of how to translate Month and Day into sortable integers. We’ll add this:
[accesslogs.translation.Month]
hadukannas = 1
turmar = 2
karmabatas = 4
karbasiyas = 6
[accesslogs.translation.Day]
lunes = 1
miercoles = 3
jueves = 4
sabado = 6
Note
The matched values used are all lowercased before comparison, so ‘lunes’ in the example above would match captured values of ‘lunes’, ‘Lunes’, and ‘LuNeS’ equivalently.
We left off the rest of the month names and day names not used for example purposes. Note that if you prefer the week to begin on a Saturday instead of Monday you can configure it with a custom mapping.
In the examples above, the years and months were embedded in the file path as directory names, but what if the date was embedded into the filenames themselves, with a file naming schema like so?
/var/log/nginx/sally.com/access.log
/var/log/nginx/sally.com/access-20140803.log
/var/log/nginx/sally.com/access-20140804.log
/var/log/nginx/sally.com/access-20140805.log
/var/log/nginx/sally.com/access-20140806.log
/var/log/nginx/sally.com/access-20140807.log
/var/log/nginx/sally.com/access-20140808.log
Notice how the currently active log file contains no date information at all. As long as you construct your file_match regex correctly this will be fine; Logstreamer will capture all of the files and won’t complain about entries that are missing the match portions. The following config would work to capture all of these files:
[accesslogs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = '(?P<Domain>[^/]+)/access-?(?P<Year>\d{4})(?P<Month>\d{2})(?P<Day>\d{2})\.log'
priority = ["Year", "Month", "Day"]
differentiator = ["nginx-", "Domain", "-access"]
This works to match all of the files because match groups are implicitly optional, and we explicitly made the hyphen separator optional by following it with a question mark (i.e. -?). We still have a problem, however. Heka will automatically assign a missing match a sort value of -1. Because we’re sorting by date values, which sort naturally in ascending order, the -1 value will come before every other value, so the file will be considered the oldest in the stream. This is clearly incorrect, since the currently active file is actually the newest file in the stream.
It is possible to fix this by using a custom translation map to explicitly associate a sort index with the ‘missing’ value, like so:
[accesslogs.translation.Year]
missing = 9999
Note
If you create a translation map with only one key, that key must be ‘missing’. It’s possible to use the ‘missing’ value in a translation map that also contains other keys, but if you have any other key in the map you must include all possible match values, or else Heka will raise an error when it finds a match value that can’t be converted.
Given the configuration complexity of the more advanced use-cases, the Logstreamer includes a command line tool that lets you verify options and shows you which logstreams were found, their names, and the order they’ll be parsed in. For convenience, the same Heka TOML config file may be passed in to heka-logstreamer; the LogstreamerInput sections will be located and parsed, showing you how they were interpreted.
An example configuration that locates logfiles on an OSX system:
[osx-logfiles]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = '(?P<FileName>[^/]+)\.log'
differentiator = ["osx-", "FileName", "-logs"]
Running this through heka-logstreamer shows the following:
$ heka-logstreamer -config=test.toml
Found 10 Logstream(s) for section [osx-logfiles].
Logstream name: osx-appstore-logs
Files: 1 (printing oldest to newest)
/var/log/appstore.log
.... more output ....
Logstream name: osx-bookstore-logs
Files: 1 (printing oldest to newest)
/var/log/bookstore.log
Logstream name: osx-install-logs
Files: 1 (printing oldest to newest)
/var/log/install.log
It’s recommended to always run heka-logstreamer first to ensure the configuration behaves as desired.
A hekad configuration file specifies which inputs, decoders, filters, encoders, and outputs will be loaded. The configuration file is in TOML format. TOML looks very similar to INI configuration formats, but with slightly richer data structures and nesting support.
If hekad’s config file is specified to be a directory, all contained files with a filename ending in “.toml” will be loaded and merged into a single config. Files that don’t end with “.toml” will be ignored. Merging happens in alphabetical order, and settings specified later in the merge sequence win conflicts; for example, if hypothetical files 00-base.toml and 10-local.toml both set the same option, the value from 10-local.toml is used.
The config file is broken into sections, with each section representing a single instance of a plugin. The section name specifies the name of the plugin, and the “type” parameter specifies the plugin type; this must match one of the types registered via the pipeline.RegisterPlugin function. For example, the following section describes a plugin named “tcp:5565”, an instance of Heka’s plugin type “TcpInput”:
[tcp:5565]
type = "TcpInput"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
address = ":5565"
If you choose a plugin name that also happens to be a plugin type name, then you can omit the “type” parameter from the section and the specified name will be used as the type. Thus, the following section describes a plugin named “TcpInput”, also of type “TcpInput”:
[TcpInput]
address = ":5566"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
Note that it’s fine to have more than one instance of the same plugin type, as long as their configurations don’t interfere with each other.
Any values other than “type” in a section, such as “address” in the above examples, will be passed through to the plugin for internal configuration (see Plugin Configuration).
If a plugin fails to load during startup, hekad will exit at startup. When hekad is running, if a plugin should fail (due to connection loss, inability to write a file, etc.) then hekad will either shut down or restart the plugin if the plugin supports restarting. When a plugin is restarting, hekad will likely stop accepting messages until the plugin resumes operation (this applies only to filters/output plugins).
Plugins specify that they support restarting by implementing the Restarting interface (see Restarting Plugins). Plugins supporting Restarting can have their restarting behavior configured.
An internal diagnostic runner runs every 30 seconds to sweep the packs used for messages, so that possible bugs in Heka plugins can be reported and pinned down to the likely plugin(s) that failed to properly recycle the pack.
Full documentation on available plugins and settings for each one are in the hekad.plugin(5) pages.
[hekad]
maxprocs = 4
# Heka dashboard for internal metrics and time series graphs
[Dashboard]
type = "DashboardOutput"
address = ":4352"
ticker_interval = 15
# Email alerting for anomaly detection
[Alert]
type = "SmtpOutput"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert'"
send_from = "acme-alert@example.com"
send_to = ["admin@example.com"]
auth = "Plain"
user = "smtp-user"
password = "smtp-pass"
host = "mail.example.com:25"
encoder = "AlertEncoder"
# User friendly formatting of alert messages
[AlertEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"
# Nginx access log reader
[AcmeWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "CombinedNginxDecoder"
# Nginx access 'combined' log parser
[CombinedNginxDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[CombinedNginxDecoder.config]
user_agent_transform = true
user_agent_conditional = true
type = "combined"
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
# Collection and visualization of the HTTP status codes
[AcmeHTTPStatus]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'AcmeWebserver'"
# rate of change anomaly detection on column 1 (HTTP 200)
[AcmeHTTPStatus.config]
anomaly_config = 'roc("HTTP Status", 1, 15, 0, 1.5, true, false)'
Plugins that support being restarted have a set of options that govern how the restart is handled. If preferred, a plugin can be configured to not restart, at which point hekad will exit; alternatively, it could be restarted only 100 times, or restart attempts could proceed forever.
Adding the restarting configuration is done by adding a config section to the plugins’ config called retries. A small amount of jitter will be added to the delay between restart attempts.
Config:
The longest jitter duration to add to the delay between restarts. Jitter up to 500ms by default is added to every delay to ensure more even restart attempts over time.
The longest delay between attempts to restart the plugin. Defaults to 30s (30 seconds).
The starting delay between restart attempts. This value will be the initial starting delay for the exponential back-off, and capped to be no larger than the max_delay. Defaults to 250ms.
Maximum amount of times to attempt restarting the plugin before giving up and exiting the plugin. Use 0 for no retry attempt, and -1 to continue trying forever (note that this will cause hekad to halt possibly forever if the plugin cannot be restarted). Defaults to -1.
Example:
[AMQPOutput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
message_matcher = 'Logger == "TestWebserver"'
[AMQPOutput.retries]
max_delay = "30s"
delay = "250ms"
max_retries = 5
hekad(1), hekad.plugin(5)
Connects to a remote AMQP broker (RabbitMQ) and retrieves messages from the specified queue. As AMQP is dynamically programmable, the broker topology needs to be specified in the plugin configuration.
Config:
An AMQP connection string formatted per the RabbitMQ URI Spec.
AMQP exchange name
AMQP exchange type (fanout, direct, topic, or headers).
Whether the exchange should be configured as a durable exchange. Defaults to non-durable.
Whether the exchange is deleted when all queues have finished and there is no publishing. Defaults to auto-delete.
The message routing key used to bind the queue to the exchange. Defaults to empty string.
How many messages to fetch at once before message acks are sent. See RabbitMQ performance measurements for help in tuning this number. Defaults to 2.
Name of the queue to consume from, an empty string will have the broker generate a name for the queue. Defaults to empty string.
Whether the queue is durable or not. Defaults to non-durable.
Whether the queue is exclusive (only one consumer allowed) or not. Defaults to non-exclusive.
Whether the queue is deleted when the last consumer un-subscribes. Defaults to auto-delete.
Allows specifying a TTL (in milliseconds) on the queue declaration, for expiring messages. Defaults to undefined/infinite.
Decoder name used to transform a raw message body into a structured hekad message. Must be a decoder appropriate for the messages that come in from the exchange. If accepting messages that have been generated by an AMQPOutput in another Heka process then this should be a ProtobufDecoder instance.
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
New in version 0.6.
An optional sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have an impact if the URL uses the AMQPS URI scheme. See Configuring TLS.
Since many of these parameters have sane defaults, a minimal configuration to consume serialized messages would look like:
[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
Or you might use a PayloadRegexDecoder to parse OSX syslog messages with the following:
[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
decoder = "logparser"
[logparser]
type = "MultiDecoder"
subs = ["logline", "leftovers"]
[logline]
type = "PayloadRegexDecoder"
MatchRegex = '\w+ \d+ \d+:\d+:\d+ \S+ (?P<Reporter>[^\[]+)\[(?P<Pid>\d+)](?P<Sandbox>[^:]+)?: (?P<Remaining>.*)'
[logline.MessageFields]
Type = "amqplogline"
Hostname = "myhost"
Reporter = "%Reporter%"
Remaining = "%Remaining%"
Logger = "%Logger%"
Payload = "%Remaining%"
[leftovers]
type = "PayloadRegexDecoder"
MatchRegex = '.*'
[leftovers.MessageFields]
Type = "drop"
Payload = ""
New in version 0.8.
The DockerLogInput plugin attaches to all containers running on a host and sends their log messages into the Heka pipeline. The plugin is based on Logspout by Jeff Lindsay. Messages will be populated as follows:
Config:
A Docker endpoint. Defaults to “unix:///var/run/docker.sock”.
The name of the decoder used to further transform the message into a structured hekad message. No default decoder is specified.
Example:
[nginx_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[nginx_log_decoder.config]
type = "nginx.access"
user_agent_transform = true
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
[DockerLogInput]
decoder = "nginx_log_decoder"
New in version 0.7.
The FilePollingInput periodically reads (unbuffered) the contents of the specified file and creates a Heka message with the contents of the file as the payload.
Config:
The absolute path to the file which the input should read.
How often, in seconds, the input should read the contents of the file.
The name of the decoder used to process the payload of the input.
Example:
[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"
HttpInput plugins intermittently poll remote HTTP URLs for data and populate message objects based on the results of the HTTP interactions. Messages will be populated as follows:
Uuid: Type 4 (random) UUID generated by Heka.
Timestamp: Time HTTP request is completed.
Type: heka.httpinput.data or heka.httpinput.error, depending on whether or not the request completed. (Note that a response returned with an HTTP error code is still considered complete and will generate type heka.httpinput.data.)
Hostname: Hostname of the machine on which Heka is running.
Payload: Entire contents of the HTTP response body.
Severity: Successful requests use the success_severity config value, all other results use the error_severity config value.
Logger: Fetched URL.
Fields[“Status”] (string): HTTP status string value (e.g. “200 OK”).
Fields[“StatusCode”] (int): HTTP status code integer value.
Fields[“ResponseSize”] (int): Value of HTTP Content-Length header.
Fields[“ResponseTime”] (float64): Time elapsed, in seconds, to complete the HTTP request.
Fields[“Protocol”] (string): HTTP protocol used for the request (e.g. “HTTP/1.0”).
The Fields values above will only be populated in the event of a completed HTTP request. Also, it is possible to specify a decoder to further process the results of the HTTP response before injecting the message into the router.
Config:
An HTTP URL which this plugin will regularly poll for data. This option cannot be used with the urls option. No default URL is specified.
New in version 0.5.
An array of HTTP URLs which this plugin will regularly poll for data. This option cannot be used with the url option. No default URLs are specified.
New in version 0.5.
The HTTP method to use for the request. Defaults to “GET”.
New in version 0.5.
Subsection defining headers for the request. By default the User-Agent header is set to “Heka”
New in version 0.5.
The request body (e.g. for an HTTP POST request). No default body is specified.
New in version 0.5.
The username for HTTP Basic Authentication. No default username is specified.
New in version 0.5.
The password for HTTP Basic Authentication. No default password is specified.
Time interval (in seconds) between attempts to poll for new data. Defaults to 10.
New in version 0.5.
Severity level of successful HTTP request. Defaults to 6 (information).
New in version 0.5.
Severity level of errors, unreachable connections, and non-200 responses of successful HTTP requests. Defaults to 1 (alert).
The name of the decoder used to further transform the response body text into a structured hekad message. No default decoder is specified.
Example:
[HttpInput]
url = "http://localhost:9876/"
ticker_interval = 5
success_severity = 6
error_severity = 1
decoder = "MyCustomJsonDecoder"
[HttpInput.headers]
user-agent = "MyCustomUserAgent"
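For endpoints that expect a POST, the HTTP method and request body options described above can be combined. A sketch, assuming the options are named method and body, with an illustrative URL and decoder name:
[HttpPostInput]
type = "HttpInput"
url = "http://localhost:9876/query"
method = "POST"
body = '{"q": "status"}'
ticker_interval = 30
decoder = "MyCustomJsonDecoder"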
New in version 0.5.
HttpListenInput plugins start a webserver listening on the specified address and port. If no decoder is specified data in the request body will be populated as the message payload. Messages will be populated as follows:
Uuid: Type 4 (random) UUID generated by Heka.
Timestamp: Time HTTP request is handled.
Type: heka.httpdata.request
Hostname: The remote network address of requester.
Payload: Entire contents of the HTTP request body.
Severity: 6
Logger: HttpListenInput
Fields[“UserAgent”] (string): Request User-Agent header (e.g. “GitHub Hookshot dd0772a”).
Fields[“ContentType”] (string): Request Content-Type header (e.g. “application/x-www-form-urlencoded”).
Fields[“Protocol”] (string): HTTP protocol used by the request (e.g. “HTTP/1.0”).
Config:
An IP address:port on which this plugin will expose an HTTP server. Defaults to “127.0.0.1:8325”.
The name of the decoder used to further transform the request body text into a structured hekad message. No default decoder is specified.
New in version 0.7.
It is possible to inject arbitrary HTTP headers into each outgoing response by adding a TOML subsection entitled “headers” to your HttpListenInput config section (see the sketch following the example below). All entries in the subsection must be a list of string values.
Example:
[HttpListenInput]
address = "0.0.0.0:8325"
New in version 0.5.
Tails a single log file, a sequential single log source, or multiple log sources of either a single logstream or multiple logstreams.
Config:
The hostname to use for the messages, by default this will be the machine’s qualified hostname. This can be set explicitly to ensure it’s the correct name in the event the machine has multiple interfaces/hostnames.
A time duration string (e.g. “2s”, “2m”, “2h”). Logfiles with a last modified time older than oldest_duration ago will not be included for parsing.
The directory to store the journal files in for tracking the location that has been read to thus far. By default this is stored under heka’s base directory.
The root directory to scan files from. This scan is recursive so it should be suitably restricted to the most specific directory this selection of logfiles will be matched under. The log_directory path will be prepended to the file_match.
During logfile rotation, or if the logfile is not originally present on the system, this interval is how often the existence of the logfile will be checked for. The default of 5 seconds is usually fine. This interval is in milliseconds.
Regular expression used to match files located under the log_directory. This regular expression has $ added to the end automatically if not already present, and log_directory as the prefix. WARNING: file_match should typically be delimited with single quotes, indicating use of a raw string, rather than double quotes, which require all backslashes to be escaped. For example, ‘access\.log’ will work as expected, but “access\.log” will not, you would need “access\\.log” to achieve the same result.
When using sequential logstreams, the priority is how to sort the logfiles in order from oldest to newest.
When using multiple logstreams, the differentiator is a set of strings that will be used in the naming of the logger, and portions that match a captured group from the file_match will have their matched value substituted in.
A set of translation mappings for matched groupings to the ints to use for sorting purposes.
A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the parsed data is available in the Heka message payload.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.
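As a sketch of the multiple-logstream options described above, assuming a per-site Apache layout (the directory layout, regex captures, and decoder name are illustrative, and the priority and differentiator option names follow the descriptions above):
[ApacheVhostLogs]
type = "LogstreamerInput"
log_directory = "/var/log/apache2"
file_match = '(?P<Site>[^/]+)/access\.log\.?(?P<Index>\d+)?'
priority = ["Index"]
differentiator = ["apache.", "Site"]
decoder = "CombinedLogDecoder"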
Executes one or more external programs on an interval, creating messages from the output. Supports a chain of commands, where stdout from each process will be piped into the stdin for the next process in the chain. In the event the program returns a non-zero exit code, ProcessInput will log that an error occurred.
Config:
The command is a structure that contains the full path to the binary, command line arguments, optional environment variables and an optional working directory (see below). ProcessInput expects the commands to be indexed by integers starting with 0, where 0 is the first process in the chain.
The number of seconds to wait between each run of command. Defaults to 15. A ticker_interval of 0 indicates that the command is run only once, and should only be used for long running processes that do not exit. If ticker_interval is set to 0 and the process exits, then the ProcessInput will exit, invoking the restart behavior (see Configuring Restarting Behavior).
If true, for each run of the process chain a message will be generated with the last command in the chain’s stdout as the payload. Defaults to true.
If true, for each run of the process chain a message will be generated with the last command in the chain’s stderr as the payload. Defaults to false.
Name of the decoder instance to send messages to. If omitted messages will be injected directly into Heka’s message router.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.
Timeout in seconds before any one of the commands in the chain is terminated.
Trim a single trailing newline character if one exists. Default is true.
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
cmd_config structure:
The full path to the binary that will be executed.
Command line arguments to pass into the executable.
Used to set environment variables before command is run. Default is nil, which uses the heka process’s environment.
Used to set the working directory of Bin. Default is “”, which uses the heka process’s working directory.
Example:
[DemoProcessInput]
type = "ProcessInput"
ticker_interval = 2
parser_type = "token"
delimiter = " "
stdout = true
stderr = false
trim = true
[DemoProcessInput.command.0]
bin = "/bin/cat"
args = ["../testsupport/process_input_pipes_test.txt"]
[DemoProcessInput.command.1]
bin = "/usr/bin/grep"
args = ["ignore"]
New in version 0.5.
The ProcessDirectoryInput periodically scans a filesystem directory looking for ProcessInput configuration files. The ProcessDirectoryInput will maintain a pool of running ProcessInputs based on the contents of this directory, refreshing the set of running inputs as needed with every rescan. This allows Heka administrators to manage a set of data collection processes for a running hekad server without restarting the server.
Each ProcessDirectoryInput has a process_dir configuration setting, which is the root folder of the tree where scheduled jobs are defined. It should contain exactly one nested level of subfolders, named with ASCII numeric characters indicating the interval, in seconds, between each process run. These numeric folders must contain TOML files which specify the details regarding which processes to run.
For example, a process_dir might look like this:
-/usr/share/heka/processes/
|-5
|- check_myserver_running.toml
|-61
|- cat_proc_mounts.toml
|- get_running_processes.toml
|-302
|- some_custom_query.toml
This indicates one process to be run every five seconds, two processes to be run every 61 seconds, and one process to be run every 302 seconds.
Note that ProcessDirectoryInput will ignore any files that are not nested one level deep, are not in a folder named for an integer 0 or greater, or that do not end with ‘.toml’. Each file which meets these criteria, such as those shown in the example above, should contain the TOML configuration for exactly one ProcessInput, matching that of a standalone ProcessInput with the following restrictions:
If the specified process fails to run or the ProcessInput config fails for any other reason, ProcessDirectoryInput will log an error message and continue.
Config:
Amount of time, in seconds, between scans of the process_dir. Defaults to 300 (i.e. 5 minutes).
This is the root folder of the tree where the scheduled jobs are defined. Absolute paths will be honored, relative paths will be computed relative to Heka’s globally specified share_dir. Defaults to “processes” (i.e. “$share_dir/processes”).
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
Example:
[ProcessDirectoryInput]
process_dir = "/etc/hekad/processes.d"
ticker_interval = 120
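As an illustrative sketch, the cat_proc_mounts.toml file from the directory listing above might contain a single ProcessInput configuration along these lines, with the run interval implied by the enclosing numeric folder name:
# e.g. <process_dir>/61/cat_proc_mounts.toml
[cat_proc_mounts]
type = "ProcessInput"
stdout = true
[cat_proc_mounts.command.0]
bin = "/bin/cat"
args = ["/proc/mounts"]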
Provides an implementation of the StatAccumulator interface which other plugins can use to submit Stat objects for aggregation and roll-up. Accumulates these stats and then periodically emits a “stat metric” type message containing aggregated information about the stats received since the last generated message.
Config:
Specifies whether or not the aggregated stat information should be emitted in the message fields of the generated messages. Defaults to false. NOTE: At least one of ‘emit_in_payload’ or ‘emit_in_fields’ must be true or it will be considered a configuration error and the input won’t start.
Percent threshold to use for computing “upper_N%” type stat values. Defaults to 90.
Time interval (in seconds) between generated output messages. Defaults to 10.
String value to use for the Type value of the emitted stat messages. Defaults to “heka.statmetric”.
If set to true, then use the older format for namespacing counter stats, with rates recorded under stats.<counter_name> and absolute count recorded under stats_counts.<counter_name>. See statsd metric namespacing. Defaults to false.
Global prefix to use for sending stats to graphite. Defaults to “stats”.
Secondary prefix to use for namespacing counter metrics. Has no impact unless legacy_namespaces is set to false. Defaults to “counters”.
Secondary prefix to use for namespacing timer metrics. Defaults to “timers”.
Secondary prefix to use for namespacing gauge metrics. Defaults to “gauges”.
Prefix to use for the statsd numStats metric. Defaults to “statsd”.
If set to true, don’t emit values for inactive stats, rather than sending a 0 (or, in the case of gauges, the previous value). Defaults to false.
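A minimal configuration sketch that emits the aggregated stats as message fields every ten seconds, using only option names confirmed above:
[StatAccumInput]
ticker_interval = 10
emit_in_fields = true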
Listens for statsd protocol counter, timer, or gauge messages on a UDP port, and generates Stat objects that are handed to a StatAccumulator for aggregation and processing.
Config:
An IP address:port on which this plugin will expose a statsd server. Defaults to “127.0.0.1:8125”.
Name of a StatAccumInput instance that this StatsdInput will use as its StatAccumulator for submitting received stat values. Defaults to “StatAccumInput”.
Size of the buffer used for reading messages from statsd. In some cases, when a statsd client sends many stats in a single message, this value may need to be increased. All over-length data will be truncated without raising an error. Defaults to 512.
Example:
[StatsdInput]
address = ":8125"
stat_accum_name = "custom_stat_accumulator"
Listens on a specific TCP address and port for messages. If the message is signed it is verified against the signer name and specified key version. If the signature is not valid the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.
Config:
An IP address:port on which this plugin will listen.
Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.
The hash key used to sign the message.
New in version 0.4.
A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.
New in version 0.5.
Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
Network value must be one of: “tcp”, “tcp4”, “tcp6”, “unix” or “unixpacket”.
New in version 0.6.
Specifies whether or not TCP keepalive should be used for established TCP connections. Defaults to false.
Time duration in seconds that a TCP connection will be maintained before keepalive probes start being sent. Defaults to 7200 (i.e. 2 hours).
Example:
[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
[TcpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[TcpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"
[TcpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"
Listens on a specific UDP address and port for messages. If the message is signed it is verified against the signer name and specified key version. If the signature is not valid the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.
Note
The UDP payload is not restricted to a single message; since the stream parser is being used multiple messages can be sent in a single payload.
Config:
An IP address:port or Unix datagram socket file path on which this plugin will listen.
Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.
The hash key used to sign the message.
New in version 0.4.
A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.
Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.
New in version 0.5.
Network value must be one of: “udp”, “udp4”, “udp6”, or “unixgram”.
Example:
[UdpInput]
address = "127.0.0.1:4880"
parser_type = "message.proto"
decoder = "ProtobufDecoder"
[UdpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[UdpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"
[UdpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"
New in version 0.6.
Parses the Apache access logs based on the Apache ‘LogFormat’ configuration directive. The Apache format specifiers are mapped onto the Nginx variable names where applicable e.g. %a -> remote_addr. This allows generic web filters and outputs to work with any HTTP server input.
Config:
The ‘LogFormat’ configuration directive from the apache2.conf. %t variables are converted to the number of nanoseconds since the Unix epoch and used to set the Timestamp on the message. http://httpd.apache.org/docs/2.4/mod/mod_log_config.html
Sets the message ‘Type’ header to the specified value
Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.
Always preserve the http_user_agent value if transform is enabled.
Only preserve the http_user_agent value if transform is enabled and fails.
Always preserve the original log line in the message payload.
Example Heka Configuration
[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/apache"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"
[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/apache_access.lua"
[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
# common log format
# log_format = '%h %l %u %t \"%r\" %>s %O'
# vhost_combined log format
# log_format = '%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
# referer log format
# log_format = '%{Referer}i -> %U'
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: combined
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserver
Payload:
EnvVersion:
Severity: 7
Fields:
    name:”remote_user” value_string:”-“
    name:”http_x_forwarded_for” value_string:”-“
    name:”http_referer” value_string:”-“
    name:”body_bytes_sent” value_type:DOUBLE representation:”B” value_double:82
    name:”remote_addr” value_string:”62.195.113.219” representation:”ipv4”
    name:”status” value_type:DOUBLE value_double:200
    name:”request” value_string:”GET /v1/recovery_email/status HTTP/1.1”
    name:”user_agent_os” value_string:”FirefoxOS”
    name:”user_agent_browser” value_string:”Firefox”
    name:”user_agent_version” value_type:DOUBLE value_double:29
New in version 0.8.
Parses a payload containing JSON in the Graylog2 Extended Format specification. http://graylog2.org/resources/gelf/specification
Config:
Sets the message ‘Type’ header to the specified value
Always preserve the original log line in the message payload.
Example of Graylog2 Extended Format Log
{
"version": "1.1",
"host": "rogueethic.com",
"short_message": "This is a short message to identify what is going on.",
"full_message": "An entire backtrace\ncould\ngo\nhere",
"timestamp": 1385053862.3072,
"level": 1,
"_user_id": 9001,
"_some_info": "foo",
"_some_env_var": "bar"
}
Example Heka Configuration
[GELFLogInput]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = 'application\.gelf'
decoder = "GraylogDecoder"
[GraylogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/graylog_decoder.lua"
[GraylogDecoder.config]
type = "gelf"
payload_keep = true
New in version 0.6.
New in version 0.6.
Decoder plugin that generates GeoIP data based on the IP address of a specified field. It uses the GeoIP Go project as a wrapper around MaxMind’s geoip-api-c library, and thus assumes you have the library downloaded and installed. Currently, only the GeoLiteCity database is supported, which you must also download and install yourself into a location to be referenced by the db_file config option. By default the database file is opened using “GEOIP_MEMORY_CACHE” mode. This setting is hard-coded into the wrapper’s geoip.go file. You will need to manually override that code if you want to specify one of the other modes listed here.
Note
Due to external dependencies, this plugin is not compiled in to the released Heka binaries. It will automatically be included in a source build if GeoIP.h is available in the include path during build time. The generated binary will then only work on machines with the appropriate GeoIP shared library (e.g. libGeoIP.so.1) installed.
Note
If you are using this with the ES output you will likely need to specify the raw_bytes_field option for the target_field specified. This is required to preserve the formatting of the JSON object.
Config:
The location of the GeoLiteCity.dat database. Defaults to “/var/cache/hekad/GeoLiteCity.dat”
The name of the field containing the IP address you want to derive the location for.
The name of the new field created by the decoder. The decoder will output a JSON object with the following elements:
- latitude: string
- longitude: string
- location: [ float64, float64 ] (GeoJSON format intended for use as a geo_point for ES output; useful when using Kibana’s Bettermap panel)
- coordinates: [ string, string ]
- countrycode: string
- countrycode3: string
- region: string
- city: string
- postalcode: string
- areacode: int
- charset: int
- continentalcode: string
[apache_geoip_decoder]
type = "GeoIpDecoder"
db_file="/etc/geoip/GeoLiteCity.dat"
source_ip_field="remote_host"
target_field="geoip"
This decoder plugin allows you to specify an ordered list of delegate decoders. The MultiDecoder will pass the PipelinePack to be decoded to each of the delegate decoders in turn until decode succeeds. In the case of failure to decode, MultiDecoder will return an error and recycle the message.
Config:
An ordered list of subdecoders to which the MultiDecoder will delegate. Each item in the list should specify another decoder configuration section by section name. Must contain at least one entry.
If true, the DecoderRunner will log the errors returned whenever a delegate decoder fails to decode a message. Defaults to false.
Specifies behavior the MultiDecoder should exhibit with regard to cascading through the listed decoders. Supports only two valid values: “first-wins” and “all”. With “first-wins”, each decoder will be tried in turn until there is a successful decoding, after which decoding will be stopped. With “all”, all listed decoders will be applied whether or not they succeed. In each case, decoding will only be considered to have failed if none of the sub-decoders succeed.
Here is a slightly contrived example where we have protocol buffer encoded messages coming in over a TCP connection, with each message containing a single nginx log line. Our MultiDecoder will run each message through two decoders, the first to deserialize the protocol buffer and the second to parse the log text:
[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "shipped-nginx-decoder"
[shipped-nginx-decoder]
type = "MultiDecoder"
subs = ['ProtobufDecoder', 'nginx-access-decoder']
cascade_strategy = "all"
log_sub_errors = true
[ProtobufDecoder]
[nginx-access-decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[nginx-access-decoder.config]
type = "combined"
user_agent_transform = true
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
New in version 0.7.
Parses a payload containing the contents of a /sys/block/$DISK/stat file (where $DISK is a disk identifier such as sda) into a Heka message struct. This also tries to obtain the TickerInterval of the input it received the data from, by extracting it from a message field named TickerInterval.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[DiskStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/sys/block/sda1/stat"
decoder = "DiskStatsDecoder"
[DiskStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_diskstats.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.diskstats
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:”ReadsCompleted” value_type:DOUBLE value_double:”20123”
    name:”ReadsMerged” value_type:DOUBLE value_double:”11267”
    name:”SectorsRead” value_type:DOUBLE value_double:”1.094968e+06”
    name:”TimeReading” value_type:DOUBLE value_double:”45148”
    name:”WritesCompleted” value_type:DOUBLE value_double:”1278”
    name:”WritesMerged” value_type:DOUBLE value_double:”1278”
    name:”SectorsWritten” value_type:DOUBLE value_double:”206504”
    name:”TimeWriting” value_type:DOUBLE value_double:”3348”
    name:”TimeDoingIO” value_type:DOUBLE value_double:”4876”
    name:”WeightedTimeDoingIO” value_type:DOUBLE value_double:”48356”
    name:”NumIOInProgress” value_type:DOUBLE value_double:”3”
    name:”TickerInterval” value_type:DOUBLE value_double:”2”
    name:”FilePath” value_string:”/sys/block/sda/stat”
New in version 0.7.
Parses a payload containing the contents of a /proc/loadavg file into a Heka message.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[LoadAvg]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/loadavg"
decoder = "LoadAvgDecoder"
[LoadAvgDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_loadavg.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.loadavg
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:”1MinAvg” value_type:DOUBLE value_double:”3.05”
    name:”5MinAvg” value_type:DOUBLE value_double:”1.21”
    name:”15MinAvg” value_type:DOUBLE value_double:”0.44”
    name:”NumProcesses” value_type:DOUBLE value_double:”11”
    name:”FilePath” value_string:”/proc/loadavg”
New in version 0.7.
Parses a payload containing the contents of a /proc/meminfo file into a Heka message.
Config:
Always preserve the original log line in the message payload.
Example Heka Configuration
[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"
[MemStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_memstats.lua"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: stats.memstats
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity: 7
Fields:
    name:”MemTotal” value_type:DOUBLE representation:”kB” value_double:”4047616”
    name:”MemFree” value_type:DOUBLE representation:”kB” value_double:”3432216”
    name:”Buffers” value_type:DOUBLE representation:”kB” value_double:”82028”
    name:”Cached” value_type:DOUBLE representation:”kB” value_double:”368636”
    name:”FilePath” value_string:”/proc/meminfo”
The total available fields can be found in man procfs. All fields are of type double, and the representation is in kB (except for the HugePages fields). Here is a full list of fields available:
MemTotal, MemFree, Buffers, Cached, SwapCached, Active, Inactive, Active(anon), Inactive(anon), Active(file), Inactive(file), Unevictable, Mlocked, SwapTotal, SwapFree, Dirty, Writeback, AnonPages, Mapped, Shmem, Slab, SReclaimable, SUnreclaim, KernelStack, PageTables, NFS_Unstable, Bounce, WritebackTmp, CommitLimit, Committed_AS, VmallocTotal, VmallocUsed, VmallocChunk, HardwareCorrupted, AnonHugePages, HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, DirectMap4k, DirectMap2M, DirectMap1G.
Note that your available fields may have a slight variance depending on the system’s kernel version.
New in version 0.6.
Parses and transforms the MySQL slow query logs. Use mariadb_slow_query.lua to parse the MariaDB variant of the MySQL slow query logs.
Config:
Truncates the SQL payload to the specified number of bytes (not UTF-8 aware) and appends ”...”. If the value is nil no truncation is performed. A negative value will truncate the specified number of bytes from the end.
Example Heka Configuration
[Sync-1_5-SlowQuery]
type = "LogstreamerInput"
log_directory = "/var/log/mysql"
file_match = 'mysql-slow\.log'
parser_type = "regexp"
delimiter = "\n(# User@Host:)"
delimiter_location = "start"
decoder = "MySqlSlowQueryDecoder"
[MySqlSlowQueryDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/mysql_slow_query.lua"
[MySqlSlowQueryDecoder.config]
truncate_sql = 64
Example Heka Message
Timestamp: 2014-05-07 15:51:28 -0700 PDT
Type: mysql.slow-query
Hostname: 127.0.0.1
Pid: 0
UUID: 5324dd93-47df-485b-a88e-429f0fcd57d6
Logger: Sync-1_5-SlowQuery
Payload: /* [queryName=FIND_ITEMS] */ SELECT bso.userid, bso.collection, ...
EnvVersion:
Severity: 7
Fields:
    name:”Rows_examined” value_type:DOUBLE value_double:16458
    name:”Query_time” value_type:DOUBLE representation:”s” value_double:7.24966
    name:”Rows_sent” value_type:DOUBLE value_double:5001
    name:”Lock_time” value_type:DOUBLE representation:”s” value_double:0.047038
New in version 0.5.
Parses the Nginx access logs based on the Nginx ‘log_format’ configuration directive.
Config:
The ‘log_format’ configuration directive from the nginx.conf. $time_local or $time_iso8601 variable is converted to the number of nanosecond since the Unix epoch and used to set the Timestamp on the message. http://nginx.org/en/docs/http/ngx_http_log_module.html
Sets the message ‘Type’ header to the specified value
Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.
Always preserve the http_user_agent value if transform is enabled.
Only preserve the http_user_agent value if transform is enabled and fails.
Always preserve the original log line in the message payload.
Example Heka Configuration
[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"
[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"
[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: combined
Hostname: test.example.com
Pid: 0
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserver
Payload:
EnvVersion:
Severity: 7
Fields:
    name:”remote_user” value_string:”-“
    name:”http_x_forwarded_for” value_string:”-“
    name:”http_referer” value_string:”-“
    name:”body_bytes_sent” value_type:DOUBLE representation:”B” value_double:82
    name:”remote_addr” value_string:”62.195.113.219” representation:”ipv4”
    name:”status” value_type:DOUBLE value_double:200
    name:”request” value_string:”GET /v1/recovery_email/status HTTP/1.1”
    name:”user_agent_os” value_string:”FirefoxOS”
    name:”user_agent_browser” value_string:”Firefox”
    name:”user_agent_version” value_type:DOUBLE value_double:29
New in version 0.6.
Parses the Nginx error logs based on the Nginx hard coded internal format.
Config:
The conversion actually happens on the Go side since there isn’t good TZ support here.
Example Heka Configuration
[TestWebserverError]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'error\.log'
decoder = "NginxErrorDecoder"
[NginxErrorDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_error.lua"
[NginxErrorDecoder.config]
tz = "America/Los_Angeles"
Example Heka Message
Timestamp: 2014-01-10 07:04:56 -0800 PST
Type: nginx.error
Hostname: trink-x230
Pid: 16842
UUID: 8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger: TestWebserverError
Payload: using inherited sockets from “6;”
EnvVersion:
Severity: 5
Fields:
    name:”tid” value_type:DOUBLE value_double:0
    name:”connection” value_type:DOUBLE value_double:8878
Decoder plugin that accepts messages of a specified form and generates new outgoing messages from extracted data, effectively transforming one message format into another.
Note
The Go regular expression tester is an invaluable tool for constructing and debugging regular expressions to be used for parsing your input data.
Config:
Regular expression that must match for the decoder to process the message.
Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.
Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any captured in a regex in the message_matcher, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field i.e., Payload|json will create Fields[Payload] with a json representation (see Field Variables).
Interpolated values should be surrounded with % signs, for example:
[my_decoder.message_fields]
Type = "%Type%Decoded"
This will result in the new message’s Type being set to the old message’s Type with Decoded appended.
A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.
Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.
New in version 0.5.
If set to false, payloads that can not be matched against the regex will not be logged as errors. Defaults to true.
Example (Parsing Apache Combined Log Format):
[apache_transform_decoder]
type = "PayloadRegexDecoder"
match_regex = '^(?P<RemoteIP>\S+) \S+ \S+ \[(?P<Timestamp>[^\]]+)\] "(?P<Method>[A-Z]+) (?P<Url>[^\s]+)[^"]*" (?P<StatusCode>\d+) (?P<RequestSize>\d+) "(?P<Referer>[^"]*)" "(?P<Browser>[^"]*)"'
timestamp_layout = "02/Jan/2006:15:04:05 -0700"
# severities in this case would work only if a (?P<Severity>...) matching
# group was present in the regex, and the log file contained this information.
[apache_transform_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4
[apache_transform_decoder.message_fields]
Type = "ApacheLogfile"
Logger = "apache"
Url|uri = "%Url%"
Method = "%Method%"
Status = "%Status%"
RequestSize|B = "%RequestSize%"
Referer = "%Referer%"
Browser = "%Browser%"
This decoder plugin accepts XML blobs in the message payload and allows you to map parts of the XML into Field attributes of the pipeline pack message using XPath syntax, as implemented by the xmlpath library.
Config:
A subsection defining a capture name that maps to an XPath expression. Each expression can fetch a single value; if the expression does not resolve to a valid node in the XML blob, the capture group will be assigned an empty string value.
Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.
Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any captured in an XPath in the message_matcher, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field i.e., Payload|json will create Fields[Payload] with a json representation (see Field Variables).
Interpolated values should be surrounded with % signs, for example:
[my_decoder.message_fields]
Type = "%Type%Decoded"
This will result in the new message’s Type being set to the old message’s Type with Decoded appended.
A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. The default layout is ISO8601 - the same as Javascript. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.
Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.
Example:
[myxml_decoder]
type = "PayloadXmlDecoder"
[myxml_decoder.xpath_map]
Count = "/some/path/count"
Name = "/some/path/name"
Pid = "//pid"
Timestamp = "//timestamp"
Severity = "//severity"
[myxml_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4
[myxml_decoder.message_fields]
Pid = "%Pid%"
StatCount = "%Count%"
StatName = "%Name%"
Timestamp = "%Timestamp%"
PayloadXmlDecoder’s xpath_map config subsection supports XPath as implemented by the xmlpath library.
- All axes are supported (“child”, “following-sibling”, etc)
- All abbreviated forms are supported (”.”, “//”, etc)
- All node types except for namespace are supported
- Predicates are restricted to [N], [path], and [path=literal] forms
- Only a single predicate is supported per path step
- Richer expressions and namespaces are not supported
The ProtobufDecoder is used for Heka message objects that have been serialized into protocol buffers format. This is the format that Heka uses to communicate with other Heka instances, so one will always be included in your Heka configuration whether specified or not. The ProtobufDecoder has no configuration options.
The hekad protocol buffers message schema is defined in the message.proto file in the message package.
Example:
[ProtobufDecoder]
New in version 0.5.
Parses the rsyslog output using the string based configuration template.
Config:
The ‘template’ configuration string from rsyslog.conf. http://rsyslog-5-8-6-doc.neocities.org/rsyslog_conf_templates.html
If your rsyslog timestamp field in the template does not carry zone offset information, you may set an offset to be applied to your events here. Typically this would be used with the “Traditional” rsyslog formats.
Parsing is done by Go, supports values of “UTC”, “Local”, or a location name corresponding to a file in the IANA Time Zone database, e.g. “America/New_York”.
Example Heka Configuration
[RsyslogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/rsyslog.lua"
[RsyslogDecoder.config]
type = "RSYSLOG_TraditionalFileFormat"
template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
tz = "America/Los_Angeles"
Example Heka Message
Timestamp: 2014-02-10 12:58:58 -0800 PST
Type: RSYSLOG_TraditionalFileFormat
Hostname: trink-x230
Pid: 0
UUID: e0eef205-0b64-41e8-a307-5772b05e16c1
Logger: RsyslogInput
Payload: “imklog 5.8.6, log source = /proc/kmsg started.”
EnvVersion:
Severity: 7
Fields:
    name:”programname” value_string:”kernel”
The SandboxDecoder provides an isolated execution environment for data parsing and complex transformations without the need to recompile Heka. See Sandbox.
Config:
Example
[sql_decoder]
type = "SandboxDecoder"
filename = "sql_decoder.lua"
New in version 0.5.
The ScribbleDecoder is a trivial decoder that makes it possible to set one or more static field values on every decoded message. It is often used in conjunction with another decoder (i.e. in a MultiDecoder w/ cascade_strategy set to “all”) to, for example, set the message type of every message to a specific custom value after the messages have been decoded from Protocol Buffers format. Note that this only supports setting the exact same value on every message, if any dynamic computation is required to determine what the value should be, or whether it should be applied to a specific message, a SandboxDecoder using the provided write_message API call should be used instead.
Config:
Subsection defining message fields to populate. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. host|ipv4 = “192.168.55.55” will create Fields[Host] containing an IPv4 address. Adding a representation string to a standard message header name will cause it to be added as a user defined field, i.e. Payload|json will create Fields[Payload] with a json representation (see Field Variables). Does not support Timestamp or Uuid.
Example (in MultiDecoder context)
[mytypedecoder]
type = "MultiDecoder"
subs = ["ProtobufDecoder", "mytype"]
cascade_strategy = "all"
log_sub_errors = true
[ProtobufDecoder]
[mytype]
type = "ScribbleDecoder"
[mytype.message_fields]
Type = "MyType"
New in version 0.4.
The StatsToFieldsDecoder will parse time series statistics data in the graphite message format and encode the data into the message fields, in the same format produced by a StatAccumInput plugin with the emit_in_fields value set to true. This is useful if you have externally generated graphite string data flowing through Heka that you’d like to process without having to roll your own string parsing code.
This decoder has no configuration options. It simply expects to be passed messages with statsd string data in the payload. Incorrect or malformed content will cause a decoding error, dropping the message.
The fields format only contains a single “timestamp” field, so any payloads containing multiple timestamps will end up generating a separate message for each timestamp. Extra messages will be a copy of the original message except a) the payload will be empty and b) the unique timestamp and related stats will be the only message fields.
Example:
[StatsToFieldsDecoder]
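For reference, the graphite/statsd string data this decoder expects in the payload consists of plaintext records of the form “<name> <value> <timestamp>”, one per line; the metric names and values below are purely illustrative:
stats.counters.webserver.hits.rate 4.2 1415829590
stats.counters.webserver.hits.count 42 1415829590
stats.timers.webserver.response_time.upper_90 123 1415829590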
There are some configuration options that are universally available to all Heka filter plugins. These will be consumed by Heka itself when Heka initializes the plugin and do not need to be handled by the plugin-specific initialization code.
Boolean expression, when evaluated to true passes the message to the filter for processing. Defaults to matching nothing. See: Message Matcher Syntax
The name of the message signer. If specified only messages with this signer are passed to the filter for processing.
Frequency (in seconds) that a timer event will be sent to the filter. Defaults to not sending timer events.
New in version 0.5.
Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). The purpose is to recreate the view at a larger scope in each level of the aggregation i.e., host view -> datacenter view -> service level view.
Config:
Specifies whether or not this aggregator should generate cbuf deltas.
A list of anomaly detection specifications. If not specified no anomaly detection/alerting will be performed.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[TelemetryServerMetricsAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_aggregator.lua"
preserve_data = true
[TelemetryServerMetricsAggregator.config]
enable_delta = false
anomaly_config = 'roc("Request Statistics", 1, 15, 0, 1.5, true, false)'
preservation_version = 0
New in version 0.5.
Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). Each column from the source circular buffer will become its own graph. i.e., ‘Error Count’ will become a graph with each host being represented in a column.
Config:
Pre-allocates the number of host columns in the graph(s). If the number of active hosts exceed this value, the plugin will terminate.
The number of rows to keep from the original circular buffer. Storing all the data from all the hosts is not practical since you will most likely run into memory and output size restrictions (adjust the view down as necessary).
The amount of time a host has to be inactive before it can be replaced by a new host.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the max_hosts or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[TelemetryServerMetricsHostAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_host_aggregator.lua"
preserve_data = true
[TelemetryServerMetricsHostAggregator.config]
max_hosts = 5
rows = 60
host_expiration = 120
preservation_version = 0
Once per ticker interval a CounterFilter will generate a message of type heka.counter-output. The payload will contain text indicating the number of messages that matched the filter’s message_matcher value during that interval (i.e. it counts the messages the plugin received). Every ten intervals an extra message (also of type heka.counter-output) goes out, containing an aggregate count and average per second throughput of messages received.
Config:
Interval between generated counter messages, in seconds. Defaults to 5.
Example:
[CounterFilter]
message_matcher = "Type != 'heka.counter-output'"
New in version 0.7.
Graphs the load average and process count data. Expects to receive messages containing fields entitled 1MinAvg, 5MinAvg, 15MinAvg, and NumProcesses, such as those generated by the Linux Load Average Decoder.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a sliding 24 hour window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[LoadAvgFilter]
type = "SandboxFilter"
filename = "lua_filters/loadavg.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.loadavg'"
New in version 0.7.
Graphs disk IO stats. It automatically converts the running totals of Writes and Reads into rates of the values. The time based fields are left as running totals of the amount of time doing IO. Expects to receive messages with disk IO data embedded in a particular set of message fields which matches what is generated by Linux Disk Stats Decoder: WritesCompleted, ReadsCompleted, SectorsWritten, SectorsRead, WritesMerged, ReadsMerged, TimeWriting, TimeReading, TimeDoingIO, WeightedTimeDoingIO, TickerInterval.
Config:
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a sliding 24 hour window with 1 minute resolution.
Anomaly detection configuration, see Anomaly Detection Module.
Example Heka Configuration
[DiskStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/diskstats.lua"
preserve_data = true
message_matcher = "Type == 'stats.diskstats'"
New in version 0.5.
Calculates the most frequent items in a data stream.
Config:
The message variable name containing the items to be counted.
The maximum size of the sample set (higher will produce a more accurate list).
Used to reduce the long tail output by only outputting the higher frequency items.
Resets the list after the specified number of days (on the UTC day boundary). A value of 0 will never reset the list.
Example Heka Configuration
[FxaAuthServerFrequentIP]
type = "SandboxFilter"
filename = "lua_filters/frequent_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"
[FxaAuthServerFrequentIP.config]
message_variable = "Fields[remote_addr]"
max_items = 10000
min_output_weight = 100
reset_days = 1
New in version 0.6.
Graphs the Heka memory statistics using the heka.memstat message generated by pipeline/report.go.
Config:
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a sliding 24 hour window with 1 minute resolution.
Sets the size of each bucket (resolution in seconds) in the sliding window.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the rows or sec_per_row configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[HekaMemstat]
type = "SandboxFilter"
filename = "lua_filters/heka_memstat.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'heka.memstat'"
New in version 0.5.
Generates documentation for each unique message in a data stream. The output is a hierarchy of Logger, Type, EnvVersion, and a list of associated message field attributes including their counts (number in the brackets). This plugin is meant for data discovery/exploration and should not be left running on a production system.
Config:
<none>
Example Heka Configuration
[SyncMessageSchema]
type = "SandboxFilter"
filename = "lua_filters/heka_message_schema.lua"
ticker_interval = 60
preserve_data = false
message_matcher = "Logger =~ /^Sync/"
New in version 0.5.
Graphs HTTP status codes using the numeric Fields[status] variable collected from web server access logs.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a sliding 24 hour window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[FxaAuthServerHTTPStatus]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"
[FxaAuthServerHTTPStatus.config]
sec_per_row = 60
rows = 1440
anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false) roc("HTTP Status", 4, 15, 0, 1.5, true, false) mww_nonparametric("HTTP Status", 5, 15, 10, 0.8)'
preservation_version = 0
New in version 0.7.
Graphs memory usage statistics. Expects to receive messages with memory usage data embedded in a specific set of message fields, which matches the messages generated by Linux Memory Stats Decoder: MemFree, Cached, Active, Inactive, VmallocUsed, Shmem, SwapCached.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a sliding 24 hour window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[MemoryStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/memstats.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.memstats'"
New in version 0.6.
Graphs MySQL slow query data produced by the MySQL Slow Query Log Decoder.
Config:
Sets the size of each bucket (resolution in seconds) in the sliding window.
Sets the size of the sliding window, i.e., 1440 rows representing 60 seconds per row is a sliding 24 hour window with 1 minute resolution.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[Sync-1_5-SlowQueries]
type = "SandboxFilter"
message_matcher = "Logger == 'Sync-1_5-SlowQuery'"
ticker_interval = 60
filename = "lua_filters/mysql_slow_query.lua"
[Sync-1_5-SlowQueries.config]
anomaly_config = 'mww_nonparametric("Statistics", 5, 15, 10, 0.8)'
preservation_version = 0
Filter plugin that accepts messages of a specified form and uses extracted message data to feed statsd-style numerical metrics in the form of Stat objects to a StatAccumulator.
Config:
Metric:
Subsection defining a single metric to be generated. Both the name and value fields for each metric support interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Logger’, ‘Payload’, or any dynamic field name) with the use of %% delimiters, so %Hostname% would be replaced by the message’s Hostname field, and %Foo% would be replaced by the first value of a dynamic field called “Foo”:
- type (string):
Metric type, supports “Counter”, “Timer”, “Gauge”.
- name (string):
Metric name, must be unique.
- value (string):
Expression representing the (possibly dynamic) value that the StatFilter should emit for each received message.
Name of a StatAccumInput instance that this StatFilter will use as its StatAccumulator for submitting generated stat values. Defaults to “StatAccumInput”.
Example:
[StatAccumInput]
ticker_interval = 5
[StatsdInput]
address = "127.0.0.1:29301"
[Hits]
type = "StatFilter"
message_matcher = 'Type == "ApacheLogfile"'
[Hits.Metric.bandwidth]
type = "Counter"
name = "httpd.bytes.%Hostname%"
value = "%Bytes%"
[Hits.Metric.method_counts]
type = "Counter"
name = "httpd.hits.%Method%.%Hostname%"
value = "1"
Note
StatFilter requires an available StatAccumInput to be running.
The sandbox filter provides an isolated execution environment for data analysis. Any output generated by the sandbox is injected into the payload of a new message for further processing or to be output.
Config:
Example:
[hekabench_counter]
type = "SandboxFilter"
message_matcher = "Type == 'hekabench'"
ticker_interval = 1
filename = "counter.lua"
preserve_data = true
profile = false
[hekabench_counter.config]
rows = 1440
sec_per_row = 60
The SandboxManagerFilter provides dynamic control (start/stop) of sandbox filters in a secure manner without stopping the Heka daemon. Commands are sent to a SandboxManagerFilter using a signed Heka message. The intent is to have one manager per access control group each with their own message signing key. Users in each group can submit a signed control message to manage any filters running under the associated manager. A signed message is not an enforced requirement but it is highly recommended in order to restrict access to this functionality.
The directory where the filter configurations, code, and states are preserved. The directory can be unique or shared between sandbox managers since the filter names are unique per manager. Defaults to a directory in ${BASE_DIR}/sbxmgrs with a name generated from the plugin name.
The directory where ‘require’ will attempt to load the external Lua modules from. Defaults to ${SHARE_DIR}/lua_modules.
The maximum number of filters this manager can run.
New in version 0.5.
The number of bytes managed sandboxes are allowed to consume before being terminated (default 8MiB).
The number of instructions managed sandboxes are allowed to execute during the process_message/timer_event functions before being terminated (default 1M).
The number of bytes managed sandbox output buffers can hold before being terminated (default 63KiB). Warning: messages exceeding 64KiB will generate an error and be discarded by the standard output plugins (File, TCP, UDP) since they exceed the maximum message size.
Example
[OpsSandboxManager]
type = "SandboxManagerFilter"
message_signer = "ops"
# message_matcher = "Type == 'heka.control.sandbox'" # automatic default setting
max_filters = 100
New in version 0.7.
Converts stat values extracted from statmetric messages (see StatAccumInput) to circular buffer data and periodically emits messages containing this data to be graphed by a DashboardOutput. Note that this filter expects the stats data to be available in the message fields, so the StatAccumInput must be configured with emit_in_fields set to true for this filter to work correctly.
Config:
Title for the graph output generated by this filter.
The number of rows to store in our circular buffer. Each row represents one time interval.
The number of seconds in each circular buffer time interval.
Space separated list of stat names. Each specified stat will be expected to be found in the fields of the received statmetric messages, and will be extracted and inserted into its own column in the accumulated circular buffer.
Space separated list of header label names to use for the extracted stats. Must be in the same order as the specified stats. Any label longer than 15 characters will be truncated.
Anomaly detection configuration, see Anomaly Detection Module.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time any edits are made to your rows, sec_per_row, stats, or stat_labels values, or else Heka will fail to start because the preserved data will no longer match the filter’s data structure.
Example Heka Configuration
[stat-graph]
type = "SandboxFilter"
filename = "lua_filters/stat_graph.lua"
ticker_interval = 10
preserve_data = true
message_matcher = "Type == 'heka.statmetric'"
[stat-graph.config]
title = "Hits and Misses"
rows = 1440
sec_per_row = 10
stats = "stats.counters.hits.count stats.counters.misses.count"
stat_labels = "hits misses"
anomaly_config = 'roc("Hits and Misses", 1, 15, 0, 1.5, true, false) roc("Hits and Misses", 2, 15, 0, 1.5, true, false)'
preservation_version = 0
New in version 0.6.
Counts the number of unique items per day e.g. active daily users by uid.
Config:
The Heka message variable containing the item to be counted.
The graph title for the cbuf output.
Specifies whether or not this plugin should generate cbuf deltas. Deltas should be enabled when sharding is used; see: Circular Buffer Delta Aggregator.
If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.
Example Heka Configuration
[FxaActiveDailyUsers]
type = "SandboxFilter"
filename = "lua_filters/unique_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'FxaAuth' && Type == 'request.summary' && Fields[path] == '/v1/certificate/sign' && Fields[errno] == 0"
[FxaActiveDailyUsers.config]
message_variable = "Fields[uid]"
title = "Estimated Active Daily Users"
preservation_version = 0
There are some configuration options that are universally available to all Heka output plugins. These will be consumed by Heka itself when Heka initializes the plugin and do not need to be handled by the plugin-specific initialization code.
Boolean expression, when evaluated to true passes the message to the output for processing. Defaults to matching nothing. See: Message Matcher Syntax
The name of the message signer. If specified, only messages with this signer are passed to the output for processing.
Frequency (in seconds) that a timer event will be sent to the output. Defaults to not sending timer events.
Encoder to be used by the output. This should refer to the name of an encoder plugin section that is specified elsewhere in the TOML configuration. Messages can be encoded using the specified encoder by calling the OutputRunner’s Encode() method.
Specifies whether or not Heka’s Stream Framing should be applied to the binary data returned from the OutputRunner’s Encode() method.
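For instance, a minimal sketch applying several of these common options to an output section (the plugin name, matcher expression, and signer value here are illustrative):
[ops_alert_log]
type = "LogOutput"
message_matcher = "Type == 'heka.sandbox-output'"
message_signer = "ops"
encoder = "PayloadEncoder"
use_framing = false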
Connects to a remote AMQP broker (RabbitMQ) and sends messages to the specified queue. The message is serialized if specified, otherwise only the raw payload of the message will be sent. As AMQP is dynamically programmable, the broker topology needs to be specified.
Config:
An AMQP connection string formatted per the RabbitMQ URI Spec.
AMQP exchange name
AMQP exchange type (fanout, direct, topic, or headers).
Whether the exchange should be configured as a durable exchange. Defaults to non-durable.
Whether the exchange is deleted when all queues have finished and there is no publishing. Defaults to auto-delete.
The message routing key used to bind the queue to the exchange. Defaults to empty string.
Whether published messages should be marked as persistent or transient. Defaults to non-persistent.
A sub-section that specifies the settings to be used for restart behavior; a sketch follows the example below. See Configuring Restarting Behavior
New in version 0.6.
MIME content type of the payload used in the AMQP header. Defaults to “application/hekad”.
Specifies which of the registered encoders should be used for converting Heka messages to binary data that is sent out over the AMQP connection. Defaults to the always available “ProtobufEncoder”.
Specifies whether or not the encoded data sent out over the AMQP connection should be delimited by Heka’s Stream Framing. Defaults to true.
New in version 0.6.
An optional sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if URL uses the AMQPS URI scheme. See Configuring TLS.
Example (that sends log lines from the logger):
[AMQPOutput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
message_matcher = 'Logger == "TestWebserver"'
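A hedged sketch of the restart behavior sub-section mentioned above, extending this example (the field names follow the Configuring Restarting Behavior section; the values shown are illustrative):
[AMQPOutput.retries]
max_delay = "30s"
delay = "250ms"
max_retries = 5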
CarbonOutput plugins parse the “stat metric” messages generated by a StatAccumulator and write the extracted counter, timer, and gauge data out to a graphite compatible carbon daemon. Output is written over a TCP or UDP socket using the plaintext protocol.
Config:
An IP address:port to which this plugin will write. (default: “localhost:2003”)
New in version 0.5.
“tcp” or “udp” (default: “tcp”)
If set, keep the TCP connection open and reuse it until a failure occurs, then retry. (default: false)
Example:
[CarbonOutput]
message_matcher = "Type == 'heka.statmetric'"
address = "localhost:2003"
protocol = "udp"
Specialized output plugin that listens for certain Heka reporting message types and generates JSON data which is made available via HTTP for use in web based dashboards and health reports.
Config:
Specifies how often, in seconds, the dashboard files should be updated. Defaults to 5.
Defaults to “Type == ‘heka.all-report’ || Type == ‘heka.sandbox-output’ || Type == ‘heka.sandbox-terminated’”. Not recommended to change this unless you know what you’re doing.
An IP address:port on which we will serve output via HTTP. Defaults to “0.0.0.0:4352”.
File system directory into which the plugin will write data files and from which it will serve HTTP. The Heka process must have read / write access to this directory. Relative paths will be evaluated relative to the Heka base directory. Defaults to $(BASE_DIR)/dashboard.
File system directory where the Heka dashboard source code can be found. The Heka process must have read access to this directory. Relative paths will be evaluated relative to the Heka base directory. Defaults to ${SHARE_DIR}/dasher.
New in version 0.7.
It is possible to inject arbitrary HTTP headers into each outgoing response by adding a TOML subsection entitled “headers” to your DashboardOutput config section, as sketched after the example below. All entries in the subsection must be a list of string values.
Example:
[DashboardOutput]
ticker_interval = 30
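A minimal sketch of the optional “headers” sub-section described above (the header name and value are illustrative):
[DashboardOutput.headers]
Access-Control-Allow-Origin = ["*"]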
Output plugin that uses HTTP or UDP to insert records into an ElasticSearch database. Note that it is up to the specified encoder to both serialize the message into a JSON structure and to prepend that with the appropriate ElasticSearch BulkAPI indexing JSON. Usually this output is used in conjunction with an ElasticSearch-specific encoder plugin, such as ESJsonEncoder, ESLogstashV0Encoder, or ESPayloadEncoder.
Config:
Interval at which accumulated messages should be bulk indexed into ElasticSearch, in milliseconds. Defaults to 1000 (i.e. one second).
Number of messages that, if processed, will trigger them to be bulk indexed into ElasticSearch. Defaults to 10.
Time in milliseconds to wait for a response for each http post to ES. This may drop data as there is currently no retry. Default is 0 (no timeout).
Specifies whether or not re-use of established TCP connections to ElasticSearch should be disabled. Defaults to false, which means both HTTP keep-alive mode and TCP keepalives are used. Set it to true to close each TCP connection after ‘flushing’ messages to ElasticSearch.
Example:
[ElasticSearchOutput]
message_matcher = "Type == 'sync.log'"
server = "http://es-server:9200"
flush_interval = 5000
flush_count = 10
encoder = "ESJsonEncoder"
Writes message data out to a file system.
Config:
Full path to the output file.
File permissions for writing, specified as a string representation of an octal integer. Defaults to “644”.
Permissions to apply to any directories created because the output file’s parent directory doesn’t exist. Must be a string representation of an octal integer. Defaults to “700”.
Interval at which accumulated file data should be written to disk, in milliseconds (default 1000, i.e. 1 second). Set to 0 to disable.
Number of messages to accumulate until file data should be written to disk (default 1, minimum 1).
Operator describing how the two parameters “flush_interval” and “flush_count” are combined. Allowed values are “AND” or “OR” (default is “AND”).
New in version 0.6.
Specifies whether or not the encoded data written out to the file should be delimited by Heka’s Stream Framing. Defaults to true if a ProtobufEncoder is used, false otherwise.
Example:
[counter_file]
type = "FileOutput"
message_matcher = "Type == 'heka.counter-output'"
path = "/var/log/heka/counter-output.log"
prefix_ts = true
perm = "666"
flush_count = 100
flush_operator = "OR"
encoder = "PayloadEncoder"
New in version 0.6.
A very simple output plugin that uses HTTP GET, POST, or PUT requests to deliver data to an HTTP endpoint. When using POST or PUT request methods the encoded output will be uploaded as the request body. When using GET the encoded output will be ignored.
This output doesn’t support any request batching; each received message will generate an HTTP request. Batching can be achieved by use of a filter plugin that accumulates message data, periodically emitting a single message containing the batched, encoded HTTP request data in the payload. An HttpOutput can then be configured to capture these batch messages, using a PayloadEncoder to extract the message payload.
For now the HttpOutput only supports statically defined request parameters (URL, headers, auth, etc.). Future iterations will provide a mechanism for dynamically specifying these values on a per-message basis.
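As a minimal sketch of this batching approach, a Lua sandbox filter could accumulate payloads and periodically re-inject them as a single batch message (the payload name “http_batch” and the newline delimiter are illustrative; the functions used are those of the standard Lua sandbox API described in the Sandbox section):
batch = {}  -- global so the accumulated payloads persist between calls

function process_message()
    local payload = read_message("Payload")
    if payload then
        batch[#batch + 1] = payload
    end
    return 0
end

function timer_event(ns)
    if #batch > 0 then
        inject_payload("txt", "http_batch", table.concat(batch, "\n"))
        batch = {}
    end
end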
Config:
HTTP request method to use, must be one of GET, POST, or PUT. Defaults to POST.
If specified, HTTP Basic Auth will be used with the provided user name.
If specified, HTTP Basic Auth will be used with the provided password.
It is possible to inject arbitrary HTTP headers into each outgoing request by adding a TOML subsection entitled “headers” to your HttpOutput config section; a sketch follows the example below. All entries in the subsection must be a list of string values.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if an “https://” address is used. See Configuring TLS.
Example:
[PayloadEncoder]
[influxdb]
message_matcher = "Type == 'influx.formatted'"
address = "http://influxdb.example.com:8086/db/stats/series"
encoder = "PayloadEncoder"
username = "MyUserName"
password = "MyPassword"
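A hedged sketch of the “headers” sub-section described above, extending this example (the header name and value are illustrative):
[influxdb.headers]
Content-Type = ["application/json"]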
Connects to an Irc server and sends messages to the specified Irc channels. Output is encoded using the specified encoder, which is expected to truncate its output to fit within the bounds of an Irc message before this output receives it.
Config:
A host:port of the irc server that Heka will connect to for sending output.
Irc nick used by Heka.
The Irc identity used to login with by Heka.
The password used to connect to the Irc server.
A list of Irc channels which every matching Heka message is sent to. If there is a space in the channel string, then the part after the space is expected to be a password for a protected irc channel.
The maximum amount of time (in seconds) to wait before timing out when connecting, reading, or writing to the Irc server. Defaults to 10.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
The maximum number of messages Heka will queue per Irc channel before discarding messages. A queue of the same size is also used if all per-channel queues are full; these queues are used when Heka is unable to send a message to an Irc channel, such as when it hasn’t joined or has been disconnected. Defaults to 100.
Set this if you want Heka to automatically rejoin an Irc channel after being kicked. If not set and Heka is kicked, it will never attempt to rejoin. Defaults to false.
How often (in seconds) Heka should send a message to the server. This is on a per-message basis, not per channel. Defaults to 2.
How long to wait (in seconds) before reconnecting to the Irc server after being disconnected. Defaults to 3.
How long to wait (in seconds) before attempting to rejoin an Irc channel which is full. Defaults to 3.
The maximum number of times Heka will attempt to join an Irc channel before giving up. After attempts are exhausted, Heka will no longer attempt to join the channel. Defaults to 3.
Enable to see raw internal message events Heka is receiving from the server. Defaults to false.
Specifies which of the registered encoders should be used for converting Heka messages into what is sent to the irc channels.
A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior
Example:
[IrcOutput]
message_matcher = 'Type == "alert"'
encoder = "PayloadEncoder"
server = "irc.mozilla.org:6667"
nick = "heka_bot"
ident = "heka_ident"
channels = [ "#heka_bot_irc testkeypassword" ]
rejoin_on_kick = true
queue_size = 200
ticker_interval = 1
Logs messages to stdout using Go’s log package.
Config:
<none>
Example:
[counter_output]
type = "LogOutput"
message_matcher = "Type == 'heka.counter-output'"
encoder = "PayloadEncoder"
Specialized output plugin that listens for Nagios external command message types and delivers passive service check results to Nagios, using either HTTP requests made to the Nagios cmd.cgi API or the send_nsca binary. The message payload must consist of a state followed by a colon and then the message, e.g., “OK:Service is functioning properly”. The valid states are: OK|WARNING|CRITICAL|UNKNOWN. Nagios must be configured with a service name that matches the Heka plugin instance name and the hostname where the plugin is running.
Config:
An HTTP URL to the Nagios cmd.cgi. Defaults to http://localhost/nagios/cgi-bin/cmd.cgi.
Username used to authenticate with the Nagios web interface. Defaults to empty string.
Password used to authenticate with the Nagios web interface. Defaults to empty string.
Specifies the amount of time, in seconds, to wait for a server’s response headers after fully writing the request. Defaults to 2.
Must match Nagios service’s service_description attribute. Defaults to the name of the output.
Must match the hostname of the server in nagios. Defaults to the Hostname attribute of the message.
New in version 0.5.
If a path to the send_nsca program is provided, it will be used rather than HTTP requests. Not supplying this value means HTTP will be used, and any other send_nsca_* settings will be ignored.
New in version 0.5.
Arguments to use with send_nsca, usually at least the nagios hostname, e.g. [“-H”, “nagios.somehost.com”]. Defaults to an empty list.
New in version 0.5.
Timeout for the send_nsca command, in seconds. Defaults to 5.
New in version 0.5.
Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.
New in version 0.5.
A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.
Example configuration to output alerts from SandboxFilter plugins:
[NagiosOutput]
url = "http://localhost/nagios/cgi-bin/cmd.cgi"
username = "nagiosadmin"
password = "nagiospw"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'nagios-external-command' && Fields[payload_name] == 'PROCESS_SERVICE_CHECK_RESULT'"
Example Lua code to generate a Nagios alert:
inject_payload("nagios-external-command", "PROCESS_SERVICE_CHECK_RESULT", "OK:Alerts are working!")
New in version 0.5.
Outputs a Heka message in an email. The message subject is the plugin name and the message content is controlled by the payload_only setting. The primary purpose is for email alert notifications e.g., PagerDuty.
Config:
The email address of the sender. (default: “heka@localhost.localdomain”)
An array of email addresses to which the output will be sent.
Custom subject line of email. (default: “Heka [SmtpOutput]”)
SMTP host to send the email to (default: “127.0.0.1:25”)
SMTP authentication type: “none”, “Plain”, “CRAMMD5” (default: “none”)
SMTP user name
SMTP user password
Example:
[FxaAlert]
type = "SmtpOutput"
message_matcher = "((Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert') || Type == 'heka.sandbox-terminated') && Logger =~ /^Fxa/"
send_from = "heka@example.com"
send_to = ["alert@example.com"]
auth = "Plain"
user = "test"
password = "testpw"
host = "localhost:25"
encoder = "AlertEncoder"
Output plugin that delivers Heka message data to a listening TCP connection. Can be used to deliver messages from a local running Heka agent to a remote Heka instance set up as an aggregator and/or router, or to any other arbitrary listening TCP server that knows how to process the encoded data.
Config:
An IP address:port to which we will send our output data.
Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.
New in version 0.5.
A sub-section that specifies the settings to be used for any SSL/TLS encryption; a sketch follows the example below. This will only have any impact if use_tls is set to true. See Configuring TLS.
Specifies how often, in seconds, the output queue files are rolled. Defaults to 300.
New in version 0.6.
A local IP address to use as the source address for outgoing traffic to this destination. Cannot currently be combined with TLS connections.
Specifies which of the registered encoders should be used for converting Heka messages to binary data that is sent out over the TCP connection. Defaults to the always available “ProtobufEncoder”.
Specifies whether or not the encoded data sent out over the TCP connection should be delimited by Heka’s Stream Framing. Defaults to true if a ProtobufEncoder is used, false otherwise.
Specifies whether or not TCP keepalive should be used for established TCP connections. Defaults to false.
Time duration in seconds that a TCP connection will be maintained before keepalive probes start being sent. Defaults to 7200 (i.e. 2 hours).
Example:
[aggregator_output]
type = "TcpOutput"
address = "heka-aggregator.mydomain.com:55"
local_address = "127.0.0.1"
message_matcher = "Type != 'logfile' && Type != 'heka.counter-output' && Type != 'heka.all-report'"
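If TLS were enabled, a hedged sketch of the corresponding configuration might look like the following; the certificate paths and port are illustrative, cert_file and key_file are assumed to be among the options described in Configuring TLS, and note that local_address cannot currently be combined with TLS:
[secure_aggregator]
type = "TcpOutput"
address = "heka-aggregator.mydomain.com:5565"
use_tls = true
[secure_aggregator.tls]
cert_file = "/etc/heka/agent.crt"
key_file = "/etc/heka/agent.key"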
New in version 0.7.
Output plugin that delivers Heka message data to a specified UDP or Unix datagram socket location.
Config:
Network type to use for communication. Must be one of “udp”, “udp4”, “udp6”, or “unixgram”. “unixgram” option only available on systems that support Unix datagram sockets. Defaults to “udp”.
Address to which we will be sending the data. Must be IP:port for net types of “udp”, “udp4”, or “udp6”. Must be a path to a Unix datagram socket file for net type “unixgram”.
Local address to use on the datagram packets being generated. Must be IP:port for net types of “udp”, “udp4”, or “udp6”. Must be a path to a Unix datagram socket file for net type “unixgram”.
Name of the registered encoder plugin that will extract and/or serialize data from the Heka message.
Example:
[PayloadEncoder]
[UdpOutput]
address = "myserver.example.com:34567"
encoder = "PayloadEncoder"
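And a hedged sketch of the “unixgram” variant (the socket path is illustrative, and the option key for the network type is assumed here to be “net”, matching the description above):
[UnixgramOutput]
type = "UdpOutput"
net = "unixgram"
address = "/var/run/heka/output.sock"
encoder = "PayloadEncoder"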
WhisperOutput plugins parse the “statmetric” messages generated by a StatAccumulator and write the extracted counter, timer, and gauge data out to a graphite compatible whisper database file tree structure.
Config:
Path to the base directory where the whisper file tree will be written. Absolute paths will be honored, relative paths will be calculated relative to the Heka base directory. Defaults to “whisper” (i.e. “$(BASE_DIR)/whisper”).
Default aggregation method to use for each whisper output file. Supports the following values (the standard whisper aggregation methods): 0 (unknown), 1 (average), 2 (sum), 3 (last received value), 4 (maximum), 5 (minimum).
Default specification for new whisper db archives. Should be a sequence of 3-tuples, where each tuple describes a time interval’s storage policy: [<offset> <# of secs per datapoint> <# of datapoints>] (see whisper docs for more info). Defaults to:
[ [0, 60, 1440], [0, 900, 8], [0, 3600, 168], [0, 43200, 1456]]
The above defines four archive sections. The first uses 60 seconds for each of 1440 data points, which equals one day of retention. The second uses 15 minutes for each of 8 data points, for two hours of retention. The third uses one hour for each of 168 data points, or 7 days of retention. Finally, the fourth uses 12 hours for each of 1456 data points, representing two years of data.
Permission mask to be applied to folders created in the whisper database file tree. Must be a string representation of an octal integer. Defaults to “700”.
Example:
[WhisperOutput]
message_matcher = "Type == 'heka.statmetric'"
default_agg_method = 3
default_archive_info = [ [0, 30, 1440], [0, 900, 192], [0, 3600, 168], [0, 43200, 1456] ]
folder_perm = "755"
hekad(1), hekad.config(5)
hekad [-version] [-config config_file]
/etc/hekad.toml configuration file
hekad.config(5), hekad.plugin(5)