hekad

Description

Available hekad plugins compiled with this version of hekad.

Inputs

AMQPInput

Connects to a remote AMQP broker (RabbitMQ) and retrieves messages from the specified queue. As AMQP is dynamically programmable, the broker topology needs to be specified in the plugin configuration.

Config:

  • url (string):

    An AMQP connection string formatted per the RabbitMQ URI Spec.

  • exchange (string):

    AMQP exchange name

  • exchange_type (string):

    AMQP exchange type (fanout, direct, topic, or headers).

  • exchange_durability (bool):

    Whether the exchange should be configured as a durable exchange. Defaults to non-durable.

  • exchange_auto_delete (bool):

    Whether the exchange is deleted when all queues have finished and there is no publishing. Defaults to auto-delete.

  • routing_key (string):

    The message routing key used to bind the queue to the exchange. Defaults to empty string.

  • prefetch_count (int):

    How many messages to fetch at once before message acks are sent. See RabbitMQ performance measurements for help in tuning this number. Defaults to 2.

  • queue (string):

    Name of the queue to consume from; an empty string will have the broker generate a name for the queue. Defaults to empty string.

  • queue_durability (bool):

    Whether the queue is durable or not. Defaults to non-durable.

  • queue_exclusive (bool):

    Whether the queue is exclusive (only one consumer allowed) or not. Defaults to non-exclusive.

  • queue_auto_delete (bool):

    Whether the queue is deleted when the last consumer un-subscribes. Defaults to auto-delete.

  • queue_ttl (int):

    Allows specifying a TTL, in milliseconds, on the queue declaration for expiring messages. Defaults to undefined/infinite.

  • decoder (string):

    Decoder name used to transform a raw message body into a structured hekad message. Must be a decoder appropriate for the messages that come in from the exchange. If accepting messages that have been generated by an AMQPOutput in another Heka process then this should be a ProtobufDecoder instance.

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior

New in version 0.6.

  • tls (TlsConfig):

    An optional sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if the URL uses the AMQPS URI scheme. See Configuring TLS.

Since many of these parameters have sane defaults, a minimal configuration to consume serialized messages would look like:

[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
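
Restart behavior can be tuned by adding a retries sub-section; a minimal sketch, assuming the RetryOptions field names described in Configuring Restarting Behavior:

[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"

    [AMQPInput.retries]
    max_delay = "30s"
    delay = "250ms"
    max_retries = 5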

Or you might use a PayloadRegexDecoder to parse OS X syslog messages with the following:

[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
decoder = "logparser"

[logparser]
type = "MultiDecoder"
subs = ["logline", "leftovers"]

[logline]
type = "PayloadRegexDecoder"
MatchRegex = '\w+ \d+ \d+:\d+:\d+ \S+ (?P<Reporter>[^\[]+)\[(?P<Pid>\d+)](?P<Sandbox>[^:]+)?: (?P<Remaining>.*)'

    [logline.MessageFields]
    Type = "amqplogline"
    Hostname = "myhost"
    Reporter = "%Reporter%"
    Remaining = "%Remaining%"
    Logger = "%Logger%"
    Payload = "%Remaining%"

[leftovers]
type = "PayloadRegexDecoder"
MatchRegex = '.*'

    [leftovers.MessageFields]
    Type = "drop"
    Payload = ""

DockerLogInput

New in version 0.8.

The DockerLogInput plugin attaches to all containers running on a host and sends their log messages into the Heka pipeline. The plugin is based on Logspout by Jeff Lindsay. Messages will be populated as follows:

  • Uuid: Type 4 (random) UUID generated by Heka.
  • Timestamp: Time when the log line was received by the plugin.
  • Type: DockerLog.
  • Hostname: Hostname of the machine on which Heka is running.
  • Payload: The log line received from a Docker container.
  • Logger: stdout or stderr, depending on source.
  • Fields[“ContainerID”] (string): The container ID
  • Fields[“ContainerName”] (string): The container name

Config:

  • endpoint (string):

    A Docker endpoint. Defaults to “unix:///var/run/docker.sock”.

  • decoder (string):

    The name of the decoder used to further transform the message into a structured hekad message. No default decoder is specified.

Example:

    [nginx_log_decoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/nginx_access.lua"

        [nginx_log_decoder.config]
        type = "nginx.access"
        user_agent_transform = true
        log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

[DockerLogInput]
decoder = "nginx_log_decoder"

FilePollingInput

New in version 0.7.

The FilePollingInput periodically reads (unbuffered) the contents of the specified file and creates a Heka message with the contents of the file as the payload.

Config:

  • file_path (string):

    The absolute path to the file which the input should read.

  • ticker_interval (uint):

    How often, in seconds, the input should read the contents of the file.

  • decoder (string):

    The name of the decoder used to process the payload of the input.

Example:

[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"

HttpInput

HttpInput plugins intermittently poll remote HTTP URLs for data and populate message objects based on the results of the HTTP interactions. Messages will be populated as follows:

  • Uuid: Type 4 (random) UUID generated by Heka.

  • Timestamp: Time HTTP request is completed.

  • Type: heka.httpinput.data or heka.httpinput.error depending on whether or not the request completed. (Note that a response returned with an HTTP error code is still considered complete and will generate type heka.httpinput.data.)

  • Hostname: Hostname of the machine on which Heka is running.

  • Payload: Entire contents of the HTTP response body.

  • Severity: HTTP response 200 uses success_severity config value, all other results use error_severity config value.

  • Logger: Fetched URL.

  • Fields[“Status”] (string): HTTP status string value (e.g. “200 OK”).

  • Fields[“StatusCode”] (int): HTTP status code integer value.

  • Fields[“ResponseSize”] (int): Value of HTTP Content-Length header.

  • Fields[“ResponseTime”] (float64): Clock time elapsed for HTTP request, in seconds.

  • Fields[“Protocol”] (string): HTTP protocol used for the request (e.g. “HTTP/1.0”).

The Fields values above will only be populated in the event of a completed HTTP request. Also, it is possible to specify a decoder to further process the results of the HTTP response before injecting the message into the router.

Config:

  • url (string):

    An HTTP URL which this plugin will regularly poll for data. This option cannot be used with the urls option. No default URL is specified.

  • urls (array):

    New in version 0.5.

    An array of HTTP URLs which this plugin will regularly poll for data. This option cannot be used with the url option. No default URLs are specified.

  • method (string):

    New in version 0.5.

    The HTTP method to use for the request. Defaults to “GET”.

  • headers (subsection):

    New in version 0.5.

    Subsection defining headers for the request. By default the User-Agent header is set to “Heka”

  • body (string):

    New in version 0.5.

    The request body (e.g. for an HTTP POST request). No default body is specified.

  • username (string):

    New in version 0.5.

    The username for HTTP Basic Authentication. No default username is specified.

  • password (string):

    New in version 0.5.

    The password for HTTP Basic Authentication. No default password is specified.

  • ticker_interval (uint):

    Time interval (in seconds) between attempts to poll for new data. Defaults to 10.

  • success_severity (uint):

    New in version 0.5.

    Severity level of successful HTTP request. Defaults to 6 (information).

  • error_severity (uint):

    New in version 0.5.

    Severity level of errors, unreachable connections, and non-200 responses of successful HTTP requests. Defaults to 1 (alert).

  • decoder (string):

    The name of the decoder used to further transform the response body text into a structured hekad message. No default decoder is specified.

Example:

[HttpInput]
url = "http://localhost:9876/"
ticker_interval = 5
success_severity = 6
error_severity = 1
decoder = "MyCustomJsonDecoder"
    [HttpInput.headers]
    user-agent = "MyCustomUserAgent"
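
A polling request that needs a method, body, or credentials can set those options too; a sketch in which the endpoint and credentials are hypothetical:

[HttpInput]
url = "http://localhost:9876/query"
method = "POST"
body = '{"query": "all"}'
username = "stats"
password = "secret"
ticker_interval = 30
decoder = "MyCustomJsonDecoder"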

HttpListenInput

New in version 0.5.

HttpListenInput plugins start a webserver listening on the specified address and port. If no decoder is specified, data in the request body will be populated as the message payload. Messages will be populated as follows:

  • Uuid: Type 4 (random) UUID generated by Heka.

  • Timestamp: Time HTTP request is handled.

  • Type: heka.httpdata.request

  • Hostname: The remote network address of requester.

  • Payload: Entire contents of the HTTP request body.

  • Severity: 6

  • Logger: HttpListenInput

  • Fields[“UserAgent”] (string): Request User-Agent header (e.g. “GitHub Hookshot dd0772a”).

  • Fields[“ContentType”] (string): Request Content-Type header (e.g. “application/x-www-form-urlencoded”).

  • Fields[“Protocol”] (string): HTTP protocol used for the request (e.g. “HTTP/1.0”).

Config:

  • address (string):

    An IP address:port on which this plugin will expose an HTTP server. Defaults to “127.0.0.1:8325”.

  • decoder (string):

    The name of the decoder used to further transform the request body text into a structured hekad message. No default decoder is specified.

New in version 0.7.

  • headers (subsection, optional):

    It is possible to inject arbitrary HTTP headers into each outgoing response by adding a TOML subsection entitled “headers” to your HttpListenInput config section. All entries in the subsection must be a list of string values.

Example:

[HttpListenInput]
address = "0.0.0.0:8325"

Logstreamer Input

New in version 0.5.

Tails one or more logstreams: a single log file, a sequential log source split across multiple files, or multiple log sources handled as separate logstreams.

Config:

  • hostname (string):

    The hostname to use for the messages, by default this will be the machine’s qualified hostname. This can be set explicitly to ensure it’s the correct name in the event the machine has multiple interfaces/hostnames.

  • oldest_duration (string):

    A time duration string (e.g. “2s”, “2m”, “2h”). Logfiles with a last modified time older than oldest_duration ago will not be included for parsing.

  • journal_directory (string):

    The directory to store the journal files in for tracking the location that has been read to thus far. By default this is stored under heka’s base directory.

  • log_directory (string):

    The root directory to scan files from. This scan is recursive so it should be suitably restricted to the most specific directory this selection of logfiles will be matched under. The log_directory path will be prepended to the file_match.

  • rescan_interval (int):

    During logfile rotation, or if the logfile is not originally present on the system, this interval is how often the existence of the logfile will be checked for. The default of 5 seconds is usually fine. This interval is in milliseconds.

  • file_match (string):

    Regular expression used to match files located under the log_directory. This regular expression has $ added to the end automatically if not already present, and log_directory as the prefix. WARNING: file_match should typically be delimited with single quotes, indicating use of a raw string, rather than double quotes, which require all backslashes to be escaped. For example, ‘access\.log’ will work as expected, but “access\.log” will not; you would need “access\\.log” to achieve the same result.

  • priority (list of strings):

    When using sequential logstreams, the priority is how to sort the logfiles in order from oldest to newest.

  • differentiator (list of strings):

    When using multiple logstreams, the differentiator is a set of strings that will be used in the naming of the logger, and portions that match a captured group from the file_match will have their matched value substituted in.

  • translation (hash map of hash maps of ints):

    A set of translation mappings for matched groupings to the ints to use for sorting purposes.

  • decoder (string):

    A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the parsed data is available in the Heka message payload.

  • parser_type (string):
    • token - splits the log on a byte delimiter (default).
    • regexp - splits the log on a regexp delimiter.
    • message.proto - splits the log on protobuf message boundaries
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of a log line.
    • end - the regexp delimiter occurs at the end of the log line (default).
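
Example (a minimal sketch tailing a single logfile; the directory, match pattern, and decoder name here are assumptions):

[MyAppLogs]
type = "LogstreamerInput"
log_directory = "/var/log/myapp"
file_match = 'myapp\.log'
decoder = "MyAppDecoder"

For multiple logstreams, a named capture group in file_match can feed the differentiator, so that (in this sketch) each host directory becomes its own logstream and logger name:

[HostSyslogs]
type = "LogstreamerInput"
log_directory = "/var/log/hosts"
file_match = '(?P<Host>[^/]+)/syslog\.log'
differentiator = ["syslog-", "Host"]
decoder = "RsyslogDecoder"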

ProcessInput

Executes one or more external programs on an interval, creating messages from the output. Supports a chain of commands, where stdout from each process will be piped into the stdin for the next process in the chain. In the event the program returns a non-zero exit code, ProcessInput will log that an error occurred.

Config:

  • command (map[uint]cmd_config):

    The command is a structure that contains the full path to the binary, command line arguments, optional environment variables and an optional working directory (see below). ProcessInput expects the commands to be indexed by integers starting with 0, where 0 is the first process in the chain.

  • ticker_interval (uint):

    The number of seconds to wait between each run of command. Defaults to 15. A ticker_interval of 0 indicates that the command is run only once, and should only be used for long running processes that do not exit. If ticker_interval is set to 0 and the process exits, then the ProcessInput will exit, invoking the restart behavior (see Configuring Restarting Behavior).

  • stdout (bool):

    If true, for each run of the process chain a message will be generated with the last command in the chain’s stdout as the payload. Defaults to true.

  • stderr (bool):

    If true, for each run of the process chain a message will be generated with the last command in the chain’s stderr as the payload. Defaults to false.

  • decoder (string):

    Name of the decoder instance to send messages to. If omitted messages will be injected directly into Heka’s message router.

  • parser_type (string):
    • token - splits the log on a byte delimiter (default).
    • regexp - splits the log on a regexp delimiter.
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of a log line.
    • end - the regexp delimiter occurs at the end of the log line (default).
  • timeout (uint):

    Timeout in seconds before any one of the commands in the chain is terminated.

  • trim (bool):

    Trim a single trailing newline character if one exists. Default is true.

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior

cmd_config structure:

  • bin (string):

    The full path to the binary that will be executed.

  • args ([]string):

    Command line arguments to pass into the executable.

  • env ([]string):

    Used to set environment variables before command is run. Default is nil, which uses the heka process’s environment.

  • directory (string):

    Used to set the working directory of Bin. Default is “”, which uses the heka process’s working directory.

Example:

[DemoProcessInput]
type = "ProcessInput"
ticker_interval = 2
parser_type = "token"
delimiter = " "
stdout = true
stderr = false
trim = true

    [DemoProcessInput.command.0]
    bin = "/bin/cat"
    args = ["../testsupport/process_input_pipes_test.txt"]

    [DemoProcessInput.command.1]
    bin = "/usr/bin/grep"
    args = ["ignore"]

ProcessDirectoryInput

New in version 0.5.

The ProcessDirectoryInput periodically scans a filesystem directory looking for ProcessInput configuration files. The ProcessDirectoryInput will maintain a pool of running ProcessInputs based on the contents of this directory, refreshing the set of running inputs as needed with every rescan. This allows Heka administrators to manage a set of data collection processes for a running hekad server without restarting the server.

Each ProcessDirectoryInput has a process_dir configuration setting, which is the root folder of the tree where scheduled jobs are defined. It should contain exactly one nested level of subfolders, named with ASCII numeric characters indicating the interval, in seconds, between each process run. These numeric folders must contain TOML files which specify the details regarding which processes to run.

For example, a process_dir might look like this:

-/usr/share/heka/processes/
 |-5
   |- check_myserver_running.toml
 |-61
   |- cat_proc_mounts.toml
   |- get_running_processes.toml
 |-302
   |- some_custom_query.toml

This indicates one process to be run every five seconds, two processes to be run every 61 seconds, and one process to be run every 302 seconds.

Note that ProcessDirectoryInput will ignore any files that are not nested one level deep, are not in a folder named for an integer 0 or greater, or do not end with ‘.toml’. Each file which meets these criteria, such as those shown in the example above, should contain the TOML configuration for exactly one ProcessInput, matching that of a standalone ProcessInput with the following restrictions:

  • The section name must be ProcessInput. Any TOML sections named anything other than ProcessInput will be ignored.
  • Any specified ticker_interval value will be ignored. The ticker interval value to use will be parsed from the directory path.

If the specified process fails to run or the ProcessInput config fails for any other reason, ProcessDirectoryInput will log an error message and continue.
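
For example, the cat_proc_mounts.toml file from the tree above might contain something like the following sketch (the binary path is an assumption); note that the section name is ProcessInput and no ticker_interval is set, since the interval comes from the directory name:

[ProcessInput]
stdout = true

    [ProcessInput.command.0]
    bin = "/bin/cat"
    args = ["/proc/mounts"]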

Config:

  • ticker_interval (int, optional):

    Amount of time, in seconds, between scans of the process_dir. Defaults to 300 (i.e. 5 minutes).

  • process_dir (string, optional):

    This is the root folder of the tree where the scheduled jobs are defined. Absolute paths will be honored, relative paths will be computed relative to Heka’s globally specified share_dir. Defaults to “processes” (i.e. “$share_dir/processes”).

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior

Example:

[ProcessDirectoryInput]
process_dir = "/etc/hekad/processes.d"
ticker_interval = 120

StatAccumInput

Provides an implementation of the StatAccumulator interface which other plugins can use to submit Stat objects for aggregation and roll-up. Accumulates these stats and then periodically emits a “stat metric” type message containing aggregated information about the stats received since the last generated message.

Config:

  • emit_in_payload (bool):

    Specifies whether or not the aggregated stat information should be emitted in the payload of the generated messages, in the format accepted by the carbon portion of the graphite graphing software. Defaults to true.

  • emit_in_fields (bool):

    Specifies whether or not the aggregated stat information should be emitted in the message fields of the generated messages. Defaults to false. NOTE: At least one of ‘emit_in_payload’ or ‘emit_in_fields’ must be true or it will be considered a configuration error and the input won’t start.

  • percent_threshold (int):

    Percent threshold to use for computing “upper_N%” type stat values. Defaults to 90.

  • ticker_interval (uint):

    Time interval (in seconds) between generated output messages. Defaults to 10.

  • message_type (string):

    String value to use for the Type value of the emitted stat messages. Defaults to “heka.statmetric”.

  • legacy_namespaces (bool):

    If set to true, then use the older format for namespacing counter stats, with rates recorded under stats.<counter_name> and absolute count recorded under stats_counts.<counter_name>. See statsd metric namespacing. Defaults to false.

  • global_prefix (string):

    Global prefix to use for sending stats to graphite. Defaults to “stats”.

  • counter_prefix (string):

    Secondary prefix to use for namespacing counter metrics. Has no impact unless legacy_namespaces is set to false. Defaults to “counters”.

  • timer_prefix (string):

    Secondary prefix to use for namespacing timer metrics. Defaults to “timers”.

  • gauge_prefix (string):

    Secondary prefix to use for namespacing gauge metrics. Defaults to “gauges”.

  • statsd_prefix (string):

    Prefix to use for the statsd numStats metric. Defaults to “statsd”.

  • delete_idle_stats (bool):

    Don’t emit values for inactive stats, instead of sending 0 (or, in the case of gauges, the previous value). Defaults to false.
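
Example (a minimal sketch that emits aggregated stats as message fields every 10 seconds):

[StatAccumInput]
emit_in_fields = true
ticker_interval = 10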

StatsdInput

Listens for statsd protocol counter, timer, or gauge messages on a UDP port, and generates Stat objects that are handed to a StatAccumulator for aggregation and processing.

Config:

  • address (string):

    An IP address:port on which this plugin will expose a statsd server. Defaults to “127.0.0.1:8125”.

  • stat_accum_name (string):

    Name of a StatAccumInput instance that this StatsdInput will use as its StatAccumulator for submitting received stat values. Defaults to “StatAccumInput”.

  • max_msg_size (uint):

    Size of the buffer used for reading messages sent by statsd. When a statsd client packs many stats into a single message, this value may need to be increased. All over-length data will be truncated without raising an error. Defaults to 512.

Example:

[StatsdInput]
address = ":8125"
stat_accum_name = "custom_stat_accumulator"

TcpInput

Listens on a specific TCP address and port for messages. If the message is signed it is verified against the signer name and specified key version. If the signature is not valid the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.

Config:

  • address (string):

    An IP address:port on which this plugin will listen.

  • signer:

    Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.

    • hmac_key (string):

      The hash key used to sign the message.

New in version 0.4.

  • decoder (string):

    A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.

  • parser_type (string):
    • token - splits the stream on a byte delimiter.
    • regexp - splits the stream on a regexp delimiter.
    • message.proto - splits the stream on protobuf message boundaries.
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of the message.
    • end - the regexp delimiter occurs at the end of the message (default).

New in version 0.5.

  • use_tls (bool):

    Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.

  • tls (TlsConfig):

    A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.

  • net (string, optional, default: “tcp”)

    Network value must be one of: “tcp”, “tcp4”, “tcp6”, “unix” or “unixpacket”.

New in version 0.6.

  • keep_alive (bool):

    Specifies whether or not TCP keepalive should be used for established TCP connections. Defaults to false.

  • keep_alive_period (int):

    Time duration in seconds that a TCP connection will be maintained before keepalive probes start being sent. Defaults to 7200 (i.e. 2 hours).

Example:

[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "ProtobufDecoder"

[TcpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[TcpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"

[TcpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"

UdpInput

Listens on a specific UDP address and port for messages. If the message is signed it is verified against the signer name and specified key version. If the signature is not valid the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.

Note

The UDP payload is not restricted to a single message; since the stream parser is being used multiple messages can be sent in a single payload.

Config:

  • address (string):

    An IP address:port or Unix datagram socket file path on which this plugin will listen.

  • signer:

    Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.

    • hmac_key (string):

      The hash key used to sign the message.

New in version 0.4.

  • decoder (string):

    A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.

  • parser_type (string):
    • token - splits the stream on a byte delimiter.
    • regexp - splits the stream on a regexp delimiter.
    • message.proto - splits the stream on protobuf message boundaries.
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of the message.
    • end - the regexp delimiter occurs at the end of the message (default).

New in version 0.5.

  • net (string, optional, default: “udp”)

    Network value must be one of: “udp”, “udp4”, “udp6”, or “unixgram”.

Example:

[UdpInput]
address = "127.0.0.1:4880"
parser_type = "message.proto"
decoder = "ProtobufDecoder"

[UdpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[UdpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"

[UdpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"

Decoders

Apache Access Log Decoder

New in version 0.6.

Parses the Apache access logs based on the Apache ‘LogFormat’ configuration directive. The Apache format specifiers are mapped onto the Nginx variable names where applicable e.g. %a -> remote_addr. This allows generic web filters and outputs to work with any HTTP server input.

Config:

  • log_format (string)

    The ‘LogFormat’ configuration directive from the apache2.conf. %t variables are converted to the number of nanoseconds since the Unix epoch and used to set the Timestamp on the message. http://httpd.apache.org/docs/2.4/mod/mod_log_config.html

  • type (string, optional, default nil):

    Sets the message ‘Type’ header to the specified value

  • user_agent_transform (bool, optional, default false)

    Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.

  • user_agent_keep (bool, optional, default false)

    Always preserve the http_user_agent value if transform is enabled.

  • user_agent_conditional (bool, optional, default false)

    Only preserve the http_user_agent value if transform is enabled and fails.

  • payload_keep (bool, optional, default false)

    Always preserve the original log line in the message payload.

Example Heka Configuration

[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/apache"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"

[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/apache_access.lua"

[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'

# common log format
# log_format = '%h %l %u %t \"%r\" %>s %O'

# vhost_combined log format
# log_format = '%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'

# referer log format
# log_format = '%{Referer}i -> %U'

Example Heka Message

Timestamp:2014-01-10 07:04:56 -0800 PST
Type:combined
Hostname:test.example.com
Pid:0
UUID:8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger:TestWebserver
Payload:
EnvVersion:
Severity:7
Fields:
name:”remote_user” value_string:”-“
name:”http_x_forwarded_for” value_string:”-“
name:”http_referer” value_string:”-“
name:”body_bytes_sent” value_type:DOUBLE representation:”B” value_double:82
name:”remote_addr” value_string:”62.195.113.219” representation:”ipv4”
name:”status” value_type:DOUBLE value_double:200
name:”request” value_string:”GET /v1/recovery_email/status HTTP/1.1”
name:”user_agent_os” value_string:”FirefoxOS”
name:”user_agent_browser” value_string:”Firefox”
name:”user_agent_version” value_type:DOUBLE value_double:29

Graylog Extended Log Format Decoder

New in version 0.8.

Parses a payload containing JSON in the Graylog2 Extended Log Format (GELF) specification. http://graylog2.org/resources/gelf/specification

Config:

  • type (string, optional, default nil):

    Sets the message ‘Type’ header to the specified value

  • payload_keep (bool, optional, default false)

    Always preserve the original log line in the message payload.

Example of Graylog2 Extended Format Log

{
  "version": "1.1",
  "host": "rogueethic.com",
  "short_message": "This is a short message to identify what is going on.",
  "full_message": "An entire backtrace\ncould\ngo\nhere",
  "timestamp": 1385053862.3072,
  "level": 1,
  "_user_id": 9001,
  "_some_info": "foo",
  "_some_env_var": "bar"
}

Example Heka Configuration

[GELFLogInput]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = 'application\.gelf'
decoder = "GraylogDecoder"

[GraylogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/graylog_decoder.lua"

    [GraylogDecoder.config]
    type = "gelf"
    payload_keep = true

New in version 0.6.

GeoIpDecoder

New in version 0.6.

Decoder plugin that generates GeoIP data based on the IP address of a specified field. It uses the GeoIP Go project as a wrapper around MaxMind’s geoip-api-c library, and thus assumes you have the library downloaded and installed. Currently, only the GeoLiteCity database is supported, which you must also download and install yourself into a location to be referenced by the db_file config option. By default the database file is opened using “GEOIP_MEMORY_CACHE” mode. This setting is hard-coded into the wrapper’s geoip.go file. You will need to manually override that code if you want to specify one of the other modes listed here.

Note

Due to external dependencies, this plugin is not compiled in to the released Heka binaries. It will automatically be included in a source build if GeoIP.h is available in the include path during build time. The generated binary will then only work on machines with the appropriate GeoIP shared library (e.g. libGeoIP.so.1) installed.

Note

If you are using this with the ES output you will likely need to specify the raw_bytes_field option for the target_field specified. This is required to preserve the formatting of the JSON object.

Config:

  • db_file:

    The location of the GeoLiteCity.dat database. Defaults to “/var/cache/hekad/GeoLiteCity.dat”

  • source_ip_field:

    The name of the field containing the IP address you want to derive the location for.

  • target_field:

    The name of the new field created by the decoder. The decoder will output a JSON object with the following elements:

    • latitude: string,

    • longitude: string,

    • location: [ float64, float64 ],

    • coordinates: [ string, string ],

    • countrycode: string,

    • countrycode3: string,

    • region: string,

    • city: string,

    • postalcode: string,

    • areacode: int,

    • charset: int,

    • continentalcode: string

Example:

[apache_geoip_decoder]
type = "GeoIpDecoder"
db_file="/etc/geoip/GeoLiteCity.dat"
source_ip_field="remote_host"
target_field="geoip"

MultiDecoder

This decoder plugin allows you to specify an ordered list of delegate decoders. The MultiDecoder will pass the PipelinePack to be decoded to each of the delegate decoders in turn until decoding succeeds. In the case of failure to decode, MultiDecoder will return an error and recycle the message.

Config:

  • subs ([]string):

    An ordered list of subdecoders to which the MultiDecoder will delegate. Each item in the list should specify another decoder configuration section by section name. Must contain at least one entry.

  • log_sub_errors (bool):

    If true, the DecoderRunner will log the errors returned whenever a delegate decoder fails to decode a message. Defaults to false.

  • cascade_strategy (string):

    Specifies behavior the MultiDecoder should exhibit with regard to cascading through the listed decoders. Supports only two valid values: “first-wins” and “all”. With “first-wins”, each decoder will be tried in turn until there is a successful decoding, after which decoding will be stopped. With “all”, all listed decoders will be applied whether or not they succeed. In each case, decoding will only be considered to have failed if none of the sub-decoders succeed.

Here is a slightly contrived example where we have protocol buffer encoded messages coming in over a TCP connection, with each message containing a single nginx log line. Our MultiDecoder will run each message through two decoders, the first to deserialize the protocol buffer and the second to parse the log text:

[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "shipped-nginx-decoder"

[shipped-nginx-decoder]
type = "MultiDecoder"
subs = ['ProtobufDecoder', 'nginx-access-decoder']
cascade_strategy = "all"
log_sub_errors = true

[ProtobufDecoder]

[nginx-access-decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

    [nginx-access-decoder.config]
    type = "combined"
    user_agent_transform = true
    log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

Linux Disk Stats Decoder

New in version 0.7.

Parses a payload containing the contents of a /sys/block/$DISK/stat file (where $DISK is a disk identifier such as sda) into a Heka message struct. This decoder also tries to obtain the TickerInterval of the input it received the data from, by extracting it from a message field named TickerInterval.

Config:

  • payload_keep (bool, optional, default false)

    Always preserve the original log line in the message payload.

Example Heka Configuration

[DiskStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/sys/block/sda1/stat"
decoder = "DiskStatsDecoder"

[DiskStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_diskstats.lua"

Example Heka Message

Timestamp:2014-01-10 07:04:56 -0800 PST
Type:stats.diskstats
Hostname:test.example.com
Pid:0
UUID:8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity:7
Fields:
name:”ReadsCompleted” value_type:DOUBLE value_double:”20123”
name:”ReadsMerged” value_type:DOUBLE value_double:”11267”
name:”SectorsRead” value_type:DOUBLE value_double:”1.094968e+06”
name:”TimeReading” value_type:DOUBLE value_double:”45148”
name:”WritesCompleted” value_type:DOUBLE value_double:”1278”
name:”WritesMerged” value_type:DOUBLE value_double:”1278”
name:”SectorsWritten” value_type:DOUBLE value_double:”206504”
name:”TimeWriting” value_type:DOUBLE value_double:”3348”
name:”TimeDoingIO” value_type:DOUBLE value_double:”4876”
name:”WeightedTimeDoingIO” value_type:DOUBLE value_double:”48356”
name:”NumIOInProgress” value_type:DOUBLE value_double:”3”
name:”TickerInterval” value_type:DOUBLE value_double:”2”
name:”FilePath” value_string:”/sys/block/sda/stat”

Linux Load Average Decoder

New in version 0.7.

Parses a payload containing the contents of a /proc/loadavg file into a Heka message.

Config:

  • payload_keep (bool, optional, default false)

    Always preserve the original log line in the message payload.

Example Heka Configuration

[LoadAvg]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/loadavg"
decoder = "LoadAvgDecoder"

[LoadAvgDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_loadavg.lua"

Example Heka Message

Timestamp:2014-01-10 07:04:56 -0800 PST
Type:stats.loadavg
Hostname:test.example.com
Pid:0
UUID:8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity:7
Fields:
name:”1MinAvg” value_type:DOUBLE value_double:”3.05”
name:”5MinAvg” value_type:DOUBLE value_double:”1.21”
name:”15MinAvg” value_type:DOUBLE value_double:”0.44”
name:”NumProcesses” value_type:DOUBLE value_double:”11”
name:”FilePath” value_string:”/proc/loadavg”

Linux Memory Stats Decoder

New in version 0.7.

Parses a payload containing the contents of a /proc/meminfo file into a Heka message.

Config:

  • payload_keep (bool, optional, default false)

    Always preserve the original log line in the message payload.

Example Heka Configuration

[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"

[MemStatsDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/linux_memstats.lua"

Example Heka Message

Timestamp:2014-01-10 07:04:56 -0800 PST
Type:stats.memstats
Hostname:test.example.com
Pid:0
UUID:8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Payload:
EnvVersion:
Severity:7
Fields:
name:”MemTotal” value_type:DOUBLE representation:”kB” value_double:”4047616”
name:”MemFree” value_type:DOUBLE representation:”kB” value_double:”3432216”
name:”Buffers” value_type:DOUBLE representation:”kB” value_double:”82028”
name:”Cached” value_type:DOUBLE representation:”kB” value_double:”368636”
name:”FilePath” value_string:”/proc/meminfo”

The total available fields can be found in the proc(5) man page. All fields are of type double, and the representation is in kB (except for the HugePages fields). Here is a full list of fields available:

MemTotal, MemFree, Buffers, Cached, SwapCached, Active, Inactive, Active(anon), Inactive(anon), Active(file), Inactive(file), Unevictable, Mlocked, SwapTotal, SwapFree, Dirty, Writeback, AnonPages, Mapped, Shmem, Slab, SReclaimable, SUnreclaim, KernelStack, PageTables, NFS_Unstable, Bounce, WritebackTmp, CommitLimit, Committed_AS, VmallocTotal, VmallocUsed, VmallocChunk, HardwareCorrupted, AnonHugePages, HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, DirectMap4k, DirectMap2M, DirectMap1G.

Note that your available fields may have a slight variance depending on the system’s kernel version.

MySQL Slow Query Log Decoder

New in version 0.6.

Parses and transforms the MySQL slow query logs. Use mariadb_slow_query.lua to parse the MariaDB variant of the MySQL slow query logs.

Config:

  • truncate_sql (int, optional, default nil)

    Truncates the SQL payload to the specified number of bytes (not UTF-8 aware) and appends ”...”. If the value is nil no truncation is performed. A negative value will truncate the specified number of bytes from the end.

Example Heka Configuration

[Sync-1_5-SlowQuery]
type = "LogstreamerInput"
log_directory = "/var/log/mysql"
file_match = 'mysql-slow\.log'
parser_type = "regexp"
delimiter = "\n(# User@Host:)"
delimiter_location = "start"
decoder = "MySqlSlowQueryDecoder"

[MySqlSlowQueryDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/mysql_slow_query.lua"

    [MySqlSlowQueryDecoder.config]
    truncate_sql = 64

Example Heka Message

Timestamp:2014-05-07 15:51:28 -0700 PDT
Type:mysql.slow-query
Hostname:127.0.0.1
Pid:0
UUID:5324dd93-47df-485b-a88e-429f0fcd57d6
Logger:Sync-1_5-SlowQuery
Payload:/* [queryName=FIND_ITEMS] */ SELECT bso.userid, bso.collection, ...
EnvVersion:
Severity:7
Fields:
name:”Rows_examined” value_type:DOUBLE value_double:16458
name:”Query_time” value_type:DOUBLE representation:”s” value_double:7.24966
name:”Rows_sent” value_type:DOUBLE value_double:5001
name:”Lock_time” value_type:DOUBLE representation:”s” value_double:0.047038

Nginx Access Log Decoder

New in version 0.5.

Parses the Nginx access logs based on the Nginx ‘log_format’ configuration directive.

Config:

  • log_format (string)

    The ‘log_format’ configuration directive from the nginx.conf. $time_local or $time_iso8601 variable is converted to the number of nanosecond since the Unix epoch and used to set the Timestamp on the message. http://nginx.org/en/docs/http/ngx_http_log_module.html

  • type (string, optional, default nil):

    Sets the message ‘Type’ header to the specified value

  • user_agent_transform (bool, optional, default false)

    Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.

  • user_agent_keep (bool, optional, default false)

    Always preserve the http_user_agent value if transform is enabled.

  • user_agent_conditional (bool, optional, default false)

    Only preserve the http_user_agent value if transform is enabled and fails.

  • payload_keep (bool, optional, default false)

    Always preserve the original log line in the message payload.

Example Heka Configuration

[TestWebserver]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "CombinedLogDecoder"

[CombinedLogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

[CombinedLogDecoder.config]
type = "combined"
user_agent_transform = true
# combined log format
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

Example Heka Message

Timestamp:2014-01-10 07:04:56 -0800 PST
Type:combined
Hostname:test.example.com
Pid:0
UUID:8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger:TestWebserver
Payload:
EnvVersion:
Severity:7
Fields:
name:”remote_user” value_string:”-“
name:”http_x_forwarded_for” value_string:”-“
name:”http_referer” value_string:”-“
name:”body_bytes_sent” value_type:DOUBLE representation:”B” value_double:82
name:”remote_addr” value_string:”62.195.113.219” representation:”ipv4”
name:”status” value_type:DOUBLE value_double:200
name:”request” value_string:”GET /v1/recovery_email/status HTTP/1.1”
name:”user_agent_os” value_string:”FirefoxOS”
name:”user_agent_browser” value_string:”Firefox”
name:”user_agent_version” value_type:DOUBLE value_double:29

Nginx Error Log Decoder

New in version 0.6.

Parses the Nginx error logs based on the Nginx hard coded internal format.

Config:

  • tz (string, optional, defaults to UTC)

    The time zone of the log timestamps. The conversion actually happens on the Go side since there isn’t good TZ support here.

Example Heka Configuration

[TestWebserverError]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'error\.log'
decoder = "NginxErrorDecoder"

[NginxErrorDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_error.lua"

[NginxErrorDecoder.config]
tz = "America/Los_Angeles"

Example Heka Message

Timestamp:2014-01-10 07:04:56 -0800 PST
Type:nginx.error
Hostname:trink-x230
Pid:16842
UUID:8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger:TestWebserverError
Payload:using inherited sockets from “6;”
EnvVersion:
Severity:5
Fields:
name:”tid” value_type:DOUBLE value_double:0
name:”connection” value_type:DOUBLE value_double:8878

PayloadRegexDecoder

Decoder plugin that accepts messages of a specified form and generates new outgoing messages from extracted data, effectively transforming one message format into another.

Note

The Go regular expression tester is an invaluable tool for constructing and debugging regular expressions to be used for parsing your input data.

Config:

  • match_regex:

    Regular expression that must match for the decoder to process the message.

  • severity_map:

    Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.

  • message_fields:

    Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any captured in a regex in the message_matcher, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field i.e., Payload|json will create Fields[Payload] with a json representation (see Field Variables).

    Interpolated values should be surrounded with % signs, for example:

    [my_decoder.message_fields]
    Type = "%Type%Decoded"
    

    This will result in the new message’s Type being set to the old message’s Type with Decoded appended.

  • timestamp_layout (string):

    A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.

  • timestamp_location (string):

    Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.

  • log_errors (bool):

    New in version 0.5.

    If set to false, payloads that can not be matched against the regex will not be logged as errors. Defaults to true.

Example (Parsing Apache Combined Log Format):

[apache_transform_decoder]
type = "PayloadRegexDecoder"
match_regex = '^(?P<RemoteIP>\S+) \S+ \S+ \[(?P<Timestamp>[^\]]+)\] "(?P<Method>[A-Z]+) (?P<Url>[^\s]+)[^"]*" (?P<StatusCode>\d+) (?P<RequestSize>\d+) "(?P<Referer>[^"]*)" "(?P<Browser>[^"]*)"'
timestamp_layout = "02/Jan/2006:15:04:05 -0700"

# severities in this case would work only if a (?P<Severity>...) matching
# group was present in the regex, and the log file contained this information.
[apache_transform_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4

[apache_transform_decoder.message_fields]
Type = "ApacheLogfile"
Logger = "apache"
Url|uri = "%Url%"
Method = "%Method%"
Status = "%Status%"
RequestSize|B = "%RequestSize%"
Referer = "%Referer%"
Browser = "%Browser%"

PayloadXmlDecoder

This decoder plugin accepts XML blobs in the message payload and allows you to map parts of the XML into Field attributes of the pipeline pack message using XPath syntax, as implemented by the xmlpath library.

Config:

  • xpath_map:

    A subsection defining a capture name that maps to an XPath expression. Each expression can fetch a single value, if the expression does not resolve to a valid node in the XML blob, the capture group will be assigned an empty string value.

  • severity_map:

    Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.

  • message_fields:

    Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any captured in an XPath in the message_matcher, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field i.e., Payload|json will create Fields[Payload] with a json representation (see Field Variables).

    Interpolated values should be surrounded with % signs, for example:

    [my_decoder.message_fields]
    Type = "%Type%Decoded"
    

    This will result in the new message’s Type being set to the old message’s Type with Decoded appended.

  • timestamp_layout (string):

    A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. The default layout is ISO8601 - the same as Javascript. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.

  • timestamp_location (string):

    Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.

Example:

[myxml_decoder]
type = "PayloadXmlDecoder"

[myxml_decoder.xpath_map]
Count = "/some/path/count"
Name = "/some/path/name"
Pid = "//pid"
Timestamp = "//timestamp"
Severity = "//severity"

[myxml_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4

[myxml_decoder.message_fields]
Pid = "%Pid%"
StatCount = "%Count%"
StatName =  "%Name%"
Timestamp = "%Timestamp%"

PayloadXmlDecoder’s xpath_map config subsection supports XPath as implemented by the xmlpath library.

  • All axes are supported (“child”, “following-sibling”, etc)
  • All abbreviated forms are supported (”.”, “//”, etc)
  • All node types except for namespace are supported
  • Predicates are restricted to [N], [path], and [path=literal] forms
  • Only a single predicate is supported per path step
  • Richer expressions and namespaces are not supported

ProtobufDecoder

The ProtobufDecoder is used for Heka message objects that have been serialized into protocol buffers format. This is the format that Heka uses to communicate with other Heka instances, so one will always be included in your Heka configuration whether specified or not. The ProtobufDecoder has no configuration options.

The hekad protocol buffers message schema is defined in the message.proto file in the message package.

Example:

[ProtobufDecoder]

Rsyslog Decoder

New in version 0.5.

Parses the rsyslog output using the string-based configuration template.

Config:

  • template (string)

    The ‘template’ configuration string from rsyslog.conf. http://rsyslog-5-8-6-doc.neocities.org/rsyslog_conf_templates.html

  • tz (string, optional, defaults to UTC)

    If your rsyslog timestamp field in the template does not carry zone offset information, you may set an offset to be applied to your events here. Typically this would be used with the “Traditional” rsyslog formats.

    Parsing is done by Go, supports values of “UTC”, “Local”, or a location name corresponding to a file in the IANA Time Zone database, e.g. “America/New_York”.

Example Heka Configuration

[RsyslogDecoder]
type = "SandboxDecoder"
filename = "lua_decoders/rsyslog.lua"

[RsyslogDecoder.config]
type = "RSYSLOG_TraditionalFileFormat"
template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
tz = "America/Los_Angeles"

Example Heka Message

Timestamp:2014-02-10 12:58:58 -0800 PST
Type:RSYSLOG_TraditionalFileFormat
Hostname:trink-x230
Pid:0
UUID:e0eef205-0b64-41e8-a307-5772b05e16c1
Logger:RsyslogInput
Payload:“imklog 5.8.6, log source = /proc/kmsg started.”
EnvVersion:
Severity:7
Fields:
name:”programname” value_string:”kernel”

SandboxDecoder

The SandboxDecoder provides an isolated execution environment for data parsing and complex transformations without the need to recompile Heka. See Sandbox.

Config:

Example

[sql_decoder]
type = "SandboxDecoder"
filename = "sql_decoder.lua"

ScribbleDecoder

New in version 0.5.

The ScribbleDecoder is a trivial decoder that makes it possible to set one or more static field values on every decoded message. It is often used in conjunction with another decoder (i.e. in a MultiDecoder w/ cascade_strategy set to “all”) to, for example, set the message type of every message to a specific custom value after the messages have been decoded from Protocol Buffers format. Note that this only supports setting the exact same value on every message; if any dynamic computation is required to determine what the value should be, or whether it should be applied to a specific message, a SandboxDecoder using the provided write_message API call should be used instead.

Config:

  • message_fields:

    Subsection defining message fields to populate. Optional representation metadata can be added at the end of the field name using a pipe delimiter i.e. host|ipv4 = “192.168.55.55” will create Fields[Host] containing an IPv4 address. Adding a representation string to a standard message header name will cause it to be added as a user defined field, i.e. Payload|json will create Fields[Payload] with a json representation (see Field Variables). Does not support Timestamp or Uuid.

Example (in MultiDecoder context)

[mytypedecoder]
type = "MultiDecoder"
subs = ["ProtobufDecoder", "mytype"]
cascade_strategy = "all"
log_sub_errors = true

[ProtobufDecoder]

[mytype]
type = "ScribbleDecoder"

    [mytype.message_fields]
    Type = "MyType"

StatsToFieldsDecoder

New in version 0.4.

The StatsToFieldsDecoder will parse time series statistics data in the graphite message format and encode the data into the message fields, in the same format produced by a StatAccumInput plugin with the emit_in_fields value set to true. This is useful if you have externally generated graphite string data flowing through Heka that you’d like to process without having to roll your own string parsing code.

This decoder has no configuration options. It simply expects to be passed messages with statsd string data in the payload. Incorrect or malformed content will cause a decoding error, dropping the message.

The fields format only contains a single “timestamp” field, so any payloads containing multiple timestamps will end up generating a separate message for each timestamp. Extra messages will be a copy of the original message except a) the payload will be empty and b) the unique timestamp and related stats will be the only message fields.

Example:

[StatsToFieldsDecoder]
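
For illustration (stat names, values, and timestamp are invented for the example), a message whose payload contains graphite-formatted data such as:

stats.counters.hits.count 420 1410823960
stats.counters.misses.count 7 1410823960

would be decoded into a message with one numeric field per stat name plus a single “timestamp” field, mirroring what a StatAccumInput produces when emit_in_fields is set to true.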

Filters

Common Filter Parameters

There are some configuration options that are universally available to all Heka filter plugins. These will be consumed by Heka itself when Heka initializes the plugin and do not need to be handled by the plugin-specific initialization code.

  • message_matcher (string, optional):

    Boolean expression, when evaluated to true passes the message to the filter for processing. Defaults to matching nothing. See: Message Matcher Syntax

  • message_signer (string, optional):

    The name of the message signer. If specified only messages with this signer are passed to the filter for processing.

  • ticker_interval (uint, optional):

    Frequency (in seconds) that a timer event will be sent to the filter. Defaults to not sending timer events.
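
As a minimal sketch, these common parameters sit alongside the plugin-specific options in a filter's TOML section (the plugin name and matcher values here are invented for the example):

[hits_counter]
type = "CounterFilter"
message_matcher = "Type == 'webapp.access'"
message_signer = "ops"
ticker_interval = 10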

Circular Buffer Delta Aggregator

New in version 0.5.

Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). The purpose is to recreate the view at a larger scope in each level of the aggregation i.e., host view -> datacenter view -> service level view.

Config:

  • enable_delta (bool, optional, default false)

    Specifies whether or not this aggregator should generate cbuf deltas.

  • anomaly_config (string, optional) - (see Anomaly Detection Module)

    A list of anomaly detection specifications. If not specified no anomaly detection/alerting will be performed.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[TelemetryServerMetricsAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_aggregator.lua"
preserve_data = true

[TelemetryServerMetricsAggregator.config]
enable_delta = false
anomaly_config = 'roc("Request Statistics", 1, 15, 0, 1.5, true, false)'
preservation_version = 0

CBuf Delta Aggregator By Hostname

New in version 0.5.

Collects the circular buffer delta output from multiple instances of an upstream sandbox filter (the filters should all be the same version at least with respect to their cbuf output). Each column from the source circular buffer will become its own graph, i.e. ‘Error Count’ will become a graph with each host represented in a column.

Config:

  • max_hosts (uint)

    Pre-allocates the number of host columns in the graph(s). If the number of active hosts exceeds this value, the plugin will terminate.

  • rows (uint)

    The number of rows to keep from the original circular buffer. Storing all the data from all the hosts is not practical since you will most likely run into memory and output size restrictions (adjust the view down as necessary).

  • host_expiration (uint, optional, default 120 seconds)

    The amount of time a host has to be inactive before it can be replaced by a new host.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the max_hosts or rows configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[TelemetryServerMetricsHostAggregator]
type = "SandboxFilter"
message_matcher = "Logger == 'TelemetryServerMetrics' && Fields[payload_type] == 'cbufd'"
ticker_interval = 60
filename = "lua_filters/cbufd_host_aggregator.lua"
preserve_data = true

[TelemetryServerMetricsHostAggregator.config]
max_hosts = 5
rows = 60
host_expiration = 120
preservation_version = 0

CounterFilter

Once per ticker interval a CounterFilter will generate a message of type heka.counter-output. The payload will contain text indicating the number of messages that matched the filter’s message_matcher value during that interval (i.e. it counts the messages the plugin received). Every ten intervals an extra message (also of type heka.counter-output) goes out, containing an aggregate count and average per second throughput of messages received.

Config:

  • ticker_interval (int, optional):

    Interval between generated counter messages, in seconds. Defaults to 5.

Example:

[CounterFilter]
message_matcher = "Type != 'heka.counter-output'"

Cpu Stats Filter

New in version 0.7.

Graphs the load average and process count data. Expects to receive messages containing fields named 1MinAvg, 5MinAvg, 15MinAvg, and NumProcesses, such as those generated by the Linux Load Average Decoder.

Config:

  • sec_per_row (uint, optional, default 60)

    Sets the size of each bucket (resolution in seconds) in the sliding window.

  • rows (uint, optional, default 1440)

    Sets the size of the sliding window, i.e. 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.

  • anomaly_config (string, optional)

    See Anomaly Detection Module.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[LoadAvgFilter]
type = "SandboxFilter"
filename = "lua_filters/loadavg.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.loadavg'"

Disk Stats Filter

New in version 0.7.

Graphs disk IO stats. It automatically converts the running totals of Writes and Reads into rates of the values. The time based fields are left as running totals of the amount of time doing IO. Expects to receive messages with disk IO data embedded in a particular set of message fields which matches what is generated by Linux Disk Stats Decoder: WritesCompleted, ReadsCompleted, SectorsWritten, SectorsRead, WritesMerged, ReadsMerged, TimeWriting, TimeReading, TimeDoingIO, WeightedTimeDoingIO, TickerInterval.

Config:

  • rows (uint, optional, default 1440)

    Sets the size of the sliding window, i.e. 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.

  • anomaly_config (string, optional) - (see Anomaly Detection Module)

Example Heka Configuration

[DiskStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/diskstats.lua"
preserve_data = true
message_matcher = "Type == 'stats.diskstats'"

Frequent Items

New in version 0.5.

Calculates the most frequent items in a data stream.

Config:

  • message_variable (string)

    The message variable name containing the items to be counted.

  • max_items (uint, optional, default 1000)

    The maximum size of the sample set (higher will produce a more accurate list).

  • min_output_weight (uint, optional, default 100)

    Used to reduce the long tail output by only outputting the higher frequency items.

  • reset_days (uint, optional, default 1)

    Resets the list after the specified number of days (on the UTC day boundary). A value of 0 will never reset the list.

Example Heka Configuration

[FxaAuthServerFrequentIP]
type = "SandboxFilter"
filename = "lua_filters/frequent_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"

[FxaAuthServerFrequentIP.config]
message_variable = "Fields[remote_addr]"
max_items = 10000
min_output_weight = 100
reset_days = 1

Heka Memory Statistics

New in version 0.6.

Graphs the Heka memory statistics using the heka.memstat message generated by pipeline/report.go.

Config:

  • rows (uint, optional, default 1440)

    Sets the size of the sliding window, i.e. 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.

  • sec_per_row (uint, optional, default 60)

    Sets the size of each bucket (resolution in seconds) in the sliding window.

  • anomaly_config (string, optional)

    See Anomaly Detection Module.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the rows or sec_per_row configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[HekaMemstat]
type = "SandboxFilter"
filename = "lua_filters/heka_memstat.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'heka.memstat'"

Heka Message Schema

New in version 0.5.

Generates documentation for each unique message in a data stream. The output is a hierarchy of Logger, Type, EnvVersion, and a list of associated message field attributes including their counts (number in the brackets). This plugin is meant for data discovery/exploration and should not be left running on a production system.

Config:

<none>

Example Heka Configuration

[SyncMessageSchema]
type = "SandboxFilter"
filename = "lua_filters/heka_message_schema.lua"
ticker_interval = 60
preserve_data = false
message_matcher = "Logger =~ /^Sync/"

Example Output

Sync-1_5-Webserver [54600]
    slf [54600]
        -no version- [54600]
            upstream_response_time (mismatch)
            http_user_agent (string)
            body_bytes_sent (number)
            remote_addr (string)
            request (string)
            upstream_status (mismatch)
            status (number)
            request_time (number)
            request_length (number)
Sync-1_5-SlowQuery [37]
    mysql.slow-query [37]
        -no version- [37]
            Query_time (number)
            Rows_examined (number)
            Rows_sent (number)
            Lock_time (number)

HTTP Status Graph

New in version 0.5.

Graphs HTTP status codes using the numeric Fields[status] variable collected from web server access logs.

Config:

  • sec_per_row (uint, optional, default 60)

    Sets the size of each bucket (resolution in seconds) in the sliding window.

  • rows (uint, optional, default 1440)

    Sets the size of the sliding window, i.e. 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.

  • anomaly_config (string, optional)

    See Anomaly Detection Module.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[FxaAuthServerHTTPStatus]
type = "SandboxFilter"
filename = "lua_filters/http_status.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'nginx.access' && Type == 'fxa-auth-server'"

[FxaAuthServerHTTPStatus.config]
sec_per_row = 60
rows = 1440
anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false) roc("HTTP Status", 4, 15, 0, 1.5, true, false) mww_nonparametric("HTTP Status", 5, 15, 10, 0.8)'
preservation_version = 0

Memory Stats Filter

New in version 0.7.

Graphs memory usage statistics. Expects to receive messages with memory usage data embedded in a specific set of message fields, which matches the messages generated by Linux Memory Stats Decoder: MemFree, Cached, Active, Inactive, VmallocUsed, Shmem, SwapCached.

Config:

  • sec_per_row (uint, optional, default 60)

    Sets the size of each bucket (resolution in seconds) in the sliding window.

  • rows (uint, optional, default 1440)

    Sets the size of the sliding window, i.e. 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.

  • anomaly_config (string, optional)

    See Anomaly Detection Module.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[MemoryStatsFilter]
type = "SandboxFilter"
filename = "lua_filters/memstats.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Type == 'stats.memstats'"

MySQL Slow Query

New in version 0.6.

Graphs MySQL slow query data produced by the MySQL Slow Query Log Decoder.

Config:

  • sec_per_row (uint, optional, default 60)

    Sets the size of each bucket (resolution in seconds) in the sliding window.

  • rows (uint, optional, default 1440)

    Sets the size of the sliding window, i.e. 1440 rows at 60 seconds per row is a 24 hour sliding window with 1 minute resolution.

  • anomaly_config (string, optional)

    See Anomaly Detection Module.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the sec_per_row or rows configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[Sync-1_5-SlowQueries]
type = "SandboxFilter"
message_matcher = "Logger == 'Sync-1_5-SlowQuery'"
ticker_interval = 60
filename = "lua_filters/mysql_slow_query.lua"

    [Sync-1_5-SlowQueries.config]
    anomaly_config = 'mww_nonparametric("Statistics", 5, 15, 10, 0.8)'
    preservation_version = 0

StatFilter

Filter plugin that accepts messages of a specified form and uses extracted message data to feed statsd-style numerical metrics in the form of Stat objects to a StatAccumulator.

Config:

  • Metric:

    Subsection defining a single metric to be generated. Both the name and value fields for each metric support interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Logger’, ‘Payload’, or any dynamic field name) with the use of %% delimiters, so %Hostname% would be replaced by the message’s Hostname field, and %Foo% would be replaced by the first value of a dynamic field called “Foo”:

    • type (string):

      Metric type, supports “Counter”, “Timer”, “Gauge”.

    • name (string):

      Metric name, must be unique.

    • value (string):

      Expression representing the (possibly dynamic) value that the StatFilter should emit for each received message.

  • stat_accum_name (string):

    Name of a StatAccumInput instance that this StatFilter will use as its StatAccumulator for submitting generated stat values. Defaults to “StatAccumInput”.

Example:

[StatAccumInput]
ticker_interval = 5

[StatsdInput]
address = "127.0.0.1:29301"

[Hits]
type = "StatFilter"
message_matcher = 'Type == "ApacheLogfile"'

[Hits.Metric.bandwidth]
type = "Counter"
name = "httpd.bytes.%Hostname%"
value = "%Bytes%"

[Hits.Metric.method_counts]
type = "Counter"
name = "httpd.hits.%Method%.%Hostname%"
value = "1"

Note

StatFilter requires an available StatAccumInput to be running.

SandboxFilter

The sandbox filter provides an isolated execution environment for data analysis. Any output generated by the sandbox is injected into the payload of a new message for further processing or to be output.

Config:

Example:

[hekabench_counter]
type = "SandboxFilter"
message_matcher = "Type == 'hekabench'"
ticker_interval = 1
filename = "counter.lua"
preserve_data = true
profile = false

    [hekabench_counter.config]
    rows = 1440
    sec_per_row = 60

SandboxManagerFilter

The SandboxManagerFilter provides dynamic control (start/stop) of sandbox filters in a secure manner without stopping the Heka daemon. Commands are sent to a SandboxManagerFilter using a signed Heka message. The intent is to have one manager per access control group, each with its own message signing key. Users in each group can submit a signed control message to manage any filters running under the associated manager. A signed message is not an enforced requirement but it is highly recommended in order to restrict access to this functionality.

SandboxManagerFilter Settings

  • Common Filter Parameters

  • working_directory (string):

    The directory where the filter configurations, code, and states are preserved. The directory can be unique or shared between sandbox managers since the filter names are unique per manager. Defaults to a directory in ${BASE_DIR}/sbxmgrs with a name generated from the plugin name.

  • module_directory (string):

    The directory where ‘require’ will attempt to load the external Lua modules from. Defaults to ${SHARE_DIR}/lua_modules.

  • max_filters (uint):

    The maximum number of filters this manager can run.

New in version 0.5.

  • memory_limit (uint):

    The number of bytes managed sandboxes are allowed to consume before being terminated (default 8MiB).

  • instruction_limit (uint):

    The number of instructions managed sandboxes are allowed to execute during the process_message/timer_event functions before being terminated (default 1M).

  • output_limit (uint):

    The number of bytes managed sandbox output buffers can hold before the sandbox is terminated (default 63KiB). Warning: messages exceeding 64KiB will generate an error and be discarded by the standard output plugins (File, TCP, UDP) since they exceed the maximum message size.

Example

[OpsSandboxManager]
type = "SandboxManagerFilter"
message_signer = "ops"
# message_matcher = "Type == 'heka.control.sandbox'" # automatic default setting
max_filters = 100

Stats Graph

New in version 0.7.

Converts stat values extracted from statmetric messages (see StatAccumInput) to circular buffer data and periodically emits messages containing this data to be graphed by a DashboardOutput. Note that this filter expects the stats data to be available in the message fields, so the StatAccumInput must be configured with emit_in_fields set to true for this filter to work correctly.

Config:

  • title (string, optional, default “Stats”):

    Title for the graph output generated by this filter.

  • rows (uint, optional, default 300):

    The number of rows to store in our circular buffer. Each row represents one time interval.

  • sec_per_row (uint, optional, default 1):

    The number of seconds in each circular buffer time interval.

  • stats (string):

    Space separated list of stat names. Each specified stat will be expected to be found in the fields of the received statmetric messages, and will be extracted and inserted into its own column in the accumulated circular buffer.

  • stat_labels (string):

    Space separated list of header label names to use for the extracted stats. Must be in the same order as the specified stats. Any label longer than 15 characters will be truncated.

  • anomaly_config (string, optional):

    Anomaly detection configuration, see Anomaly Detection Module.

  • preservation_version (uint, optional, default 0):

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time any edits are made to your rows, sec_per_row, stats, or stat_labels values, or else Heka will fail to start because the preserved data will no longer match the filter’s data structure.

Example Heka Configuration

[stat-graph]
type = "SandboxFilter"
filename = "lua_filters/stat_graph.lua"
ticker_interval = 10
preserve_data = true
message_matcher = "Type == 'heka.statmetric'"

  [stat-graph.config]
  title = "Hits and Misses"
  rows = 1440
  sec_per_row = 10
  stats = "stats.counters.hits.count stats.counters.misses.count"
  stat_labels = "hits misses"
  anomaly_config = 'roc("Hits and Misses", 1, 15, 0, 1.5, true, false) roc("Hits and Misses", 2, 15, 0, 1.5, true, false)'
  preservation_version = 0

Unique Items

New in version 0.6.

Counts the number of unique items per day, e.g. active daily users by uid.

Config:

  • message_variable (string, required)

    The Heka message variable containing the item to be counted.

  • title (string, optional, default “Estimated Unique Daily message_variable”)

    The graph title for the cbuf output.

  • enable_delta (bool, optional, default false)

    Specifies whether or not this plugin should generate cbuf deltas. Deltas should be enabled when sharding is used; see: Circular Buffer Delta Aggregator.

  • preservation_version (uint, optional, default 0)

    If preserve_data = true is set in the SandboxFilter configuration, then this value should be incremented every time the enable_delta configuration is changed to prevent the plugin from failing to start during data restoration.

Example Heka Configuration

[FxaActiveDailyUsers]
type = "SandboxFilter"
filename = "lua_filters/unique_items.lua"
ticker_interval = 60
preserve_data = true
message_matcher = "Logger == 'FxaAuth' && Type == 'request.summary' && Fields[path] == '/v1/certificate/sign' && Fields[errno] == 0"

    [FxaActiveDailyUsers.config]
    message_variable = "Fields[uid]"
    title = "Estimated Active Daily Users"
    preservation_version = 0

Outputs

Common Output Parameters

There are some configuration options that are universally available to all Heka output plugins. These will be consumed by Heka itself when Heka initializes the plugin and do not need to be handled by the plugin-specific initialization code.

  • message_matcher (string, optional):

    Boolean expression, when evaluated to true passes the message to the output for processing. Defaults to matching nothing. See: Message Matcher Syntax

  • message_signer (string, optional):

    The name of the message signer. If specified only messages with this signer are passed to the output for processing.

  • ticker_interval (uint, optional):

    Frequency (in seconds) that a timer event will be sent to the output. Defaults to not sending timer events.

  • encoder (string, optional):

    Encoder to be used by the output. This should refer to the name of an encoder plugin section that is specified elsewhere in the TOML configuration. Messages can be encoded using the specified encoder by calling the OutputRunner’s Encode() method.

  • use_framing (bool, optional):

    Specifies whether or not Heka’s Stream Framing should be applied to the binary data returned from the OutputRunner’s Encode() method.
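
As a minimal sketch, these common parameters appear alongside the plugin-specific options in an output's TOML section (the plugin name and matcher values here are invented for the example):

[debug_log]
type = "LogOutput"
message_matcher = "Type == 'webapp.error'"
encoder = "PayloadEncoder"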

AMQPOutput

Connects to a remote AMQP broker (RabbitMQ) and sends messages to the specified queue. The message is serialized if specified, otherwise only the raw payload of the message will be sent. As AMQP is dynamically programmable, the broker topology needs to be specified.

Config:

  • url (string):

    An AMQP connection string formatted per the RabbitMQ URI Spec.

  • exchange (string):

    AMQP exchange name

  • exchange_type (string):

    AMQP exchange type (fanout, direct, topic, or headers).

  • exchange_durability (bool):

    Whether the exchange should be configured as a durable exchange. Defaults to non-durable.

  • exchange_auto_delete (bool):

    Whether the exchange is deleted when all queues have finished and there is no publishing. Defaults to auto-delete.

  • routing_key (string):

    The message routing key used to bind the queue to the exchange. Defaults to empty string.

  • persistent (bool):

    Whether published messages should be marked as persistent or transient. Defaults to non-persistent.

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior

New in version 0.6.

  • content_type (string):

    MIME content type of the payload used in the AMQP header. Defaults to “application/hekad”.

  • encoder (string, optional)

    Specifies which of the registered encoders should be used for converting Heka messages to binary data that is sent out over the AMQP connection. Defaults to the always available “ProtobufEncoder”.

  • use_framing (bool, optional):

    Specifies whether or not the encoded data sent out over the TCP connection should be delimited by Heka’s Stream Framing. Defaults to true.

New in version 0.6.

  • tls (TlsConfig):

    An optional sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if URL uses the AMQPS URI scheme. See Configuring TLS.

Example (that sends log lines from the logger):

[AMQPOutput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
message_matcher = 'Logger == "TestWebserver"'

CarbonOutput

CarbonOutput plugins parse the “stat metric” messages generated by a StatAccumulator and write the extracted counter, timer, and gauge data out to a graphite compatible carbon daemon. Output is written over a TCP or UDP socket using the plaintext protocol.

Config:

  • address (string):

    An IP address:port to which this plugin will write. (default: “localhost:2003”)

New in version 0.5.

  • protocol (string):

    “tcp” or “udp” (default: “tcp”)

  • tcp_keep_alive (bool)

    if set, keep the TCP connection open and reuse it until a failure; then retry (default: false)

Example:

[CarbonOutput]
message_matcher = "Type == 'heka.statmetric'"
address = "localhost:2003"
protocol = "udp"

DashboardOutput

Specialized output plugin that listens for certain Heka reporting message types and generates JSON data which is made available via HTTP for use in web based dashboards and health reports.

Config:

  • ticker_interval (uint):

    Specifies how often, in seconds, the dashboard files should be updated. Defaults to 5.

  • message_matcher (string):

    Defaults to “Type == ‘heka.all-report’ || Type == ‘heka.sandbox-output’ || Type == ‘heka.sandbox-terminated’”. Not recommended to change this unless you know what you’re doing.

  • address (string):

    An IP address:port on which we will serve output via HTTP. Defaults to “0.0.0.0:4352”.

  • working_directory (string):

    File system directory into which the plugin will write data files and from which it will serve HTTP. The Heka process must have read / write access to this directory. Relative paths will be evaluated relative to the Heka base directory. Defaults to $(BASE_DIR)/dashboard.

  • static_directory (string):

    File system directory where the Heka dashboard source code can be found. The Heka process must have read access to this directory. Relative paths will be evaluated relative to the Heka base directory. Defaults to ${SHARE_DIR}/dasher.

New in version 0.7.

  • headers (subsection, optional):

    It is possible to inject arbitrary HTTP headers into each outgoing response by adding a TOML subsection entitled “headers” to your DashboardOutput config section. All entries in the subsection must be a list of string values; an illustrative snippet follows the example below.

Example:

[DashboardOutput]
ticker_interval = 30
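
If custom response headers are needed, a “headers” subsection can be added to the section above; for instance (the header name and value here are invented for the example):

[DashboardOutput.headers]
Access-Control-Allow-Origin = ["*"]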

ElasticSearchOutput

Output plugin that uses HTTP or UDP to insert records into an ElasticSearch database. Note that it is up to the specified encoder to both serialize the message into a JSON structure and to prepend that with the appropriate ElasticSearch BulkAPI indexing JSON. Usually this output is used in conjunction with an ElasticSearch-specific encoder plugin, such as ESJsonEncoder, ESLogstashV0Encoder, or ESPayloadEncoder.

Config:

  • flush_interval (int):

    Interval at which accumulated messages should be bulk indexed into ElasticSearch, in milliseconds. Defaults to 1000 (i.e. one second).

  • flush_count (int):

    Number of messages that, if processed, will trigger them to be bulk indexed into ElasticSearch. Defaults to 10.

  • server (string):

    ElasticSearch server URL. Supports http://, https:// and udp:// urls. Defaults to “http://localhost:9200”.

  • http_timeout (int):

    Time in milliseconds to wait for a response for each http post to ES. This may drop data as there is currently no retry. Default is 0 (no timeout).

  • http_disable_keepalives (bool):

    Specifies whether or not re-use of established TCP connections to ElasticSearch should be disabled. Defaults to false, meaning both HTTP keep-alive mode and TCP keep-alives are used. Set it to true to close each TCP connection after flushing messages to ElasticSearch.

Example:

[ElasticSearchOutput]
message_matcher = "Type == 'sync.log'"
server = "http://es-server:9200"
flush_interval = 5000
flush_count = 10
encoder = "ESJsonEncoder"

FileOutput

Writes message data out to a file system.

Config:

  • path (string):

    Full path to the output file.

  • perm (string, optional):

    File permission for writing. Must be a string representation of an octal integer. Defaults to “644”.

  • folder_perm (string, optional):

    Permissions to apply to any directories created because the output file’s parent directory doesn’t exist. Must be a string representation of an octal integer. Defaults to “700”.

  • flush_interval (uint32, optional):

    Interval at which accumulated file data should be written to disk, in milliseconds (default 1000, i.e. 1 second). Set to 0 to disable.

  • flush_count (uint32, optional):

    Number of messages to accumulate until file data should be written to disk (default 1, minimum 1).

  • flush_operator (string, optional):

    Operator describing how the two parameters “flush_interval” and “flush_count” are combined. Allowed values are “AND” or “OR” (default is “AND”).

New in version 0.6.

  • use_framing (bool, optional):

    Specifies whether or not the encoded data written out to the file should be delimited by Heka’s Stream Framing. Defaults to true if a ProtobufEncoder is used, false otherwise.

Example:

[counter_file]
type = "FileOutput"
message_matcher = "Type == 'heka.counter-output'"
path = "/var/log/heka/counter-output.log"
prefix_ts = true
perm = "666"
flush_count = 100
flush_operator = "OR"
encoder = "PayloadEncoder"

New in version 0.6.

HttpOutput

A very simple output plugin that uses HTTP GET, POST, or PUT requests to deliver data to an HTTP endpoint. When using POST or PUT request methods the encoded output will be uploaded as the request body. When using GET the encoded output will be ignored.

This output doesn’t support any request batching; each received message will generate an HTTP request. Batching can be achieved by use of a filter plugin that accumulates message data, periodically emitting a single message containing the batched, encoded HTTP request data in the payload. An HttpOutput can then be configured to capture these batch messages, using a PayloadEncoder to extract the message payload.

For now the HttpOutput only supports statically defined request parameters (URL, headers, auth, etc.). Future iterations will provide a mechanism for dynamically specifying these values on a per-message basis.

Config:

  • address (string):

    URL of the HTTP server to which requests should be sent. Must begin with “http://” or “https://”.

  • method (string, optional):

    HTTP request method to use, must be one of GET, POST, or PUT. Defaults to POST.

  • username (string, optional):

    If specified, HTTP Basic Auth will be used with the provided user name.

  • password (string, optional):

    If specified, HTTP Basic Auth will be used with the provided password.

  • headers (subsection, optional):

    It is possible to inject arbitrary HTTP headers into each outgoing request by adding a TOML subsection entitled “headers” to your HttpOutput config section. All entries in the subsection must be a list of string values; an illustrative snippet follows the example below.

  • tls (subsection, optional):

    A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if an “https://” address is used. See Configuring TLS.

Example:

[PayloadEncoder]

[influxdb]
message_matcher = "Type == 'influx.formatted'"
address = "http://influxdb.example.com:8086/db/stats/series"
encoder = "PayloadEncoder"
username = "MyUserName"
password = "MyPassword"

IrcOutput

Connects to an Irc server and sends messages to the specified Irc channels. Output is encoded using the specified encoder, and is expected to already be truncated to fit within the bounds of an Irc message before it reaches the output.

Config:

  • server (string):

    A host:port of the irc server that Heka will connect to for sending output.

  • nick (string):

    Irc nick used by Heka.

  • ident (string):

    The Irc identity Heka will use to log in.

  • password (string, optional):

    The password used to connect to the Irc server.

  • channels (list of strings):

    A list of Irc channels which every matching Heka message is sent to. If there is a space in the channel string, then the part after the space is expected to be a password for a protected irc channel.

  • timeout (uint, optional):

    The maximum amount of time (in seconds) to wait before timing out when connecting, reading, or writing to the Irc server. Defaults to 10.

  • tls (TlsConfig, optional):

    A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.

  • queue_size (uint, optional):

    This is the maximum number of messages Heka will queue per Irc channel before discarding messages. There is also a queue of the same size used if all per-channel queues are full. This is used when Heka is unable to send a message to an Irc channel, such as when it hasn’t joined or has been disconnected. Defaults to 100.

  • rejoin_on_kick (bool, optional):

    Set this if you want Heka to automatically re-join an Irc channel after being kicked. If not set and Heka is kicked, it will never attempt to rejoin. Defaults to false.

  • ticker_interval (uint, optional):

    How often (in seconds) Heka should send a message to the server. This is on a per message basis, not per channel. Defaults to 2.

  • time_before_reconnect (uint, optional):

    How long to wait (in seconds) before reconnecting to the Irc server after being disconnected. Defaults to 3.

  • time_before_rejoin (uint, optional):

    How long to wait (in seconds) before attempting to rejoin an Irc channel which is full. Defaults to 3.

  • max_join_retries (uint, optional):

    The maximum number of times Heka will attempt to join an Irc channel before giving up. After attempts are exhausted, Heka will no longer attempt to join the channel. Defaults to 3.

  • verbose_irc_logging (bool, optional):

    Enable to see raw internal message events Heka is receiving from the server. Defaults to false.

  • encoder (string):

    Specifies which of the registered encoders should be used for converting Heka messages into what is sent to the irc channels.

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior

Example:

[IrcOutput]
message_matcher = 'Type == "alert"'
encoder = "PayloadEncoder"
server = "irc.mozilla.org:6667"
nick = "heka_bot"
ident = "heka_ident"
channels = [ "#heka_bot_irc testkeypassword" ]
rejoin_on_kick = true
queue_size = 200
ticker_interval = 1

LogOutput

Logs messages to stdout using Go’s log package.

Config:

<none>

Example:

[counter_output]
type = "LogOutput"
message_matcher = "Type == 'heka.counter-output'"
encoder = "PayloadEncoder"

NagiosOutput

Specialized output plugin that listens for Nagios external command message types and delivers passive service check results to Nagios using either HTTP requests made to the Nagios cmd.cgi API or the use of the send_nsca binary. The message payload must consist of a state followed by a colon and then the message, e.g. “OK:Service is functioning properly”. The valid states are: OK|WARNING|CRITICAL|UNKNOWN. Nagios must be configured with a service name that matches the Heka plugin instance name and the hostname where the plugin is running.

Config:

  • url (string, optional):

    An HTTP URL to the Nagios cmd.cgi. Defaults to http://localhost/nagios/cgi-bin/cmd.cgi.

  • username (string, optional):

    Username used to authenticate with the Nagios web interface. Defaults to empty string.

  • password (string, optional):

    Password used to authenticate with the Nagios web interface. Defaults to empty string.

  • response_header_timeout (uint, optional):

    Specifies the amount of time, in seconds, to wait for a server’s response headers after fully writing the request. Defaults to 2.

  • nagios_service_description (string, optional):

    Must match Nagios service’s service_description attribute. Defaults to the name of the output.

  • nagios_host (string, optional):

    Must match the hostname of the server in nagios. Defaults to the Hostname attribute of the message.

  • send_nsca_bin (string, optional):

    New in version 0.5.

    Use the send_nsca program at the given path rather than sending HTTP requests. Not supplying this value means HTTP will be used, and any other send_nsca_* settings will be ignored.

  • send_nsca_args ([]string, optional):

    New in version 0.5.

    Arguments to use with send_nsca, usually at least the nagios hostname, e.g. [“-H”, “nagios.somehost.com”]. Defaults to an empty list.

  • send_nsca_timeout (int, optional):

    New in version 0.5.

    Timeout for the send_nsca command, in seconds. Defaults to 5.

  • use_tls (bool, optional):

    New in version 0.5.

    Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.

  • tls (TlsConfig, optional):

    New in version 0.5.

    A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.

Example configuration to output alerts from SandboxFilter plugins:

[NagiosOutput]
url = "http://localhost/nagios/cgi-bin/cmd.cgi"
username = "nagiosadmin"
password = "nagiospw"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'nagios-external-command' && Fields[payload_name] == 'PROCESS_SERVICE_CHECK_RESULT'"

Example Lua code to generate a Nagios alert:

inject_payload("nagios-external-command", "PROCESS_SERVICE_CHECK_RESULT", "OK:Alerts are working!")

SmtpOutput

New in version 0.5.

Outputs a Heka message in an email. The message subject is the plugin name and the message content is controlled by the payload_only setting. The primary purpose is email alert notifications, e.g. via PagerDuty.

Config:

  • send_from (string)

    The email address of the sender. (default: “heka@localhost.localdomain”)

  • send_to (array of strings)

    An array of email addresses to which the output will be sent.

  • subject (string)

    Custom subject line of email. (default: “Heka [SmtpOutput]”)

  • host (string)

    SMTP host to send the email to (default: “127.0.0.1:25”)

  • auth (string)

    SMTP authentication type: “none”, “Plain”, “CRAMMD5” (default: “none”)

  • user (string, optional)

    SMTP user name

  • password (string, optional)

    SMTP user password

Example:

[FxaAlert]
type = "SmtpOutput"
message_matcher = "((Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert') || Type == 'heka.sandbox-terminated') && Logger =~ /^Fxa/"
send_from = "heka@example.com"
send_to = ["alert@example.com"]
auth = "Plain"
user = "test"
password = "testpw"
host = "localhost:25"
encoder = "AlertEncoder"

TcpOutput

Output plugin that delivers Heka message data to a listening TCP connection. Can be used to deliver messages from a local running Heka agent to a remote Heka instance set up as an aggregator and/or router, or to any other arbitrary listening TCP server that knows how to process the encoded data.

Config:

  • address (string):

    An IP address:port to which we will send our output data.

  • use_tls (bool, optional):

    Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.

New in version 0.5.

  • tls (TlsConfig, optional):

    A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.

  • ticker_interval (uint, optional):

    Specifies how often, in seconds, the output queue files are rolled. Defaults to 300.

New in version 0.6.

  • local_address (string, optional):

    A local IP address to use as the source address for outgoing traffic to this destination. Cannot currently be combined with TLS connections.

  • encoder (string, optional):

    Specifies which of the registered encoders should be used for converting Heka messages to binary data that is sent out over the TCP connection. Defaults to the always available “ProtobufEncoder”.

  • use_framing (bool, optional):

    Specifies whether or not the encoded data sent out over the TCP connection should be delimited by Heka’s Stream Framing. Defaults to true if a ProtobufEncoder is used, false otherwise.

  • keep_alive (bool):

    Specifies whether or not TCP keepalive should be used for established TCP connections. Defaults to false.

  • keep_alive_period (int):

    Time duration in seconds that a TCP connection will be maintained before keepalive probes start being sent. Defaults to 7200 (i.e. 2 hours).

Example:

[aggregator_output]
type = "TcpOutput"
address = "heka-aggregator.mydomain.com:55"
local_address = "127.0.0.1"
message_matcher = "Type != 'logfile' && Type != 'heka.counter-output' && Type != 'heka.all-report'"

New in version 0.7.

UdpOutput

Output plugin that delivers Heka message data to a specified UDP or Unix datagram socket location.

Config:

  • net (string, optional):

    Network type to use for communication. Must be one of “udp”, “udp4”, “udp6”, or “unixgram”. “unixgram” option only available on systems that support Unix datagram sockets. Defaults to “udp”.

  • address (string):

    Address to which we will be sending the data. Must be IP:port for net types of “udp”, “udp4”, or “udp6”. Must be a path to a Unix datagram socket file for net type “unixgram”.

  • local_address (string, optional):

    Local address to use on the datagram packets being generated. Must be IP:port for net types of “udp”, “udp4”, or “udp6”. Must be a path to a Unix datagram socket file for net type “unixgram”.

  • encoder (string):

    Name of registered encoder plugin that will extract and/or serialize data from the Heka message.

Example:

[PayloadEncoder]

[UdpOutput]
address = "myserver.example.com:34567"
encoder = "PayloadEncoder"

WhisperOutput

WhisperOutput plugins parse the “statmetric” messages generated by a StatAccumulator and write the extracted counter, timer, and gauge data out to a graphite compatible whisper database file tree structure.

Config:

  • base_path (string):

    Path to the base directory where the whisper file tree will be written. Absolute paths will be honored, relative paths will be calculated relative to the Heka base directory. Defaults to “whisper” (i.e. “$(BASE_DIR)/whisper”).

  • default_agg_method (int):

    Default aggregation method to use for each whisper output file. Supports the following values:

    0. Unknown aggregation method.
    1. Aggregate using averaging. (default)
    2. Aggregate using summation.
    3. Aggregate using last received value.
    4. Aggregate using maximum value.
    5. Aggregate using minimum value.
  • default_archive_info ([][]int):

    Default specification for new whisper db archives. Should be a sequence of 3-tuples, where each tuple describes a time interval’s storage policy: [<offset> <# of secs per datapoint> <# of datapoints>] (see whisper docs for more info). Defaults to:

    [ [0, 60, 1440], [0, 900, 8], [0, 3600, 168], [0, 43200, 1456]]
    

    The above defines four archive sections. The first uses 60 seconds for each of 1440 data points, which equals one day of retention. The second uses 15 minutes for each of 8 data points, for two hours of retention. The third uses one hour for each of 168 data points, or 7 days of retention. Finally, the fourth uses 12 hours for each of 1456 data points, representing two years of data.

  • folder_perm (string):

    Permission mask to be applied to folders created in the whisper database file tree. Must be a string representation of an octal integer. Defaults to “700”.

Example:

[WhisperOutput]
message_matcher = "Type == 'heka.statmetric'"
default_agg_method = 3
default_archive_info = [ [0, 30, 1440], [0, 900, 192], [0, 3600, 168], [0, 43200, 1456] ]
folder_perm = "755"

See Also

hekad(1), hekad.config(5)