mozilla

New in version 0.6.

Encoders

Alert Encoder

Produces more human-readable alert messages.

Config:

<none>

Example Heka Configuration

[FxaAlert]
type = "SmtpOutput"
message_matcher = "Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert' && Logger =~ /^Fxa/" || Type == 'heka.sandbox-terminated' && Fields[plugin] =~ /^Fxa/"
send_from = "heka@example.com"
send_to = ["alert@example.com"]
auth = "Plain"
user = "test"
password = "testpw"
host = "localhost:25"
encoder = "AlertEncoder"

[AlertEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"

Example Output

Timestamp:2014-05-14T14:20:18Z
Hostname:ip-10-226-204-51
Plugin:FxaBrowserIdHTTPStatus
Alert:HTTP Status - algorithm: roc col: 1 msg: detected anomaly, standard deviation exceeds 1.5

CBUF Librato Encoder

New in version 0.8.

Extracts data from SandboxFilter circular buffer output messages and uses it to generate time series JSON structures that Librato’s POST API will accept. It keeps track of the last time it has seen a particular message, keyed by filter name and output name. The first time it sees a new message, it sends data from all of the rows except the last one, which is possibly incomplete. For subsequent messages, the encoder automatically extracts data from all of the rows that have elapsed since the last message was received.

The SandboxEncoder preserve_data setting should be set to true when using this encoder, or else the list of received messages will be lost whenever Heka is restarted, possibly causing the same data rows to be sent to Librato multiple times.
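
The row selection described above can be sketched as follows. This is an illustrative Go re-implementation of the logic, not the encoder itself (the encoder is the Lua script configured below); the row type, the key format, and the helper names are all hypothetical.

package main

import "fmt"

// row models a circular buffer row for this sketch: a timestamp plus a value.
type row struct {
    Time  int64 // row timestamp in seconds
    Value float64
}

// lastSeen records the newest row time already sent, keyed by
// "filterName/outputName". This is the state that preserve_data = true
// keeps across Heka restarts.
var lastSeen = map[string]int64{}

// rowsToSend selects the rows to forward to Librato: on the first message
// for a key, every completed row; afterwards, only rows newer than the
// last row already sent. The final row is always held back because it may
// still be accumulating data.
func rowsToSend(key string, rows []row) []row {
    if len(rows) == 0 {
        return nil
    }
    complete := rows[:len(rows)-1]
    last, seen := lastSeen[key]
    var out []row
    for _, r := range complete {
        if !seen || r.Time > last {
            out = append(out, r)
        }
    }
    if len(out) > 0 {
        lastSeen[key] = out[len(out)-1].Time
    }
    return out
}

func main() {
    rows := []row{{10, 1}, {20, 2}, {30, 3}}
    fmt.Println(rowsToSend("cbuf_filter/http_status", rows)) // [{10 1} {20 2}]
    fmt.Println(rowsToSend("cbuf_filter/http_status", rows)) // []
}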

Config:

<none>

Example Heka Configuration

[cbuf_librato_encoder]
type = "SandboxEncoder"
filename = "lua_encoders/cbuf_librato"
preserve_data = true

[librato]
type = "HttpOutput"
message_matcher = "Type == 'heka.sandbox-output && Fields[payload_type] == 'cbuf'"
encoder = "cbuf_librato_encoder"
address = "https://metrics-api.librato.com/v1/metrics"
username = "username@example.com"
password = "SECRET"
    [librato.headers]
    Content-Type = ["application/json"]

Example Output

{"gauges":[{"value":12,"measure_time":1410824950,"name":"HTTP_200","source":"thor"},{"value":1,"measure_time":1410824950,"name":"HTTP_300","source":"thor"},{"value":1,"measure_time":1410824950,"name":"HTTP_400","source":"thor"}]}

ESJsonEncoder

This encoder serializes a Heka message into a clean JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The JSON serialization is done by hand, without the use of Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.
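
The escaping strategy can be illustrated with a short Go sketch: scan the string and substitute U+FFFD wherever the bytes are not valid UTF-8, so serialization never has to fail. This demonstrates the general technique only; it is not the encoder’s actual code.

package main

import (
    "fmt"
    "strings"
    "unicode/utf8"
)

// sanitize replaces each invalid UTF-8 byte with U+FFFD so the resulting
// string can always be serialized. Illustrative helper only.
func sanitize(s string) string {
    var b strings.Builder
    for i := 0; i < len(s); {
        r, size := utf8.DecodeRuneInString(s[i:])
        if r == utf8.RuneError && size == 1 {
            b.WriteRune('\uFFFD') // invalid byte: emit the replacement char
        } else {
            b.WriteRune(r)
        }
        i += size
    }
    return b.String()
}

func main() {
    fmt.Println(sanitize("valid \xff\xfe text"))
    // valid �� text
}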

Config:

  • index (string):

    Name of the ES index into which the messages will be inserted. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-%{Logger}-data’ would add the records to an ES index called ‘some.example.com-processname-data’. Defaults to ‘heka-%{2006.01.02}’. A sketch of this interpolation logic appears after this config list.

  • type_name (string):

    Name of ES record type to create. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-stat’ would create an ES record with a type of ‘some.example.com-stat’. Defaults to ‘message’.

  • fields ([]string):

    The ‘fields’ parameter specifies that only specific message data should be indexed into ElasticSearch. Available fields to choose from are “Uuid”, “Timestamp”, “Type”, “Logger”, “Severity”, “Payload”, “EnvVersion”, “Pid”, “Hostname”, and “Fields” (where “Fields” causes the inclusion of any and all dynamically specified message fields). Defaults to including all of the supported message fields.

  • timestamp (string):

    Format to use for timestamps in generated ES documents. Defaults to “2006-01-02T15:04:05.000Z”.

  • es_index_from_timestamp (bool):

    If true, the index name is generated using the timestamp from the message instead of the current time. Defaults to false.

  • id (string):

    Allows you to optionally specify the document id for ES to use. Useful for overwriting existing ES documents. If the specified value is wrapped in %{}, it will be interpolated to the corresponding field value from the message. The default is to allow ES to auto-generate the id.

  • raw_bytes_fields ([]string):

    This specifies a set of fields which will be passed through to the encoded JSON output without any processing or escaping. This is useful for fields which contain embedded JSON objects to prevent the embedded JSON from being escaped as normal strings. Only supports dynamically specified message fields.
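
As mentioned in the index and type_name options above, ‘%{}’ tokens are resolved against the message before falling back to a time layout. Here is a minimal sketch of that resolution order, using a hypothetical helper rather than the encoder’s actual implementation: known message headers are tried first, then dynamic fields, and any remaining token is treated as a Go time layout.

package main

import (
    "fmt"
    "regexp"
    "time"
)

// token matches ‘%{...}’ interpolation specs in index / type_name values.
var token = regexp.MustCompile(`%\{([^}]*)\}`)

// interpolate resolves each token against the message headers, then the
// dynamic message fields, and finally treats it as a Go time layout.
func interpolate(tmpl string, headers, fields map[string]string, t time.Time) string {
    return token.ReplaceAllStringFunc(tmpl, func(m string) string {
        name := m[2 : len(m)-1] // strip the surrounding '%{' and '}'
        if v, ok := headers[name]; ok { // Type, Hostname, Logger, ...
            return v
        }
        if v, ok := fields[name]; ok { // dynamically specified fields
            return v
        }
        return t.Format(name) // e.g. '2006.01.02'
    })
}

func main() {
    headers := map[string]string{"Hostname": "some.example.com", "Logger": "processname"}
    t := time.Date(2014, time.June, 5, 0, 0, 0, 0, time.UTC)
    fmt.Println(interpolate("%{Hostname}-%{Logger}-data", headers, nil, t))
    // some.example.com-processname-data
    fmt.Println(interpolate("heka-%{2006.01.02}", headers, nil, t))
    // heka-2014.06.05
}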

Example

[ESJsonEncoder]
index = "%{Type}-%{2006.01.02}"
es_index_from_timestamp = true
type_name = "%{Type}"

[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access'"
encoder = "ESJsonEncoder"
flush_interval = 50

ESLogstashV0Encoder

This encoder serializes a Heka message into a JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The message JSON structure uses the original (i.e. “v0”) schema popularized by Logstash. Using this schema can aid integration with existing Logstash deployments. This schema also plays nicely with the default Logstash dashboard provided by Kibana.

The JSON serialization is done by hand, without using Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.
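
For orientation, a v0-style document has roughly the shape below. This is an abridged illustration based on the commonly used v0 field names, not literal output from this encoder; the @type and @message mappings are governed by the use_message_type and fields options that follow.

{"@timestamp":"2014-05-14T14:20:18.000Z","@source_host":"some.example.com","@type":"message","@message":"original message payload","@fields":{"status":"200"}}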

Config:

  • index (string):

    Name of the ES index into which the messages will be inserted. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-%{Logger}-data’ would add the records to an ES index called ‘some.example.com-processname-data’. Defaults to ‘logstash-%{2006.01.02}’.

  • type_name (string):

    Name of ES record type to create. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-stat’ would create an ES record with a type of ‘some.example.com-stat’. Defaults to ‘message’.

  • use_message_type (bool):

    If false, the generated JSON’s @type value will match the ES record type specified in the type_name setting. If true, the message’s Type value will be used as the @type value instead. Defaults to false.

  • fields ([]string):

    The ‘fields’ parameter specifies that only specific message data should be indexed into ElasticSearch. Available fields to choose from are “Uuid”, “Timestamp”, “Type”, “Logger”, “Severity”, “Payload”, “EnvVersion”, “Pid”, “Hostname”, and “Fields” (where “Fields” causes the inclusion of any and all dynamically specified message fields). Defaults to including all of the supported message fields. The “Payload” field is sent to ElasticSearch as “@message”.

  • es_index_from_timestamp (bool):

    If true, the index name is generated using the timestamp from the message instead of the current time. Defaults to false.

  • id (string):

    Allows you to optionally specify the document id for ES to use. Useful for overwriting existing ES documents. If the specified value is wrapped in %{}, it will be interpolated to the corresponding field value from the message. The default is to allow ES to auto-generate the id.

  • raw_bytes_fields ([]string):

    This specifies a set of fields which will be passed through to the encoded JSON output without any processing or escaping. This is useful for fields which contain embedded JSON objects to prevent the embedded JSON from being escaped as normal strings. Only supports dynamically specified message fields.

Example

[ESLogstashV0Encoder]
es_index_from_timestamp = true
type_name = "%{Type}"

[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access'"
encoder = "ESLogstashV0Encoder"
flush_interval = 50

ESPayloadEncoder

Prepends ElasticSearch BulkAPI index JSON to a message payload.

Config:

  • index (string, optional, default “heka-%{%Y.%m.%d}”)

    String to use as the _index key’s value in the generated JSON. Supports field interpolation as described below.

  • type_name (string, optional, default “message”)

    String to use as the _type key’s value in the generated JSON. Supports field interpolation as described below.

  • id (string, optional)

    String to use as the _id key’s value in the generated JSON. Supports field interpolation as described below.

  • es_index_from_timestamp (boolean, optional)

    If true, then any time interpolation (often used to generate the ElasticSearch index) will use the timestamp from the processed message rather than the system time.

Field interpolation:

Data from the current message can be interpolated into any of the string arguments listed above. A %{} enclosed field name will be replaced by the field value from the current message. Supported default field names are “Type”, “Hostname”, “Pid”, “UUID”, “Logger”, “EnvVersion”, and “Severity”. Any other values will be checked against the defined dynamic message fields. If no field matches, then a C strftime (on non-Windows platforms) or C89 strftime (on Windows) time substitution will be attempted.

Example Heka Configuration

[es_payload]
type = "SandboxEncoder"
filename = "lua_encoders/es_payload.lua"
    [es_payload.config]
    es_index_from_timestamp = true
    index = "%{Logger}-%{%Y.%m.%d}"
    type_name = "%{Type}-%{Hostname}"

[ElasticSearchOutput]
message_matcher = "Type == 'mytype'"
encoder = "es_payload"

Example Output

{"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}
{"json":"data","extracted":"from","message":"payload"}

PayloadEncoder

The PayloadEncoder simply extracts the payload from the provided Heka message and converts it into a byte stream for delivery to an external resource.

Config:

  • append_newlines (bool, optional):

    Specifies whether or not a newline character (i.e. \n) will be appended to the captured message payload before serialization. Defaults to true.

  • prefix_ts (bool, optional):

    Specifies whether a timestamp will be prepended to the captured message payload before serialization. Defaults to false.

  • ts_from_message (bool, optional):

    If true, the prepended timestamp will be extracted from the message that is being processed. If false, the prepended timestamp will be generated by the system clock at the time of message processing. Defaults to true. This setting has no impact if prefix_ts is set to false.

  • ts_format (string, optional):

    Specifies the format that should be used for prepended timestamps, using Go’s standard time format specification strings. Defaults to [2006/Jan/02:15:04:05 -0700]. If the specified format string does not end with a space character, then a space will be inserted between the formatted timestamp and the payload.
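
The combined effect of these options can be sketched with a small Go helper (hypothetical, not the plugin’s actual code), using the default ts_format layout. Go layouts describe how the reference time Mon Jan 2 15:04:05 MST 2006 would be rendered.

package main

import (
    "bytes"
    "fmt"
    "strings"
    "time"
)

// encode mirrors the option handling described above: optionally prefix a
// formatted timestamp (inserting a trailing space if the layout lacks one),
// then write the payload, then an optional newline.
func encode(payload string, prefixTS bool, tsFormat string, appendNewlines bool, t time.Time) []byte {
    var b bytes.Buffer
    if prefixTS {
        layout := tsFormat
        if !strings.HasSuffix(layout, " ") {
            layout += " " // ensure a separator before the payload
        }
        b.WriteString(t.Format(layout))
    }
    b.WriteString(payload)
    if appendNewlines {
        b.WriteByte('\n')
    }
    return b.Bytes()
}

func main() {
    t := time.Date(2014, time.May, 14, 14, 20, 18, 0, time.UTC)
    fmt.Printf("%s", encode("GET /index.html 200", true, "[2006/Jan/02:15:04:05 -0700]", true, t))
    // Output: [2014/May/14:14:20:18 +0000] GET /index.html 200
}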

Example

[PayloadEncoder]
append_newlines = false
prefix_ts = true
ts_format = "2006/01/02 3:04:05PM MST"

ProtobufEncoder

The ProtobufEncoder is used to serialize Heka message objects back into Heka’s standard protocol buffers format. This is the format Heka uses to communicate with other Heka instances, so a ProtobufEncoder is always included in your Heka configuration under the default “ProtobufEncoder” name, whether one is explicitly specified or not.

The hekad protocol buffers message schema is defined in the message.proto file in the message package.
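
As a hedged sketch of consuming this format from Go, a round trip might look like the following, assuming Heka’s message package (the generated message.proto bindings) and the golang/protobuf runtime are importable as shown. The import paths and the round trip are illustrative assumptions, and the stream framing Heka applies on the wire is omitted.

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/golang/protobuf/proto"
    "github.com/mozilla-services/heka/message"
)

func main() {
    // Build a minimal message; uuid and timestamp are required fields in
    // message.proto, so they must be populated before marshalling.
    orig := &message.Message{
        Uuid:      make([]byte, 16), // zeroed UUID, demo only
        Timestamp: proto.Int64(time.Now().UnixNano()),
        Type:      proto.String("demo"),
        Payload:   proto.String("hello"),
    }
    raw, err := proto.Marshal(orig)
    if err != nil {
        log.Fatal(err)
    }

    // Decoding is the inverse: this is what another Heka instance does
    // with ProtobufEncoder output once the stream framing is stripped.
    decoded := &message.Message{}
    if err := proto.Unmarshal(raw, decoded); err != nil {
        log.Fatal(err)
    }
    fmt.Println(decoded.GetType(), decoded.GetPayload())
}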

Config:

<none>

Example:

[ProtobufEncoder]

RstEncoder

The RstEncoder generates a reStructuredText rendering of a Heka message, including all fields and attributes. It is useful for debugging, especially when coupled with a LogOutput.

Config:

<none>

Example:

[RstEncoder]

[LogOutput]
message_matcher = "TRUE"
encoder = "RstEncoder"

SandboxEncoder

The SandboxEncoder provides an isolated execution environment for converting messages into binary data without the need to recompile Heka. See Sandbox.

Config:

Example

[custom_json_encoder]
type = "SandboxEncoder"
filename = "path/to/custom_json_encoder.lua"

    [custom_json_encoder.config]
    msg_fields = ["field1", "field2"]

Schema InfluxDB Encoder

New in version 0.8.

Converts full Heka message contents to JSON for the InfluxDB HTTP API. Includes all standard message fields and iterates through all of the dynamically specified fields, skipping any bytes fields or any fields explicitly omitted using the skip_fields config option.

Config:

  • series (string, optional, default “series”)

    String to use as the series key’s value in the generated JSON. Supports interpolation of field values from the processed message, using %{fieldname}. Any fieldname values of “Type”, “Payload”, “Hostname”, “Pid”, “Logger”, “Severity”, or “EnvVersion” will be extracted from the base message schema; any other values will be assumed to refer to a dynamic message field. Only the first value of the first instance of a dynamic message field can be used for series name interpolation. If the dynamic field doesn’t exist, the uninterpolated value will be left in the series name. Note that it is not possible to interpolate either the “Timestamp” or the “Uuid” message fields into the series name; those values will be interpreted as referring to dynamic message fields.

  • skip_fields (string, optional, default “”)

    Space-delimited set of fields that should not be included in the InfluxDB records being generated. Any fieldname values of “Type”, “Payload”, “Hostname”, “Pid”, “Logger”, “Severity”, or “EnvVersion” will be assumed to refer to the corresponding field from the base message schema; any other values will be assumed to refer to a dynamic message field.

Example Heka Configuration

[influxdb]
type = "SandboxEncoder"
filename = "lua_encoders/influxdb.lua"
    [influxdb.config]
    series = "heka.%{Logger}"
    skip_fields = "Pid EnvVersion"

[InfluxOutput]
message_matcher = "Type == 'influxdb'"
encoder = "influxdb"
type = "HttpOutput"
address = "http://influxdbserver.example.com:8086/db/databasename/series"
username = "influx_username"
password = "influx_password"

Example Output

[{"points":[[1.409378221e+21,"log","test","systemName","TcpInput",5,1,"test"]],"name":"heka.MyLogger","columns":["Time","Type","Payload","Hostname","Logger","Severity","syslogfacility","programname"]}]

StatMetric InfluxDB Encoder

New in version 0.7.

Extracts data from message fields in heka.statmetric messages generated by a StatAccumInput and generates JSON suitable for use with InfluxDB’s HTTP API. StatAccumInput must be configured with emit_in_fields = true for this encoder to work correctly.

Config:

<none>

Example Heka Configuration

[statmetric-influx-encoder]
type = "SandboxEncoder"
filename = "lua_encoders/statmetric_influx.lua"

[influx]
type = "HttpOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "http://myinfluxserver.example.com:8086/db/stats/series"
encoder = "statmetric-influx-encoder"
username = "influx_username"
password = "influx_password"

Example Output

[{"points":[[1408404848,78271]],"name":"stats.counters.000000.rate","columns":["time","value"]},{"points":[[1408404848,78271]],"name":"stats.counters.000000.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count_ps","columns":["time","value"]},{"points":[[1408404848,1]],"name":"stats.timers.000001.lower","columns":["time","value"]},{"points":[[1408404848,1024]],"name":"stats.timers.000001.upper","columns":["time","value"]},{"points":[[1408404848,8937851]],"name":"stats.timers.000001.sum","columns":["time","value"]},{"points":[[1408404848,513.07985074627]],"name":"stats.timers.000001.mean","columns":["time","value"]},{"points":[[1408404848,461.72356167879]],"name":"stats.timers.000001.mean_90","columns":["time","value"]},{"points":[[1408404848,925]],"name":"stats.timers.000001.upper_90","columns":["time","value"]},{"points":[[1408404848,2]],"name":"stats.statsd.numStats","columns":["time","value"]}]