
New in version 0.6.

Encoders

Alert Encoder

Produces more human-readable alert messages.

Config:

<none>

Example Heka Configuration

[FxaAlert]
type = "SmtpOutput"
message_matcher = "((Type == 'heka.sandbox-output' && Fields[payload_type] == 'alert') || Type == 'heka.sandbox-terminated') && Logger =~ /^Fxa/"
send_from = "heka@example.com"
send_to = ["alert@example.com"]
auth = "Plain"
user = "test"
password = "testpw"
host = "localhost:25"
encoder = "AlertEncoder"

[AlertEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/alert.lua"

Example Output

Timestamp:2014-05-14T14:20:18Z
Hostname:ip-10-226-204-51
Plugin:FxaBrowserIdHTTPStatus
Alert:HTTP Status - algorithm: roc col: 1 msg: detected anomaly, standard deviation exceeds 1.5

ESJsonEncoder

This encoder serializes a Heka message into a clean JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The JSON serialization is done by hand, without the use of Go’s stdlib JSON marshalling, so that serialization can succeed even if the message contains invalid UTF-8 characters, which will be replaced with U+FFFD.
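
For illustration, the output for a single message is a bulk-API action line followed by the serialized document; a hypothetical rendering (all field values invented) might look like:

{"index":{"_index":"heka-2014.06.05","_type":"message"}}
{"Timestamp":"2014-06-05T14:20:18.000Z","Type":"nginx.access","Hostname":"example.com","Severity":6,"Payload":"GET /index.html 200"}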

Config:

  • index (string):

    Name of the ES index into which the messages will be inserted. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-%{Logger}-data’ would add the records to an ES index called ‘some.example.com-processname-data’. Defaults to ‘heka-%{2006.01.02}’.

  • type_name (string):

Name of the ES record type to create. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-stat’ would create an ES record with a type of ‘some.example.com-stat’. Defaults to ‘message’.

  • fields ([]string):

The ‘fields’ parameter specifies that only specific message data should be indexed into ElasticSearch. Available fields to choose from are “Uuid”, “Timestamp”, “Type”, “Logger”, “Severity”, “Payload”, “EnvVersion”, “Pid”, “Hostname”, and “Fields” (where “Fields” causes the inclusion of any and all dynamically specified message fields). Defaults to including all of the supported message fields.

  • timestamp (string):

    Format to use for timestamps in generated ES documents. Defaults to “2006-01-02T15:04:05.000Z”.

  • es_index_from_timestamp (bool):

If true, the timestamp from the message is used instead of the current time when generating the index name. Defaults to false.

  • id (string):

Allows you to optionally specify the document id for ES to use, which is useful for overwriting existing ES documents. If the specified value is enclosed in %{}, it will be interpolated to the corresponding field value. The default is to allow ES to auto-generate the id.

  • raw_bytes_fields ([]string):

    This specifies a set of fields which will be passed through to the encoded JSON output without any processing or escaping. This is useful for fields which contain embedded JSON objects to prevent the embedded JSON from being escaped as normal strings. Only supports dynamically specified message fields.

Example

[ESJsonEncoder]
index = "%{Type}-%{2006.01.02}"
es_index_from_timestamp = true
type_name = "%{Type}"

[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access'"
encoder = "ESJsonEncoder"
flush_interval = 50

ESLogstashV0Encoder

This encoder serializes a Heka message into a JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The message JSON structure uses the original (i.e. “v0”) schema popularized by Logstash. Using this schema can aid integration with existing Logstash deployments. This schema also plays nicely with the default Logstash dashboard provided by Kibana.

The JSON serialization is done by hand, without using Go’s stdlib JSON marshalling, so that serialization can succeed even if the message contains invalid UTF-8 characters, which will be replaced with U+FFFD.
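
For illustration, a hypothetical message rendered in the v0 schema (values invented; the exact set of “@”-prefixed keys emitted may differ) might look like:

{"index":{"_index":"logstash-2014.06.05","_type":"message"}}
{"@timestamp":"2014-06-05T14:20:18.000Z","@type":"nginx.access","@source_host":"example.com","@message":"GET /index.html 200","@fields":{"status":"200"}}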

Config:

  • index (string):

    Name of the ES index into which the messages will be inserted. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-%{Logger}-data’ would add the records to an ES index called ‘some.example.com-processname-data’. Defaults to ‘logstash-%{2006.01.02}’.

  • type_name (string):

Name of the ES record type to create. Supports interpolation of message field values (from ‘Type’, ‘Hostname’, ‘Pid’, ‘UUID’, ‘Logger’, ‘EnvVersion’, ‘Severity’, a field name, or a timestamp format) with the use of ‘%{}’ chars, so ‘%{Hostname}-stat’ would create an ES record with a type of ‘some.example.com-stat’. Defaults to ‘message’.

  • fields ([]string):

The ‘fields’ parameter specifies that only specific message data should be indexed into ElasticSearch. Available fields to choose from are “Uuid”, “Timestamp”, “Type”, “Logger”, “Severity”, “Payload”, “EnvVersion”, “Pid”, “Hostname”, and “Fields” (where “Fields” causes the inclusion of any and all dynamically specified message fields). Defaults to including all of the supported message fields.

  • es_index_from_timestamp (bool):

If true, the timestamp from the message is used instead of the current time when generating the index name. Defaults to false.

  • id (string):

Allows you to optionally specify the document id for ES to use, which is useful for overwriting existing ES documents. If the specified value is enclosed in %{}, it will be interpolated to the corresponding field value. The default is to allow ES to auto-generate the id.

  • raw_bytes_fields ([]string):

    This specifies a set of fields which will be passed through to the encoded JSON output without any processing or escaping. This is useful for fields which contain embedded JSON objects to prevent the embedded JSON from being escaped as normal strings. Only supports dynamically specified message fields.

Example

[ESLogstashV0Encoder]
es_index_from_timestamp = true
type_name = "%{Type}"

[ElasticSearchOutput]
message_matcher = "Type == 'nginx.access'"
encoder = "ESLogstashV0Encoder"
flush_interval = 50

ESPayloadEncoder

Prepends ElasticSearch BulkAPI index JSON to a message payload.

Config:

  • index (string, optional, default “heka-%{%Y.%m.%d}”)

    String to use as the _index key’s value in the generated JSON. Supports field interpolation as described below.

  • type_name (string, optional, default “message”)

    String to use as the _type key’s value in the generated JSON. Supports field interpolation as described below.

  • id (string, optional)

    String to use as the _id key’s value in the generated JSON. Supports field interpolation as described below.

  • es_index_from_timestamp (boolean, optional)

If true, then any time interpolation (often used to generate the ElasticSearch index) will use the timestamp from the processed message rather than the system time.

Field interpolation:

Data from the current message can be interpolated into any of the string arguments listed above. A %{} enclosed field name will be replaced by the field value from the current message. Supported default field names are “Type”, “Hostname”, “Pid”, “UUID”, “Logger”, “EnvVersion”, and “Severity”. Any other values will be checked against the defined dynamic message fields. If no field matches, then a C strftime (on non-Windows platforms) or C89 strftime (on Windows) time substitution will be attempted.

Example Heka Configuration

[es_payload]
type = "SandboxEncoder"
filename = "lua_encoders/es_payload.lua"
    [es_payload.config]
    es_index_from_timestamp = true
    index = "%{Logger}-%{%Y.%m.%d}"
    type_name = "%{Type}-%{Hostname}"

[ElasticSearchOutput]
message_matcher = "Type == 'mytype'"
encoder = "es_payload"

Example Output

{"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}
{"json":"data","extracted":"from","message":"payload"}

PayloadEncoder

The PayloadEncoder simply extracts the payload from the provided Heka message and converts it into a byte stream for delivery to an external resource.

Config:

  • append_newlines (bool, optional):

Specifies whether or not a newline character (i.e. \n) will be appended to the captured message payload before serialization. Defaults to true.

  • prefix_ts (bool, optional):

    Specifies whether a timestamp will be prepended to the captured message payload before serialization. Defaults to false.

  • ts_from_message (bool, optional):

    If true, the prepended timestamp will be extracted from the message that is being processed. If false, the prepended timestamp will be generated by the system clock at the time of message processing. Defaults to true. This setting has no impact if prefix_ts is set to false.

  • ts_format (string, optional):

    Specifies the format that should be used for prepended timestamps, using Go’s standard time format specification strings. Defaults to [2006/Jan/02:15:04:05 -0700]. If the specified format string does not end with a space character, then a space will be inserted between the formatted timestamp and the payload.

Example

[PayloadEncoder]
append_newlines = false
prefix_ts = true
ts_format = "2006/01/02 3:04:05PM MST"
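
With this configuration, each serialized payload would be prefixed by the message timestamp rendered in the given format, followed by a single inserted space (since the format does not end with one). A hypothetical output line (payload invented) might be:

2014/06/05 2:20:18PM UTC GET /index.html 200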

ProtobufEncoder

The ProtobufEncoder is used to serialize Heka message objects back into Heka’s standard protocol buffers format. This is the format Heka uses to communicate with other Heka instances, so a ProtobufEncoder is always included in your Heka configuration under the default “ProtobufEncoder” name, whether explicitly specified or not.

The hekad protocol buffers message schema is defined in the message.proto file in the message package.

Config:

<none>

Example:

[ProtobufEncoder]

RstEncoder

The RstEncoder generates a reStructuredText rendering of a Heka message, including all fields and attributes. It is useful for debugging, especially when coupled with a LogOutput.

Config:

<none>

Example:

[RstEncoder]

[LogOutput]
message_matcher = "TRUE"
encoder = "RstEncoder"
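
For illustration, a message might be rendered roughly as the following field list (values invented; the exact layout is determined by the encoder):

:Timestamp: 2014-06-05 14:20:18 +0000 UTC
:Type: nginx.access
:Hostname: example.com
:Logger: nginx
:Severity: 6
:Payload: GET /index.html 200
:Fields:
    | name:"status" type:integer value:200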

SandboxEncoder

The SandboxEncoder provides an isolated execution environment for converting messages into binary data without the need to recompile Heka. See Sandbox.

Config:

  • Common Sandbox configuration parameters; see Sandbox.

Example

[custom_json_encoder]
type = "SandboxEncoder"
filename = "path/to/custom_json_encoder.lua"

    [custom_json_encoder.config]
    msg_fields = ["field1", "field2"]
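
A minimal sketch of what such a custom encoder script might contain (the choice of serialized fields, and skipping the msg_fields option, are illustrative assumptions, not Heka's actual custom_json_encoder.lua):

-- custom_json_encoder.lua (hypothetical): serialize a few standard
-- message attributes as a JSON document.
require "cjson"

function process_message()
    -- Which attributes to include is an illustrative choice; a real
    -- script might read the msg_fields option via read_config().
    local output = {
        timestamp = read_message("Timestamp"),
        hostname  = read_message("Hostname"),
        payload   = read_message("Payload")
    }
    -- Hand the encoded bytes back to Heka as the new payload.
    inject_payload("json", "custom", cjson.encode(output))
    return 0
end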

StatMetric Influx Encoder

Extracts data from the message fields of heka.statmetric messages generated by a StatAccumInput and generates JSON suitable for use with InfluxDB’s HTTP API. The StatAccumInput must be configured with emit_in_fields = true for this encoder to work correctly, as sketched below.
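
A corresponding input section might look like the following minimal sketch (only the option named above is shown; any other StatAccumInput settings are omitted):

[StatAccumInput]
emit_in_fields = true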

Example Heka Configuration

[statmetric-influx-encoder]
type = "SandboxEncoder"
filename = "lua_encoders/statmetric_influx.lua"

[influx]
type = "HttpOutput"
message_matcher = "Type == 'heka.statmetric'"
address = "http://myinfluxserver.example.com:8086/db/stats/series"
encoder = "statmetric-influx-encoder"
username = "influx_username"
password = "influx_password"

Example Output

[{"points":[[1408404848,78271]],"name":"stats.counters.000000.rate","columns":["time","value"]},{"points":[[1408404848,78271]],"name":"stats.counters.000000.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count","columns":["time","value"]},{"points":[[1408404848,17420]],"name":"stats.timers.000001.count_ps","columns":["time","value"]},{"points":[[1408404848,1]],"name":"stats.timers.000001.lower","columns":["time","value"]},{"points":[[1408404848,1024]],"name":"stats.timers.000001.upper","columns":["time","value"]},{"points":[[1408404848,8937851]],"name":"stats.timers.000001.sum","columns":["time","value"]},{"points":[[1408404848,513.07985074627]],"name":"stats.timers.000001.mean","columns":["time","value"]},{"points":[[1408404848,461.72356167879]],"name":"stats.timers.000001.mean_90","columns":["time","value"]},{"points":[[1408404848,925]],"name":"stats.timers.000001.upper_90","columns":["time","value"]},{"points":[[1408404848,2]],"name":"stats.statsd.numStats","columns":["time","value"]}]