Status | |
---|---|
Stability | alpha: traces, metrics, logs |
Distributions | contrib, k8s |
Warnings | Unsound Transformations, Identity Conflict, Orphaned Telemetry, Other |
Issues | |
Code Owners | @TylerHelmuth, @kentquirk, @bogdandrutu, @evan-bradley, @edmocosta |
The transform processor modifies telemetry based on configuration using the OpenTelemetry Transformation Language.
For each signal type, the processor takes a list of conditions and statements associated to a Context type and executes them against the incoming telemetry in the order specified in the config. Each condition and statement can access and transform telemetry using functions, and each statement may use a condition to decide whether its function should be executed.
The transform processor allows configuring multiple context statements for traces, metrics, and logs.
The value of context specifies which OTTL Context to use when interpreting the associated statements.
The global conditions and statement strings, which must be OTTL compatible, will be passed to OTTL and interpreted using the associated context.
The condition string should contain a Where clause body without the where keyword at the beginning.
Each context will be processed in the order specified. Within a context, each global condition is checked and if any evaluates to true, the statements are executed in order. If a context doesn't meet any of the conditions, then the associated statements will be skipped.
Each statement may have a Where clause that acts as an additional check for whether to execute the statement.
The transform processor also allows configuring an optional field, error_mode, which will determine how the processor reacts to errors that occur while processing a statement.
error_mode | description |
---|---|
ignore | The processor ignores errors returned by statements, logs the error, and continues on to the next statement. This is the recommended mode. |
silent | The processor ignores errors returned by statements, does not log the error, and continues on to the next statement. |
propagate | The processor returns the error up the pipeline. This will result in the payload being dropped from the collector. |
If not specified, propagate will be used.
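As an illustration of the ignore mode, the sketch below assumes a log pipeline in which some bodies are not valid JSON. ParseJSON returns an error for those bodies; with error_mode: ignore the error is logged and the next statement still runs. The attribute name processed is hypothetical.

```yaml
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        # Fails for non-JSON bodies; the error is logged and processing continues.
        - merge_maps(cache, ParseJSON(body), "upsert")
        # Still executed for the same log record after the failure above.
        - set(attributes["processed"], true)
```

With error_mode: propagate the same failure would instead be returned up the pipeline and the payload would be dropped.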
transform:
  error_mode: ignore
  <trace|metric|log>_statements:
    - context: string
      conditions:
        - string
        - string
      statements:
        - string
        - string
        - string
    - context: string
      statements:
        - string
        - string
        - string
Proper use of contexts will provide increased performance and capabilities. See Contexts for more details.
Valid values for context are:
Signal | Context Values |
---|---|
trace_statements | resource, scope, span, and spanevent |
metric_statements | resource, scope, metric, and datapoint |
log_statements | resource, scope, and log |
conditions is a list of where clauses, which will be processed as global conditions for the accompanying set of statements. The conditions are ORed together, which means only one condition needs to evaluate to true in order for the statements (including their individual Where clauses) to be executed.
transform:
  error_mode: ignore
  metric_statements:
    - context: metric
      conditions:
        - type == METRIC_DATA_TYPE_SUM
      statements:
        - set(description, "Sum")
  log_statements:
    - context: log
      conditions:
        - IsMap(body) and body["object"] != nil
      statements:
        - set(body, attributes["http.route"])
The example takes advantage of context efficiency by grouping each transformation with the context it intends to transform. See Contexts for more details.
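To make the OR semantics of conditions concrete, here is a minimal sketch with two global conditions and a statement-level Where clause. The route values and the attribute names probe and probe.kind are assumptions chosen for illustration.

```yaml
transform:
  error_mode: ignore
  trace_statements:
    - context: span
      conditions:
        # Either condition is enough for the statements below to be considered.
        - attributes["http.route"] == "/health"
        - attributes["http.route"] == "/ready"
      statements:
        # Runs for every span that matched at least one condition.
        - set(attributes["probe"], true)
        # Runs only when a condition matched and this Where clause is also true.
        - set(attributes["probe.kind"], "liveness") where attributes["http.route"] == "/health"
```

A span with any other route matches neither condition, so both statements are skipped for it.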
Example configuration:
transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        - keep_keys(attributes, ["service.name", "service.namespace", "cloud.region", "process.command_line"])
        - replace_pattern(attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***")
        - limit(attributes, 100, [])
        - truncate_all(attributes, 4096)
    - context: span
      statements:
        - set(status.code, 1) where attributes["http.path"] == "/health"
        - set(name, attributes["http.route"])
        - replace_match(attributes["http.target"], "/user/*/list/*", "/user/{userId}/list/{listId}")
        - limit(attributes, 100, [])
        - truncate_all(attributes, 4096)
  metric_statements:
    - context: resource
      statements:
        - keep_keys(attributes, ["host.name"])
        - truncate_all(attributes, 4096)
    - context: metric
      statements:
        - set(description, "Sum") where type == METRIC_DATA_TYPE_SUM
        - convert_sum_to_gauge() where name == "system.processes.count"
        - convert_gauge_to_sum("cumulative", false) where name == "prometheus_metric"
    - context: datapoint
      statements:
        - limit(attributes, 100, ["host.name"])
        - truncate_all(attributes, 4096)
  log_statements:
    - context: resource
      statements:
        - keep_keys(attributes, ["service.name", "service.namespace", "cloud.region"])
    - context: log
      statements:
        - set(severity_text, "FAIL") where body == "request failed"
        - replace_all_matches(attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")
        - replace_all_patterns(attributes, "value", "/account/\\d{4}", "/account/{accountId}")
        - set(body, attributes["http.route"])
For in-depth details on the capabilities and limitations of the OpenTelemetry Transformation Language used by the transform processor, read about its grammar.
The transform processor utilizes the OTTL's contexts to transform Resource, Scope, Span, SpanEvent, Metric, DataPoint, and Log telemetry. The contexts allow the OTTL to interact with the underlying telemetry data in its pdata form.
- Resource Context
- Scope Context
- Span Context
- SpanEvent Context
- Metric Context
- DataPoint Context
- Log Context
Each context allows transformation of its type of telemetry.
For example, statements associated to a resource context will be able to transform the resource's attributes and dropped_attributes_count.
Contexts NEVER supply access to individual items "lower" in the protobuf definition.
- This means statements associated to a resource WILL NOT be able to access the underlying instrumentation scopes.
- This means statements associated to a scope WILL NOT be able to access the underlying telemetry slices (spans, metrics, or logs).
- Similarly, statements associated to a metric WILL NOT be able to access individual datapoints, but can access the entire datapoints slice.
- Similarly, statements associated to a span WILL NOT be able to access individual SpanEvents, but can access the entire SpanEvents slice.
For practical purposes, this means that a context cannot make decisions on its telemetry based on telemetry "lower" in the structure.
For example, the following context statement is not possible because it attempts to use individual datapoint attributes in the condition of a statement that is associated to a metric context:
metric_statements:
  - context: metric
    statements:
      - set(description, "test passed") where datapoints.attributes["test"] == "pass"
Contexts ALWAYS supply access to the items "higher" in the protobuf definition that are associated to the telemetry being transformed.
- This means that statements associated to a datapoint have access to a datapoint's metric, instrumentation scope, and resource.
- This means that statements associated to a spanevent have access to a spanevent's span, instrumentation scope, and resource.
- This means that statements associated to a span/metric/log have access to the telemetry's instrumentation scope and resource.
- This means that statements associated to a scope have access to the scope's resource.
For example, the following context statement is possible because datapoint statements can access the datapoint's metric.
metric_statements:
  - context: datapoint
    statements:
      - set(metric.description, "test passed") where attributes["test"] == "pass"
Whenever possible, associate your statements to the context that the statement intends to transform.
Although you can modify resource attributes associated to a span using the span context, it is more efficient to use the resource context.
This is because contexts are nested: a higher-level context does not need to iterate through any of the contexts at a lower level.
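The difference can be sketched as two alternative ways of setting the same resource attribute. Both entries are shown side by side only for comparison; in practice you would keep one of them. The attribute value is a hypothetical example.

```yaml
transform:
  error_mode: ignore
  trace_statements:
    # Preferred: evaluated once per resource.
    - context: resource
      statements:
        - set(attributes["deployment.environment"], "production")
    # Equivalent result, but evaluated for every span that belongs to the resource.
    - context: span
      statements:
        - set(resource.attributes["deployment.environment"], "production")
```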
Since the transform processor utilizes the OTTL's contexts for Traces, Metrics, and Logs, it is able to utilize functions that expect pdata in addition to any common functions. These common functions can be used for any signal.
In addition to OTTL functions, the processor defines its own functions to help with transformations specific to this processor:
Metrics only functions
- convert_sum_to_gauge
- convert_gauge_to_sum
- convert_summary_count_val_to_sum
- convert_summary_sum_val_to_sum
- copy_metric
- scale_metric
- aggregate_on_attributes
- convert_exponential_histogram_to_histogram
- aggregate_on_attribute_value
convert_sum_to_gauge()
Converts incoming metrics of type "Sum" to type "Gauge", retaining the metric's datapoints. Noop for metrics that are not of type "Sum".
NOTE: This function may cause a metric to break semantics for Gauge metrics. Use at your own risk.
Examples:
convert_sum_to_gauge()
convert_gauge_to_sum(aggregation_temporality, is_monotonic)
Converts incoming metrics of type "Gauge" to type "Sum", retaining the metric's datapoints and setting its aggregation temporality and monotonicity accordingly. Noop for metrics that are not of type "Gauge".
aggregation_temporality is a string ("cumulative" or "delta") that specifies the resultant metric's aggregation temporality. is_monotonic is a boolean that specifies the resultant metric's monotonicity.
NOTE: This function may cause a metric to break semantics for Sum metrics. Use at your own risk.
Examples:
- convert_gauge_to_sum("cumulative", false)
- convert_gauge_to_sum("delta", true)
Note: This function supports Histograms, ExponentialHistograms and Summaries.
extract_count_metric(is_monotonic)
The extract_count_metric function creates a new Sum metric from a Histogram, ExponentialHistogram or Summary's count value. A metric will only be created if there is at least one data point.
is_monotonic is a boolean representing the monotonicity of the new metric.
The name for the new metric will be <original metric name>_count. The fields that are copied are: timestamp, starttimestamp, attributes, description, and aggregation_temporality. As metrics of type Summary don't have an aggregation_temporality field, this field will be set to AGGREGATION_TEMPORALITY_CUMULATIVE for those metrics.
The new metric that is created will be passed to all subsequent statements in the metrics statements list.
Warning: This function may cause a metric to break semantics for Sum metrics. Use only if you're confident you know what the resulting monotonicity should be.
Examples:
- extract_count_metric(true)
- extract_count_metric(false)
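Because the new metric is passed to the statements that follow, a later statement can target it by its derived name. The sketch below assumes a histogram named http.server.duration (a hypothetical metric name) and sets a description on the extracted count metric.

```yaml
transform:
  error_mode: ignore
  metric_statements:
    - context: metric
      statements:
        # Creates a new Sum metric named http.server.duration_count.
        - extract_count_metric(true) where name == "http.server.duration"
        # The new metric reaches this statement and is matched by its derived name.
        - set(description, "Number of recorded requests") where name == "http.server.duration_count"
```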
Note: This function supports Histograms, ExponentialHistograms and Summaries.
extract_sum_metric(is_monotonic)
The extract_sum_metric function creates a new Sum metric from a Histogram, ExponentialHistogram or Summary's sum value. If the sum value of a Histogram or ExponentialHistogram data point is missing, no data point is added to the output metric. A metric will only be created if there is at least one data point.
is_monotonic is a boolean representing the monotonicity of the new metric.
The name for the new metric will be <original metric name>_sum. The fields that are copied are: timestamp, starttimestamp, attributes, description, and aggregation_temporality. As metrics of type Summary don't have an aggregation_temporality field, this field will be set to AGGREGATION_TEMPORALITY_CUMULATIVE for those metrics.
The new metric that is created will be passed to all subsequent statements in the metrics statements list.
Warning: This function may cause a metric to break semantics for Sum metrics. Use only if you're confident you know what the resulting monotonicity should be.
Examples:
- extract_sum_metric(true)
- extract_sum_metric(false)
convert_summary_count_val_to_sum(aggregation_temporality, is_monotonic)
The convert_summary_count_val_to_sum function creates a new Sum metric from a Summary's count value.
aggregation_temporality is a string ("cumulative" or "delta") representing the desired aggregation temporality of the new metric. is_monotonic is a boolean representing the monotonicity of the new metric.
The name for the new metric will be <summary metric name>_count. The fields that are copied are: timestamp, starttimestamp, attributes, and description. The new metric that is created will be passed to all functions in the metrics statements list. Function conditions will apply.
NOTE: This function may cause a metric to break semantics for Sum metrics. Use at your own risk.
Examples:
- convert_summary_count_val_to_sum("delta", true)
- convert_summary_count_val_to_sum("cumulative", false)
convert_summary_sum_val_to_sum(aggregation_temporality, is_monotonic)
The convert_summary_sum_val_to_sum function creates a new Sum metric from a Summary's sum value.
aggregation_temporality is a string ("cumulative" or "delta") representing the desired aggregation temporality of the new metric. is_monotonic is a boolean representing the monotonicity of the new metric.
The name for the new metric will be <summary metric name>_sum. The fields that are copied are: timestamp, starttimestamp, attributes, and description. The new metric that is created will be passed to all functions in the metrics statements list. Function conditions will apply.
NOTE: This function may cause a metric to break semantics for Sum metrics. Use at your own risk.
Examples:
- convert_summary_sum_val_to_sum("delta", true)
- convert_summary_sum_val_to_sum("cumulative", false)
copy_metric(Optional[name], Optional[description], Optional[unit])
The copy_metric function copies the current metric, adding it to the end of the metric slice.
name is an optional string. description is an optional string. unit is an optional string.
The new metric will be exactly the same as the current metric. You can use the optional parameters to set the new metric's name, description, and unit.
NOTE: The new metric is appended to the end of the metric slice and therefore will be included in all the metric statements. When copying a metric, it is a best practice to ALWAYS include a Where clause that WILL NOT match the new metric.
Examples:
- copy_metric(name="http.request.status_code", unit="s") where name == "http.status_code"
- copy_metric(description="new desc") where description == "old desc"
Warning: The approach used in this function to convert exponential histograms to explicit histograms is not part of the OpenTelemetry Specification.
convert_exponential_histogram_to_histogram(distribution, [ExplicitBounds])
The convert_exponential_histogram_to_histogram function converts an ExponentialHistogram to an Explicit (normal) Histogram.
This function requires 2 arguments:
- distribution - This argument defines the distribution algorithm used to allocate the exponential histogram datapoints into a new Explicit Histogram. There are 4 options:
  - upper - This approach identifies the highest possible value of each exponential bucket (the upper bound) and uses it to distribute the datapoints by comparing the upper bound of each bucket with the ExplicitBounds provided. This approach works better for small/narrow exponential histograms where the difference between the upper bounds and lower bounds is small.
    For example, given:
    - count = 10
    - Boundaries: [5, 10, 15, 20, 25]
    - Upper Bound: 15
    Process:
    - Start with zeros: [0, 0, 0, 0, 0]
    - Iterate the boundaries and compare $upper = 15$ with each boundary:
      - $15 > 5$ (skip)
      - $15 > 10$ (skip)
      - $15 \le 15$ (allocate count to this boundary)
    - Allocate count: [0, 0, 10, 0, 0]
    - Final Counts: [0, 0, 10, 0, 0]
  - midpoint - This approach works in a similar way to the upper approach, but instead of using the upper bound, it uses the midpoint of each exponential bucket. The midpoint is identified by calculating the average of the upper and lower bounds. This approach also works better for small/narrow exponential histograms.
  The uniform and random distribution algorithms both utilise the concept of intersecting boundaries. Intersecting boundaries are any boundary in the boundaries array that falls between or on the lower and upper values of the Exponential Histogram boundaries. For example: if you have an Exponential Histogram bucket with a lower bound of 10 and upper of 20, and your boundaries array is [5, 10, 15, 20, 25], the intersecting boundaries are 10, 15, and 20 because they lie within the range [10, 20].
  - uniform - This approach distributes the datapoints for each bucket uniformly across the intersecting ExplicitBounds. The algorithm works as follows:
    - If there are valid intersecting boundaries, the function evenly distributes the count across these boundaries.
    - Calculate the count to be allocated to each boundary.
    - If there is a remainder after dividing the count equally, it distributes the remainder by incrementing the count for some of the boundaries until the remainder is exhausted.
    For example, given:
    - count = 10
    - Exponential Histogram Bounds: [10, 20]
    - Boundaries: [5, 10, 15, 20, 25]
    - Intersecting Boundaries: [10, 15, 20]
    - Number of Intersecting Boundaries: 3
    - Using the formula: $count / numOfIntersections = 10 / 3 = 3$ remainder $1$
    Uniform Allocation:
    - Start with zeros: [0, 0, 0, 0, 0]
    - Allocate 3 to each: [0, 3, 3, 3, 0]
    - Distribute the remainder of 1: [0, 4, 3, 3, 0]
    - Final Counts: [0, 4, 3, 3, 0]
  - random - This approach distributes the datapoints for each bucket randomly across the intersecting ExplicitBounds. This approach works in a similar manner to the uniform distribution algorithm with the main difference being that points are distributed randomly instead of uniformly. This works as follows:
    - If there are valid intersecting boundaries, calculate the proportion of the count that should be allocated to each boundary based on the overlap of the boundary with the provided range (lower to upper).
    - For each boundary, a random fraction of the calculated proportion is allocated.
    - Any remaining count (due to rounding or random distribution) is then distributed randomly among the intersecting boundaries.
    - If the bucket range does not intersect with any boundaries, the entire count is assigned to the start boundary.
- ExplicitBounds represents the list of bucket boundaries for the new histogram. This argument is required and cannot be empty.
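The midpoint option has no worked example above, so here is a small sketch that mirrors the one given for upper. It assumes a single exponential bucket with lower bound 10, upper bound 20, and count 10, compared against the boundaries [5, 10, 15, 20, 25]; the numbers are illustrative only.

```latex
\begin{align*}
midpoint &= \frac{lower + upper}{2} = \frac{10 + 20}{2} = 15 \\
15 &> 5   && \text{(skip)} \\
15 &> 10  && \text{(skip)} \\
15 &\le 15 && \text{(allocate count to this boundary)}
\end{align*}
```

Under these assumptions the final counts would be [0, 0, 10, 0, 0], the same shape as the upper example because the midpoint of this bucket happens to equal that example's upper bound.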
WARNINGS:
- The process of converting an ExponentialHistogram to an Explicit Histogram is not perfect and may result in a loss of precision. It is important to define an appropriate set of bucket boundaries and identify the best distribution approach for your data in order to minimize this loss.
  For example, selecting Boundaries that are too high or too low may result in histogram buckets that are too wide or too narrow, respectively.
- Negative Bucket Counts are not supported in Explicit Histograms; as such, negative bucket counts are ignored.
- ZeroCounts are only allocated if the ExplicitBounds array contains a zero boundary. That is, if the Explicit Boundaries that you provide do not start with 0, the function will not allocate any zero counts from the Exponential Histogram.
This function should only be used when Exponential Histograms are not suitable for the downstream consumers or if upstream metric sources are unable to generate Explicit Histograms.
Example:
convert_exponential_histogram_to_histogram("random", [0.0, 10.0, 100.0, 1000.0, 10000.0])
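In a full configuration the function is typically scoped to specific metrics with a Where clause. The sketch below assumes an exponential histogram named http.server.duration (a hypothetical name) and includes a 0.0 boundary so that zero counts can be allocated, as the warning above describes.

```yaml
transform:
  error_mode: ignore
  metric_statements:
    - context: metric
      statements:
        # Boundaries start at 0.0 so ZeroCounts have a bucket to land in.
        - convert_exponential_histogram_to_histogram("midpoint", [0.0, 10.0, 100.0, 1000.0]) where name == "http.server.duration"
```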
scale_metric(factor, Optional[unit])
The scale_metric function multiplies the values in the data points in the metric by the float value factor.
If the optional string unit is provided, the metric's unit will be set to this value.
Supported metric types are Gauge, Sum, Histogram, and Summary.
Examples:
- scale_metric(0.1): Scale the metric by a factor of 0.1. The unit of the metric will not be modified.
- scale_metric(10.0, "kWh"): Scale the metric by a factor of 10.0 and set the unit to kWh.
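A common use is unit conversion. The sketch below assumes a metric named request.duration reported in milliseconds (both the name and the unit value are hypothetical) and rescales it to seconds, guarding on the current unit so that metrics already in seconds are left alone.

```yaml
transform:
  error_mode: ignore
  metric_statements:
    - context: metric
      statements:
        # 1 ms = 0.001 s; the unit is updated together with the values.
        - scale_metric(0.001, "s") where name == "request.duration" and unit == "ms"
```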
aggregate_on_attributes(function, Optional[attributes])
The aggregate_on_attributes function aggregates all datapoints in the metric based on the supplied attributes. function is a case-sensitive string that represents the aggregation function and attributes is an optional list of attribute keys of type string to aggregate upon.
The aggregate_on_attributes function removes all attributes that are present in datapoints except the ones that are specified in the attributes parameter. If the attributes parameter is not set, all attributes are removed from datapoints. Afterwards all datapoints are aggregated depending on the attributes left (none or the ones present in the list).
NOTE: This function is supported only in metric context.
The following metric types can be aggregated:
- sum
- gauge
- histogram
- exponential histogram
Supported aggregation functions are:
- sum
- max
- min
- mean
- median
- count
NOTE: Only the sum aggregation function is supported for histogram and exponential histogram datatypes.
Examples:
aggregate_on_attributes("sum", ["attr1", "attr2"]) where name == "system.memory.usage"
aggregate_on_attributes("max") where name == "system.memory.usage"
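The intended effect can be sketched with a small before/after illustration in the comments. The attribute keys host and state and the datapoint values are assumptions, and the datapoints are assumed to be otherwise comparable (for example, taken at the same time).

```yaml
transform:
  error_mode: ignore
  metric_statements:
    - context: metric
      statements:
        # Before: system.memory.usage has two datapoints
        #   {host: "a", state: "used"} = 10
        #   {host: "a", state: "free"} = 30
        # The state attribute is removed, leaving both datapoints with {host: "a"}.
        # After aggregating with "sum": one datapoint
        #   {host: "a"} = 40
        - aggregate_on_attributes("sum", ["host"]) where name == "system.memory.usage"
```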
The aggregate_on_attributes function can also be used in conjunction with keep_matching_keys or delete_matching_keys.
For example, to remove attribute keys matching a regex and aggregate the metrics on the remaining attributes, you can perform the following statement sequence:
statements:
  - delete_matching_keys(attributes, "(?i).*myRegex.*") where name == "system.memory.usage"
  - aggregate_on_attributes("sum") where name == "system.memory.usage"
To aggregate only using a specified set of attributes, you can use keep_matching_keys.
aggregate_on_attribute_value(function, attribute, values, newValue)
The aggregate_on_attribute_value function aggregates all datapoints in the metric containing the attribute attribute (type string) with one of the values present in the values parameter (list of strings) into a single datapoint where the attribute has the value newValue (type string). function is a case-sensitive string that represents the aggregation function.
NOTE: This function is supported only in metric context.
The following metric types can be aggregated:
- sum
- gauge
- histogram
- exponential histogram
Supported aggregation functions are:
- sum
- max
- min
- mean
- median
- count
NOTE: Only the sum aggregation function is supported for histogram and exponential histogram datatypes.
Examples:
aggregate_on_attribute_value("sum", "attr1", ["val1", "val2"], "new_val") where name == "system.memory.usage"
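As a sketch of the intended effect, the comments below show hypothetical datapoints before and after the call. The attribute key state, its values, and the numbers are assumptions, and the datapoints are assumed to be otherwise comparable.

```yaml
transform:
  error_mode: ignore
  metric_statements:
    - context: metric
      statements:
        # Before: {state: "cached"} = 5, {state: "buffered"} = 7, {state: "used"} = 20
        # "cached" and "buffered" are merged into one datapoint with state "other".
        # After:  {state: "other"} = 12, {state: "used"} = 20
        - aggregate_on_attribute_value("sum", "state", ["cached", "buffered"], "other") where name == "system.memory.usage"
```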
The aggregate_on_attribute_value function can also be used in conjunction with keep_matching_keys or delete_matching_keys.
For example, to remove attribute keys matching a regex and aggregate the metrics on the remaining attributes, you can perform the following statement sequence:
statements:
  - delete_matching_keys(attributes, "(?i).*myRegex.*") where name == "system.memory.usage"
  - aggregate_on_attribute_value("sum", "attr1", ["val1", "val2"], "new_val") where name == "system.memory.usage"
To aggregate only using a specified set of attributes, you can use keep_matching_keys.
Set attribute test to "pass" if the attribute test does not exist:
transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements:
        # accessing a map with a key that does not exist will return nil.
        - set(attributes["test"], "pass") where attributes["test"] == nil
There are 2 ways to rename an attribute key:
You can either set a new attribute and delete the old:
transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        - set(attributes["namespace"], attributes["k8s.namespace.name"])
        - delete_key(attributes, "k8s.namespace.name")
Or you can update the key using regex:
transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        - replace_all_patterns(attributes, "key", "k8s\\.namespace\\.name", "namespace")
Set attribute body to the value of the log body:
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - set(attributes["body"], body)
Set attribute test to the value of attributes "foo" and "bar" combined:
transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        # Use the Concat function to combine any number of strings, separated by a delimiter.
        - set(attributes["test"], Concat([attributes["foo"], attributes["bar"]], " "))
Given the following JSON body:
{
  "name": "log",
  "attr1": "foo",
  "attr2": "bar",
  "nested": {
    "attr3": "example"
  }
}
add specific fields as attributes on the log:
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        # Parse body as JSON and merge the resulting map with the cache map, ignoring non-json bodies.
        # cache is a field exposed by OTTL that is a temporary storage place for complex operations.
        - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")
        # Set attributes using the values merged into cache.
        # If the attribute doesn't exist in cache then nothing happens.
        - set(attributes["attr1"], cache["attr1"])
        - set(attributes["attr2"], cache["attr2"])
        # To access nested maps you can chain index ([]) operations.
        # If nested or attr3 do not exist in cache then nothing happens.
        - set(attributes["nested.attr3"], cache["nested"]["attr3"])
Given the following unstructured log body:
[2023-09-22 07:38:22,570] INFO [Something]: some interesting log
You can find the severity using IsMatch:
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - set(severity_number, SEVERITY_NUMBER_INFO) where IsString(body) and IsMatch(body, "\\sINFO\\s")
        - set(severity_number, SEVERITY_NUMBER_WARN) where IsString(body) and IsMatch(body, "\\sWARN\\s")
        - set(severity_number, SEVERITY_NUMBER_ERROR) where IsString(body) and IsMatch(body, "\\sERROR\\s")
To move resource attributes whose keys match the regular expression pod_labels_.* to a new attribute location kubernetes.labels, use the following configuration:
transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        - set(cache["attrs"], attributes)
        - keep_matching_keys(cache["attrs"], "pod_labels_.*")
        - set(attributes["kubernetes.labels"], cache["attrs"])
The same configuration can also be used with delete_matching_keys() to copy the attributes that do not match the regular expression.
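A minimal sketch of that variant follows. It reuses the cache-based pattern from the configuration above and only swaps the function; the destination key other.labels is a hypothetical name.

```yaml
transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        - set(cache["attrs"], attributes)
        # Drop the pod_labels_* keys from the copy, keeping everything else.
        - delete_matching_keys(cache["attrs"], "pod_labels_.*")
        - set(attributes["other.labels"], cache["attrs"])
```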
When using OTTL you can enable debug logging in the collector to print out useful information, such as the current Statement and the current TransformContext, to help you troubleshoot why a statement is not behaving as you expect. This feature is very verbose, but provides you an accurate view into how OTTL views the underlying data.
receivers:
  filelog:
    start_at: beginning
    include: [ test.log ]
processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(resource.attributes["test"], "pass")
          - set(instrumentation_scope.attributes["test"], ["pass"])
          - set(attributes["test"], true)
exporters:
  debug:
service:
  telemetry:
    logs:
      level: debug
  pipelines:
    logs:
      receivers:
        - filelog
      processors:
        - transform
      exporters:
        - debug
2024-05-29T16:38:09.600-0600 debug [email protected]/parser.go:265 initial TransformContext {"kind": "processor", "name": "transform", "pipeline": "logs", "TransformContext": {"resource": {"attributes": {}, "dropped_attribute_count": 0}, "scope": {"attributes": {}, "dropped_attribute_count": 0, "name": "", "version": ""}, "log_record": {"attributes": {"log.file.name": "test.log"}, "body": "test", "dropped_attribute_count": 0, "flags": 0, "observed_time_unix_nano": 1717022289500721000, "severity_number": 0, "severity_text": "", "span_id": "", "time_unix_nano": 0, "trace_id": ""}, "cache": {}}}
2024-05-29T16:38:09.600-0600 debug [email protected]/parser.go:268 TransformContext after statement execution {"kind": "processor", "name": "transform", "pipeline": "logs", "statement": "set(resource.attributes[\"test\"], \"pass\")", "condition matched": true, "TransformContext": {"resource": {"attributes": {"test": "pass"}, "dropped_attribute_count": 0}, "scope": {"attributes": {}, "dropped_attribute_count": 0, "name": "", "version": ""}, "log_record": {"attributes": {"log.file.name": "test.log"}, "body": "test", "dropped_attribute_count": 0, "flags": 0, "observed_time_unix_nano": 1717022289500721000, "severity_number": 0, "severity_text": "", "span_id": "", "time_unix_nano": 0, "trace_id": ""}, "cache": {}}}
2024-05-29T16:38:09.600-0600 debug [email protected]/parser.go:268 TransformContext after statement execution {"kind": "processor", "name": "transform", "pipeline": "logs", "statement": "set(instrumentation_scope.attributes[\"test\"], [\"pass\"])", "condition matched": true, "TransformContext": {"resource": {"attributes": {"test": "pass"}, "dropped_attribute_count": 0}, "scope": {"attributes": {"test": ["pass"]}, "dropped_attribute_count": 0, "name": "", "version": ""}, "log_record": {"attributes": {"log.file.name": "test.log"}, "body": "test", "dropped_attribute_count": 0, "flags": 0, "observed_time_unix_nano": 1717022289500721000, "severity_number": 0, "severity_text": "", "span_id": "", "time_unix_nano": 0, "trace_id": ""}, "cache": {}}}
2024-05-29T16:38:09.601-0600 debug [email protected]/parser.go:268 TransformContext after statement execution {"kind": "processor", "name": "transform", "pipeline": "logs", "statement": "set(attributes[\"test\"], true)", "condition matched": true, "TransformContext": {"resource": {"attributes": {"test": "pass"}, "dropped_attribute_count": 0}, "scope": {"attributes": {"test": ["pass"]}, "dropped_attribute_count": 0, "name": "", "version": ""}, "log_record": {"attributes": {"log.file.name": "test.log", "test": true}, "body": "test", "dropped_attribute_count": 0, "flags": 0, "observed_time_unix_nano": 1717022289500721000, "severity_number": 0, "severity_text": "", "span_id": "", "time_unix_nano": 0, "trace_id": ""}, "cache": {}}}
See CONTRIBUTING.md.
The transform processor uses the OpenTelemetry Transformation Language (OTTL) which allows users to modify all aspects of their telemetry. Some specific risks are listed below, but this is not an exhaustive list. In general, understand your data before using the transform processor.
- Unsound Transformations: Several Metric-only functions allow you to transform one metric data type to another or create new metrics from existing metrics. Transformations between metric data types are not defined in the metrics data model. These functions have the expectation that you understand the incoming data and know that it can be meaningfully converted to a new metric data type or can meaningfully be used to create new metrics.
  - Although the OTTL allows the set function to be used with metric.data_type, its implementation in the transform processor is NOOP. To modify a data type you must use a function specific to that purpose.
- Identity Conflict: Transformations of metrics have the potential to affect the identity of a metric, leading to an Identity Crisis. Be especially cautious when transforming metric name and when reducing/changing existing attributes. Adding new attributes is safe.
- Orphaned Telemetry: The processor allows you to modify span_id, trace_id, and parent_span_id for traces and span_id and trace_id for logs. Modifying these fields could lead to orphaned spans or logs.
The transform.flatten.logs feature gate enables the flatten_data configuration option (default false). With flatten_data: true, the processor provides each log record with a distinct copy of its resource and scope. Then, after applying all transformations, the log records are regrouped by resource and scope.
This option is useful when applying transformations which alter the resource or scope, e.g. set(resource.attributes["to"], attributes["from"]), which may otherwise result in unexpected behavior. Using this option typically incurs a performance penalty as the processor must compute many hashes and create copies of resource and scope information for every log record.
The feature is currently only available for log processing.
config.yaml:
transform:
  flatten_data: true
  log_statements:
    - context: log
      statements:
        - set(resource.attributes["to"], attributes["from"])
Run collector: ./otelcol --config config.yaml --feature-gates=transform.flatten.logs