Consider adding short keys for JSON encoding #412

Closed
Tracked by #1957
tigrannajaryan opened this issue Jul 4, 2022 · 3 comments
Labels
wontfix This will not be worked on

Comments

@tigrannajaryan
Member

The `json_name` field option can be used to shorten keys in the JSON encoding. This can result in significant size savings for uncompressed JSON, which appears to be important for, e.g., open-telemetry/oteps#208 (comment)
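As a rough illustration of how `json_name` interacts with the marshaller settings discussed in the commit messages below, here is a minimal Python sketch of the proto3 JSON field-naming rule: by default the key is the lowerCamelCase form of the proto field name, `json_name` overrides it, and an original-name mode (such as `OrigName=true`) emits the proto field name unchanged. The helpers `lower_camel` and `json_key` are hypothetical; they model the naming rule, not any protobuf library API.

```python
from typing import Optional


def lower_camel(field_name: str) -> str:
    """Default proto3 JSON name: drop each '_' and upper-case the next character,
    e.g. 'start_time_unix_nano' -> 'startTimeUnixNano'."""
    out = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            out.append(ch.upper())
            capitalize_next = False
        else:
            out.append(ch)
    return "".join(out)


def json_key(field_name: str, json_name: Optional[str] = None,
             orig_name: bool = False) -> str:
    """Key emitted for one field (hypothetical helper modelling the naming rule).

    orig_name=True stands in for marshallers configured to keep proto field
    names, such as OrigName=true; json_name has no effect in that mode.
    """
    if orig_name:
        return field_name
    if json_name is not None:
        return json_name
    return lower_camel(field_name)


print(json_key("resource_spans"))                                 # resourceSpans
print(json_key("resource_spans", json_name="s"))                  # s
print(json_key("resource_spans", json_name="s", orig_name=True))  # resource_spans
```

Under this model, setting a short `json_name` only changes the default (lowerCamelCase) output; the original-name output stays the same.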

@tigrannajaryan
Member Author

Here is a size comparison using one- or two-letter JSON keys. Sizes are in bytes; each ratio is the OTLP size divided by the short-key size, so higher is better.

| Encoding | Uncompressed JSON (bytes) | Ratio | zlib JSON (bytes) | Ratio |
|---|---:|---:|---:|---:|
| OTLP/Logs | 55005 | 1.000 | 4627 | 1.000 |
| ShortKeys/Logs | 41768 | 1.317 | 4386 | 1.055 |
| OTLP/Trace/Attribs | 41738 | 1.000 | 3513 | 1.000 |
| ShortKeys/Trace/Attribs | 31998 | 1.304 | 3356 | 1.047 |
| OTLP/Trace/Events | 49336 | 1.000 | 2572 | 1.000 |
| ShortKeys/Trace/Events | 34396 | 1.434 | 2435 | 1.056 |
| OTLP/Metric/Histogram | 37993 | 1.000 | 1809 | 1.000 |
| ShortKeys/Metric/Histogram | 25171 | 1.509 | 1641 | 1.102 |
| OTLP/Metric/MixOne | 149353 | 1.000 | 4185 | 1.000 |
| ShortKeys/Metric/MixOne | 104231 | 1.433 | 3894 | 1.075 |
| OTLP/Metric/MixSeries | 549540 | 1.000 | 13819 | 1.000 |
| ShortKeys/Metric/MixSeries | 395682 | 1.389 | 13103 | 1.055 |

With short keys, uncompressed JSON is about 1.3-1.5 times smaller than with the regular keys, which use full field names. After zlib compression the gain shrinks to about 1.05-1.10 times.
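The ratios in the table are the OTLP size divided by the short-key size. Below is a minimal sketch of the same kind of measurement using Python's standard `json` and `zlib` modules, applied to a single span taken from the example outputs in this issue. It assumes compact JSON serialization; the benchmark's actual inputs and serializer are not shown here, and the payload is far smaller, so the printed ratios will not match the table.

```python
import json
import zlib

# One span from the example outputs, with default lowerCamelCase keys ...
long_keys = {
    "resourceSpans": [{
        "scopeSpans": [{
            "scope": {"name": "io.opentelemetry"},
            "spans": [{
                "traceId": "AQAAAAAAAADw3rwKeFY0Eg==",
                "spanId": "AQAAAAAAAAA=",
                "name": "load-generator-span",
                "kind": "SPAN_KIND_CLIENT",
                "startTimeUnixNano": "1572516672000000013",
                "endTimeUnixNano": "1572516672000000013",
            }],
        }],
    }],
}

# ... and the same span with the proposed short keys.
short_keys = {
    "s": [{
        "s": [{
            "i": {"n": "io.opentelemetry"},
            "s": [{
                "ti": "AQAAAAAAAADw3rwKeFY0Eg==",
                "si": "AQAAAAAAAAA=",
                "n": "load-generator-span",
                "k": "SPAN_KIND_CLIENT",
                "s": "1572516672000000013",
                "e": "1572516672000000013",
            }],
        }],
    }],
}


def sizes(doc):
    """Return (uncompressed, zlib-compressed) byte sizes of compact JSON."""
    raw = json.dumps(doc, separators=(",", ":")).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


base_raw, base_z = sizes(long_keys)
short_raw, short_z = sizes(short_keys)
# Same convention as the table: baseline size / short-key size.
print(f"uncompressed ratio: {base_raw / short_raw:.3f}")
print(f"zlib ratio:         {base_z / short_z:.3f}")
```

Because compression already removes much of the repetition in long keys, the compressed ratio is expected to be much closer to 1 than the uncompressed one, consistent with the table.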

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jul 5, 2022
Resolves open-telemetry#412

This change sets short one- or two-letter keys for all fields when JSON
encoding is used. This makes the uncompressed payload about 1.3-1.5
times smaller.

Unfortunately this is a breaking change for the default configuration of the
Protobuf/JSON marshaller, which marshals field names in lowerCamelCase.
This is not a breaking change for marshallers which use the "OrigName=true"
JSON marshaling option. Nothing changes in the output when "OrigName=true"
is used.
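The asymmetry follows from the proto3 JSON mapping, under which a parser accepts both the field's JSON name (lowerCamelCase by default, or `json_name` when set) and the original proto field name. A minimal sketch modelling that rule for one field, assuming `json_name` is set to `"s"`; `lower_camel` and `accepted_keys` are hypothetical helpers, not a protobuf API:

```python
def lower_camel(field_name):
    """Simplified default JSON name: 'resource_spans' -> 'resourceSpans'."""
    first, *rest = field_name.split("_")
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


def accepted_keys(field_name, json_name=None):
    """Keys a proto3 JSON parser accepts for one field (hypothetical helper):
    the JSON name (json_name if set, else lowerCamelCase) plus the proto field name."""
    jname = json_name if json_name is not None else lower_camel(field_name)
    return {jname, field_name}


before = accepted_keys("resource_spans")                # no json_name
after = accepted_keys("resource_spans", json_name="s")  # short key set

# Payloads from OrigName=true senders use "resource_spans": accepted either way.
print("resource_spans" in before, "resource_spans" in after)  # True True
# Payloads from default senders use "resourceSpans": no longer recognized.
print("resourceSpans" in before, "resourceSpans" in after)    # True False
```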

I do not see an easy way to make this change gracefully. It would require
duplicating the entire proto, giving the duplicate messages different names,
then handling the duplicates in the receivers. It is quite a lot of work
that can also be error-prone. I think in this particular case we should
not attempt to handle it gracefully and simply stick to our formal
stability guarantees, which for JSON are "Alpha" and allow any changes
at any time.

### Example Outputs

Current JSON output, using default lowerCamelCase marshaller:

```json
{
    "resourceSpans": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "StartTimeUnixnano",
                        "value": {
                            "intValue": "12345678"
                        }
                    },
                    {
                        "key": "Pid",
                        "value": {
                            "intValue": "1234"
                        }
                    },
                    {
                        "key": "HostName",
                        "value": {
                            "stringValue": "fakehost"
                        }
                    },
                    {
                        "key": "ServiceName",
                        "value": {
                            "stringValue": "generator"
                        }
                    }
                ]
            },
            "scopeSpans": [
                {
                    "scope": {
                        "name": "io.opentelemetry"
                    },
                    "spans": [
                        {
                            "traceId": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "spanId": "AQAAAAAAAAA=",
                            "name": "load-generator-span",
                            "kind": "SPAN_KIND_CLIENT",
                            "startTimeUnixNano": "1572516672000000013",
                            "endTimeUnixNano": "1572516672000000013",
                            "attributes": [
                                {
                                    "key": "db.mongodb.collection",
                                    "value": {
                                        "stringValue": "!##$"
                                    }
                                }
                            ],
                            "events": [
                                {
                                    "timeUnixNano": "1572516672000000013",
                                    "attributes": [
                                        {
                                            "key": "te",
                                            "value": {
                                                "intValue": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```

Current JSON output, using OrigName=true marshaller.

```json
{
    "resource_spans": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "StartTimeUnixnano",
                        "value": {
                            "int_value": "12345678"
                        }
                    },
                    {
                        "key": "Pid",
                        "value": {
                            "int_value": "1234"
                        }
                    },
                    {
                        "key": "HostName",
                        "value": {
                            "string_value": "fakehost"
                        }
                    },
                    {
                        "key": "ServiceName",
                        "value": {
                            "string_value": "generator"
                        }
                    }
                ]
            },
            "scope_spans": [
                {
                    "scope": {
                        "name": "io.opentelemetry"
                    },
                    "spans": [
                        {
                            "trace_id": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "span_id": "AQAAAAAAAAA=",
                            "name": "load-generator-span",
                            "kind": "SPAN_KIND_CLIENT",
                            "start_time_unix_nano": "1572516672000000013",
                            "end_time_unix_nano": "1572516672000000013",
                            "attributes": [
                                {
                                    "key": "db.mongodb.collection",
                                    "value": {
                                        "string_value": "!##$"
                                    }
                                }
                            ],
                            "events": [
                                {
                                    "time_unix_nano": "1572516672000000013",
                                    "attributes": [
                                        {
                                            "key": "te",
                                            "value": {
                                                "int_value": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```

New format, using short keys proposed in this change:

```json
{
    "s": [
        {
            "r": {
                "a": [
                    {
                        "k": "StartTimeUnixnano",
                        "v": {
                            "i": "12345678"
                        }
                    },
                    {
                        "k": "Pid",
                        "v": {
                            "i": "1234"
                        }
                    },
                    {
                        "k": "HostName",
                        "v": {
                            "s": "fakehost"
                        }
                    },
                    {
                        "k": "ServiceName",
                        "v": {
                            "s": "generator"
                        }
                    }
                ]
            },
            "s": [
                {
                    "i": {
                        "n": "io.opentelemetry"
                    },
                    "s": [
                        {
                            "ti": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "si": "AQAAAAAAAAA=",
                            "n": "load-generator-span",
                            "k": "SPAN_KIND_CLIENT",
                            "s": "1572516672000000013",
                            "e": "1572516672000000013",
                            "a": [
                                {
                                    "k": "db.mongodb.collection",
                                    "v": {
                                        "s": "!##$"
                                    }
                                }
                            ],
                            "ev": [
                                {
                                    "t": "1572516672000000013",
                                    "a": [
                                        {
                                            "k": "te",
                                            "v": {
                                                "i": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```
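In the short-key format the same letter is reused at different nesting levels (`s` stands for `resourceSpans`, `scopeSpans`, `spans`, `stringValue` and `startTimeUnixNano`). This works because a JSON name only has to be unique within one message, but it means a reader needs to know the message type to interpret a key. The minimal sketch below expands short keys back to lowerCamelCase names by walking a per-message mapping. The mapping covers only the fields visible in the example outputs; the `Root` label for the top-level message and the `expand` helper are illustrative assumptions, not part of the proposed change.

```python
# Hypothetical per-message mapping: short key -> (long key, message type of the
# value, or None for scalars). Built only from fields shown in the examples.
SCHEMA = {
    "Root": {"s": ("resourceSpans", "ResourceSpans")},  # top-level message
    "ResourceSpans": {"r": ("resource", "Resource"), "s": ("scopeSpans", "ScopeSpans")},
    "Resource": {"a": ("attributes", "KeyValue")},
    "ScopeSpans": {"i": ("scope", "InstrumentationScope"), "s": ("spans", "Span")},
    "InstrumentationScope": {"n": ("name", None)},
    "Span": {"ti": ("traceId", None), "si": ("spanId", None), "n": ("name", None),
             "k": ("kind", None), "s": ("startTimeUnixNano", None),
             "e": ("endTimeUnixNano", None), "a": ("attributes", "KeyValue"),
             "ev": ("events", "Event")},
    "Event": {"t": ("timeUnixNano", None), "a": ("attributes", "KeyValue")},
    "KeyValue": {"k": ("key", None), "v": ("value", "AnyValue")},
    "AnyValue": {"i": ("intValue", None), "s": ("stringValue", None)},
}


def expand(value, msg_type):
    """Recursively rename short keys to long keys, using msg_type as context."""
    if msg_type is None:
        return value  # scalar field: leave the value as-is
    if isinstance(value, list):
        return [expand(item, msg_type) for item in value]
    fields = SCHEMA[msg_type]
    out = {}
    for short, v in value.items():
        long_name, child_type = fields[short]  # unknown keys raise KeyError
        out[long_name] = expand(v, child_type)
    return out


doc = {"s": [{"r": {"a": [{"k": "HostName", "v": {"s": "fakehost"}}]},
              "s": [{"i": {"n": "io.opentelemetry"},
                     "s": [{"n": "load-generator-span",
                            "s": "1572516672000000013"}]}]}]}
print(expand(doc, "Root"))
```

The same key `s` maps to four different fields in this one document, and each is resolved correctly only because the walk tracks which message it is inside.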
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jul 5, 2022
Resolves open-telemetry#412

This change sets short one- or two-letter keys for all fields when JSON
encoding is used. This makes the uncompressed payload about 1.3-1.5
times smaller.

Unfortunately this is a breaking change for the default configuration of the
Protobuf/JSON marshaler, which marshals field names in lowerCamelCase.
This is not a breaking change for marshalers which use the "OrigName=true"
JSON marshaling option. Nothing changes in the output when "OrigName=true"
is used.

I do not see an easy way to make this change gracefully. It would require
duplicating the entire proto, giving the duplicate messages different names,
then handling the duplicates in the receivers. It is quite a lot of work
that can also be error-prone. I think in this particular case we should
not attempt to handle it gracefully and simply stick to our formal
stability guarantees, which for JSON are "Alpha" and allow any changes
at any time.

### Example Outputs

Current JSON output before this change, using default lowerCamelCase marshaler:

```json
{
    "resourceSpans": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "StartTimeUnixnano",
                        "value": {
                            "intValue": "12345678"
                        }
                    },
                    {
                        "key": "Pid",
                        "value": {
                            "intValue": "1234"
                        }
                    },
                    {
                        "key": "HostName",
                        "value": {
                            "stringValue": "fakehost"
                        }
                    },
                    {
                        "key": "ServiceName",
                        "value": {
                            "stringValue": "generator"
                        }
                    }
                ]
            },
            "scopeSpans": [
                {
                    "scope": {
                        "name": "io.opentelemetry"
                    },
                    "spans": [
                        {
                            "traceId": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "spanId": "AQAAAAAAAAA=",
                            "name": "load-generator-span",
                            "kind": "SPAN_KIND_CLIENT",
                            "startTimeUnixNano": "1572516672000000013",
                            "endTimeUnixNano": "1572516672000000013",
                            "attributes": [
                                {
                                    "key": "db.mongodb.collection",
                                    "value": {
                                        "stringValue": "!##$"
                                    }
                                }
                            ],
                            "events": [
                                {
                                    "timeUnixNano": "1572516672000000013",
                                    "attributes": [
                                        {
                                            "key": "te",
                                            "value": {
                                                "intValue": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```

JSON output before and after this change, using OrigName=true marshaler.

```json
{
    "resource_spans": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "StartTimeUnixnano",
                        "value": {
                            "int_value": "12345678"
                        }
                    },
                    {
                        "key": "Pid",
                        "value": {
                            "int_value": "1234"
                        }
                    },
                    {
                        "key": "HostName",
                        "value": {
                            "string_value": "fakehost"
                        }
                    },
                    {
                        "key": "ServiceName",
                        "value": {
                            "string_value": "generator"
                        }
                    }
                ]
            },
            "scope_spans": [
                {
                    "scope": {
                        "name": "io.opentelemetry"
                    },
                    "spans": [
                        {
                            "trace_id": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "span_id": "AQAAAAAAAAA=",
                            "name": "load-generator-span",
                            "kind": "SPAN_KIND_CLIENT",
                            "start_time_unix_nano": "1572516672000000013",
                            "end_time_unix_nano": "1572516672000000013",
                            "attributes": [
                                {
                                    "key": "db.mongodb.collection",
                                    "value": {
                                        "string_value": "!##$"
                                    }
                                }
                            ],
                            "events": [
                                {
                                    "time_unix_nano": "1572516672000000013",
                                    "attributes": [
                                        {
                                            "key": "te",
                                            "value": {
                                                "int_value": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```

JSON output after this change, using proposed short keys and default
lowerCamelCase marshaler:

```json
{
    "s": [
        {
            "r": {
                "a": [
                    {
                        "k": "StartTimeUnixnano",
                        "v": {
                            "i": "12345678"
                        }
                    },
                    {
                        "k": "Pid",
                        "v": {
                            "i": "1234"
                        }
                    },
                    {
                        "k": "HostName",
                        "v": {
                            "s": "fakehost"
                        }
                    },
                    {
                        "k": "ServiceName",
                        "v": {
                            "s": "generator"
                        }
                    }
                ]
            },
            "s": [
                {
                    "i": {
                        "n": "io.opentelemetry"
                    },
                    "s": [
                        {
                            "ti": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "si": "AQAAAAAAAAA=",
                            "n": "load-generator-span",
                            "k": "SPAN_KIND_CLIENT",
                            "s": "1572516672000000013",
                            "e": "1572516672000000013",
                            "a": [
                                {
                                    "k": "db.mongodb.collection",
                                    "v": {
                                        "s": "!##$"
                                    }
                                }
                            ],
                            "ev": [
                                {
                                    "t": "1572516672000000013",
                                    "a": [
                                        {
                                            "k": "te",
                                            "v": {
                                                "i": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jul 5, 2022
Resolves open-telemetry#412

This change sets short one- or two-letter keys for all fields when JSON
encoding is used. This makes the uncompressed payload about 1.3-1.5
times smaller.

Unfortunately this is a breaking change for the default configuration of the
Protobuf/JSON marshaller, which marshals field names in lowerCamelCase.
This is not a breaking change for marshallers which use the "OrigName=true"
JSON marshaling option. Nothing changes in the output when "OrigName=true"
is used.

I do not see an easy way to make this change gracefully. It would require
duplicating the entire proto, giving the duplicate messages different names,
then handling the duplicates in the receivers. It is quite a lot of work
that can also be error-prone. I think in this particular case we should
not attempt to handle it gracefully and simply stick to our formal
stability guarantees, which for JSON are "Alpha" and allow any changes
at any time.

### Example Outputs

Current JSON output before this change, using default lowerCamelCase marshaller:

```json
{
    "resourceSpans": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "StartTimeUnixnano",
                        "value": {
                            "intValue": "12345678"
                        }
                    },
                    {
                        "key": "Pid",
                        "value": {
                            "intValue": "1234"
                        }
                    },
                    {
                        "key": "HostName",
                        "value": {
                            "stringValue": "fakehost"
                        }
                    },
                    {
                        "key": "ServiceName",
                        "value": {
                            "stringValue": "generator"
                        }
                    }
                ]
            },
            "scopeSpans": [
                {
                    "scope": {
                        "name": "io.opentelemetry"
                    },
                    "spans": [
                        {
                            "traceId": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "spanId": "AQAAAAAAAAA=",
                            "name": "load-generator-span",
                            "kind": "SPAN_KIND_CLIENT",
                            "startTimeUnixNano": "1572516672000000013",
                            "endTimeUnixNano": "1572516672000000013",
                            "attributes": [
                                {
                                    "key": "db.mongodb.collection",
                                    "value": {
                                        "stringValue": "!##$"
                                    }
                                }
                            ],
                            "events": [
                                {
                                    "timeUnixNano": "1572516672000000013",
                                    "attributes": [
                                        {
                                            "key": "te",
                                            "value": {
                                                "intValue": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```

JSON output before and after this change, using the OrigName=true marshaler:

```json
{
    "resource_spans": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "StartTimeUnixnano",
                        "value": {
                            "int_value": "12345678"
                        }
                    },
                    {
                        "key": "Pid",
                        "value": {
                            "int_value": "1234"
                        }
                    },
                    {
                        "key": "HostName",
                        "value": {
                            "string_value": "fakehost"
                        }
                    },
                    {
                        "key": "ServiceName",
                        "value": {
                            "string_value": "generator"
                        }
                    }
                ]
            },
            "scope_spans": [
                {
                    "scope": {
                        "name": "io.opentelemetry"
                    },
                    "spans": [
                        {
                            "trace_id": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "span_id": "AQAAAAAAAAA=",
                            "name": "load-generator-span",
                            "kind": "SPAN_KIND_CLIENT",
                            "start_time_unix_nano": "1572516672000000013",
                            "end_time_unix_nano": "1572516672000000013",
                            "attributes": [
                                {
                                    "key": "db.mongodb.collection",
                                    "value": {
                                        "string_value": "!##$"
                                    }
                                }
                            ],
                            "events": [
                                {
                                    "time_unix_nano": "1572516672000000013",
                                    "attributes": [
                                        {
                                            "key": "te",
                                            "value": {
                                                "int_value": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```

JSON output after this change, using the proposed short keys and the default
lowerCamelCase marshaler:

```json
{
    "s": [
        {
            "r": {
                "a": [
                    {
                        "k": "StartTimeUnixnano",
                        "v": {
                            "i": "12345678"
                        }
                    },
                    {
                        "k": "Pid",
                        "v": {
                            "i": "1234"
                        }
                    },
                    {
                        "k": "HostName",
                        "v": {
                            "s": "fakehost"
                        }
                    },
                    {
                        "k": "ServiceName",
                        "v": {
                            "s": "generator"
                        }
                    }
                ]
            },
            "s": [
                {
                    "i": {
                        "n": "io.opentelemetry"
                    },
                    "s": [
                        {
                            "ti": "AQAAAAAAAADw3rwKeFY0Eg==",
                            "si": "AQAAAAAAAAA=",
                            "n": "load-generator-span",
                            "k": "SPAN_KIND_CLIENT",
                            "s": "1572516672000000013",
                            "e": "1572516672000000013",
                            "a": [
                                {
                                    "k": "db.mongodb.collection",
                                    "v": {
                                        "s": "!##$"
                                    }
                                }
                            ],
                            "ev": [
                                {
                                    "t": "1572516672000000013",
                                    "a": [
                                        {
                                            "k": "te",
                                            "v": {
                                                "i": "1"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
```
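The short-key output above is a pure renaming of the lowerCamelCase output: the structure and all values are identical, and only object keys change. A minimal Python sketch of that renaming follows, assuming a flat lookup table reconstructed from the three examples above. `LONG_TO_SHORT`, `shorten` and `compact_size` are illustrative names, not part of the PR, which (per the issue) would set the keys through the `json_name` option in the .proto files rather than by post-processing JSON.

```python
import json
from typing import Any

# lowerCamelCase key -> short key, reconstructed from the example outputs above.
# Illustrative only: this is not the complete set of json_name values in the PR.
LONG_TO_SHORT = {
    "resourceSpans": "s", "resource": "r", "scopeSpans": "s", "scope": "i",
    "spans": "s", "traceId": "ti", "spanId": "si", "name": "n", "kind": "k",
    "startTimeUnixNano": "s", "endTimeUnixNano": "e", "attributes": "a",
    "events": "ev", "timeUnixNano": "t", "key": "k", "value": "v",
    "intValue": "i", "stringValue": "s",
}


def shorten(node: Any) -> Any:
    """Recursively rename object keys. Values, including attribute keys such as
    "Pid", are left untouched."""
    if isinstance(node, list):
        return [shorten(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        short = LONG_TO_SHORT.get(key, key)
        if short in out:  # two fields of one message must not share a short key
            raise ValueError(f"short key collision on {short!r}")
        out[short] = shorten(value)
    return out


def compact_size(doc: Any) -> int:
    """Size in bytes of the document serialized without whitespace."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


event = {"timeUnixNano": "1572516672000000013",
         "attributes": [{"key": "te", "value": {"intValue": "1"}}]}
print(shorten(event))  # {'t': '1572516672000000013', 'a': [{'k': 'te', 'v': {'i': '1'}}]}
print(compact_size(event), "->", compact_size(shorten(event)))
```

A flat table is enough for encoding this sample because each long name maps to exactly one short key. Decoding is harder: "s" stands for resourceSpans, scopeSpans, spans, startTimeUnixNano or stringValue depending on the enclosing message, so a reader has to know the message type at every level.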
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jul 5, 2022
Resolves open-telemetry#412

This change sets short one- or two-letter keys for all fields when JSON
encoding is used. This makes the uncompressed payload about 1.3-1.5 times
smaller.

Here is a size comparison using sample data from the exp-otelproto test bench (sizes in bytes):

```
===== Encoded sizes
Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Logs                 52577 by   [1.000]      4601 by [1.000]
ShortKeys/Logs                 39668 by   [1.325]      4344 by [1.059]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Trace/Attribs        41704 by   [1.000]      3189 by [1.000]
ShortKeys/Trace/Attribs        31998 by   [1.303]      3353 by [0.951]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Trace/Events         49302 by   [1.000]      1917 by [1.000]
ShortKeys/Trace/Events         34396 by   [1.433]      2430 by [0.789]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Metric/Histogram     42376 by   [1.000]      1067 by [1.000]
ShortKeys/Metric/Histogram     27071 by   [1.565]       839 by [1.272]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Metric/MixOne       184836 by   [1.000]      2778 by [1.000]
ShortKeys/Metric/MixOne       119031 by   [1.553]      2143 by [1.296]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Metric/MixSeries    707615 by   [1.000]     11482 by [1.000]
ShortKeys/Metric/MixSeries    457010 by   [1.548]      9829 by [1.168]
```
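The [Improved] columns appear to be the OTLP 0.18 size divided by the ShortKeys size. The sketch below recomputes them from the byte counts in the table above; `improvement` and `ROWS` are illustrative names, and the formula is inferred from the numbers rather than taken from the exp-otelproto source.

```python
def improvement(baseline: int, candidate: int) -> float:
    """[Improved] column as it appears to be defined: baseline bytes / candidate bytes.
    Above 1.0 the candidate is smaller; below 1.0 it is larger."""
    return round(baseline / candidate, 3)


# Byte counts from the table above:
# (OTLP 0.18 uncompressed, ShortKeys uncompressed, OTLP 0.18 zlib, ShortKeys zlib)
ROWS = {
    "Logs": (52577, 39668, 4601, 4344),
    "Trace/Attribs": (41704, 31998, 3189, 3353),
    "Trace/Events": (49302, 34396, 1917, 2430),
    "Metric/Histogram": (42376, 27071, 1067, 839),
    "Metric/MixOne": (184836, 119031, 2778, 2143),
    "Metric/MixSeries": (707615, 457010, 11482, 9829),
}

for name, (raw_base, raw_short, z_base, z_short) in ROWS.items():
    print(f"{name:<17} uncompressed {improvement(raw_base, raw_short):.3f}  "
          f"zlib {improvement(z_base, z_short):.3f}")
```

This reproduces every ratio in the table, including the two zlib values below 1.0 (Trace/Attribs and Trace/Events), where the short-key payload compressed worse than the original in this run.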

Unfortunately, this is a breaking change for the default configuration of the
Protobuf/JSON marshaler, which marshals field names in lowerCamelCase.
It is not a breaking change for marshalers that use the "OrigName=true"
JSON marshaling option: when "OrigName=true" is used, the output does not
change.
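Why only the default marshaler breaks can be modeled in a few lines. This is a hypothetical simulation, not a protobuf library API: `FieldDef`, `json_name_option` and `json_key` are invented names, and the `orig_name` flag plays the role of the "OrigName=true" option described above.

```python
from dataclasses import dataclass
from typing import Optional


def default_json_name(proto_name: str) -> str:
    """Default protobuf JSON name: drop each underscore and upper-case the next letter."""
    parts = proto_name.split("_")
    return parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for a .proto field declaration."""
    proto_name: str
    json_name_option: Optional[str] = None  # models [json_name = "..."]

    def json_key(self, orig_name: bool) -> str:
        if orig_name:
            # OrigName=true: the .proto field name is used and json_name is ignored.
            return self.proto_name
        return self.json_name_option or default_json_name(self.proto_name)


before = FieldDef("start_time_unix_nano")
after = FieldDef("start_time_unix_nano", json_name_option="s")
for orig_name in (False, True):
    print(orig_name, before.json_key(orig_name), after.json_key(orig_name))
# False startTimeUnixNano s
# True start_time_unix_nano start_time_unix_nano
```

Adding the option changes the key only on the default path, which is exactly the compatibility split the paragraph describes.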

I do not see an easy way to make this change gracefully. It would require
duplicating the entire proto, giving the duplicate messages different names,
and then handling the duplicates in the receivers. That is quite a lot of work,
and it can also be error-prone. In this particular case, I think we should
not attempt to handle it gracefully and should simply stick to our formal
stability guarantees, which for JSON are "Alpha" and allow any change at
any time.
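For a sense of what "handling the duplicates in the receivers" involves on the decoding side, here is a hypothetical sketch of a receiver that accepts either encoding and normalizes short keys back to lowerCamelCase. The per-message tables are reconstructed from the short-key example output shown earlier and cover only the fields used there; `SHORT_TO_LONG`, `expand` and `normalize` are illustrative names, not something the PR proposes (the PR explicitly chooses not to handle this gracefully).

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical per-message tables reconstructed from the short-key example output:
# short key -> (lowerCamelCase key, message type of the value, or None for scalars).
SHORT_TO_LONG: Dict[str, Dict[str, Tuple[str, Optional[str]]]] = {
    "TracesData": {"s": ("resourceSpans", "ResourceSpans")},
    "ResourceSpans": {"r": ("resource", "Resource"), "s": ("scopeSpans", "ScopeSpans")},
    "Resource": {"a": ("attributes", "KeyValue")},
    "ScopeSpans": {"i": ("scope", "InstrumentationScope"), "s": ("spans", "Span")},
    "InstrumentationScope": {"n": ("name", None)},
    "Span": {"ti": ("traceId", None), "si": ("spanId", None), "n": ("name", None),
             "k": ("kind", None), "s": ("startTimeUnixNano", None),
             "e": ("endTimeUnixNano", None), "a": ("attributes", "KeyValue"),
             "ev": ("events", "Event")},
    "Event": {"t": ("timeUnixNano", None), "a": ("attributes", "KeyValue")},
    "KeyValue": {"k": ("key", None), "v": ("value", "AnyValue")},
    "AnyValue": {"s": ("stringValue", None), "i": ("intValue", None)},
}


def expand(node: Any, msg_type: str) -> Any:
    """Rewrite short keys to lowerCamelCase keys. The message type is required
    because one short key (e.g. "s") means different fields in different messages.
    Unknown short keys raise KeyError."""
    if isinstance(node, list):
        return [expand(item, msg_type) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        long_key, child_type = SHORT_TO_LONG[msg_type][key]
        out[long_key] = expand(value, child_type) if child_type else value
    return out


def normalize(payload: str) -> Dict[str, Any]:
    """Accept either encoding and always return the lowerCamelCase form."""
    doc = json.loads(payload)
    if "resourceSpans" in doc:  # already long-key JSON
        return doc
    return expand(doc, "TracesData")
```

Even this small sketch needs a schema walk for every message type, which illustrates why supporting both encodings is more work than a simple key rename.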

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jul 5, 2022
Resolves open-telemetry#412

This change sets short one- or two-letter keys for all fields when JSON
encoding is used. This makes the uncompressed payload about 1.3-1.5 times
smaller.

Here is a size comparison using sample data from the exp-otelproto test bench (sizes in bytes):

```
===== Encoded sizes
Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Logs                 52577 by   [1.000]      4601 by [1.000]
ShortKeys/Logs                 39668 by   [1.325]      4344 by [1.059]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Trace/Attribs        41704 by   [1.000]      3189 by [1.000]
ShortKeys/Trace/Attribs        31998 by   [1.303]      3060 by [1.042]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Trace/Events         49302 by   [1.000]      1917 by [1.000]
ShortKeys/Trace/Events         34396 by   [1.433]      1806 by [1.061]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Metric/Histogram     42376 by   [1.000]      1067 by [1.000]
ShortKeys/Metric/Histogram     27071 by   [1.565]       839 by [1.272]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Metric/MixOne       184836 by   [1.000]      2778 by [1.000]
ShortKeys/Metric/MixOne       119031 by   [1.553]      2143 by [1.296]

Encoding                       Uncomp/json[Improved]   zlib/json[Improved]
OTLP 0.18/Metric/MixSeries    707615 by   [1.000]     11482 by [1.000]
ShortKeys/Metric/MixSeries    457010 by   [1.548]      9829 by [1.168]
```

Unfortunately, **this is a breaking change** for the default configuration of the
Protobuf/JSON marshaler, which marshals field names in lowerCamelCase.
It is not a breaking change for marshalers that use the "OrigName=true"
JSON marshaling option: when "OrigName=true" is used, the output does not
change.

I do not see an easy way to make this change gracefully. It would require
duplicating the entire proto, giving the duplicate messages different names,
and then handling the duplicates in the receivers. That is quite a lot of work,
and it can also be error-prone. In this particular case, I think we should
not attempt to handle it gracefully and should simply stick to our formal
stability guarantees, which for JSON are "Alpha" and allow any change at
any time.

@Oberon00
Member

Oberon00 commented Jul 6, 2022

We could even use the numeric field tags from the binary protocol, stringified (as decimal, hex, or base64), as JSON keys. Then we would also get keys that are more stable and resistant to renaming.
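A minimal sketch of that idea, using decimal tags and a few messages from the OrigName=true example shown earlier. The field numbers below are assumed to match opentelemetry-proto's trace.proto and common.proto and should be checked against the .proto files; `FIELDS` and `to_tag_keys` are hypothetical names for illustration.

```python
import json
from typing import Any, Dict, Optional, Tuple

# proto field name -> (field number, message type of the value, or None for scalars).
# Assumed to match opentelemetry-proto's trace.proto and common.proto; verify before use.
FIELDS: Dict[str, Dict[str, Tuple[int, Optional[str]]]] = {
    "Span": {"trace_id": (1, None), "span_id": (2, None), "name": (5, None),
             "kind": (6, None), "start_time_unix_nano": (7, None),
             "end_time_unix_nano": (8, None), "attributes": (9, "KeyValue"),
             "events": (11, "Event")},
    "Event": {"time_unix_nano": (1, None), "attributes": (3, "KeyValue")},
    "KeyValue": {"key": (1, None), "value": (2, "AnyValue")},
    "AnyValue": {"string_value": (1, None), "int_value": (3, None)},
}


def to_tag_keys(node: Any, msg_type: str) -> Any:
    """Replace each field name with its field number rendered as a decimal string."""
    if isinstance(node, list):
        return [to_tag_keys(item, msg_type) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for name, value in node.items():
        number, child_type = FIELDS[msg_type][name]
        out[str(number)] = to_tag_keys(value, child_type) if child_type else value
    return out


span = {"name": "load-generator-span", "kind": "SPAN_KIND_CLIENT",
        "attributes": [{"key": "db.mongodb.collection",
                        "value": {"string_value": "!##$"}}]}
print(json.dumps(to_tag_keys(span, "Span")))
# {"5": "load-generator-span", "6": "SPAN_KIND_CLIENT", "9": [{"1": "db.mongodb.collection", "2": {"1": "!##$"}}]}
```

Renaming a field in the .proto keeps its number, so these keys survive renames, which is the stability benefit mentioned above. Like short keys, they still need the message type to decode.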

@tigrannajaryan
Member Author

To summarize the findings in the PR:

  • There is an option to do gzip compression using the Compression Streams API, which is available only in Chromium.
  • There is an option to use binary Protobuf: https://github.com/protobufjs/protobuf.js/ is 6.5KB and claims to be faster than native JSON. Binary Protobuf is also more compact than JSON with short keys (about 1.6-2.2 times smaller in my benchmarks).
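To see why binary Protobuf beats even short keys, here is one resource attribute from the examples (`Pid` = 1234) encoded by hand in the Protobuf wire format, next to its two JSON forms. Field names become one-byte tags and the integer becomes a varint instead of a quoted string. The field numbers (KeyValue key=1, value=2; AnyValue int_value=3) are assumed from common.proto, and the hand-rolled encoder is only an illustration; a real client would use protobuf.js or generated code.

```python
import json


def varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, high bit set on all but the last.
    Only non-negative n is handled here (negative int64 needs 10 bytes)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


WIRE_VARINT, WIRE_LEN = 0, 2  # wire types


def tag(field_number: int, wire_type: int) -> bytes:
    """Field key on the wire: (field_number << 3) | wire_type, varint-encoded."""
    return varint((field_number << 3) | wire_type)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    return tag(field_number, WIRE_LEN) + varint(len(payload)) + payload


def encode_int_attribute(key: str, value: int) -> bytes:
    """KeyValue{key (1): string, value (2): AnyValue{int_value (3): int64}}."""
    any_value = tag(3, WIRE_VARINT) + varint(value)
    return length_delimited(1, key.encode("utf-8")) + length_delimited(2, any_value)


binary = encode_int_attribute("Pid", 1234)
short_json = json.dumps({"k": "Pid", "v": {"i": "1234"}}, separators=(",", ":"))
long_json = json.dumps({"key": "Pid", "value": {"intValue": "1234"}}, separators=(",", ":"))
print(binary.hex())  # 0a03506964120318d209
print(len(binary), len(short_json), len(long_json))  # 10 28 41
```

A single tiny message is not representative of the 1.6-2.2 benchmark figures, but it shows where the savings come from.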

We also discussed this in the Spec meeting yesterday, and no one on the call believed that we need JSON with short keys, given that the other options are available. JSON with short keys loses its readability while gaining less in performance than the other options we have.

Given the above, I am marking this issue as "won't do". If there are new arguments, the issue can be reopened (but that must happen before we declare the JSON format Stable).

@tigrannajaryan tigrannajaryan added the wontfix This will not be worked on label Jul 20, 2022