fix(eap): fix bug with data_present #6726

Merged: 3 commits, Jan 8, 2025


@davidtsuk davidtsuk commented Jan 8, 2025

Fixes https://github.com/getsentry/eap-planning/issues/144

Additional Context

When we aggregate over attributes, the function is converted into its function_nameIf form (e.g. countIf), which returns a default value when no rows match the condition. As a result, an aggregate of 0 is ambiguous: it could be a sum of values that add up to 0, or it could mean that no values matched the condition. To resolve this, we now compute the number of events being aggregated even when we aren't extrapolating, so we can tell whether data was present.
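
The ambiguity described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `AggregateResult` and `sum_if` are hypothetical names, and defining `data_present` as `sample_count > 0` is an assumption based on the description.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AggregateResult:
    """Hypothetical container for an aggregate and its matched-row count."""

    value: float
    sample_count: int  # number of rows that matched the condition

    @property
    def data_present(self) -> bool:
        # Assumption: data is present if at least one row was aggregated.
        return self.sample_count > 0


def sum_if(values: Iterable[Optional[float]]) -> AggregateResult:
    """Mimic a sumIf-style aggregate; rows holding None do not match."""
    matched = [v for v in values if v is not None]
    # sum([]) is 0, which mirrors the default value returned on no match.
    return AggregateResult(value=sum(matched), sample_count=len(matched))


zero_sum = sum_if([2.0, -2.0])   # real data that happens to add up to 0
no_match = sum_if([None, None])  # no row carries the attribute
```

Both results have the same `value`, so only the count separates the case where `data_present` should be true from the case where it should be false.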

@davidtsuk davidtsuk requested review from a team as code owners January 8, 2025 17:38

codecov bot commented Jan 8, 2025

❌ 1 Tests Failed:

Tests completed: 2741 | Failed: 1 | Passed: 2740 | Skipped: 6
View the top 1 failed tests by shortest run time
tests.web.rpc.v1.test_endpoint_time_series.test_endpoint_time_series.TestTimeSeriesApi::test_with_no_data_present
Stack Traces | 0.277s run time
Traceback (most recent call last):
  File ".../v1/test_endpoint_time_series/test_endpoint_time_series.py", line 417, in test_with_no_data_present
    assert sorted(response.result_timeseries, key=lambda x: x.label) == [
AssertionError: assert [label: "avg"\nbuckets {\n  seconds: 1736236800\n}\nbuckets {\n  seconds: 1736237100\n}\nbuckets {\n  seconds: 1736237400\n}\nbuckets {\n  seconds: 1736237700\n}\nbuckets {\n  seconds: 1736238000\n}\nbuckets {\n  seconds: 1736238300\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\n,\n label: "sum"\nbuckets {\n  seconds: 1736236800\n}\nbuckets {\n  seconds: 1736237100\n}\nbuckets {\n  seconds: 1736237400\n}\nbuckets {\n  seconds: 1736237700\n}\nbuckets {\n  seconds: 1736238000\n}\nbuckets {\n  seconds: 1736238300\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\n] == [label: "avg"\nbuckets {\n  seconds: 1736236800\n}\nbuckets {\n  seconds: 1736237100\n}\nbuckets {\n  seconds: 1736237400\n}\nbuckets {\n  seconds: 1736237700\n}\nbuckets {\n  seconds: 1736238000\n}\nbuckets {\n  seconds: 1736238300\n}\ndata_points {\n  data: 1\n  data_present: true\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\n,\n label: "sum"\nbuckets {\n  seconds: 1736236800\n}\nbuckets {\n  seconds: 1736237100\n}\nbuckets {\n  seconds: 1736237400\n}\nbuckets {\n  seconds: 1736237700\n}\nbuckets {\n  seconds: 1736238000\n}\nbuckets {\n  seconds: 1736238300\n}\ndata_points {\n  data: 1\n  data_present: true\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\n]
  At index 0 diff: label: "avg"\nbuckets {\n  seconds: 1736236800\n}\nbuckets {\n  seconds: 1736237100\n}\nbuckets {\n  seconds: 1736237400\n}\nbuckets {\n  seconds: 1736237700\n}\nbuckets {\n  seconds: 1736238000\n}\nbuckets {\n  seconds: 1736238300\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\n != label: "avg"\nbuckets {\n  seconds: 1736236800\n}\nbuckets {\n  seconds: 1736237100\n}\nbuckets {\n  seconds: 1736237400\n}\nbuckets {\n  seconds: 1736237700\n}\nbuckets {\n  seconds: 1736238000\n}\nbuckets {\n  seconds: 1736238300\n}\ndata_points {\n  data: 1\n  data_present: true\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\ndata_points {\n}\n
  Full diff:
    [
     label: "avg"
    buckets {
      seconds: 1736236800
    }
    buckets {
      seconds: 1736237100
    }
    buckets {
      seconds: 1736237400
    }
    buckets {
      seconds: 1736237700
    }
    buckets {
      seconds: 1736238000
    }
    buckets {
      seconds: 1736238300
    }
    data_points {
  -   data: 1
  -   data_present: true
    }
    data_points {
    }
    data_points {
    }
    data_points {
    }
    data_points {
    }
    data_points {
    }
    ,
     label: "sum"
    buckets {
      seconds: 1736236800
    }
    buckets {
      seconds: 1736237100
    }
    buckets {
      seconds: 1736237400
    }
    buckets {
      seconds: 1736237700
    }
    buckets {
      seconds: 1736238000
    }
    buckets {
      seconds: 1736238300
    }
    data_points {
  -   data: 1
  -   data_present: true
    }
    data_points {
    }
    data_points {
    }
    data_points {
    }
    data_points {
    }
    data_points {
    }
    ,
    ]


Member

@onkar onkar left a comment

LGTM

@@ -46,7 +46,7 @@ class ExtrapolationContext(ABC):
     sample_count: int

     @property
-    def extrapolated_data_present(self) -> bool:
+    def data_present(self) -> bool:
Member

nit: should this be renamed to is_data_present, since it returns the answer to that question?

Comment on lines 112 to 116
 return GenericExtrapolationContext(
     value=value,
     confidence_interval=None,
     average_sample_rate=0,
-    sample_count=0,
+    sample_count=sample_count,
Member

I am wondering if this return can be collapsed with the return on L130 for better readability. All we are checking is if confidence_interval is None and if so, we pass None.

Contributor Author

yeah you are right, it can be collapsed, nice catch
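
A rough sketch of what collapsing the two returns could look like, assuming the branches differ only in the arguments they pass. The dataclass below is a simplified stand-in for `GenericExtrapolationContext`, `build_context` is a hypothetical helper, and the field types are guesses; the real code in the PR may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenericExtrapolationContext:
    """Simplified stand-in; the real class is not shown in this thread."""

    value: float
    confidence_interval: Optional[float]  # assumed type, for illustration
    average_sample_rate: float
    sample_count: int


def build_context(
    value: float,
    sample_count: int,
    confidence_interval: Optional[float] = None,
    average_sample_rate: float = 0,
) -> GenericExtrapolationContext:
    # Single return: a missing confidence interval is passed through as
    # None instead of being handled by a separate early return.
    return GenericExtrapolationContext(
        value=value,
        confidence_interval=confidence_interval,
        average_sample_rate=average_sample_rate,
        sample_count=sample_count,
    )


without_ci = build_context(value=1.0, sample_count=3)
with_ci = build_context(
    value=1.0, sample_count=3, confidence_interval=0.5, average_sample_rate=0.1
)
```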

@davidtsuk davidtsuk merged commit 33453a0 into master Jan 8, 2025
31 checks passed
@davidtsuk davidtsuk deleted the david/fix/data-present-bug branch January 8, 2025 21:53