Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent map read/write in common/ast/expr.go #881

Open
1 of 3 tasks
tonyhb opened this issue Jan 9, 2024 · 5 comments · Fixed by #887
Open
1 of 3 tasks

Concurrent map read/write in common/ast/expr.go #881

tonyhb opened this issue Jan 9, 2024 · 5 comments · Fixed by #887
Assignees

Comments

@tonyhb
Copy link

tonyhb commented Jan 9, 2024

Describe the bug
Evaluating a program concurrently, even in different envs, may fail, when using a cached AST:

==================
WARNING: DATA RACE
Write at 0x00c000d97c58 by goroutine 6297:
  github.com/google/cel-go/common/ast.(*expr).SetKindCase()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/common/ast/expr.go:389 +0x208
  github.com/google/cel-go/checker.(*checker).checkCall()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:233 +0x634
  github.com/google/cel-go/checker.(*checker).check()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:90 +0x70
  github.com/google/cel-go/checker.(*checker).checkCall()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:220 +0xf4
  github.com/google/cel-go/checker.(*checker).check()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:90 +0x70
  github.com/google/cel-go/checker.(*checker).checkComprehension()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:531 +0xcbc
  github.com/google/cel-go/checker.(*checker).check()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:98 +0x84
  github.com/google/cel-go/checker.(*checker).checkCall()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:220 +0xf4
  github.com/google/cel-go/checker.(*checker).check()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:90 +0x70
  github.com/google/cel-go/checker.Check()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/checker/checker.go:61 +0x3cc
  github.com/google/cel-go/cel.(*Env).Check()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/cel/env.go:224 +0x2d0
  github.com/inngest/expr.(*cachingCompiler).Compile()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/inngest/expr/caching_compiler.go:80 +0x94
  github.com/inngest/inngest/pkg/expressions.NewExpressionEvaluator()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions.go:134 +0x78
  github.com/inngest/inngest/pkg/expressions.Evaluate()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions.go:110 +0x48
  github.com/inngest/inngest/pkg/expressions.TestEvaluateExpression.func1.1()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions_test.go:1006 +0x6c

Previous read at 0x00c000d97c58 by goroutine 6295:
  github.com/google/cel-go/common/ast.(*expr).Kind()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/common/ast/expr.go:316 +0x38
  github.com/google/cel-go/interpreter.(*planner).Plan()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/planner.go:75 +0x38
  github.com/google/cel-go/interpreter.(*planner).planCall()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/planner.go:230 +0x1cc
  github.com/google/cel-go/interpreter.(*planner).Plan()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/planner.go:77 +0x64
  github.com/google/cel-go/interpreter.(*planner).planComprehension()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/planner.go:593 +0xe4
  github.com/google/cel-go/interpreter.(*planner).Plan()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/planner.go:91 +0xbc
  github.com/google/cel-go/interpreter.(*planner).planCall()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/planner.go:230 +0x1cc
  github.com/google/cel-go/interpreter.(*planner).Plan()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/planner.go:77 +0x64
  github.com/google/cel-go/interpreter.(*exprInterpreter).NewInterpretable()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/interpreter/interpreter.go:184 +0x1fc
  github.com/google/cel-go/cel.(*prog).initInterpretable()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/cel/program.go:257 +0x90
  github.com/google/cel-go/cel.newProgram.func1()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/cel/program.go:248 +0xb3c
  github.com/google/cel-go/cel.(*progGen).Eval()
      /home/eng/prj-vm/inngest/inngest/vendor/github.com/google/cel-go/cel/program.go:365 +0x158
  github.com/inngest/inngest/pkg/expressions.eval()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/cel.go:85 +0x84
  github.com/inngest/inngest/pkg/expressions.(*expressionEvaluator).Evaluate()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions.go:248 +0x1d0
  github.com/inngest/inngest/pkg/expressions.Evaluate()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions.go:115 +0xc4
  github.com/inngest/inngest/pkg/expressions.TestEvaluateExpression.func1.1()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions_test.go:1006 +0x6c

Goroutine 6297 (running) created at:
  github.com/inngest/inngest/pkg/expressions.TestEvaluateExpression.func1()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions_test.go:1005 +0x48
  testing.tRunner()
      /home/eng/go/src/testing/testing.go:1595 +0x194
  testing.(*T).Run.func1()
      /home/eng/go/src/testing/testing.go:1648 +0x40

Goroutine 6295 (finished) created at:
  github.com/inngest/inngest/pkg/expressions.TestEvaluateExpression.func1()
      /home/eng/prj-vm/inngest/inngest/pkg/expressions/expressions_test.go:1005 +0x48
  testing.tRunner()
      /home/eng/go/src/testing/testing.go:1595 +0x194
  testing.(*T).Run.func1()
      /home/eng/go/src/testing/testing.go:1648 +0x40
==================

To Reproduce

  1. Parse an AST
  2. Concurrently evaluate the AST in a new env 100 times in a simple loop:
  3. See the above error.
  • parser
  • checker
  • interpreter

Alternatively, to repro:

git clone [email protected]:inngest/inngest.git
git checkout 1219a37b902649ba87e3d7f1d58aa0f8c224a2c7
go test ./pkg/expressions/ -race

Any expression fails: 5+4 or event.data.name + '-' + event.data.foo.

Expected behavior

Data races don't happen, and we can execute cached ASTs concurrently. Parsing into ASTs via antlr is slow, and we cache strings -> ASTs for speed.

@tonyhb
Copy link
Author

tonyhb commented Jan 9, 2024

Looks as though calling Check concurrently on the same AST causes the issue. Changing the CEL call from Compile to Parse fixes the race (which exposes another in the test rig 😅).

Edit:

For more context, we run billions of expressions per hour (which is being optimized away). We've noticed that calling Parse() is slow; we do two optimizations and cache the resulting ASTs.

Because we cache the ASTs, the resulting AST may be used with Compile() concurrently. Compile causes Check, which is not thread safe.

The only way we can validate expressions is to use non-caching compilation to call Check, omitting the optimization. This is okay for us: validation happens at save time and can slow down API requests, while evaluation is the hot path.

@TristonianJones
Copy link
Collaborator

TristonianJones commented Jan 9, 2024

@tonyhb can I ask why you're doing concurrent checks against the same AST? This is pretty unusual. You should just be able to cache the checked output rather than rechecking.

@tonyhb
Copy link
Author

tonyhb commented Jan 9, 2024

Nothing other than history! We were calling Compile instead of Parse prior to caching, and that was kept around. It's now changed, and we do exactly as you say.

@TristonianJones
Copy link
Collaborator

@tonyhb I can make the check concurrency safe. I just hadn't expected that people would be concurrently checking the same AST. That said, if you cached the checked AST, then that should side-step the issue.

@TristonianJones
Copy link
Collaborator

It looks like some users were relying on the AST being mutable. I'll have to find another way to isolate the checked AST creation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants