-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add inverted indexes on arrays #45157
Conversation
74d25be
to
afc6456
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 696 at r2 (raw file):
statement ok DROP TABLE table_with_nulls
Can we also add a few relevant optimizer tests in opt/xform/testdata/rules/select
in the GenerateInvertedIndexScans
section? There's overlap with the execbuilder
tests, but in the optimizer tests we're specifically testing that the logic in GenerateInvertedIndexScans
handles Array inverted indexes in addition to JSON. So would probably require just a couple sanity check cases at that level.
pkg/sql/opt/idxconstraint/index_constraints.go, line 759 at r2 (raw file):
} func (c *indexConstraintCtx) makeInvertedIndexSpansForJSONExpr(
Would be good to have at least a simple comment on this method.
pkg/sql/opt/idxconstraint/index_constraints.go, line 873 at r2 (raw file):
case types.JsonFamily: return c.makeInvertedIndexSpansForJSONExpr(nd, constraints, allPaths, rightDatum) case types.ArrayFamily:
I'd expect idxconstraint/testdata
tests for this new functionality. "Onion testing" - test at each layer of abstraction.
pkg/sql/opt/idxconstraint/index_constraints.go, line 874 at r2 (raw file):
return c.makeInvertedIndexSpansForJSONExpr(nd, constraints, allPaths, rightDatum) case types.ArrayFamily: // We're going to make one span to search for every value inside of the
I think this code should be broken out into a helper method, analogous to the JSON method.
pkg/sql/sqlbase/index_encoding.go, line 1237 at r2 (raw file):
// have 0 or >1 entries, though, as well as secondary indexes which store // columns from multiple column families. secondaryIndexEntries = append(secondaryIndexEntries, entries[i])
Wouldn't this just be:
secondaryIndexEntries = append(secondaryIndexEntries, entries)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)
pkg/cli/testdata/dump/inverted_index, line 6 at r1 (raw file):
a JSON, b JSON, c INT[],
the spacing seems off here (missing tabs or something)
pkg/cli/testdata/dump/inverted_index, line 25 at r1 (raw file):
a JSONB NULL, b JSONB NULL, c INT[] NULL,
same as above
pkg/sql/logictest/testdata/logic_test/inverted_index, line 716 at r2 (raw file):
statement ok CREATE INDEX ON c USING GIN (bar)
add a test where the backfiller runs and has to add non null array rows
pkg/sql/logictest/testdata/logic_test/inverted_index, line 749 at r2 (raw file):
query error unsupported comparison operator SELECT * FROM c WHERE foo @> 0
I'm confused about this syntax. So we can only test containment for singleton arrays? or would be in the future allow for containment checks of non singleton arrays? I find it unintuitive that we have to do array @> array[element]
rather than array @> element
.
pkg/sql/row/updater.go, line 382 at r2 (raw file):
ru.Fks.addCheckForIndex(ru.Helper.TableDesc.PrimaryIndex.ID, ru.Helper.TableDesc.PrimaryIndex.Type) for i := range ru.Helper.Indexes { if ru.Helper.Indexes[i].Type == sqlbase.IndexDescriptor_INVERTED {
Was this missing / or deleted accidentally?
pkg/sql/sem/builtins/builtins.go, line 3634 at r2 (raw file):
}, tree.Overload{ Types: tree.ArgTypes{{"val", types.AnyArray}},
Is it important to test this function explicitly? Also, might be worth to drop a comment in the key encoder to update this if the implementation changes.
pkg/sql/sqlbase/index_encoding.go, line 879 at r2 (raw file):
// prefixed to all returned keys. // N.B.: This won't return any keys for func encodeArrayInvertedIndexTableKeys(val *tree.DArray, inKey []byte) (key [][]byte, err error) {
I'm making a change to EncodeTableKey
in #44771. I don't think it should impact this work, but it's worth a double check.
pkg/sql/sqlbase/index_encoding.go, line 893 at r2 (raw file):
return nil, err } outKey := make([]byte, len(inKey), len(inKey)+len(newKey))
Can we allocate the new outkey, copy inKey into it, and then encodeTableKey into that buffer? Right now we are allocating outkey and newKey for no reason.
pkg/sql/sqlbase/index_encoding.go, line 1235 at r2 (raw file):
for i := range entries { // Normally, each index will have exactly one entry. Inverted indexes can // have 0 or >1 entries, though, as well as secondary indexes which store
This sentence seems wonky
afc6456
to
c27e587
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reviews, please take another look. I split a bunch of this PR into #45564, so that one needs to be reviewed first.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)
pkg/cli/testdata/dump/inverted_index, line 6 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
the spacing seems off here (missing tabs or something)
Done.
pkg/cli/testdata/dump/inverted_index, line 25 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
same as above
Done.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 696 at r2 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Can we also add a few relevant optimizer tests in
opt/xform/testdata/rules/select
in theGenerateInvertedIndexScans
section? There's overlap with theexecbuilder
tests, but in the optimizer tests we're specifically testing that the logic inGenerateInvertedIndexScans
handles Array inverted indexes in addition to JSON. So would probably require just a couple sanity check cases at that level.
Done.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 716 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
add a test where the backfiller runs and has to add non null array rows
Done.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 749 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I'm confused about this syntax. So we can only test containment for singleton arrays? or would be in the future allow for containment checks of non singleton arrays? I find it unintuitive that we have to do
array @> array[element]
rather thanarray @> element
.
Yeah, this is just how @>
works - it's funky. Contains in Postgres world compares the container type to itself. array @> element isn't a supported comparison.
array @> array is also supported for non singletons on the RHS, which means that the LHS must contain all elements in the RHS. Under the hood we turn that into a conjunction of singleton @> (i think) - and it turns into a zigzag join. I'll add some tests for that case as well.
pkg/sql/opt/idxconstraint/index_constraints.go, line 759 at r2 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Would be good to have at least a simple comment on this method.
Done.
pkg/sql/opt/idxconstraint/index_constraints.go, line 874 at r2 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I think this code should be broken out into a helper method, analogous to the JSON method.
Done.
pkg/sql/row/updater.go, line 382 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Was this missing / or deleted accidentally?
I think it was already implicitly true via another check that I got rid of.
pkg/sql/sem/builtins/builtins.go, line 3634 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Is it important to test this function explicitly? Also, might be worth to drop a comment in the key encoder to update this if the implementation changes.
Good idea. This actually caught a bug - thanks for pointing it out.
pkg/sql/sqlbase/index_encoding.go, line 879 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I'm making a change to
EncodeTableKey
in #44771. I don't think it should impact this work, but it's worth a double check.
Done.
pkg/sql/sqlbase/index_encoding.go, line 893 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Can we allocate the new outkey, copy inKey into it, and then encodeTableKey into that buffer? Right now we are allocating outkey and newKey for no reason.
Done.
pkg/sql/sqlbase/index_encoding.go, line 1235 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
This sentence seems wonky
Done.
pkg/sql/sqlbase/index_encoding.go, line 1237 at r2 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Wouldn't this just be:
secondaryIndexEntries = append(secondaryIndexEntries, entries)
I think you need ...
syntax, but yes, that's a good point - fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)
pkg/sql/opt/idxconstraint/index_constraints.go, line 873 at r2 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I'd expect
idxconstraint/testdata
tests for this new functionality. "Onion testing" - test at each layer of abstraction.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a version gate as well
|
||
# Make sure that empty arrays do not emit any keys at all. | ||
query T kvtrace | ||
INSERT INTO e VALUES(1, ARRAY[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens with array of null -- do we emit a key there?
@@ -3622,6 +3623,33 @@ may increase either contention or retry errors, or both.`, | |||
}, | |||
Info: "This function is used only by CockroachDB's developers for testing purposes.", | |||
}, | |||
tree.Overload{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's out of date given we don't emit keys for duplicate entries
45140: sql: use CommandResult for displaying ParameterStatus updates r=andreimatei a=otan We were previously sending ParameterStatus using a listener. This is refactored to talk using CommandResult instead, with results only being buffered if "Close()" is called. To do this, a new `AppendParamStatusUpdate` function to `RestrictedCommandResult`. Since setting vars live in `sessionDataMutator`, we add a new `paramStatusUpdater` interface which uses the `RestrictedCommandResult`. This var is reset each time we execute with an openTxn with the CommandResult used for it. Also renamed "StatusParam" to "ParamStatus", which is closer to the name "ParameterStatus". Release note: None 45530: ui: network no connection r=dhartunian a=elkmaster removed trigger when all the nodes are connected Resolves: #45406 Release note (ui): none 45564: sql: inverted index fixes r=jordanlewis a=jordanlewis This is based on #45563, and split out from #45157. Closes #45154. Closes #32468. Miscellaneous fixes for inverted indexes. 1. Don't produce duplicate keys for inverted indexes. 2. Permit rows to not emit any keys for a particular index, which is required to allow not having any inverted index entries for NULL (or empty container). 3. Don't produce keys for NULL entries in inverted indexes Co-authored-by: Oliver Tan <[email protected]> Co-authored-by: Vlad Los <[email protected]> Co-authored-by: Jordan Lewis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)
pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 113 at r6 (raw file):
Previously, rohany (Rohan Yadav) wrote…
what happens with array of null -- do we emit a key there?
No, we don't. Added tests. It also caught another bug, so thanks for asking!
pkg/sql/sem/builtins/builtins.go, line 3626 at r6 (raw file):
Previously, rohany (Rohan Yadav) wrote…
This seems like it's out of date given we don't emit keys for duplicate entries
What do you mean? I think it's still correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it needs a version gate. Schema changes won't be permitted in the mixed version state. Unless there's something else you're thinking of?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, assuming we don't allow schema changes in a mixed version state then its ok. I was just worried about the 20.1 nodes in a mixed version setting creating an inverted index on an array column.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)
pkg/sql/sem/builtins/builtins.go, line 3626 at r6 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
What do you mean? I think it's still correct.
The count here is counting all elements that aren't null, while we should be counting all unique elements right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)
pkg/sql/sem/builtins/builtins.go, line 3626 at r6 (raw file):
Previously, rohany (Rohan Yadav) wrote…
The count here is counting all elements that aren't null, while we should be counting all unique elements right?
You're right... how did I miss this? Added tests.
SQL parts LGTM -- lets wait for an LGTM from someone on the optimizer team before merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andy-kimball, would you have a moment to take another look at this? Thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opt part LGTM (with some comments)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)
pkg/cli/testdata/dump/inverted_index, line 24 at r1 (raw file):
CREATE TABLE t ( a JSONB NULL, b JSONB NULL,
[nit] fix the tabs in the rest of the query
pkg/sql/opt/idxconstraint/index_constraints.go, line 845 at r7 (raw file):
// makeInvertedIndexSpans for array inverted indexes. The input arr is the array // to produce spans for. func (c *indexConstraintCtx) makeInvertedIndexSpansForArrayExpr(
[nit] include a description of allPaths
pkg/sql/opt/idxconstraint/index_constraints.go, line 853 at r7 (raw file):
// array datum. for i := range arr.Array { elt := arr.Array[i]
[nit] elt
is used once, I'd just use arr.Array[i]
pkg/sql/opt/idxconstraint/index_constraints.go, line 856 at r7 (raw file):
if elt == tree.DNull { // In SQL: // SELECT ARRAY[1, NULL, 2] @> ARRAY[NULL]
[nit] this example is confusing, it's not clear which side our NULL corresponds to. Maybe just say that in SQL, testing an array that contains NULL for containment in another array always yields false.
pkg/sql/opt/idxconstraint/index_constraints.go, line 871 at r7 (raw file):
if !allPaths { // The span is tight if we just had 1 path through the index constraint. return len(arr.Array) == 1, constraints
[nit] This can just be break
and we would have a single instance of the tightness logic.
pkg/sql/opt/idxconstraint/index_constraints.go, line 875 at r7 (raw file):
// Reset out for next iteration out = &constraint.Constraint{}
We can just do out := &constraint.Constraint()
above where we use it. I know this is copied but there we had a continue
case.
pkg/sql/opt/idxconstraint/index_constraints.go, line 878 at r7 (raw file):
} if !constrained { c.unconstrained(0 /* offset */, out)
[nit] This is only possible if it's an empty array right? We could just check for that upfront
pkg/sql/opt/idxconstraint/testdata/inverted, line 78 at r7 (raw file):
[ - ] Remaining filter: @1 @> ARRAY[]
[nit] add some tests for the case where the array contains a NULL
pkg/sql/opt/idxconstraint/testdata/inverted, line 83 at r7 (raw file):
---- [ - ] Remaining filter: NULL
This should be a contradiction
pkg/sql/opt/xform/testdata/rules/select, line 932 at r7 (raw file):
└── key: (1) # Write some tests for array inverted indexes.
[nit] remove "write some"
This commit adds inverted index support to arrays. Inverted index entries are created from arrays by simply encoding a key that contains the array element's table key encoding. Nulls are not indexed, since in SQL, ARRAY[1, NULL] @> ARRAY[NULL] returns false. For example, in a table t(int, int[]) with an inverted index with id 3 on the int[] column the row (10, [1, NULL, 2]) produces 2 index keys: ``` /tableId/3/1/10 /tableId/3/2/10 ``` This encoding scheme is much simpler than the one for JSON, since arrays don't have "paths": their elements are simply ordinary datums. Release note (sql change): The inverted index implementation now supports indexing array columns. This permits accelerating containment queries (@> and <@) on array columns by adding an index to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)
pkg/cli/testdata/dump/inverted_index, line 24 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] fix the tabs in the rest of the query
I think there's nothing wrong with them anymore. What am i missing?
pkg/sql/opt/idxconstraint/index_constraints.go, line 875 at r7 (raw file):
Previously, RaduBerinde wrote…
We can just do
out := &constraint.Constraint()
above where we use it. I know this is copied but there we had acontinue
case.
Good point, fixed.
pkg/sql/opt/idxconstraint/index_constraints.go, line 878 at r7 (raw file):
Previously, RaduBerinde wrote…
[nit] This is only possible if it's an empty array right? We could just check for that upfront
Good point, done.
pkg/sql/opt/idxconstraint/testdata/inverted, line 83 at r7 (raw file):
Previously, RaduBerinde wrote…
This should be a contradiction
Yeah true. I added tests for this for JSON too and it's also broken. I think there's something about the way that this test is set up that makes this not work properly.
pkg/sql/opt/xform/testdata/rules/select, line 932 at r7 (raw file):
Previously, RaduBerinde wrote…
[nit] remove "write some"
Done. this was a weird comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)
pkg/cli/testdata/dump/inverted_index, line 24 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I think there's nothing wrong with them anymore. What am i missing?
Hm I don't remember even writing this haha
TFTRs all! bors r+ |
Build failed (retrying...) |
Build succeeded |
I guess this closes #17154 as well. |
Closes #43199.
This commit adds inverted index support to arrays. Inverted index
entries are created from arrays by simply encoding a key that contains
the array element's table key encoding. Nulls are not indexed, since in
SQL, ARRAY[1, NULL] @> ARRAY[NULL] returns false.
For example, in a table t(int, int[]) with an inverted index with id 3
on the int[] column the row (10, [1, NULL, 2]) produces 2 index keys:
This encoding scheme is much simpler than the one for JSON, since arrays
don't have "paths": their elements are simply ordinary datums.
Release note (sql change): The inverted index implementation now
supports indexing array columns. This permits accelerating containment
queries (@> and <@) on array columns by adding an index to them.