Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add inverted indexes on arrays #45157

Merged
merged 1 commit into from
Mar 6, 2020
Merged

Conversation

jordanlewis
Copy link
Member

@jordanlewis jordanlewis commented Feb 18, 2020

Closes #43199.

This commit adds inverted index support to arrays. Inverted index
entries are created from arrays by simply encoding a key that contains
the array element's table key encoding. Nulls are not indexed, since in
SQL, ARRAY[1, NULL] @> ARRAY[NULL] returns false.

For example, in a table t(int, int[]) with an inverted index with id 3
on the int[] column the row (10, [1, NULL, 2]) produces 2 index keys:

/tableId/3/1/10
/tableId/3/2/10

This encoding scheme is much simpler than the one for JSON, since arrays
don't have "paths": their elements are simply ordinary datums.

Release note (sql change): The inverted index implementation now
supports indexing array columns. This permits accelerating containment
queries (@> and <@) on array columns by adding an index to them.

@jordanlewis jordanlewis requested review from rohany and a team February 18, 2020 04:57
@jordanlewis jordanlewis requested review from a team as code owners February 18, 2020 04:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis jordanlewis force-pushed the array-inv branch 2 times, most recently from 74d25be to afc6456 Compare February 18, 2020 22:27
Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 696 at r2 (raw file):


statement ok
DROP TABLE table_with_nulls

Can we also add a few relevant optimizer tests in opt/xform/testdata/rules/select in the GenerateInvertedIndexScans section? There's overlap with the execbuilder tests, but in the optimizer tests we're specifically testing that the logic in GenerateInvertedIndexScans handles Array inverted indexes in addition to JSON. So would probably require just a couple sanity check cases at that level.


pkg/sql/opt/idxconstraint/index_constraints.go, line 759 at r2 (raw file):

}

func (c *indexConstraintCtx) makeInvertedIndexSpansForJSONExpr(

Would be good to have at least a simple comment on this method.


pkg/sql/opt/idxconstraint/index_constraints.go, line 873 at r2 (raw file):

		case types.JsonFamily:
			return c.makeInvertedIndexSpansForJSONExpr(nd, constraints, allPaths, rightDatum)
		case types.ArrayFamily:

I'd expect idxconstraint/testdata tests for this new functionality. "Onion testing" - test at each layer of abstraction.


pkg/sql/opt/idxconstraint/index_constraints.go, line 874 at r2 (raw file):

			return c.makeInvertedIndexSpansForJSONExpr(nd, constraints, allPaths, rightDatum)
		case types.ArrayFamily:
			// We're going to make one span to search for every value inside of the

I think this code should be broken out into a helper method, analogous to the JSON method.


pkg/sql/sqlbase/index_encoding.go, line 1237 at r2 (raw file):

			// have 0 or >1 entries, though, as well as secondary indexes which store
			// columns from multiple column families.
			secondaryIndexEntries = append(secondaryIndexEntries, entries[i])

Wouldn't this just be:

secondaryIndexEntries = append(secondaryIndexEntries, entries)

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/cli/testdata/dump/inverted_index, line 6 at r1 (raw file):

	a JSON,
	b JSON,
    c INT[],

the spacing seems off here (missing tabs or something)


pkg/cli/testdata/dump/inverted_index, line 25 at r1 (raw file):

	a JSONB NULL,
	b JSONB NULL,
    c INT[] NULL,

same as above


pkg/sql/logictest/testdata/logic_test/inverted_index, line 716 at r2 (raw file):


statement ok
CREATE INDEX ON c USING GIN (bar)

add a test where the backfiller runs and has to add non null array rows


pkg/sql/logictest/testdata/logic_test/inverted_index, line 749 at r2 (raw file):


query error unsupported comparison operator
SELECT * FROM c WHERE foo @> 0

I'm confused about this syntax. So we can only test containment for singleton arrays? or would be in the future allow for containment checks of non singleton arrays? I find it unintuitive that we have to do array @> array[element] rather than array @> element.


pkg/sql/row/updater.go, line 382 at r2 (raw file):

			ru.Fks.addCheckForIndex(ru.Helper.TableDesc.PrimaryIndex.ID, ru.Helper.TableDesc.PrimaryIndex.Type)
			for i := range ru.Helper.Indexes {
				if ru.Helper.Indexes[i].Type == sqlbase.IndexDescriptor_INVERTED {

Was this missing / or deleted accidentally?


pkg/sql/sem/builtins/builtins.go, line 3634 at r2 (raw file):

		},
		tree.Overload{
			Types:      tree.ArgTypes{{"val", types.AnyArray}},

Is it important to test this function explicitly? Also, might be worth to drop a comment in the key encoder to update this if the implementation changes.


pkg/sql/sqlbase/index_encoding.go, line 879 at r2 (raw file):

// prefixed to all returned keys.
// N.B.: This won't return any keys for
func encodeArrayInvertedIndexTableKeys(val *tree.DArray, inKey []byte) (key [][]byte, err error) {

I'm making a change to EncodeTableKey in #44771. I don't think it should impact this work, but it's worth a double check.


pkg/sql/sqlbase/index_encoding.go, line 893 at r2 (raw file):

			return nil, err
		}
		outKey := make([]byte, len(inKey), len(inKey)+len(newKey))

Can we allocate the new outkey, copy inKey into it, and then encodeTableKey into that buffer? Right now we are allocating outkey and newKey for no reason.


pkg/sql/sqlbase/index_encoding.go, line 1235 at r2 (raw file):

		for i := range entries {
			// Normally, each index will have exactly one entry. Inverted indexes can
			// have 0 or >1 entries, though, as well as secondary indexes which store

This sentence seems wonky

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews, please take another look. I split a bunch of this PR into #45564, so that one needs to be reviewed first.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)


pkg/cli/testdata/dump/inverted_index, line 6 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

the spacing seems off here (missing tabs or something)

Done.


pkg/cli/testdata/dump/inverted_index, line 25 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

same as above

Done.


pkg/sql/logictest/testdata/logic_test/inverted_index, line 696 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Can we also add a few relevant optimizer tests in opt/xform/testdata/rules/select in the GenerateInvertedIndexScans section? There's overlap with the execbuilder tests, but in the optimizer tests we're specifically testing that the logic in GenerateInvertedIndexScans handles Array inverted indexes in addition to JSON. So would probably require just a couple sanity check cases at that level.

Done.


pkg/sql/logictest/testdata/logic_test/inverted_index, line 716 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

add a test where the backfiller runs and has to add non null array rows

Done.


pkg/sql/logictest/testdata/logic_test/inverted_index, line 749 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I'm confused about this syntax. So we can only test containment for singleton arrays? or would be in the future allow for containment checks of non singleton arrays? I find it unintuitive that we have to do array @> array[element] rather than array @> element.

Yeah, this is just how @> works - it's funky. Contains in Postgres world compares the container type to itself. array @> element isn't a supported comparison.

array @> array is also supported for non singletons on the RHS, which means that the LHS must contain all elements in the RHS. Under the hood we turn that into a conjunction of singleton @> (i think) - and it turns into a zigzag join. I'll add some tests for that case as well.


pkg/sql/opt/idxconstraint/index_constraints.go, line 759 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Would be good to have at least a simple comment on this method.

Done.


pkg/sql/opt/idxconstraint/index_constraints.go, line 874 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think this code should be broken out into a helper method, analogous to the JSON method.

Done.


pkg/sql/row/updater.go, line 382 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Was this missing / or deleted accidentally?

I think it was already implicitly true via another check that I got rid of.


pkg/sql/sem/builtins/builtins.go, line 3634 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Is it important to test this function explicitly? Also, might be worth to drop a comment in the key encoder to update this if the implementation changes.

Good idea. This actually caught a bug - thanks for pointing it out.


pkg/sql/sqlbase/index_encoding.go, line 879 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I'm making a change to EncodeTableKey in #44771. I don't think it should impact this work, but it's worth a double check.

Done.


pkg/sql/sqlbase/index_encoding.go, line 893 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Can we allocate the new outkey, copy inKey into it, and then encodeTableKey into that buffer? Right now we are allocating outkey and newKey for no reason.

Done.


pkg/sql/sqlbase/index_encoding.go, line 1235 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

This sentence seems wonky

Done.


pkg/sql/sqlbase/index_encoding.go, line 1237 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Wouldn't this just be:

secondaryIndexEntries = append(secondaryIndexEntries, entries)

I think you need ... syntax, but yes, that's a good point - fixed.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)


pkg/sql/opt/idxconstraint/index_constraints.go, line 873 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I'd expect idxconstraint/testdata tests for this new functionality. "Onion testing" - test at each layer of abstraction.

Done.

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a version gate as well


# Make sure that empty arrays do not emit any keys at all.
query T kvtrace
INSERT INTO e VALUES(1, ARRAY[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens with array of null -- do we emit a key there?

@@ -3622,6 +3623,33 @@ may increase either contention or retry errors, or both.`,
},
Info: "This function is used only by CockroachDB's developers for testing purposes.",
},
tree.Overload{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's out of date given we don't emit keys for duplicate entries

craig bot pushed a commit that referenced this pull request Mar 3, 2020
45140: sql: use CommandResult for displaying ParameterStatus updates r=andreimatei a=otan

We were previously sending ParameterStatus using a listener. This
is refactored to talk using CommandResult instead, with results only
being buffered if "Close()" is called. To do this, a new
`AppendParamStatusUpdate` function to `RestrictedCommandResult`.

Since setting vars live in `sessionDataMutator`, we add a new
`paramStatusUpdater` interface which uses the `RestrictedCommandResult`.
This var is reset each time we execute with an openTxn with the
CommandResult used for it.

Also renamed "StatusParam" to "ParamStatus", which is closer to the name
"ParameterStatus".

Release note: None



45530: ui: network no connection r=dhartunian a=elkmaster

removed trigger when all the nodes are connected

Resolves: #45406

Release note (ui): none

45564: sql: inverted index fixes r=jordanlewis a=jordanlewis

This is based on #45563, and split out from #45157.

Closes #45154.
Closes #32468.

Miscellaneous fixes for inverted indexes.

1. Don't produce duplicate keys for inverted indexes.
2. Permit rows to not emit any keys for a particular index, which is required to allow not having any inverted index entries for NULL (or empty container).
3. Don't produce keys for NULL entries in inverted indexes

Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Vlad Los <[email protected]>
Co-authored-by: Jordan Lewis <[email protected]>
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 113 at r6 (raw file):

Previously, rohany (Rohan Yadav) wrote…

what happens with array of null -- do we emit a key there?

No, we don't. Added tests. It also caught another bug, so thanks for asking!


pkg/sql/sem/builtins/builtins.go, line 3626 at r6 (raw file):

Previously, rohany (Rohan Yadav) wrote…

This seems like it's out of date given we don't emit keys for duplicate entries

What do you mean? I think it's still correct.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it needs a version gate. Schema changes won't be permitted in the mixed version state. Unless there's something else you're thinking of?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, assuming we don't allow schema changes in a mixed version state then its ok. I was just worried about the 20.1 nodes in a mixed version setting creating an inverted index on an array column.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/sem/builtins/builtins.go, line 3626 at r6 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

What do you mean? I think it's still correct.

The count here is counting all elements that aren't null, while we should be counting all unique elements right?

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)


pkg/sql/sem/builtins/builtins.go, line 3626 at r6 (raw file):

Previously, rohany (Rohan Yadav) wrote…

The count here is counting all elements that aren't null, while we should be counting all unique elements right?

You're right... how did I miss this? Added tests.

@rohany
Copy link
Contributor

rohany commented Mar 5, 2020

SQL parts LGTM -- lets wait for an LGTM from someone on the optimizer team before merging

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andy-kimball, would you have a moment to take another look at this? Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rohany)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opt part LGTM (with some comments)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)


pkg/cli/testdata/dump/inverted_index, line 24 at r1 (raw file):

CREATE TABLE t (
	a JSONB NULL,
	b JSONB NULL,

[nit] fix the tabs in the rest of the query


pkg/sql/opt/idxconstraint/index_constraints.go, line 845 at r7 (raw file):

// makeInvertedIndexSpans for array inverted indexes. The input arr is the array
// to produce spans for.
func (c *indexConstraintCtx) makeInvertedIndexSpansForArrayExpr(

[nit] include a description of allPaths


pkg/sql/opt/idxconstraint/index_constraints.go, line 853 at r7 (raw file):

	// array datum.
	for i := range arr.Array {
		elt := arr.Array[i]

[nit] elt is used once, I'd just use arr.Array[i]


pkg/sql/opt/idxconstraint/index_constraints.go, line 856 at r7 (raw file):

		if elt == tree.DNull {
			// In SQL:
			// SELECT ARRAY[1, NULL, 2] @> ARRAY[NULL]

[nit] this example is confusing, it's not clear which side our NULL corresponds to. Maybe just say that in SQL, testing an array that contains NULL for containment in another array always yields false.


pkg/sql/opt/idxconstraint/index_constraints.go, line 871 at r7 (raw file):

		if !allPaths {
			// The span is tight if we just had 1 path through the index constraint.
			return len(arr.Array) == 1, constraints

[nit] This can just be break and we would have a single instance of the tightness logic.


pkg/sql/opt/idxconstraint/index_constraints.go, line 875 at r7 (raw file):

		// Reset out for next iteration
		out = &constraint.Constraint{}

We can just do out := &constraint.Constraint() above where we use it. I know this is copied but there we had a continue case.


pkg/sql/opt/idxconstraint/index_constraints.go, line 878 at r7 (raw file):

	}
	if !constrained {
		c.unconstrained(0 /* offset */, out)

[nit] This is only possible if it's an empty array right? We could just check for that upfront


pkg/sql/opt/idxconstraint/testdata/inverted, line 78 at r7 (raw file):

[ - ]
Remaining filter: @1 @> ARRAY[]

[nit] add some tests for the case where the array contains a NULL


pkg/sql/opt/idxconstraint/testdata/inverted, line 83 at r7 (raw file):

----
[ - ]
Remaining filter: NULL

This should be a contradiction


pkg/sql/opt/xform/testdata/rules/select, line 932 at r7 (raw file):

           └── key: (1)

# Write some tests for array inverted indexes.

[nit] remove "write some"

This commit adds inverted index support to arrays. Inverted index
entries are created from arrays by simply encoding a key that contains
the array element's table key encoding. Nulls are not indexed, since in
SQL, ARRAY[1, NULL] @> ARRAY[NULL] returns false.

For example, in a table t(int, int[]) with an inverted index with id 3
on the int[] column the row (10, [1, NULL, 2]) produces 2 index keys:

```
/tableId/3/1/10
/tableId/3/2/10
```

This encoding scheme is much simpler than the one for JSON, since arrays
don't have "paths": their elements are simply ordinary datums.

Release note (sql change): The inverted index implementation now
supports indexing array columns. This permits accelerating containment
queries (@> and <@) on array columns by adding an index to them.
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)


pkg/cli/testdata/dump/inverted_index, line 24 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] fix the tabs in the rest of the query

I think there's nothing wrong with them anymore. What am i missing?


pkg/sql/opt/idxconstraint/index_constraints.go, line 875 at r7 (raw file):

Previously, RaduBerinde wrote…

We can just do out := &constraint.Constraint() above where we use it. I know this is copied but there we had a continue case.

Good point, fixed.


pkg/sql/opt/idxconstraint/index_constraints.go, line 878 at r7 (raw file):

Previously, RaduBerinde wrote…

[nit] This is only possible if it's an empty array right? We could just check for that upfront

Good point, done.


pkg/sql/opt/idxconstraint/testdata/inverted, line 83 at r7 (raw file):

Previously, RaduBerinde wrote…

This should be a contradiction

Yeah true. I added tests for this for JSON too and it's also broken. I think there's something about the way that this test is set up that makes this not work properly.


pkg/sql/opt/xform/testdata/rules/select, line 932 at r7 (raw file):

Previously, RaduBerinde wrote…

[nit] remove "write some"

Done. this was a weird comment

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, and @rohany)


pkg/cli/testdata/dump/inverted_index, line 24 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think there's nothing wrong with them anymore. What am i missing?

Hm I don't remember even writing this haha

@jordanlewis
Copy link
Member Author

TFTRs all!

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 5, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Mar 6, 2020

Build succeeded

@craig craig bot merged commit 875052e into cockroachdb:master Mar 6, 2020
@jordanlewis jordanlewis deleted the array-inv branch March 10, 2020 05:28
@bra-fsn
Copy link

bra-fsn commented Mar 11, 2020

I guess this closes #17154 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: inverted index on array types
6 participants