Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add inverted indexes on arrays #45157

Merged
merged 1 commit into from
Mar 6, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions docs/generated/sql/functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -1095,8 +1095,6 @@ SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>
<tr><td><a name="crdb_internal.get_zone_config"></a><code>crdb_internal.get_zone_config(namespace_id: <a href="int.html">int</a>) &rarr; <a href="bytes.html">bytes</a></code></td><td></td></tr>
<tr><td><a name="crdb_internal.is_admin"></a><code>crdb_internal.is_admin() &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>Retrieves the current user’s admin status.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.json_num_index_entries"></a><code>crdb_internal.json_num_index_entries(val: jsonb) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.lease_holder"></a><code>crdb_internal.lease_holder(key: <a href="bytes.html">bytes</a>) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used to fetch the leaseholder corresponding to a request key</p>
</span></td></tr>
<tr><td><a name="crdb_internal.locality_value"></a><code>crdb_internal.locality_value(key: <a href="string.html">string</a>) &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>Returns the value of the specified locality key.</p>
Expand All @@ -1105,6 +1103,10 @@ SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>
</span></td></tr>
<tr><td><a name="crdb_internal.node_executable_version"></a><code>crdb_internal.node_executable_version() &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>Returns the version of CockroachDB this node is running.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.num_inverted_index_entries"></a><code>crdb_internal.num_inverted_index_entries(val: anyelement[]) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.num_inverted_index_entries"></a><code>crdb_internal.num_inverted_index_entries(val: jsonb) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.pretty_key"></a><code>crdb_internal.pretty_key(raw_key: <a href="bytes.html">bytes</a>, skip_fields: <a href="int.html">int</a>) &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.range_stats"></a><code>crdb_internal.range_stats(key: <a href="bytes.html">bytes</a>) &rarr; jsonb</code></td><td><span class="funcdesc"><p>This function is used to retrieve range statistics information as a JSON object.</p>
Expand Down
18 changes: 13 additions & 5 deletions pkg/cli/testdata/dump/inverted_index
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,16 @@ CREATE DATABASE d;
CREATE TABLE d.t (
a JSON,
b JSON,
INVERTED INDEX idx (a)
c INT[],
d INT[],
INVERTED INDEX idx (a),
INVERTED INDEX idx3 (c)
);

CREATE INVERTED INDEX idx2 ON d.t (b);
CREATE INVERTED INDEX idx4 ON d.t (d);

INSERT INTO d.t VALUES ('{"a": "b"}', '{"c": "d"}');
INSERT INTO d.t VALUES ('{"a": "b"}', '{"c": "d"}', ARRAY[1], ARRAY[2]);
----
INSERT 1

Expand All @@ -18,12 +22,16 @@ dump d t
CREATE TABLE t (
a JSONB NULL,
b JSONB NULL,
c INT8[] NULL,
d INT8[] NULL,
INVERTED INDEX idx (a),
INVERTED INDEX idx3 (c),
INVERTED INDEX idx2 (b),
FAMILY "primary" (a, b, rowid)
INVERTED INDEX idx4 (d),
FAMILY "primary" (a, b, c, d, rowid)
);

INSERT INTO t (a, b) VALUES
('{"a": "b"}', '{"c": "d"}');
INSERT INTO t (a, b, c, d) VALUES
('{"a": "b"}', '{"c": "d"}', ARRAY[1], ARRAY[2]);
----
----
2 changes: 1 addition & 1 deletion pkg/sql/backfill.go
Original file line number Diff line number Diff line change
Expand Up @@ -1243,7 +1243,7 @@ func (sc *SchemaChanger) validateInvertedIndexes(
row, err := ie.QueryRowEx(ctx, "verify-inverted-idx-count", txn,
sqlbase.InternalExecutorSessionDataOverride{},
fmt.Sprintf(
`SELECT coalesce(sum_int(crdb_internal.json_num_index_entries(%q)), 0) FROM [%d AS t]`,
`SELECT coalesce(sum_int(crdb_internal.num_inverted_index_entries(%q)), 0) FROM [%d AS t]`,
col, tableDesc.ID,
),
)
Expand Down
113 changes: 113 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/inverted_index
Original file line number Diff line number Diff line change
Expand Up @@ -701,3 +701,116 @@ CREATE INVERTED INDEX ON table_with_nulls (a)

statement ok
DROP TABLE table_with_nulls

statement ok
DROP TABLE c

subtest arrays

statement ok
CREATE TABLE c (
id INT PRIMARY KEY,
foo INT[],
bar STRING[],
INVERTED INDEX (foo),
FAMILY "primary" (id, foo, bar)
)

statement ok
INSERT INTO c VALUES(0, NULL, NULL)

statement ok
INSERT INTO c VALUES(1, ARRAY[], ARRAY['foo', 'bar', 'baz'])

statement ok
CREATE INDEX ON c USING GIN (bar)

query TT
SHOW CREATE TABLE c
----
c CREATE TABLE c (
id INT8 NOT NULL,
foo INT8[] NULL,
bar STRING[] NULL,
CONSTRAINT "primary" PRIMARY KEY (id ASC),
INVERTED INDEX c_foo_idx (foo),
INVERTED INDEX c_bar_idx (bar),
FAMILY "primary" (id, foo, bar)
)

query ITT
SELECT * from c WHERE bar @> ARRAY['foo']
----
1 {} {foo,bar,baz}

query ITT
SELECT * from c WHERE bar @> ARRAY['bar', 'baz']
----
1 {} {foo,bar,baz}

query ITT
SELECT * from c WHERE bar @> ARRAY['bar', 'qux']
----

statement ok
INSERT INTO c VALUES(2, NULL, NULL)

statement ok
INSERT INTO c VALUES(3, ARRAY[0,1,NULL], ARRAY['a',NULL,'b',NULL])

statement ok
INSERT INTO c VALUES(4, ARRAY[1,2,3], ARRAY['b',NULL,'c'])

statement ok
INSERT INTO c VALUES(5, ARRAY[], ARRAY[NULL, NULL])

# Create a second inverted index on c, to test backfills.
statement ok
CREATE INVERTED INDEX ON c(foo)

statement ok
CREATE INVERTED INDEX ON c(bar)

query ITT
SELECT * FROM c WHERE foo @> ARRAY[0]
----
3 {0,1,NULL} {a,NULL,b,NULL}

query error unsupported comparison operator
SELECT * FROM c WHERE foo @> 0

query ITT
SELECT * FROM c WHERE foo @> ARRAY[1] ORDER BY id
----
3 {0,1,NULL} {a,NULL,b,NULL}
4 {1,2,3} {b,NULL,c}

# This is expected, although it looks odd, because in SQL,
# ARRAY[NULL] @> ARRAY[NULL] returns false.
query ITT
SELECT * FROM c WHERE foo @> ARRAY[NULL]::INT[]
----

query ITT
SELECT * FROM c WHERE bar @> ARRAY['a']
----
3 {0,1,NULL} {a,NULL,b,NULL}

query ITT
SELECT * FROM c WHERE bar @> ARRAY['b'] ORDER BY id
----
3 {0,1,NULL} {a,NULL,b,NULL}
4 {1,2,3} {b,NULL,c}

query ITT
SELECT * FROM c WHERE bar @> ARRAY['c']
----
4 {1,2,3} {b,NULL,c}

query ITT
SELECT * FROM c WHERE bar @> ARRAY[]::TEXT[] ORDER BY id
----
1 {} {foo,bar,baz}
3 {0,1,NULL} {a,NULL,b,NULL}
4 {1,2,3} {b,NULL,c}
5 {} {NULL,NULL}
2 changes: 1 addition & 1 deletion pkg/sql/opt/cat/index.go
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ type Index interface {
// IsUnique returns true if this index is declared as UNIQUE in the schema.
IsUnique() bool

// IsInverted returns true if this is a JSON inverted index.
// IsInverted returns true if this is an inverted index.
IsInverted() bool

// ColumnCount returns the number of columns in the index. This includes
Expand Down
167 changes: 167 additions & 0 deletions pkg/sql/opt/exec/execbuilder/testdata/inverted_index
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,27 @@ CREATE TABLE d (
FAMILY (a,b)
)

statement ok
CREATE TABLE e (
a INT PRIMARY KEY,
b INT[],
FAMILY (a,b),
INVERTED INDEX(b)
)

statement ok
CREATE TABLE f (
a INT PRIMARY KEY,
b DECIMAL[],
FAMILY (a,b),
INVERTED INDEX(b)
)

statement ok
CREATE INVERTED INDEX foo_inv ON d(b)

# Tests for json inverted indexes.

query T kvtrace
INSERT INTO d VALUES(0, '{"a": "b"}')
----
Expand Down Expand Up @@ -78,6 +96,60 @@ DELETE FROM d WHERE a=4
Scan /Table/53/1/4{-/#}
Del /Table/53/1/4/0

# Tests for array inverted indexes.

# Make sure that duplicate entries do not get emitted more than once, and that
# null keys don't get emitted.
query T kvtrace
INSERT INTO e VALUES(0, ARRAY[7,0,0,1,NULL,10,0,1,7,NULL])
----
CPut /Table/54/1/0/0 -> /TUPLE/
InitPut /Table/54/2/0/0/0 -> /BYTES/
InitPut /Table/54/2/1/0/0 -> /BYTES/
InitPut /Table/54/2/7/0/0 -> /BYTES/
InitPut /Table/54/2/10/0/0 -> /BYTES/

# Make sure that empty arrays do not emit any keys at all.
query T kvtrace
INSERT INTO e VALUES(1, ARRAY[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens with array of null -- do we emit a key there?

----
CPut /Table/54/1/1/0 -> /TUPLE/

# Make sure that NULL arrays do not emit any keys at all.
query T kvtrace
INSERT INTO e VALUES(2, NULL)
----
CPut /Table/54/1/2/0 -> /TUPLE/

# Make sure that NULL entries within an array don't emit any keys.
query T kvtrace
INSERT INTO e VALUES(3, ARRAY[NULL])
----
CPut /Table/54/1/3/0 -> /TUPLE/

# Test that array inverted indexes work okay with decimals (a type with
# composite encoding). Also, make sure that the composite encoding is
# de-duplicated - 1.0 and 1.00 should just have one entry.

query T kvtrace
INSERT INTO f VALUES(0, ARRAY[7,0,0,1.000,10,0,1,7,1.0,1.00])
----
CPut /Table/55/1/0/0 -> /TUPLE/
InitPut /Table/55/2/0/0/0 -> /BYTES/
InitPut /Table/55/2/1/0/0 -> /BYTES/
InitPut /Table/55/2/7/0/0 -> /BYTES/
InitPut /Table/55/2/1E+1/0/0 -> /BYTES/

query T kvtrace
INSERT INTO f VALUES(1, ARRAY[])
----
CPut /Table/55/1/1/0 -> /TUPLE/

query T kvtrace
INSERT INTO f VALUES(2, NULL)
----
CPut /Table/55/1/2/0 -> /TUPLE/

query TTTTT
EXPLAIN (VERBOSE) SELECT * from d where b @>'{"a": "b"}'
----
Expand Down Expand Up @@ -400,3 +472,98 @@ scan · · (a, b) ·
· table d@primary · ·
· spans ALL · ·
· filter b @> '{"a": {}, "b": {}}' · ·

subtest array

# Tests for array inverted indexes.
query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[1]
----
· distributed false · ·
· vectorized false · ·
index-join · · (a, b) ·
│ table e@primary · ·
│ key columns a · ·
└── scan · · (a) ·
· table e@e_b_idx · ·
· spans /1-/2 · ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[]::INT[]
----
· distributed false · ·
· vectorized false · ·
scan · · (a, b) ·
· table e@primary · ·
· spans ALL · ·
· filter b @> ARRAY[] · ·

# Test that searching for a NULL element using the inverted index.
query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[NULL]::INT[]
----
· distributed false · ·
· vectorized false · ·
index-join · · (a, b) ·
│ table e@primary · ·
│ key columns a · ·
└── norows · · (a) ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> NULL
----
· distributed false · ·
· vectorized false · ·
norows · · (a, b) ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b IS NULL
----
· distributed false · ·
· vectorized false · ·
scan · · (a, b) ·
· table e@primary · ·
· spans ALL · ·
· filter b IS NULL · ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[1,2]
----
· distributed false · ·
· vectorized false · ·
lookup-join · · (a, b) ·
│ table e@primary · ·
│ type inner · ·
│ equality (a) = (a) · ·
│ equality cols are key · · ·
│ parallel · · ·
│ pred @2 @> ARRAY[1,2] · ·
└── zigzag-join · · (a) ·
│ type inner · ·
├── scan · · (a) ·
│ table e@e_b_idx · ·
│ fixedvals 1 column · ·
└── scan · · () ·
· table e@e_b_idx · ·
· fixedvals 1 column · ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[1] AND b @> ARRAY[2]
----
· distributed false · ·
· vectorized false · ·
lookup-join · · (a, b) ·
│ table e@primary · ·
│ type inner · ·
│ equality (a) = (a) · ·
│ equality cols are key · · ·
│ parallel · · ·
│ pred (@2 @> ARRAY[1]) AND (@2 @> ARRAY[2]) · ·
└── zigzag-join · · (a) ·
│ type inner · ·
├── scan · · (a) ·
│ table e@e_b_idx · ·
│ fixedvals 1 column · ·
└── scan · · () ·
· table e@e_b_idx · ·
· fixedvals 1 column · ·
Loading