Skip to content

Commit

Permalink
sql: add inverted indexes on arrays
Browse files Browse the repository at this point in the history
This commit adds inverted index support to arrays. Inverted index
entries are created from arrays by simply encoding a key that contains
the array element's table key encoding. Nulls are not indexed, since in
SQL, ARRAY[1, NULL] @> ARRAY[NULL] returns false.

For example, in a table t(int, int[]) with an inverted index with id 3
on the int[] column the row (10, [1, NULL, 2]) produces 2 index keys:

```
/tableId/3/1/10
/tableId/3/2/10
```

This encoding scheme is much simpler than the one for JSON, since arrays
don't have "paths": their elements are simply ordinary datums.

Release note (sql change): The inverted index implementation now
supports indexing array columns. This permits accelerating containment
queries (@> and <@) on array columns by adding an index to them.
  • Loading branch information
jordanlewis committed Mar 2, 2020
1 parent 35498f0 commit f3144aa
Show file tree
Hide file tree
Showing 12 changed files with 607 additions and 89 deletions.
6 changes: 4 additions & 2 deletions docs/generated/sql/functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -1095,8 +1095,6 @@ SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>
<tr><td><a name="crdb_internal.get_zone_config"></a><code>crdb_internal.get_zone_config(namespace_id: <a href="int.html">int</a>) &rarr; <a href="bytes.html">bytes</a></code></td><td></td></tr>
<tr><td><a name="crdb_internal.is_admin"></a><code>crdb_internal.is_admin() &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>Retrieves the current user’s admin status.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.json_num_index_entries"></a><code>crdb_internal.json_num_index_entries(val: jsonb) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.lease_holder"></a><code>crdb_internal.lease_holder(key: <a href="bytes.html">bytes</a>) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used to fetch the leaseholder corresponding to a request key</p>
</span></td></tr>
<tr><td><a name="crdb_internal.locality_value"></a><code>crdb_internal.locality_value(key: <a href="string.html">string</a>) &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>Returns the value of the specified locality key.</p>
Expand All @@ -1105,6 +1103,10 @@ SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>
</span></td></tr>
<tr><td><a name="crdb_internal.node_executable_version"></a><code>crdb_internal.node_executable_version() &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>Returns the version of CockroachDB this node is running.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.num_inverted_index_entries"></a><code>crdb_internal.num_inverted_index_entries(val: anyelement[]) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.num_inverted_index_entries"></a><code>crdb_internal.num_inverted_index_entries(val: jsonb) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.pretty_key"></a><code>crdb_internal.pretty_key(raw_key: <a href="bytes.html">bytes</a>, skip_fields: <a href="int.html">int</a>) &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>This function is used only by CockroachDB’s developers for testing purposes.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.range_stats"></a><code>crdb_internal.range_stats(key: <a href="bytes.html">bytes</a>) &rarr; jsonb</code></td><td><span class="funcdesc"><p>This function is used to retrieve range statistics information as a JSON object.</p>
Expand Down
18 changes: 13 additions & 5 deletions pkg/cli/testdata/dump/inverted_index
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,16 @@ CREATE DATABASE d;
CREATE TABLE d.t (
a JSON,
b JSON,
INVERTED INDEX idx (a)
c INT[],
d INT[],
INVERTED INDEX idx (a),
INVERTED INDEX idx3 (c)
);

CREATE INVERTED INDEX idx2 ON d.t (b);
CREATE INVERTED INDEX idx4 ON d.t (d);

INSERT INTO d.t VALUES ('{"a": "b"}', '{"c": "d"}');
INSERT INTO d.t VALUES ('{"a": "b"}', '{"c": "d"}', ARRAY[1], ARRAY[2]);
----
INSERT 1

Expand All @@ -18,12 +22,16 @@ dump d t
CREATE TABLE t (
a JSONB NULL,
b JSONB NULL,
c INT8[] NULL,
d INT8[] NULL,
INVERTED INDEX idx (a),
INVERTED INDEX idx3 (c),
INVERTED INDEX idx2 (b),
FAMILY "primary" (a, b, rowid)
INVERTED INDEX idx4 (d),
FAMILY "primary" (a, b, c, d, rowid)
);

INSERT INTO t (a, b) VALUES
('{"a": "b"}', '{"c": "d"}');
INSERT INTO t (a, b, c, d) VALUES
('{"a": "b"}', '{"c": "d"}', ARRAY[1], ARRAY[2]);
----
----
2 changes: 1 addition & 1 deletion pkg/sql/backfill.go
Original file line number Diff line number Diff line change
Expand Up @@ -1243,7 +1243,7 @@ func (sc *SchemaChanger) validateInvertedIndexes(
row, err := ie.QueryRowEx(ctx, "verify-inverted-idx-count", txn,
sqlbase.InternalExecutorSessionDataOverride{},
fmt.Sprintf(
`SELECT coalesce(sum_int(crdb_internal.json_num_index_entries(%s)), 0) FROM [%d AS t]`,
`SELECT coalesce(sum_int(crdb_internal.num_inverted_index_entries(%s)), 0) FROM [%d AS t]`,
col, tableDesc.ID,
),
)
Expand Down
106 changes: 106 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/inverted_index
Original file line number Diff line number Diff line change
Expand Up @@ -694,3 +694,109 @@ CREATE INVERTED INDEX ON table_with_nulls (a)

statement ok
DROP TABLE table_with_nulls

statement ok
DROP TABLE c

subtest arrays

statement ok
CREATE TABLE c (
id INT PRIMARY KEY,
foo INT[],
bar STRING[],
INVERTED INDEX (foo),
FAMILY "primary" (id, foo, bar)
)

statement ok
INSERT INTO c VALUES(0, NULL, NULL)

statement ok
INSERT INTO c VALUES(1, ARRAY[], ARRAY['foo', 'bar', 'baz'])

statement ok
CREATE INDEX ON c USING GIN (bar)

query TT
SHOW CREATE TABLE c
----
c CREATE TABLE c (
id INT8 NOT NULL,
foo INT8[] NULL,
bar STRING[] NULL,
CONSTRAINT "primary" PRIMARY KEY (id ASC),
INVERTED INDEX c_foo_idx (foo),
INVERTED INDEX c_bar_idx (bar),
FAMILY "primary" (id, foo, bar)
)

query ITT
SELECT * from c WHERE bar @> ARRAY['foo']
----
1 {} {foo,bar,baz}

query ITT
SELECT * from c WHERE bar @> ARRAY['bar', 'baz']
----
1 {} {foo,bar,baz}

query ITT
SELECT * from c WHERE bar @> ARRAY['bar', 'qux']
----

statement ok
INSERT INTO c VALUES(2, NULL, NULL)

statement ok
INSERT INTO c VALUES(3, ARRAY[0,1,NULL], ARRAY['a',NULL,'b',NULL])

statement ok
INSERT INTO c VALUES(4, ARRAY[1,2,3], ARRAY['b',NULL,'c'])

statement ok
INSERT INTO c VALUES(5, ARRAY[], ARRAY[NULL, NULL])

query ITT
SELECT * FROM c WHERE foo @> ARRAY[0]
----
3 {0,1,NULL} {a,NULL,b,NULL}

query error unsupported comparison operator
SELECT * FROM c WHERE foo @> 0

query ITT
SELECT * FROM c WHERE foo @> ARRAY[1]
----
3 {0,1,NULL} {a,NULL,b,NULL}
4 {1,2,3} {b,NULL,c}

# This is expected, although it looks odd, because in SQL,
# ARRAY[NULL] @> ARRAY[NULL] returns false.
query ITT
SELECT * FROM c WHERE foo @> ARRAY[NULL]::INT[]
----

query ITT
SELECT * FROM c WHERE bar @> ARRAY['a']
----
3 {0,1,NULL} {a,NULL,b,NULL}

query ITT
SELECT * FROM c WHERE bar @> ARRAY['b']
----
3 {0,1,NULL} {a,NULL,b,NULL}
4 {1,2,3} {b,NULL,c}

query ITT
SELECT * FROM c WHERE bar @> ARRAY['c']
----
4 {1,2,3} {b,NULL,c}

query ITT
SELECT * FROM c WHERE bar @> ARRAY[]::TEXT[]
----
1 {} {foo,bar,baz}
3 {0,1,NULL} {a,NULL,b,NULL}
4 {1,2,3} {b,NULL,c}
5 {} {NULL,NULL}
2 changes: 1 addition & 1 deletion pkg/sql/opt/cat/index.go
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ type Index interface {
// IsUnique returns true if this index is declared as UNIQUE in the schema.
IsUnique() bool

// IsInverted returns true if this is a JSON inverted index.
// IsInverted returns true if this is an inverted index.
IsInverted() bool

// ColumnCount returns the number of columns in the index. This includes
Expand Down
157 changes: 157 additions & 0 deletions pkg/sql/opt/exec/execbuilder/testdata/inverted_index
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,27 @@ CREATE TABLE d (
FAMILY (a,b)
)

statement ok
CREATE TABLE e (
a INT PRIMARY KEY,
b INT[],
FAMILY (a,b),
INVERTED INDEX(b)
)

statement ok
CREATE TABLE f (
a INT PRIMARY KEY,
b DECIMAL[],
FAMILY (a,b),
INVERTED INDEX(b)
)

statement ok
CREATE INVERTED INDEX foo_inv ON d(b)

# Tests for json inverted indexes.

query T kvtrace
INSERT INTO d VALUES(0, '{"a": "b"}')
----
Expand Down Expand Up @@ -46,6 +64,52 @@ INSERT INTO d VALUES(4, NULL)
----
CPut /Table/53/1/4/0 -> /TUPLE/

# Tests for array inverted indexes.

# Make sure that duplicate entries do not get emitted more than once.
query T kvtrace
INSERT INTO e VALUES(0, ARRAY[7,0,0,1,10,0,1,7])
----
CPut /Table/54/1/0/0 -> /TUPLE/
InitPut /Table/54/2/0/0/0 -> /BYTES/
InitPut /Table/54/2/1/0/0 -> /BYTES/
InitPut /Table/54/2/7/0/0 -> /BYTES/
InitPut /Table/54/2/10/0/0 -> /BYTES/

# Make sure that empty arrays do not emit any keys at all.
query T kvtrace
INSERT INTO e VALUES(1, ARRAY[])
----
CPut /Table/54/1/1/0 -> /TUPLE/

# Make sure that NULL arrays do not emit any keys at all.
query T kvtrace
INSERT INTO e VALUES(2, NULL)
----
CPut /Table/54/1/2/0 -> /TUPLE/

# Test that array inverted indexes work okay with decimals (a type with
# composite encoding).

query T kvtrace
INSERT INTO f VALUES(0, ARRAY[7,0,0,1,10,0,1,7])
----
CPut /Table/55/1/0/0 -> /TUPLE/
InitPut /Table/55/2/0/0/0 -> /BYTES/
InitPut /Table/55/2/1/0/0 -> /BYTES/
InitPut /Table/55/2/7/0/0 -> /BYTES/
InitPut /Table/55/2/1E+1/0/0 -> /BYTES/

query T kvtrace
INSERT INTO f VALUES(1, ARRAY[])
----
CPut /Table/55/1/1/0 -> /TUPLE/

query T kvtrace
INSERT INTO f VALUES(2, NULL)
----
CPut /Table/55/1/2/0 -> /TUPLE/

query TTTTT
EXPLAIN (VERBOSE) SELECT * from d where b @>'{"a": "b"}'
----
Expand Down Expand Up @@ -368,3 +432,96 @@ scan · · (a, b) ·
· table d@primary · ·
· spans ALL · ·
· filter b @> '{"a": {}, "b": {}}' · ·

subtest array

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[1]
----
· distributed false · ·
· vectorized false · ·
index-join · · (a, b) ·
│ table e@primary · ·
│ key columns a · ·
└── scan · · (a) ·
· table e@e_b_idx · ·
· spans /1-/2 · ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[]::INT[]
----
· distributed false · ·
· vectorized false · ·
scan · · (a, b) ·
· table e@primary · ·
· spans ALL · ·
· filter b @> ARRAY[] · ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[NULL]::INT[]
----
· distributed false · ·
· vectorized false · ·
index-join · · (a, b) ·
│ table e@primary · ·
│ key columns a · ·
└── norows · · (a) ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> NULL
----
· distributed false · ·
· vectorized false · ·
norows · · (a, b) ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b IS NULL
----
· distributed false · ·
· vectorized false · ·
scan · · (a, b) ·
· table e@primary · ·
· spans ALL · ·
· filter b IS NULL · ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[1,2]
----
· distributed false · ·
· vectorized false · ·
lookup-join · · (a, b) ·
│ table e@primary · ·
│ type inner · ·
│ equality (a) = (a) · ·
│ equality cols are key · · ·
│ parallel · · ·
│ pred @2 @> ARRAY[1,2] · ·
└── zigzag-join · · (a) ·
│ type inner · ·
├── scan · · (a) ·
│ table e@e_b_idx · ·
│ fixedvals 1 column · ·
└── scan · · () ·
· table e@e_b_idx · ·
· fixedvals 1 column · ·

query TTTTT
EXPLAIN (VERBOSE) SELECT * from e where b @> ARRAY[1] AND b @> ARRAY[2]
----
· distributed false · ·
· vectorized false · ·
lookup-join · · (a, b) ·
│ table e@primary · ·
│ type inner · ·
│ equality (a) = (a) · ·
│ equality cols are key · · ·
│ parallel · · ·
│ pred (@2 @> ARRAY[1]) AND (@2 @> ARRAY[2]) · ·
└── zigzag-join · · (a) ·
│ type inner · ·
├── scan · · (a) ·
│ table e@e_b_idx · ·
│ fixedvals 1 column · ·
└── scan · · () ·
· table e@e_b_idx · ·
· fixedvals 1 column · ·
Loading

0 comments on commit f3144aa

Please sign in to comment.