Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: allow ARRAYs in forward indexes #17154

Closed
justinj opened this issue Jul 20, 2017 · 15 comments · Fixed by #48045 or #91762
Closed

sql: allow ARRAYs in forward indexes #17154

justinj opened this issue Jul 20, 2017 · 15 comments · Fixed by #48045 or #91762
Labels
A-sql-encoding Relating to the SQL/KV encoding. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community X-anchored-telemetry The issue number is anchored by telemetry references.

Comments

@justinj
Copy link
Contributor

justinj commented Jul 20, 2017

Arrays support inverted indexes, but not forward indexes. If you want to index arrays, starting in CockroachDB 20.1 consider using an inverted index on your column.

This issue tracks adding forward indexes to array types.

Jira issue: CRDB-6049

@bra-fsn
Copy link

bra-fsn commented Aug 18, 2017

You should definitely implement it. :)

@dianasaur323 dianasaur323 added the O-community Originated from the community label Aug 18, 2017
@bra-fsn
Copy link

bra-fsn commented Mar 13, 2018

Any chance that this will be implemented?

@justinj
Copy link
Contributor Author

justinj commented Mar 13, 2018

It's not currently planned for a release, it's on my to-do list to look into when I get some free time - can't promise anything at the moment though.

Out of curiosity, do you have any details on what you would use this for?

@bra-fsn
Copy link

bra-fsn commented Mar 13, 2018

Sure. Currently I have two use cases:

  1. tagging messages with labels (like important, later etc) and searching for messages which have a label or labels. Labels here are mostly strings or enums (a limited set which can grow in time).
  2. tagging objects with location IDs. There can be a limited (but variable in the range of 1 to tens) number of locations for each object. Some have only one, some more. The location IDs here are numbers (integers). I want to search for objects which live on a given location ID. There can be millions-billions of them, so I want to page on the result set (of course maintaining consistency, so between the start and the end of the search, the result set should be consistent, fixed)

@knz knz added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 27, 2018
@knz knz added A-bulkio-schema-changes A-sql-encoding Relating to the SQL/KV encoding. and removed A-bulkio-schema-changes labels Apr 27, 2018
@ckome
Copy link

ckome commented May 6, 2018

You should definitely implement it. :)

@mzimmerman
Copy link

My use case for this is storing all email addresses associated with each email I load into the DB. I want to be able to search for all messages associated with one email address.

@knz knz added the X-anchored-telemetry The issue number is anchored by telemetry references. label Mar 14, 2019
@jordanlewis
Copy link
Member

Arrays are now indexable via inverted indexes, which supports containment operations on the arrays. We still don't support ordinary forward indexes on arrays.

@jordanlewis jordanlewis changed the title sql: allow ARRAYs in indexes sql: allow ARRAYs in forward indexes Apr 1, 2020
rohany added a commit to rohany/cockroach that referenced this issue Apr 26, 2020
…e types

Fixes cockroachdb#17154.
Fixes cockroachdb#35707.

This PR enables arrays to be ordered and indexed by
introducing an ordered key encoding for arrays.
Once this exists, the rest of the SQL infrastructure
is ready to handle indexing and ordering on arrays.

To encode an array of elements `ARRAY[a, b]`,
we create the following encoding.

Let `AM` = a marker byte for arrays, let `EM` be
an "array element" marker, and let `AT` be a terminator byte.

`enc(ARRAY[a, b]) = [AM, EM, enc(a), EM, enc(b), AT]`

The key is that the terminator is less than the element marker.
This allows for the "prefix matching" style comparison that
arrays support.

Release note (sql change): This PR adds support for indexing
and ordering of arrays of indexable and orderable inner types.
@rohany
Copy link
Contributor

rohany commented Apr 28, 2020

Yup, I've marked that PR as closing this (#48045)

rohany added a commit to rohany/cockroach that referenced this issue Apr 29, 2020
…e types

Fixes cockroachdb#17154.
Fixes cockroachdb#35707.

This PR enables arrays to be ordered and indexed by
introducing an ordered key encoding for arrays.
Once this exists, the rest of the SQL infrastructure
is ready to handle indexing and ordering on arrays.

To encode an array of elements `ARRAY[a, b]`,
we create the following encoding.

Let `AM` = a marker byte for arrays and let `AT` be a terminator byte.

`enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]`

The key is that the terminator is less than the element marker.
This allows for the "prefix matching" style comparison that
arrays support.

Release note (sql change): This PR adds support for indexing
and ordering of arrays of indexable and orderable inner types.
craig bot pushed a commit that referenced this issue May 9, 2020
48045: sql: enable indexing and ordering on arrays of orderable and indexable types r=rohany a=rohany

Fixes #17154.
Fixes #35707.

This PR enables arrays to be ordered and indexed by
introducing an ordered key encoding for arrays.
Once this exists, the rest of the SQL infrastructure
is ready to handle indexing and ordering on arrays.

To encode an array of elements `ARRAY[a, b]`,
we create the following encoding.

Let `AM` = a marker byte for arrays and let `AT` be a terminator byte.

`enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]`

The key is that the terminator is less than the element marker.
This allows for the "prefix matching" style comparison that
arrays support.

Release note (sql change): This PR adds support for indexing
and ordering of arrays of indexable and orderable inner types.


48608: schemachange: speed up slow schema changes r=spaskob a=spaskob

Touches #45150.
Fixes #47607.
Touches #47790.

Release note (performance improvement):
Before this a simple schema change could take 30s+.
The reason was that if the schema change is not first
in line in the table mutation queue it would return a
re-triable error and the jobs framework will re-adopt and
run it later. The problem is that the job adoption loop
is 30s.

To repro run this for some time:
```
cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);'
```

Instead of returning on re-triable errors we retry with exponential
backoff in the schema change code. This pattern of dealing with
re-triable errors in client job code is encouraged vs relying on the
registry because the latter leads to slowness and additionally to more
complicated test fixtures that rely on hacking with the internals of the
job registry,

Co-authored-by: Rohan Yadav <[email protected]>
Co-authored-by: Spas Bojanov <[email protected]>
@craig craig bot closed this as completed in 60cb535 May 9, 2020
@maddyblue maddyblue reopened this Jun 25, 2020
craig bot pushed a commit that referenced this issue Jun 29, 2020
50662: sql: disable arrays in forward indexes r=mjibson a=mjibson

There are some open questions about inverted index types being used in
forward indexes. Disable those for now.

Fixes #50656
See #50659
See #17154

Release note (sql change): disable arrays in non-inverted indexes.

50742: sql: granular detection of non-immutable constant casts r=RaduBerinde a=RaduBerinde

#### opt: improve tests for parsing timestamps

Add more `sem/tree` unit tests for parsing time-related types. In
particular:
 - relative timestamps ('now', 'tomorrow')
 - TZ variants
 - timestamps that include timezones for DDate, DTime
 - cases where the result depends on the current time and/or timezone.

In a future change, the parsing functions will also return a flag
indicating if the result depends on the context or not; these tests
will be updated to check that flag as well.

Release note: None

#### pgdate: add WithoutTimezone variants

The way we parse DTimestamp (and other non-TZ variants) is convoluted:
we initially parse it just like a DTimestampTZ, and then we strip away
the timezone (adjusting it so the timestamp looks the same as before,
except with UTC location). This means that we incorporate the local
timezone, even though it is inconsequential.

This change adds WithoutTimezone API variants to `pgdate` which avoids
using the local timezone. This will allow extending the parsing APIs
to indicate whether the result depends on the current time or
timezone.

Release note: None

#### sql: use dummy ParseTimeContext during type-checking

We are not supposed to use any context-dependent parsing results
during type-checking. As such, use a dummy ParseTimeContext instead of
having SemaContext implement the interface.

Release note: None

#### sql: granular detection of non-immutable constant casts

Interpreting a timestamp (and similar types) can depend on the current
timezone or even the current time. A recent change prevented these
casts from happening during type-checking.

Unfortunately, this includes a lot of common cases which aren't
actually context-dependent. The most important are dates and
timestamps without time zone (except rare cases like 'now' or
'tomorrow'). In these cases, the type-checked expressions won't seem
constant and won't be able to be used for partitioning, index
predicates, etc.

This change improves the parsing functions to return a
`dependsOnContext` boolean. When this flag is false, the result is
immutable and is replaced during type-checking.

The volatility of the version of date_trunc that returns a TIMESTAMPTZ
is corrected to Stable.

Release note: None

Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
@ajwerner
Copy link
Contributor

If and when we allow this we need to be careful of enum values in partitioning and how it interacts with dropping enum values.

@saurik
Copy link

saurik commented Jan 5, 2022

Out of curiosity, do you have any details on what you would use this for?

@justinj FWIW, my use case is for storing tree-like data using a path encoding in the primary key (which is the correct way of doing it for the kinds of queries I am going to do). I can (and will) get by using a string-like type--and I haven't yet checked what operates you support for arrays and so if you wouldn't support an indexed "prefix of array matches [0,3,1]" (which I presume you would have to support for strings) it wouldn't really matter anyway--but it feels more "natural" to do this with arrays.

@ajwerner
Copy link
Contributor

This is the cause of prisma/prisma#12282

@jordanlewis
Copy link
Member

If and when we allow this we need to be careful of enum values in partitioning and how it interacts with dropping enum values.

Can you elaborate @ajwerner ?

@ajwerner
Copy link
Contributor

When we drop enum values we go and see if they are used anywhere. There's code to do this for arrays inside rows. There's also code which searches for enum values in partitioning descriptors. There will need to be code added to search for the values inside of enums used as partitioning values, assuming we support that.

Consider a contrived transformation of this test.

CREATE TYPE places AS ENUM ('us', 'eu', 'ap');
CREATE TABLE partitioned_table (
    place places[], id INT8,
    PRIMARY KEY (place, id)
)
    PARTITION BY LIST (place)
        (
            PARTITION us VALUES IN (ARRAY['us']),
            PARTITION eu VALUES IN (ARRAY['eu'])
        );
ALTER TYPE places DROP VALUE 'us';

The above should fail because us is used in a partitioning array. I don't think the logic here would find it.

@jordanlewis
Copy link
Member

Indeed...

F221111 19:09:22.802901 2775 sql/opt_catalog.go:1456  [-] 314  error while decoding partition tuple: tabledesc.immutable: {ID: 106, Version: 1, ModificationTime: "1668193750.882603364,0", ParentID: 100, ParentSchemaID: 101, State: PUBLIC, NextColumnID: 3, Columns: [{ID: 1, TypeID: 100105, Null: false}, {ID: 2, TypeID: 20, Null: false}], NextFamilyID: 1, Families: [{ID: 0, Columns: [1, 2]}], PrimaryIndex: 1, NextIndexID: 2, Indexes: [{ID: 1, Unique: true, KeyColumns: [{ID: 1, Dir: ASC}, {ID: 2, Dir: ASC}]}]} []                                                                                           F221111 19:09:22.802901 2775 sql/opt_catalog.go:1456  [-] 314 !goroutine 2775 [running]:                                         

@craig craig bot closed this as completed in b0daa5f Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-encoding Relating to the SQL/KV encoding. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
None yet