Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change FTS operators to explicitly call to_tsvector() #2255

Open
dsyrstad opened this issue Apr 20, 2022 · 11 comments
Open

Change FTS operators to explicitly call to_tsvector() #2255

dsyrstad opened this issue Apr 20, 2022 · 11 comments
Labels
enhancement a feature, ready for implementation

Comments

@dsyrstad
Copy link

Environment

  • PostgreSQL version: PostgreSQL 13.6 (Ubuntu 13.6-0ubuntu0.21.10.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit
  • PostgREST version: postgrest/postgrest:v9.0.0
  • Operating system: Ubuntu 21.10

Description of issue

I cannot perform a full-text search on json or jsonb column types.

Steps to reproduce

  1. Perform a query such as GET /contacts?select=jsonb_field&jsonb_field_1=fts.jsonvalue

Actual result

{
    "hint": "No operator matches the given name and argument types. You might need to add explicit type casts.",
    "details": null,
    "code": "42883",
    "message": "operator does not exist: jsonb @@ tsquery"
}

The query produced contains the SQL:

WHERE  "public"."contacts"."jsonb_field" @@ to_tsquery($1) 

Expected Result

I'm able to perform a full-text search on a json/jsonb column. There is no workaround with PostgREST at present.

Possible solution

Instead of generating:

WHERE  "public"."contacts"."jsonb_field" @@ to_tsquery($1) 

...generate:

WHERE  to_tsvector("public"."contacts"."jsonb_field") @@ to_tsquery($1) 

I believe this is to be exactly equivalent. According to the doc, to_tsvector() is implicitly called with text:

text @@ tsquery → boolean

Does text string, after implicit invocation of to_tsvector(), match tsquery?

By explicitly calling to_tsvector(), json/jsonb columns would also be handled.

@steve-chavez
Copy link
Member

According to the doc, to_tsvector() is implicitly called with text:

@dsyrstad Hm, if the above is true, then I believe just doing a ->> on a field of the jsonb should work?

Like:

GET /contacts?select=jsonb_field&jsonb_field_1->>field=fts.jsonvalue

@steve-chavez
Copy link
Member

Or do you want to search across all the jsonb value. Hm, I do see that's supported by tsvector:

postgres=# \df to_tsvector
                            List of functions
   Schema   |    Name     | Result data type | Argument data types | Type
------------+-------------+------------------+---------------------+------
 pg_catalog | to_tsvector | tsvector         | json                | func
 pg_catalog | to_tsvector | tsvector         | jsonb               | func
 pg_catalog | to_tsvector | tsvector         | regconfig, json     | func
 pg_catalog | to_tsvector | tsvector         | regconfig, jsonb    | func
 pg_catalog | to_tsvector | tsvector         | regconfig, text     | func
 pg_catalog | to_tsvector | tsvector         | text                | func

@steve-chavez
Copy link
Member

steve-chavez commented Apr 20, 2022

We've discusssed adding modifiers to operators in #1943 (comment). So perhaps we can have:

GET /contacts?jsonb_field_1=to_tsvector.fts.jsonvalue

@steve-chavez steve-chavez added the idea Needs of discussion to become an enhancement, not ready for implementation label Apr 20, 2022
@dsyrstad
Copy link
Author

@steve-chavez Yes, I want to search across all jsonb values. It's pretty convenient.

Having a to_tsvector modifier would work. But the solution I proposed should be exactly equivalent, plus expands the functionality without have to specify anything more in the query.

Curious - could we also get casts with modifiers? You can currently cast in select, but not in filter operators.

@steve-chavez
Copy link
Member

But the solution I proposed should be exactly equivalent, plus expands the functionality without have to specify anything more in the query.

Ah, I see. So just call to_tsvector for all. We'd have to test if that breaks any tests I guess.

Would you like to give it a shot? Relevant file is https://github.com/PostgREST/postgrest/blob/main/src/PostgREST/Query/SqlFragment.hs#L99

Curious - could we also get casts with modifiers?

Hm, we frown upon doing that because any client could invalidate indexes and generate slow queries.

@dsyrstad
Copy link
Author

dsyrstad commented Apr 20, 2022

@steve-chavez Normally I would raise a PR, but in this case I don't have Haskell skills, nor an environment set up to build it on.

@wolfgangwalther wolfgangwalther added enhancement a feature, ready for implementation and removed idea Needs of discussion to become an enhancement, not ready for implementation labels Apr 21, 2022
@wolfgangwalther
Copy link
Member

Kind of similar to what we did in #2145 - making the calls explicit, opens up new possibilities.

@laurenceisla
Copy link
Member

I tested adding to_tsvector to this line:

pgFmtFieldFts op = pgFmtField table fld <> " " <> SQL.sql (ftsOperator op)

Result:

pgFmtFieldFts op = "to_tsvector(" <> pgFmtField table fld <> ")" <> " " <> SQL.sql (ftsOperator op)

This breaks all the fts tests where the column type is tsvector. For instance, one of the failing tests returns this filter in the query (text_search_vector is a tsvector column), which throws an error:

  WHERE  to_tsvector("api"."tsearch"."text_search_vector") @@ to_tsquery($1) 

Can't find a way to determine the type of the column, maybe the schema cache needs to be used here, or perhaps adding the to_tsvector modifier to the fts operator like Steve mentioned is needed?

@steve-chavez
Copy link
Member

Can't find a way to determine the type of the column, maybe the schema cache needs to be used here

Now that tableColumns is easier to get (already inside our Table in the internal cache) I think we could do that.

Only need to be careful to not search for the column for every other filter - we could also use a load test to see if noticeable perf is lost for fts.

@softmarshmallow
Copy link

Is this planned or tracked?

@wolfgangwalther
Copy link
Member

Is this planned or tracked?

It is tracked in this very issue.

If you want to work on it, feel free.

There is no workaround with PostgREST at present.

Did anyone try creating the missing @@ operator for jsonb with a function that does this to_tsvector call explicitly?

Then it should work out of the box, I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement a feature, ready for implementation
Development

No branches or pull requests

5 participants