
limits don't seem to apply directly to the underlying query, causing poor performance #1652

Closed
ric2b opened this issue Nov 17, 2020 · 21 comments · Fixed by #2677

@ric2b

ric2b commented Nov 17, 2020

Environment

  • PostgreSQL version: 12.5 (docker image postgres:12)
  • PostgREST version: 7.0.1 (docker image postgrest/postgrest:v7.0.1)
  • Operating system: Ubuntu 20.10

Description of issue

(My best guess is that it is related to #621, but I might be misdiagnosing the issue)

Nope, see the following comments; the use of CTEs doesn't seem to be related.

Some context

I have the following simple function to do a text search for products and return the rows ordered by similarity (<-> is the distance operator from the pg_trgm extension) between the text query and the concatenated, unaccented brand, name and description fields:

create or replace function search_products(query text) returns setof product as $$
    select *
    from product
    order by query <-> unaccent_concat_brand_name_desc(brand, name, description)::text asc
$$ stable language sql;

(By the way, the reason this is a function is that I don't think it's possible to represent it in PostgREST's query syntax, but please correct me if I'm wrong)

unaccent_concat_brand_name_desc is a function I created because unaccent and concat aren't immutable (for specific reasons I looked into that don't concern me here), which was blocking me from creating an index to help with this query. With that function (marked as immutable) I created the following index:

create index product_trigram_idx on product using gist (unaccent_concat_brand_name_desc(brand, name, description) gist_trgm_ops);
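
(For context, a minimal sketch of what that immutable wrapper could look like; the actual definition isn't shown in this issue:)

-- hypothetical sketch; the real definition may differ
-- marking it immutable (even though unaccent itself is only stable) is what makes the expression indexable
create or replace function unaccent_concat_brand_name_desc(brand text, name text, description text)
returns text as $$
    select unaccent(concat(brand, ' ', name, ' ', description))
$$ immutable language sql;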

With that index I can run the following query in ~100ms:

select store_id, id, brand, name, description
from search_products('potatoes')
where store_id='some_store_id'
limit 10;

I have run the query above with explain analyse and the plan does use the index (plus the performance difference is very noticeable when I remove the index).
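
(That is, simply:)

explain analyse
select store_id, id, brand, name, description
from search_products('potatoes')
where store_id='some_store_id'
limit 10;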

The issue

I'm trying to get equivalent results from PostgREST, so I call /rpc/search_products?query=potatoes&store_id=eq.some_store_id&select=store_id,id,brand,name,description&limit=10.

However, this takes nearly 2 seconds to run. This is the query that actually hits PostgreSQL:

db_1          | 2020-11-17 11:37:14.524 UTC [31] LOG:  execute 0: BEGIN ISOLATION LEVEL READ COMMITTED READ ONLY
db_1          | 2020-11-17 11:37:14.525 UTC [31] LOG:  statement: SET LOCAL <<<connection setup stuff, includes user agent, referrer, etc. redacted>>>
db_1          | 2020-11-17 11:37:14.526 UTC [31] LOG:  execute 8: 
db_1          | 	      WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ) SELECT "search_products"."brand", "search_products"."description", "search_products"."id", "search_products"."name", "search_products"."store_id" FROM "public"."search_products"("query" := (SELECT "query" FROM pgrst_args LIMIT 1)))
db_1          | 	      
db_1          | 	      SELECT
db_1          | 	        null::bigint AS total_result_set,
db_1          | 	        pg_catalog.count(_postgrest_t) AS page_total,
db_1          | 	        coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
db_1          | 	        coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
db_1          | 	      FROM (SELECT "pgrst_source"."store_id", "pgrst_source"."id", "pgrst_source"."brand", "pgrst_source"."name", "pgrst_source"."description" FROM "pgrst_source"  WHERE  "pgrst_source"."store_id" = 'some_store_id'::unknown   LIMIT 10 OFFSET 0) _postgrest_t;
db_1          | 2020-11-17 11:37:14.526 UTC [31] DETAIL:  parameters: $1 = '{"query":"potatoes"}'
db_1          | 2020-11-17 11:37:15.941 UTC [31] LOG:  execute 7: COMMIT

My guess is that the CTE for pgrst_source is acting as an optimization fence and preventing the outer LIMIT 10 from working efficiently.

Is there a way to make the limit on the PostgREST query behave the same as in the SQL example above? I would prefer not to hard-code it into the search_products function; that does fix the issue, but it's much less flexible.
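
(For context, the hard-coded version that does fix it would be something like this sketch:)

-- inflexible workaround: bake the limit into the function body itself
create or replace function search_products(query text) returns setof product as $$
    select *
    from product
    order by query <-> unaccent_concat_brand_name_desc(brand, name, description)::text asc
    limit 10
$$ stable language sql;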

@wolfgangwalther
Member

My guess is that the CTE for pgrst_source is acting as an optimization fence and preventing the outer LIMIT 10 from working efficiently.

Can you try running the query that is used by PostgREST manually and replace WITH pgrst_source AS with WITH pgrst_source AS NOT MATERIALIZED? Does that change anything?
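
i.e. only the opening keyword changes, roughly like this (the rest of the query stays as logged above):

WITH pgrst_source AS NOT MATERIALIZED (WITH pgrst_payload AS (SELECT $1::json AS json_data), ...)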

$1 should be something like '{"query":"potatoes"}', I think.

@ric2b
Author

ric2b commented Nov 18, 2020

Thanks for the suggestion and for helping! :)

Unfortunately it takes the same amount of time, no improvement.

I did iteratively simplify that CTE and I found the culprit, but I don't understand it. The plan is different depending on whether query is assigned a literal or the result of a subquery.

This runs slow (~1.5s):

SELECT 
    "search_products"."brand", 
    "search_products"."description", 
    "search_products"."id", 
    "search_products"."name", 
    "search_products"."store_id" 
FROM "public"."search_products"("query" := (select 'potatoes'))
LIMIT 10

plan:

Limit  (cost=0.26..0.36 rows=10 width=160) (actual time=1380.621..1380.624 rows=10 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Function Scan on search_products  (cost=0.25..10.25 rows=1000 width=160) (actual time=1380.620..1380.622 rows=10 loops=1)
Planning Time: 0.028 ms
Execution Time: 1385.000 ms

But this runs quickly (~140ms):

SELECT 
    "search_products"."brand", 
    "search_products"."description", 
    "search_products"."id", 
    "search_products"."name", 
    "search_products"."store_id" 
FROM "public"."search_products"("query" := 'potatoes')
LIMIT 10

plan:

Limit  (cost=0.41..4.95 rows=10 width=76) (actual time=138.028..138.111 rows=10 loops=1)
  ->  Subquery Scan on search_products  (cost=0.41..233066.69 rows=513251 width=76) (actual time=138.027..138.109 rows=10 loops=1)
        ->  Index Scan using product_trigram_idx on product  (cost=0.41..227934.18 rows=513251 width=80) (actual time=138.026..138.107 rows=10 loops=1)
              Order By: (unaccent_concat_brand_name_desc(brand, name, description) <-> 'potatoes'::text)
Planning Time: 0.099 ms
Execution Time: 141.202 ms

@wolfgangwalther
Member

Just to confirm: Is the performance still good when you apply the same change (just the replacement of select -> constant in the argument list) in the big query you started with?

@ric2b
Copy link
Author

ric2b commented Nov 18, 2020

Yup, if I take the entire PostgREST query and make that single change from "query" := (SELECT "query" FROM pgrst_args LIMIT 1) to "query" := 'potatoes', it runs in ~140ms.

If I replace it with "query" := (select 'potatoes') instead, it still runs slow.

edit: Tried casting to see if it helped ("query" := (select 'potatoes')::text) but it didn't.

@wolfgangwalther
Member

Ok, great. We might be close. We have this:

, if paramsAsMultipleObjects
    then fmtArgs varadicPrefix (\a -> " := pgrst_args." <> pgFmtIdent (pgaName a))
    else fmtArgs varadicPrefix (\a -> " := (SELECT " <> pgFmtIdent (pgaName a) <> " FROM pgrst_args LIMIT 1)")

So we should try the other branch now.

Can you do a POST request with a Prefer: params=multiple-objects header and the query in an array json body (http://postgrest.org/en/latest/api.html#bulk-call), just to trigger that other branch there?

@ric2b
Author

ric2b commented Nov 19, 2020

Ok, I think we're almost there, it now emits a query similar to this (wrapped with all the other stuff):

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args, 
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args;

Which runs slow (~3.7s) because it doesn't include a limit, but if I add a limit 10 it runs in about 130ms.

So I think all that's missing is how to improve the placement of the limit. Not sure if that's a problem with the request I'm doing, which looks like this:

await fetch(`${base_url}/rpc/search_products`,
        {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Prefer': 'params=multiple-objects',
                'Range-Unit': 'items',
                'Range': '0-9',
            },
            body: JSON.stringify({"query": "potatoes"}),
        },
)

The full version of the PostgREST query, in case it helps:

WITH pgrst_source AS (
    WITH 
        pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
        pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
        pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ) 
    SELECT pgrst_lat_args.* FROM pgrst_args, 
    LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0) _postgrest_t;

@wolfgangwalther
Member

wolfgangwalther commented Nov 19, 2020

Which runs slow (~3.7s) because it doesn't include a limit, but if I add a limit 10 it runs in about 130ms.

Where did you add the limit? On the inner query (SELECT search_products...) or on the outer query (SELECT pgrst_lat_args...)?

So I think all that's missing is how to improve the placement of the limit.

This won't be as easy, because Prefer: count=xxx (http://postgrest.org/en/latest/api.html#exact-count) has to be supported as well, so we need to be able to run a count query without LIMIT on pgrst_source.

The full version of the PostgREST query, in case it helps:

Can you try the WITH pgrst_source AS NOT MATERIALIZED with this new query again?

Maybe it's the CTE that's now acting as the optimization fence.

Please try the following query:

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  )
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0) _postgrest_t;

@ric2b
Author

ric2b commented Nov 19, 2020

Where did you add the limit?

On the outer query, like this:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args, 
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
LIMIT 10;

Please try the following query:

Took the same 3.7s.

@wolfgangwalther
Member

I'm running out of ideas, but I have one more:

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  ),
  pgrst_select AS MATERIALIZED (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM pgrst_select _postgrest_t;

And could you do a comparison between the following two queries, please?

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args, 
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
LIMIT 10;
SELECT * FROM (
  WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
  SELECT pgrst_lat_args.* FROM pgrst_args, 
  LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
) t
LIMIT 10;

@ric2b
Author

ric2b commented Nov 19, 2020

The first one was slow (took ~3.7s) and the last two were equally fast at ~130ms.

The plan for the first one:

Aggregate  (cost=213139.06..213139.08 rows=1 width=80) (actual time=3912.105..3912.107 rows=1 loops=1)
  CTE pgrst_select
    ->  Limit  (cost=213134.48..213138.81 rows=10 width=76) (actual time=3912.063..3912.068 rows=10 loops=1)
          InitPlan 1 (returns $0)
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
          ->  Nested Loop  (cost=213134.46..22468261.88 rows=51325100 width=76) (actual time=3907.284..3907.287 rows=10 loops=1)
                ->  Function Scan on json_to_recordset _  (cost=0.00..1.00 rows=100 width=32) (actual time=0.018..0.019 rows=1 loops=1)
                ->  Sort  (cost=213134.46..214417.59 rows=513251 width=80) (actual time=3907.255..3907.257 rows=10 loops=1)
                      Sort Key: ((_.query <-> unaccent_concat_brand_name_desc(product.brand, product.name, product.description)))
                      Sort Method: external merge  Disk: 46632kB
                      ->  Seq Scan on product  (cost=0.00..141648.39 rows=513251 width=80) (actual time=0.090..3693.209 rows=513251 loops=1)
  ->  CTE Scan on pgrst_select _postgrest_t  (cost=0.00..0.20 rows=10 width=24) (actual time=3912.071..3912.079 rows=10 loops=1)
Planning Time: 0.180 ms
JIT:
  Functions: 11
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.752 ms, Inlining 0.000 ms, Optimization 0.267 ms, Emission 4.345 ms, Total 5.364 ms
Execution Time: 3917.584 ms

The last 2 have the same plan, which is the following:

Limit  (cost=0.41..4.95 rows=10 width=76) (actual time=147.369..147.510 rows=10 loops=1)
  ->  Subquery Scan on search_products  (cost=0.41..233066.69 rows=513251 width=76) (actual time=147.368..147.508 rows=10 loops=1)
        ->  Index Scan using product_trigram_idx on product  (cost=0.41..227934.18 rows=513251 width=80) (actual time=147.367..147.503 rows=10 loops=1)
              Order By: (unaccent_concat_brand_name_desc(brand, name, description) <-> 'potatoes'::text)
Planning Time: 0.106 ms
Execution Time: 147.531 ms

Thank you so much for continuing to help, by the way! 😸

@wolfgangwalther
Member

Ah, I made a mistake with the two short queries. I wanted to write this:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT "search_products".* FROM pgrst_args, 
LATERAL "public"."search_products"("query" := pgrst_args."query")
LIMIT 10;

and

SELECT * FROM (
  WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
) t
LIMIT 10;

If that's still fast, you can run this as well:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT * FROM (
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
) t
LIMIT 10;

And then:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query"),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT * FROM pgrst_source LIMIT 10;

If all of those are still fast - can you try adding the "OFFSET 0" and see whether that slows down any of them?

At some point in this chain the limit is not pushed down.

Ah, and I have one more idea for the bigger query:

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS MATERIALIZED ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  ),
  pgrst_select AS MATERIALIZED (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM pgrst_select _postgrest_t;

Thanks for running all those queries :D

@ric2b
Author

ric2b commented Nov 19, 2020

All 4 of the smaller queries ran in about 120 to 140ms, with or without OFFSET 0.

The bigger query was slow, ~3.8s.

Thanks for running all those queries :D

No, thank you for helping and for working on this great project! :)

@wolfgangwalther
Member

Meh. Still no clue where it breaks :/

How about this?

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  )
SELECT "pgrst_source".* FROM "pgrst_source"
LIMIT 10 OFFSET 0

or this?

WITH
pgrst_args AS (SELECT 'potatoes'::text AS "query"),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0)  _postgrest_t;

@ric2b
Author

ric2b commented Nov 19, 2020

The second one works, runs in ~130ms :)

(The first one takes ~3.8s)

@wolfgangwalther
Member

Ok. I think I understand the problem now. Before I write that up and it turns out my conclusions are wrong... let's test it first.

I expect the following query to run fast:

WITH pgrst_source AS (
  WITH 
    pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
    pgrst_args AS ( SELECT * FROM json_to_record((SELECT json_data FROM pgrst_payload)) AS _("query" text) )
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0) _postgrest_t;

@wolfgangwalther
Member

wolfgangwalther commented Nov 19, 2020

Summarizing the findings of this thread:

  1. The request in the opening post is slow, because the RPC can't be inlined. Without inlining the function call, pushing through the LIMIT clause is not possible. Therefore a full table scan is run, the result is materialized and then 10 rows are returned from that.

  2. Inlining is not possible, because we call the function like this:

SELECT * FROM "public"."search_products"("query" := (SELECT "query" FROM pgrst_args LIMIT 1))

The problem here is the subquery in the argument list. See https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions for details:

A table function call will be inlined if all of the following conditions are met:
[...]
none of the actual arguments contain volatile expressions or subselects
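
A minimal standalone illustration of that rule, reusing the function from this issue:

SELECT * FROM search_products('potatoes');            -- plain argument: can be inlined
SELECT * FROM search_products((SELECT 'potatoes'));   -- subselect argument: cannot be inlined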

  3. To call the function without subqueries, we need to use a LATERAL CROSS JOIN. We already do this for Prefer: params=multiple-objects. The function call then looks like this:
SELECT "search_products".* FROM pgrst_args, 
LATERAL "public"."search_products"("query" := pgrst_args."query")

This call will be inlined.

We can change the query for the regular case (no params=multiple-objects) to use the same syntax. However, in the current implementation that has a performance penalty. It seems like this syntax was used in the past and then changed, and the QueryCost spec was implemented to avoid a regression:

it "should not exceed cost when calling setof composite proc" $ do
cost <- exec pool [str| {"id": 3} |] $
requestToCallProcQuery (QualifiedIdentifier "test" "get_projects_below") [PgArg "id" "int" True False] False Nothing []
liftIO $
cost `shouldSatisfy` (< Just 40)

With the change to use LATERAL for all calls, the query cost is up to 2001 for me (the expected value for the params=multiple-objects case).

Below is my best guess of what happens next, assuming the query in my last post worked as expected.

  4. This performance drop happens because, even when the call is inlined, the limit clause is still not pushed down into the resulting subquery. The planner makes a bad choice about the following part of the query:
WITH
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) )
SELECT "search_products".* FROM pgrst_args, 
LATERAL "public"."search_products"("query" := pgrst_args."query")

json_to_recordset returns setof record, just like our RPC function. Both come with fixed planner row estimates (the plans above show rows=100 for json_to_recordset and rows=1000 for the non-inlined search_products). This results in a nested loop plan that scans the full product table.
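
Those estimates are easy to see in isolation (output abridged; the rows numbers match the plans earlier in this thread):

EXPLAIN SELECT * FROM json_to_recordset('[{"query": "potatoes"}]'::json) AS _("query" text);
-- Function Scan on json_to_recordset _  (... rows=100 ...)

EXPLAIN SELECT * FROM search_products((SELECT 'potatoes'));
-- not inlined, so: Function Scan on search_products  (... rows=1000 ...)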

It should be possible to get a better choice here, by setting ROWS on the RPC function to a high value.

But for the "no-multiple-objects" case we can already do better without that, by avoiding the round trip of converting our json object to an array and then back to a single record here:

pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) )

Instead we can replace that with just a straight json_to_record, which returns a single record and not a recordset. The planner should recognize that. The query would look like this:

pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
pgrst_args AS ( SELECT * FROM json_to_record((SELECT json_data FROM pgrst_payload)) AS _("query" text) )

This should allow us to properly inline these kinds of functions and get good execution plans while doing so.

@wolfgangwalther
Member

wolfgangwalther commented Nov 19, 2020

@ric2b, for right now you could also test the following:

Redefine your RPC like this:

create or replace function search_products(query text) returns setof product as $$
    select *
    from product
    order by query <-> unaccent_concat_brand_name_desc(brand, name, description)::text asc
$$ stable language sql rows 2000; -- might have to play with the number and make it higher

Make your request with Prefer: params=multiple-objects, similar to what you did above:

await fetch(`${base_url}/rpc/search_products`,
        {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Prefer': 'params=multiple-objects',
                'Range-Unit': 'items',
                'Range': '0-9',
            },
            // note that I changed the payload to an array of 1 object (it was just an object before)
            body: JSON.stringify([{"query": "potatoes"}]),
        },
)

Does that perform better?

@ric2b
Author

ric2b commented Nov 19, 2020

Before I write that up and it turns out my conclusions are wrong... let's test it first.

Sorry for the delay :(

I expect the following query to run fast:

It did not :/

Here's the plan for it:

Aggregate  (cost=213134.95..213134.97 rows=1 width=80) (actual time=3652.377..3652.378 rows=1 loops=1)
  ->  Subquery Scan on _postgrest_t  (cost=213134.47..213134.90 rows=10 width=100) (actual time=3652.346..3652.352 rows=10 loops=1)
        ->  Limit  (cost=213134.47..213134.80 rows=10 width=76) (actual time=3652.342..3652.345 rows=10 loops=1)
              ->  Nested Loop  (cost=213134.47..224682.63 rows=513251 width=76) (actual time=3647.653..3647.656 rows=10 loops=1)
                    InitPlan 1 (returns $0)
                      ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Function Scan on json_to_record _  (cost=0.00..0.01 rows=1 width=32) (actual time=0.016..0.016 rows=1 loops=1)
                    ->  Sort  (cost=213134.46..214417.59 rows=513251 width=80) (actual time=3647.626..3647.628 rows=10 loops=1)
                          Sort Key: ((_.query <-> unaccent_concat_brand_name_desc(product.brand, product.name, product.description)))
                          Sort Method: external merge  Disk: 46632kB
                          ->  Seq Scan on product  (cost=0.00..141648.39 rows=513251 width=80) (actual time=0.093..3446.307 rows=513251 loops=1)
Planning Time: 0.159 ms
JIT:
  Functions: 11
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.811 ms, Inlining 0.000 ms, Optimization 0.246 ms, Emission 4.267 ms, Total 5.324 ms
Execution Time: 3657.902 ms

Does that perform better?

Unfortunately not. I tried increasing the rows setting multiple times but it didn't seem to make a difference; I went as high as:

$$ stable language sql rows 2000000000;

@wolfgangwalther
Member

Hm. So the planner knows that json_to_record returns 1 row (Function Scan on json_to_record _ (cost=0.00..0.01 rows=1 width=32) (actual time=0.016..0.016 rows=1 loops=1)), but is still not able to push the limit into the inlined subquery.

If I'm not mistaken this is actually an optimization that should be possible (at least in theory, if not for pg right now). I will test a couple of things and might try to get some feedback from "upstream".

@wolfgangwalther
Member

So I have a test case set up that's similar in some aspects but different in others. I can show nicely that the subquery in the argument list prevents inlining and causes a dramatic performance drop. However, all the queries I suggested afterwards are very fast with my test case. ;)

So there must be something more specific to your use case. I didn't use the pg_trgm extension or a gist index, just a lot of integers and a btree. I guess I will have to create some strings next. :)

@ric2b
Author

ric2b commented Nov 23, 2020

I think it shouldn't make a difference, but the product table happens to be a materialized view; it's just a UNION ALL of 3 tables. Mentioning it just in case it makes the planner behave differently.
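
Roughly this shape, with hypothetical names (the real source tables and column lists aren't shown here):

create materialized view product as
select * from products_source_a
union all
select * from products_source_b
union all
select * from products_source_c;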
