
limits don't seem to apply directly to the underlying query, causing poor performance #1652

Closed
ric2b opened this issue Nov 17, 2020 · 21 comments · Fixed by #2677

@ric2b

ric2b commented Nov 17, 2020

Environment

  • PostgreSQL version: 12.5 (docker image postgres:12)
  • PostgREST version: 7.0.1 (docker image postgrest/postgrest:v7.0.1)
  • Operating system: Ubuntu 20.10

Description of issue

(My best guess is that it is related to #621, but I might be misdiagnosing the issue)

Nope, see the following comments; the use of CTEs doesn't seem to be related.

Some context

I have the following simple function to do a text search for products and return the rows ordered by similarity (<-> is the distance operator from the pg_trgm extension) between the text query and the concatenated, unaccented brand, name and description fields:

create or replace function search_products(query text) returns setof product as $$
    select *
    from product
    order by query <-> unaccent_concat_brand_name_desc(brand, name, description)::text asc
$$ stable language sql;

(By the way, the reason this is a function is that I don't think it's possible to represent it in PostgREST's query syntax, but please correct me if I'm wrong)

unaccent_concat_brand_name_desc is a function I created because unaccent and concat aren't immutable (for specific reasons I looked into that don't concern me here), which was blocking me from creating an index to help with this query. With that function (marked as immutable) I created the following index:

create index product_trigram_idx on product using gist (unaccent_concat_brand_name_desc(brand, name, description) gist_trgm_ops);
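
(For context, a minimal sketch of what that immutable wrapper could look like; the actual definition isn't shown in this issue:)

-- hypothetical sketch; the real definition may differ
-- marking it immutable (even though unaccent itself is only stable) is what makes the expression indexable
create or replace function unaccent_concat_brand_name_desc(brand text, name text, description text)
returns text as $$
    select unaccent(concat(brand, ' ', name, ' ', description))
$$ immutable language sql;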

With that index I can run the following query in ~100ms:

select store_id, id, brand, name, description
from search_products('potatoes')
where store_id='some_store_id'
limit 10;

I have run the query above with explain analyse and the plan does use the index (plus the performance difference is very noticeable when I remove the index).
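
(That is, simply:)

explain analyse
select store_id, id, brand, name, description
from search_products('potatoes')
where store_id='some_store_id'
limit 10;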

The issue

I'm trying to get equivalent results from PostgREST, so I call /rpc/search_products?query=potatoes&store_id=eq.some_store_id&select=store_id,id,brand,name,description&limit=10.

However, this takes nearly 2 seconds to run. This is the query that actually hits PostgreSQL:

db_1          | 2020-11-17 11:37:14.524 UTC [31] LOG:  execute 0: BEGIN ISOLATION LEVEL READ COMMITTED READ ONLY
db_1          | 2020-11-17 11:37:14.525 UTC [31] LOG:  statement: SET LOCAL <<<connection setup stuff, includes user agent, referrer, etc. redacted>>>
db_1          | 2020-11-17 11:37:14.526 UTC [31] LOG:  execute 8: 
db_1          | 	      WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ) SELECT "search_products"."brand", "search_products"."description", "search_products"."id", "search_products"."name", "search_products"."store_id" FROM "public"."search_products"("query" := (SELECT "query" FROM pgrst_args LIMIT 1)))
db_1          | 	      
db_1          | 	      SELECT
db_1          | 	        null::bigint AS total_result_set,
db_1          | 	        pg_catalog.count(_postgrest_t) AS page_total,
db_1          | 	        coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
db_1          | 	        coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
db_1          | 	      FROM (SELECT "pgrst_source"."store_id", "pgrst_source"."id", "pgrst_source"."brand", "pgrst_source"."name", "pgrst_source"."description" FROM "pgrst_source"  WHERE  "pgrst_source"."store_id" = 'some_store_id'::unknown   LIMIT 10 OFFSET 0) _postgrest_t;
db_1          | 2020-11-17 11:37:14.526 UTC [31] DETAIL:  parameters: $1 = '{"query":"potatoes"}'
db_1          | 2020-11-17 11:37:15.941 UTC [31] LOG:  execute 7: COMMIT

My guess is that the CTE for pgrst_source is acting as an optimization fence and preventing the outer LIMIT 10 from working efficiently.

Is there a way to make the limit on the PostgREST query behave the same as in the SQL example above? I would prefer not to hard-code it into the search_products function; that does fix the issue, but it's much less flexible.
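
(For context, the hard-coded version that does fix it would be something like this sketch:)

-- inflexible workaround: bake the limit into the function body itself
create or replace function search_products(query text) returns setof product as $$
    select *
    from product
    order by query <-> unaccent_concat_brand_name_desc(brand, name, description)::text asc
    limit 10
$$ stable language sql;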

@wolfgangwalther
Member

My guess is that the CTE for pgrst_source is acting as an optimization fence and preventing the outer LIMIT 10 from working efficiently.

Can you try running the query that is used by PostgREST manually and replace WITH pgrst_source AS with WITH pgrst_source AS NOT MATERIALIZED? Does that change anything?
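
i.e. only the opening keyword changes, roughly like this (the rest of the query stays as logged above):

WITH pgrst_source AS NOT MATERIALIZED (WITH pgrst_payload AS (SELECT $1::json AS json_data), ...)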

$1 should be something like '{"query":"potatoes"}', I think.

@ric2b
Author

ric2b commented Nov 18, 2020

Thanks for the suggestion and for helping! :)

Unfortunately it takes the same amount of time, no improvement.

I did iteratively simplify that CTE and I found the culprit, but I don't understand it. The plan is different depending on whether query is assigned a literal or the result of a subquery.

This runs slow (~1.5s):

SELECT 
    "search_products"."brand", 
    "search_products"."description", 
    "search_products"."id", 
    "search_products"."name", 
    "search_products"."store_id" 
FROM "public"."search_products"("query" := (select 'potatoes'))
LIMIT 10

plan:

Limit  (cost=0.26..0.36 rows=10 width=160) (actual time=1380.621..1380.624 rows=10 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Function Scan on search_products  (cost=0.25..10.25 rows=1000 width=160) (actual time=1380.620..1380.622 rows=10 loops=1)
Planning Time: 0.028 ms
Execution Time: 1385.000 ms

But this runs quickly (~140ms):

SELECT 
    "search_products"."brand", 
    "search_products"."description", 
    "search_products"."id", 
    "search_products"."name", 
    "search_products"."store_id" 
FROM "public"."search_products"("query" := 'potatoes')
LIMIT 10

plan:

Limit  (cost=0.41..4.95 rows=10 width=76) (actual time=138.028..138.111 rows=10 loops=1)
  ->  Subquery Scan on search_products  (cost=0.41..233066.69 rows=513251 width=76) (actual time=138.027..138.109 rows=10 loops=1)
        ->  Index Scan using product_trigram_idx on product  (cost=0.41..227934.18 rows=513251 width=80) (actual time=138.026..138.107 rows=10 loops=1)
              Order By: (unaccent_concat_brand_name_desc(brand, name, description) <-> 'potatoes'::text)
Planning Time: 0.099 ms
Execution Time: 141.202 ms

@wolfgangwalther
Member

Just to confirm: Is the performance still good when you apply the same change (just the replacement of select -> constant in the argument list) in the big query you started with?

@ric2b
Copy link
Author

ric2b commented Nov 18, 2020

Yup, if I take the entire PostgREST query and make that single change from "query" := (SELECT "query" FROM pgrst_args LIMIT 1) to "query" := 'potatoes', it runs in ~140ms.

If I replace it with "query" := (select 'potatoes') instead, it still runs slow.

edit: Tried casting to see if it helped ("query" := (select 'potatoes')::text) but it didn't.

@wolfgangwalther
Member

Ok, great. We might be close. We have this:

, if paramsAsMultipleObjects
    then fmtArgs varadicPrefix (\a -> " := pgrst_args." <> pgFmtIdent (pgaName a))
    else fmtArgs varadicPrefix (\a -> " := (SELECT " <> pgFmtIdent (pgaName a) <> " FROM pgrst_args LIMIT 1)")

So we should try the other branch now.

Can you do a POST request with a Prefer: params=multiple-objects header and the query in an array json body (http://postgrest.org/en/latest/api.html#bulk-call), just to trigger that other branch there?

@ric2b
Author

ric2b commented Nov 19, 2020

Ok, I think we're almost there, it now emits a query similar to this (wrapped with all the other stuff):

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args, 
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args;

Which runs slow (~3.7s) because it doesn't include a limit, but if I add a limit 10 it runs in about 130ms.

So I think all that's missing is how to improve the placement of the limit. Not sure if that's a problem with the request I'm doing, which looks like this:

await fetch(`${base_url}/rpc/search_products`,
        {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Prefer': 'params=multiple-objects',
                'Range-Unit': 'items',
                'Range': '0-9',
            },
            body: JSON.stringify({"query": "potatoes"}),
        },
)

The full version of the PostgREST query, in case it helps:

WITH pgrst_source AS (
    WITH 
        pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
        pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
        pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ) 
    SELECT pgrst_lat_args.* FROM pgrst_args, 
    LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0) _postgrest_t;

@wolfgangwalther
Member

wolfgangwalther commented Nov 19, 2020

Which runs slow (~3.7s) because it doesn't include a limit, but if I add a limit 10 it runs in about 130ms.

Where did you add the limit? On the inner query (SELECT search_products...) or on the outer query (SELECT pgrst_lat_args...)?

So I think all that's missing is how to improve the placement of the limit.

This won't be as easy, because Prefer: count=xxx (http://postgrest.org/en/latest/api.html#exact-count) has to be supported as well, so we need to be able to run a count query without LIMIT on pgrst_source.

The full version of the PostgREST query, in case it helps:

Can you try the WITH pgrst_source AS NOT MATERIALIZED with this new query again?

Maybe it's the CTE that's now acting as the optimization fence.

Please try the following query:

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  )
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0) _postgrest_t;

@ric2b
Author

ric2b commented Nov 19, 2020

Where did you add the limit?

On the outer query, like this:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args, 
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
LIMIT 10;

Please try the following query:

Took the same 3.7s.

@wolfgangwalther
Member

I'm running out of ideas, but I have one more:

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  ),
  pgrst_select AS MATERIALIZED (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM pgrst_select _postgrest_t;

And could you do a comparison between the following two queries, please?

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args, 
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
LIMIT 10;
SELECT * FROM (
  WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
  SELECT pgrst_lat_args.* FROM pgrst_args, 
  LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
) t
LIMIT 10;

@ric2b
Author

ric2b commented Nov 19, 2020

The first one was slow (took ~3.7s) and the last two were equally fast at ~130ms.

The plan for the first one:

Aggregate  (cost=213139.06..213139.08 rows=1 width=80) (actual time=3912.105..3912.107 rows=1 loops=1)
  CTE pgrst_select
    ->  Limit  (cost=213134.48..213138.81 rows=10 width=76) (actual time=3912.063..3912.068 rows=10 loops=1)
          InitPlan 1 (returns $0)
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
          ->  Nested Loop  (cost=213134.46..22468261.88 rows=51325100 width=76) (actual time=3907.284..3907.287 rows=10 loops=1)
                ->  Function Scan on json_to_recordset _  (cost=0.00..1.00 rows=100 width=32) (actual time=0.018..0.019 rows=1 loops=1)
                ->  Sort  (cost=213134.46..214417.59 rows=513251 width=80) (actual time=3907.255..3907.257 rows=10 loops=1)
                      Sort Key: ((_.query <-> unaccent_concat_brand_name_desc(product.brand, product.name, product.description)))
                      Sort Method: external merge  Disk: 46632kB
                      ->  Seq Scan on product  (cost=0.00..141648.39 rows=513251 width=80) (actual time=0.090..3693.209 rows=513251 loops=1)
  ->  CTE Scan on pgrst_select _postgrest_t  (cost=0.00..0.20 rows=10 width=24) (actual time=3912.071..3912.079 rows=10 loops=1)
Planning Time: 0.180 ms
JIT:
  Functions: 11
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.752 ms, Inlining 0.000 ms, Optimization 0.267 ms, Emission 4.345 ms, Total 5.364 ms
Execution Time: 3917.584 ms

The last 2 have the same plan, which is the following:

Limit  (cost=0.41..4.95 rows=10 width=76) (actual time=147.369..147.510 rows=10 loops=1)
  ->  Subquery Scan on search_products  (cost=0.41..233066.69 rows=513251 width=76) (actual time=147.368..147.508 rows=10 loops=1)
        ->  Index Scan using product_trigram_idx on product  (cost=0.41..227934.18 rows=513251 width=80) (actual time=147.367..147.503 rows=10 loops=1)
              Order By: (unaccent_concat_brand_name_desc(brand, name, description) <-> 'potatoes'::text)
Planning Time: 0.106 ms
Execution Time: 147.531 ms

Thank you so much for continuing to help, by the way! 😸

@wolfgangwalther
Member

Ah, I made a mistake with the two short queries. I wanted to write this:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT "search_products".* FROM pgrst_args, 
LATERAL "public"."search_products"("query" := pgrst_args."query")
LIMIT 10;

and

SELECT * FROM (
  WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
) t
LIMIT 10;

If that's still fast, you can run this as well:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT * FROM (
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
) t
LIMIT 10;

And then:

WITH pgrst_args AS (SELECT 'potatoes'::text AS "query"),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT * FROM pgrst_source LIMIT 10;

If all of those are still fast - can you try adding the "OFFSET 0" and see whether that slows down any of them?

At some point in this chain the limit is not pushed down.

Ah, and I have one more idea for the bigger query:

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS MATERIALIZED ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  ),
  pgrst_select AS MATERIALIZED (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM pgrst_select _postgrest_t;

Thanks for running all those queries :D

@ric2b
Author

ric2b commented Nov 19, 2020

All 4 of the smaller queries ran in about 120 to 140ms, with or without OFFSET 0.

The bigger query was slow, ~3.8s.

Thanks for running all those queries :D

No, thank you for helping and for working on this great project! :)

@wolfgangwalther
Member

Meh. Still no clue where it breaks :/

How about this?

WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
  pgrst_source AS (
    SELECT "search_products".* FROM pgrst_args, 
    LATERAL "public"."search_products"("query" := pgrst_args."query")
  )
SELECT "pgrst_source".* FROM "pgrst_source"
LIMIT 10 OFFSET 0

or this?

WITH
pgrst_args AS (SELECT 'potatoes'::text AS "query"),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0)  _postgrest_t;

@ric2b
Author

ric2b commented Nov 19, 2020

The second one works, runs in ~130ms :)

(The first one takes ~3.8s)

@wolfgangwalther
Member

Ok. I think I understand the problem now. Before I write that up and it turns out my conclusions are wrong... let's test it first.

I expect the following query to run fast:

WITH pgrst_source AS (
  WITH 
    pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
    pgrst_args AS ( SELECT * FROM json_to_record((SELECT json_data FROM pgrst_payload)) AS _("query" text) )
  SELECT "search_products".* FROM pgrst_args, 
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source"    LIMIT 10 OFFSET 0) _postgrest_t;

@wolfgangwalther
Member

wolfgangwalther commented Nov 19, 2020

Summarizing the findings of this thread:

  1. The request in the opening post is slow, because the RPC can't be inlined. Without inlining the function call, pushing through the LIMIT clause is not possible. Therefore a full table scan is run, the result is materialized and then 10 rows are returned from that.

  2. Inlining is not possible, because we call the function like this:

SELECT * FROM "public"."search_products"("query" := (SELECT "query" FROM pgrst_args LIMIT 1))

The problem here is the subquery in the argument list. See https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions for details:

A table function call will be inlined if all of the following conditions are met:
[...]
none of the actual arguments contain volatile expressions or subselects
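
A minimal standalone illustration of that rule, reusing the function from this issue:

SELECT * FROM search_products('potatoes');            -- plain argument: can be inlined
SELECT * FROM search_products((SELECT 'potatoes'));   -- subselect argument: cannot be inlined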

  3. To call the function without subqueries, we need to use a LATERAL CROSS JOIN. We already do this for Prefer: params=multiple-objects. The function call then looks like this:
SELECT "search_products".* FROM pgrst_args, 
LATERAL "public"."search_products"("query" := pgrst_args."query")

This call will be inlined.

We can change the query for the regular case (no params=multiple-objects) to use the same syntax. However, in the current implementation that has a performance penalty. It seems like this syntax was used in the past and then changed, and the QueryCost spec was implemented to avoid a regression:

it "should not exceed cost when calling setof composite proc" $ do
cost <- exec pool [str| {"id": 3} |] $
requestToCallProcQuery (QualifiedIdentifier "test" "get_projects_below") [PgArg "id" "int" True False] False Nothing []
liftIO $
cost `shouldSatisfy` (< Just 40)

With the change to use LATERAL for all calls, the query cost is up to 2001 for me (the expected value for the params=multiple-objects case).

Below is my best guess of what happens next, assuming the query in my last post worked as expected.

  4. This performance drop happens because, even when the call is inlined, the limit clause is still not pushed down into the resulting subquery. The planner makes a bad choice about the following part of the query:
WITH
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) )
SELECT "search_products".* FROM pgrst_args, 
LATERAL "public"."search_products"("query" := pgrst_args."query")

json_to_recordset returns setof record, just like our RPC function. Both come with fixed planner row estimates (the plans above show rows=100 for json_to_recordset and rows=1000 for the non-inlined search_products). This results in a nested loop plan that scans the full product table.
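
Those estimates are easy to see in isolation (output abridged; the rows numbers match the plans earlier in this thread):

EXPLAIN SELECT * FROM json_to_recordset('[{"query": "potatoes"}]'::json) AS _("query" text);
-- Function Scan on json_to_recordset _  (... rows=100 ...)

EXPLAIN SELECT * FROM search_products((SELECT 'potatoes'));
-- not inlined, so: Function Scan on search_products  (... rows=1000 ...)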

It should be possible to get a better choice here, by setting ROWS on the RPC function to a high value.

But for the "no-multiple-objects" case we can already do better without that, by avoiding the round trip of converting our json object to an array and then back to a single record here:

pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload), 
pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) )

Instead we can replace that with just a straight json_to_record, which returns a single record and not a recordset. The planner should recognize that. The query would look like this:

pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data), 
pgrst_args AS ( SELECT * FROM json_to_record((SELECT json_data FROM pgrst_payload)) AS _("query" text) )

This should allow us to properly inline these kinds of functions and get good execution plans while doing so.

@wolfgangwalther
Member

wolfgangwalther commented Nov 19, 2020

@ric2b, for right now you could also test the following:

Redefine your RPC like this:

create or replace function search_products(query text) returns setof product as $$
    select *
    from product
    order by query <-> unaccent_concat_brand_name_desc(brand, name, description)::text asc
$$ stable language sql rows 2000; -- might have to play with the number and make it higher

Make your request with Prefer: params=multiple-objects, similar to what you did above:

await fetch(`${base_url}/rpc/search_products`,
        {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Prefer': 'params=multiple-objects',
                'Range-Unit': 'items',
                'Range': '0-9',
            },
            // note that I changed the payload to an array of 1 object (it was just an object before)
            body: JSON.stringify([{"query": "potatoes"}]),
        },
)

Does that perform better?

@ric2b
Author

ric2b commented Nov 19, 2020

Before I write that up and it turns out my conclusions are wrong... let's test it first.

Sorry for the delay :(

I expect the following query to run fast:

It did not :/

Here's the plan for it:

Aggregate  (cost=213134.95..213134.97 rows=1 width=80) (actual time=3652.377..3652.378 rows=1 loops=1)
  ->  Subquery Scan on _postgrest_t  (cost=213134.47..213134.90 rows=10 width=100) (actual time=3652.346..3652.352 rows=10 loops=1)
        ->  Limit  (cost=213134.47..213134.80 rows=10 width=76) (actual time=3652.342..3652.345 rows=10 loops=1)
              ->  Nested Loop  (cost=213134.47..224682.63 rows=513251 width=76) (actual time=3647.653..3647.656 rows=10 loops=1)
                    InitPlan 1 (returns $0)
                      ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Function Scan on json_to_record _  (cost=0.00..0.01 rows=1 width=32) (actual time=0.016..0.016 rows=1 loops=1)
                    ->  Sort  (cost=213134.46..214417.59 rows=513251 width=80) (actual time=3647.626..3647.628 rows=10 loops=1)
                          Sort Key: ((_.query <-> unaccent_concat_brand_name_desc(product.brand, product.name, product.description)))
                          Sort Method: external merge  Disk: 46632kB
                          ->  Seq Scan on product  (cost=0.00..141648.39 rows=513251 width=80) (actual time=0.093..3446.307 rows=513251 loops=1)
Planning Time: 0.159 ms
JIT:
  Functions: 11
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.811 ms, Inlining 0.000 ms, Optimization 0.246 ms, Emission 4.267 ms, Total 5.324 ms
Execution Time: 3657.902 ms

Does that perform better?

Unfortunately not. I tried increasing the rows setting multiple times but it didn't seem to make a difference; I went as high as:

$$ stable language sql rows 2000000000;

@wolfgangwalther
Member

Hm. So the planner knows that json_to_record returns 1 row (Function Scan on json_to_record _ (cost=0.00..0.01 rows=1 width=32) (actual time=0.016..0.016 rows=1 loops=1)), but is still not able to push the limit into the inlined subquery.

If I'm not mistaken this is actually an optimization that should be possible (at least in theory, if not for pg right now). I will test a couple of things and might try to get some feedback from "upstream".

@wolfgangwalther
Member

So I have a test case set up that's similar in some aspects but different in others. I can show nicely that the subquery in the argument list prevents inlining and causes a dramatic performance drop. However, all the queries I suggested afterwards are very fast with my test case. ;)

So there must be something more specific to your use case. I didn't use the pg_trgm extension or a gist index, just a lot of integers and a btree. I guess I will have to create some strings next. :)

@ric2b
Author

ric2b commented Nov 23, 2020

I think it shouldn't make a difference, but the product table happens to be a materialized view; it's just a UNION ALL of 3 tables. Mentioning it just in case it makes the planner behave differently.
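
Roughly this shape, with hypothetical names (the real source tables and column lists aren't shown here):

create materialized view product as
select * from products_source_a
union all
select * from products_source_b
union all
select * from products_source_c;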
