limits don't seem to apply directly to the underlying query, causing poor performance #1652
Can you try running the query that is used by PostgREST manually and replace …?
Thanks for the suggestion and for helping! :) Unfortunately it takes the same amount of time, no improvement. I did try to iteratively simplify that CTE and I found the issue, but I don't understand it. It looks like the plan is different depending on whether a literal or the result of a query is assigned to the argument.

This runs slow (~1.5s):

```sql
SELECT
  "search_products"."brand",
  "search_products"."description",
  "search_products"."id",
  "search_products"."name",
  "search_products"."store_id"
FROM "public"."search_products"("query" := (select 'potatoes'))
LIMIT 10
```

plan:
But this runs quickly (~140ms):

```sql
SELECT
  "search_products"."brand",
  "search_products"."description",
  "search_products"."id",
  "search_products"."name",
  "search_products"."store_id"
FROM "public"."search_products"("query" := 'potatoes')
LIMIT 10
```

plan:
Just to confirm: is the performance still good when you apply the same change (just the replacement of the `select` subquery with a constant in the argument list) in the big query you started with?
Yup, if I take the entire PostgREST query and make just that single change from the subquery to the constant, performance is good again. If I replace it with …

edit: Tried casting to see if it helped (…).
Ok, great. We might be close. We have this: `postgrest/src/PostgREST/QueryBuilder.hs`, lines 142 to 144 in 3f690ec.
So we should try the other branch now. Can you do a POST request with a `Prefer: params=multiple-objects` header?
Ok, I think we're almost there, it now emits a query similar to this (wrapped with all the other stuff):

```sql
WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args,
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args;
```

Which runs slow (~3.7s) because it doesn't include a limit, but if I add a `LIMIT` … So I think all that's missing is how to improve the placement of the limit. Not sure if that's a problem with the request I'm doing, which looks like this:

```javascript
await fetch(`${base_url}/rpc/search_products`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Prefer': 'params=multiple-objects',
      'Range-Unit': 'items',
      'Range': '0-9',
    },
    body: JSON.stringify({"query": "potatoes"}),
  },
)
```

The full version of the PostgREST query, in case it helps:

```sql
WITH pgrst_source AS (
  WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
  pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload),
  pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) )
  SELECT pgrst_lat_args.* FROM pgrst_args,
  LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source" LIMIT 10 OFFSET 0) _postgrest_t;
```
Where did you add the limit? On the inner query (`SELECT "search_products".* ...`) or on the outer query (`SELECT pgrst_lat_args.* ...`)?

This won't be as easy, because …
Please try the following query:

```sql
WITH
pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload),
pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source" LIMIT 10 OFFSET 0) _postgrest_t;
```
On the outer query, like this:

```sql
WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args,
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
LIMIT 10;
```

Took the same 3.7s.
I'm running out of ideas, but I have one more:

```sql
WITH
pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload),
pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
),
pgrst_select AS MATERIALIZED (SELECT "pgrst_source".* FROM "pgrst_source" LIMIT 10 OFFSET 0)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM pgrst_select _postgrest_t;
```

And could you do a comparison between the following two queries, please?

```sql
WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT pgrst_lat_args.* FROM pgrst_args,
LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
LIMIT 10;
```

```sql
SELECT * FROM (
  WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
  SELECT pgrst_lat_args.* FROM pgrst_args,
  LATERAL ( SELECT "search_products".* FROM "public"."search_products"("query" := pgrst_args."query") ) pgrst_lat_args
) t
LIMIT 10;
```
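As background for the `MATERIALIZED` keyword used in these experiments: since PostgreSQL 12, a side-effect-free CTE that is referenced only once is no longer an automatic optimization fence, and `MATERIALIZED` forces the old fence behavior. A minimal standalone sketch (the table name `t` is hypothetical, not from this thread):

```sql
-- PostgreSQL 12+: a CTE referenced once may be inlined into the outer
-- query, so the LIMIT can be pushed down into the scan of t:
WITH c AS (SELECT * FROM t)
SELECT * FROM c LIMIT 10;

-- MATERIALIZED restores the pre-12 behavior: the CTE is computed in
-- full before the outer LIMIT applies:
WITH c AS MATERIALIZED (SELECT * FROM t)
SELECT * FROM c LIMIT 10;
```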
The first one was slow (took ~3.7s) and the last two were equally fast at ~130ms. The plan for the first one:

The last two have the same plan, which is the following:

Thank you so much for continuing to help, by the way! 😸
Ah, I made a mistake with the two short queries. I wanted to write this:

```sql
WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT "search_products".* FROM pgrst_args,
LATERAL "public"."search_products"("query" := pgrst_args."query")
LIMIT 10;
```

and

```sql
SELECT * FROM (
  WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
) t
LIMIT 10;
```

If that's still fast, you can run this as well:

```sql
WITH pgrst_args AS (SELECT 'potatoes'::text AS "query")
SELECT * FROM (
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
) t
LIMIT 10;
```

And then:

```sql
WITH pgrst_args AS (SELECT 'potatoes'::text AS "query"),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT * FROM pgrst_source LIMIT 10;
```

If all of those are still fast, can you try adding the `OFFSET 0` and see whether that slows down any of them? At some point in this chain the limit is not pushed down.

Ah, and I have one more idea for the bigger query:

```sql
WITH
pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload),
pgrst_args AS MATERIALIZED ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
),
pgrst_select AS MATERIALIZED (SELECT "pgrst_source".* FROM "pgrst_source" LIMIT 10 OFFSET 0)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM pgrst_select _postgrest_t;
```

Thanks for running all those queries :D
All 4 of the smaller queries ran in about 120 to 140ms, with or without `OFFSET 0`. The bigger query was slow, ~3.8s.

No, thank you for helping and for working on this great project! :)
Meh. Still no clue where it breaks :/ How about this?

```sql
WITH
pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload),
pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) ),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT "pgrst_source".* FROM "pgrst_source"
LIMIT 10 OFFSET 0
```

or this?

```sql
WITH
pgrst_args AS (SELECT 'potatoes'::text AS "query"),
pgrst_source AS (
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source" LIMIT 10 OFFSET 0) _postgrest_t;
```
The second one works, runs in ~130ms :) (The first one takes ~3.8s.)
Ok. I think I understand the problem now. Before I write that up and it turns out my conclusions are wrong... let's test it first. I expect the following query to run fast:

```sql
WITH pgrst_source AS (
  WITH
  pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
  pgrst_args AS ( SELECT * FROM json_to_record((SELECT json_data FROM pgrst_payload)) AS _("query" text) )
  SELECT "search_products".* FROM pgrst_args,
  LATERAL "public"."search_products"("query" := pgrst_args."query")
)
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
  coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM (SELECT "pgrst_source".* FROM "pgrst_source" LIMIT 10 OFFSET 0) _postgrest_t;
```
Summarizing the findings of this thread:
The problem here is the subquery in the argument list. See https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions for details:

```sql
SELECT "search_products".* FROM pgrst_args,
LATERAL "public"."search_products"("query" := pgrst_args."query")
```

This call will be inlined. We can change the query for the regular case (no `params=multiple-objects`) here: Lines 30 to 34 in 3f690ec.
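To illustrate the inlining rule with a standalone sketch (the table `t` and function `find_rows` are hypothetical names, not from PostgREST):

```sql
CREATE TABLE t (v text);
CREATE INDEX ON t (v);

CREATE FUNCTION find_rows(needle text) RETURNS SETOF t AS $$
  SELECT * FROM t WHERE v = needle
$$ LANGUAGE sql STABLE;

-- Constant argument: the call can be inlined, so the planner sees a
-- plain scan of t and can use the index and push down the LIMIT:
EXPLAIN SELECT * FROM find_rows('x') LIMIT 10;

-- Subquery argument: inlining is blocked and the function is executed
-- as an opaque set-returning call, typically a full Function Scan:
EXPLAIN SELECT * FROM find_rows((SELECT 'x')) LIMIT 10;
```

Comparing the two `EXPLAIN` outputs should show the same inlined-vs-opaque difference observed in this thread.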
With the change to use …, below is my best guess of what happens next, assuming the query in my last post worked as expected.

```sql
WITH
pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) )
SELECT "search_products".* FROM pgrst_args,
LATERAL "public"."search_products"("query" := pgrst_args."query")
```

It should be possible to get a better choice here, by setting … But we can do better without that for the "no-multiple-objects" case already, by not doing the round trip of converting our json object to an array and then back to a single record here:

```sql
pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload),
pgrst_args AS ( SELECT * FROM json_to_recordset((SELECT val FROM pgrst_body)) AS _("query" text) )
```

Instead we can replace that with just a straight `json_to_record` call:

```sql
pgrst_payload AS (SELECT '{"query": "potatoes"}'::json AS json_data),
pgrst_args AS ( SELECT * FROM json_to_record((SELECT json_data FROM pgrst_payload)) AS _("query" text) )
```

This should allow us to properly inline those kinds of functions and have good execution plans while doing so.
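For reference, the difference between the two JSON functions discussed above, as a standalone example with inline literals:

```sql
-- json_to_record turns a single JSON object into exactly one row:
SELECT * FROM json_to_record('{"query": "potatoes"}'::json) AS _("query" text);
--   query
-- ----------
--  potatoes

-- json_to_recordset expects a JSON array and returns one row per element:
SELECT * FROM json_to_recordset('[{"query": "a"}, {"query": "b"}]'::json) AS _("query" text);
--  query
-- -------
--  a
--  b
```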
@ric2b, for right now you could also test the following. Redefine your RPC like this:

```sql
create or replace function search_products(query text) returns setof product as $$
  select *
  from product
  order by query <-> unaccent_concat_brand_name_desc(brand, name, description)::text asc
$$ stable language sql rows 2000; -- might have to play with the number and make it higher
```

Make your request with:

```javascript
await fetch(`${base_url}/rpc/search_products`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Prefer': 'params=multiple-objects',
      'Range-Unit': 'items',
      'Range': '0-9',
    },
    // note that I changed the payload to an array of 1 object (was just an object before)
    body: JSON.stringify([{"query": "potatoes"}]),
  },
)
```

Does that perform better?
Sorry for the delay :(

It did not :/ Here's the plan for it:

Unfortunately not, I tried to increase the `rows` estimate as well:

```sql
$$ stable language sql rows 2000000000;
```
Hm. So the planner knows about … If I'm not mistaken, this is actually an optimization that should be possible (at least in theory, if not for pg right now). I will test a couple of things and might try to get some feedback from "upstream".
So I have a test case set up that's kind of similar in some aspects, but different in others. I can show nicely that the subquery in the argument list prevents inlining and causes a dramatic performance drop. However, all the queries that I suggested afterwards are very fast with my test case. ;) So there must be something more specific to your use case. I didn't use the …
I think it shouldn't make a difference, but the …
Environment

- postgres:12
- postgrest/postgrest:v7.0.1
- 20.10

Description of issue
(My best guess is that it is related to #621, but I might be misdiagnosing the issue.)

Nope, see the following comments; the use of CTEs doesn't seem to be related.
Some context

I have the following simple function to do a text search for products and return the rows ordered by similarity of the text query to the concatenated and unaccented `brand`, `name` and `description` fields (`<->` is from the pg_trgm extension):

(By the way, the reason this is a function is that I don't think it's possible to represent it in PostgREST's query syntax, but please correct me if I'm wrong.)

`unaccent_concat_brand_name_desc` is a function I created because `unaccent` and `concat` aren't immutable (for some very specific reasons I looked into and that don't concern me) and were blocking me from creating an index to help with this query. With that function (marked as immutable) I created the following index:

With that index I can run the following query in ~100ms:
I have run the query above with `explain analyse` and the plan does use the index (plus the performance difference is very noticeable when I remove the index).

The issue

I'm trying to get equivalent results from PostgREST, so I call `/rpc/search_products?query=potatoes&store_id=eq.some_store_id&select=store_id,id,brand,name,description&limit=10`. However this takes nearly 2 seconds; this is the query that actually hits PostgreSQL:
My guess is that the CTE for `pgrst_source` is acting as an optimization fence and preventing the outer `LIMIT 10` from being applied efficiently.

Is there a way to make the `limit` on the PostgREST query behave the same as in the SQL example above? I would prefer not to hard-code it into the `search_products` function; that does fix the issue but is much less flexible.
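For readers trying to reproduce the setup described in this issue, it might look roughly like the following. This is a sketch under assumptions: the `product` table layout, the function body, and the index definition are my guesses based on the thread, not the author's actual code.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS unaccent;

-- unaccent() and concat() are not IMMUTABLE, so wrap them in an
-- IMMUTABLE helper to make the expression indexable (as the author
-- describes doing):
CREATE FUNCTION unaccent_concat_brand_name_desc(brand text, name text, description text)
RETURNS text AS $$
  SELECT unaccent(concat_ws(' ', brand, name, description))
$$ LANGUAGE sql IMMUTABLE;

-- A GiST trigram index supports ordering by the <-> distance operator:
CREATE INDEX product_search_trgm_idx ON product
  USING gist (unaccent_concat_brand_name_desc(brand, name, description) gist_trgm_ops);

-- The search function, ordered by trigram distance (can use the index):
CREATE FUNCTION search_products(query text) RETURNS SETOF product AS $$
  SELECT *
  FROM product
  ORDER BY query <-> unaccent_concat_brand_name_desc(brand, name, description)
$$ STABLE LANGUAGE sql;
```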