Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple "hops" for embed(Unnest/flatten embeds) #1233

Closed
blainehansen opened this issue Jan 24, 2019 · 7 comments · Fixed by #2564
Closed

Multiple "hops" for embed(Unnest/flatten embeds) #1233

blainehansen opened this issue Jan 24, 2019 · 7 comments · Fixed by #2564
Labels
embedding resource embedding enhancement a feature, ready for implementation

Comments

@blainehansen
Copy link

blainehansen commented Jan 24, 2019

Environment

  • PostgreSQL version: (if using docker, specify the image)
    docker postgres:10-alpine
  • PostgREST version: (if using docker, specify the image)
    docker postgrest/postgrest (pulled fresh this morning)
  • Operating system:
    Ubuntu 18

Description of issue (less an issue than a question)

Is it possible to do multiple "hops" to embed something?

For example, I have a schema similar to this (the real thing has more columns and checks)

create table dimension (
	id serial primary key,
	dimension_type_name text not null,

	"text" text not null
);

create table dimension_relationship (
	ancestor_id int references dimension(id) not null,
	descendent_id int references dimension(id) not null,
	check (ancestor_id != descendent_id),
	primary key (ancestor_id, descendent_id)
	-- other checks and indices to ensure these only go one way
);

The basic idea behind this structure is that it can store any arbitrary hierarchy of data. The relationships can only go in one direction, but it doesn't have to be tree-like (a dimension can can multiple parents).

I want to get a particular dimension (let's say with dimension_type_name "a"), and all the descendent dimensions with dimension_type_name "b", in a json structure similar to this:

[{
	// this is a dimension of type "a"
	"id": 1, "text": "stuff",
	// these should all be dimensions of type "b"
	"attributes": [{
		"id": 2, "text": "other"
	}],
	// ...
}]

This gets me close:

"/dimension?
	select=*,
	relationships:dimension_relationship.ancestor_id(
		attribute:dimension.descendent_id(*)
	)
	&dimension_type_name=eq.a
	&relationships.attribute.dimension_type_name=eq.b"

But there's an extra level of nesting that I don't want, and a bunch of null attributes:

[{
	"id": 32,
	"dimension_type_name": "a",
	"text": "tmgxledhjxjlgqh",
	"relationships": [
		{
			"attribute": {
				"id": 76,
				"dimension_type_name": "b",
				"text": "didpwfqkgxyfpkr"
			}
		},
		{
			"attribute": null
		},
		{
			"attribute": null
		}
	]
},
// ...
]

Is something akin to this supported?

"/dimension?
	select=*,
	attributes:dimension_relationship.ancestor_id.dimension.descendent_id(*)
	&dimension_type_name=eq.a
	&attributes.dimension_type_name=eq.b"

Here's a sql schema you can use to quickly get running with this structure:

create function random_between(low int, high int) returns int as $$
begin
	return floor(random() * (high - low + 1) + low);
end;
$$ language plpgsql strict;

create function random_text() returns text as $$
	select array_to_string(array(select chr((97 + round(random() * 25)) :: integer)
	from generate_series(1, 15)), '');
$$ language sql strict;

create table dimension_type (
	"name" text not null unique
);


create table dimension (
	id serial primary key,
	dimension_type_name text references dimension_type("name") not null,

	"text" text not null
);


create table dimension_relationship (
	ancestor_id int references dimension(id) not null,
	descendent_id int references dimension(id) not null,
	check (ancestor_id != descendent_id),
	primary key (ancestor_id, descendent_id)
);

create unique index idx_symmetric_unique_dimension_relationship on dimension_relationship
(greatest(ancestor_id, descendent_id), least(ancestor_id, descendent_id));


insert into dimension_type ("name") values
('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g'), ('h'), ('i'), ('j'), ('k'), ('l'), ('m'), ('n'), ('o');

insert into dimension (dimension_type_name, "text")
select dimension_type_name, "text"
from (
	select
	generate_series(1, 100) as nums,
	(ARRAY['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o'])[floor(random()*15)+1] as dimension_type_name,
	random_text() as "text"
) vals;

insert into dimension_relationship (ancestor_id, descendent_id)
select ancestor_id, descendent_id
from (select generate_series(1, 300) as nums, random_between(1, 50) as ancestor_id, random_between(51, 100) as descendent_id) vals
on conflict do nothing;
@steve-chavez
Copy link
Member

@blainehansen #1075(pending enhancement) would clear the {"attribute": null} results.

Doing embeds like attributes:dimension_relationship.ancestor_id.dimension.descendent_id(*) is not supported.

@blainehansen
Copy link
Author

Is there any intention of a feature like this? Or should I just close?

Thank you 😄

@steve-chavez
Copy link
Member

Not with "multiple hops" but I think we can offer "flattening" by using PostgreSQL json arrow operators, in your example this would look like:

"/dimension?
	select=*,
	relationships:dimension_relationship.ancestor_id(
		attribute:dimension.descendent_id(*)
	)->attribute
	&dimension_type_name=eq.a
	&relationships.attribute.dimension_type_name=eq.b"

@steve-chavez
Copy link
Member

steve-chavez commented Mar 5, 2019

Using some our test suite fixtures(clients-has many-projects-has many-tasks), suppose you want to get all tasks with their projects and only the client name, you could do this:

GET /tasks?select=name,projects(*,client_name:clients(*)->name)
 [{"name":"Design w7","projects":{"id":1,"name":"Windows 7","client_name":"Microsoft"}}, 
  {"name":"Code w7","projects":{"id":1,"name":"Windows 7","client_name":"Microsoft"}},   
  {"name":"Design w10","projects":{"id":2,"name":"Windows 10","client_name":"Microsoft"}}
  {"name":"Code w10","projects":{"id":2,"name":"Windows 10","client_name":"Microsoft"}}, 
  {"name":"Design IOS","projects":{"id":3,"name":"IOS","client_name":"Apple"}},          
  {"name":"Code IOS","projects":{"id":3,"name":"IOS","client_name":"Apple"}},            
  {"name":"Design OSX","projects":{"id":4,"name":"OSX","client_name":"Apple"}},          
  {"name":"Code OSX","projects":{"id":4,"name":"OSX","client_name":"Apple"}}]            

This would generate the following query:

with pg_source as (
  select 
    "test"."tasks"."name", 
    row_to_json("projects_projects".*) as "projects" 
  from "test"."tasks"  
  left join lateral( 
    select 
      "test"."projects"."id", 
      "test"."projects"."name", 
      row_to_json("clients_clients".*)->'name' as "client_name" 
    from "test"."projects"  
    left join lateral( 
      select 
        "test"."clients"."id", 
        "test"."clients"."name" 
      from "test"."clients"  
      where "test"."clients"."id" = "test"."projects"."client_id"   ) as "clients_clients" on true  
    where "test"."projects"."id" = "test"."tasks"."project_id"   ) as "projects_projects" on true    ) 
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from ( select * from pg_source) _postgrest_t;

I think this would be useful and we'd only have to allow the -> after the embeds.

@blainehansen Thanks for the report, this feature will be in the plans.

@steve-chavez steve-chavez added the enhancement a feature, ready for implementation label Mar 5, 2019
@steve-chavez steve-chavez changed the title Multiple "hops" for embed. Multiple "hops" for embed(Unnest embeds) Mar 5, 2019
@steve-chavez steve-chavez changed the title Multiple "hops" for embed(Unnest embeds) Multiple "hops" for embed(Unnest/flatten embeds) Oct 15, 2019
@steve-chavez steve-chavez added the embedding resource embedding label Jan 24, 2020
@steve-chavez
Copy link
Member

steve-chavez commented May 2, 2022

Another option for syntax is using the JS spread operator.

So instead of:

GET /tasks?select=name,projects(,client_name:clients()->name)

Use:

GET /tasks?select=name,projects(*,...clients(client_name:name))

With this the full object can get flattened as well:

GET /tasks?select=name,projects(*,...clients(*))
[{"name":"Design w7","projects":{"id":1,"name":"Windows 7","id":1,"name":"Microsoft"}},   
 {"name":"Code w7","projects":{"id":1,"name":"Windows 7","id":1,"name":"Microsoft"}},     
 {"name":"Design w10","projects":{"id":2,"name":"Windows 10","id":1,"name":"Microsoft"}}, 
 {"name":"Code w10","projects":{"id":2,"name":"Windows 10","id":1,"name":"Microsoft"}},   
 {"name":"Design IOS","projects":{"id":3,"name":"IOS","id":2,"name":"Apple"}},            
 {"name":"Code IOS","projects":{"id":3,"name":"IOS","id":2,"name":"Apple"}},              
 {"name":"Design OSX","projects":{"id":4,"name":"OSX","id":2,"name":"Apple"}},            
 {"name":"Code OSX","projects":{"id":4,"name":"OSX","id":2,"name":"Apple"}}]

This only seems to make sense for many-to-one and one-to-one relationships.

@wolfgangwalther
Copy link
Member

Yes, I like that, I suggested that a while back, too.

@steve-chavez
Copy link
Member

Now that we have some logic to enforce the to-one relationship(on #2511), this should be just a matter of defining the query. Which I think should be simple like this:

-- Go from 
WITH pgrst_source AS (
  SELECT
    "test"."projects".*,
    row_to_json("projects_clients_1".*) AS "clients"
  FROM "test"."projects"
  LEFT JOIN LATERAL (
    SELECT
      "clients_1".id as client_id,
      "clients_1".name as client_name
    FROM "test"."clients" AS "clients_1"
    WHERE "clients_1"."id" = "test"."projects"."client_id"
  ) AS "projects_clients_1" ON TRUE
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

                                               body
---------------------------------------------------------------------------------------------------
 [{"id":1,"name":"Windows 7","client_id":1,"clients":{"client_id":1,"client_name":"Microsoft"}},  +
  {"id":2,"name":"Windows 10","client_id":1,"clients":{"client_id":1,"client_name":"Microsoft"}}, +
  {"id":3,"name":"IOS","client_id":2,"clients":{"client_id":2,"client_name":"Apple"}},            +
  {"id":4,"name":"OSX","client_id":2,"clients":{"client_id":2,"client_name":"Apple"}},            +
  {"id":5,"name":"Orphan","client_id":null,"clients":null}]

-- To
WITH pgrst_source AS (
  SELECT
    "test"."projects".*,
    "projects_clients_1".*
  FROM "test"."projects"
  LEFT JOIN LATERAL (
    SELECT
      "clients_1".name as client_id,
      "clients_1".name as client_name
    FROM "test"."clients" AS "clients_1"
    WHERE "clients_1"."id" = "test"."projects"."client_id"
  ) AS "projects_clients_1" ON TRUE
)
SELECT
  coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;

                                              body
-------------------------------------------------------------------------------------------------
 [{"id":1,"name":"Windows 7","client_id":1,"client_id":"Microsoft","client_name":"Microsoft"},  +
  {"id":2,"name":"Windows 10","client_id":1,"client_id":"Microsoft","client_name":"Microsoft"}, +
  {"id":3,"name":"IOS","client_id":2,"client_id":"Apple","client_name":"Apple"},                +
  {"id":4,"name":"OSX","client_id":2,"client_id":"Apple","client_name":"Apple"},                +
  {"id":5,"name":"Orphan","client_id":null,"client_id":null,"client_name":null}]

It boils down to removing the row_to_json.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
embedding resource embedding enhancement a feature, ready for implementation
Development

Successfully merging a pull request may close this issue.

3 participants