Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/59184 order by stages and gates on project list #17530

Open
wants to merge 20 commits into
base: dev
Choose a base branch
from

Conversation

EinLama
Copy link
Contributor

@EinLama EinLama commented Jan 3, 2025

Ticket

https://community.openproject.org/wp/59184

What are you trying to accomplish?

Screenshots

order1
order2
order3
order4
order5
order6

What approach did you choose and why?

Since projects are related to the LifeCycleSteps table, which is in turn reliant on the LifeCycleStepDefinition table - which is the attribute we are sorting on, I looked for a way to query the relevant rows without having to do a lot of aggregation and group by magic.

I found two approaches that work and seem to be performant. The first used a CTE, the second a subquery. In both cases, I join the projects table on a subset of life cycle step definitions. This gets rid of obsolete rows while not requiring further aggregation.

Both approaches seem to have a similar performance impact. I found the subquery more readable, so I chose that.

There is a caveat: since we can order by multiple columns, we must ensure that our subquery identifier (-> SELECT * FROM identifier) is not used twice. I added the definition id to that name to make it unique per request.

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

@EinLama EinLama force-pushed the feature/59184-order-by-stages-and-gates-on-project-list branch 2 times, most recently from f551cb5 to 73a8642 Compare January 6, 2025 08:44
@EinLama EinLama force-pushed the feature/59184-order-by-stages-and-gates-on-project-list branch from eb18a6e to b161d9a Compare January 6, 2025 13:59
@EinLama EinLama force-pushed the feature/59184-order-by-stages-and-gates-on-project-list branch 3 times, most recently from e972026 to a6ee983 Compare January 7, 2025 10:01
@EinLama EinLama marked this pull request as ready for review January 7, 2025 12:55
Copy link
Contributor

@ulferts ulferts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is heading, @EinLama. The code structure is good and at least on the limited data I have, performance looks good and the results are also as expected.

There is however a permission check that needs to be added when ordering. This might still require some work to get in. The rest is just small stuff.

WHERE
steps.active = true
AND def.id = #{life_cycle_step_definition.id}
) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the JOIN with the definitions is necessary. You could write

SELECT
			STEPS.*,
			STEPS.DEFINITION_ID AS DEF_ID
		FROM
			PROJECT_LIFE_CYCLE_STEPS STEPS
		WHERE
			STEPS.ACTIVE = TRUE
			AND STEPS.DEFINITION_ID = 3

and have the same results.

But what is necessary to add is the check for the view_project_stages_and_gates permission in each project. Otherwise, given two projects in which the user has the permission in one project and lacks it in the other, both end up being sorted on the value. But the user should not receive any info on that value so when sorting, it needs to be treated as NULL. The easiest way of doing that is to not join the value in such a case.

In the screenshot below, the user does not have the permission in the first permission, has it in the second while in the third project, the value is not set.

image

The Project.allowed_to method might come in handy here. Working in a call to Project.allowed_to(User.current, :view_project_stages_and_gates).to_sql might do the trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for noticing this! Good point about simplifying the query, I started with the definitions and never questioned my initial assumption after that.

About the permission: I have added an implementation for filtering out rows that do not match the new criteria. But I'm wondering whether there is a better way to do so. Please have a look and let me know what you think 🕵🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented around with it and measured the time on my data set. It does have a larger number of projects but unfortunately lacks a substantial number of portfolio elements so the measurement might be flawed. Overall I found very little impact, with the timing when sorting by 3 lifecycle elements always being around 20ms.

However, I still think that the better way of dealing with this query is in the following way:

# SQL

      LEFT JOIN (
              SELECT steps.*, steps.definition_id as def_id
              FROM project_life_cycle_steps steps
              WHERE
                steps.active = true
                AND steps.definition_id = :definition_id
                AND steps.project_id IN (
                  #{viewable_project_ids.to_sql}
                )
            ) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id

# ...
# Ensure that only life cycle columns viewable to the current user are considered
# for ordering the query result.
def viewable_project_ids
  Project.allowed_to(User.current, :view_project_stages_and_gates).select(:id)
end

This avoids fetching the project ids (as done by the pluck before) once for each column sorted by. Granted, this will be cached by the SQL cache but it is still an overhead. On top of that, I find that easier to read. I benchmarked the two approaches and on my data the approaches have the following runtime when issuing:

Benchmark.bm do |x|
  x.report { 1000.times { query_ordered_by_three_lifecycle_columns.results.to_a.count } }
end

The PR's approach needed

       user     system      total        real
26.923800   1.216307  28.140107 ( 56.359955)

Whereas the changed one takes just

       user     system      total        real
19.154100   0.573853  19.727953 ( 32.417628)

This was done within the rails console so the PR's times are worsened by the fact that the query cache wasn't used. But to me the results are good enough to suggest that it at least not worse to have the subquery executed together with the rest.

I also tried to move the viewable_project_ids to a CTE to avoid issuing the same subquery thrice but that did not lead to measurable changes so I don't think this is necessary at this point in time.

end

def available?
life_cycle_step_definition.present?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, the check for view_project_stages_and_gates should also be in here. The code should check if the user has that permission in any project.

The behaviour is as expected because there is an imprecision in the way the selectable orders work.

The Projects::ConfigureViewModalComponent passes the selectable_columns to the Queries::SortByComponent. Within that component, the columns are then mapped to order statements without checking their available? method again. In short, this only works because the conditions for the columns are currently the same as for ordering. But this might change.

Even now, while the option is not available in the UI, it is possible to craft a URL request in a way to get the order to be applied even if the user does not have the permission. Via the API, it is possible as well. Mind you, it will not be a problem any more when the permissions are checked on the values but ideally, the user would get an error message in this case which requires to have a check for this permission here as well.

I wouldn't want to include the rework of the Modal and SortBy component in this PR but rather open a new ticket/PR to address this. In here then, one would only need to add the permission check. Do you agree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, good catch! I agree that this might come up in the future. So I will add the permission check here and we'll have to revisit sorting and the Modal in a separate task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the permission check along with a spec for it.

@EinLama EinLama force-pushed the feature/59184-order-by-stages-and-gates-on-project-list branch from db349fb to c8555d9 Compare January 9, 2025 16:37
Using a named subquery will break as those have to be unique.
The same issue applies to CTEs - they need a unique name per query.
To solve this, I have allowed queries to use CTEs and define their name.
The name will be derived from the definition id, which is unique per
query. Therefore, you can now order by multiple life cycle definitions
at once.
There is no need to filter for available definitions before ordering,
this also gets rid of a Brakeman warning.
It is supposed to be a bit more performant for a case like this, too.
@EinLama EinLama force-pushed the feature/59184-order-by-stages-and-gates-on-project-list branch from d559d92 to e5eee68 Compare January 10, 2025 10:43
Copy link
Contributor

@ulferts ulferts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the query can be simplified which admittedly only slightly should also improve the performance.

WHERE
steps.active = true
AND def.id = #{life_cycle_step_definition.id}
) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented around with it and measured the time on my data set. It does have a larger number of projects but unfortunately lacks a substantial number of portfolio elements so the measurement might be flawed. Overall I found very little impact, with the timing when sorting by 3 lifecycle elements always being around 20ms.

However, I still think that the better way of dealing with this query is in the following way:

# SQL

      LEFT JOIN (
              SELECT steps.*, steps.definition_id as def_id
              FROM project_life_cycle_steps steps
              WHERE
                steps.active = true
                AND steps.definition_id = :definition_id
                AND steps.project_id IN (
                  #{viewable_project_ids.to_sql}
                )
            ) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id

# ...
# Ensure that only life cycle columns viewable to the current user are considered
# for ordering the query result.
def viewable_project_ids
  Project.allowed_to(User.current, :view_project_stages_and_gates).select(:id)
end

This avoids fetching the project ids (as done by the pluck before) once for each column sorted by. Granted, this will be cached by the SQL cache but it is still an overhead. On top of that, I find that easier to read. I benchmarked the two approaches and on my data the approaches have the following runtime when issuing:

Benchmark.bm do |x|
  x.report { 1000.times { query_ordered_by_three_lifecycle_columns.results.to_a.count } }
end

The PR's approach needed

       user     system      total        real
26.923800   1.216307  28.140107 ( 56.359955)

Whereas the changed one takes just

       user     system      total        real
19.154100   0.573853  19.727953 ( 32.417628)

This was done within the rails console so the PR's times are worsened by the fact that the query cache wasn't used. But to me the results are good enough to suggest that it at least not worse to have the subquery executed together with the rest.

I also tried to move the viewable_project_ids to a CTE to avoid issuing the same subquery thrice but that did not lead to measurable changes so I don't think this is necessary at this point in time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants