-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/59184 order by stages and gates on project list #17530
base: dev
Are you sure you want to change the base?
Feature/59184 order by stages and gates on project list #17530
Conversation
f551cb5
to
73a8642
Compare
eb18a6e
to
b161d9a
Compare
e972026
to
a6ee983
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like where this is heading, @EinLama. The code structure is good and at least on the limited data I have, performance looks good and the results are also as expected.
There is however a permission check that needs to be added when ordering. This might still require some work to get in. The rest is just small stuff.
WHERE | ||
steps.active = true | ||
AND def.id = #{life_cycle_step_definition.id} | ||
) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the JOIN
with the definitions is necessary. You could write
SELECT
STEPS.*,
STEPS.DEFINITION_ID AS DEF_ID
FROM
PROJECT_LIFE_CYCLE_STEPS STEPS
WHERE
STEPS.ACTIVE = TRUE
AND STEPS.DEFINITION_ID = 3
and have the same results.
But what is necessary to add is the check for the view_project_stages_and_gates
permission in each project. Otherwise, given two projects in which the user has the permission in one project and lacks it in the other, both end up being sorted on the value. But the user should not receive any info on that value so when sorting, it needs to be treated as NULL. The easiest way of doing that is to not join the value in such a case.
In the screenshot below, the user does not have the permission in the first permission, has it in the second while in the third project, the value is not set.
The Project.allowed_to
method might come in handy here. Working in a call to Project.allowed_to(User.current, :view_project_stages_and_gates).to_sql
might do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for noticing this! Good point about simplifying the query, I started with the definitions and never questioned my initial assumption after that.
About the permission: I have added an implementation for filtering out rows that do not match the new criteria. But I'm wondering whether there is a better way to do so. Please have a look and let me know what you think 🕵🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented around with it and measured the time on my data set. It does have a larger number of projects but unfortunately lacks a substantial number of portfolio elements so the measurement might be flawed. Overall I found very little impact, with the timing when sorting by 3 lifecycle elements always being around 20ms.
However, I still think that the better way of dealing with this query is in the following way:
# SQL
LEFT JOIN (
SELECT steps.*, steps.definition_id as def_id
FROM project_life_cycle_steps steps
WHERE
steps.active = true
AND steps.definition_id = :definition_id
AND steps.project_id IN (
#{viewable_project_ids.to_sql}
)
) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id
# ...
# Ensure that only life cycle columns viewable to the current user are considered
# for ordering the query result.
def viewable_project_ids
Project.allowed_to(User.current, :view_project_stages_and_gates).select(:id)
end
This avoids fetching the project ids (as done by the pluck before) once for each column sorted by. Granted, this will be cached by the SQL cache but it is still an overhead. On top of that, I find that easier to read. I benchmarked the two approaches and on my data the approaches have the following runtime when issuing:
Benchmark.bm do |x|
x.report { 1000.times { query_ordered_by_three_lifecycle_columns.results.to_a.count } }
end
The PR's approach needed
user system total real
26.923800 1.216307 28.140107 ( 56.359955)
Whereas the changed one takes just
user system total real
19.154100 0.573853 19.727953 ( 32.417628)
This was done within the rails console so the PR's times are worsened by the fact that the query cache wasn't used. But to me the results are good enough to suggest that it at least not worse to have the subquery executed together with the rest.
I also tried to move the viewable_project_ids to a CTE to avoid issuing the same subquery thrice but that did not lead to measurable changes so I don't think this is necessary at this point in time.
end | ||
|
||
def available? | ||
life_cycle_step_definition.present? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, the check for view_project_stages_and_gates
should also be in here. The code should check if the user has that permission in any project.
The behaviour is as expected because there is an imprecision in the way the selectable orders work.
The Projects::ConfigureViewModalComponent
passes the selectable_columns
to the Queries::SortByComponent
. Within that component, the columns are then mapped to order statements without checking their available?
method again. In short, this only works because the conditions for the columns are currently the same as for ordering. But this might change.
Even now, while the option is not available in the UI, it is possible to craft a URL request in a way to get the order to be applied even if the user does not have the permission. Via the API, it is possible as well. Mind you, it will not be a problem any more when the permissions are checked on the values but ideally, the user would get an error message in this case which requires to have a check for this permission here as well.
I wouldn't want to include the rework of the Modal and SortBy component in this PR but rather open a new ticket/PR to address this. In here then, one would only need to add the permission check. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, good catch! I agree that this might come up in the future. So I will add the permission check here and we'll have to revisit sorting and the Modal in a separate task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the permission check along with a spec for it.
db349fb
to
c8555d9
Compare
Using a named subquery will break as those have to be unique. The same issue applies to CTEs - they need a unique name per query. To solve this, I have allowed queries to use CTEs and define their name. The name will be derived from the definition id, which is unique per query. Therefore, you can now order by multiple life cycle definitions at once.
There is no need to filter for available definitions before ordering, this also gets rid of a Brakeman warning.
It is supposed to be a bit more performant for a case like this, too.
This is so much better
d559d92
to
e5eee68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the query can be simplified which admittedly only slightly should also improve the performance.
WHERE | ||
steps.active = true | ||
AND def.id = #{life_cycle_step_definition.id} | ||
) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented around with it and measured the time on my data set. It does have a larger number of projects but unfortunately lacks a substantial number of portfolio elements so the measurement might be flawed. Overall I found very little impact, with the timing when sorting by 3 lifecycle elements always being around 20ms.
However, I still think that the better way of dealing with this query is in the following way:
# SQL
LEFT JOIN (
SELECT steps.*, steps.definition_id as def_id
FROM project_life_cycle_steps steps
WHERE
steps.active = true
AND steps.definition_id = :definition_id
AND steps.project_id IN (
#{viewable_project_ids.to_sql}
)
) #{subquery_table_name} ON #{subquery_table_name}.project_id = projects.id
# ...
# Ensure that only life cycle columns viewable to the current user are considered
# for ordering the query result.
def viewable_project_ids
Project.allowed_to(User.current, :view_project_stages_and_gates).select(:id)
end
This avoids fetching the project ids (as done by the pluck before) once for each column sorted by. Granted, this will be cached by the SQL cache but it is still an overhead. On top of that, I find that easier to read. I benchmarked the two approaches and on my data the approaches have the following runtime when issuing:
Benchmark.bm do |x|
x.report { 1000.times { query_ordered_by_three_lifecycle_columns.results.to_a.count } }
end
The PR's approach needed
user system total real
26.923800 1.216307 28.140107 ( 56.359955)
Whereas the changed one takes just
user system total real
19.154100 0.573853 19.727953 ( 32.417628)
This was done within the rails console so the PR's times are worsened by the fact that the query cache wasn't used. But to me the results are good enough to suggest that it at least not worse to have the subquery executed together with the rest.
I also tried to move the viewable_project_ids to a CTE to avoid issuing the same subquery thrice but that did not lead to measurable changes so I don't think this is necessary at this point in time.
Ticket
https://community.openproject.org/wp/59184
What are you trying to accomplish?
Screenshots
What approach did you choose and why?
Since projects are related to the LifeCycleSteps table, which is in turn reliant on the LifeCycleStepDefinition table - which is the attribute we are sorting on, I looked for a way to query the relevant rows without having to do a lot of aggregation and group by magic.
I found two approaches that work and seem to be performant. The first used a CTE, the second a subquery. In both cases, I join the projects table on a subset of life cycle step definitions. This gets rid of obsolete rows while not requiring further aggregation.
Both approaches seem to have a similar performance impact. I found the subquery more readable, so I chose that.
There is a caveat: since we can order by multiple columns, we must ensure that our subquery identifier (-> SELECT * FROM identifier) is not used twice. I added the definition id to that name to make it unique per request.
Merge checklist
Added/updated documentation in Lookbook (patterns, previews, etc)