-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
organism_analysis linker table #59
Comments
It's difficult to say no to a linker table - the more the merrier - but I am curious how this would be used. |
We (meaning HWG) typically want to list the analyses performed on an organism on the respective organism's page. this let's users visit that analysis to download the data or the associated entities (features, biosamples, what have you). Edit here is an example organism. The Transcriptome and Analysis tabs make use of the proposed linker field to display all analyses associated with this organism. |
I would worry a bit about this causing inconsistencies in the database. For example, what if your transcripts for a given analysis are for species X and the linker table points to species Y? I think this might be better implemented as a materialized view that pulls through the analysis -> feature -> organism connection. From an application standpoint this ends up being the same thing but from the database level it will help you keep consistency. Would that work for HWG @bradfordcondon? |
If I remeber correctly this table came up as a suggestion during Chado v1.3 discussions and it was shot down because it breaks the highly normalized structure of Chado for the reason @laceysanderson specified. I know of some Tripal sites that use this table but they add it as a custom table. But the MView is probably a better idea. |
I have the same concerns as @laceysanderson and @spficklin, but analyses can also be attached to non-features, like phylotree records (which would typically be multi-species so an organism_analysis record may not make sense). Still, it seems it should be possible to create a view that identifies all analyses associated with an organism without adding this table. |
@laceysanderson we originally had an mview that did that, but i have analyses that are not associated via features. Then, adding things to that view manually would be lost if the mview was repopulated. (not to mention writing a field that inserted into the mview broke because it didnt have a primary key...) |
@bradfordcondon are these non-feature-related analyses connected to organisms by any other means (however circuitous), or only via organism_analysis? If not, what is an example? |
Great question @ekcannon . Both, I'd say.
I suppose the feature-less analyses could have a single dummy feature loaded to associate them, if i, say, unpublished that entity... |
Those are valid use cases. I can see now the need for the table. There can be other analyses that don't generate genomic features. I'm not sure what to do in this case. I would need someone much smarter than I am on highly normalized dbs to weigh in. IMO I would suggest not to store dummy features... |
Is there any sane way to have both the linker table and the mview, and have the mview update the linker table without overwritting non-feature related entries? I recognize the importance of avoiding inconsistencies in the database. |
The only way I can think of to ensure database integrity is to have a trigger on analysisfeature that ensures if there is a record in organism_analysis then the organism matches the current feature being inserted/updated. However, the performance hit during loading could be substantial since it would run a query per feature :-( |
but what if you have multiple organisms associated with a single analysis? FYI I am going ahead with the linker table on HWG but I won't necessarily expect it to be part of Chado. |
If Tripal users are looking for this, here is the tripal 3 module that defines the linker field to link stuff via organism_analysis. |
Good point on the multiple organisms. Perhaps the trigger could check if the organism for the current feature being inserted is in the organism_analysis table and if not, adds it? |
I imagine that how Tripal ensures the integrity is an issue for my linker module itself, so I've moved it there: statonlab/tripal_manage_analyses#32. I dont think there's a way to force the integrity at the Chado level? |
You can do it at the Chado level using triggers: https://www.postgresql.org/docs/9.4/static/rules-triggers.html This can be more reliable then ensuring integrity at the application level. |
Very cool thank you |
This is still pseudocode but something like this? Are there cases where an analysis would have features from multiple organisms? If so I'd need to check that first... rule for insert is much simpler... CREATE RULE organism_analysis_update AS ON UPDATE TO feature.organism_id
DO UPDATE(organism_id), (NEW.organism_id) FROM organism_analysis WHERE organism_id = OLD.organism_id AND analysis_id =
(SELECT analysis_id FROM analysis A
INNER JOIN analysisfeature AF ON A.analysis_id = AF.analysis_id
INNER JOIN feature F ON F.feature_id = AF.feature_id
INNER JOIN organism O ON O.organism_id = F.organism_id
WHERE F.feature_id = NEW.feature_id); puzzling it out here, also tried a trigger which seems more complicated... https://gist.github.com/bradfordcondon/366c3ca47e93d78badcaba2469885fd9 |
It looks like the best practices recommendation is to use triggers instead of rules "unless you really know what you are doing or the performance hit from them is unacceptable, as rules can work in unexpected ways." https://stackoverflow.com/questions/5894142/what-are-postgresql-rules-good-for Ideally you would support cases where an analysis would have features from multiple organisms since that's supported by the database. For example, someone could have aligned an EST set against the current genome where the ESTs are from multiple organisms for comparative analysis. I looked quickly at the trigger in the gist you linked to. It's looking good so far :-) You would select the analysis_id for the feature using SELECT analysis_id FROM analysis A
INNER JOIN analysisfeature AF ON A.analysis_id = AF.analysis_id
INNER JOIN feature F ON F.feature_id = AF.feature_id
INNER JOIN organism O ON O.organism_id = F.organism_id
WHERE F.feature_id = OLD.feature_id |
@bradfordcondon what's the status of this? I'm not convinced it should go into 1.4, but if trigger is working well, we can at least consider it. |
the status is that people are using the tripal module to link organisms to analysis, but i never got around to figuring out how to implement lacey's trigger suggestion :( |
I think there was some request for an organism_analysis linker table in the schema? We plan on creating one at HWG anyway.
The text was updated successfully, but these errors were encountered: