organism_analysis linker table #59

bradfordcondon · 2018-01-29T15:18:51Z

I think there was some request for an organism_analysis linker table in the schema? We plan on creating one at HWG anyway.

-- ================================================
-- TABLE: organism_analysis
-- ================================================

create table organism_analysis (
       organism_analysis_id bigserial not null,
       primary key (organism_analysis_id),
       organism_id bigint not null,
       foreign key (organism_id) references organism (organism_id) on delete cascade INITIALLY DEFERRED,
       analysis_id bigint not null,
       foreign key (analysis_id) references analysis (analysis_id) on delete cascade INITIALLY DEFERRED,
       constraint organism_analysis_c1 unique (organism_id,analysis_id)
);
create index organism_analysis_idx1 on organism_analysis (organism_id);
create index organism_analysis_idx2 on organism_analysis (analysis_id);

COMMENT ON TABLE organism_analysis IS 'Linking table for associating organisms and analyses.';

The text was updated successfully, but these errors were encountered:

ekcannon · 2018-01-30T13:03:45Z

It's difficult to say no to a linker table - the more the merrier - but I am curious how this would be used.

bradfordcondon · 2018-01-30T15:47:26Z

We (meaning HWG) typically want to list the analyses performed on an organism on the respective organism's page. this let's users visit that analysis to download the data or the associated entities (features, biosamples, what have you).

Edit here is an example organism. The Transcriptome and Analysis tabs make use of the proposed linker field to display all analyses associated with this organism.

laceysanderson · 2018-01-30T17:55:21Z

I would worry a bit about this causing inconsistencies in the database. For example, what if your transcripts for a given analysis are for species X and the linker table points to species Y? I think this might be better implemented as a materialized view that pulls through the analysis -> feature -> organism connection. From an application standpoint this ends up being the same thing but from the database level it will help you keep consistency. Would that work for HWG @bradfordcondon?

spficklin · 2018-01-30T18:47:21Z

If I remeber correctly this table came up as a suggestion during Chado v1.3 discussions and it was shot down because it breaks the highly normalized structure of Chado for the reason @laceysanderson specified.

I know of some Tripal sites that use this table but they add it as a custom table. But the MView is probably a better idea.

ekcannon · 2018-01-30T19:01:24Z

I have the same concerns as @laceysanderson and @spficklin, but analyses can also be attached to non-features, like phylotree records (which would typically be multi-species so an organism_analysis record may not make sense). Still, it seems it should be possible to create a view that identifies all analyses associated with an organism without adding this table.

bradfordcondon · 2018-01-30T20:31:14Z

@laceysanderson we originally had an mview that did that, but i have analyses that are not associated via features. Then, adding things to that view manually would be lost if the mview was repopulated. (not to mention writing a field that inserted into the mview broke because it didnt have a primary key...)

ekcannon · 2018-01-30T20:48:12Z

@bradfordcondon are these non-feature-related analyses connected to organisms by any other means (however circuitous), or only via organism_analysis? If not, what is an example?

bradfordcondon · 2018-01-30T21:06:57Z

Great question @ekcannon . Both, I'd say.

Transcriptomes or genomes that have no data stored in Chado. (IE an archival analysis where we only offer a zip of the assembly to download)
Analysis Expression - One for the expression data. Possibly one for the biomaterials as well. Both could be linked to features, via the MAGE Module...
Libaries: On our site, libraries are mostly archival, so no attached chado data. If we used the module properly, we could attach it via library_feature....
other analysis types (blast_analysis, interproscan analysis): Not sure about this one, because i dont know that the annotations keep ties to the analysis? I'd have to check for this one.

I suppose the feature-less analyses could have a single dummy feature loaded to associate them, if i, say, unpublished that entity...

spficklin · 2018-01-30T23:31:56Z

Those are valid use cases. I can see now the need for the table. There can be other analyses that don't generate genomic features. I'm not sure what to do in this case. I would need someone much smarter than I am on highly normalized dbs to weigh in.

IMO I would suggest not to store dummy features...

bradfordcondon · 2018-01-31T02:33:14Z

Is there any sane way to have both the linker table and the mview, and have the mview update the linker table without overwritting non-feature related entries? I recognize the importance of avoiding inconsistencies in the database.

laceysanderson · 2018-01-31T02:52:32Z

The only way I can think of to ensure database integrity is to have a trigger on analysisfeature that ensures if there is a record in organism_analysis then the organism matches the current feature being inserted/updated. However, the performance hit during loading could be substantial since it would run a query per feature :-(

bradfordcondon · 2018-02-05T21:45:48Z

ensures if there is a record in organism_analysis then the organism matches the current feature being inserted/updated

but what if you have multiple organisms associated with a single analysis?

FYI I am going ahead with the linker table on HWG but I won't necessarily expect it to be part of Chado.

bradfordcondon · 2018-05-04T17:15:46Z

If Tripal users are looking for this, here is the tripal 3 module that defines the linker field to link stuff via organism_analysis.

https://github.com/statonlab/tripal_manage_analyses

laceysanderson · 2018-05-04T19:34:58Z

Good point on the multiple organisms. Perhaps the trigger could check if the organism for the current feature being inserted is in the organism_analysis table and if not, adds it?

bradfordcondon · 2018-05-04T19:38:34Z

I imagine that how Tripal ensures the integrity is an issue for my linker module itself, so I've moved it there: statonlab/tripal_manage_analyses#32.

I dont think there's a way to force the integrity at the Chado level?

laceysanderson · 2018-05-04T19:42:11Z

You can do it at the Chado level using triggers: https://www.postgresql.org/docs/9.4/static/rules-triggers.html This can be more reliable then ensuring integrity at the application level.

bradfordcondon · 2018-05-04T19:42:44Z

Very cool thank you

bradfordcondon · 2018-05-09T14:33:28Z

This is still pseudocode but something like this?

Are there cases where an analysis would have features from multiple organisms? If so I'd need to check that first...

rule for insert is much simpler...

CREATE RULE organism_analysis_update AS ON UPDATE TO feature.organism_id
    DO UPDATE(organism_id), (NEW.organism_id) FROM organism_analysis WHERE organism_id = OLD.organism_id AND analysis_id = 
    (SELECT analysis_id FROM analysis A
    INNER JOIN analysisfeature AF ON A.analysis_id = AF.analysis_id
     INNER JOIN feature F ON F.feature_id = AF.feature_id
     INNER JOIN organism O ON O.organism_id = F.organism_id
    WHERE F.feature_id = NEW.feature_id);

puzzling it out here, also tried a trigger which seems more complicated... https://gist.github.com/bradfordcondon/366c3ca47e93d78badcaba2469885fd9

laceysanderson · 2018-05-09T17:26:27Z

It looks like the best practices recommendation is to use triggers instead of rules "unless you really know what you are doing or the performance hit from them is unacceptable, as rules can work in unexpected ways." https://stackoverflow.com/questions/5894142/what-are-postgresql-rules-good-for

Ideally you would support cases where an analysis would have features from multiple organisms since that's supported by the database. For example, someone could have aligned an EST set against the current genome where the ESTs are from multiple organisms for comparative analysis.

I looked quickly at the trigger in the gist you linked to. It's looking good so far :-) You would select the analysis_id for the feature using

SELECT analysis_id FROM analysis A
    INNER JOIN analysisfeature AF ON A.analysis_id = AF.analysis_id
     INNER JOIN feature F ON F.feature_id = AF.feature_id
     INNER JOIN organism O ON O.organism_id = F.organism_id
WHERE F.feature_id = OLD.feature_id

scottcain · 2018-12-04T01:16:38Z

@bradfordcondon what's the status of this? I'm not convinced it should go into 1.4, but if trigger is working well, we can at least consider it.

bradfordcondon · 2018-12-04T01:27:08Z

the status is that people are using the tripal module to link organisms to analysis, but i never got around to figuring out how to implement lacey's trigger suggestion :(

bradfordcondon added the Chado 1.4 Suggestion label Jan 29, 2018

bradfordcondon mentioned this issue May 4, 2018

database integrity: feature-designed analysis needs to be taken into account. statonlab/tripal_manage_analyses#32

Open

colthom mentioned this issue Jun 16, 2019

Traceability of the source analysis or project #109

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

organism_analysis linker table #59

organism_analysis linker table #59

bradfordcondon commented Jan 29, 2018

ekcannon commented Jan 30, 2018

bradfordcondon commented Jan 30, 2018 •

edited

Loading

laceysanderson commented Jan 30, 2018

spficklin commented Jan 30, 2018 •

edited

Loading

ekcannon commented Jan 30, 2018

bradfordcondon commented Jan 30, 2018 •

edited

Loading

ekcannon commented Jan 30, 2018

bradfordcondon commented Jan 30, 2018 •

edited

Loading

spficklin commented Jan 30, 2018

bradfordcondon commented Jan 31, 2018

laceysanderson commented Jan 31, 2018

bradfordcondon commented Feb 5, 2018

bradfordcondon commented May 4, 2018

laceysanderson commented May 4, 2018

bradfordcondon commented May 4, 2018 •

edited

Loading

laceysanderson commented May 4, 2018

bradfordcondon commented May 4, 2018

bradfordcondon commented May 9, 2018 •

edited

Loading

laceysanderson commented May 9, 2018

scottcain commented Dec 4, 2018

bradfordcondon commented Dec 4, 2018

organism_analysis linker table #59

organism_analysis linker table #59

Comments

bradfordcondon commented Jan 29, 2018

ekcannon commented Jan 30, 2018

bradfordcondon commented Jan 30, 2018 • edited Loading

laceysanderson commented Jan 30, 2018

spficklin commented Jan 30, 2018 • edited Loading

ekcannon commented Jan 30, 2018

bradfordcondon commented Jan 30, 2018 • edited Loading

ekcannon commented Jan 30, 2018

bradfordcondon commented Jan 30, 2018 • edited Loading

spficklin commented Jan 30, 2018

bradfordcondon commented Jan 31, 2018

laceysanderson commented Jan 31, 2018

bradfordcondon commented Feb 5, 2018

bradfordcondon commented May 4, 2018

laceysanderson commented May 4, 2018

bradfordcondon commented May 4, 2018 • edited Loading

laceysanderson commented May 4, 2018

bradfordcondon commented May 4, 2018

bradfordcondon commented May 9, 2018 • edited Loading

laceysanderson commented May 9, 2018

scottcain commented Dec 4, 2018

bradfordcondon commented Dec 4, 2018

bradfordcondon commented Jan 30, 2018 •

edited

Loading

spficklin commented Jan 30, 2018 •

edited

Loading

bradfordcondon commented Jan 30, 2018 •

edited

Loading

bradfordcondon commented Jan 30, 2018 •

edited

Loading

bradfordcondon commented May 4, 2018 •

edited

Loading

bradfordcondon commented May 9, 2018 •

edited

Loading