Build the “composite” life stages ontology directly in Uberon #3443
The SSLSO (species-specific life stage ontology) is now a fairly normal ODK-managed ontology that we can "import" (talking about our "local imports" here, not imports in an ODK sense -- those are the imports that are used to build Composite Metazoan) without any special treatment.
The SSLSO project provides its own mapping set, so we just need to fetch it, then we can generate the bridge at the same time as all the other bridges. We do _not_ generate a distinct bridge for all the species present in SSLSO, and we will not do that until/unless there is an explicit demand for it. All bridging axioms to SSLSO terms are in a single bridge, except for HsapDv and MmusDv terms (we need MmusDv as a separate bridge to construct composite-mouse; there is no real reason I can think of to have a separate HsapDv bridge, but we always had it, so I can already hear people screaming if I dare remove it.)
Add a new product coming out of the Composite pipeline: "composite-lifestages". This is basically the equivalent to the "ssso-merged-uberon" product that used to be produced by the "developmental stages ontology" project. The intermediate file on the way to get to "composite-lifestages", "collected-lifestages", is basically the equivalent of "ssso-merged".
QC workflow cancelled as it is bound to fail currently, since the new version of SSLSO is not publicly available yet.
Instead of calling the uberon:merge-species command repeatedly, once for every species to merge, we call it only once, with a batch file listing all the species for which a merge is required. This removes some clutter from the Makefile, but most importantly this also makes the whole operation much faster (from ~45min down to ~7min, on my machine), because in batch mode the reasoner state is shared between all merge operations -- we don't need to create a new reasoner and have it reason over the ontology for every merge, which is what takes the most time. The reasoner is initialised once at the beginning of the first merge, and then it just needs to be kept updated for the subsequent merges, which is much faster than creating a whole new reasoner instance.
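To illustrate why sharing the reasoner matters, here is a toy sketch. ToyReasoner, merge_one_by_one and merge_in_batch are hypothetical names invented for this illustration; they are not the actual uberon:merge-species code. The sketch only counts how many times the expensive initialisation step runs in each mode.

```python
class ToyReasoner:
    """Hypothetical stand-in for an OWL reasoner.

    Counts how many times the costly full classification is performed.
    """

    full_classifications = 0

    def __init__(self, ontology):
        self.ontology = set(ontology)
        # Creating a reasoner is the expensive step: it reasons over everything.
        ToyReasoner.full_classifications += 1

    def update(self, new_axioms):
        # Keeping an existing reasoner up to date is the cheap step.
        self.ontology |= set(new_axioms)


def merge_one_by_one(ontology, taxa):
    """One reasoner per merge, as with repeated command invocations."""
    start = ToyReasoner.full_classifications
    for taxon in taxa:
        reasoner = ToyReasoner(ontology)
        reasoner.update({f"merged:{taxon}"})
    return ToyReasoner.full_classifications - start


def merge_in_batch(ontology, taxa):
    """One reasoner shared by all merges, as in batch mode."""
    start = ToyReasoner.full_classifications
    reasoner = ToyReasoner(ontology)
    for taxon in taxa:
        reasoner.update({f"merged:{taxon}"})
    return ToyReasoner.full_classifications - start
```

With three taxa, the first function performs three full classifications and the second performs one, which is the saving described above.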
As for composite-vertebrate.owl and composite-metazoan.owl, we need a separate rule to create the composite-lifestages.owl product. The generic rule 'composite-%.owl' is not enough because the standard ODK-generated Makefile already contains a more specific rule, that can only be overridden by an equally specific rule.
Now that we generate those two additional products, we must take care that they are not inadvertently committed to the repository.
Currently, the information about which bridges to generate and how, and which species to unfold in composite-metazoan and how, is dispersed in two different places: in the bridges/bridges.rules.m4 source file to generate the bridges, and the config/tax-merges.tsv to generate composite-metazoan. This commit proposes to make those config data more manageable by moving them all to a single config/taxa.yaml file, from which we derive (using a relatively simple Python script) both the SSSOM/T rule file and the batch file that drives the compositing process. Arguably, having the SSSOM/T ruleset being generated by a Python script is more maintainer-friendly than having it generated by M4 macros, given that there are likely many more ontology engineers that can read and write Python than ontology engineers that can read and write M4 (which I believe is a shame, as M4 is a powerful and lightweight tool that can do great things when used well, but that's unfortunately beside the point).
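As a rough sketch of the "single config, two derived files" idea: the dictionary below stands in for the parsed content of config/taxa.yaml. Its keys ("taxon", "bridge", "merge"), its values, and the tab-separated batch line format are all assumptions made for illustration; the real schema and the real SSSOM/T output of src/scripts/taxa.py are not reproduced here. Only the uberon-bridge-to-*.owl naming pattern comes from the Makefile.

```python
# Hypothetical stand-in for the parsed taxa configuration.
# Keys and values are illustrative assumptions, not the real schema.
TAXA = {
    "mmusdv": {"taxon": "NCBITaxon:10090", "bridge": True, "merge": True},
    "hsapdv": {"taxon": "NCBITaxon:9606", "bridge": True, "merge": False},
    "wbls": {"taxon": "NCBITaxon:6239", "bridge": True, "merge": True},
}


def bridge_files(taxa):
    """List the bridge files to generate, one per entry flagged 'bridge'."""
    return [
        f"uberon-bridge-to-{name}.owl"
        for name, entry in taxa.items()
        if entry["bridge"]
    ]


def batch_lines(taxa):
    """List one batch line per entry flagged 'merge' (format is hypothetical)."""
    return [
        f"{entry['taxon']}\t{name}"
        for name, entry in taxa.items()
        if entry["merge"]
    ]
```

Both outputs are computed from the same entries, so adding or removing a taxon in one place updates the bridge list and the batch file together.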
I have not reviewed src/scripts/taxa.py, and made a few optional comments for your own pleasure; none of it is binding.
src/ontology/uberon.Makefile
Outdated
$(BRIDGEDIR)/uberon-bridge-to-wbls.owl \
$(BRIDGEDIR)/uberon-bridge-to-mmusdv.owl \
$(BRIDGEDIR)/uberon-bridge-to-hsapdv.owl \
$(BRIDGEDIR)/uberon-bridge-to-sslso.owl
Not actionable, just informative: I am noting that this seems to be missing a very important organism, Xenopus: https://www.ebi.ac.uk/ols4/ontologies/xao/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FXAO_1000000?lang=en
Double-checking now, but it looks like XAO has indeed slipped through the cracks. It used to be provided by ssso-merged, so now we need to bring it ourselves like we do for, e.g., FBdv and WBls.
XAO is a tricky case because, contrary to FBdv, WBls, and other similar species-specific ontologies, XAO contains both anatomical terms and life-stage terms, so we need to make a special case for this one (forcefully extracting a life-stage-only version of XAO).
Done.
The bridge/collected-lifestages-hdr.owl file is intended to provide ontology annotations to describe the collected/composite-lifestages product, but it was not included in the initial merging step. We fix that here.
Make the scripts/taxa.py script accept a command-line argument indicating the location of the taxa.yaml file, instead of hardcoding that location at the beginning of the script. Update the invocations of that script in the Makefile accordingly.
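A minimal sketch of what accepting the file location on the command line could look like, using the standard argparse module. The function name parse_args and the argument name taxa_file are assumptions for illustration; the actual interface of scripts/taxa.py may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; the single positional argument is the config path."""
    parser = argparse.ArgumentParser(
        description="Derive build files from a taxa configuration file."
    )
    # Hypothetical argument name; replaces a path hardcoded in the script.
    parser.add_argument("taxa_file", help="path to the taxa.yaml file")
    return parser.parse_args(argv)


# Example: simulate an invocation with an explicit path.
args = parse_args(["config/taxa.yaml"])
print(args.taxa_file)
```

Passing the path explicitly lets the Makefile decide where the configuration lives, rather than the script.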
Ensure that collected/composite-lifestages contains the life stage terms from the Xenopus Anatomy Ontology. This requires a custom step in which we extract all XAO life-stage terms (all terms under XAO:1000000, including XAO:1000000 itself) and create bridging axioms between those terms and the corresponding Uberon terms.
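The extraction step amounts to collecting a root term plus everything below it. Below is a minimal sketch of that idea on a plain list of (child, parent) pairs. The function name and the EX: identifiers are hypothetical placeholders; only XAO:1000000 comes from the description above, and the actual extraction in the pipeline is done with ontology tooling, not with this code.

```python
from collections import defaultdict, deque


def descendants_inclusive(root, subclass_of):
    """Return the root plus every term reachable from it through is-a links.

    subclass_of is an iterable of (child, parent) pairs.
    """
    children = defaultdict(set)
    for child, parent in subclass_of:
        children[parent].add(child)

    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy hierarchy: EX:* identifiers are made-up placeholders.
EDGES = [
    ("EX:stage-a", "XAO:1000000"),
    ("EX:stage-b", "EX:stage-a"),
    ("EX:anatomy-x", "EX:anatomy-root"),
]
life_stages = descendants_inclusive("XAO:1000000", EDGES)
```

Here life_stages contains the root and the two stage placeholders, while the anatomy placeholders are left out, which mirrors the goal of a life-stage-only subset.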
As I suspected, since I could not test that code while I was working on that PR (since the new, ODK-managed SSLSO had not been released yet), there were some random issues with it.
Those files would automatically be downloaded and added to the repository the next time the refresh-external-resources pipeline is run, but until this is done, the absence of those files would lead the test suite to fail (since the test suite runs under MIR=false, preventing any download).
This is the old version (pre-ODK migration) of the species-specific life stage ontology (SSLSO).
The local mirror of SSLSO already contains HsapDv and MmusDv, so there is no need to download and keep HsapDv and MmusDv separately. When we need the HsapDv and MmusDv terms, we can simply extract them from the entire SSLSO mirror.
Looks great, thank you!
I am still working on a (hopefully quick) last update: Moving the source of truth for the HsapDv and MmusDv mappings from Uberon to SSLSO, now that SSLSO provides a SSSOM mapping set. Currently those mappings come both from Uberon and from SSLSO, and there are already some discrepancies between the Uberon-maintained mappings and the SSLSO-maintained ones (with the Uberon mappings pointing to some HsapDv/MmusDv terms that have been obsoleted). Better to let SSLSO be the sole source of truth. Then it’s only a matter of updating the docs. Overall I plan for everything to be done and over in time for the next release that @aleixpuigb is planning for early next week.
The SSLSO project is now providing mappings to Uberon in the form of a SSSOM mapping set. The xrefs maintained in Uberon are redundant with those mappings, and they seem to be less maintained (several Uberon xrefs currently point to obsoleted HsapDv or MmusDv terms). We remove those cross-references entirely, so that SSLSO becomes the sole source of truth for the mappings between Uberon and HsapDv/MmusDv/OlatDv (as it already is for all the other *Dv ontologies).
Update the documentation about bridge files and collected/composite ontologies to reflect the recent changes in how those products are managed.
42e0298 to d57e4d5
(Draft PR as this is a work that depends on:
This PR changes the way Uberon interacts with the “Species-Specific Life Stages Ontology” (SSLSO) project. Roughly, instead of relying on that project to provide us with a “pre-composited” version of the life stage ontologies, we do everything here in Uberon. All SSLSO has to do is to provide us with the mappings between their terms and the corresponding taxon-neutral terms in Uberon.
There are several reasons for such a change, the most important being that it keeps all the logic to create the “composite” ontologies in the same place, here in Uberon. Having the SSLSO perform its own compositing leads to a lot of duplicated code, a lot of unnecessary back-and-forth between Uberon and SSLSO (Uberon generating the bridges with FBdv and WBls, which are then fetched by SSLSO to produce ssso-merged-uberon, which is then fetched by Uberon to produce composite-metazoan), and a risk that the two composite pipelines (the one in SSLSO and the one in Uberon) are not kept in sync and therefore behave slightly differently (which is exactly the case currently).