Build the “composite” life stages ontology directly in Uberon #3443

Merged
merged 19 commits into from
Jan 10, 2025

gouttegd
Collaborator

@gouttegd gouttegd commented Dec 6, 2024

(Draft PR as this is a work that depends on:

This PR changes the way Uberon interacts with the “Species-Specific Life Stages Ontology” (SSLSO) project. Roughly, instead of relying on that project to provide us with a “pre-composited” version of the life stage ontologies, we do everything here in Uberon. All SSLSO has to do is to provide us with the mappings between their terms and the corresponding taxon-neutral terms in Uberon.

There are several reasons for such a change, the most important being that it keeps all the logic to create the “composite” ontologies in the same place, here in Uberon. Having the SSLSO perform its own compositing leads to a lot of duplicated code, a lot of unnecessary back-and-forth between Uberon and SSLSO (Uberon generating the bridges with FBdv and WBls, which are then fetched by SSLSO to produce ssso-merged-uberon, which is then fetched by Uberon to produce composite-metazoan), and a risk that the two composite pipelines (the one in SSLSO and the one in Uberon) are not kept in sync and therefore behave slightly differently (which is exactly the case currently).

The SSLSO (species-specific life stage ontology) is now a fairly normal
ODK-managed ontology that we can "import" (talking about our "local
imports" here, not imports in an ODK sense -- those are the imports that
are used to build Composite Metazoan) without any special treatment.
The SSLSO project provides its own mapping set, so we just need to fetch
it, then we can generate the bridge at the same time as all the other
bridges.

We do _not_ generate a distinct bridge for all the species present in
SSLSO, and we will not do that until/unless there is an explicit demand
for it. All bridging axioms to SSLSO terms are in a single bridge,
except for HsapDv and MmusDv terms (we need MmusDv as a separate bridge
to construct composite-mouse; there is no real reason I can think of to
have a separate HsapDv bridge, but we always had it, so I can already
hear people screaming if I dare remove it.)

Add a new product coming out of the Composite pipeline:
"composite-lifestages". This is basically the equivalent to the
"ssso-merged-uberon" product that used to be produced by the
"developmental stages ontology" project.

The intermediate file on the way to get to "composite-lifestages",
"collected-lifestages", is basically the equivalent of "ssso-merged".

@gouttegd gouttegd self-assigned this Dec 6, 2024
@gouttegd gouttegd added tech pipeline composite bridge-files Issues related to the generation of bridge files from Uberon to other ontologies. labels Dec 6, 2024
@gouttegd
Collaborator Author

gouttegd commented Dec 6, 2024

QC workflow cancelled as it is bound to fail currently, since the new version of SSLSO is not publicly available yet.

Instead of calling the uberon:merge-species command repeatedly, once for
every species to merge, we call it only once, with a batch file listing
all the species for which a merge is required.

This removes some clutter from the Makefile, but most importantly this
also makes the whole operation much faster (from ~45min down to ~7min,
on my machine), because in batch mode the reasoner state is shared
between all merge operations -- we don't need to create a new reasoner
and have it reason over the ontology for every merge, which is what
takes the most time. The reasoner is initialised once at the beginning
of the first merge, and then it just needs to be kept updated for the
subsequent merges, which is much faster than creating a whole new
reasoner instance.
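
To illustrate why batch mode helps, here is a toy Python sketch. It is not the actual uberon:merge-species implementation, whose internals are not shown in this PR; `ToyReasoner` is a hypothetical stand-in for an OWL reasoner that only counts how many times it is built from scratch.

```python
class ToyReasoner:
    """Hypothetical stand-in for an OWL reasoner (not a real API)."""

    created = 0  # number of from-scratch initialisations

    def __init__(self, axioms):
        # With a real reasoner, this is the expensive step.
        ToyReasoner.created += 1
        self.axioms = set(axioms)

    def update(self, new_axioms):
        # Incremental update: cheaper than building a new instance.
        self.axioms |= set(new_axioms)


def merge_separately(base, merges):
    """One reasoner per merge, as when the command runs once per species."""
    axioms = set(base)
    for extra in merges:
        reasoner = ToyReasoner(axioms)
        reasoner.update(extra)
        axioms = reasoner.axioms
    return axioms


def merge_in_batch(base, merges):
    """A single reasoner shared by all merges, as in batch mode."""
    reasoner = ToyReasoner(base)
    for extra in merges:
        reasoner.update(extra)
    return reasoner.axioms
```

Both functions produce the same merged result; the difference is that the first one pays the initialisation cost once per merge, whereas the second pays it only once.
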
As for composite-vertebrate.owl and composite-metazoan.owl, we need a
separate rule to create the composite-lifestages.owl product. The
generic rule 'composite-%.owl' is not enough because the standard
ODK-generated Makefile already contains a more specific rule, which can
only be overridden by an equally specific rule.

Now that we generate those two additional products, we must take care
that they are not inadvertently committed to the repository.

Currently, the information about which bridges to generate and how, and
which species to unfold in composite-metazoan and how, is dispersed in
two different places: in the bridges/bridges.rules.m4 source file to
generate the bridge, and the config/tax-merges.tsv to generate
composite-metazoan.

This commit proposes to make those config data more manageable by moving
them all to a single config/taxa.yaml file, from which we derive (using
a relatively simple Python script) both the SSSOM/T rule file and the
batch file that drives the compositing process.

Arguably, having the SSSOM/T ruleset generated by a Python script
is more maintainer-friendly than having it generated by M4 macros, given
that there are likely many more ontology engineers who can read and
write Python than ontology engineers who can read and write M4 (which I
believe is a shame, as M4 is a powerful and lightweight tool that can do
great things when used well, but that's unfortunately beside the point).
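
As a rough sketch of the "single config, two derived outputs" idea, the Python below uses a plain dict in place of the parsed config/taxa.yaml. The keys (`taxon`, `bridge`, `merge`) and the tab-separated batch line layout are assumptions made for illustration; the real file layout, the real batch file format and the SSSOM/T output are not reproduced here. Only the bridge file naming pattern is taken from the file names that appear in this PR.

```python
# Hypothetical stand-in for the parsed content of config/taxa.yaml.
# The keys and the "merge" choices below are illustrative assumptions.
TAXA = {
    "FBdv": {"taxon": "NCBITaxon:7227", "bridge": True, "merge": True},
    "WBls": {"taxon": "NCBITaxon:6239", "bridge": True, "merge": True},
    "HsapDv": {"taxon": "NCBITaxon:9606", "bridge": True, "merge": False},
}


def bridge_targets(taxa):
    """List the bridge files to generate, one per entry flagged 'bridge'."""
    return [
        f"uberon-bridge-to-{name.lower()}.owl"
        for name, entry in sorted(taxa.items())
        if entry["bridge"]
    ]


def batch_lines(taxa):
    """List made-up batch lines, one per entry flagged 'merge'."""
    return [
        f"{entry['taxon']}\t{name}"
        for name, entry in sorted(taxa.items())
        if entry["merge"]
    ]
```

The point of the design is that adding or removing a taxon means editing one entry in one file, and both derived artefacts stay consistent with each other.
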
matentzn
matentzn previously approved these changes Jan 8, 2025
Contributor

@matentzn matentzn left a comment

I have not reviewed src/scripts/taxa.py, and made a few optional comments for your own pleasure; none of it is binding.

src/ontology/bridge/collected-lifestages-hdr.owl (outdated, resolved)
$(BRIDGEDIR)/uberon-bridge-to-wbls.owl \
$(BRIDGEDIR)/uberon-bridge-to-mmusdv.owl \
$(BRIDGEDIR)/uberon-bridge-to-hsapdv.owl \
$(BRIDGEDIR)/uberon-bridge-to-sslso.owl
Contributor

Not actionable, just informative: I am noting that this seems to be missing a very important organism, Xenopus: https://www.ebi.ac.uk/ols4/ontologies/xao/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FXAO_1000000?lang=en

Collaborator Author

Double-checking now, but it looks like XAO has indeed slipped through the cracks. It used to be provided by ssso-merged, so now we need to bring it ourselves like we do for, e.g., FBdv and WBls.

Collaborator Author

XAO is a tricky case because, contrary to FBdv, WBls, and other similar species-specific ontologies, XAO contains both anatomical terms and life-stage terms, so we need to make a special case for this one (forcefully extracting a life-stage-only version of XAO).

Collaborator Author

Done.

src/ontology/uberon.Makefile (resolved)
src/ontology/uberon.Makefile (outdated, resolved)

The bridge/collected-lifestages-hdr.owl file is intended to provide
ontology annotations to describe the collected/composite-lifestages
product, but it was not included in the initial merging step. We fix
that here.

Make the scripts/taxa.py script accept a command-line argument
indicating the location of the taxa.yaml file, instead of hardcoding
that location at the beginning of the script. Update the invocations of
that script in the Makefile accordingly.
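
A minimal sketch of that change, using Python's standard `argparse` module. The argument name `taxa_file` and the function name `parse_args` are illustrative assumptions; the actual interface of scripts/taxa.py is not shown in this PR.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; the taxa file location is a required argument."""
    parser = argparse.ArgumentParser(
        description="Derive build files from a taxa configuration file."
    )
    # Hypothetical argument name; replaces a path hardcoded in the script.
    parser.add_argument("taxa_file", type=Path, help="path to the taxa.yaml file")
    return parser.parse_args(argv)


# Example: what a Makefile invocation passing config/taxa.yaml would yield.
args = parse_args(["config/taxa.yaml"])
print(args.taxa_file)
```

Passing the path explicitly keeps the script free of assumptions about the repository layout and makes it easier to test against a different file.
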
Ensure that collected/composite-lifestages contains the life stage terms
from the Xenopus Anatomy Ontology.

This requires a custom step in which we extract all XAO life-stage terms
(all terms under XAO:1000000, including XAO:1000000 itself) and create
bridging axioms between those terms and the corresponding Uberon terms.
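
The extraction step ("all terms under XAO:1000000, including XAO:1000000 itself") amounts to collecting a root and all its descendants. The Python sketch below shows that traversal on a toy hierarchy; it is not the tooling used in the actual pipeline, and every identifier other than XAO:1000000 is a made-up placeholder.

```python
from collections import defaultdict, deque

ROOT = "XAO:1000000"

# Hypothetical (child, parent) subclass edges; only ROOT is a real identifier.
EDGES = [
    ("EX:stage-a", "XAO:1000000"),
    ("EX:stage-b", "EX:stage-a"),
    ("EX:stage-b", "XAO:1000000"),   # a term may have several parents
    ("EX:anatomy-x", "EX:anatomy-root"),
]


def subtree(root, edges):
    """Return the root plus every term reachable from it via child edges."""
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)

    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

On the toy data, the anatomical placeholder terms are left out because they are not reachable from the life-stage root.
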
As I suspected, since I could not test that code while I was working on
that PR (since the new, ODK-managed SSLSO had not been released yet),
there were some random issues with it.

Those files would automatically be downloaded and added to the
repository the next time the refresh-external-resources pipeline is run,
but until this is done, the absence of those files would lead the test
suite to fail (since the test suite runs under MIR=false, preventing any
download).

This is the old version (pre-ODK migration) of the species-specific life
stage ontology (SSLSO).

The local mirror of SSLSO already contains HsapDv and MmusDv, so there
is no need to download and keep HsapDv and MmusDv separately. When we
need the HsapDv and MmusDv terms, we can simply extract them from the
entire SSLSO mirror.

@gouttegd gouttegd marked this pull request as ready for review January 8, 2025 23:08
matentzn
matentzn previously approved these changes Jan 9, 2025
@matentzn
Contributor

matentzn commented Jan 9, 2025

Looks great, thank you!

@gouttegd
Collaborator Author

gouttegd commented Jan 9, 2025

I am still working on a (hopefully quick) last update: Moving the source of truth for the HsapDv and MmusDv mappings from Uberon to SSLSO, now that SSLSO provides a SSSOM mapping set.

Currently those mappings come both from Uberon and from SSLSO, and there are already some discrepancies between the Uberon-maintained mappings and the SSLSO-maintained ones (with the Uberon mappings pointing to some HsapDv/MmusDv terms that have been obsoleted). Better to let SSLSO be the sole source of truth.

Then it’s only a matter of updating the docs. Overall I plan for everything to be done and over in time for the next release that @aleixpuigb is planning for early next week.

The SSLSO project is now providing mappings to Uberon in the form of a
SSSOM mapping set. The xrefs maintained in Uberon are redundant with
those mappings, and they seem to be less maintained (several Uberon
xrefs currently point to obsoleted HsapDv or MmusDv terms).

We remove those cross-references entirely, so that SSLSO becomes the
sole source of truth for the mappings between Uberon and
HsapDv/MmusDv/OlatDv (as it already is for all the other *Dv
ontologies).
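
The kind of discrepancy described here (cross-references pointing to obsoleted terms) can be detected with a simple check. The Python sketch below is only an illustration of that check, not code from this PR; all identifiers in the sample data are made-up placeholders.

```python
def split_mappings(mappings, obsolete_terms):
    """Separate mappings whose object term is obsolete from the others."""
    current, stale = [], []
    for subject, obj in mappings:
        if obj in obsolete_terms:
            stale.append((subject, obj))
        else:
            current.append((subject, obj))
    return current, stale


# Hypothetical sample data (placeholder identifiers, not real terms).
MAPPINGS = [
    ("UBERON:example-1", "HsapDv:example-1"),
    ("UBERON:example-2", "MmusDv:example-2"),
]
OBSOLETE = {"MmusDv:example-2"}

current, stale = split_mappings(MAPPINGS, OBSOLETE)
```

With a single source of truth for the mappings, such a check only needs to be run in one place.
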
Update the documentation about bridge files and collected/composite
ontologies to reflect the recent changes in how those products are
managed.

@gouttegd gouttegd force-pushed the add-composite-life-stages branch from 42e0298 to d57e4d5 Compare January 9, 2025 12:50
Contributor

@matentzn matentzn left a comment

Awesssssome:

I reviewed

832f254 (looks great)

and

d57e4d5

Which is _super_great!

All good to me!

@gouttegd gouttegd merged commit ec4fed2 into master Jan 10, 2025
2 checks passed
@gouttegd gouttegd deleted the add-composite-life-stages branch January 10, 2025 08:14