WDL Package specification #499
rhpvorderman started this conversation in Proposals: Discussion
One of the core strengths of WDL is its modularity. Tasks can be shared across workflows with little effort, as the popularity of the BioWDL tasks repo shows. Furthermore, workflows can be called in exactly the same way as tasks, which makes them just as easy to share and to incorporate into other workflows.
At this level of modularity, versioned imports and package management become a requirement for comfortable development. Currently BioWDL solves this with versioned git submodules, packaging the result with wdl-packager to create an imports zip. Not ideal.
@DavyCats has already suggested a versioned import syntax and package registry here: #493. @cjllanwarne also discussed a version syntax here: #226.
In this discussion I want to focus on the package format itself. I have already discussed possible file formats with @mlin on the wdl-packager repository: biowdl/wdl-packager#6
I wrote up an initial package specification here: https://github.com/rhpvorderman/wdl-package-specification
Below I would like to highlight and elaborate on the choices I made:
Reproducibility
Workflow reproducibility was taken as a key feature of the package spec:
tags
and it is extremely annoying). There is an exception for SNAPSHOT packages.
Tar vs zip
While zip is better supported across platforms, it stores and compresses each file individually. A compressed tar archive is compressed as a whole, so redundancy between files can be exploited, and this gives much better compression rates. This is significant when setting up package repositories.
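The following minimal Python sketch illustrates the compression argument. It uses synthetic, hypothetical WDL task files that differ only in a tool name. It compares a zip with per-file deflate against a single UStar tar compressed as a whole with gzip and xz, using the standard `zipfile`, `tarfile`, `gzip` and `lzma` modules. The printed sizes depend on the content and are illustrative only. They are not measurements of real packages.

```python
import gzip
import io
import lzma
import tarfile
import zipfile


def make_sample_files(n: int = 50) -> dict[str, bytes]:
    """Generate n near-identical, hypothetical WDL task files."""
    files = {}
    for i in range(n):
        text = (
            "version 1.0\n\n"
            f"task tool_{i} {{\n"
            "    input {\n"
            "        File inputFile\n"
            '        String outputPath = "output.txt"\n'
            "        Int threads = 1\n"
            "    }\n"
            "    command <<<\n"
            f"        tool_{i} --threads ~{{threads}} ~{{inputFile}} > ~{{outputPath}}\n"
            "    >>>\n"
            "    output {\n"
            "        File outputFile = outputPath\n"
            "    }\n"
            "}\n"
        )
        files[f"tasks/tool_{i}.wdl"] = text.encode("utf-8")
    return files


def zip_size(files: dict[str, bytes]) -> int:
    """Size of a zip archive in which every file is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_bytes(files: dict[str, bytes]) -> bytes:
    """Uncompressed UStar tar containing all files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    files = make_sample_files()
    tar = tar_bytes(files)
    print("zip (per-file deflate):", zip_size(files), "bytes")
    print("tar.gz (whole archive):", len(gzip.compress(tar, mtime=0)), "bytes")
    print("tar.xz (whole archive):", len(lzma.compress(tar)), "bytes")
```

The tar is compressed as one stream, so the compressor can reuse text shared between files. Zip starts from scratch for every file and also adds per-file headers.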
Additionally, since packages are built reproducibly, two WDL package tar files built from the exact same source are always byte-identical. `md5sum mywdl.tar`, `zcat mywdl.tar.gz | md5sum` and `xz -cd mywdl.tar.xz | md5sum` all give back the same checksum. This is not possible with the per-file internal compression used by zip. Zip could also have been used to package the files uncompressed and compress the archive afterwards, but `zip.xz` is just weird, while `tar.xz` is much more common. So `tar` was chosen.
UStar tar archives
This is a standard from 1988 (POSIX.1-1988). It made sense to use something that is very widely supported across platforms. Its maximum path length of 255 characters (a 155-character prefix plus a 100-character name) should not cause big problems. (In fact, it is kind of nice not having to type more than 255 characters in our import statements, so let's embrace this technical limitation ;-) ).
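The following minimal Python sketch shows what reproducible packaging into a UStar tar could look like. It assumes that "reproducible" means sorting entries and normalizing timestamps, permissions and ownership. The specification's exact rules may differ. The names `pack_reproducibly` and `md5` are hypothetical and are not part of the specification or of wdl-packager. Writing with `tarfile.USTAR_FORMAT` also enforces the UStar path limit: `tarfile` raises `ValueError` for a path that cannot be split into a 155-byte prefix and a 100-byte name.

```python
import gzip
import hashlib
import io
import lzma
import os
import tarfile
import tempfile


def pack_reproducibly(src_dir: str) -> bytes:
    """Pack all files under src_dir into an uncompressed UStar tar.

    Hypothetical helper: the output depends only on relative paths and file
    contents, because entries are sorted and metadata is normalized.
    """
    entries = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            arcname = os.path.relpath(path, src_dir).replace(os.sep, "/")
            entries.append((arcname, path))
    entries.sort()  # deterministic order, independent of directory listing order

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                      encoding="utf-8") as tf:
        for arcname, path in entries:
            with open(path, "rb") as fh:
                data = fh.read()
            info = tarfile.TarInfo(arcname)
            info.size = len(data)
            info.mtime = 0            # ignore modification times
            info.mode = 0o644         # fixed permissions
            info.uid = info.gid = 0   # no ownership information
            info.uname = info.gname = ""
            # addfile raises ValueError if arcname exceeds the UStar limits
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        os.makedirs(os.path.join(src, "tasks"))
        with open(os.path.join(src, "tasks", "hello.wdl"), "w") as fh:
            fh.write("version 1.0\n\ntask hello {\n  command <<< echo hello >>>\n}\n")
        tar = pack_reproducibly(src)
        # Python equivalents of: md5sum mywdl.tar,
        # zcat mywdl.tar.gz | md5sum and xz -cd mywdl.tar.xz | md5sum
        print(md5(tar))
        print(md5(gzip.decompress(gzip.compress(tar))))
        print(md5(lzma.decompress(lzma.compress(tar))))
```

Packing the same sources twice yields identical bytes, even after file timestamps or permissions change. Compression sits outside the tar, so the decompressed `.tar.gz` and `.tar.xz` both give back the checksum of the plain `.tar`.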
MANIFEST.json
This was taken from @mlin's implementation of `miniwdl zip`. A lot of packaging systems have a "manifest" file, so incorporating one made sense. The manifest is in JSON because it is an easily parsable format that many languages support in their standard library. The JSON fields are `underscored_all_lowercase_variables`, because this is also the preferred way to name functions in WDL and there is no reason to deviate from that style.
license_file
A license file must be included in any code archive that is redistributed. Different licenses require their license file to be named differently (`COPYING`, `LICENSE`, etc.), so the `license_file` key was chosen. `license_id` was added so that package repositories can determine the license without having to read the entire license text.
user vs packager
Having packages that:
is great for users, but sometimes an inconvenience for the developers who package their WDL. I think usability for the user should always take priority.
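The following minimal Python sketch makes the MANIFEST.json and license fields described above more concrete. It shows how a package repository might use them without unpacking the whole package. It assumes MANIFEST.json sits at the archive root. Only `license_file` and `license_id` come from the text above. The field `main_workflow`, the `license_id` value `"MIT"` and all function names are hypothetical illustrations, not part of the specification.

```python
import io
import json
import os
import re
import tarfile
import tempfile

# underscored_all_lowercase keys: lowercase words joined by underscores
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def non_snake_case_keys(manifest: dict) -> list[str]:
    """Return the top-level manifest keys that do not follow the naming style."""
    return sorted(key for key in manifest if not SNAKE_CASE.fullmatch(key))


def read_license_info(package_path: str) -> tuple[str, bool]:
    """Return (license_id, whether license_file exists in the archive).

    Only MANIFEST.json is read; the license text itself is never opened.
    """
    with tarfile.open(package_path, mode="r:*") as tf:  # detects gz/xz/bz2 or none
        manifest = json.load(tf.extractfile("MANIFEST.json"))
        names = set(tf.getnames())
    return manifest["license_id"], manifest["license_file"] in names


def build_demo_package(path: str) -> dict:
    """Write a tiny hypothetical package as tar.xz and return its manifest."""
    manifest = {
        "main_workflow": "main.wdl",  # hypothetical field
        "license_file": "COPYING",
        "license_id": "MIT",          # identifier format is an assumption
    }
    entries = {
        "MANIFEST.json": json.dumps(manifest, indent=2).encode("utf-8"),
        "COPYING": b"(full license text)\n",
        "main.wdl": b"version 1.0\n\nworkflow main {\n}\n",
    }
    with tarfile.open(path, mode="w:xz", format=tarfile.USTAR_FORMAT) as tf:
        for name in sorted(entries):
            data = entries[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        package = os.path.join(tmp, "mywdl.tar.xz")
        manifest = build_demo_package(package)
        print(read_license_info(package))     # ('MIT', True)
        print(non_snake_case_keys(manifest))  # []
```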
I would like to hear your take on this initial specification. Thanks!