-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic Coercions Considered Harmful #373
Comments
I agree that implicit int -> string and string -> int typecasts should be forbidden. I am not so sure about the optional stuff. This is already explicit right? the # convert an optional file-array to a file-array
Array[File] files = ["a", "b"]
Array[File]? files2 = files
# Convert an optional int to an int
Int a = 3
Int? b = a What is the exact problem with this code? In practice (in BioWDL at least) optional values are only used to set optional command line parameters in tasks. call myTask {
inputs:
my_required_input = 5
my_optional_input = 3
} After your proposal it will have to be: call myTask {
inputs:
my_required_input = 5
my_optional_input = Some(3)
} I don't see what this accomplishes except that it makes WDL harder to write. |
Forcing people to be explicit about their coercions does make WDL more verbose. However, it opens the door to using type inference (@cjllanwarne mentioned this a long time ago). The WDL type system is actually quite complex. We have overloading, A realistic way of doing this is support type-inference inside a task or a workflow, supplemented with explicit coercions when needed. |
Implementing WDL bindings in java for Additionally, it complicated by the fact that one could argue that with auto type coercions, as long as there is a path to the target type, you should be able to coerce any LHS to any RHS. For example in MiniWDL Given the above one could do something like
This really does not make any sense, and its a bit ambiguous what it will eventually resolve into. Additionally, there is no strict ruels around what is actually coercible to what... (rules were added in the development spec.. but they still are very vagues). This leads to alot of potential issues in executing workflows on different systems. Its worth noting too that even python uses explicit type conversions through functions like |
Coercion semantics usually have another requirement called coherence that needs to be satisfied, which is much more difficult to prove. For reference, here one of the papers: https://link.springer.com/chapter/10.1007%2F3-540-54415-1_70 I have not seen proofs for WDL, but I'm sure the developers must have them somewhere. Hope it helps, |
@pgrosu your confidence in us is comforting, but I am not to sure you are going to find any sort of formal proofs, or documentation on coherence. If this is an area that you are comfortable in, and could help us flesh out that would be much much appreciated |
How about type coercion of the inputs/outputs from the standard library functions?
I could not find any such case in the spec, and I think this is a bit contradicting the static typing requirement. In general though seems these examples compile and run, but we start to see coercion problems when nesting such functions and applying them to optional types. Like this:
In this case we see the problems of coercion for values of
We can solve this in dxCompiler and allow such scenarios, but isn't this in the realm of automatic type coercion and contradictory to strong/static typing of the language? Could you provide more explicit guidance on typing the inputs/outputs for the standard library functions in the language spec? |
Automatic coercions considered harmful
Authors: O. Rodeh with J. Didion (DNAnexus Inc.)
This issue describes a line of thought regarding type checking for WDL programs. It is based on experience acquired while working with DNAnexus customers on their WDL pipelines. The author also wrote a WDL type checker, evaluator, and cloud executor.
Coercions
WDL makes wide ranging use of coercions to convert one type to another. This is, in general, a well accepted idea used in a wide variety of programming languages. At DNAnexus we have run into several instances where WDL workflows errored out because static type checking was too lenient. This resulted in long running expensive customer pipelines failing. The issue is twofold: the turn around time is long, and the cost can run to thousands of dollars.
For example, under the WDL type system, a
File
is auto-coercible intoFile?
. However, this is not the case in the DNAnexus simpler type-system. If you have a task that takes anArray[File?]
and call it withArray[File]
this will fail. For example:Workflow
bar
will fail when it callsfoo
, after it has spent a long time running an expensive calculation. This is frustrating for the users. We could prevent this, as well as many other errors, by performing stricter type checking. Below is an eclectic list of automatic coercions that appears in GATK pipelines. The GATK pipelines are large real world workflows that our customers expect to execute flawlessly on our platform. The code is a simplified version of the real program lines, stripping a way the context to focus on the main point.Converting between
T
andT?
occurs automaticallyConverting between strings, integer, and floats "just works".
The branches of a conditional may not resolve to the same type. Here, one side is
Array[String]
, the other isArray[String?]
.Since
T
andT?
are interchangeable, we can perform operations likeT
+T?
+T
.The following non-intuitive code type checks in WDL (womtool-51.jar) :
And then
aai
type checks too:Proposal
Explicit coercions are useful and make sense, for example:
However, auto-coercions can lead to unexpected behavior and costly/confusing job failures. We suggest allowing explicit coercions, adding a few necessary stdlib function for type conversion, and disallowing automatic coercions. A few necessary stdlib functions, written a la Scala, are:
In order to select the first non empty integer, as below:
one would write:
This would make the typing of
select_first
accurate; it would only take an array ofT?
.Code like:
would be written:
A Pythonic way of writing these functions would be to have a single cast function that takes a type and a value and either casts the value to the type or throws an error. For example:
cast could even handle optional types, which would get rid of the need for Some(), e.g.
I hope this approach would reduce the number of preventable runtime errors.
The text was updated successfully, but these errors were encountered: