Information passing between phases #965

Open
ittaiz opened this issue Jan 28, 2020 · 9 comments

Comments

@ittaiz
Member

ittaiz commented Jan 28, 2020

We need to define how information should be passed between phases.
The current strategy is for each phase to return a struct; following phases then get a struct with all the previous structs nested inside (keyed by phase name).
This has the following drawbacks:

  1. The contract of what a phase exposes is unclear. Especially if we want to allow people to depend on a semi-public API, it needs to be clearer what a phase returns without rummaging through its return statement.
  2. The current strategy doesn't allow following phases to override or augment the data returned by previous phases. One need is to add files to the exposed runfiles, as shown in unify all of the default info phases implementations #958. Note: the current mechanism does allow a phase that replaces another phase to override its return value by registering under the same name, but that is not enough.
    This means that to achieve this, phases need to traverse all previous structs for named attributes.

We'd like to fix both of the above. It's likely that overrides and augmentation should be achieved by applying the dependency inversion principle: downstream phases define their needs, and upstream phases expose them in whatever strategy we choose.

One option is to have each phase expose a dict of internal providers (keyed by the name of the provider). This dict would also be an input to each phase, so phases would be able to both override and augment internal providers.
Another advantage of providers is that they are recognized in the ecosystem, and tools will support them with respect to docs and such.
To the best of my understanding, providers are not much more than structs.
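To illustrate, here is a minimal sketch of the dict-of-internal-providers option in plain Python (standing in for Starlark; the phase names and the `JarsInfo` key are made up for illustration, not the real rules_scala API):

```python
# Hypothetical sketch in plain Python (standing in for Starlark); the
# phase and provider names are illustrative, not the real rules_scala API.

def phase_collect_jars(providers):
    # Expose an "internal provider" keyed by its name.
    return {"JarsInfo": {"compile_jars": ["a.jar", "b.jar"]}}

def phase_filter_jars(providers):
    # Because the accumulated dict is also an input, a later phase can
    # both read and override what an earlier phase exposed.
    jars = providers["JarsInfo"]["compile_jars"]
    return {"JarsInfo": {"compile_jars": [j for j in jars if j != "b.jar"]}}

def run_phases(phases):
    providers = {}
    for phase in phases:
        # Merging each phase's dict lets it override or augment entries.
        providers.update(phase(providers))
    return providers

result = run_phases([phase_collect_jars, phase_filter_jars])
```

Here `phase_filter_jars` overrides the `JarsInfo` entry, so any later phase reading it sees the filtered list.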

@andyscott has suggested we use well-defined dictionaries or structs.

We should discuss these options, the value each brings, and any other options people might surface.

@ittaiz
Member Author

ittaiz commented Jan 28, 2020

cc @borkaehw

@andyscott
Contributor

andyscott commented Jan 29, 2020

A few notes:

1 - We should audit the interactions between the current phases. I'm interested in:

  • what types are commonly being returned (depset of files, array of files, flags, etc.)
  • recurring field names
  • one-off dependencies between phases, i.e. "the foo phase is the only phase using the bar phase's flibbit field"

At the very least we can use this to establish and document some conventions for fields.

2 - Do we have any cases where a downstream phase needs to override something from an upstream phase? AFAIK we only have two cases of this flavor:

  • We only want one of a particular thing defined by any phase
  • Multiple phases need to add to an aggregate over the course of execution. This seems like a monoid to me.

We can support both of these cases without having overrides.
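The aggregation case above could be sketched like this (plain Python with made-up phase names; it assumes list concatenation as the combine operation, though a Starlark implementation would more likely merge depsets):

```python
# Illustrative sketch (plain Python, hypothetical names): phases
# contribute to an aggregate via an associative combine operation with
# an identity element, i.e. a monoid, so no phase needs to override
# another's result.

def combine(a, b):
    return a + b  # associative; [] is the identity element

def phase_jars(_):
    return ["lib.jar"]

def phase_sources(_):
    return ["src.jar"]

def aggregate(phases):
    acc = []  # start from the identity
    for phase in phases:
        acc = combine(acc, phase(acc))
    return acc

outputs = aggregate([phase_jars, phase_sources])
```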

@ittaiz
Member Author

ittaiz commented Jan 29, 2020

> We should audit the interactions between the current phases

Sounds good. Can you do this audit?

> We can support both of these cases without having overrides.

I guess we can; what's the con of overrides? It sounds to me like a simple and clear pattern.

Additionally, I'd love to understand what the cons of providers are, in your eyes, compared to structs/dictionaries. AFAICT providers are just that, with a bit more structure and familiarity.

@smparkes
Contributor

Moved from #958:

> I'm not convinced that providers are the way to go for passing phase information. Providers are typically used for passing information across rule boundaries: rule implementation a -- target graph --> rule implementation b. Here we are passing information between functions within the same rule implementation, i.e. phase 1 --> phase 2, all within rule implementation c. Perhaps we should just use well-structured structs or dictionaries?

I'm consciously a bystander on the phase stuff. I'm glad you all have got this (however it falls out).

FWIW, I did have similar feelings to the comment above.

This is what I understand (you all may already know this):

I got the impression providers were intended for "inter-target" communication: rule calls return provider instances, and those instances get attached to targets created by the rule call, which (1) can be type-enforced (I think via the `providers` parameter on label attrs) by rules depending on the target, and (2) can be read by things that depend on those targets, in a way that the provider type becomes the attribute/key.

This doesn't seem to relate to phases?

Putting that special behavior aside, I did explore what behavior providers have beyond what you get with structs. Once they're created, it doesn't look like much. They type to "struct" and have to_json like structs do.

They do have the provider declaration for documentation. And they do seem to enforce a little bit at construction: you can't add a field that isn't declared. You can leave out fields: that doesn't do anything that a struct constructor without the field wouldn't.
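The construction-time behavior described above can be modeled in a few lines of plain Python (a toy stand-in, not Starlark's actual provider implementation; `JarsInfo` and its fields are invented for the example):

```python
# Toy Python model of the construction-time checks described above (not
# Starlark's real provider): undeclared fields are rejected, and
# declared fields may be omitted.

def provider(fields):
    declared = set(fields)

    def ctor(**kwargs):
        undeclared = set(kwargs) - declared
        if undeclared:
            raise ValueError("undeclared fields: %s" % sorted(undeclared))
        return dict(kwargs)  # omitted fields are simply absent

    return ctor

JarsInfo = provider(fields=["compile_jars", "runtime_jars"])

info = JarsInfo(compile_jars=["a.jar"])  # runtime_jars omitted: allowed
# JarsInfo(srcs=[]) would raise, since srcs is not declared
```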

I think I'll avoid providers when I'm not using them in the rule impl/dep role. I find the value-add low compared to the confusion it causes me ("what target/rule is this related to?").

I will say that I find the current factoring of the scalac provider stuff a bit confusing.

_declare_scalac_provider runs a rule to create a target which does happen to have the provider, but what is scalac_default itself? At this point, I'm pretty comfortable saying this should be a toolchain, though I think it needs to be factored differently. Handling multiple-version toolchains will drive this, because the current global naming can't support that.

@andyscott
Contributor

> Sounds good. Can you do this audit?

Can do, but it will have to be a few weeks from now.

> I guess we can, what's the con of overrides? It sounds to me like a simple and clear pattern.

I guess I just don't see where we need to use overrides. AFAIK we either have exactly one thing we need to return (like the final executable) or we have an aggregation (like the output files). @borkaehw might be able to weigh in too, as they've pushed on the phase architecture quite a bit.

> Additionally I'd love to understand what are the cons in your eyes to providers compared to structs/dictionaries. AFAICT providers are that with a bit of more structure and familiarity.

What @smparkes said 😄

@ittaiz
Member Author

ittaiz commented Feb 21, 2020

I've been meaning to write here.
I think the cons mentioned here are low; the value-add is bringing some more types and structure into the APIs. The Bazel ecosystem currently uses providers between rules, but at the same time it doesn't use phases at all. Hopefully both will change.
The way I see it, our current architecture means that the APIs between phases are at least semi-public, if not public, and today they are too vague and string-based.
I'd like to change that.

Re overrides: I think the extra power is valuable.

@eed3si9n

As a concrete example, I'm looking to filter out some JARs after the collect_jars phase. It looks like phase_compile uses the JARs from the struct named collect_jars:

```
jars = p.collect_jars.compile_jars
rjars = p.collect_jars.transitive_runtime_jars
transitive_compile_jars = p.collect_jars.transitive_compile_jars
jars2labels = p.collect_jars.jars2labels.jars_to_labels
deps_providers = p.collect_jars.deps_providers
```

So the only way this can be customized is if I substitute the collect_jars phase?

Ideally I want the original collect_jars phase to do its thing, and then in a subsequent phase add or subtract JARs based on some inclusion or exclusion logic.
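A sketch of what such an extension point could look like, in plain Python standing in for Starlark (the phase names, jar names, and the override mechanism are all hypothetical, not the current rules_scala behavior being asked about):

```python
# Hypothetical sketch (Python stand-in for Starlark): the original
# collect_jars phase runs unchanged, then a later phase overrides the
# entry that phase_compile reads, applying an exclusion.

from types import SimpleNamespace as struct  # stand-in for Starlark struct

def phase_collect_jars(p):
    return struct(compile_jars=["scala-library.jar", "guava-28.jar"])

def phase_exclude_guava(p):
    kept = [j for j in p.collect_jars.compile_jars
            if not j.startswith("guava")]
    # Returning under the "collect_jars" key overrides the earlier
    # result, so downstream phases see the filtered list.
    return {"collect_jars": struct(compile_jars=kept)}

def run_phases(phases):
    acc = {}
    for name, phase in phases:
        result = phase(struct(**acc))
        acc.update(result if isinstance(result, dict) else {name: result})
    return struct(**acc)

p = run_phases([("collect_jars", phase_collect_jars),
                ("exclude_guava", phase_exclude_guava)])
```

With this shape, `phase_compile` would keep reading `p.collect_jars.compile_jars` and transparently see the filtered jars.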

@liucijus
Collaborator

@eed3si9n can you give more context on what you are trying to achieve that requires changing the jar list? Currently collect_jars has more than one responsibility (collecting jars according to the dependency strategy, and mapping labels to jars for the dep analyzer). Maybe it needs to be refactored into more phases.

> As a concrete example, I'm looking to filter out some JARs after collect_jars phase. It looks like phase_compile uses the JARs from the struct named collect_jars:
>
>     jars = p.collect_jars.compile_jars
>     rjars = p.collect_jars.transitive_runtime_jars
>     transitive_compile_jars = p.collect_jars.transitive_compile_jars
>     jars2labels = p.collect_jars.jars2labels.jars_to_labels
>     deps_providers = p.collect_jars.deps_providers
>
> so the only way this can be customized is if I substitute collect_jars phase?
>
> Ideally I want the original collect_jars phase to do its thing, and then in a subsequent phase I can add or subtract the JARs based on some inclusion or exclusion logic.

@eed3si9n

eed3si9n commented Jun 1, 2021

I work for Twitter, and we're trying to migrate our existing Pants codebase to Bazel, since the BUILD files are close enough. Not so surprisingly, where we run into a major difference is the timing of third-party resolution: Pants is able to perform Coursier resolution per target. I'm implementing multiversioning support so that build users can declare multiple versions of Kafka etc., and different parts of the build graph can use different versions.

There are, however, situations where the majority of users would want to use version X of something like Guava, but some need to stay behind or ahead for specific reasons. Allowing a terminal node to perform lightweight filtering can solve this sticky problem of ending up with multiple Guavas on the classpath.

Another similar use case is allowing build users to exclude specific files from deployment images. I assume this is also done to handle conflicts with data-processing runtimes such as Hadoop, or to swap out logging libraries.
