WIP: Enable *some* parallelizable sequences for Molecule #2135

Conversation

@decentral1se (Contributor) commented Jun 27, 2019

OK, here is a working draft of a potential approach to running molecule in parallel. This is not the final implementation but rather something that can be tested early by those who need it. Please test it! I need constructive feedback on this change and this approach.


Notes on the implementation:

  • We introduce a --parallel flag which is stored at the molecule.config.Config level
  • We then store a parallel_run_uuid at the molecule.scenario.Scenario level
  • We then inject this uuid into the cache and instance creation during the test sequence
  • Molecule can't track state for partial test sequences. So, you can't run molecule create --parallel && molecule login because the parallel_run_uuid will be lost and molecule won't know which state to load.
  • Molecule itself does not implement parallel functionality (with async or coroutines, for example) but simply arranges the internal state so that it can be parallelized by another tool, such as GNU parallel or pytest. A minimal sketch of this arrangement follows below.
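To make the list above concrete, here is a minimal sketch of the idea; ParallelConfig, cache_dir and instance_name are hypothetical names used only for illustration, not Molecule's actual API:

```python
# Hypothetical sketch only -- these names do not come from Molecule's codebase;
# they just illustrate the flow described in the notes above.
import os
import uuid


class ParallelConfig:
    def __init__(self, parallel=False):
        # Set from the --parallel CLI flag.
        self.is_parallel = parallel
        # One UUID per run; everything derived from it is unique to this run.
        self.parallel_run_uuid = str(uuid.uuid4()) if parallel else None

    def cache_dir(self):
        if self.is_parallel:
            # Each parallel run gets its own state directory.
            return os.path.join(
                os.path.expanduser("~/.cache/molecule_parallel"),
                self.parallel_run_uuid,
            )
        return os.path.expanduser("~/.cache/molecule")

    def instance_name(self, name):
        # Suffix instance names so concurrent runs never collide.
        if self.is_parallel:
            return "{}-{}".format(name, self.parallel_run_uuid)
        return name


if __name__ == "__main__":
    cfg = ParallelConfig(parallel=True)
    print(cfg.cache_dir())
    print(cfg.instance_name("instance"))
```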

How to Test

I tested this with the following commands:

$ apt install -y parallel
$ CMD="molecule test --parallel"
$ parallel ::: "$CMD" "$CMD" "$CMD"
$ watch ls -lha ~/.cache/molecule_parallel  # in another terminal
$ watch docker ps -a  # in yet another terminal

This shows how we can run molecule in parallel 3 times, with a separate cache folder created for each run as well as separate instances. I've played around with it a bit and it looks to be a working proof of concept.


TODO:

  • Consider using --enable-parallel since --parallel might be misleading: Molecule does not manage any parallelization itself but simply allows the internals to be parallelized.
  • How to handle existing playbooks? Should we write a migration to enable the new filter on the naming?
    • Add a change log entry which says that if you use a custom playbook, you need to update it like this and that, etc.?
  • Make the test suite parallelizable
  • Write documentation and change log

@decentral1se force-pushed the spike/parallelizable-state-experimental branch from 7be2c2b to 1d31520 on June 27, 2019 15:14
@decentral1se force-pushed the spike/parallelizable-state-experimental branch from 1d31520 to 6c11d96 on June 27, 2019 15:19
@decentral1se requested review from themr0c and ssbarnea on June 27, 2019 15:24
@tehsmyers (Contributor) commented:

I have a few scattered thoughts, with no guarantees that any of them are useful. As always, I apologize in advance for the verbosity. Of all of these thoughts, I think only the parallelize_instance_name change is something I'd like to see in this PR, so I put that one first; the other ideas are more in the style of thinking out loud. I love the addition of the name filter, and I love the new ansible facts to further allow playbook authors more visibility and control over the molecule run based on the parallelization settings.

  • The parallelize_instance_name filter is generally useful. While I expect it will most often be used on instance names, I'd be inclined to use it in, say, openstack playbooks to ensure run-unique keypairs and security groups. Perhaps it should just be parallelize_name or something similarly generic?

  • Explicitly setting is_parallel_mode might be unnecessary. I wonder if that logic can all be based on whether or not parallel_run_uuid is set. This would probably require moving the uuid generation back to the click level, where the value of the parallel command arg is either, say, None or the uuid. This thought is mainly stemming from the _maybe_parallelize_instance_names method in the platforms module, where I feel like there's probably a cleaner approach to take, but it's not immediately obvious to me what that cleaner approach might be. I see how it could be risky to overload parallel_run_uuid with also being the way to decide whether or not parallelization should be done, but if is_parallel_mode can't be true unless parallel_run_uuid is set, then I think it makes sense to derive its value from parallel_run_uuid rather than explicitly carry around a configured boolean (see the sketch after this list).

  • This could potentially open the door for using a predefined unique ID, allowing for something like molecule <sequence> --parallel --parallel-uid 'some-pregenerated-thing', or even just ... --parallel 'some-pregenerated-thing', so all of the same parallelization machinery could be used for arbitrary sequence calls.

  • This also highlights a potential need to centralize some click functionality using its decorator factories, should we want to expand parallelization out to all sequences using unique IDs provided by the test environment.

  • The molecule_parallel_mode ansible var seems oddly named. I'd expect that to be a string representing the name of the current parallel mode. Perhaps sticking with the "is_parallel" convention here would be helpful in indicating that this value is a boolean, e.g. molecule_is_parallel or molecule_is_parallel_mode. Similar to my first point, I also wouldn't be against just having molecule_parallel_run_uuid be set to a falsey value if the current run is not being parallelized, and deriving whether or not a run is parallelized from that value. Again, though, overloading this var with that meaning might lead to confusion.
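A rough sketch of the derivation suggested above, with hypothetical names rather than Molecule's actual classes:

```python
# Rough illustration of the suggestion above; names are hypothetical.
import uuid


class Config:
    def __init__(self, parallel_run_uuid=None):
        # At the click level, --parallel would set this to a fresh UUID;
        # otherwise it stays None.
        self.parallel_run_uuid = parallel_run_uuid

    @property
    def is_parallel_mode(self):
        # Derived from the UUID instead of being a separately stored boolean.
        return self.parallel_run_uuid is not None


print(Config().is_parallel_mode)                   # False
print(Config(str(uuid.uuid4())).is_parallel_mode)  # True
```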

@ssbarnea (Member) commented:

How about a “molecule_naming_suffix” which would be empty by default in non-parallel mode? We should keep the logic of how we determine that suffix separated from the places where we use it.
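As a sketch of what that could look like, assuming a hypothetical naming_suffix helper (not code from this PR):

```python
# Hypothetical helper illustrating the suggestion: the suffix is computed in
# one place and is simply empty when not running in parallel.
def naming_suffix(parallel_run_uuid=None):
    return "-{}".format(parallel_run_uuid) if parallel_run_uuid else ""


# Anywhere a name is built, the suffix is just appended:
print("instance" + naming_suffix())           # "instance"
print("instance" + naming_suffix("abc123"))   # "instance-abc123"
```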

@decentral1se (Contributor, Author) commented:

Perhaps it should just be parallelize_name or something similarly generic?

Ah, I didn't realise others might want this for other purposes. Sounds good.

Explicitly setting is_parallel_mode might be unnecessary. I wonder if that logic can all be based on whether or not parallel_run_uuid is set.

Well, it's all a bit circular, but the way I have it in my head is: 1) end-user passes --parallel 2) we store that on the Config 3) we generate the uuid. If we remove is_parallel_mode, we'll have to do hasattr checks instead for the uuid property? Perhaps the uuid will not be the defining indicator of whether or not things are going in parallel mode, so I think it makes sense to keep this high level "the user passes --parallel" and leave uuid/whatever as deeper _internals. You mentioned this in your last point.

this could potentially open the door for using a predefined unique ID, allowing for something like molecule --parallel --parallel-uid 'some-pregenerated-thing'

I think that's for another PR, but I was thinking about announcing the uuid that was used for the run (some obvious logging) and then extending molecule prepare/login/etc. to accept a --uuid flag for the state lookup. I'd prefer if molecule controls how these uuids are generated, or who knows what we'll run into.

This also highlights a potential need to centralize some click functionality using its decorator factories, should we want to expand parallelization out to all sequences using unique IDs provided by the test environment.

Yep, agreed! It will be coming down the line on this PR.
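For illustration only, a shared --parallel option centralized through a click decorator factory might look roughly like this (a sketch, not code from the PR):

```python
# One way the shared click option could be centralized, as discussed above;
# names and help text are placeholders.
import click


def parallel_option(func):
    """Reusable decorator so every subcommand exposes the same --parallel flag."""
    return click.option(
        "--parallel/--no-parallel",
        default=False,
        help="Enable parallelizable state for this run.",
    )(func)


@click.command()
@parallel_option
def test(parallel):
    # Echo the flag so the wiring can be verified quickly.
    click.echo("parallel mode: {}".format(parallel))


if __name__ == "__main__":
    test()
```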

Perhaps sticking with the "is_parallel" convention here would be helpful in indicating that this value is a boolean, e.g. molecule_is_parallel or molecule_is_parallel_mode

Good point.

@decentral1se (Contributor, Author) commented:

Well, I see no major objections to the approach, so I'll try to get this into a reviewable state.

@decentral1se deleted the spike/parallelizable-state-experimental branch on June 28, 2019 13:17
@decentral1se (Contributor, Author) commented:

OK, next attempt @ #2137.
