Use Foundation resources to support build #1154

Closed
mhdawson opened this issue Feb 27, 2018 · 31 comments

@mhdawson
Member

This has been raised by @MylesBorins in the past but I could not find the issue.

I agreed in the last meeting to open an issue to revisit the discussion. The question is whether we should ask the Foundation for paid resources to help improve our ability to be responsive when failures/problems occur and to make more progress overall. It feels like sometimes it's hard to be responsive enough with just volunteers.

I briefly ran the idea by Mark Hinkle, and it sounds like it might be a possibility, but we'd have to write down what we would want.

@MylesBorins
Contributor

AFAIK there is already budget for this. /cc @mrhinkle to confirm

@jasnell
Member

jasnell commented Feb 28, 2018

help improve our ability to be responsive when failures/problems occur and to make more progress overall ...

This is all good, but it's not clear what is being asked for. Are we talking about the foundation hiring a devops/infrastructure person? Are we talking about just paying for better infrastructure? If there is budget allocated, what is it allocated for?

@mhdawson
Member Author

mhdawson commented Mar 1, 2018

@jasnell this issue is to quantify what we want to ask for.

From my perspective I don't think we have any key issues driving a need to pay for better infrastructure as we have a pretty good set of resources (although others may have ideas).

On the other hand, the challenge of keeping the CI up and running as a production system (i.e. there is an impact on the project's progress when machines are down, acting up, etc.) is not as good a fit for people volunteering their time as other aspects of the project, where asynchronous work fits very well. It may be that in this case only (build support) having a paid infra/devops person might make sense. However, I know that there is no consensus on this, and I'm not 100% sure myself that it's a good/workable idea yet. This issue was for us to have and document the discussion on that front.

One thing that would help is if we captured pain points people are seeing so that we can assess what options would/would not address them.

@mhdawson
Member Author

I think we should discuss this again. I don't think we are keeping up or handling "urgent" issues as well as we'd like, and I think we should strongly consider the option of a paid resource to handle "urgent" issues, with that paid resource being under the direction of the build WG volunteers.

It's a model that seems to have worked elsewhere (e.g. the Apache Software Foundation).

@nodejs/build

@mhdawson
Member Author

To be 100% clear, the ask would be for a paid person to handle "urgent/timely" work, including things like making sure required compiler levels are on release machines.

@rvagg
Member

rvagg commented Dec 19, 2018

Rich and I will begin a strategic initiative sometime in January to do some accounting of resources and costs (people and other) and try and come up with proposals and plans for reducing pain and improving the experience for collaborators while managing quality. Or something like that. Basically we need to think more strategically about this and understand what our problems and risks are and what we can do about them.

@MylesBorins
Contributor

The foundation is in the process of working out 2019 budgets right now. If we need an infrastructure line item, it would likely be a good idea to have a rough idea of the budget sooner rather than later. Of course this can always be done after budgets are approved, but better early imho.

One thing to consider as well is scope. Working under the assumption that a foundation merger will be happening it is worth considering if we want to make build infrastructure available to other foundation projects. An expanded scope will obviously require more work, but may also expand the circle of volunteers

Trott added a commit to Trott/TSC that referenced this issue Apr 2, 2019
From nodejs/build#1154 (comment):

---
Rich and I (Rod) will begin a strategic initiative sometime in January
to do some accounting of resources and costs (people and other) and try
and come up with proposals and plans for reducing pain and improving the
experience for collaborators while managing quality. Or something like
that. Basically we need to think more strategically about this and
understand what our problems and risks are and what we can do about
them.
---

This is that initiative (in March rather than January, but such is
life).
mhdawson pushed a commit to nodejs/TSC that referenced this issue Apr 3, 2019
PR-URL: #685
Reviewed-By: Michael Dawson
Reviewed-By: Anna Henningsen
Reviewed-By: Ali Ijaz Sheikh
Reviewed-By: Colin Ihrig
@Trott
Member

Trott commented Apr 24, 2019

Here's a quick draft of a problem statement before we try to assemble possible solutions. @rvagg @mhdawson @refack and everyone else on @nodejs/build, PTAL.

Problem statement

The Build Working Group (Build WG) is essential to the operation of the Node.js project. Continuous integration (CI) testing can't happen without the work of the Build WG. Releases cannot be built and completed without the work of the Build WG. The nodejs.org website and email addresses depend on the work of the Build WG. And so on.

The Build WG consists entirely of volunteers. Additionally, there is a barrier to entry to the more critical roles in the Build WG. Trust must be established for those roles, and that is usually done by being someone who will face serious repercussions at one's place of work in the case of abuse of those privileges. Unfortunately, this has meant that these high-privilege critical roles are filled by a small pool of members whose numbers slowly dwindle and never grow.

But it's not just the high-privilege needs that are unmet. The Build WG issue tracker perpetually has hundreds of issues that they are unable to address. Failures in CI can take hours or more to correct if they happen at the wrong time or on the wrong platform. Releases can be held up for hours or more. The cost is not just the delays but also the frustration of scores of other volunteer contributors trying to contribute code or push out a release.

Additionally, hardware is donated, resulting in scarcity in some platforms. While the shortage of personnel is more acute, both issues badly need to be addressed.

@rvagg
Member

rvagg commented Apr 24, 2019

Sounds OK, except that the second-last paragraph sounds like it might be overstating the severity of the problem it's pointing to. Is it an accurate reflection of how you see the severity?

@mhdawson
Member Author

Generally I'm ok with it, but it could probably use some tweaking if it is going to be promoted publicly versus being the basis for internal ongoing work.

Which hardware do you think is problematic in terms of scarcity?

@Trott
Member

Trott commented Apr 24, 2019

@rvagg @mhdawson I'm not trying to speak for myself there. I'm trying to capture the sentiment of people such as yourselves. If you think I've overstated something, said something questionable, etc., then it's probably a sign that it needs editing. I can try to revise based on feedback, but if you could provide revised text, that would be best from my perspective.

(If I'm going to be doing the revision, I'll wait for feedback from @refack first.)

@rvagg
Member

rvagg commented Apr 24, 2019

@Trott I'm more interested in your perception actually. It doesn't quite match mine so I wouldn't mind understanding the mismatch because maybe I don't have a clear perspective on it.

@Trott
Member

Trott commented Apr 24, 2019

@rvagg There have definitely been times when CI is broken for basically an entire weekend. They have been relatively few lately, mostly because @refack is responsive on weekends. Literally, that's the only thing that has kept us from a mutiny and a reversion to "oh well, guess we're just going to start landing stuff even though CI is red" again. It basically comes down to @refack being around to fix things or at least take platforms out of the test suite.

That said, re-reading that paragraph you point to, it does seem to paint with a broad brush and probably is more alarming than the reality. (On the other hand, are we not obviously headed in that direction?)

@Trott
Member

Trott commented Apr 25, 2019

Thinking about it more, I wonder if that paragraph should be removed or shortened and some material added about the threat of burnout? The number of people who really keep the lights on in Build can be counted on one hand with fingers to spare, so.....

@mhdawson
Member Author

I agree that part of the problem we need to highlight is that even if things were perfect, we are too dependent on a small number of people who are volunteers.

@sam-github
Contributor

Does this need to be on the WG agenda? I'm not sure what needs to be discussed, and particularly, if there is anything new.

@Trott
Member

Trott commented Jun 26, 2019

Does this need to be on the WG agenda? I'm not sure what needs to be discussed, and particularly, if there is anything new.

Here's what I'd want sorted out on this issue before closing it. (Michael's goal or someone else's might be different.)

  • If @mhdawson, @refack, and @joaocgreis agree with @rvagg's comments, I think we can close this and think about @rvagg's approach to sustainability for the strategic initiative.

  • On the other hand, if there isn't consensus that reducing the scope of our infrastructure is preferable to paying people to run CI, then this should probably stay open to sort that out.

@sam-github
Contributor

I'm not sure if I'm expected to have an opinion about this as a WG member, but I really don't.

I'm definitely not asking for it to remain open or be closed, but I would like it taken off the agenda if there isn't something to discuss!

@mhdawson you added the label on Oct 30th, does it need to remain?

@mhdawson
Member Author

@Trott I think the next step was some info about what "reducing the scope of our infra" would mean as a prereq for understanding if it's a viable option/alternative.

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@richardlau
Member

This is ongoing work and is part of the build resources strategic initiative which is discussed during TSC meetings (e.g. https://github.com/nodejs/TSC/blob/master/meetings/2020-04-16.md#build-resources-update).

@jkleinsc

One question I've wondered about is whether consideration has been given to paying for a CI provider?

@sam-github
Contributor

I (in my very personal opinion) would be happy to straight up have the foundation pay for services if we could get them. Or at least consider it.

Unfortunately, I'm not aware of one that supports our platform requirements. Which CI provider did you have in mind? What would it offload from the WG?

Our work tends to be things I'm not sure we can buy "off the shelf" (OS X notarization, arm 32, arm 64, aix, linux on lots of archs, devtoolset instead of raw gcc, older linux versions so we can build binaries that are ABI compatible to older linux, matching of the release line being built to the machines and compilers that it should be tested on, ditto for "releases built on", etc.)

@jkleinsc

@sam-github I have some ideas but not sure about what I can disclose publicly. I'd love to discuss this at the next build WG meeting.

@sam-github
Contributor

Sure, though the build WG meetings are usually public :-). We can discuss things after turning off the video streaming, though.

Or you could just email me if you want to bounce some ideas around. Often ideas for simplification underestimate the range of requirements we have, but I'm a big fan of simplification and streamlining to fit within available resources.

@mhdawson
Member Author

@jkleinsc please include me in the loop as well if you send out an email. I am in the process of exploring some options with the Foundation so I'd be interested to see what ideas you have.

@jkleinsc

@sam-github @mhdawson email sent from my electronjs email.

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Feb 22, 2021
@mhdawson mhdawson removed the stale label Feb 23, 2021
@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Dec 21, 2021
@mhdawson
Member Author

I'm just going to close this. The ask has been made both to the OpenJS Foundation staff and to the board members and there was some exploration but in the end the foundation was not in a position to provide support. We can always open a new issue/discussion if that changes in the future. @brianwarner, @rginn let me know if that's incorrect and we should re-open.


8 participants