Chapter proposal: Open Infrastructure #2337

Open
13 tasks
loleg opened this issue Apr 28, 2022 · 15 comments · May be fixed by #2407
Labels
idea-for-discussion This can be used for inviting discussion from collaborators or community in general

Comments

@loleg
Contributor

loleg commented Apr 28, 2022

There is interest in bridging our communities by developing a chapter on Open Infrastructure for research. This chapter would set the context and provide a vision for how to evaluate tools and platforms from a Turing Way perspective on reproducibility, ethical alternatives and collaboration in practice.

We should start by defining "open infrastructure", e.g. as a term that encompasses a wide range of practices for providing and decentralising access to the resources and knowledge that are essential to sustaining online research and other forms of digital collaboration. The chapter should contrast cloud computing or infrastructure-as-a-service with under-the-desk hardware, address some of the risks that bringing your own device poses to reproducibility, provide some tips on negotiating and partnering with the IT departments of your institute or institution, and, most importantly, explain how to properly evaluate and document the infrastructure of a scientific effort.

Context

There have already been a number of discussions of the term in this project, notably in the context of the Binder project, and most recently in a call with @aleesteele at a Frictionless Data meetup organised by @sapetti9.

Some relevant content already exists in the Guide for Reproducible Research (where this chapter seems most likely to fit), the Guide for Collaboration (notably in the coverage of GitHub as an open platform and the shared ownership of open source projects), and Research Infrastructure Roles (which mentions the architectural and engineering work involved).

Working across various levels of abstraction, from the programming languages and package landscapes of R, Python and Julia, to standards-focused groups like the ones behind Data Packages, to more specific projects like Jupyter, Livemark or Docker, we can appreciate how critical guidance on solid principles of infrastructure (software / hardware / cloud) can be to open research projects and open data publication. Let us start a Discussion, schedule a call, and go from there.

Resources

Who can help?

  • @loleg - software engineering background with some science lab support experience, plus I know a few things about what it takes to write a good handbook - this would be my first contribution to the Turing Way, and I would be happy to spend a bit of time to brainstorm, research and collaborate on this with you folks here;

Updates

  • Write chapter outline
  • Add material to the chapter
  • Combine materials into a readable chapter
  • Proofread
  • Request reviews
  • Address reviews
  • Merge to main branch.
  • Add to Wikipedia
@welcome

welcome bot commented Apr 28, 2022

🎉 Welcome to The Turing Way! 🎉 We're really excited to have your input into the project! 💖

If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct. If you need to connect more synchronously with members of The Turing Way community, please feel free to chat with us in our Slack workspace, or you can join our Collaboration Cafe for mentored contribution or co-working.

@loleg
Contributor Author

loleg commented Apr 29, 2022

Summary section in the test deployment looks good.

@JimCircadian

I'm definitely interested in contributing to this section. Having chatted with @aleesteele, it seems wise to incorporate some of the feedback on the open infrastructure topic that came out of a Collaborative Ideas session group at the SSI Collaborations Workshop 2022. There was a great discussion with Sarah Gibson from 2i2c, Arielle Bennett from ATI, Morane Gruenpeter from Software Heritage, Mohamed Selim from EMBL-EBI and Patricia Herterich from the University of Edinburgh about the context for an open infrastructure initiative, which led us to surmise that this would be a good chapter for the Turing Way. Obviously, @loleg, you've had the same idea, but hopefully these notes help set some context about what "could" be achieved!!! 😄

Context
Governance, infrastructure assets and hosted assets (software, data, information) across all domains need a low barrier to development for research projects. Organisations exist to take this pain away, but teams might not always be able to employ these skills themselves and are unsure which options are most suitable.

Problem
Better efforts are needed to ensure there is easily accessible guidance for teams choosing open infrastructure solutions to underpin their research, and that this guidance is not onerous for multidisciplinary researchers and engineers to adopt.

Solution
Create a decision-making framework to help PIs choose which open infrastructure to use, weighing its pros and cons against commercially available solutions/infrastructure.

This was all derived from these initial bullet points, which came out of a general discussion about the ideas behind, and challenges of, open infrastructure:

  • Running open infrastructure requires skills that are really hard to learn and most PIs (senior people) do not understand/care about it
  • Avoiding vendor lock in
  • People don’t mind paying for this (but do they care if it’s open?)
  • Information management/sharing of assets is another level of infrastructure to consider, as much about people as systems
  • Governance is also to be considered, so processes around this need to be transparent - community involvement/caring about people’s involvement
  • Building on legacy work - creates technical and policy issues
  • Guidance
  • Decision making framework over prescriptive solutions

This is very much a revised braindump from the session, but hopefully it's helpful to furthering this!

@aleesteele
Member

aleesteele commented May 10, 2022

Hi @loleg! Welcome to The Turing Way. It's great to have you here representing the Frictionless Data and OKF communit(ies). 😄

And @JimCircadian, I can't believe that less than 1 hour after @sgibson91 flagged your post on the Software Sustainability Institute's Collaborations Workshop 22 (CW22) Slack channel, you responded to this issue with notes! Thanks so much. (@loleg - you should check out the event next year!)

I'm adding a couple of historical & contextual references for this Open Infrastructure chapter within & around The Turing Way, to document the conversations that have surrounded open infrastructure (which were definitely happening long before I started as CM). Keeping this context in mind is important for establishing continuity with these existing threads, and for enabling recognition of all work (both visible and invisible)!

With that being said: there are going to be a lot of GitHub tags here – please let me know if I'm missing anything or anyone! 😄 These notes may also be missing things - please feel free to add comments, corrections, reactions.

Documenting previous conversations & possibly interested parties:

  1. Invest in Open Infrastructure (IOI) (founded in 2018) has been integral in defining and bringing attention to open infrastructure. Over the past few weeks, @malvikasharan and I have had conversations with @emmyft to learn more about their work, and @emmyft has flagged this discussion about the open infrastructure chapter with the wider IOI team.

It's important to note that @KirstieJane (co-lead of TTW) is a long-time supporter of the movement, having recently hosted the Executive Director at a TPS event in May 2021. See this Slack post.

It's also important to note that IOI is fiscally sponsored by Code for Science & Society, a broad initiative that supports community-centered practices around technology infrastructure.

  2. Within TTW, @sgibson91 (who is a core contributor to The Turing Way and a member of the 2i2c team) has informed documentation about open infrastructure during initial community research: Community Research: Learning About The Turing Way #2318. We've also flagged Chris Holdgraf (@choldgraf) about the possibility of developing this chapter in collaboration with the 2i2c/Jupyter team, as their work hosting interactive computing infrastructure for research and education, and supporting open source projects, is integral. The Turing Way itself is built with Jupyter Book.

  3. @pherterich (who was a founding member and key contributor of TTW) was recently a participant at CW22 (4-7 April 2022), and collaborated with @JimCircadian and others (cited above) to think through what an open research infrastructure contribution could look like within The Turing Way, and a decision-making framework for implementing open infrastructure in a research environment. Really important work, which sets a foundation for this chapter.

  4. @Arielle-Bennett was integral in writing the "Research Infrastructure Roles" chapter (see [WIP] Research infrastructure roles chapter #1924 / Research Infrastructure Roles Chapter + Case Studies #2160 / Additional roles and case studies for the Research Infrastructure Roles chapter #2209). During one recent Fireside Chat, "Emergent Roles in Research Infrastructure & Technology" (see YouTube recording here), TTW community members and others discussed how much these roles operate as another kind of infrastructure, the "human infrastructure" behind research itself. It raises the question of how a "research infrastructure" chapter can link back to developing and supporting these emergent roles.

  5. With this in mind, the Frictionless Data project, hosted by the Open Knowledge Foundation (one of the oldest organisations in the open data ecosystem), also introduces another element, 'open data infrastructure', perhaps embedded within the wider notion of open infrastructure. During the community meeting with the Frictionless community a few weeks ago, many were enthusiastic about getting involved, as this issue by @loleg shows! It's exciting to see these connections forming.

In terms of next steps: we'd love to facilitate cross-organisational collaboration to build this chapter together, and to use it as an opportunity to think about what collaborative governance we can build at The Turing Way to enable it. This means that the timeline for working on this project may take a while, and may require a bit of experimentation and making mistakes, but the result will ultimately be more representative of the expertise and perspectives of the community at large.

We'll be updating this thread and the Slack for more information about starting the first steps of this collaboration: whether that is an initial cross-org meeting, brainstorming chapter content, information architecture, etc.

@aleesteele
Member

aleesteele commented May 10, 2022

Also breaking down @loleg & @JimCircadian's points from above into general questions for all involved:

  • Shared Definitions: Should we be more specific about "open research infrastructure" as opposed to "open infrastructure" more broadly, given the researcher-focused audience of The Turing Way?

[Screenshot (2022-05-10): IOI definition of open infrastructure]

^ Adding a screenshot from the IOI team's work [here](https://investinopen.org/about/) for reference, in terms of definitions. **Perhaps a fireside chat could be a first step in hosting an initial conversation about this topic, more casually and in a publicly-engaged way?**

Note: In hosting an event like this, it'd also be good to think through how this does/does not overlap with conversations already had at Code for Science & Society "Building Laterally" events, the IOI Community Calls/Events, Frictionless community calls, or other spaces.

Suggested topics from above (Am I synthesizing this okay? What is missing?):

  • Cloud computing vs infrastructure-as-a-service with under-the-desk hardware
  • Risks of bringing your own device to reproducibility
  • Challenges there are to open infrastructure (at institutional/interpersonal/technical/legal)
  • Tips about negotiating and partnering with the IT departments of institute or institution
  • How to properly evaluate and document the infrastructure of a scientific effort.
  • Decision-making framework for implementation (vs prescriptivist solution)

It's also worth thinking through who the audience of this chapter would be: primarily researchers (at various stages of their careers) looking for best practices and guidance in their own work, and other people who work with data. What are their priorities? What would they need help with?

@JimCircadian

@aleesteele here's my two pence worth to follow on from your breakdown. Sorry it's a bit lengthy and a bit of a brain dump! There's a lot of food for thought, so don't see these as any more than some personal observations.

My summary feeling is that starting a draft section (as @loleg has in their list) with some outline/headings/bullets would be quite beneficial, whilst engaging people for review as we draft sections up. My suspicion is that we could overextend the initial discussion without producing material, and it sounds like there are plenty of people who could contribute to, review or edit a draft rather easily. More than happy to start that process under a PR. It will also mean that the issue doesn't get out of control, which might inadvertently put people off contributing to the draft/discussion generally.

More than happy to raise that PR with a new section and link it to this issue.

The points you raise are very important, so here's some specific thoughts as feedback:

  • I think a shared definition of "open infrastructure" is a good idea and would be a brilliant first segment to draft, with the IOI team definition being a good starting point. I don't think there is a particular benefit to being any more specific, as the coupling of the intended audience (researchers) with the subject matter (open infrastructure) implies the direction of the section, and one might (erroneously) argue that "open research infrastructure" and "open infrastructure" should not be treated differently without a specific reason to do so (plus, the purpose of infrastructure doesn't necessarily change the implementation or guidance offered...)
  • "Cloud computing vs infrastructure-as-a-service with under-the-desk hardware": I would rephrase this as "types of open infrastructure". This could apply to hybrid, on-prem and cloud as you suggest, but we may also differentiate between types of infrastructure by purpose as much as situation (storage, compute, remote sensing, data, support...) Defining the types in terms of usage/scale/intention might be better, but this would be great to refine as a group on some drafts.
  • "Risks of bringing your own device to reproducibility" - this is great: why do we care, and specifically why do we care that infrastructure is open!? (That's rhetorical!!! 😆 ) Maybe replace "device" with "infrastructure", but we have to justify the relative pros/cons to developing, maintaining and supporting infrastructure of all kinds, so this is definitely a good shout to my mind...
  • "Challenges there are to open infrastructure (at institutional/interpersonal/technical/legal)" - awesome. I feel the effort required for infrastructure is often overlooked, especially when things move from research to production! This is a great place to situate these kinds of things and follows on nicely from the previous topic.
  • "Tips about negotiating and partnering with the IT departments of institute or institution" - can maybe be more easily defined as "Considerations for developing infrastructure". This can then wrap up the "why are we doing this" as well as incorporate other guidance of how to plan an implementation.
  • "How to properly evaluate and document the infrastructure of a scientific effort." - awesome.
  • "Decision-making framework for implementation (vs prescriptivist solution)" - awesome. 😄

I would almost trim these down into (thinking of the nav panel on the right hand side, but appreciate you might not have been thinking of actual headers when you wrote those 👍🏼 ):

  • Types of infrastructure
  • Why make infrastructure open?
  • Challenges
  • Establishing relationships
  • Planning your project
  • Implementation
  • Support

I've split the last topic into implementation and support as the latter, in my opinion, is often the most overlooked aspect of implementing infrastructure and really sets the tone for how any infrastructure will be received by those either responsible for it, or who might inadvertently become responsible for it! 😉

Regarding the last paragraph in your comment, I believe this might be best aimed at those leading any kind of project implementation. I think we discussed aiming this at the PI-type level at CW22, but I do work with researchers who would benefit, so general guidance for any researcher thinking of deploying infrastructure makes sense. Again, this should come out in the wash with reviewing/editing the chapter and ultimately fit in line with the rest of the book!

Hopefully this is helpful feedback. As I say, I can in the next week or so draft a quick version of this unless anyone else is keen, then we can get some proper discussion on the go..!

@Arielle-Bennett
Collaborator

Arielle-Bennett commented May 19, 2022

Going to drop this report in here as a resource/reference: Unpacking concepts & definitions – digital public Infrastructure, building blocks, and their relation to digital public goods - Digital Public Goods Alliance

@JimCircadian JimCircadian linked a pull request May 20, 2022 that will close this issue
19 tasks
@JimCircadian

Following some discussion in the Collaboration Cafe (thank you @sgibson91, @Arielle-Bennett, Zeynep, and CC'ing @flordandrea), there's definitely a lot to think about but (if I'm not mistaken) a fair recognition that it's about starting to get some content. We ran through the headings and commentary coming out of the issue at a high level, which generated some really pertinent discussion points that can guide drafting of content.

  • Implementing or choosing: will you build or pick (this could be a lead-in to the chapter)? Do we need to distinguish between the two?
  • Ensure we remember that infrastructure can include humans
  • Ensure that scope is defined for terms in relation to content that follows, and exclusions that might be implied as such
  • Ensuring trustworthiness, transparency and ethics are incorporated into any descriptions of infrastructure or elements of it and links to the relevant sections are in place
  • Ensure that infrastructure will meet the FAIR principles.

Types of infrastructure:

  • Considering software and hardware might not be entirely distinguishable terms, infrastructure might incorporate function, form or organisational substructures involving many kinds of entities
  • Human infrastructure - ref. research infrastructure roles. Ensure we remember that humans involved in or integral to infrastructure are humans.
  • Acknowledge that this is a big term; all items will be defined when discussing types.

We could be multi-paging this.

Steps forward

  • James will take a stab at one-pagering the headings to
  • Disseminate to collaborators for review and dissolve these headings into further pages as appropriate
  • Assign/collaborate ownership of sections to people

This all seems to largely fit with the original plan proposed but we've some awesome guidance here to get something on "paper" quickly and set the discussion going...

@flor14
Collaborator

flor14 commented Jun 4, 2022

Hi @JimCircadian! Thank you very much for adding me here. My real GitHub account is @flor14 (I created @flordandrea to teach GitHub in one course last week 😆 ).
I've been reading all the comments. Starting by thinking about a possible division of sections seems a good idea. Let me know if I can help with something before the next meeting.

@aleesteele aleesteele added the idea-for-discussion This can be used for inviting discussion from collaborators or community in general label Jun 24, 2022
@Arielle-Bennett Arielle-Bennett linked a pull request Jul 27, 2022 that will close this issue
19 tasks
@Arielle-Bennett
Collaborator

Hi all, it's been a while since we chatted about this so I wanted to see if folks would be interested in joining the Collab Cafe on 17 August to get down to some co-working?

@JimCircadian

Hi @Arielle-Bennett sounds like a plan, I'll make sure I make some progress before then on the headings. I've been crazy busy the last month or so, but this will make a nice deadline to have some progress ready for. 😃

@Arielle-Bennett
Collaborator

Oh no we totally dropped this 😩 @JimCircadian @flor14 & others are we still interested in getting this down? Would this make sense to aim for at the Nov Book Dash?

Details here for anyone who is unfamiliar: https://docs.google.com/forms/d/1t_yau8Grr9iKLVf1E9bPTHcStabqZIMnwOkofvXNB8U/prefill

@JimCircadian

Yes @Arielle-Bennett it's still on my list of things to start doing. If there's a book dash in November keen to attend to ensure some progress is made if I don't get any more done before it! 😉 Will request access to that doc.

@aleesteele
Member

Hi @JimCircadian and @loleg - I know it's been quite a while, but I wanted to ask you both if you would be interested in joining our upcoming Book Dash in June 2024 to work on this new chapter! Even if it's just a starting point, having a chapter related to open infrastructure would be a great way to get other folks involved in the topic.

I know applications for first-timers can seem daunting! So if you'd like to join us at the upcoming Collaboration Cafe, we'll be having a Book Dash Q&A on Wednesday, 17 April if you have any questions about the application process and/or the week-long event: https://annuel2.framapad.org/p/ttw-collaboration-cafe.

Applications close in just a few weeks (26 April), and we would absolutely love to see you there to work on these materials together! Feel free to reach out on here or on Slack with any questions you might have 😄

Here's the application form: https://docs.google.com/forms/d/e/1FAIpQLSdd7Zy6YUxPRpTmvd3yrtE9w7JCb9tA20NVQ-PmtGPsaRsqww/viewform

There's more information linked on these threads:

@JimCircadian

Hi @JimCircadian ....

Quite keen, but as yet unsure about my availability that week! I did do some work on drafting this but realised that it needs an event like a Book Dash to make the initial inroads, as there's a lot of opinion about what this should cover. Happy to try and get involved 👍🏼


5 participants