feat: go-libp2p v0.21 (rcmgr auto scaling) #9074

Merged
merged 27 commits into from
Aug 16, 2022

Conversation

Member

@marten-seemann marten-seemann commented Jul 2, 2022

Updates to go-libp2p v0.21.
Updates the rcmgr.
Enables better metrics around the rcmgr.

Before merge:

@marten-seemann marten-seemann requested a review from MarcoPolo July 2, 2022 13:04
})
return result, err

return result, mgr.ViewTransient(func(s network.ResourceScope) error { return getLimit(s) })
case strings.HasPrefix(scope, config.ResourceMgrServiceScopePrefix):
Contributor

Do we have to worry about spans here?

core/node/libp2p/rcmgr.go (outdated, resolved)
core/node/libp2p/rcmgr_defaults.go (resolved)
defaultLimits.SystemBaseLimit.Conns = defaultLimits.SystemBaseLimit.ConnsOutbound + defaultLimits.SystemBaseLimit.ConnsInbound

return defaultLimits
// if cfg.ConnMgr.Type == "basic" {
Contributor

This was weird. I would expect the resource manager to be the ultimate arbiter, not connmgr.

Member Author

We need something analogous though. The connection manager limit needs to be in sync with the resource manager limit.

Contributor

If these two are out of sync, which should be the source of truth?

Member Author

Probably the rcmgr, just because its configuration is so much more complex. If I remember correctly, there were plans to deprecate the connmgr watermark configuration options in go-ipfs.
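
One way to read "the rcmgr is the source of truth" is to cap the connection manager's high watermark at the resource manager's total connection limit. The sketch below only illustrates that policy: `connLimits`, `withTotal`, and `clampHighWater` are hypothetical names, not go-libp2p or go-ipfs APIs, and the numbers are made up.

```go
package main

import "fmt"

// connLimits is a hypothetical stand-in for the system-scope connection
// limits discussed above; it is not a go-libp2p type.
type connLimits struct {
	ConnsInbound  int
	ConnsOutbound int
	Conns         int
}

// withTotal derives the total the same way the reviewed line does:
// Conns = ConnsOutbound + ConnsInbound.
func (c connLimits) withTotal() connLimits {
	c.Conns = c.ConnsOutbound + c.ConnsInbound
	return c
}

// clampHighWater assumes the resource manager wins when the two disagree and
// returns a connection manager high watermark no larger than the rcmgr total.
func clampHighWater(highWater int, c connLimits) int {
	if highWater > c.Conns {
		return c.Conns
	}
	return highWater
}

func main() {
	limits := connLimits{ConnsInbound: 400, ConnsOutbound: 400}.withTotal()
	fmt.Println(clampHighWater(900, limits)) // watermark above the rcmgr total is capped
	fmt.Println(clampHighWater(600, limits)) // watermark below the rcmgr total is kept
}
```

If the high watermark were left above the rcmgr total, the resource manager would start rejecting connections before the connection manager ever began trimming, which is the out-of-sync case raised in this thread.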

libp2p.SetDefaultServiceLimits(testLimiter)
l := rcmgr.DefaultLimits
libp2p.SetDefaultServiceLimits(&l)
limits := l.AutoScale()
Member Author

Note: AutoScale dedicates 1/8 of the system memory to libp2p, which affects how many conns and streams, how much memory, etc. will be allowed. This might be too little for a standalone go-ipfs node, and too much for IPFS Desktop / Brave.

@lidel Do you have any "profile" setting that we can use here?

Member

@lidel lidel Aug 16, 2022

We do have a server profile, which is used for adjusting specific config values as a one-time change against the Kubo config.

I think it is ok to go with AutoScale and revisit profile-specific defaults in a separate PR, as part of the work towards enabling ResourceMgr by default (tracked in #8761).
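
To make the 1/8 figure from this thread concrete, here is a minimal sketch of how a memory-based budget could translate into limits. It assumes limits grow linearly with the memory budget; the `limits` and `scalingLimits` types, the `scale` and `autoScale` functions, and all base and per-GiB numbers are hypothetical and are not rcmgr's actual implementation or defaults.

```go
package main

import "fmt"

const gib = int64(1) << 30

// limits is a hypothetical, simplified limit set (not the rcmgr type).
type limits struct {
	Memory  int64 // bytes
	Conns   int
	Streams int
}

// scalingLimits pairs a base limit with an increase granted per GiB of
// memory budget. The linear model is an assumption made for illustration.
type scalingLimits struct {
	Base   limits
	PerGiB limits
}

// scale computes base + perGiB * budget, working in MiB so that budgets
// smaller than 1 GiB still receive a proportional increase.
func (s scalingLimits) scale(budget int64) limits {
	mib := budget >> 20
	return limits{
		Memory:  s.Base.Memory + s.PerGiB.Memory*mib/1024,
		Conns:   s.Base.Conns + int(int64(s.PerGiB.Conns)*mib/1024),
		Streams: s.Base.Streams + int(int64(s.PerGiB.Streams)*mib/1024),
	}
}

// autoScale dedicates one eighth of total system memory to the budget,
// mirroring the 1/8 figure mentioned in the review comment.
func (s scalingLimits) autoScale(totalMemory int64) limits {
	return s.scale(totalMemory / 8)
}

// defaults holds made-up numbers, chosen only to keep the arithmetic simple.
var defaults = scalingLimits{
	Base:   limits{Memory: 128 << 20, Conns: 128, Streams: 2048},
	PerGiB: limits{Memory: gib, Conns: 128, Streams: 2048},
}

func main() {
	for _, total := range []int64{4 * gib, 16 * gib} {
		l := defaults.autoScale(total)
		fmt.Printf("total=%dGiB conns=%d streams=%d memory=%dMiB\n",
			total/gib, l.Conns, l.Streams, l.Memory>>20)
	}
}
```

With these made-up numbers, a 4 GiB machine gets a 512 MiB budget and a 16 GiB machine gets a 2 GiB budget. A fixed fraction gives different kinds of deployment the same share of memory regardless of their needs, which is the concern behind the question about profile-specific defaults.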

@MarcoPolo
Contributor

There are two failures related to the new base256 emoji multibase: https://app.circleci.com/pipelines/github/ipfs/go-ipfs/7123/workflows/c0a0abf6-7f82-4965-8004-1413388e36fa/jobs/77371?invite=true#step-111-13053

I'm going to assign these bugs to @Jorropo since he added that multibase and the fix isn't super trivial since it involves updating some assumptions we've made in the sharness tests.

@MarcoPolo
Contributor

Rebased on v0.14.0. After tests pass, I'll ask bifrost to deploy this on bank4 for smoke testing.

@MarcoPolo
Contributor

We should merge libp2p/go-libp2p-kad-dht#784 and make a new release first

@MarcoPolo MarcoPolo marked this pull request as ready for review August 11, 2022 20:23
@MarcoPolo MarcoPolo requested a review from aschmahmann August 11, 2022 20:24
@MarcoPolo
Contributor

I randomly picked @aschmahmann as a reviewer here, but feel free to reassign.

@BigLep BigLep mentioned this pull request Aug 12, 2022
@marten-seemann marten-seemann requested a review from lidel as a code owner August 12, 2022 16:46
@MarcoPolo MarcoPolo changed the title WIP update go-libp2p to v0.21, use rcmgr auto scaling Update go-libp2p to v0.21, use rcmgr auto scaling Aug 12, 2022
@MarcoPolo MarcoPolo self-assigned this Aug 12, 2022
@MarcoPolo
Contributor

@lidel This is ready for a review, thanks!

@lidel lidel self-assigned this Aug 16, 2022
@lidel lidel changed the title Update go-libp2p to v0.21, use rcmgr auto scaling feat: go-libp2p v0.21 (rcmgr auto scaling) Aug 16, 2022
Member

@lidel lidel left a comment

Thank you @MarcoPolo, I believe this is ok to be included in 0.15-rc1

ResourceMgr.Enabled defaults to false, which makes rcmgr-related changes in this PR low risk.
Work towards enabling it by default will continue in #8761.

FYSA, small changes I've made:

  • docs for Swarm.ResourceMgr.Allowlist
  • regression test so we can detect when rcmgr metrics disappear
  • backed out goleveldb bump (rationale below)

go.mod (outdated)
github.com/syndtr/goleveldb v1.0.0
github.com/prometheus/common v0.35.0 // indirect
github.com/stretchr/testify v1.8.0
github.com/syndtr/goleveldb v1.0.1-0.20210819022825-2ae1ddf74ef7
Member

⚠️
I am unsure why this was bumped:

Since we have no time to deal with this, I've reverted to v1.0.0 in a42848a just to play it safe.

Contributor

Thanks for the flag, I’m also unsure why this was bumped

@lidel lidel enabled auto-merge August 16, 2022 23:34
@lidel lidel merged commit 00f2a64 into master Aug 16, 2022
@lidel lidel deleted the rcgmr-auto-scale branch August 16, 2022 23:42