Rfm 17.1 - Sharing Provider Records with Multiaddress #22
Great work! Some typos and a couple of clarification comments.
RFMs.md
Outdated
#### Measurement Plan

- Spin up a node that generates random CIDs and publishes provider records.
- Periodically attempt to fetch the PR from the DHT, tracking whether they are retrievable and whether they are shared among the multiaddresses.
What do we mean "shared among the multiaddresses"? Whether the PR can be found in the multiaddress of the original node that stored the record?
As a third "Measurement Plan", how about just getting a bunch of PeerIDs and their multiaddresses and pinging them over time to see whether they listen to that multiaddress? The same as we do for PRs, but now for peer records.
> What do we mean "shared among the multiaddresses"? Whether the PR can be found in the multiaddress of the original node that stored the record?
In the networking layer, when we ask for the PR of a CID, the reply is just an `AddrInfo` for each provider that the remote peer is aware of. So the PR, as we understand it, is just how we store it in the DHT.
This `AddrInfo` contains two fields, `PeerID` and `Multiaddresses`, and the `Multiaddresses` field is only filled in if the TTL of the addresses is still valid.
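The behaviour described above can be sketched as follows. This is a minimal model, not the go-libp2p code: the `AddrInfo` dataclass, `build_reply`, the address book layout, and the 1800-second expiry are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AddrInfo:
    # Hypothetical stand-in for the AddrInfo returned in a provider reply.
    peer_id: str
    multiaddrs: list = field(default_factory=list)


def build_reply(providers, addr_book, now):
    """Return one AddrInfo per known provider.

    addr_book maps peer_id -> list of (multiaddr, expires_at) pairs.
    An address is included only while its TTL has not expired, so a
    provider with stale addresses is returned with an empty list.
    """
    reply = []
    for peer_id in providers:
        live = [addr for addr, expires_at in addr_book.get(peer_id, [])
                if expires_at > now]
        reply.append(AddrInfo(peer_id, live))
    return reply


# Illustrative values only: one provider whose address expires at t=1800s.
addr_book = {"peerA": [("/ip4/192.0.2.1/tcp/4001", 1800)]}
fresh = build_reply(["peerA"], addr_book, now=600)
stale = build_reply(["peerA"], addr_book, now=3600)
print(fresh[0].multiaddrs)
print(stale[0].multiaddrs)
```

In the stale case the provider is still reported, but without addresses, which is the situation that forces the requester to look up the `PeerID` separately.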
> how about just getting a bunch of PeerIDs and their multiaddresses and pinging them over time to see whether they listen to that multiaddress?

That can be a nice side experiment, yes, although I think that we are indirectly doing it already. In the hoarder, I keep the `AddrInfo` of each PR Holder with the first `Multiaddresses` that I got from the publication, and I only use those addresses to establish the connections. So if they were changing IPs, I wouldn't be able to connect to them.
Let me know anyways if you want me to set up a specific test for the IP rotation.
Re: "shared among the multiaddresses": ok, so IIUC, we either mean if the PR is available in the multiaddresses of all the content providers, or if all the multiaddresses of all content providers are included in the PR. :) Is it any of these two?
Re: IP Rotation: that's great! But for this experiment we're keeping the connection open for 30mins to check the TTL, right? Can we run an experiment where we keep those connections open for a time period equal to the Expiry Interval? It would be 24hrs according to the current setting and 48hrs according to our proposal. Ideally, we'd also need to do that for a large number of peers.
> we either mean if the PR is available in the multiaddresses of all the content providers, or if all the multiaddresses of all content providers are included in the PR. :) Is it any of these two?

It is the second one: we only get an `AddrInfo` for those providers that the remote peer is aware of, and whether the `Multiaddress` is included in the provider's `AddrInfo` depends on its TTL. Should I phrase it the other way around, "the multiaddress is shared among the PRs"?
Let me point you to the code; maybe it is easier to understand that way:
- here is the inner method used during the `dht.FindProviders()` method.
- here is the networking method to ask a remote peer for the PRs.

> Can we run an experiment where we keep those connections open for a time period equal to the Expiry Interval?

Absolutely! I can make a new run with 10k-20k CIDs over 60h if that is enough.
Rephrasing the first point to make it clearer. Regarding the extra experiment: that would be fantastic, yes!

Results are similar when we analyze the replies of the peers that report back the PR from the DHT lookup process. We increased the number of content providers we were looking for to track the multiple remote peers. Figure [3] represents the number of remote peers reporting the PR for the CIDs we were looking for, where we can see a stable 20 peers by median over the entire study.

For those wondering why more than 20 peers (k replication value when publishing the CIDs) are reporting the PR, we must remind you that Hydra-Boosters share the PR database among different `PeerID` heads. Which means that if one hydra hears about a PR, all the leaders of that common database will also share it.
If that were the case, then we'd see about 2k holders, i.e., approximately the same as the number of Hydra heads in the network. Could it instead be other peers that fetch and reprovide the content?
Suggested change (original line, then proposed line):
For those wondering why more than 20 peers (k replication value when publishing the CIDs) are reporting the PR, we must remind you that Hydra-Boosters share the PR database among different `PeerID` heads. Which means that if one hydra hears about a PR, all the leaders of that common database will also share it.
For those wondering why more than 20 peers (k replication value when publishing the CIDs) are reporting the PR, we must remind you that Hydra-Boosters share the PR database among different `PeerID` heads. Which means that if one hydra hears about a PR, all the heads of that common database will also share it.
In the hoarder, I always check that the PRs that are shared back match the `PeerID` of the content publisher (which, in this case, is my local host 1, the publisher). So if someone else tries to reprovide or ping the CID, it wouldn't affect these results.
About the hydras, I'm not aware of how many hydra "bellies" are out there. Is there a single big one or multiple small ones? Also, we have to keep in mind that the DHT lookup converges into a region of the SHA256 hash space, so it's quite unlikely that we would get connections and replies from hydras that are in the opposite part of the hash space.
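The publisher check mentioned at the start of this comment could look roughly like the sketch below. The function name, the reply layout (a mapping from remote peer to the provider PeerIDs it returned), and the peer names are hypothetical and are not taken from the hoarder code.

```python
def responders_with_publisher(replies, publisher_id):
    """Return the remote peers whose reply contains the publisher's PeerID.

    replies maps each remote peer to the list of provider PeerIDs it
    returned for the CID. Replies that only name other providers (for
    example, someone reproviding the CID) are not counted.
    """
    return sorted(remote for remote, providers in replies.items()
                  if publisher_id in providers)


# Illustrative data: only two of the three remote peers report our publisher.
replies = {
    "remote1": ["publisher"],
    "remote2": ["someone-else"],
    "remote3": ["someone-else", "publisher"],
}
counted = responders_with_publisher(replies, "publisher")
print(counted)
```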
> Is there a single big one or multiple small ones?
Yup, there is a single one shared among all of them.
@cortze I've just done a thorough review of this - great work! My main worry is that the claim of: "if a
The main argument in order to increase the multiaddress TTL to the PR expiry interval would be to show that the multiaddress of the PR holder doesn't usually change. It would be great to have some experiments along the lines of the comment I inserted above: #22 (comment)
I'd love to hear your thoughts on this. Basically, similar to the CID Hoarder, what we need here is a PeerID Hoarder :-D This tool would get a lot of PeerIDs, record the multiaddress by which we first saw the peer and then periodically ping the peer to figure out if it changed its Multiaddress within the PR Expiry Interval. I'm not sure if this functionality can easily be included in Nebula @dennis-tra?
This is what would give us a solid justification to argue for the extension of the TTL. Other thoughts?
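The analysis step of such a "PeerID Hoarder" could be sketched as below. The data layout (per-peer lists of timestamped observations), the function name, and the choice to treat an unreachable peer as "no change observed" are assumptions for illustration; this is not how Nebula or the CID Hoarder store their data.

```python
def address_change_rate(observations, expiry_interval_s):
    """Fraction of peers seen on a different multiaddress within the interval.

    observations maps peer_id -> list of (timestamp_s, multiaddr) samples
    sorted by time; the first sample is the first sighting. A multiaddr of
    None means the ping failed and is not counted as an address change.
    """
    tracked = 0
    changed = 0
    for samples in observations.values():
        if not samples:
            continue
        first_seen, first_addr = samples[0]
        tracked += 1
        for timestamp, addr in samples[1:]:
            within = timestamp - first_seen <= expiry_interval_s
            if within and addr is not None and addr != first_addr:
                changed += 1
                break
    return changed / tracked if tracked else 0.0


# Illustrative samples: p1 moves after 2h, p2 moves only after about 27.8h.
observations = {
    "p1": [(0, "/ip4/192.0.2.1/tcp/4001"), (7200, "/ip4/192.0.2.9/tcp/4001")],
    "p2": [(0, "/ip4/198.51.100.1/tcp/4001"), (100000, "/ip4/198.51.100.7/tcp/4001")],
}
rate_24h = address_change_rate(observations, 24 * 3600)
rate_48h = address_change_rate(observations, 48 * 3600)
print(rate_24h, rate_48h)
```

Evaluating the same samples against both the 24-hour and the 48-hour interval mirrors the current and the proposed Expiry Interval discussed earlier in the thread.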
Typos and rephrasings Co-authored-by: Yiannis Psaras <[email protected]>
Thanks for the feedback @yiannisbot , I really appreciate it!
I will try to make it a bit more explicit in the conclusion (my bad). It's not an "it won't hold" statement; it is an "it won't have as much impact as we are expecting" statement. As long as the network has different TTL values for Multiaddresses (as in the current network), the smallest TTL will be the one that limits the final result of the DHT lookup process (at least the
I left you a comment as well in the #22 comment. I'll go over your comments and suggestions again and will ping you back whenever I make a commit!
Sorry for the late reply! The information is already recorded by Nebula and would just need to be analyzed :)
Just ping here or in Discord and I'll also have a proper read. I just skimmed it in the past 🙈
I already added some explanations and most of the changes that @yiannisbot suggested. I set up another Hoarder run with 20k CIDs for 60 hours, so the plots and some numbers might change. If you can go through it and give me some thoughts @dennis-tra, I would appreciate your feedback as well 😄
Minor edit to address one of my previous comments.
Great that the Hoarder contacts the original Multiaddress! That's what we need. So if we run the experiment for long enough and monitor that, then we have what we're looking for. This ^ together with an analysis of logs from Nebula will tell us the rate at which PR Holders switch IP addresses over the republish interval. I think with those two, this will be complete and ready for merging.
Co-authored-by: Yiannis Psaras <[email protected]>

_Figure 2: Number of PR Holders replying with the `PeerID` + `Multiaddress` combo._

### 4.2-Reply of peers reporting the PR during the DHT lookup
I am not sure I understood this correctly: is the only difference between 4.1 and 4.2 that Hydras appear in 4.2 but not in 4.1?
Since hydras are present in the set of PR holders, they appear in both 4.1 and 4.2.
However, since the DHT lookup wasn't stopped after the first retrieval of the PRs, I assume that most of the peers that report the PRs beyond those initial PR Holders are Hydras (because of their shared PR database).
So what exactly is the performed operation? Is it only a `FindProviders`? And it may get more than 20 peers responding with the PR, because some peers on the path to the CID would be Hydra nodes?
As the number of hops in a DHT lookup is usually 3-5, we would expect at MOST 23-25 peers responding with a PR, if all of the peers helping to route the request (NOT PR holders) are Hydra nodes. According to the plot in 4.2, there are regularly many more than this number. How do you explain this?
Or maybe I missed something here ^^
> So what exactly is the perform operation? Is it only a FindProviders?

Yes, it's a modification of the `FindProviders()` method that doesn't look in the local Provider DB of the host and directly performs the DHT lookup.

> And it may get more than 20 peers responding with the PR, because some peers on the path to the CID would be Hydra nodes?

Exactly, that is the explanation that I gave for this phenomenon.

> As the number of hops in a DHT lookup is usually 3-5, we would expect at MOST 23-25 peers responding with a PR

Can you give a bit more context on this statement? My understanding from RFM 17 is that we perform between 3 and 6 hops; however, that only determines the depth of the peer tree that is built during the lookup. We are not taking into account that the tree can also grow in width.
> Can you give a bit more context on this statement? My understanding from RFM 17 is that we perform between 3 and 6 hops, however, that only determines the depth of the peer tree that is built during the lookup. We are not taking into account that the tree can also grow in width.
In Figure 3, we see that up to 60 peers respond with the PR during the DHT lookup. There are only 20 PR holders, and 2-5 intermediary DHT server nodes to which we send the request (2-5 as the last hop is a PR holder). How can we get responses from 60 peers?
In the case where we would expect the most answers, we would have the 20 PR holders + 5 intermediary nodes that are all Hydras, which is far from 60. Even if we add the concurrency factor
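The bound argued in this comment can be written out as a small calculation. The function name is hypothetical, and the sketch deliberately counts a single lookup path only: it ignores the lookup concurrency (the "width" of the tree raised in the previous reply), so it illustrates the reviewer's reasoning rather than a proven limit.

```python
def max_expected_responders(pr_holders, hops):
    """Upper bound on peers returning the PR along a single lookup path.

    The last hop is itself a PR holder, so only hops - 1 peers are pure
    intermediaries. The bound assumes every intermediary is a Hydra head
    that also knows the PR, and it ignores concurrent lookup paths.
    """
    intermediaries = hops - 1
    return pr_holders + intermediaries


# k = 20 PR holders, with lookups taking between 3 and 6 hops.
low = max_expected_responders(20, 3)
high = max_expected_responders(20, 6)
print(low, high)
```

With 20 PR holders this gives between 22 and 25 responders, which is the gap to the roughly 60 responders in Figure 3 that the comment is asking about.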
This is an interesting point worth digging into, but I want to understand a detail:

> However, since the DHT lookup wasn't stopped after the first retrieval of the PRs

@cortze how does the operation of the Hoarder differ from the vanilla version? When it gets a response with a PR, it doesn't stop and keeps looking, but up to which point? And when does it stop?
@yiannisbot The `FindProviders()` that I use in the hoarder differs slightly from the vanilla operation:
It removes the "Find in the ProvidersStore" operation, forcing it to look for the Providers using only the vanilla DHT lookup, and adds some traces to track when we receive a new Provider.
I've relaunched the test with a two-minute timeout for the `FindProviders` operation, and the results seem to be in the range that @guillaumemichel suggests (keep in mind that the Hydras' DB has been unplugged).
The number of remote peers replying with the PR during the DHT lookup (with a 2-minute timeout) looks like this.
@cortze do we have any results from this experiment? I think with these results and addressing Guillaume's question, this should be ready to be merged, right?
Co-authored-by: Yiannis Psaras <[email protected]>
@yiannisbot The results of this run were not as good as I expected. To track such a large set of CIDs, I had to increase the concurrency parameters of the hoarder, and as we spotted in our last meeting (link to the Issue describing the bottleneck), the code is not well prepared to support such a high degree of concurrency. However, I think that even with such a low number of CIDs and a shorter interval between pings (3 minutes), we can conclude that increasing the TTL of the provider's Multiaddress would improve content fetching times. And the impact would be much higher if we merge it with go-libp2p-kad-dht#802. RFM17 already proved that the IP rotation of PR Holders barely happens:
@yiannisbot I've updated the document with your suggestions and with two extra paragraphs describing:
I've also updated the figures. The new ones have the DHT lookup limited to 2 mins, which shows a reasonable number of peers that return the PRs, as pointed out by @guillaumemichel. The new data still shows a lower number of online PR Holders due to a problem storing the records in a part of the network. However, I consider them more than good enough to conclude that increasing the TTL of the Provider's Multiaddress would avoid the second DHT lookup to map the
Let me know what you think about the update :) Cheers!
Great work! 👏🏼 I hope the suggested changes land in production soon. Thanks for making the final touches.
This is the first draft of the report that extends RFM17 to measure whether the Multiaddresses of a content provider are being shared during the retrieval process of a CID.
It includes the study's motivation, the methodology we followed, a discussion of the results we got out of the study, and a conclusion.
All kinds of feedback are appreciated, so please go ahead and point out improvements!
Also, should I be running a more extensive set of CIDs for extended periods?
cc: @yiannisbot @guillaumemichel @dennis-tra