Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorient toward a general purpose DHT foundation library #84

Open
iand opened this issue Aug 9, 2023 · 8 comments
Open

Reorient toward a general purpose DHT foundation library #84

iand opened this issue Aug 9, 2023 · 8 comments
Labels
kind/proposal Proposals for changes in design, code style or direction

Comments

@iand
Copy link
Contributor

iand commented Aug 9, 2023

This issue arises from some recent discussions with Juan about the scope and direction of go-kademlia and go-libp2p-kad-dht.

Currently, our focus is on improving our Kademlia implementation in order to implement protocol enhancements. But there's value in creating a space for experimenting with different DHT concepts. Focusing on a platform that allows experimentation with various DHT properties and algorithms might better serve us in the long run. Our goal should be to address today's needs while also preparing for future challenges.

Our discussions often revolve around the technicalities of implementation but many in our community are keen on seeing real-world benefits. They're interested in mixing different membership styles, the idea of multi-level DHTs, and the ability to easily introduce varied methods and record types. As we think about tweaks to the IPFS DHT, it's crucial to align with community feedback. Their input should guide our path forward.

Instead of honing in on just one DHT improvement, we should consider building a foundation for broader DHT experimentation, allowing us to test, iterate, and optimise a variety of DHT structures and algorithms. A versatile DHT platform needs a modular design which should clearly define and allow modifications to key attributes such as distance metrics, routing algorithms, and communication protocols, enabling integration of new features or changes.

However, Kademlia still remains central to IPFS for the near term. We need to dedicate resources to its refinement, ensuring that its implementation is as robust and efficient as possible, while we determine how best to make room for other designs or their features in a new DHT foundation.

@iand iand added the kind/proposal Proposals for changes in design, code style or direction label Aug 9, 2023
@github-project-automation github-project-automation bot moved this to Unplanned in DHT Optimization Aug 9, 2023
@guillaumemichel
Copy link
Contributor

I totally agree that this repo should reorient toward a general tooling library helping building any kind of DHT.

After thinking about it, I think that we aren't far from it.

If someone wanted to implement Chord, they would just have to build the Chord routing table, and query mechanism. They could already use the other components (e.g scheduler, endpoint, key, sim etc.). If someone wants to build a DSHT, R5N or custom kademlia-geometry based DHT, it would be the same process: build the appropriate routing table and query mechanism, all the other components can be reused.

Most other DHT geometries, such as the ones in use in Chord, Pastry, Tapestry etc. also use binary keys as identifiers. Some newer DHT use exotic geometries, that we couldn't implement at the moment using this repo. We could make the DHT identifier Id (currently Key type) even more generic (basically any), and Key would implement Id. This would allow to support basically every geometry.

Maybe we can add a package to compose DHTs, but it may not be required as combining routing table memberships should happen in the routing table implementation.

I think the first important move we need to take, is to rename the repo to remove Kademlia from it, because we can build any kind of DHT, and the current name is misleading. After the repo is renamed, we need to make sure that the routing table and query implementations that we already have contain a kad in their package name, to indicate that they are specific to kademlia.

Potential names for the repo could be:

  • (go-)DHTools
  • (go-)libdht
  • (go-)dht-forge
  • (go-)dht-crafter
  • (go-)dht-kit
  • (go-)dht-builder

@yiannisbot
Copy link
Member

yiannisbot commented Aug 22, 2023

The ProbeLab team had a meeting today to discuss the outcome of the discussion with @jbenet and identify next steps.

The TL;DR is that we think we're not far away at all from the vision that @jbenet laid out (as @guillaumemichel points out above), although the devil is in the details. It also seems that the refactoring work (as well as the Reprovide Sweep) are not affected by the discussion, so this line of work continues as scheduled.

We gathered the following Action Items (in this order):

  • ProbeLab team gathers the main points and questions from discussion with Juan and writes down some answers.
    • ETA: 2023-08-25
  • @BigLep schedules an additional call with Juan (ideally for 1hr), where all of ProbeLab can be present, so that we can discuss on the main points.
    • The outcome of the meeting should be aligned expectations between ProbeLab, Steve and Juan on what we're building, what we're dropping and rough timelines.
    • ETA: 2023-08-31 (@BigLep)
  • Identify what the blogpost should look like. At this point we’re not sure what the storyline should be. One idea is to focus the blogpost on the refactoring work only, without talking about the composable DHT. But if it turns out we're aligned after the extra discussion, then we could go with what we have.
    • @BigLep let us know thoughts
    • ETA: TBD (after expectation alignment with Juan and Steve)
  • We need to identify what the extra effort is to make go-kademlia into something more generic, as per Juan's request and @guillaumemichel's note above. For instance, in order to check if other routing algorithms (e.g., Chord) are compatible, we’d need to implement and test at least one of them, so that we know it will work for others. This is a sizeable extra item, which could add several weeks not accounted for right now.
    • ETA: TBD (after expectation alignment with Juan and Steve)

@guillaumemichel
Copy link
Contributor

Thinking about it again.

The Routing Table interface is generic enough to be used with any geometry. We would just need to replace the Key dependency with a generic Point dependency (allowing other formats than just bitstring). Also the generic Query mechanism only using the RoutingTable interface works for all geometries and routing table implementations.

https://github.com/plprobelab/go-kademlia/blob/44238be8a1f611e0415e7b4c03cc04f5e833cb07/kad/kad.go#L41-L71

  • Self will always return the Point representing the location of the current node inside the geometry.
  • AddNode is required for one need to be able to add nodes to a routing table. The routing table implementation then derives the associated Point from the provided Node.
  • RemovePoint / RemoveNode is also required, because whatever geometry is being use, one need to be able to remove peers from a routing table. Nodes are indexed by their associated Point in the routing table, hence providing the Node isn't required, providing the point is enough to find and remove a Node from the routing table. RemoveNode is also generic, and requires the routing table to compute the associated Point given the provided Node.
  • NearestNodes is the most interesting, and geometry depending method. However, it can be used by all DHT implementations, because a key requirement of a DHT is to use a geometry along with a distance definition within this geometry. Without distance metric, it isn't possible to converge efficiently towards the target Point. Hence, whatever the geometry, it will always be possible to sort Points from the closest to the furthest from a reference Point in any geometry (taking the minimal distance between two Points for instance. Note that it is possible that many Points are equidistant to a reference Point. So the NearestNodes method is always required, and will always be usable by a generic Query/Lookup mechanism, that is querying the _Nearest known peers, and then the Nearest nodes returned by queried remote nodes until it converges to the desired Point.

So it is possible to have a generic query implementation, only making use of the methods defined by the RoutingTable interface, and converging any Point within the geometry used by any custom routing table implementation. Note however that the generic query may not be the most optimized query mechanism to converge to a Point in a specific geometry. A Routing Table based on a custom geometry can expose additional methods, that could be used by a custom Query mechanism optimized for the routing table.

This generic RoutingTable interface doesn't prevent routing table implementers from implementing specific methods optimized for their use.

@BigLep
Copy link

BigLep commented Aug 24, 2023

I'm just ACKing that I saw the pings here and have read the above and the content being generated in https://www.notion.so/pl-strflt/2023-08-07-IPFS-Tech-Sync-With-Juan-4061193911544b9fa864a663f8f1ee84?pvs=4#4bda867c13074a029a39150b6539757a so far.

I'll respond more tomorrow with a fresh head, but a few quick things:

  1. We have 30 minutes with Juan on 2023-08-31. I'll see if we can expand it.
  2. Agreed we should rename the package if it's not kademlia specific. Is there a frontrunner new name? I like go-libdht or go-dht-forge. But yeah, lets get a new name to avoid confusion.
  3. I don't want to preclude future options, but I also don't want to get caught doing too much abstraction. I don't know where to draw the line. The focus of doing the refactoring so we can have really solid test coverage for the reprovide sweep work seems reasonable.
  4. My guess is that scoping the public comms down to committing us to the refactor work and "amino" naming, while hinting at things we'd like to do down the line like composable DHT and reader privacy makes sense. I think to move things forward @yiannisbot , you could put together an outline for what you want to cover with Juan and bonus (but lower priority) an updated blog entry. That could be something quick to review that you and I could align on.
  5. Generally statement to hold me accountable: I want to make sure I'm not getting into too much of the weeds here not to become a blocker because I know I won't be able to devote sustained attention here right now in this season.

Thanks a lot for all your work here guys!

@BigLep
Copy link

BigLep commented Aug 24, 2023

With fresh head, I still agree with my comments in #84 (comment)

One other thing:

We need to identify what the extra effort is to make go-kademlia into something more generic, as per Juan's request and @guillaumemichel's note above. For instance, in order to check if other routing algorithms (e.g., Chord) are compatible, we’d need to implement and test at least one of them, so that we know it will work for others. This is a sizeable extra item, which could add several weeks not accounted for right now.

I generally want to avoid weeks of extra work right now that prevent us from providing user value (like the reprovide sweep).

Are the options:

  1. Don't be as generic (and thus not incur extra time)
  2. Be generic, but the team only feels good about being generic if they implement a second use case (which incurs weeks of work)?
    I assume we don't feel comfortable being generic but not validating it with a second use case because the worry is that we are adding unnecessary abstraction. Is that right? That is understandable.
    If we don't go generic now, do we feel like that is something we can come do later easily enough?

@yiannisbot
Copy link
Member

I agree we have to draw the line carefully and not get into too much abstraction, especially given the imminent org restructuring. We should avoid committing to a very generic design and development as this will be a long-term commitment.

  • I'll have the discussion doc ready by Monday 2023-08-28
  • I'll have the public comms doc restructured by Tuesday 2023-08-29
  • @guillaumemichel prepared a draft doc for Composable DHT in this Notion doc. It's not final yet, but he's continuing to work on it and there will be an updated version by Wednesday 2023-08-30.

Generally statement to hold me accountable: I want to make sure I'm not getting into too much of the weeds here not to become a blocker because I know I won't be able to devote sustained attention here right now in this season.

@BigLep I reckon you be present when we make a decision on how we carry forward, as this will influence quite a few things down the line.

@yiannisbot
Copy link
Member

I'll think more about it and put in the doc that I'll prepare for the discussion, but some first thoughts to address the tradeoff between realism and usefulness to the ecosystem is to:

  • Finalise Refactoring of the existing codebase now, so that we and the ecosystem has a cleaner ground to work on.
  • Finalise the spec and implement Reprovide Sweep, so that we can start seeing return on our investment (bigger content providers start using the DHT).
  • Finalise Composable DHT doc and spec and market it to the ecosystem to gauge interest.
  • Proceed with development when we have assurance that interest is there and perhaps collaborative or funded development by other teams.

@BigLep
Copy link

BigLep commented Aug 25, 2023

Thanks @yiannisbot and @guillaumemichel for the updates and docs. Great work!

@BigLep I reckon you be present when we make a decision on how we carry forward, as this will influence quite a few things down the line.

Yes, I'll be present for decision making and make sure I'm aware of the decision and the implications.

The timeline and suggested actions makes sense to me. For Reprovide Sweep that we are anchoring our refactor work to and proposing to complete to get ROI, it would be great if we can also provide user anecdotes are will readily adopt this feature when it's available and ideally are excited to see it be released.

Good stuff - thanks all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/proposal Proposals for changes in design, code style or direction
Projects
Status: Unplanned
Development

No branches or pull requests

4 participants