In simple terms, how does a BitTorrent client initially discover peers using DHT?

Summary

How can a new client join a swarm without either a tracker or the knowledge of at least one member of the swarm to exchange peers with?

You can't. It is impossible.*

* (Unless a node on your local area network happens to already be a node in the DHT. In this case, you could use a broadcasting mechanism, such as Avahi, to "discover" this peer, and bootstrap from them. But how did they bootstrap themselves? Eventually, you'll hit a situation where you need to connect to the public Internet. And the public Internet is unicast-only, not multicast, so you're stuck with using pre-determined lists of peers.)


References

BitTorrent's DHT is based on a protocol known as Kademlia, which is one concrete design for the theoretical concept of a distributed hash table.


Exposition

With the Kademlia protocol, when you join the network, you go through a bootstrapping procedure, which absolutely requires that you know, in advance, the IP address and port of at least one node already participating in the DHT network. The tracker that you connect to, for instance, may itself be a DHT node. Once you are connected to one DHT node, you ask it for contact information for more nodes, and you then navigate that "graph" structure to reach more and more nodes. Those nodes can point you both to further nodes and to the peers that hold the payload data (chunks of the download).
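To make the "navigate the graph" step concrete, here is a minimal sketch in Python. It is a toy, not how any real client is written: the network is a hypothetical in-memory dict (TOY_NETWORK), ask_for_nodes stands in for a real UDP query, and the node names are made up. It only shows that everything you discover is reachable from the one address you started with.

```python
from collections import deque

# Hypothetical toy network: each node address maps to the addresses it knows.
# Real DHT nodes are queried over UDP; a dict lookup stands in for that here.
TOY_NETWORK = {
    "seed:6881": ["a:6881", "b:6881"],
    "a:6881": ["seed:6881", "c:6881"],
    "b:6881": ["c:6881", "d:6881"],
    "c:6881": ["a:6881"],
    "d:6881": ["b:6881", "e:6881"],
    "e:6881": ["d:6881"],
}


def ask_for_nodes(address):
    """Stand-in for asking one node which other nodes it knows about."""
    return TOY_NETWORK.get(address, [])


def bootstrap(known_addresses, limit=100):
    """Walk outward from the initially known nodes, collecting new ones."""
    discovered = []
    seen = set(known_addresses)
    queue = deque(known_addresses)
    while queue and len(discovered) < limit:
        current = queue.popleft()
        for neighbour in ask_for_nodes(current):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            discovered.append(neighbour)
            queue.append(neighbour)
            if len(discovered) >= limit:
                return discovered
    return discovered


print(bootstrap(["seed:6881"]))  # every other node, found via the one seed
print(bootstrap([]))             # no starting node, so nothing is found
```

Note the second call: with an empty starting list the loop never runs, which is the whole point of this answer.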

I think your actual question in bold -- that of how to join a Kademlia DHT network without knowing any other members -- is based on a false assumption.

The simple answer to your question in bold is, you don't. If you do not know ANY information at all about even one host which might contain DHT metadata, you are stuck; you can't even get started. I mean, sure, you could try to brute-force your way in by scanning the public Internet for an IP with an open port that happens to answer DHT queries. But more likely, your BT client is hard-coded with some specific static IP address or DNS name that resolves to a stable DHT node, which just provides the DHT metadata.
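For a feel of what "talking to the hard-coded node" looks like on the wire, here is a rough sketch that builds the DHT "ping" query described in BEP 5 (DHT messages are bencoded dictionaries sent over UDP). The bencode and build_ping helpers are my own illustration, and the bootstrap address is a hypothetical placeholder, not a real node. Nothing is sent when you run this.

```python
import socket


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts in bencode form."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def build_ping(node_id, transaction_id=b"aa"):
    """Build a DHT ping query carrying our own 20-byte node ID."""
    return bencode({"t": transaction_id, "y": "q", "q": "ping",
                    "a": {"id": node_id}})


# Hypothetical placeholder; a real client ships with its own list.
BOOTSTRAP_ADDRESS = ("dht.example.org", 6881)


def send_ping(node_id, address=BOOTSTRAP_ADDRESS, timeout=2.0):
    """Send the ping over UDP and return the raw reply (not called here)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_ping(node_id), address)
        reply, _ = sock.recvfrom(4096)
        return reply


print(build_ping(b"abcdefghij0123456789"))
```

If the address in BOOTSTRAP_ADDRESS is wrong or offline, send_ping just times out, and a client with no other address to try has nowhere else to go.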

Basically, the DHT is only as decentralized as the joining mechanism. Because the joining mechanism is fairly brittle (there's no way to "broadcast" over the entire Internet, so you have to unicast to an individual pre-assigned host to get the DHT data), Kademlia DHT isn't really decentralized. Not in the strictest sense of the word.

Imagine this scenario: Someone who wants P2P to stop goes out and prepares an attack on all commonly used stable DHT nodes which are used for bootstrapping. Once they've staged their attack, they spring it on all nodes at once. Wham; every single bootstrapping DHT node is down in one fell swoop. Now what? You're stuck with connecting to centralized trackers to download traditional lists of peers from those. Well, if they attack the trackers too, then you're really, really up a creek. In other words, Kademlia and the entire BT network are constrained by the limitations of the Internet itself, in that there is a finite (and relatively small) number of computers that you would have to successfully attack or take offline to prevent >90% of users from connecting to the network.

Once the "pseudo-centralized" bootstrapping nodes are all gone, the interior nodes of the DHT are useless for bootstrapping: nobody outside the DHT knows about them, so they can't bring new nodes in. Then, as each interior node drops out of the DHT over time (people shutting down their computers, rebooting for updates, etc.), the network would collapse.

Of course, to get around this, someone could deploy a patched BitTorrent client with a new list of pre-determined stable DHT nodes or DNS addresses, and loudly advertise to the P2P community to use this new list instead. But this would become a "whack-a-mole" situation where the aggressor (the node-eater) would progressively download these lists themselves, and target the brave new bootstrapping nodes, then take them offline, too.


Short answer: It gets it from the .torrent file.

When a BitTorrent client generates a trackerless .torrent file (that is, when someone is getting ready to share something new via BitTorrent), it adds a "nodes" key (key as in "key/value pair"; like a section header, not a crypto key) to the .torrent file that contains the K closest DHT nodes known to that client.

http://www.bittorrent.org/beps/bep%5F0005.html#torrent-file-extensions

A trackerless torrent dictionary does not have an "announce" key. Instead, a trackerless torrent has a "nodes" key. This key should be set to the K closest nodes in the torrent generating client's routing table. Alternatively, the key could be set to a known good node such as one operated by the person generating the torrent. Please do not automatically add "router.bittorrent.com" to torrent files or automatically add this node to clients routing tables.
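"Closest" here is meant in the Kademlia sense: the distance between two IDs is their bitwise XOR, read as an unsigned integer. Below is a small sketch of picking the K closest entries from a routing table. The routing-table layout (tuples of ID, host, port) is made up for illustration, and which target the generating client measures against (I use the torrent's infohash here) is my assumption; the quoted text does not say. K defaults to 8, the bucket size BEP 5 uses.

```python
def xor_distance(id_a, id_b):
    """Kademlia distance: XOR of the two IDs, read as a big-endian integer."""
    return int.from_bytes(id_a, "big") ^ int.from_bytes(id_b, "big")


def k_closest(target_id, routing_table, k=8):
    """Return the k entries whose node IDs are nearest to target_id.

    routing_table is a list of (node_id, host, port) tuples; this layout
    is hypothetical and only serves the example.
    """
    ranked = sorted(routing_table,
                    key=lambda entry: xor_distance(entry[0], target_id))
    return ranked[:k]


# One-byte IDs keep the example readable; real node IDs are 20 bytes.
table = [
    (bytes([0b0001]), "node-a.example", 6881),
    (bytes([0b0100]), "node-b.example", 6881),
    (bytes([0b0111]), "node-c.example", 6881),
]
target = bytes([0b0101])
for node_id, host, port in k_closest(target, table, k=2):
    print(host, port, xor_distance(node_id, target))
```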

So when you feed your BitTorrent client the .torrent file of a trackerless torrent that you want to download, it uses the value of that "nodes" key from the .torrent file to find its first few DHT nodes.
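Here is roughly what that lookup could look like, as a sketch only. The bdecode and bootstrap_nodes helpers are my own minimal illustration (no real client's code), with next to no validation. The sample bytes follow the layout given in BEP 5, where "nodes" is a list of [host, port] pairs.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % pos)


def bootstrap_nodes(torrent_bytes):
    """Return (host, port) pairs from the "nodes" key, or [] if absent."""
    meta, _ = bdecode(torrent_bytes)
    return [(host.decode("utf-8"), port)
            for host, port in meta.get(b"nodes", [])]


# Made-up trackerless torrent: a tiny "info" dict plus two bootstrap nodes.
SAMPLE = (b"d4:infod4:name4:teste"
          b"5:nodesll9:127.0.0.1i6881eel16:your.router.nodei4804eeee")
print(bootstrap_nodes(SAMPLE))
```

A torrent that only has an "announce" key yields an empty list, in which case the client has to fall back on the tracker or on whatever bootstrap nodes it already knows.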
