The files dht.c and dht.h implement the variant of the Kademlia Distributed Hash Table (DHT) used in the Bittorrent network (``mainline'' variant). The file dht-example.c is a stand-alone program that participates in the DHT. Another example is a patch against Transmission, which you might or might not be able to find somewhere. The code is designed to work well in both event-driven and threaded code. The caller, which is either an event-loop or a dedicated thread, must periodically call the function dht_periodic. In addition, it must call dht_periodic whenever any data has arrived from the network. All functions return -1 in case of failure, in which case errno is set, or a positive value in case of success. Initialisation ************** * dht_init This must be called before using the library. You pass it an integer identifying a bound IPv4 datagram socket in non-blocking mode, an integer identifying a bound IPv6 datagram socket in non-blocking mode, and your node id, a 20-octet array that should be globally unique. The integers that identify the two sockets should usually be file descriptors; however, as the library does not directly perform any socket- or file-related operations on them, they can be arbitrary integers, for example indices in a table of structures that represent sockets in your code. If you're on a multi-homed host, you should bind your sockets to one of your addresses. This is especially relevant for IPv6. Node ids must be well distributed, so you cannot just use your Bittorrent id; you should either generate a truly random value (using plenty of entropy), or at least take the SHA-1 of something. However, it is a good idea to keep the id stable, so you may want to store it in stable storage at client shutdown. * dht_uninit This may be called at the end of the session. Bootstrapping ************* The DHT needs to be taught a small number of contacts to begin functioning. You can hard-wire a small number of stable nodes in your application, but this obviously fails to scale. You may save the list of known good nodes at shutdown, and restore it at restart. You may also grab nodes from torrent files (the nodes field), and you may exchange contacts with other Bittorrent peers using the PORT extension. * dht_ping_node This is the main bootstrapping primitive. You pass it an address at which you believe that a DHT node may be living, and a query will be sent. If a node replies, and if there is space in the routing table, it will be inserted. * dht_insert_node This is a softer bootstrapping method, which doesn't actually send a query -- it only stores the node in the routing table for later use. Note that dht_insert_node requires that you supply a node id. If the id turns out to be wrong, the DHT will eventually recover; still, inserting massive amounts of incorrect information into your routing table is not a good idea. An additionaly difficulty with dht_insert_node is that a Kademlia routing table cannot absorb nodes faster than a certain rate. A freshly initialised routing table is able to absorb 128 nodes of each address family without dropping any. The tolerable rate after that is difficult to estimate: it is probably on the order of one node every few seconds per node already in the table divided by 8, for some suitable value of 8. Doing some work *************** * dht_periodic This function should be called by your main loop periodically, and also whenever data is available on the socket. The time after which dht_periodic should be called if no data is available is returned in the parameter tosleep. (You do not need to be particularly accurate; actually, it is a good idea to be late by a random value.) The parameters buf, buflen, from and fromlen optionally carry a received message. If buflen is 0, then no message was received. Dht_periodic also takes a callback, which will be called whenever something interesting happens (see below). * dht_search This schedules a search for information about the info-hash specified in id; it returns 1 if this is a new search, and 0 if it merely reset the timeouts for a search in progress. If port is not 0, it specifies the TCP port on which the current peer is listening; in that case, when the search is complete it will be announced to the network. The port is in host order, beware if you got it from a struct sockaddr_in. In either case, data is passed to the callback function as soon as it is available, possibly in multiple pieces. The callback function will also be called when the search is complete. Up to DHT_MAX_SEARCHES (1024) searches can be in progress at a given time; any more, and dht_search will return -1. If you specify a new search for the same info hash as a search still in progress, the previous search is combined with the new one -- you will only receive a completion indication once. Information queries ******************* * dht_nodes This returns the number of known good, dubious and cached nodes in our routing table. This can be used to decide whether it's reasonable to start a search; a search is likely to be successful as long as we have a few good nodes; however, in order to avoid overloading your bootstrap nodes, you may want to wait until good is at least 4 and good + doubtful is at least 30 or so. It also includes the number of nodes that recently sent us an unsolicited request; this can be used to determine if the UDP port used for the DHT is firewalled. If you want to display a single figure to the user, you should display good + doubtful, which is the total number of nodes in your routing table. Some clients try to estimate the total number of nodes, but this doesn't make much sense -- since the result is exponential in the number of nodes in the routing table, small variations in the latter cause huge jumps in the former. * dht_get_nodes This retrieves the list of known good nodes, starting with the nodes in our own bucket. It is a good idea to save the list of known good nodes at shutdown, and ping them at startup. * dht_dump_tables * dht_debug These are debugging aids. Functions provided by you ************************* * The callback function The callback function is called with 5 arguments. Closure is simply the value that you passed to dht_periodic. Event is one of DHT_EVENT_VALUES or DHT_EVENT_VALUES6, which indicates that we have new values, or DHT_EVENT_SEARCH_DONE or DHT_EVENT_SEARCH_DONE6, which indicates that a search has completed. In either case, info_hash is set to the info-hash of the search. In the case of DHT_EVENT_VALUES, data is a list of nodes in ``compact'' format -- 6 or 18 bytes per node. Its length in bytes is in data_len. * dht_sendto This will be called whenever the library needs to send a datagram. If the integers passed to dht_init are file descriptors, this can simply be an alias for the sendto system call. * dht_blacklisted This is a function that takes an IP address and returns true if this address should be silently ignored. Do not use this feature unless you really must -- Kademlia supposes transitive reachability. * dht_hash This should compute a reasonably strong cryptographic hash of the passed values. SHA-1 should be good enough. * dht_random_bytes This should fill the supplied buffer with cryptographically strong random bytes. It's called every 30 minutes on average, so it doesn't need to be fast. Final notes *********** * NAT Nothing works well across NATs, but Kademlia is somewhat less impacted than many other protocols. The implementation takes care to distinguish between unidirectional and bidirectional reachability, and NATed nodes will eventually fall out from other nodes' routing tables. While there is no periodic pinging in this implementation, maintaining a full routing table requires slightly more than one packet exchange per minute, even in a completely idle network; this should be sufficient to make most full cone NATs happy. Juliusz Chroboczek <[email protected]>
BitTorrent DHT library
Overview
Comments
-
Improve compilation on Windows
MSVC doesn't provide unistd.h and sys/time.h. Another header, fcntl.h, althought provided is not used during compilation.
MSVC doesn't provide w32api.h header (and WindowsXP macro), but it is safe to assume that 0x0501 will represent Windows XP for the time being. Also, MSVC and MinGW define both WINVER and (newer) _WIN32_WINNT macros (and also NTDDI_VERSION, but that's whole other story), so both should be overridden to support Windows XP.
MSVC and MinGW define their own EAFNOSUPPORT macro, which leads to redefinition warning.
MSVC and MinGW provide declaration (and Windows provides implementation) of inet_ntop function starting with Windows Vista.
MSVC doesn't provide snprintf function, but has _snprintf instead with same signature and purpose.
Tested on Windows 7 with MSVC 2013 and MinGW 4.9.2, defining _WIN32_WINNT and WINVER to 0x0501 and 0x0600.
-
"implied_port" is improperly handled
If a message includes the "implied_port" argument, the message parser finds the "port" portion of the "implied_port" argument key and uses its value as the port, which is always 0 or 1.
See BEP 5 for proper handling of the "implied_port" argument.
-
Use time(3) instead of gettimeofday(2) for `now`
Of timeval value, only
tv_sec
member is being used. Since precision is not needed, and since gettimeofday() function could be missing on non-POSIX systems, it is better to use time() function instead. -
Add CMakeLists to DHT library
Now more and more projects use CMake as their build system. With this file, you can use add_subdirectory function to add DHT library to your project and use it the natural way.
-
dht-example compilation fixes
Compilation fixes for dht-example.c, and a bug fix for the socket address length (on my platform, passing
sizeof(sockaddr_storage)
tosendto
givesEINVAL
) -
BEP-44 support?
Would be nice if the library have support to store arbitrary values, so other decentralized application could integrate with the Bittorrent DHT more easily.
Have found this was asked before (in issue #9 ). But, now there is a BEP related (http://bittorrent.org/beps/bep_0044.html).
I think other libraries and programs (like libtorrent) has implemented it. But i have not found a DHT specific library (in c/c++) with it.
Thanks
-
Storing values
I didn't see a method for storing/publishing values. Is this correct?
How difficult would it be to implement such a function. I am willing to do it if it's not to complicated.
-
Only write debug messages if descriptor is set
fflush() flushes all open descriptors in case NULL is passed as argument. This patch prevents calls to fflush in case debug file descriptor is not initialized.
See also: https://trac.transmissionbt.com/ticket/4531.
-
API improvements around DHT_EVENT_SEARCH_DONE notification
Tracking when a search completes is somewhat difficult at the moment. If the user allocates memory for a search, calls
dht_search()
, and then deallocates whenDHT_EVENT_SEARCH_DONE
arrives, memory can leak. The user can try to keep track of in-progress searches, and also its own timeout values, but this is is extra effort and slightly inaccurate.There are a few small things I think could be improved about the API which should be backwards compatible:
- if
dht_search
finds an existingstruct search
, it could return a different value (2
?) so the caller knows not to expect an additionalDHT_EVENT_SEARCH_DONE
event - when a
struct search
expires, callback withDHT_EVENT_SEARCH_DONE(6)
before freeing
I'd be happy to submit a pull request if these are welcome changes.
- if
-
free(buckets6)
dht.c 1668
if(s >= 0) { buckets = calloc(sizeof(struct bucket), 1); if(buckets == NULL) return -1; buckets->af = AF_INET; rc = set_nonblocking(s, 1); if(rc < 0) goto fail; } if(s6 >= 0) { buckets6 = calloc(sizeof(struct bucket), 1); if(buckets6 == NULL) return -1; buckets6->af = AF_INET6; rc = set_nonblocking(s6, 1); if(rc < 0) goto fail; }
...
fail: free(buckets); buckets = NULL; free(buckets6); buckets6 = NULL; return -1;
what happens if _set_nonblocking(s, 1)_ returned an error before we might _buckets6 = calloc_ (which comes after) _fail:_ free(buckets6) ?
Or maybe the opposite no v4 socket at all and _set_nonblocking(s6, 1)_ returns error _fail:_ free(buckets) ?
maybe I'm not seeing this right...
-
It this a complete implementation for Kademlia DHT?
It's a really good implementation for DHT.
But I am a little confused with the algorithm for finding nodes. I think the most wonderful part of Kademlia DHT is that we can do a recursive lookup, and each step we can make it
half-more-closer
to the target(aha, I have named ithalf-way-iteration-efficent
.). And the most important data struct is how it deals withbuckets
, orroute table
.As the described in Ethereum P2P Node Discovery.
Simple to say, It make use of an
N
size buckets with thei
bucket storing the nodes havei
common prefix asMyId
.But, as codes below, It uses a bucket link and iterate the
3
nearby buckets and pick thek
nearest nodes. I have not figured out how It can make ahalf-way-iteration-efficent
? Or if there something wrong understand to me, please sort me out, will be thankful.b = find_bucket(id, AF_INET); if(b) { numnodes = buffer_closest_nodes(nodes, numnodes, id, b); if(b->next) numnodes = buffer_closest_nodes(nodes, numnodes, id, b->next); b = previous_bucket(b); if(b) numnodes = buffer_closest_nodes(nodes, numnodes, id, b); }
-
find_closest
regards code of buffer_closest_nodes, a trivial way is to scan the bucket and its adjacent buckets and find the 8 closest node , what's your consideration
-
Only accept values from nodes that have not lied about their id
Some malicious nodes in the network cheat by replying to other nodes' queries with an node ID that is close to the querying node ID. They constantly change the ID in their replies so they can appear as "close nodes" in the bucket list of as many nodes as possible, which means their address will often appear in the "nodes" array in find_node and get_peers replies. These nodes also happen to send replies to get_peers with a "values" array containing bogus addresses.
While BEP-42 should prevent nodes from easily changing the first bits of their node IDs, it is currently almost never enforced.
In this change, we try to detect if a node replying to a get_peers query has already been seen with a different node ID for the same search. In this case, we flush the node from the current search and put it in the blacklist.
Since this technique appears to be widespread, we also increase the blacklist size to 32 nodes (which seems sufficient).
-
Implement BEP-42: DHT Security extension
This change implements BEP-42.
- Decode the "ip" field in replies' top-level dictionnary
- Implement a simple voting system for selecting the external IP address that will be used to compute the node id's prefix
- Add a CRC32C implementation to compute the node id's prefix.
- Include an "ip" field when replying to other nodes
Whenever the voting process terminates, we directly update the myid variable with the new prefix, the ID change will automatically be reflected in all new requests.
edit: some cleanups, split the change in multiple independent commits, add the enforcement (disabled by default).
-
Extract and store v-key values from received messages
This updates the message parser to extract and store v-key values (client identifier + version info). Contents are useful e.g. to help analyze incompatibilities with certain clients, e.g. when receiving unparseable message one can check if this is related to a certain client software.
Additions/changes:
- modifies
parse_message()
to extract v-key value from received messages and store in message struct - modifies
new_node()
to include v-key value in parameter list and to store v-key value in node struct for both instant arrival (new node is added after receiving pong message) and late arrival (new node is added after 'Nodes found', but pinged later) - update all calls to
new_node()
to account for new parameter list - modifies
dump_buckets()
to include v-key values
NOTE:
- This is based on my PR #37 to make use of/extend the enhanced debug output features
- If PR #31 gets merged, this has to be updated/migrated as well
Screenshot:
- modifies
-
bdecoder
A few notes:
- fixes several bugs in the old parser
- does not require NULL termination of the packet
- does not use the heap
- recursive, but tracks depth. could abort at a max depth, but even a full packet of
[[[[...
doesn't add much to the stack
an efficient feature complete C++ bittorrent implementation
libtorrent is an open source C++ library implementing the BitTorrent protocol, along with most popular extensions, making it suitable for real world d
Tool for inspecting, creating and editing BitTorrent metafiles.
A commandline tool for creating, inspecting and modifying bittorrent metafiles.
Transmission is a fast, easy, and free BitTorrent client.
Official Transmission BitTorrent client repository
Transmission is a fast, easy, and free BitTorrent client
About Transmission is a fast, easy, and free BitTorrent client. It comes in several flavors: A native Mac OS X GUI application GTK+ and Qt GUI applica
qBittorrent - A BitTorrent client in Qt
qBittorrent - A BitTorrent client in Qt Description: qBittorrent is a bittorrent client programmed in C++ / Qt that uses libtorrent (sometimes called
uTorrent Transport Protocol library
libutp - The uTorrent Transport Protocol library. Copyright (c) 2010 BitTorrent, Inc. uTP is a TCP-like implementation of LEDBAT documented as a BitTo
Arduino library for DHT sensors - integer only
DHTINT Arduino library for DHT sensors - integer only Description This is an experimental integer only library, based upon - https://github.com/RobTil
libTorrent BitTorrent library
LibTorrent Copyright (C) 2005-2014, Jari Sundell LICENSE GNU GPL, see COPYING. "libtorrent/src/utils/sha_fast.{cc,h}" is originally from the Mozil
an efficient feature complete C++ bittorrent implementation
libtorrent is an open source C++ library implementing the BitTorrent protocol, along with most popular extensions, making it suitable for real world d
Tool for inspecting, creating and editing BitTorrent metafiles.
A commandline tool for creating, inspecting and modifying bittorrent metafiles.
A bittorrent plugin for VLC.
vlc-bittorrent (Bittorrent plugin for VLC) What is this? With vlc-bittorrent, you can open a .torrent file or magnet link with VLC and stream any medi
Transmission is a fast, easy, and free BitTorrent client.
Official Transmission BitTorrent client repository
This is a collection of tools for creating and manipulating BitTorrent v2 torrent files
torrent tools This is a collection of tools for creating and manipulating BitTorrent v2 torrent files. torrent-new can create hybrid torrents, but the
Transmission is a fast, easy, and free BitTorrent client
About Transmission is a fast, easy, and free BitTorrent client. It comes in several flavors: A native Mac OS X GUI application GTK+ and Qt GUI applica
qBittorrent - A BitTorrent client in Qt
qBittorrent - A BitTorrent client in Qt Description: qBittorrent is a bittorrent client programmed in C++ / Qt that uses libtorrent (sometimes called
SSD1306 library and simple graphics core library based on Adafruit GFX Library.
Raspberry Pico SSD1306 + GFX Library Based on Adafruit GFX Library https://github.com/adafruit/Adafruit-GFX-Library Usage Hardware Connect your SSD130
This library provides a cross-platform image loading library in C11 for projects based on our foundation library
Image Library - Public Domain This library provides a cross-platform image loading library in C11 for projects based on our foundation library.
The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs based on CUDA.
dgSPARSE Library Introdution The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs bas
C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
Build Status Travis CI VM: Linux x64: Raspberry Pi 3: Jetson TX2: Backstory I set to build ccv with a minimalism inspiration. That was back in 2010, o
A simple header-only C++ argument parser library. Supposed to be flexible and powerful, and attempts to be compatible with the functionality of the Python standard argparse library (though not necessarily the API).
args Note that this library is essentially in maintenance mode. I haven't had the time to work on it or give it the love that it deserves. I'm not add