Aaron Ximm here, I work at the Internet Archive (archive.org). I'm on a bit of a client survey slash recruitment drive, hoping to encourage support for some simple protocol extensions we've worked on with BitTorrent Inc. for use by archives, libraries, and anyone with large-scale 'mutable' content distribution/preservation 'issues.'
At the Archive we've been trying to make use of BitTorrent wherever we can. For example, we make around 2-3 million items (one 'thing' in the archive: a netlabel album, a live show with 20 tracks, a book with various derived formats like plaintext from OCR...) available through Torrents.
Most are only webseeded, but a few are very popular (the Musopen full-res recordings, for example), and our trackers see moderate use, around 20K peers at any time.
Our needs are a little different than most people using BitTorrent. The main thing is that we have a commitment to long-term availability of our collections, which creates some unique challenges, especially when paired with our reliance on webseeding.
Almost all the time, our users are requesting less-popular Torrents and hence get them only via the GetRight-style webseeds we provide (we don't maintain a super-seeder).
That's not a bad thing generally. Webseeding has worked great for us!
But it DOES collide with the fact that our content on disk can change or be updated for various reasons, e.g. when the metadata (the description text, for example) is edited.
When that happens, the webseeded blocks fail until an updated Torrent is retrieved. In some cases, on some clients, blocks that have changed on disk don't validate, the error is chalked up to something like transmission problems, and the blocks are automatically retried. In the worst case this means looping forever trying to retrieve blocks, unless the webseed gets blacklisted as a peer or something. :/
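To make the retry problem concrete, here is a minimal Python sketch of the kind of guard a client could use: count piece hash failures per webseed and stop using that webseed after a threshold. `WebseedGuard` and `max_failures` are hypothetical names for illustration, not Deluge or libtorrent API; the only real protocol detail is that BitTorrent v1 pieces are checked against SHA-1 hashes from the torrent.

```python
import hashlib


class WebseedGuard:
    """Hypothetical helper: stop retrying a webseed that keeps serving bad pieces."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = {}   # webseed URL -> number of failed piece checks
        self.banned = set()  # webseed URLs we no longer request from

    def piece_ok(self, url: str, data: bytes, expected_sha1: bytes) -> bool:
        """Check a piece fetched from `url` against its SHA-1 from the torrent."""
        if hashlib.sha1(data).digest() == expected_sha1:
            return True
        # A mismatch may mean the file changed on the server rather than a
        # transmission error, so count it instead of retrying forever.
        self.failures[url] = self.failures.get(url, 0) + 1
        if self.failures[url] >= self.max_failures:
            self.banned.add(url)
        return False

    def can_use(self, url: str) -> bool:
        return url not in self.banned
```

Banning only stops the loop; it doesn't tell the client that its torrent is outdated, which is the actual problem.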
The root issue is that clients currently have no established way to tell that a Torrent they hold is 'outdated' (has been deprecated), i.e. no notion that the content the Torrent delivers might be 'mutable' and subject to revision.
We took this to BitTorrent Inc and came up with a solution, defined in BEP 39: adding to the info block a URL which can be queried for a new Torrent (kind of an 'acceptable source', in the magnet-link sense, for an update).
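A rough Python sketch of how a client might read such a URL out of a .torrent file. It assumes the key inside the info dictionary is named `update-url` (check BEP 39 for the exact field names) and uses a tiny hand-rolled bencode decoder rather than any particular client's library; `bdecode` and `update_url` are hypothetical names. Because the URL lives inside the info dictionary, it is covered by the info-hash and can't be altered without changing the torrent's identity.

```python
from typing import Optional


def bdecode(data: bytes):
    """Minimal bencode decoder: ints, byte strings, lists, dicts (key order not checked)."""

    def parse(i: int):
        c = data[i:i + 1]
        if c == b"i":                      # integer: i<digits>e
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c == b"l":                      # list: l<items>e
            i += 1
            items = []
            while data[i:i + 1] != b"e":
                value, i = parse(i)
                items.append(value)
            return items, i + 1
        if c == b"d":                      # dict: d<key><value>...e
            i += 1
            result = {}
            while data[i:i + 1] != b"e":
                key, i = parse(i)
                value, i = parse(i)
                result[key] = value
            return result, i + 1
        if c.isdigit():                    # byte string: <len>:<bytes>
            colon = data.index(b":", i)
            start = colon + 1
            end = start + int(data[i:colon])
            return data[start:end], end
        raise ValueError(f"invalid bencode at offset {i}")

    value, end = parse(0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def update_url(torrent_bytes: bytes) -> Optional[str]:
    """Return the (assumed) 'update-url' from the info dict, or None if absent."""
    info = bdecode(torrent_bytes).get(b"info", {})
    url = info.get(b"update-url")
    return url.decode("utf-8") if url is not None else None
```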
The latest uTorrent Windows alpha supports this feature and we've been using it. It works! It's pretty cool. It makes a Torrent into something a bit more like a pointer to a remote distributed file cache that is 'alive' rather than fixed at the time of creation.
I'm popping in here to see if we could maybe get Deluge to support this feature. Our dream is to have as many popular clients as possible support it, so that the millions of Torrents we distribute (we're going to scale up to about 10M, I'm guessing) automagically stay current in local copies as changes are made.
The basic idea is pretty simple: if the info block contains an update URL, the client does a HEAD or GET on it before downloading; if the result indicates that a newer version exists, the new Torrent is retrieved first.
Additionally, the client can (should for our use case...) poll for changes and hence automatically stay in sync with the 'remote' -- kind of a poor-man's mirroring.
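Here is a minimal Python sketch of that check-then-poll loop, assuming the update URL answers HEAD requests with an `ETag` or `Last-Modified` header that changes whenever a new torrent is published. That header choice is an assumption for illustration, not necessarily what BEP 39 or uTorrent do, and `head_validator` / `poll_for_update` are hypothetical names. The one-shot pre-download check is the same function called with `max_polls=1`.

```python
import time
import urllib.request
from typing import Callable, Optional


def head_validator(url: str, timeout: float = 10.0) -> Optional[str]:
    """HEAD the update URL and return its ETag (or Last-Modified), if the server sends one."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("ETag") or resp.headers.get("Last-Modified")


def poll_for_update(
    url: str,
    known_tag: Optional[str],
    head: Callable[[str], Optional[str]] = head_validator,
    interval: float = 3600.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the new validator once it differs from known_tag, or None after max_polls."""
    polls = 0
    while True:
        tag = head(url)
        polls += 1
        if tag is not None and tag != known_tag:
            return tag      # newer version published: go fetch the new torrent
        if max_polls is not None and polls >= max_polls:
            return None     # nothing new; keep using the current torrent
        sleep(interval)
```

Injecting `head` and `sleep` keeps the loop testable without a network, and a client would presumably persist `known_tag` alongside each torrent.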
The implementation details are still being worked out and the kinks ironed out, but we're really keen on it and hope to encourage wider adoption/support for the idea.
Happy to go over the details if anyone would like to!
The pitch is that our long-term strategy is to encourage and support the wide adoption of BitTorrent as a sunlit protocol for institutions like us with large, vulnerable, but non-static collections that would benefit from peer cloud mirroring -- to guarantee that collections do not go dark if the institution should, e.g., burn down. That's just what BitTorrent is great for. Now we want to bring that to archives and libraries, which have needed features like this before they could consider adopting it...
...hoping Deluge can be part of that!
There's more to say, so please PM me or write to me at the Archive at ximm at archive org. I'm happy to discuss and go deeper into the value we see.
PS: btw, the current uTorrent implementation does two other cool things as well. One is that, for the client to use this feature, it requires that both the original and the new torrent be signed by the same certificate, using the scheme described in BEP 35.
Also, since most of the time most content has NOT changed (often only one or two files differ, especially if padding is used to align files to piece boundaries), local blocks, e.g. those retrieved for earlier versions, can serve as a source for the updated torrent, as described in BEP 38.
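To illustrate the reuse idea, here is a small Python sketch that hashes data into fixed-size pieces (SHA-1, as in BitTorrent v1) and maps each piece of the new torrent to an identical piece already held for the old one. This is a simplification: BEP 38 itself is about hints that help a client find candidate local data, and `piece_hashes` / `plan_reuse` are hypothetical names. Matching by hash rather than by piece index still works when an edit earlier in the torrent shifts everything after it, provided files are padded to piece boundaries.

```python
import hashlib


def piece_hashes(data: bytes, piece_len: int) -> list:
    """SHA-1 digest of each fixed-size piece, as in BitTorrent v1."""
    return [hashlib.sha1(data[i:i + piece_len]).digest()
            for i in range(0, len(data), piece_len)]


def plan_reuse(old_hashes: list, new_hashes: list) -> list:
    """For each piece of the new torrent, the index of an identical old piece, or None."""
    old_index = {}
    for i, h in enumerate(old_hashes):
        old_index.setdefault(h, i)       # the first matching old piece is enough
    return [old_index.get(h) for h in new_hashes]


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDDDD"   # second piece edited, one piece appended
print(plan_reuse(piece_hashes(old, 4), piece_hashes(new, 4)))
# -> [0, None, 2, None]
```

Only the `None` entries would need to be fetched from peers or webseeds; everything else can be copied from the local copy of the earlier version.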