Today one of the most widely used decentralised P2P networks is BitTorrent. It makes use of the same basic ideas but uses jargon that is becoming standard for P2P in general. The first machine to share a file is called a "seed". The seeded complete file is then used to supply other machines, the peers, which download portions of the file on the way to acquiring the whole file. As each machine downloads a portion, that portion becomes available for other peers to download. This means that even though no machine in a set of peers has the entire file, together they can still supply a complete copy to another peer by each contributing a few portions. Any peer that does eventually acquire a complete copy of the file becomes an additional seed.
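The idea that a group of peers can jointly supply a file that none of them holds in full can be sketched in a few lines of Python. This is only an illustrative model, not the BitTorrent protocol itself: the function names, the peer names and the tiny piece size are all assumptions made for the example, and there is no networking involved.

```python
from typing import Dict, List, Optional


def split_into_pieces(data: bytes, piece_size: int) -> List[bytes]:
    """Cut the file into fixed-size portions (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def assemble_from_peers(peers: Dict[str, Dict[int, bytes]],
                        piece_count: int) -> Optional[bytes]:
    """Fetch each portion from any peer that holds it.

    Returns the rebuilt file, or None if some portion is held by nobody.
    """
    collected = {}
    for index in range(piece_count):
        for held in peers.values():
            if index in held:
                collected[index] = held[index]
                break
        else:
            return None  # no peer in the set can supply this portion
    return b"".join(collected[i] for i in range(piece_count))


original = b"The quick brown fox jumps over the lazy dog"
pieces = split_into_pieces(original, 8)

# Hypothetical peers: each one holds only every third portion.
peers = {
    "peer_a": {i: p for i, p in enumerate(pieces) if i % 3 == 0},
    "peer_b": {i: p for i, p in enumerate(pieces) if i % 3 == 1},
    "peer_c": {i: p for i, p in enumerate(pieces) if i % 3 == 2},
}

rebuilt = assemble_from_peers(peers, len(pieces))
```

Here `rebuilt` is a complete copy of `original` even though no entry in `peers` holds all of the portions. Once the downloading peer has every portion it could, in BitTorrent terms, act as an additional seed.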
The first seed creates a metadata file, known as a torrent file, which describes the file to be shared. This is uploaded to a tracker, a server which co-ordinates file distribution. Any peer that wants to download the file first has to acquire the appropriate torrent file from a tracker, which also informs it of seeds and other peers that have file fragments available to download. The need for a tracker, i.e. a central server, can be avoided by using a modification to the protocol in which every peer acts as a tracker.
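To give a feel for what such a metadata file describes, here is a simplified Python sketch. A real torrent file is stored in a compact encoding and carries, among other things, the tracker address, the piece length and a SHA-1 hash for each piece. The plain dictionary layout, the field names and the tracker URL below are assumptions made for illustration and are not the actual file format.

```python
import hashlib


def make_metadata(name: str, data: bytes, piece_length: int,
                  tracker_url: str) -> dict:
    """Build a simplified, torrent-like description of a file."""
    piece_hashes = [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]
    return {
        "announce": tracker_url,      # where peers ask for other peers
        "name": name,
        "length": len(data),
        "piece_length": piece_length,
        "piece_hashes": piece_hashes,  # one hash per portion of the file
    }


def piece_is_valid(metadata: dict, index: int, piece: bytes) -> bool:
    """Check a downloaded portion against the hash recorded for it."""
    return hashlib.sha1(piece).hexdigest() == metadata["piece_hashes"][index]


data = bytes(range(200))  # stand-in for the file being shared
metadata = make_metadata("example.bin", data, 64,
                         "http://tracker.example/announce")  # hypothetical URL
```

With the per-piece hashes in hand, a peer can call `piece_is_valid(metadata, 0, data[:64])` on each portion as it arrives and discard anything that has been corrupted or tampered with, whichever peer it came from.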
The key to making a P2P network fast is to have multiple copies of files on different machines and to connect clients to machines that aren't heavily loaded. At the next level of sophistication it should be possible to dynamically move the download of a file from one machine to another. However, this implies that the files stored on different machines are exact byte-for-byte copies. How can you be sure that two files are the same? If there are two video files called "Star Trek" they might be the same movie but recorded at different standards or with different editing.
The solution is to compute checksums or hash values on similarly named files. A hash value is computed using every byte in the file, and if two files are exactly the same they always give the same hash value. If two files are different then the probability that they have different hash values is very high. Using this method the system can quickly, but not perfectly, determine if two files can be used as if they were the same. Indeed some networks, eDonkey for example, treat two files with the same hash but different names as identical, and two files with the same name but different hashes as different.
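A minimal Python sketch of this approach follows. It hashes each file in chunks and groups files by hash value, ignoring their names. The choice of SHA-256, the helper names and the sample file names are assumptions for the example and are not what any particular network uses.

```python
import hashlib
import os
import tempfile


def file_hash(path: str, chunk_size: int = 65536) -> str:
    """Hash every byte of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(paths):
    """Group file paths by hash, so names play no part in the comparison."""
    groups = {}
    for path in paths:
        groups.setdefault(file_hash(path), []).append(path)
    return groups


# Demonstration with three small stand-in files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "star_trek.avi": b"cut A",
        "trek_copy.avi": b"cut A",     # different name, same content
        "star_trek_hd.avi": b"cut B",  # similar name, different content
    }
    paths = []
    for name, content in samples.items():
        path = os.path.join(folder, name)
        with open(path, "wb") as handle:
            handle.write(content)
        paths.append(path)

    groups = group_by_content(paths)
    identical_sets = sorted(
        sorted(os.path.basename(p) for p in group)
        for group in groups.values()
    )
```

In this demonstration the two files with identical content end up in the same group despite their different names, while the similarly named file with different content is kept apart.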
Another technique to speed things up is to allow a file to be downloaded by another user before its download has completed - see the description of BitTorrent above.
These, and even more advanced methods of automatically organising P2P networks, are likely to be the main areas of development in the future and, surprisingly, they are coming from commercial attempts to make P2P a mainstream computing technique.