Machine Learning Speeds TCP
Written by Mike James   
Monday, 22 July 2013

By applying machine learning to the TCP congestion control algorithm, a team at MIT has beaten similar algorithms designed by humans. But don't expect to understand how it works.

TCP is the protocol that sits on top of the basic packet-switching network to ensure that your data gets through. A packet-switching network simply acts as a transport for chunks of data, with no guarantee that the chunks will actually arrive or, if they do arrive, that they are in the same order in which they were transmitted.

If you want a fast but unreliable method of transmitting data, you can use UDP, but most connections over the internet use TCP. TCP introduces handshaking to make and break connections. Packets are sent between the source and receiver complete with numbers that indicate the order in which they should be reassembled, and each packet has to be acknowledged.

This is a more complicated protocol than you might imagine. For example, how long do you wait for an acknowledgment before you decide to send the packet again? More subtle is the problem of congestion. If you design the TCP algorithm to run as aggressively as possible, i.e. sending and resending data as fast as it can, then the result might be the best possible connection for you, but the rest of the users will tend to suffer reduced throughput and increased delays because you are hogging the bandwidth. In fact, you could probably get as good a connection by slowing down the rate at which you send data to better match the available bandwidth.
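To make the "how long do you wait" question concrete, here is a minimal sketch of the smoothed round-trip-time estimator that standard TCP stacks use to set the retransmission timeout, in the style of RFC 6298. The class and method names are made up for illustration, and the clock granularity value is an assumption; real stacks differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    """Retransmission timeout estimator in the style of RFC 6298 (illustrative)."""

    alpha: float = 1 / 8         # gain applied to the smoothed RTT
    beta: float = 1 / 4          # gain applied to the RTT variation
    granularity: float = 0.001   # assumed clock granularity, in seconds
    min_rto: float = 1.0         # lower bound on the timeout, in seconds
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = 1.0             # timeout used before any RTT sample arrives

    def on_rtt_sample(self, r: float) -> float:
        """Fold one measured round-trip time (seconds) into the estimate."""
        if self.srtt is None:
            # First measurement: seed both estimates from the sample.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Update the variation first, using the previous smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        self.rto = max(self.min_rto, self.srtt + max(self.granularity, 4 * self.rttvar))
        return self.rto

    def on_timeout(self) -> float:
        """Back off exponentially when a retransmission timer expires."""
        self.rto *= 2
        return self.rto
```

The timeout tracks the average round trip plus a safety margin of four times its variation, so a jittery path gets a longer wait than a steady one with the same average delay.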

This is the problem of TCP congestion control, and it is such a complicated problem that it is still the subject of ongoing research. Most TCP stacks use one of two congestion control algorithms, Compound TCP (Windows) or Cubic TCP (Linux), and these methods are only around ten years old. There is clearly still scope for improvement, but are we ready for improvements designed by computer?
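For a feel of what a human-designed congestion control rule looks like, here is a toy sketch of additive-increase/multiplicative-decrease (AIMD), the classic textbook scheme behind older TCP variants. It is not Compound TCP, Cubic TCP or Remy, and the function name, the shared-capacity model and the parameter values are all simplifying assumptions.

```python
from typing import List, Sequence


def simulate_aimd(
    windows: Sequence[float],
    capacity: float,
    rounds: int,
    increase: float = 1.0,
    decrease: float = 0.5,
) -> List[float]:
    """Run a toy AIMD loop for several senders sharing one link.

    Each round, if the total of all windows exceeds the link capacity,
    every sender treats it as a congestion signal and multiplies its
    window by `decrease`; otherwise every sender adds `increase`.
    """
    current = list(windows)
    for _ in range(rounds):
        if sum(current) > capacity:
            current = [w * decrease for w in current]
        else:
            current = [w + increase for w in current]
    return current


# Two senders that start very unequal end up with nearly equal windows.
final = simulate_aimd([1.0, 80.0], capacity=100.0, rounds=500)
```

Each multiplicative decrease halves the gap between the two senders while the additive steps leave it unchanged, which is why the greedy sender's head start disappears over time.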

The MIT team used machine learning to search for the best policy for a decentralized partially observable Markov decision process. Essentially, the policy amounts to each agent either sending or abstaining from sending based on a set of observables, and the policy attempts to maximize throughput and minimize delays over the whole network, i.e. for all agents.
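The shape of the problem can be sketched in a few lines: a policy maps what one sender can observe to a send-or-abstain decision, and a network-wide score rewards throughput and penalizes delay across all senders. Everything named here is hypothetical: the `Observation` fields, the threshold rule and the log-based score with weight `delta` are illustrative assumptions, not the rules or objective that Remy actually produced.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Observation:
    """What a single sender can see locally (hypothetical fields)."""

    ack_interval: float  # smoothed time between acknowledgments, seconds
    rtt_ratio: float     # latest round-trip time divided by the minimum seen


# A policy decides, from local observations only: True = send, False = abstain.
Policy = Callable[[Observation], bool]


def threshold_policy(max_rtt_ratio: float) -> Policy:
    """Hand-written stand-in for a learned rule: back off when queues build."""

    def decide(obs: Observation) -> bool:
        return obs.rtt_ratio <= max_rtt_ratio

    return decide


def network_score(flows: Sequence[Tuple[float, float]], delta: float = 1.0) -> float:
    """Score the whole network from (throughput, delay) pairs, one per sender.

    Uses an assumed log utility: more throughput raises the score, more
    delay lowers it, and starving any one sender is penalized heavily.
    Both values must be positive.
    """
    return sum(math.log(tput) - delta * math.log(delay) for tput, delay in flows)
```

Because the score sums a logarithm per sender, an even split of the link scores higher than one sender taking almost everything, so a search over policies is pushed toward rules that share the bandwidth.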

The result is Remy - a computer-generated congestion control algorithm - and it seems to work. If you compare it to the existing algorithms, it is clear that it does much better - indeed, it occupies an area of the performance graph distinct from the others.




The real problem with this approach is a familiar one: many AI approaches suffer from a lack of "understandability" or "explainability". A neural network might be able to recognize a cat, but we don't really understand, in an engineering sense, how it does it. Expert systems had a similar problem in that they would come to a conclusion, or a diagnosis, without any reassuring explanation as to why the conclusion had been reached.

As the research team comments:

"Although the RemyCCs appear to work well on networks whose parameters fall within or near the limits of what they were prepared for — even beating in-network schemes at their own game and even when the design range spans an order of magnitude variation in network parameters — we do not yet understand clearly why they work, other than the observation that they seem to optimize their intended objective well.

We have attempted to make algorithms ourselves that surpass the generated RemyCCs, without success. That suggests to us that Remy may have accomplished something substantive. But digging through the dozens of rules in a RemyCC and figuring out their purpose and function is a challenging job in reverse-engineering. RemyCCs designed for broader classes of networks will likely be even more complex, compounding the problem."

So are network engineers willing to trust an algorithm that seems to work but has no explanation as to why it works, other than that it optimizes a specific objective function? As AI becomes increasingly successful, the question could also be asked in a wider context.



I Programmer News
Copyright © 2017 All Rights Reserved.