Minification and obfuscation are two useful techniques for making code smaller and providing some protection. Now a machine learning technique promises to undo both and you can try it out.
Minification is great unless you need to read the code that a web page is using. You can download and look at the code that is running in the browser, but it won't be in a human-readable form. The whole point of minification is that it removes whitespace and reduces variable names down to single letters - the result is just as machine readable, but to a human it is a mess.
The first step in deobfuscating or deminifying is easy - you simply restore some formatting: line breaks, whitespace and indents. The big problem is restoring the variable names. You may have called the variable totalCost, but after transformation it ends up as A1 or something similar.
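The formatting step is simple enough to sketch in a few lines. What follows is a minimal illustration in Python, not how any real beautifier works. The function name reformat and the sample input are invented for this example, and it assumes the code has no braces or semicolons inside strings, comments, regular expressions or for loop headers. A real tool would parse the code properly.

```python
def reformat(minified: str, indent: str = "  ") -> str:
    """Naively restore line breaks and indentation to brace-delimited code.

    Assumption: no braces or semicolons appear inside strings, comments,
    regex literals or for(;;) headers. This is a sketch, not a parser.
    """
    lines = []
    current = ""
    depth = 0

    def flush():
        # Emit whatever has accumulated as one line at the current depth.
        nonlocal current
        if current.strip():
            lines.append(indent * depth + current.strip())
        current = ""

    for ch in minified:
        if ch == "{":
            current += " {"
            flush()
            depth += 1
        elif ch == "}":
            flush()
            depth = max(depth - 1, 0)
            lines.append(indent * depth + "}")
        elif ch == ";":
            current += ";"
            flush()
        else:
            current += ch
    flush()
    return "\n".join(lines)


# Hypothetical minified input, made up for illustration.
minified = "function f(a,b){var c=a*b;if(c>100){c=c*0.9;}return c;}"
print(reformat(minified))
# function f(a,b) {
#   var c=a*b;
#   if(c>100) {
#     c=c*0.9;
#   }
#   return c;
# }
```

The layout comes back, but a, b and c are no more informative than they were before.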
How can you possibly restore the meaningful name?
It is clear that humans can do it. We read through the program, see how the variable is used and where it gets its value from, and eventually guess that it should be called something like totalCost. Could this be done by a machine?
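To get a feel for how a machine might make the same kind of guess, here is a toy sketch in Python. It is not the method behind any real tool. It simply counts which names humans gave to variables that were used in similar ways and lets those counts vote. The training data, the feature strings and the function names are all invented for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (how a variable was used, the name a human chose).
# Every feature string and name here is made up for illustration.
TRAINING = [
    (("assigned_from:*", "compared_with:number", "returned"), "totalCost"),
    (("assigned_from:*", "returned"), "totalCost"),
    (("assigned_from:*", "returned"), "product"),
    (("index_into:array", "incremented"), "i"),
    (("index_into:array", "incremented", "compared_with:length"), "i"),
    (("assigned_from:getElementById",), "element"),
]


def train(examples):
    """Count how often each name co-occurs with each usage feature."""
    counts = defaultdict(Counter)
    for features, name in examples:
        for feature in features:
            counts[feature][name] += 1
    return counts


def suggest_name(counts, features, fallback="unknown"):
    """Add up the votes from every observed feature and return the top name."""
    votes = Counter()
    for feature in features:
        votes.update(counts.get(feature, {}))
    if not votes:
        return fallback
    return votes.most_common(1)[0][0]


counts = train(TRAINING)

# A minified variable that is assigned a product, compared with a number
# and then returned.
print(suggest_name(counts, ("assigned_from:*", "compared_with:number", "returned")))
# totalCost

# A variable used in a way the toy model has never seen.
print(suggest_name(counts, ("passed_to:setTimeout",)))
# unknown
```

A serious system would need a far larger corpus and would have to consider how the variables in a program relate to one another, but the basic idea of learning names from usage is the same.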
The authors claim a 60% success rate for suggested identifiers, which goes a long way toward helping you work out what the code is doing and come up with good names for the remaining 40%.
It is also suggested that you could use it to improve your own code by running the analysis on the human-readable form and seeing what variable names it suggests. Personally I'd be a little upset to find that a statistical method could name my variables better than I could, but I'd try to swallow my pride.
Does this make obfuscation useless?
The simple answer is no, because there is a difference between handing the world your work to look at and making the world work for it. There is also the simple point that obfuscation is an arms race. Once you know what JSNice can undo, it should be easy enough to hide the code patterns it uses to assign variable names. You could even change the code's signature so that totalCost is renamed to something completely misleading.