|Codex - English To Code|
|Written by Mike James|
|Wednesday, 11 August 2021|
OpenAI has a new release of its GPT-3-based code generation tool, Codex. Basically, you talk to the program and it creates the code you would have written if you hadn't spent the time talking to it. It is also the basis for the better-known GitHub Copilot, though it isn't clear whether the update will propagate to Copilot. So are all our futures in jeopardy?
I've written about AI and code generation before and previously I've been very upbeat about our prospects as human developers up against automated competition. AI just didn't seem to be hacking it and, judging by the route it was taking, it was going to take a lot longer to get there than most programmers' horizons stretch to.
Now I'm not so sure.
What has changed? It isn't the raw performance of the models, although they are surprisingly good; it is the way the overall approach seems to be bringing more success than you could reasonably expect. You could say that we have moved on from "the unreasonable effectiveness of deep learning" to "the unreasonable effectiveness of GPT-3-type language models".
The basic idea is very simple. Take a neural network that is particularly well suited to learning sequence-based relationships and make it very, very big. Then feed it language and get it to predict parts of the text that you have deleted. Keep doing this until it is very good at filling in the missing bits. When you have finished training it on a lot of examples, you have GPT-3. If you train it on programming text, you have Codex.
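To make the fill-in-the-blank objective concrete, here is a toy sketch of the same idea at a vastly smaller scale. Real models use transformer networks trained over billions of tokens; here a simple table of bigram counts stands in for the learned model, and all names and the tiny corpus are illustrative, not anything from OpenAI's training setup.

```python
# Toy illustration of the "predict the deleted token" training idea.
# A counter of (left, right) -> middle-token frequencies stands in
# for the giant neural network; the principle is the same.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat slept on the rug ."
).split()

# "Training": record which token appeared between each context pair.
context_counts = defaultdict(Counter)
for left, mid, right in zip(corpus, corpus[1:], corpus[2:]):
    context_counts[(left, right)][mid] += 1

def fill_blank(left, right):
    """Predict the deleted token between two context words."""
    candidates = context_counts.get((left, right))
    return candidates.most_common(1)[0][0] if candidates else None

print(fill_blank("on", "mat"))  # prints "the"
```

Scale the corpus up to most of the public internet, swap the counter for a transformer with billions of parameters, and the same objective yields GPT-3; train on code and you get Codex.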
It all sounds too simple to be any use. How can learning to fill in the missing bits of language help with anything related to AI?
This viewpoint comes from not really appreciating language in all its wonders. Most of the successful deep learning models have worked with the raw world or at least the world only slightly processed and they attempt to build models of the regularities. Human language, however, is already a model of the real world.
We still don't know the role that language plays in general AI. We tend to down-play it. When you try to solve a problem you "think about it" and "visualize the solution" and so on. But you also reason, and this is a linguistic process. The fact that you can describe your solution in language demonstrates that language has the power to host the solution, and it might even play a bigger part than introspection suggests. After all, how good is a solution that you cannot render into language?
Now think about this idea applied to programming. Yes, I know that you think in "deep" terms about how the algorithm works and the overall design of your program, but as with general problems, the end product is expressed in language. Is it so surprising that the way we use programming language provides a model for what we do? Notice that this isn't a neural network which extracts the regularities of a given programming language, but one that extracts a model of how we use programming language, and this is much deeper and more powerful than it first appears.
How powerful? Take a look at the video:
The announcement adds:
Proficient in more than a dozen programming languages, Codex can now interpret simple commands in natural language and execute them on the user’s behalf—making it possible to build a natural language interface to existing applications. We are now inviting businesses and developers to build on top of OpenAI Codex through our API.
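As a rough sketch of what "build on top of OpenAI Codex through our API" might look like in practice, the request below only assembles the JSON body of a completion call; actually sending it requires access and an API key, and the exact field names and model identifiers are assumptions based on how the API was described at launch, so treat this as illustrative rather than authoritative.

```python
# Hypothetical shape of a natural-language-to-code completion request.
# Field names ("prompt", "max_tokens", "temperature", "stop") follow
# the completions-style API; they may have changed since this was written.
import json

def codex_request(prompt, max_tokens=64, temperature=0.0):
    """Assemble the JSON body for a code-completion request."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,  # 0 = deterministic, sensible for code
        "stop": ["\n\n"],            # stop generating at a blank line
    }

body = codex_request('"""Return the nth Fibonacci number."""\ndef fib(n):')
print(json.dumps(body, indent=2))
```

The interesting point is that the "program" here is just an English docstring plus a function signature; the model is expected to supply the body.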
You can apply to help try it out at the OpenAI website.
I am now of the opinion that not only is AI on the verge of an unexpected breakthrough in automating programming, but this seems to be a new route to the general AI that we have been seeking for so long.