Does AI Copy Code - Lawsuit Says No
Wednesday, 10 July 2024

Are we worried about AI code assistants? Well, some of us were worried and offended enough to take GitHub, Microsoft and OpenAI to court over code copying by GitHub Copilot. But the judge came down on the side of the AI. Why?

The problem is that when you get an AI assistant to help you there is always a chance you will recognize the code. Did it just steal that code from you or from another developer?

To stop you from seeing code that is identical to code in the training data, GitHub introduced a duplication detection filter that would eliminate any code suggestions that were identical to public code on GitHub. Neural networks do sometimes memorise things in their training data, but this is generally considered to be an example of "overfitting" and something to be avoided. However, given the size of the networks in use some overfitting is to be expected.
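To make the idea of a duplication filter concrete, here is a minimal sketch in Python of an exact-match filter. It is an illustration only and not a description of how GitHub's filter works: the function names, the whitespace normalization and the use of SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib
import re


def normalize(code: str) -> str:
    """Collapse runs of whitespace so formatting differences don't hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def fingerprint(code: str) -> str:
    """Hash the normalized code so the index stores digests, not source text."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def build_index(public_snippets):
    """Hypothetical index of fingerprints for known public code."""
    return {fingerprint(snippet) for snippet in public_snippets}


def filter_suggestions(suggestions, index):
    """Keep only suggestions whose fingerprint is not in the public index."""
    return [s for s in suggestions if fingerprint(s) not in index]


# Made-up data: one "public" snippet and two candidate suggestions.
public_code = ["def add(a, b):\n    return a + b"]
index = build_index(public_code)

suggestions = [
    "def add(a, b):\n        return a + b",  # same code, different indentation
    "def add(x, y):\n    return x + y",      # different identifiers
]
kept = filter_suggestions(suggestions, index)
```

In this sketch the first suggestion is dropped because it differs from the public snippet only in indentation, while the second survives because renaming the parameters changes the fingerprint. That is the limit of any exact-match approach: it catches verbatim copies, not code that is merely similar.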

Such is the concern over the legality of code copying by GitHub Copilot that a group of people started court proceedings. We may never know the identity of the individuals concerned as they are identified as "Doe" plaintiffs, which is a legal mechanism to keep their names confidential in court documents. The defendants in the case were GitHub Inc together with Microsoft Corporation and OpenAI Inc.

The AI that is the subject of this action was indeed trained on publicly available code on GitHub, most of which is not only free to view but open source under one or other of the many licenses. You could argue that this is fair use, but you could also argue that it's copyright theft. The argument now is that the existence of GitHub's duplicate filter means that the user can turn it off and hence receive copyrighted material in the responses.

The claim was dismissed with prejudice, meaning it cannot be resubmitted, because in the judge's opinion most of Copilot's suggestions aren't close enough to the original code in the training data. The basis of this conclusion seems to be the failure of the plaintiffs to provide examples of copying, together with the idea that the existence of a duplicate filter is not proof that duplicates occur. To quote from the Court Order:

they “do not explain how the tool makes it plausible that Copilot will in fact do so through its normal operation or how any such verbatim outputs are likely to be anything beyond short and common boilerplate functions.”

Now this mention of boilerplate functions is food for thought.

Isn't most code in some sense "boilerplate"?

In general, we are not talking about code that implements some clever new algorithm that you have just invented. The bulk of difficult code written today is simply an attempt to understand the documentation of other systems of code. What generally happens is that you sit down with the intent to implement something and you have to sort out what the documentation is telling you and put together some function calls or object usage to get it working.

This usually isn't easy because the documentation is very poor, many of us are very bad at reading and understanding the principles behind what we have read, and we have little intuition about how it all might work. As a result we do battle with the documentation in an effort to produce something that works. We have to get the function calls just right and this seems to be rocket science. But once it is done we have something repeatable. We have won against the documentation and got it working - yay! But what we really have is some boilerplate code that can be reused to do the same job without doing battle a second time. This is not copyrightable material as it has no originality of expression and no claim to being unique in any way.

Bearing in mind the outcome of Oracle v Google, and in my personal opinion, I don't think code can be subject to copyright and, as a result, I don't think we should sue AI agents that do what we all do - take boilerplate code for granted.

What might be worth providing some sort of intellectual property protection for is something akin to "look and feel" - the UI or even the API might be copyrightable in some extended sense, but not code that makes use of them, even if it was hard won and hence seems to have an invested value.

This isn't the end of the story. The case grumbles on to decide if Copilot has broken any licence conditions - this is a much tougher decision. If the precedent of Oracle v Google with regard to Android is anything to go by, this case might take years to settle.


More Information

GitHub Copilot claims dismissed

Related Articles

Do You Have To Attribute Stack Overflow Code?

Oracle Files Response To Google and API Copyright - We Are All Doomed

The Software Industry Rallies Behind Google To Save Programming

Supreme Court Asks For Government Help In Oracle v Google

EEF Calls For Supreme Court To Decide If APIs Copyrightable

Computer Scientists Petition Supreme Court Over API Copyright

Are APIs Copyrightable? Computer Scientists Urge Court To Say No 

Supreme Court Refuses To Reconsider API Copyright Decision

Appeals Court Rules In Favor of Oracle

Android's Uncertain Future

White House Advises That APIs ARE Copyrightable

Supreme Court Seeks Guidance On API Copyright Issue

Android Copyright Battle Goes To Supreme Court

Oracle v Google - Are Computer Languages Copyrightable?
