Copying From Stack Overflow - It's Not Just A Joke
Written by Sue Gee   
Wednesday, 21 April 2021

On April 1st Stack Overflow played a prank that had some of its community really worried - it was going to make us pay for copying code. But as with all good jokes there was a payoff: Stack Overflow was able to quantify not only how much copying goes on but also what gets copied.


When we reported on The Key - Stack Overflow's custom-built keyboard to simplify the Copy and Paste operation - we were aware of its April Fool's status. However, that wasn't the end of the story. To play the joke, a system was set up to react every time someone issued a copy command, and this enabled some very interesting data collection that Stack Overflow has now reported on its blog.

One out of every four users who visits a Stack Overflow question copies something within five minutes of hitting the page. That adds up to 40,623,987 copies across 7,305,042 posts and comments between March 26th and April 9th. People copy from answers about ten times as often as they do from questions and about 35 times as often as they do from comments. People copy from code blocks more than ten times as often as they do from the surrounding text, and surprisingly, we see more copies being made on questions without accepted answers than we do on questions which are accepted.

Yes, we knew copying from Stack Overflow was rife, but this rate of copying seems unbelievable. However, looking further into the data to put it in context does make it more reasonable.

For a start, the vast majority of copies are made by users with a reputation score of zero. Since creating an account on Stack Overflow automatically gives you a reputation of 1, these are casual, anonymous visitors, although some may be account holders who have not logged in. Excluding this group, the number of copies drops to less than half a million, concentrated among users with a reputation score of 5 or less:


Stack Overflow also looked into whether accepted answers, i.e. answers that the person who originally asked the question found helpful, were copied more frequently than answers that hadn't been accepted. In what seems like a surprising result, more copies were made of answers that were not accepted:


However, on average, accepted answers get seven copies per unique post while non-accepted answers get five, leading Stack Overflow to comment that there is higher knowledge reuse, i.e. taking advantage of what other developers have created and proved, from accepted answers.
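The two findings are not contradictory: a figure normalized per unique post can rank two groups differently from the raw totals. The following minimal Python sketch uses made-up copy events (the post IDs and counts are hypothetical, not Stack Overflow's data) to show how non-accepted answers can collect more copies overall while accepted answers still get more copies per post.

```python
from collections import Counter, defaultdict

def copies_per_unique_post(copy_events):
    """Average copies per distinct post, split by accepted status.

    copy_events holds one (post_id, is_accepted) tuple per copy made.
    """
    copies = Counter()          # total copies in each group
    posts = defaultdict(set)    # distinct post IDs in each group
    for post_id, is_accepted in copy_events:
        copies[is_accepted] += 1
        posts[is_accepted].add(post_id)
    return {
        ("accepted" if group else "not accepted"): copies[group] / len(ids)
        for group, ids in posts.items()
    }

# Hypothetical data: one accepted answer copied 7 times,
# two non-accepted answers copied 5 times each.
events = [(101, True)] * 7 + [(202, False)] * 5 + [(303, False)] * 5

totals = Counter(is_accepted for _, is_accepted in events)
print(totals[True], totals[False])     # raw totals: 7 10
print(copies_per_unique_post(events))  # {'accepted': 7.0, 'not accepted': 5.0}
```

Non-accepted answers win on raw copies (10 vs 7) simply because there are more of them, while accepted answers win on copies per post (7 vs 5).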

It isn't just answers that are copied; questions and comments are copied too. Noting that the majority of copies from questions are made by users with 1-5 reputation points, David Gibson, who reported on the data on the Stack Overflow blog, writes:

I suspect that is because users are copying the question to reproduce it and eventually post an answer.

Yes, that makes sense. So does the finding that copies per post rise as a post's score increases.



As Gibson notes:

This makes sense because as a post increases in score it is more likely that the knowledge is being reused by our community.

The blue dots on the left of the chart are posts with a negative score, and here Gibson's exploration of the data explains what would otherwise be a mystery. He examines an answer with a score of -2 that was copied 288 times and finds that it is a longer version of an accepted answer with a score of 29 and a total of 493 copies, showing it to be:

the perfect example of a “too long didn’t read” post.

The blog post goes into popular tags, noting the preponderance of Python-related tags, and finally reveals the post that received the most copies. Again, to quote from the blog:

With a post score of 3,497 and 11,829 copies, I am happy to announce that How to iterate over rows in a DataFrame in Pandas received the most copies. Answered in 2013, this question continues to help thousands of people each week.

As for the copy-and-paste keyboard designed as an April Fools' joke, there was so much interest in it that Stack Overflow is going ahead with having it manufactured.


More Information

How often do people actually copy and paste from Stack Overflow? Now we know.

Related Articles

What Programming Has Come To - Copy & Paste

Most Used Stack Overflow Snippet Has A Bug

CROKAGE AI Gets Stack Overflow Answers For You

How To Ask A Successful Question on Stack Overflow 

Stack Overflow Considered Harmful?

Stack Overflow: A Code Laundering Platform?

Do You Have To Attribute Stack Overflow Code?

Weak typing - the lost art of the keyboard

Type Properly Or Suffer The Consequences

Last Updated ( Wednesday, 21 April 2021 )