Exposing The Most Frequent Mistakes In Programming
Written by Nikos Vaggalis   
Monday, 09 May 2016

The Blackbox project is a massive data-collection initiative by the University of Kent. It sifts through millions of source code compilations to identify the mistakes student programmers make most frequently.


Why is that useful?

Understanding how students learn to program through their common misconceptions and their recurring mistakes is important for many reasons:

  • Produce educational material focused on these issues

  • Make educators more efficient

  • Build IDEs or programming tools that protect against those errors

  • Improve the readability and helpfulness of the error messages emitted by compilers

  • Language design - improve the future syntax and design of a language by taking into consideration the syntax barriers students typically encounter


What data is collected?
The data collection is based on BlueJ, a free Java IDE designed specifically to provide usability enhancements for beginners. Blackbox acts as an add-in to BlueJ that collects anonymous information on how the IDE is used, the source code of the project the student is working on, as well as the errors resulting from its compilation(s).

Although the source code and the compilation errors are the starting point of the research, the researchers also went on to calculate how long it took students to fix each mistake, by looking forward in time for the next compilation in which the mistake was no longer present. This derived metric serves as an indication of how long it typically takes students to learn from their mistakes.
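The time-to-fix idea can be sketched in a few lines. This is only an illustration, not the researchers' code: it assumes a hypothetical, simplified record of compilations (a timestamp in seconds plus the set of mistake labels found in that compilation) and, for each run of consecutive compilations containing a mistake, looks forward to the first later compilation where that mistake is gone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class TimeToFix {
    // Hypothetical, simplified record of one compilation: when it happened
    // (seconds since the session began) and which mistake labels (A..R) it contained.
    record Compilation(long timeSeconds, Set<Character> mistakes) {}

    // For each run of consecutive compilations containing the mistake, look forward
    // to the first later compilation where it is no longer present and record the gap.
    // Runs that are never fixed within the history are skipped.
    static List<Long> fixTimes(List<Compilation> history, char mistake) {
        List<Long> times = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            boolean present = history.get(i).mistakes().contains(mistake);
            boolean startsRun = i == 0 || !history.get(i - 1).mistakes().contains(mistake);
            if (!present || !startsRun) continue;
            for (int j = i + 1; j < history.size(); j++) {
                if (!history.get(j).mistakes().contains(mistake)) {
                    times.add(history.get(j).timeSeconds() - history.get(i).timeSeconds());
                    break;
                }
            }
        }
        return times;
    }

    public static void main(String[] args) {
        List<Compilation> history = List.of(
            new Compilation(0, Set.of('C')),
            new Compilation(40, Set.of('C', 'B')),
            new Compilation(95, Set.of('B')),
            new Compilation(300, Set.of()));
        System.out.println(fixTimes(history, 'C')); // [95]
        System.out.println(fixTimes(history, 'B')); // [260]
    }
}
```

Measuring from the start of each run, rather than from every compilation that still contains the mistake, avoids counting one persistent mistake several times; whether the study made the same choice is not stated here.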

The errors were then classified according to a scheme taken from a previous study, also based on the Blackbox data, called "Investigating Novice Programming Mistakes: Educator Beliefs vs Student Data". That study in turn used a classification established in an even older study, "Identifying and correcting Java programming errors for introductory computer science students", which surveyed educators about the mistakes their students made most frequently.

They came up with 18 errors grouped into three broad categories: syntax, semantic and type. The errors were labelled A through R and were informally categorized as follows:

Syntax errors:

A: Confusing the assignment operator (=) with the comparison operator (==). For example: if (a = b) ...
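Mistake A is caught by the Java compiler when the operands are numbers, because an assignment such as a = b then has type int and cannot serve as a condition. With boolean operands it compiles and silently changes the program's meaning. The class and method names below are made up for the illustration:

```java
class AssignVsCompare {
    // Mistake A with boolean operands: "a = b" assigns b to a, and the
    // condition then tests the new value of a, i.e. simply b.
    static boolean buggyEquals(boolean a, boolean b) {
        if (a = b) {
            return true;
        }
        return false;
    }

    // The intended comparison.
    static boolean correctEquals(boolean a, boolean b) {
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(buggyEquals(false, false));   // false, although the values are equal
        System.out.println(correctEquals(false, false)); // true
        // With int operands, if (a = b) does not compile: an int is not a boolean.
    }
}
```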

C: Unbalanced parentheses, curly or square brackets and quotation marks, or using these different symbols interchangeably. For example: while (a == 0]
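Detecting mistake C is a classic use of a stack: push each opening symbol and, on each closing symbol, check that it matches the most recent unclosed opener. The sketch below illustrates the idea an editor or IDE could use; it is not how BlueJ or javac actually work, and it deliberately simplifies by recognising only double-quoted strings (no character literals or comments).

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BracketChecker {
    // Returns the index of the first mismatched or unclosed symbol, or -1 if balanced.
    // Text inside double-quoted strings is skipped; backslash escapes are honoured.
    static int firstProblem(String code) {
        Deque<Integer> open = new ArrayDeque<>(); // positions of unmatched openers
        boolean inString = false;
        int stringStart = -1;
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (inString) {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == '"') inString = false;
                continue;
            }
            switch (c) {
                case '"' -> { inString = true; stringStart = i; }
                case '(', '[', '{' -> open.push(i);
                case ')', ']', '}' -> {
                    if (open.isEmpty() || !matches(code.charAt(open.peek()), c)) return i;
                    open.pop();
                }
                default -> { }
            }
        }
        if (inString) return stringStart;           // unterminated string literal
        return open.isEmpty() ? -1 : open.peek();   // innermost unclosed opener
    }

    static boolean matches(char opener, char closer) {
        return (opener == '(' && closer == ')')
            || (opener == '[' && closer == ']')
            || (opener == '{' && closer == '}');
    }

    public static void main(String[] args) {
        System.out.println(firstProblem("while (a == 0]"));               // 13
        System.out.println(firstProblem("while (a == 0) { f(\")\"); }")); // -1
    }
}
```

Reporting a position rather than a yes/no answer is what lets a tool point the student at the offending symbol.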

D: Confusing "short-circuit" evaluators (&& and ||) with conventional logical operators (& and |). For example: if ((a == 0) & (b == 0)) ...
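For boolean operands, & and | compute the same truth value as && and ||, which is why mistake D compiles. The difference is that & and | always evaluate both operands, while && and || skip the right-hand operand once the result is known. The counter below is a hypothetical stand-in for any side-effecting or expensive condition:

```java
class ShortCircuit {
    private static int evaluations = 0;

    // Hypothetical right-hand condition that records each time it is evaluated.
    private static boolean rightSide() {
        evaluations++;
        return true;
    }

    // Left operand is false, so && never evaluates rightSide().
    static int evaluationsWithShortCircuit() {
        evaluations = 0;
        int a = 1;
        boolean result = (a == 0) && rightSide();
        return evaluations;
    }

    // Same truth value, but & evaluates rightSide() anyway.
    static int evaluationsWithLogicalAnd() {
        evaluations = 0;
        int a = 1;
        boolean result = (a == 0) & rightSide();
        return evaluations;
    }

    // Safe only because && stops before s.length() when s is null;
    // with & instead, nonEmpty(null) would throw a NullPointerException.
    static boolean nonEmpty(String s) {
        return s != null && s.length() > 0;
    }

    public static void main(String[] args) {
        System.out.println(evaluationsWithShortCircuit()); // 0
        System.out.println(evaluationsWithLogicalAnd());   // 1
        System.out.println(nonEmpty(null));                // false
    }
}
```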

E: Incorrect semicolon after an if selection structure, before the statement it is meant to control, or after a for or while repetition structure, before the loop body. For example:
if (a == b);
    return 6;
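Mistake E usually compiles, because the stray semicolon is a legal empty statement that becomes the entire body of the if or loop. The hypothetical methods below show the effect on both an if statement and a for loop:

```java
class StraySemicolon {
    // Intended: return 6 only when a == b. The semicolon is an empty statement
    // that forms the entire if body, so "return 6;" always runs.
    static int buggy(int a, int b) {
        if (a == b);
        return 6;
    }

    static int fixed(int a, int b) {
        if (a == b) {
            return 6;
        }
        return 0;
    }

    // The same mistake on a loop: the loop body is the empty statement,
    // and the block that follows runs exactly once.
    static int buggyLoopCount() {
        int count = 0;
        for (int i = 0; i < 3; i++); {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(buggy(1, 2));      // 6
        System.out.println(fixed(1, 2));      // 0
        System.out.println(buggyLoopCount()); // 1
    }
}
```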

F: Wrong separators in for loops (using commas instead of semicolons). For example: for (int i = 0, i < 6, i++) ...
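Part of what makes F confusing is that commas are legal inside a for header, just not between its three parts. The hypothetical method below uses both kinds of separator correctly:

```java
class ForHeader {
    // Semicolons separate the three parts of the header. Commas are allowed
    // inside a part: to declare several variables of one type in the
    // initialisation, or to list several expressions in the update.
    static int sumOfPairs() {
        int total = 0;
        for (int i = 0, j = 5; i < j; i++, j--) {
            total += i + j; // pairs (0,5), (1,4), (2,3)
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfPairs()); // 15
    }
}
```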

G: Inserting the condition of an if statement within curly brackets instead of parentheses. For example: if {a == b} ...

H: Using keywords as method or variable names. For example: int new;

J: Forgetting parentheses after a method call. For example: myObject.toString;

K: Incorrect semicolon at the end of a method header. For example:
public void foo();

L: Getting greater-than-or-equal/less-than-or-equal wrong, i.e. using => or =< instead of >= and <=. For example: if (a =< b) ...

P: Including the types of parameters when invoking a method. For example: myObject.foo(int x, String s);

Type errors:

I: Invoking methods with wrong arguments (e.g. wrong types). For example: list.get("abc")


Q: Incompatible types between a method's return type and the type of the variable its value is assigned to. For example: int x = myObject.toString();

Other semantic errors:

B: Use of == instead of .equals to compare strings. For example: if (a == "start") ...
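Mistake B is hard to spot because == sometimes appears to work: identical string literals refer to the same interned object, so comparing a literal with a literal succeeds. A string created at run time, for example with new String or by reading input, is a different object, and only equals compares the characters. The method names below are illustrative:

```java
class StringCompare {
    // Mistake B: == compares object references, not characters.
    static boolean isStartBuggy(String s) {
        return s == "start";
    }

    // Compares contents; calling equals on the literal also copes with s == null.
    static boolean isStart(String s) {
        return "start".equals(s);
    }

    public static void main(String[] args) {
        String literal = "start";
        String built = new String("start"); // same characters, different object

        System.out.println(isStartBuggy(literal)); // true: identical literals share one object
        System.out.println(isStartBuggy(built));   // false
        System.out.println(isStart(built));        // true
        System.out.println(isStart(null));         // false
    }
}
```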


M: Trying to invoke a non-static method as if it were static. For example: MyClass.toString();

N: A method that has a non-void return type is called and its return value ignored/discarded. For example: myObject.toString();
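Discarding a return value is not always wrong, but it is a real bug when the method returns its result instead of modifying the object it is called on. Java strings are immutable, so a call such as toUpperCase() whose result is thrown away has no effect. The methods below are hypothetical:

```java
class DiscardedResult {
    // Mistake N: toUpperCase() returns a new String and leaves name untouched,
    // so discarding its result means name is returned unchanged.
    static String shoutBuggy(String name) {
        name.toUpperCase();
        return name;
    }

    // Use the returned value instead.
    static String shout(String name) {
        return name.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(shoutBuggy("bluej")); // bluej
        System.out.println(shout("bluej"));      // BLUEJ
    }
}
```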

O: Control flow can reach the end of a non-void method without returning. For example:

public int foo(int x)
{
    if (x < 0)
        return 0;
    x += 1;
}
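Java requires every path through a non-void method to end in a return statement (or a throw), so javac rejects the example for mistake O with a "missing return statement" error. The value the x >= 0 path should produce is not specified, so returning the incremented x below is only one plausible fix, assumed for illustration:

```java
class ReturnOnAllPaths {
    // Every path through a non-void method must end in return (or throw).
    // Adding the final return makes the example compile.
    static int foo(int x) {
        if (x < 0) {
            return 0;
        }
        x += 1;
        return x; // assumed intent: return the incremented value
    }

    public static void main(String[] args) {
        System.out.println(foo(-3)); // 0
        System.out.println(foo(4));  // 5
    }
}
```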

R: A class claims to implement an interface but does not implement all the required methods. For example: class Y implements ActionListener { }
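Implementing an interface means providing every abstract method it declares; otherwise the class must itself be declared abstract. The sketch below uses java.lang.Comparable instead of ActionListener simply to stay free of AWT; the Version class is hypothetical:

```java
class Version implements Comparable<Version> {
    private final int major;
    private final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Required by Comparable<Version>. If this method is removed, javac rejects
    // the class because it is not abstract and does not override compareTo.
    @Override
    public int compareTo(Version other) {
        if (major != other.major) {
            return Integer.compare(major, other.major);
        }
        return Integer.compare(minor, other.minor);
    }

    public static void main(String[] args) {
        System.out.println(new Version(1, 2).compareTo(new Version(1, 10)) < 0); // true
        System.out.println(new Version(2, 0).compareTo(new Version(1, 9)) > 0);  // true
    }
}
```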


In "Investigating Novice Programming Mistakes: Educator Beliefs vs Student Data" the researchers used these 18 errors to contrast the mistakes educators believed to be the most frequent with the actual frequencies revealed by the compilation data.
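One standard way to quantify how closely an educator's ranking of the 18 mistakes agrees with the ranking observed in the data is Spearman's rank correlation. The sketch below uses the textbook formula for rankings without ties, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)); it is offered as an illustration and is not claimed to be the exact statistic or procedure used in the study. The rankings in main are made up for demonstration and are not taken from the paper.

```java
class RankAgreement {
    // Spearman's rho for two rankings of the same n items, assuming no ties.
    // rankA[k] and rankB[k] are the positions (1..n) given to item k.
    static double spearman(int[] rankA, int[] rankB) {
        if (rankA.length != rankB.length || rankA.length < 2) {
            throw new IllegalArgumentException("need two rankings of equal length >= 2");
        }
        int n = rankA.length;
        long sumSquaredDiff = 0;
        for (int k = 0; k < n; k++) {
            long d = rankA[k] - rankB[k];
            sumSquaredDiff += d * d;
        }
        return 1.0 - (6.0 * sumSquaredDiff) / ((double) n * ((long) n * n - 1));
    }

    public static void main(String[] args) {
        int[] dataRanks = {1, 2, 3, 4, 5};     // illustrative only
        int[] educatorRanks = {2, 1, 3, 5, 4}; // illustrative only
        System.out.println(spearman(dataRanks, dataRanks));     // 1.0
        System.out.println(spearman(dataRanks, educatorRanks)); // 0.8
    }
}
```

A value of 1 means identical rankings, 0 means no monotonic relationship and -1 means one ranking is the reverse of the other.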

The study reached some surprising conclusions:

"Our first finding was that educators have only a weak consensus about these frequencies....

Our further result that educators are not very accurate compared to a large data set of students suggests that educators are also not accurate about the frequencies of these mistakes, so any claims that “students always make mistake X” are unlikely to be accurate.....

Our most surprising result was that an educator’s level of experience (as measured by years as an educator, years teaching introductory programming in any language, or years teaching introductory programming in Java) had no effect on how closely the educator’s frequency rankings agreed with those from the Blackbox data...."

Last Updated ( Monday, 09 May 2016 )