R 3.4.0, codenamed “You Stupid Darkness”, is now available. It is a major release of the R language, which is used for data science, big data analysis, predictive modeling and visualization. It brings a long list of new features and bug fixes, and their overall effect is improved performance.
The open source programming language R was created for statistical analysis and graphing. It lets you organize data, define complex calculations, and display the results visually, and it is popular with statisticians who need to analyze data. At the end of 2014 we reported R Heads For Top Ten Languages as ranked by the TIOBE index. It is currently at #15, which reflects a significant level of interest.
The major performance improvements in R 3.4.0 are:
JIT ("just-in-time") byte-code compiler enabled by default. This means that functions will be compiled on first or second use, and top-level loops will be compiled and then run. For now, the compiler will not compile code containing explicit calls to browser(): this is to support single stepping from the browser() call.
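As a rough illustration of byte-code compilation, the sketch below uses the compiler package that ships with R to compile a function by hand and check that it returns the same value as the original. The function name sum_to is made up for this example; with the JIT enabled, R is expected to perform this compilation automatically after the first or second call.

```r
library(compiler)

# Hypothetical example function: sums the integers 1..n with an explicit loop.
sum_to <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i
  }
  total
}

# Compile the function explicitly; the JIT does the equivalent when enabled.
sum_to_compiled <- cmpfun(sum_to)

# A negative argument reports the current JIT level without changing it.
jit_level <- enableJIT(-1)

result_plain <- sum_to(100)
result_compiled <- sum_to_compiled(100)
```

Both calls should return the same value. Only the execution path differs, which is why enabling the JIT by default is intended to be transparent to existing code.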
Linear algebra performance improvements. R uses a BLAS library for high-performance implementations of many linear algebra routines, such as matrix multiplication. It now uses faster routines for matrix-vector multiplications, and each call is also slightly faster because less time is spent checking whether the data include missing values (which BLAS generally doesn't handle).
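To make the missing-value check concrete, here is a minimal sketch of a matrix-vector product in which one matrix entry is missing. The variable names A, x and y are chosen for illustration only. R has to notice the NA before handing the work to BLAS so that the missing value propagates correctly.

```r
# A 2 x 3 matrix (filled column by column) with one missing value in row 1.
A <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)
x <- c(1, 1, 1)

# Matrix-vector product; the result is a 2 x 1 matrix.
y <- A %*% x

# Row 1 contains the NA, so its result is missing.
# Row 2 is complete: 2 + 4 + 6 = 12.
row1_missing <- is.na(y[1, 1])
row2_value <- y[2, 1]
```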
Improvements for packages with compiled code. Many packages include code written in C, C++ or even Fortran that is then called from R functions. R 3.4.0 includes a new system that allows package developers to choose to expose compiled functions to other packages or to keep them private.
Accumulating vectors in a loop is faster. Assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred. The implementation is subject to change and not intended to be used in packages at this time.
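The two patterns being compared can be sketched as follows. The names grown and prealloc are illustrative, and the loop body is deliberately trivial. The first loop grows the vector one element at a time, which is the case that benefits from over-allocation; the second allocates the full length up front, which remains the preferred approach.

```r
n <- 10000L

# Pattern 1: grow the vector by assigning one past its current length.
grown <- numeric(0)
for (i in seq_len(n)) {
  grown[i] <- i^2
}

# Pattern 2: pre-allocate the full length, then fill in place.
prealloc <- numeric(n)
for (i in seq_len(n)) {
  prealloc[i] <- i^2
}

# Both approaches produce the same values; only the allocation cost differs.
same_result <- identical(grown, prealloc)
```

Wrapping each loop in system.time() is one way to compare the two on a given R installation.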
Performance improvements to other functions. Sorting vectors of numbers is faster thanks to the use of the radix-sort algorithm by default. Tables with missing values are computed more quickly. Long strings no longer cause slowness in the str function. The sapply function is faster when applied to arrays with dimension names.
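As a small sanity check on the sorting change, the sketch below sorts the same numeric vector with the default method, with radix sort requested explicitly, and with the older shell sort. It assumes a version of R whose sort() accepts method = "radix"; the variable names are illustrative.

```r
set.seed(42)
x <- runif(1000)

# Default method, chosen by R.
sorted_default <- sort(x)

# Explicitly request radix sort and, for comparison, shell sort.
sorted_radix <- sort(x, method = "radix")
sorted_shell <- sort(x, method = "shell")

# The algorithms differ in speed, not in the ordering they produce.
same_order <- identical(sorted_radix, sorted_shell)
```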
Improvements not related to performance include:
An updated version of the Tcl/Tk graphics system in R for Windows.
More consistent handling of missing values when constructing tables.
Accuracy improvements for extreme values in some statistical functions.
Better detection and warning of likely programmer errors, like comparing a vector with a zero-length array.
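For readers unfamiliar with how tables treat missing values, here is a minimal sketch using the long-standing useNA argument of table(). It does not show what changed in 3.4.0, only the kind of behavior the item on table construction refers to. The vector answers is made-up data for illustration.

```r
# Made-up survey answers with one missing value.
answers <- c("yes", NA, "yes", "no")

# By default, missing values are left out of the counts.
counts_default <- table(answers)

# With useNA = "ifany", missing values get their own category.
counts_with_na <- table(answers, useNA = "ifany")

total_default <- sum(counts_default)
total_with_na <- sum(counts_with_na)
```

The first total counts only the three observed answers, while the second also includes the missing one.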