R is a language targeted at statistics, but it has an interesting way of working with data. In this introduction to R we take a programmer's point of view and celebrate the fact that R is based on Lisp.
R is a language targeting statistics. It is open source, hence it is free, and it seems to be taking over the world... well, the statistical world at least.
With the rise of "big data" as a serious application, stats and statistical languages are also becoming more important. If you have used SPSS, SAS, Mathematica or Maple to do stats then it is worth looking at R as an alternative, if only because it is free. If you are considering becoming involved in "big data" then starting out with R is a good idea.
First - how to characterise R?
The most important thing to realise is that R isn't a particularly groundbreaking language, but it does have some facilities that you might consider strange if you know a classical object-oriented language like C# or Java. It is targeted at a fairly narrow set of tasks, and as such its real power comes from having a range of functions that do statistical analysis. Many users can get all the results they need without ever worrying about any programming aspects of R. So to the non-programmer R looks very simple, but as a programmer you are going to want to do more with it, so you really do need to get under the skin of R.
The best way to describe the language is as being related to Lisp but with some additions to make it easier to use and to make sophisticated data easier to work with. Its basic data type is the List, and this is used to implement all of the more sophisticated data structures needed for statistics and to make them easier to use. It is a typed language, and you could even say that it has too many definitions of "type", but it attempts to be weakly typed in the way that it treats its base data and its data structures.
Overall the language can be written in a procedural style with a very strong leaning towards the functional style. It is often said that R is object-oriented, but this is more wishful thinking on the part of its supporters than a reflection of any real facilities provided. In reality it calls everything an object and provides a manual type system that is used to work out which form of a function should be used to process the data. This could be called polymorphism, but you could just as well call it dynamic generics or something similar. Essentially it provides function overloading based on the first parameter. What is surprising is that this simple scheme, coupled with the user-friendly form of the List, does turn out to be quite powerful and easy to use.
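To make the idea of function overloading based on the first parameter concrete, here is a minimal sketch using R's UseMethod mechanism. The function name describe and the class name point are invented for this illustration and are not part of R.

```r
# A hypothetical generic function; the name "describe" is ours, not R's.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no more specific form of the function matches.
describe.default <- function(x, ...) "something else"

# Forms of the function selected by the type of the first argument.
describe.numeric <- function(x, ...) "a number"
describe.character <- function(x, ...) "a string"

describe(3.14)      # uses describe.numeric
describe("hello")   # uses describe.character
describe(TRUE)      # no logical form exists, so describe.default is used

# The "manual" part of the type system: a class is just an attribute
# that you set yourself on an ordinary list.
p <- list(x = 1, y = 2)
class(p) <- "point"
describe.point <- function(x, ...) "a point"
describe(p)         # uses describe.point
```

Notice that nothing here declares a type in the Java or C# sense. The form of the function that gets called depends only on the class attribute of the first argument and on a naming convention.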
It is often said that R is a language that supports a procedural approach with the option of using an object-oriented approach. This seems to overstate the case. R is a language that supports a functional procedural approach with some typing to support simple function overloading. If you try to treat it as an object-oriented language you are going to spend a long time looking for the objects. For example, the entities referred to as objects don't have properties and methods that are specified by compound names. If L is a List (see later) you write length(L) and not L.length() to retrieve its length. As already said, R is mostly functional.
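The length example can be tried directly. In this small sketch the variable name L and its contents are arbitrary choices.

```r
L <- list(1, 2, 3)

# The length is found by passing the object to a function ...
length(L)            # 3

# ... and not by calling a method on the object. In R the dot is just
# another character allowed in a name, so L.length would be looked up
# as a separate function that happens to be called "L.length".
exists("L.length")   # FALSE unless you have defined such a function yourself
```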
So, to sum up: the best way to describe the language is as being related to Lisp but with some additions to make it easier to use and to make sophisticated data easier to work with.
Getting and installing R is simple. Just go to the website, download the appropriate binary and run the installation. If you are working under Windows then it is best to stick with the 32-bit version until you reach a point where you need the bigger data handling power of the 64-bit version.
The installation includes the RGui, which provides an easy-to-use R Console that you can type instructions into directly.
Once you have it installed you can run the R Console and type in commands. Notice that R uses a persistent environment. That is, any data structures or objects you create remain accessible, and you have the option to save the user environment, complete with data objects, when you quit a session and to reload it when you start the next one.
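The persistent environment can also be driven by hand. The following sketch assumes it is typed at the top level of the console. The variable name scores and the file name session.RData are arbitrary choices made for this example.

```r
scores <- list(10, 20, 30)   # create an object in the workspace
ls()                         # list the names of the objects defined so far

# Save the whole workspace to a file in a temporary directory.
ws <- file.path(tempdir(), "session.RData")
save.image(ws)

rm(scores)                           # remove the object ...
exists("scores", inherits = FALSE)   # ... so it is no longer defined
load(ws)                             # reload the saved workspace
length(scores)                       # the object is back
```

Quitting the console with q() offers to do the same save for you automatically.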
All of the examples given in the rest of this article can be typed into the command console and tried out at once.
While R isn't really a functional language, the fact that it supports some sophisticated data types and provides functions which perform complicated operations on them does give it the flavour of a functional language. To see this in action we need to first look at the fundamental data structure, the list.
The R list is just a list of objects, and an object is basically anything R can operate on. The list is the most sophisticated data structure in R because it can contain elements of any type, including other lists. That is, a list is potentially recursive. You can guess that this is more power than the average R user needs, so in addition to the all-powerful list R provides data structures such as the vector, which are restrictions on the basic list.
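A short sketch of the difference between the recursive list and the more restricted vector may help. The variable names nested and v are arbitrary, and c() is the standard function for building a vector.

```r
# A list can hold elements of any type, including another list.
nested <- list(1, "two", list(3, 4))
length(nested)       # 3: the inner list counts as a single element
nested[[3]][[1]]     # 3: the first element of the inner list

# A vector holds elements of one type only, so mixed values
# are converted to a common type, here character strings.
v <- c(1, "two", 3)
class(v)             # "character"

is.list(nested)      # TRUE
is.list(v)           # FALSE
```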
For programmers it is important to meet the R list as soon as possible.
For example, to define a list of numbers you would call the list function and assign its result to a variable using the <- operator. The <- operator assigns the result of the list function, and the list function combines the values it is given into a single list. You can also write the same expression using the assignment function, for which <- is a shorthand.
All operators in R can be regarded as shorthands for function calls.
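As a sketch of the forms of assignment just described, the following lines all create the same list. The variable names and the values 1, 2, 3 are illustrative only, and we assume that "the assignment function" refers to R's assign function.

```r
L <- list(1, 2, 3)            # the usual operator form
assign("M", list(1, 2, 3))    # assign() takes the variable name as a string
`<-`(N, list(1, 2, 3))        # the operator itself, called as a function

identical(L, M)               # TRUE
identical(L, N)               # TRUE

# Other operators can be called as functions in the same way
# by quoting their names with backticks.
`+`(1, 2)                     # 3, the same as 1 + 2
`[[`(L, 2)                    # 2, the same as L[[2]]
```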
Notice that you don't have to declare variables and there are no line end markers. You can also use the -> operator the other way round, with the value on the left and the variable on the right, which can be worrying if you are used to assignment always being to the left.
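A minimal sketch of right-pointing assignment, again with an arbitrary variable name and values:

```r
list(1, 2, 3) -> L    # value on the left, variable on the right
M <- list(1, 2, 3)    # the more familiar direction

identical(L, M)       # TRUE: both forms store the same list
```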
Although the List isn't the most used of the R data structures it is worth finding out a little about how to use it.
The first thing to point out is that a list really can store elements of different primitive and structural types. For example, a single list can hold a mixture of strings and numbers.
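A sketch of such a mixed list follows. The variable name mixed and the values in it are made up for the example.

```r
mixed <- list("alpha", 42, "beta", 3.14)

# Each element keeps its own type.
class(mixed[[1]])      # "character"
class(mixed[[2]])      # "numeric"

# Apply class() to every element to see the type of each one.
sapply(mixed, class)

# str() prints a compact summary of the structure of the list.
str(mixed)
```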