What exactly is it that data scientists do? Why does everyone want one? And how do you become one? These are some of the key questions circulating as the term "data scientist" spreads.
Data science is heralded as the hottest profession of today, and increased demand for data scientists is expected to continue over the next 5 years as a third of UK organizations implement big data analytics programs.
Described as part scientist, part artist, this new breed of data analyst earns an average salary of $95,000-$118,000 (according to the salary comparison sites Glassdoor and Payscale) and is actively sought by businesses across the globe.
What does a data scientist do?
Similar to a business/data analyst, a data scientist combines knowledge of computer science and applications, modelling, statistics, analytics and math to uncover insights in data. Evolving beyond the business/data analyst, the data scientist takes those insights and combines them with strong business acumen and effective communication to change the way an organisation approaches challenges.
The average day of a data scientist involves extracting data from multiple sources, running it through an analytics platform and then creating visualizations of the data. They will then spend hours cleansing and analysing the data from multiple angles, looking for trends that highlight problems or opportunities. Any insight is communicated to business and IT leaders with recommendations to adapt existing business strategies.
As an example, they might uncover a section of consumers who behave differently. After further analysis they find that this subsection of consumers shares a similar trait. They can then recommend ideas to modify these consumers’ behaviour.
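The consumer example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the field names (age_group, channel, monthly_orders) and the threshold of 5 orders are all made up, and real analyses would use far more data and more careful statistics.

```python
# Hypothetical customer records; every name and value here is invented for illustration.
customers = [
    {"id": 1, "age_group": "18-24", "channel": "mobile", "monthly_orders": 9},
    {"id": 2, "age_group": "35-44", "channel": "desktop", "monthly_orders": 2},
    {"id": 3, "age_group": "25-34", "channel": "mobile", "monthly_orders": 8},
    {"id": 4, "age_group": "45-54", "channel": "desktop", "monthly_orders": 1},
    {"id": 5, "age_group": "18-24", "channel": "mobile", "monthly_orders": 10},
    {"id": 6, "age_group": "25-34", "channel": "desktop", "monthly_orders": 3},
    {"id": 7, "age_group": "35-44", "channel": "mobile", "monthly_orders": 2},
]


def split_by_behaviour(records, threshold):
    """Separate records whose monthly_orders exceed the threshold from the rest."""
    unusual = [r for r in records if r["monthly_orders"] > threshold]
    typical = [r for r in records if r["monthly_orders"] <= threshold]
    return unusual, typical


def shared_traits(segment, trait_keys):
    """Return {trait: value} for every trait on which all records in the segment agree."""
    shared = {}
    for key in trait_keys:
        values = {r[key] for r in segment}
        if len(values) == 1:
            shared[key] = values.pop()
    return shared


# Step 1: find the consumers who behave differently (assumed threshold: 5 orders a month).
unusual, typical = split_by_behaviour(customers, threshold=5)

# Step 2: look for a trait the unusual consumers have in common.
traits = shared_traits(unusual, ["age_group", "channel"])
print([r["id"] for r in unusual], traits)
```

In this toy data the frequent buyers all order through the mobile channel, which is the kind of shared trait that could feed a recommendation to the business.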
Why are data scientists in such high demand?
Modern-day businesses track everything, from website visits and customer transactions right through to individual consumer reviews. We are living in a world of data overload. Hidden within this vast expanse of data are new revenue streams and business efficiencies; the issue is finding them.
This is where the data scientist comes into play. Combining skills in statistics, computer science and analytics, the data scientist will derive meaning where many are simply overwhelmed by the data. Through effective communication of findings, the data scientist allows businesses to realise these hidden revenue streams and business efficiencies.
Although the vast majority of demand for data scientists comes from commercial businesses, sectors within government and medicine are beginning to wake up to the practical application of data science. This will only serve to fuel existing demand.
What are the skills required to become a data scientist?
Everyone has their own opinion on the core skills required of a data scientist. Luckily, Ferris Jumah, a data scientist from LinkedIn, has (unsurprisingly) taken a data-driven approach to uncovering the most popular data science skills.
Simply listing the skills can be confusing without understanding how they relate to one another. Thankfully, Ferris also went to the trouble of creating this graphic highlighting the core skills and how they relate to one another.
Data Science Skills Network - Image Credit Dataconomy
Looking at this list of skills can be intimidating, but don’t panic: most data scientists combine one or two skills under each core approach to the role. Let’s take a look at these in a bit more detail. A data scientist will typically:
Look at data with a mathematical mind-set
Skills such as machine learning, data mining, data analysis and statistics are all included in the top 10. This reflects the data scientist’s need to interpret and represent data mathematically. If p-value, k-nearest neighbours and multivariable calculus are alien terms, maybe it’s time you got your head in a book.
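To make one of those terms less alien, here is a minimal sketch of k-nearest neighbours classification in plain Python. The training points, the labels and the choice of k=3 are assumptions made up for illustration; in practice a library implementation and real data would be used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Order the training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (visits per week, basket size) -> customer label.
train = [
    ((1.0, 1.0), "low spender"),
    ((1.5, 2.0), "low spender"),
    ((2.0, 1.5), "low spender"),
    ((6.0, 6.5), "high spender"),
    ((7.0, 6.0), "high spender"),
    ((6.5, 7.5), "high spender"),
]

print(knn_predict(train, (1.8, 1.2)))
print(knn_predict(train, (6.2, 6.8)))
```

The idea is simply that a new point is given the label most common among the points closest to it, so the first query lands with the low spenders and the second with the high spenders.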
If you’re planning on developing these skills, I’d recommend the following resources:
Pattern Discovery in Data Mining – freely available on Coursera and created by the University of Illinois, this course covers the basic concepts behind data mining and pattern discovery
Statistical Aspects of Data Mining – this Google Tech Talks series on YouTube covers core aspects around exploring and visualizing data, association analysis, classification, and clustering
Use a common language to access, explore and model data
You’re going to need to know the tools of the trade. Knowledge of a statistical programming language, like R, Python or MATLAB, and a database querying language like SQL will be crucial. Data extraction, exploration and hypothesis testing are core to the role of a data scientist.
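As a rough sketch of how these tools fit together, the example below uses SQL (through Python's built-in sqlite3 module) to extract data and then runs a simple permutation test to get a p-value. Everything specific is an assumption for illustration: the orders table, the two page variants "A" and "B", the spend figures, and the choice of a permutation test rather than any other hypothesis test.

```python
import random
import sqlite3

# Hypothetical data: order values for two versions of a checkout page.
rows = [
    ("A", 10), ("A", 12), ("A", 11), ("A", 13), ("A", 12), ("A", 11),
    ("B", 15), ("B", 17), ("B", 16), ("B", 18), ("B", 16), ("B", 17),
]

# Load the data into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (variant TEXT, spend REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)


def fetch_spend(connection, variant):
    """Extract the spend values for one variant using a SQL query."""
    cursor = connection.execute(
        "SELECT spend FROM orders WHERE variant = ?", (variant,)
    )
    return [row[0] for row in cursor]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the variant made no difference,
        # any split of the pooled values is equally plausible.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


spend_a = fetch_spend(conn, "A")
spend_b = fetch_spend(conn, "B")
p_value = permutation_p_value(spend_a, spend_b)
print(mean(spend_a), mean(spend_b), p_value)
```

A small p-value here suggests the difference in average spend between the two variants is unlikely to be down to chance alone, which is the kind of evidence a data scientist would bring to a recommendation.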
Thankfully there is an extensive and growing range of resources as well as professional training and certification around this group of skills:
Python – this is an estimated 13-hour course from Codecademy that teaches you how to program in Python. It’s aimed at beginners and there are almost 2.5 million students working through this free course right now.
Develop strong computer science and software engineering backgrounds
Developing at least one of the following skills will be crucial: Java, C++, algorithms or Hadoop. These skills are required primarily to leverage data to architect systems.
You might want to check out these learning avenues:
Become a Cloudera Certified Developer for Apache Hadoop – this course will teach you to navigate the Hadoop ecosystem, utilise the API and learn a series of best practices for Hadoop development amongst other areas. The certification will also ratify your experience in this area.
Intro to Algorithms – this free course from Udacity will teach you the core concepts required to devise new algorithms for graphs and other important data structures. You will then learn to evaluate the efficiency of these algorithms.
If you prefer books, check out these I-Programmer reviews of two titles that merited 5 stars, i.e. Highly Recommended: