What is a Data Scientist and How Do I Become One?
Written by Edward Jones   
Thursday, 26 February 2015

What exactly is it that data scientists do? Why does everyone want one? And how do you become one? These are some of the key questions circulating as the term "data scientist" spreads.

Data science has been heralded as the hottest profession of today, and demand for data scientists is expected to keep growing over the next five years as a third of UK organisations implement big data analytics programmes.

Described as part scientist, part artist, this new breed of data analyst earns an average salary of $95,000-$118,000 (according to the salary comparison sites Glassdoor and Payscale) and is actively sought by businesses across the globe.

What does a data scientist do?

Like a business or data analyst, a data scientist combines knowledge of computer science and applications, modelling, statistics, analytics and math to uncover insights in data. Going beyond the business or data analyst role, the data scientist takes those insights and combines them with strong business acumen and effective communication to change the way an organisation approaches challenges.

The average day of a data scientist involves extracting data from multiple sources, running it through an analytics platform and then creating visualizations of the data. They will then spend hours cleansing and analysing the data from multiple angles, looking for trends that highlight problems or opportunities. Any insights are then communicated to business and IT leaders, along with recommendations for adapting existing business strategies.
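
To make that workflow a little more concrete, here is a minimal Python sketch of the extract, cleanse and analyse steps using only the standard library. The two in-memory "sources", their field names and the cleansing rules (drop rows with a missing region or a non-positive amount) are hypothetical stand-ins; in practice the data would come from databases, files or APIs, and an analytics platform would handle much of this work.

```python
from collections import defaultdict

# Hypothetical source 1: customer records (e.g. exported from a CRM system).
customers = [
    {"id": 1, "region": "North"},
    {"id": 2, "region": "South"},
    {"id": 3, "region": None},        # missing value, removed during cleansing
    {"id": 4, "region": "North"},
]

# Hypothetical source 2: transactions (e.g. pulled from an online shop database).
transactions = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 60.0},
    {"customer_id": 2, "amount": 25.0},
    {"customer_id": 3, "amount": 90.0},
    {"customer_id": 4, "amount": -5.0},   # invalid amount, removed during cleansing
    {"customer_id": 4, "amount": 30.0},
]


def extract_and_join(customers, transactions):
    """Combine the two sources by attaching each customer's region to their transactions."""
    region_by_id = {c["id"]: c["region"] for c in customers}
    return [
        {"region": region_by_id.get(t["customer_id"]), "amount": t["amount"]}
        for t in transactions
    ]


def cleanse(rows):
    """Drop rows with a missing region or a non-positive amount."""
    return [r for r in rows if r["region"] is not None and r["amount"] > 0]


def average_spend_by_region(rows):
    """One 'angle' on the data: mean transaction value per region."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["amount"]
        counts[r["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}


result = average_spend_by_region(cleanse(extract_and_join(customers, transactions)))
print(result)
```

Cleansing before aggregating matters here: without it, the row with no region and the negative amount would distort the per-region averages.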

As an example, they might identify a segment of consumers who behave differently from the rest. Further analysis might reveal that this segment shares a common trait, and the data scientist can then recommend ways to influence those consumers’ behaviour.
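
As a rough illustration of how such a segment might be spotted, the sketch below flags customers whose weekly visit count sits more than one standard deviation from the mean (a simple z-score rule chosen for this example) and then compares how common one trait is inside and outside that group. The customer records, the `weekly_visits` and `mobile_app` fields and the threshold of 1.0 are all made up for illustration; a real analysis would also check whether the difference is statistically meaningful.

```python
from statistics import mean, pstdev

# Hypothetical customer data: visits per week and whether they use a mobile app.
customers = [
    {"id": 1, "weekly_visits": 2, "mobile_app": False},
    {"id": 2, "weekly_visits": 3, "mobile_app": False},
    {"id": 3, "weekly_visits": 2, "mobile_app": True},
    {"id": 4, "weekly_visits": 9, "mobile_app": True},
    {"id": 5, "weekly_visits": 10, "mobile_app": True},
    {"id": 6, "weekly_visits": 3, "mobile_app": False},
    {"id": 7, "weekly_visits": 1, "mobile_app": False},
    {"id": 8, "weekly_visits": 2, "mobile_app": False},
]


def unusual_segment(customers, threshold=1.0):
    """Return customers whose weekly visits lie more than `threshold`
    population standard deviations from the mean (a z-score rule)."""
    visits = [c["weekly_visits"] for c in customers]
    mu, sigma = mean(visits), pstdev(visits)
    return [c for c in customers if abs(c["weekly_visits"] - mu) > threshold * sigma]


def trait_share(group, trait):
    """Fraction of customers in `group` for whom the boolean `trait` is True."""
    return sum(1 for c in group if c[trait]) / len(group)


segment = unusual_segment(customers)
others = [c for c in customers if c not in segment]
print("segment ids:", [c["id"] for c in segment])
print("app users in segment:", trait_share(segment, "mobile_app"))
print("app users elsewhere:", trait_share(others, "mobile_app"))
```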

Why are data scientists in such high demand?

Modern-day businesses track everything, from website visits and customer transactions right the way through to individual consumer reviews. We are living in a world of data overload. Hidden within this vast expanse of data are new revenue streams and business efficiencies; the challenge is finding them.

This is where the data scientist comes into play. Combining skills in statistics, computer science and analytics, the data scientist derives meaning where many are simply overwhelmed by the data. By communicating their findings effectively, the data scientist allows businesses to realise these hidden revenue streams and business efficiencies.

Although the vast majority of demand for data scientists comes from commercial businesses, sectors within government and medicine are beginning to wake up to the practical application of data science. This will only serve to fuel existing demand.

What are the skills required to become a data scientist?

Everyone has their own opinion on the core skills required of a data scientist. Luckily, Ferris Jumah, a data scientist at LinkedIn, has (unsurprisingly) taken a data-driven approach to uncovering the most popular data science skills.

Simply listing the skills can be confusing without understanding how they relate to one another. Thankfully, Ferris also went to the trouble of creating a graphic that shows the core skills and the connections between them.

Figure: Data Science Skills Network (image credit: Dataconomy)

This list of skills can look intimidating, but don’t panic: most data scientists combine one or two skills from each of the core areas of the role. Let’s take a look at these in a bit more detail. A data scientist will typically:

Look at data with a mathematical mind-set

Skills such as machine learning, data mining, data analysis and statistics all appear in the top 10. This reflects the data scientist’s need to interpret and represent data mathematically. If p-value, k-nearest neighbours and multivariable calculus are alien terms, maybe it’s time you got your head in a book.
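
For readers meeting k-nearest neighbours for the first time, the idea fits in a few lines: to classify a new point, find the k training points closest to it and take a majority vote of their labels. This is a minimal sketch in plain Python, assuming Euclidean distance (via `math.dist`, available from Python 3.8) and a tiny made-up dataset; libraries such as scikit-learn offer tuned implementations (`KNeighborsClassifier`).

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of (features, label) pairs, where features are
    tuples of numbers; distance is Euclidean.
    """
    neighbours = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up training data: two well-separated groups of 2-D points.
training = [
    ((1.0, 2.0), "fail"),
    ((2.0, 1.0), "fail"),
    ((1.5, 1.5), "fail"),
    ((6.0, 7.0), "pass"),
    ((7.0, 6.0), "pass"),
    ((6.5, 6.5), "pass"),
]

print(knn_predict(training, (2.0, 2.0)))
print(knn_predict(training, (6.0, 6.0)))
```

With two well-separated groups each query simply takes the label of the group it sits in; on real data, the choice of k and the scaling of features matter far more.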

If you’re planning on developing these skills, I’d recommend the following resources:

Use a common language to access, explore and model data

You’re going to need to know the tools of the trade. Knowledge of a statistical programming language, like R, Python or MATLAB, and a database querying language like SQL will be crucial. Data extraction, exploration and hypothesis testing are core to the role of a data scientist.
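
The sketch below shows one way those three activities can fit together in Python: SQL (through the standard `sqlite3` module and an in-memory database) extracts and summarises the data, and a permutation test estimates a p-value for the difference between two groups. The `orders` table, its `variant` and `amount` columns and all the values are invented for this example, and the permutation test is only one of several ways to test such a hypothesis (a t-test is a common alternative).

```python
import random
import sqlite3

# Hypothetical data: order values under two versions ("A" and "B") of a web page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (variant TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (variant, amount) VALUES (?, ?)",
    [("A", 20.0), ("A", 22.0), ("A", 19.0), ("A", 21.0), ("A", 18.0),
     ("B", 25.0), ("B", 27.0), ("B", 24.0), ("B", 26.0), ("B", 28.0)],
)

# Extraction and exploration: row count and average order value per variant.
summary = conn.execute(
    "SELECT variant, COUNT(*), AVG(amount) FROM orders "
    "GROUP BY variant ORDER BY variant"
).fetchall()
print(summary)


def amounts(variant):
    """Fetch the order amounts for one variant as a Python list."""
    rows = conn.execute("SELECT amount FROM orders WHERE variant = ?", (variant,))
    return [amount for (amount,) in rows]


def average(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the pooled values and count how often a random split gives a
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(average(a) - average(b))
    pooled = a + b                 # a new list, so a and b are left untouched
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(average(pooled[:len(a)]) - average(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)


p = permutation_p_value(amounts("A"), amounts("B"))
print(f"p-value: {p:.4f}")
```

The `(extreme + 1) / (n_resamples + 1)` form avoids ever reporting a p-value of exactly zero from a finite number of shuffles.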

Thankfully, there is an extensive and growing range of resources, as well as professional training and certification, covering this group of skills: 

  • Microsoft MTA Database Fundamentals – this new entry-level course teaches you how to create, manipulate and administer a database, and provides an introduction to T-SQL.

  • R Cookbook – another book reviewed and recommended by I Programmer, the R Cookbook is loaded with more than 200 practical recipes for getting the most out of R.

  • R: Coursera’s Computing for Data Analysis – this 4-week course teaches you to program in R and to use it for reading data, writing functions and applying modern statistical methods. It forms part of Coursera’s Data Science Specialisation.

  • Introduction to MATLAB – another free nugget, this time from the MIT OpenCourseWare platform.

  • Python – an estimated 13-hour course from Codecademy that teaches you how to program in Python. It’s aimed at beginners, and almost 2.5 million students are working through this free course right now.

  • Introduction to Computation and Programming Using Python – this textbook has an emphasis on statistics and algorithm construction. It received 5 stars in our review, which recommends it for those wanting a thorough grounding in Computer Science. It is also the recommended text for the edX MOOC 6.00.1x, the similarly titled "Introduction to Computer Science and Programming Using Python", which is co-taught by its author John Guttag. 

Develop a strong computer science and software engineering background

Developing at least one skill from this group, such as Java, C++, algorithms or Hadoop, will be crucial. These skills are required primarily for architecting systems that leverage data.
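
Hadoop’s original processing engine is based on the MapReduce programming model. The following single-process Python sketch mimics its three stages (map, shuffle and reduce) on the classic word-count problem. It uses no Hadoop APIs and runs on one machine, so treat it only as an illustration of the model that Hadoop distributes across a cluster; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


documents = [
    "big data needs big systems",
    "data scientists use data",
]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

Because each mapper call sees only one record and each reducer call sees only one key, both stages can run in parallel on different machines, which is what makes the model scale.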

You might want to check out these learning avenues: 

  • Become a Cloudera Certified Developer for Apache Hadoop – this course will teach you to navigate the Hadoop ecosystem, use its API and apply a series of best practices for Hadoop development, among other areas. The certification will also validate your experience in this area. 

  • Intro to Algorithms - this free course from Udacity will teach you the core concepts required to devise new algorithms for graphs and other important data structures. You will then learn to evaluate the efficiency of these algorithms.

  • If you prefer books, check out I Programmer’s reviews of these two titles, both of which merited 5 stars (Highly Recommended):

      • Algorithms in a Nutshell

      • Introduction to Algorithms
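
As a small taste of the graph algorithms these resources cover, here is breadth-first search (BFS) in Python, which finds a path with the fewest edges between two nodes of an unweighted graph. The skills graph is a hypothetical toy example loosely inspired by the skills network discussed earlier; it is not Ferris Jumah’s actual data.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges from start
    to goal in an unweighted graph, or None if goal is unreachable.

    `graph` maps each node to a list of neighbouring nodes.
    """
    previous = {start: None}          # predecessor links; doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical skills graph: edges link skills that often appear together.
skills = {
    "statistics": ["machine learning", "R"],
    "machine learning": ["statistics", "Python"],
    "R": ["statistics"],
    "Python": ["machine learning", "Hadoop"],
    "Hadoop": ["Python", "Java"],
    "Java": ["Hadoop"],
}

print(shortest_path(skills, "R", "Java"))
```

Because BFS explores nodes in order of their distance from the start, the first time it reaches the goal it has found a shortest path; on weighted graphs you would need something like Dijkstra’s algorithm instead.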

So there you have it. Hopefully you’ve found the answers you were looking for; if not, feel free to ask further questions in the comments below.

  • Edward Jones works for Firebrand Training, a provider of accelerated IT training. He actively works to serve the IT community with news, reviews and technical how-to guides. 

Last Updated ( Wednesday, 30 November 2016 )