What Skills Do Data Scientists Need
Written by Sue Gee   
Tuesday, 06 August 2019

Data scientists are currently in huge demand, and data scientist is a top-trending job with attractive salaries. But what knowledge and skills are employers looking for?

It's a few years since we asked "What is a Data Scientist and How Do I Become One?" The answer we gave back in 2015 is still valid as a starting point:

Similar to a business/data analyst, data scientists combine knowledge of computer science and applications, modelling, statistics, analytics and math to uncover insights in data.

But what does this mean in terms of the skillset a data scientist should acquire? The question "How to Become More Marketable as a Data Scientist" has been tackled by the research team at CV Compiler, a company that helps developers and others in the software industry create convincing resumes. To analyze the skills required of data scientists, the CV Compiler team looked at 300 data science vacancies from StackOverflow, AngelList and similar websites. Then, using their own text analytics tool, they identified the most frequently mentioned terms and created this chart:



Note that the research reflects the preferences of employers rather than those of data scientists.

I would have expected to see "Machine Learning" near the top of the list, because job descriptions reveal that Machine Learning Engineers work in Data Science teams and that Data Science Interns can expect to "gain valuable AI/ML skills". Perhaps the two terms are so intertwined that knowledge of Machine Learning is assumed.

While R is frequently referred to as "the language of data science", Python outnumbering it in job vacancies makes sense, given that Python is a general-purpose language and is currently trending in popularity. I'm surprised to see Scala quite so high, and by the complete absence of Julia, both from the table and from the blog report's write-up, which discusses other skills and tools that gained a substantial number of mentions. For example, while Big Data is in the table with 221 mentions, the term Data Mining, used for "collecting big data", isn't in the table, even though the report notes that it had 128 mentions in job vacancies.

While SQL comes high in the list, and ETL (Extract, Transform, Load) is in the table, there's no mention anywhere of MongoDB or NoSQL. On the other hand, mentions of the open source Apache Spark outnumber those of Hadoop. Commenting on this, Andrew Stetsenko writes:

According to the 2018 Big Data Analytics Market Study, Big Data adoption in enterprises soared from 17% in 2015 to 59% in 2018. Thus the popularity of Big Data tools also grew. [In addition to Spark and Hadoop] the most popular ones are MapReduce (36), and Redshift (29)... some employers still expect candidates to be familiar with Apache Pig (30), HBase (32), and similar technologies. HDFS (20) is still being mentioned in vacancies as well.

As with CV Compiler's earlier report on the skills needed by JavaScript developers, the figures in brackets are the number of mentions.
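Counts like these are simple to reproduce in principle. The report doesn't say how CV Compiler's text analytics tool works, so the following Python snippet is only a minimal sketch under stated assumptions: each vacancy counts at most once per term, and terms are matched case-insensitively as whole words or phrases. The function name `count_mentions` and the sample vacancy texts are hypothetical, made up purely for illustration.

```python
import re
from collections import Counter


def count_mentions(vacancies, terms):
    """Count how many vacancy texts mention each term at least once.

    Hypothetical scheme (not CV Compiler's actual tool): a term is counted
    once per vacancy, matched case-insensitively as a whole word or phrase.
    """
    counts = Counter()
    for text in vacancies:
        for term in terms:
            # \b word boundaries stop "SQL" from matching inside "NoSQL"
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[term] += 1
    return counts


# Made-up sample vacancies, for illustration only
vacancies = [
    "Senior Data Scientist: Python, SQL and Spark required; Tableau a plus.",
    "Data Scientist with R or Python, experience with Hadoop and Big Data.",
    "ML engineer: Python, TensorFlow, NLP; NoSQL databases welcome.",
]
terms = ["Python", "R", "SQL", "Spark", "Hadoop", "Big Data",
         "Tableau", "TensorFlow", "NLP"]

# Print in the article's "Term (mentions)" style, most frequent first
for term, n in count_mentions(vacancies, terms).most_common():
    print(f"{term} ({n})")
```

Whole-word matching keeps "SQL" from being counted inside "NoSQL", but a real tool would also have to merge synonyms such as "ML" and "Machine Learning", and a single-letter term like "R" will match any standalone letter R, so its count is likely to be noisy.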

Stetsenko also stresses the importance of data visualization, which was mentioned in 55 job vacancies, and notes:

It’s crucial that you could represent the outcomes of your work in a format, understandable to any team member or a customer. As for the data visualization tools, employers prefer Tableau (54).

The fact that Computer Vision and NLP (Natural Language Processing) make it into the table emphasizes that AI and Data Science are inextricably linked, and that knowledge of AI tools such as TensorFlow is well worth acquiring.


More Information

How to Become More Marketable as a Data Scientist


Related Articles

Coursera TensorFlow Specialization Fully Available

Jobs Need More Than JavaScript

Machine Learning Engineer Rated Best Job 2019 

Data Scientist Best Paying Entry-Level Job Says Glassdoor

What is a Data Scientist and How Do I Become One?

Scientists, Data Scientists And Significance 



Last Updated ( Wednesday, 07 August 2019 )