The Data Science Jobs Have Moved: The New Path Forward

Oh, the changes.  When I first started developing courses in data science -- all of three years ago -- it was a pretty straightforward task.  You take Drew Conway's data science diagram and essentially write a curriculum that matches.  


 


In 2018, at a well-respected bootcamp, my team and I developed a curriculum that looked like the following:

1. Coding: Python and SQL

2. Math Foundations: Statistics and Linear Algebra

3. Machine Learning Algorithms (Linear & Logistic Regression, CART, Clustering, NLP)

Within two months of graduating, 9 of 11 students got jobs, and looking at their LinkedIn profiles years later, they're doing quite well.



"Yeah, that's just the bare minimum now."

I heard this on a call with a BCG data scientist a couple of months ago, and I've heard echoes of it on multiple other calls.  At this point, I'm a firm believer.  Here's why.

1. The Data Science Market is Flooded with Applicants

I probably don't need to tell you this, but the entry-level data science market is quite competitive.  Over four hundred applicants for a data science position just a few days after posting?  Browse LinkedIn, and you'll see that a lot.



The next screenshot, from August 2020, shows this is more than anecdotal.  While there are 678 entry-level remote jobs for data science with under 10 applicants, only 70 of them are actually for "data science".  The others are engineering positions.

Let's focus in on that bottom corner.



The scarcity of data science jobs relative to applicants echoes what I've heard from data scientists: one mentioned getting upwards of 20 pings per day from people looking to break into the field.

2. The Skillset Needed is Diverse and Deep

To get an overview of the skills employers ask for, I scraped Indeed postings for Data Science in August 2020.  Here are the results.


If we look at the chart above, we'll see the requested skills are both deep and diverse.  Below, I've organized them into three groups.

A. Coding: Python (.98), OOP (.42), Spark (.32), Tests (.16), Pandas, Numpy, ETL, AWS, Hive (~ .10)
B. Soft Skills: Communication (.42), Visualization (.31), Stakeholder (.18), Metric (.13)
C. Statistics/ML: Statistics (.70), Regression (.22), Deep Learning (.16)
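
For readers who want to run a similar tally themselves, here is a minimal sketch of one way to compute the share of postings that mention each skill. It is not the script behind the numbers above: the keyword patterns, the `skill_shares` helper, and the sample postings are all hypothetical, and it assumes the posting text has already been collected as plain strings.

```python
import re
from collections import Counter

# Hypothetical keyword patterns. The post does not say which terms were
# searched for, so these are illustrative assumptions only.
SKILL_PATTERNS = {
    "python": r"\bpython\b",
    "spark": r"\bspark\b",
    "statistics": r"\bstatistic(s|al)?\b",
    "communication": r"\bcommunicat\w*",
}


def skill_shares(postings, patterns=SKILL_PATTERNS):
    """Return the fraction of postings that mention each skill at least once."""
    counts = Counter()
    for text in postings:
        lowered = text.lower()
        for skill, pattern in patterns.items():
            # Count a posting once per skill, however often the term appears.
            if re.search(pattern, lowered):
                counts[skill] += 1
    total = len(postings)
    return {
        skill: (counts[skill] / total if total else 0.0)
        for skill in patterns
    }


# Made-up postings, purely for demonstration.
sample_postings = [
    "Seeking a data scientist with Python, Spark and strong communication skills.",
    "Must know Python and statistics.",
    "Statistical modeling experience required; excellent communicator.",
    "Python developer with ETL experience.",
]

shares = skill_shares(sample_postings)
for skill, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{skill}: {share:.2f}")
```

Counting each posting once per skill keeps the result comparable to figures like Python (.98): a share of postings, not a count of keyword hits.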

Think about an engineer who met just the coding skills above: Python, OOP, Testing, ETL, and AWS.  Right off the bat, they'd be pretty competitive as a junior engineer.  Add in the soft skills and they'd make a good tech lead.  Now add in knowledge of statistics and our classic algorithms and maybe we meet the data science requirements.

Of course, these are recruiting wishes, and yes, it's hard to find a candidate with all of these skills.  But with the competition being what it is, having a strength in either soft skills or coding seems to be a prerequisite for many of these jobs.

3. My Recommendation: Get Data Engineering Skills First

The coding component of data science alone calls for a substantial (but learnable) set of skills.  And these skills are in such demand that there's a position dedicated to them.  It's called a data engineer.  These are the requirements listed in roughly 900 data engineer postings on Indeed.com.

Data engineering, for the most part, focuses on the engineering skills required by a data scientist.  Take a look at the overlap:

DS Coding: Python (.98), OOP (.42), Spark (.32), Testing (.16), Pandas, Numpy, ETL, AWS, Hive (~ .10)

Data Engineer: Python (.55), OOP (.06), Spark (.12), Testing (.17), ETL (.24), AWS (.28)

So with the exception of NumPy and pandas, there's pretty strong overlap between the two.  And yes, a little digging on LinkedIn shows many data engineers switching over to data science, if that's what they want to do.
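
As a quick sanity check on that overlap, the two lists can be compared directly. This is only a sketch: it uses the figures quoted above, treats every "~ .10" entry as exactly 0.10 (an assumption), and compares skill names only. One caveat: Hive also lands in the data-science-only group, simply because the data engineer list above gives no figure for it.

```python
# Shares quoted in the post. The "~ .10" entries are approximate and are
# entered here as exactly 0.10, which is an assumption.
ds_coding = {
    "python": 0.98, "oop": 0.42, "spark": 0.32, "testing": 0.16,
    "pandas": 0.10, "numpy": 0.10, "etl": 0.10, "aws": 0.10, "hive": 0.10,
}
data_engineer = {
    "python": 0.55, "oop": 0.06, "spark": 0.12,
    "testing": 0.17, "etl": 0.24, "aws": 0.28,
}

# Skills that appear in both lists, and skills listed only for data science.
shared = sorted(ds_coding.keys() & data_engineer.keys())
ds_only = sorted(ds_coding.keys() - data_engineer.keys())

# Fraction of the data science coding skills that data engineering also covers.
overlap_ratio = len(shared) / len(ds_coding)

print("Shared skills:", shared)
print("Data science only:", ds_only)
print(f"Share of DS coding skills covered: {overlap_ratio:.2f}")
```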


And what's nice about data engineering is that it's an in-demand skillset.  Here's the same search for entry-level data engineer positions, with fewer than 10 applicants:

Again let's focus in on the bottom corner.


Yes, many of them are for software engineering, but that's also a good description of someone with that skillset.  Even so, there are still a lot of purely data engineering positions available.  And this is backed up by multiple industry reports.

To me, it comes back to the engineering skills.  This is something I doubled down on when I taught my last set of students (with testing, OOP, and Docker), and I will be adding even more focus on it in the course that starts this October.


If you're looking to start getting these skills, check out some of the free online workshops that I'll be hosting this October.  In the meantime, check out the curriculum at Jigsaw Labs, and feel free to send any questions my way.

