What to Learn as a Data Scientist: The Most In-Demand Technical Skills

Which technical skills are becoming more popular with employers, and which are losing ground.

image

In my original 2018 article, I looked at the demand for general skills such as statistics and communication, as well as for technologies such as Python and R. Software technologies change much faster than the demand for general skills, so this updated analysis covers technologies only.

I searched for keywords that appeared in US job listings for “Data Scientist” on SimplyHired, Indeed, Monster, and LinkedIn. This time I wrote code to scrape the listings instead of searching by hand. That worked well for SimplyHired, Indeed, and Monster, where I used the Python libraries Requests (for HTTP) and Beautiful Soup (for HTML parsing). You can see the scraping and analysis code in my repo on GitHub.
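The scraping step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the repo: the function names, the `search_url` argument, and the `count_selector` CSS selector are all hypothetical, and each site's real URL scheme and markup differ and change over time.

```python
import re


def parse_count(text):
    """Pull the first integer out of a string such as '1,234 jobs'."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


def fetch_listing_count(search_url, count_selector):
    """Download one search-results page and return the listing count it reports.

    search_url and count_selector are placeholders (assumptions): every job
    site uses its own URL scheme and markup, so both must be found per site.
    """
    # Third-party packages, imported here so parse_count works without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(search_url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    element = soup.select_one(count_selector)
    if element is None:
        raise ValueError("count element not found; the selector may be stale")
    return parse_count(element.get_text())
```

Keeping the parsing in its own function makes it possible to check it against saved text without sending any requests to a live site.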

Scraping LinkedIn turned out to be much harder. You have to be logged in to see the exact number of job listings, so I decided to use Selenium to drive a headless browser. In September 2019, a U.S. federal appeals court ruled against LinkedIn in a case over scraping, allowing publicly available data on the site to be scraped. However, I was unable to access my account after several login attempts, perhaps because of rate limiting. Update: I have since been able to log in again, but I am afraid I will be blocked if I try to scrape again.

Incidentally, Microsoft owns LinkedIn, Randstad Holding owns Monster, and Recruit Holdings owns Indeed and SimplyHired.

In any case, the LinkedIn data would not have given an accurate comparison between last year and this year. This summer I noticed huge fluctuations in the number of results for technical job searches on LinkedIn. My guess is that they were experimenting with a search algorithm that uses natural language processing. By contrast, the other sites showed roughly the same number of “Data Scientist” listings in both years.

That is why I excluded LinkedIn from both the 2018 and 2019 results in this article.

For each job search site, I calculated the percentage of the total number of job advertisements for data scientists where the keyword appeared. Then I averaged these percentages across three sites for each keyword.
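The calculation described above can be sketched in a few lines of Python. The function names and all counts below are made up for illustration; they are not the article's data.

```python
def keyword_share(keyword_count, total_count):
    """Percentage of 'data scientist' listings on one site that mention the keyword."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * keyword_count / total_count


def average_share(counts_by_site, totals_by_site):
    """Unweighted mean of the per-site percentages for one keyword."""
    shares = [
        keyword_share(counts_by_site[site], totals_by_site[site])
        for site in totals_by_site
    ]
    return sum(shares) / len(shares)


# Made-up counts, for illustration only (not the article's data).
totals = {"SimplyHired": 2000, "Indeed": 4000, "Monster": 1000}
python_counts = {"SimplyHired": 1500, "Indeed": 2800, "Monster": 800}

print(round(average_share(python_counts, totals), 1))  # (75 + 70 + 80) / 3 = 75.0
```

Averaging the percentages gives each site equal weight regardless of how many listings it has. Pooling the raw counts instead would let the largest site dominate the result.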

I also manually tried new search terms and looked more closely at the most promising ones. None of the new terms reached an average of 5% of ads in 2019. The results of that selection are below.

Go!

Results


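There are at least four ways to view the results for each keyword:

  1. For each job search site and each year, divide the number of ads containing the keyword by the total number of ads returned for “data scientist”. Then take the average across the three sites. This is the process I described earlier.
  2. Take the change in that average percentage from 2018 to 2019, in percentage points.
  3. Take the percentage change in that average from 2018 to 2019.
  4. Rank each keyword against the others within each year, then take the change in rank from 2018 to 2019.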

Let's look at the first three options as bar charts. Then I will show a table of the data, and we will discuss the results.

Here is the chart for the first option, for 2019. We can see that Python appears in almost 75% of ads.

image

Below is the chart for the second option, showing gains and losses in the average percentage of ads between 2018 and 2019. AWS gained about 5 percentage points: on average it appeared in 14.6% of ads in 2018 and in 19.4% in 2019.

image

Here is the chart for the third option, showing the percentage change from year to year. PyTorch grew by 108.1% relative to the average percentage of ads in which it appeared in 2018.
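The difference between the second and third views is easy to mix up, so here is a small sketch using the AWS averages quoted above (14.6% and 19.4%). The function names are my own and are only illustrative.

```python
def point_change(avg_then, avg_now):
    """Change in percentage points: a plain difference of two percentages."""
    return avg_now - avg_then


def percent_change(avg_then, avg_now):
    """Relative change, expressed as a percentage of the earlier value."""
    if avg_then == 0:
        raise ValueError("percent change is undefined when the earlier value is 0")
    return 100.0 * (avg_now - avg_then) / avg_then


aws_2018, aws_2019 = 14.6, 19.4  # AWS averages quoted in the text

print(round(point_change(aws_2018, aws_2019), 1))    # 4.8 percentage points
print(round(percent_change(aws_2018, aws_2019), 1))  # 32.9 percent
```

A keyword that starts from a low average can show a large percent change from a small gain in percentage points, which is worth keeping in mind when reading the third chart.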

image

All charts were made with Plotly. If you want to learn how to use Plotly to create interactive visualizations, check out my guide. The interactive charts are available as an HTML file in my repo on GitHub, along with the analysis and visualization code.

The table below contains the same information as the charts above, sorted by the percentage change in the average percentage of ads from 2018 to 2019.

image

I understand that this is all a bit confusing, so here is a small guide to the information in the table.

  • 2018 Avg is the average percentage of ads as of October 10, 2018, across SimplyHired, Indeed, and Monster.
  • 2019 Avg is the same measure as of December 4, 2019. This is the data shown in the first of the three charts above.
  • Change in Avg is the 2019 Avg column minus the 2018 Avg column. This is shown in the second of the three charts above.
  • % Change is the percentage change from 2018 to 2019. This is shown in the third chart.
  • 2018 Rank is the keyword's rank relative to the other keywords in 2018.
  • 2019 Rank is the keyword's rank relative to the other keywords in 2019.
  • Rank Change is the rise or fall in rank between the two years.
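The rank columns can be derived from the yearly averages along these lines. This is a sketch with made-up averages for three hypothetical keywords; the sign convention (positive means the keyword moved up) is my assumption and may not match the table.

```python
def rank_keywords(averages):
    """Map each keyword to its rank, where 1 is the highest average."""
    ordered = sorted(averages, key=averages.get, reverse=True)
    return {keyword: position for position, keyword in enumerate(ordered, start=1)}


def rank_changes(avg_2018, avg_2019):
    """Rank in 2018 minus rank in 2019, so positive values mean a move up."""
    ranks_2018 = rank_keywords(avg_2018)
    ranks_2019 = rank_keywords(avg_2019)
    return {
        keyword: ranks_2018[keyword] - ranks_2019[keyword]
        for keyword in avg_2019
        if keyword in ranks_2018
    }


# Made-up averages for three hypothetical keywords (not the article's data).
avg_2018 = {"A": 50.0, "B": 30.0, "C": 20.0}
avg_2019 = {"A": 55.0, "B": 25.0, "C": 35.0}

print(rank_changes(avg_2018, avg_2019))  # {'A': 0, 'B': -1, 'C': 1}
```

Ties are broken by input order here because Python's sort is stable. A real analysis might prefer to assign tied keywords the same rank.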

What can we learn from this information?


Significant changes occurred in less than 14 months.

Winners


Python is still on top. It is by far the most common keyword, appearing in nearly three out of four ads, and it has grown noticeably since 2018.

SQL is the rising star. It has almost overtaken R for the second-highest average, and at this pace it will soon take second place.

Deep learning frameworks showed the biggest growth.

PyTorch had the biggest gain of all keywords. Keras and TensorFlow also did well. Keras and PyTorch each climbed four places in the ranking, and TensorFlow climbed three. Note that PyTorch started from a low average, and TensorFlow's average is still twice PyTorch's.

Cloud platform skills are becoming more popular. AWS appeared in almost 20% of ads. Azure appeared in about 10% and climbed four places. These are the technologies that advanced the most.



Losers


R had the biggest decline in average. This is not very surprising, given the results of other studies. Python is now well ahead of R as the language employers ask for. Even so, R remains very popular, appearing in about 55% of ads. Do not despair if you know R, but consider learning Python as well if you want a more in-demand skill.

Many Apache products, including Pig, Hive, Hadoop, and Spark, are losing popularity. Pig dropped five places in the ranking, more than any other technology. Spark and Hadoop are still in high demand, but my findings suggest a shift away from them toward other big data technologies.

The statistical software packages MATLAB and SAS lost a lot of ground. MATLAB dropped four places in the ranking, and SAS fell from sixth to eighth. Both show a large percentage decrease from their 2018 averages.

Tip


There are a lot of technologies on this list. Of course, you do not need to know everything. No wonder the mythical data scientist is called a unicorn.

My advice is this: if you are just starting out in this field, concentrate on the technologies that are in demand.

Focus.
On.
Learning.
One.
Technology.
At a.
Time.

(This is excellent advice, although I myself have not always adhered to it.)

I recommend learning them in this order:

  1. Python, for general programming.
  2. Pandas, along with Matplotlib and NumPy.
  3. Scikit-learn. The book “Introduction to Machine Learning with Python” covers it.
  4. SQL.
  5. Tableau.
  6. A cloud platform: AWS, Microsoft Azure, or Google Cloud (Data Engineer).
  7. A deep learning framework. TensorFlow is the most in demand. The book “Deep Learning with Python” teaches Keras, which ships with TensorFlow. PyTorch is also growing quickly.

These are my general learning tips. Adapt them to your goals, or ignore them and do what you want.


