5 Skills Every Data Scientist Will Need For Their Job in 2018


We asked our newest data scientist what skills, tools, and concepts she thinks you'll need to be a successful data scientist in the coming year.


According to Glassdoor, data scientist is the #1 job in the United States based on number of job openings, salary, and overall job satisfaction.

From Reddit to the New York Times, data scientists are in hot demand.

Many want to break into the field, but the available career advice can be overwhelming: Which coding languages should you know? Do you have to be an expert in machine learning? Is it better to beef up your technical skills or to nail down your design principles?

To answer these questions (and more), we spoke to Wenfei Xu, our newest addition to the Data & Research team, about the journey that ultimately led her to CARTO. Below, she shares her five secrets for success.

1. A foundation in statistics helps you understand large datasets

At the University of Chicago, Xu studied economics, including single and multiple linear regression, time series econometrics, and forecast modeling.

Though she's no longer trying to model the price of fixed income instruments, she still relies on her undergrad training to guide her data analysis.

For example, Xu is currently working on a project to better understand how people use public parks in New York City through mobile phone data. Working with terabytes of data (like the latitude and longitude where the app was opened, the timestamp, and the length of use), she was able to choose the right distribution method (logarithmic) to evaluate the data.
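To illustrate why a logarithmic scale can help with heavily skewed measurements such as length of use, here is a minimal sketch. It is not Xu's actual analysis: the session lengths are made-up values and `log_transform` is a hypothetical helper, used only to show how a few very long sessions pull the raw mean far from the typical value.

```python
import math
import statistics

# Hypothetical session lengths in seconds (illustrative values, not project data).
session_seconds = [5, 8, 12, 20, 45, 60, 150, 400, 1800, 7200]


def log_transform(values):
    """Return the natural log of each value; all values must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


log_seconds = log_transform(session_seconds)

# On the raw scale, a handful of long sessions dominate the mean.
raw_mean = statistics.mean(session_seconds)
raw_median = statistics.median(session_seconds)

# Averaging on the log scale and converting back gives the geometric mean,
# which is much less sensitive to the long right tail.
geometric_mean = math.exp(statistics.mean(log_seconds))

print(f"raw mean: {raw_mean:.1f}s, raw median: {raw_median:.1f}s")
print(f"geometric mean (from log scale): {geometric_mean:.1f}s")
```

With data like this, the raw mean sits far above the median, while the log-scale summary lands much closer to what a typical session looks like.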

Look for her post about NYC park data next month on our blog!

2. Study design principles to better understand how to visualize data

After three years working in finance, Xu decided to go back to school and study architecture and urban planning. While at MIT, she learned how to use design principles to prioritize certain information.

Xu worked on a project last month about the different communities living in Williamsburg, Brooklyn. As a former resident of the neighborhood, she wanted to depict the co-habitation of those communities using visual design principles. In her analysis, she used pick-up and drop-off data from New York City taxis to identify different sub-groups.

Williamsburg Communities

She found that a substantial number of people, nicknamed "partiers," take taxis from Lower Manhattan and other parts of Brooklyn to Williamsburg, generally pretty late at night and on the weekend.

To make that message stick out, she used a simple black-and-white map with street boundaries and overlaid it with bright primary colors for each of the pick-ups and drop-offs.
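As a rough sketch of how late-night weekend trips might be flagged in taxi data, the example below checks drop-off timestamps against a time window. The post does not say how the sub-groups were actually identified, so everything here is an assumption: the 10 p.m. to 4 a.m. window, the Saturday/Sunday definition of "weekend," the function names, and the sample timestamps are all hypothetical.

```python
from datetime import datetime

# Assumed definition of "late at night": 10 p.m. up to (not including) 4 a.m.
LATE_NIGHT_START_HOUR = 22
LATE_NIGHT_END_HOUR = 4


def is_late_night(ts):
    """True if the timestamp falls in the assumed late-night window."""
    return ts.hour >= LATE_NIGHT_START_HOUR or ts.hour < LATE_NIGHT_END_HOUR


def is_weekend(ts):
    """True on Saturday or Sunday (weekday() is 0 for Monday, 6 for Sunday)."""
    return ts.weekday() >= 5


def is_late_weekend_dropoff(ts):
    """True if a drop-off is both late at night and on a weekend day."""
    return is_late_night(ts) and is_weekend(ts)


# Hypothetical drop-off times, for illustration only.
dropoffs = [
    datetime(2017, 10, 7, 23, 30),   # Saturday, late night
    datetime(2017, 10, 8, 1, 15),    # Sunday, after midnight
    datetime(2017, 10, 9, 9, 0),     # Monday morning
    datetime(2017, 10, 11, 23, 45),  # Wednesday, late night
]

flags = [is_late_weekend_dropoff(ts) for ts in dropoffs]
share = sum(flags) / len(flags)
print(f"late-night weekend share: {share:.0%}")
```

Note that this simple definition leaves out Friday nights before midnight; a real analysis would need to decide how to treat them.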

3. Practice telling a story with your data

According to Xu, the best statistical models and sharpest design principles should ultimately come together to tell a narrative.

When she created the Williamsburg taxi map, Xu discovered that there were 75 potential communities she could include, far too many for one map.

But because she had a clear story in mind (that demographically disparate communities co-exist in Williamsburg, often in overlapping space), she was able to best support her argument by whittling down the options to the best five groups, even if they weren't necessarily the largest.

4. Find a community or group to bounce ideas off of

Wenfei says she's lucky because CARTO sits "at the intersection of industry and academia," meaning she has access to the best minds in both.

She can also count on her coworkers for help. For example, many of the parallel processing and visualization tools she's currently using were introduced to her (or made) by her colleagues.

If you don't have your own data science squad yet, you're in luck. Wenfei writes a newsletter that you can join.

5. Pay attention to these tools, concepts, and programming languages

Hard skills are also important, especially when it comes to landing your data scientist dream job. Wenfei recommends the following:

Concepts to know

  • a solid foundation in statistics
  • hypothesis testing
  • linear regressions
  • machine learning
  • visualization principles

Skills to have

  • Python or R
  • spatial analysis
  • cloud computing and distributed computing methods
  • database skills such as PostgreSQL

Tools to be familiar with

  • the IPython/Jupyter environment
  • Matplotlib
  • Pandas
  • NumPy
  • Dask
  • Bokeh
  • scikit-learn

Whether you're beginning your data science journey or looking for what skills to develop next, subscribe to Wenfei's Data Science Newsletter to get the best resources and articles sent straight to your inbox.