This year, I finished a bootcamp and immediately landed a data science job. If I had to go back and learn everything by myself, here’s how I would do it.
When it comes to learning data science online, these recos represent the best, mostly free resources I’ve run across in the three years I’ve been training on Python, analytics, and productionizing machine learning models. A data scientist should be a great programmer, a decent analyst, and a reasonably good engineer. You also need a rock-solid understanding of statistics; for that, there’s ESL (The Elements of Statistical Learning).
To learn everything else, let’s get started.
Disclaimer: this post is in no way sponsored, nor does it represent views of anyone but myself. Sling your arrows this way if these recommendations don’t work out for you.
Python has quickly grown into the lingua franca of machine learning. It’s outpaced R and offers abundant packages for scientific computing. A data scientist must be an adept Python programmer.
In addition to having good coding chops, it’s reasonable to expect a data scientist to possess some core analytics skills, including data visualization. I offer some tips on a popular third-party tool, Tableau, for drag-and-drop analytics. A data scientist should be comfortable communicating their insights, sometimes using visualizations.
Finally, to be truly full-stack, a data scientist should be comfortable with all the steps from prototyping to productionizing a model. I love this quote from a guide on operations: “Putting a model in production is the beginning of the model’s journey, not the end.” A data scientist should be aware of what it takes to productionize a model.
Develop muscle memory for coding
Codecademy is the first place I tell people to go in order to learn Python, command line, and Git before jumping into data science. The platform’s simple interface helps you practice coding until getting the computer to do what you want is no longer the hard part — this frees up brainspace to focus on the challenges associated with actual data science. (Not free — but worth it!)
Learn the latest and greatest version of the most popular programming language in the world!
Cover the essentials of Data Science
Flatiron School, the amazing institution where I did my Data Science Bootcamp, offers a totally free online mini-curriculum. Legitimately so helpful.
Get acclimated to Machine Learning
Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources…
Develop Computer Science fundamentals
Eric Grimson is kind of like your stern and brilliant uncle — the one who is harvesting pieces from a dozen no-longer-functioning laptops to build a quantum computer. His classic course is a great way to learn CS best practices while deepening your knowledge of Python.
MIT edX 6.00.1x
Try a sample data science project
Here are several ideas to get you started:
Improve your understanding of geospatial information through GeoPandas DataFrames and Google Colab
A data scientist’s quick start guide to navigating Spotify’s Web API and accessing data using the Spotipy Python…
How to build & deploy an ML app with Streamlit and DevOps tools
Learn more quickly by getting excited about data science
Pick something you’re passionate about and dive deep. To start, check out this list of general resources: blogs, YouTube channels, and podcasts. You can also follow me on LinkedIn and Twitter for real-time updates on my favorite learning resources.
Resources to Supercharge your Data Science Learning in 2020
Advance your understanding of machine learning with this helpful collection of journals, videos, and lectures.
Learning Data Viz
Tableau worksheet with dimensions in blue and measures in green. Sidebar at far left shows out-of-the-box analytics tools for basic summary statistics. via Tableau.
When your data needs to get dressed up, Tableau is a fool-proof style service. It offers a sleek, drag-and-drop interface for data analytics with native integration to pull data from CSVs, JSON files, Google Sheets, SQL databases, and that back corner of the dryer where you’ve inevitably forgotten a sock.
Data is automatically separated into dimensions (qualitative) and measures (quantitative) — and presumed to be ready for chart-making. Of course, if there are still a few data cleaning steps to be undertaken, Tableau can handle the dirty laundry as well. For example, it supports re-formatting data types and pivoting data from wide to tall format.
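To make the wide-to-tall pivot concrete, here is a minimal plain-Python sketch of the reshaping itself, outside of Tableau. The function name `wide_to_tall`, the field names, and the sample sales figures are all made up for illustration; in pandas, the same reshaping is usually done with `DataFrame.melt`.

```python
def wide_to_tall(rows, id_field, value_fields, var_name="variable", value_name="value"):
    """Turn one wide row per entity into one tall row per (entity, variable) pair."""
    tall = []
    for row in rows:
        for field in value_fields:
            tall.append({
                id_field: row[id_field],   # identifier repeated on every tall row
                var_name: field,           # former column name becomes a value
                value_name: row[field],    # the measure that lived in that column
            })
    return tall


# Hypothetical wide data: one column per quarter.
wide = [
    {"region": "East", "Q1": 100, "Q2": 120},
    {"region": "West", "Q1": 90, "Q2": 95},
]

tall = wide_to_tall(wide, "region", ["Q1", "Q2"], var_name="quarter", value_name="sales")
for record in tall:
    print(record)
```

The tall form gives each measure value its own row, which tends to be easier to chart when a single field (here, the quarter) should drive an axis or a color.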
When ready to make a chart, simply ctrl+click features of interest and select an option from the “Show me” box of defaults. This simplicity of interaction enables even the most design-impaired data scientist to easily marshal data into a presentable format. Tableau will put your data into a suit and tie and send it to the boardroom.
Follow these tips to go from “good” to “great” in your data visualization abilities.
Gain inspiration from master chart-makers
Throughout my time as a business analyst at a Big Four firm, these three blogs were my go-tos for how to create a great looking, functional Tableau dashboard.
Regular dispatches from the Tableau Public Team.
Evolytics shares how-to articles, analytics tips, expert advice, industry insights and news. Learn more about timely…
Order up! We have another month’s worth of hot and fresh data resources ready for you. In this blog round up,…
Keep these 4 guidelines in mind
#1 — Sheets are the artist’s canvas and dashboards are the gallery wall. Sheets are for creating the artwork (ahem, charts), which you will then position onto a dashboard (using a tiled layout with containers — more on this in a second) along with any formatting elements.
#2 — To save yourself time, set Default Properties for dimensions and measures. This will provide a unified approach to color, number of decimal points, sort order, etc. and prevent you from having to fiddle with these settings each time you go to use a given field.
#4 — Avoid putting floating objects into your dashboards. Dragging charts around becomes a headache once you have more than two or three to work with. You can make your legends floating objects, but otherwise stay away from this “long-cut.”
Instead, use the tiled layout, which forces objects to snap into place and automatically resizes if you change the size dimensions of your dashboard. Much faster and simpler in the long run.
Get started with your first dashboard
In summary, the Tableau platform is easier to use than finger paints. If you’re ready to get started, Tableau Public is the free version that will allow you to create publicly accessible visualizations (like this one I put together after web scraping some info on questionable exempted developments from the Washington DC Office of Zoning) and share them to the cloud.
Getting ready to present financials to the C-suite.
After investigating data from your local community, another good sample project is pulling your checking account data and pretending you’re presenting it to a CEO for analysis.
Read more about the difference between a data scientist and a data analyst:
What’s the Difference Between a Data Analyst, Data Scientist, and a Machine Learning Engineer?
Explore the distinction between these common job titles with the analogy of a track meet.
Now if you not-so-secretly love data viz and need to find more time to devote to putting your models into production, let’s move on to…
Your machine learning model is only as good as its predictions and classifications on real-world data. Give your model a fighting chance by gaining at least a basic understanding of DevOps, the field responsible for integrating development and IT operations.
Reframe your thinking about what data science is or isn’t
In this brilliant article, hero of deep learning Andrej Karpathy argues that machine learning models are the new hotness in software: instead of following hand-written if-then rules, they treat data as their codebase.
I sometimes see people refer to neural networks as just “another tool in your machine learning toolbox”. They have some…
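As a rough illustration of that idea (a toy sketch, not code from the article), compare a rule a programmer writes by hand with a rule derived from labeled examples. The spam-score framing, the function names, and the numbers are all invented for this example, and a one-parameter threshold stands in for a real model.

```python
def handwritten_rule(score):
    """Classic approach: a programmer picks the threshold and hard-codes it."""
    return score > 0.5


def learn_threshold(examples):
    """Data-driven approach: choose the threshold that best fits labeled examples.

    `examples` is a list of (score, is_spam) pairs. Each observed score is tried
    as a candidate cutoff, and the one with the most correct predictions wins.
    """
    best_threshold, best_correct = None, -1
    for candidate, _ in sorted(examples):
        # Count how many examples this cutoff would classify correctly.
        correct = sum((score >= candidate) == label for score, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


# Made-up training data: (score, is_spam).
training_data = [(0.1, False), (0.2, False), (0.35, True), (0.6, True), (0.9, True)]

threshold = learn_threshold(training_data)


def learned_rule(score):
    """Same shape as the hand-written rule, but the cutoff came from the data."""
    return score >= threshold


print(threshold)
print(handwritten_rule(0.4))
print(learned_rule(0.4))
```

Changing the behavior of the learned rule means changing the training data rather than editing the function, which is the sense in which the data acts as the codebase.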
Get a sense for how this works in enterprise
This clever novel fictionalizes The DevOps Handbook and is surprisingly readable. (Not free — but if you buy a copy, give it to your coworker and hope they become super passionate about productionizing your models).
A story about rebel developers & business leaders racing against time to innovate, survive & thrive in a time of…
Introduce your machine learning model to the wild
Check out this article about how to use Streamlit for both deployment and data exploration. I’d be remiss if I didn’t also mention Docker and Kubernetes as enterprise-level tools for productionization.
The Most Useful ML Tools 2020
5 sets of tools every lazy full-stack data scientist should use
Other Useful Topics
Explain Computer Science Like I’m Five
Learn about the internet, programming, machine learning, and other computer science fundamentals through clear and…
How to Ace the AWS Cloud Practitioner Certification with Minimal Effort
Forecast: cloudy with a 100% chance of passing on your first try.
Comprehensive Guide to the Data Warehouse
Learn about the role of the data warehouse as the master store of analysis-ready datasets.
Except for the featured image, this story has not been edited by Javelynn and is published from a syndicated feed. Originally published on https://towardsdatascience.com/new-data-science-f4eeee38d8f6.