Understanding AI through emojis
Posted: 21 March 2017 | By Darcie Thompson-Fields
Getting your head around machine learning can be very difficult when you’re not a software engineer. Luckily, data scientist and blogger Emily Barry has put together an emoji cheat sheet to help you understand the fundamentals of the technology.
As Barry learned more about machine learning, she wanted a better way to organise what she was learning. Finding no suitable cheatsheets, she decided to create one herself using her favourite tool of communication, emoji.
“I didn’t set out to make a cheatsheet, really. Nor did I set out to make an emoji cheatsheet. But a few things in my research on this subject led me down this path:
- There aren’t that many (good) machine learning cheatsheets that I could find. If you have one please share it!
- The machine learning cheatsheets I did find did nothing to demystify how to actually use the algorithms.
- The cheatsheets I found weren’t fun to look at! I’m a highly visual person and having a box for regression, a box for classification and a box for clustering makes a lot of sense to me,” Barry wrote in a blog post.
Barry also wrote a step-by-step guide to her cheat sheet:
Initially, I just had the dots for supervised, unsupervised and reinforcement, but it was requested that I add a box for what makes each different. Like many fancy words (such as stochastic), supervised and unsupervised learning are tossed around like no one’s business. Make sure you know what they mean when you dish them out.
Lots of things are “fundamental data science”, but these things really are. Linear regression especially is a thing you learn constantly in other contexts and may not realize it’s used in data science.
I’ve done a lot of visual design, and this may be one of those I’m most proud of. I think this cell communicates a lot of important information, and really makes sense of what I was trying to accomplish. Beginning with the categories of learning styles – it’s clear that neural net is queen bey of complicated algorithms, but with great power comes great responsibility.
The emoji choices throughout this sheet were very intentional. I could explain my logic for each, but we might be here all day. Random forest is really the least serious of them (but also my favorite).
Also, shoutout to Naive Bayes for having at least three ways to call it from sklearn.
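As a rough illustration of that last point (this sketch is not taken from Barry's cheat sheet), scikit-learn's `sklearn.naive_bayes` module does expose several Naive Bayes classifiers, including `GaussianNB`, `MultinomialNB` and `BernoulliNB`. Which three Barry had in mind is an assumption here, and the toy data and the `binarize` threshold below are made up purely for the example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: 200 samples, 5 non-negative integer "count" features.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, 5))
# Made-up label rule: class 1 when the first two features sum to more than 9.
y = (X[:, 0] + X[:, 1] > 9).astype(int)

# Three Naive Bayes variants, each assuming a different feature distribution.
models = {
    "GaussianNB": GaussianNB(),                # continuous, roughly normal features
    "MultinomialNB": MultinomialNB(),          # non-negative counts
    "BernoulliNB": BernoulliNB(binarize=4.5),  # features thresholded to 0/1
}

# Fit each model and record its accuracy on the training data.
scores = {name: model.fit(X, y).score(X, y) for name, model in models.items()}

for name, accuracy in scores.items():
    print(f"{name}: training accuracy {accuracy:.2f}")
```

All three share the same `fit`/`score` interface, so swapping one for another is a one-line change; the choice mostly depends on what the features look like.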
Clustering is an extremely useful subset of data science that’s like classification, but not quite. Therefore, it needs its own cell. With teddy bears.
The Curse of Dimensionality
I added this section because the more research I did on the algorithms themselves, the more I realized that feature reduction is mega key to making any of them work. I have experienced this during projects and I would be surprised if any data scientist hadn’t.
Note: tradeoff between calling t-SNE and possibly forgetting what the acronym stands for, and actually spelling “neighbor” correctly.
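To make the idea of feature reduction a little more concrete, here is a minimal sketch using principal component analysis (PCA) written with plain NumPy. PCA is swapped in here as a simpler stand-in; it is not t-SNE, and it is not necessarily the method Barry used. The `pca_reduce` helper and the synthetic data are hypothetical and exist only for this example.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (hypothetical helper)."""
    # Centre each feature so the SVD captures variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of the total variance explained by each kept component.
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return X_centered @ components.T, explained_ratio


# Synthetic data: 20 observed features driven by only 2 hidden factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 20))

X_reduced, ratio = pca_reduce(X, n_components=2)
print(X_reduced.shape)  # (300, 2)
print(f"variance kept by 2 components: {ratio.sum():.3f}")
```

Because the 20 columns are noisy mixtures of two underlying factors, two components retain almost all of the variance, which is the kind of redundancy feature reduction is meant to exploit.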
Our * Wildcard * Section
There are a few important things in data science that might have just ended up having their own section in a perfect world where a 3-D cheatsheet is a thing.
1. The bias-variance tradeoff is the most baseline element that describes data science as an art – you will have to strike a balance between noisy data and biased but low-variance data for models you create.
2. Underfitting/overfitting – this is similar to the bias-variance tradeoff, but you need to make sure you have enough data to not overfit a model, and have the model be descriptive enough to generalize.
3. Inertia – entropy, in its simplest form.
4. We talk about these four items a lot with classification – I thought it was important to know what we’re really talking about.