In the past decade, machine learning has moved from scientific research labs into everyday web and mobile apps. Machine learning enables your applications to perform tasks that were previously very difficult to program, such as detecting objects and faces in images, detecting spam and hate speech, and generating smart replies for emails and messaging apps.
But machine learning is fundamentally different from classic programming. In this article, you’ll learn the basics of machine learning and create a simple model that predicts the species of a flower based on its measurements.
How Does Machine Learning Work?
Classic programming relies on well-defined problems that can be broken down into distinct classes, functions, and if–else commands. Machine learning, on the other hand, relies on developing its behavior based on experience. Instead of providing machine learning models with rules, you train them through examples.
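To make the contrast concrete, here is a minimal sketch that puts a hand-written rule next to a model that learns from examples. The spam features (`num_links`, `num_exclamations`), the thresholds, and the tiny dataset are all made up for illustration, and the decision tree is just one of several algorithms that could be used here.

```python
from sklearn.tree import DecisionTreeClassifier

# Classic programming: the developer writes the rule by hand.
# The thresholds below are arbitrary, chosen only for illustration.
def is_spam_rule(num_links: int, num_exclamations: int) -> bool:
    return num_links > 3 or num_exclamations > 5

# Machine learning: the model infers its own rule from labeled examples.
# Each row is [num_links, num_exclamations]; the data is invented.
examples = [[0, 0], [1, 1], [0, 2], [5, 1], [7, 8], [4, 6]]
labels = [0, 0, 0, 1, 1, 1]  # 0 = not spam, 1 = spam

model = DecisionTreeClassifier(random_state=0)
model.fit(examples, labels)

# Ask both approaches about a message with 6 links and 2 exclamation marks.
rule_prediction = is_spam_rule(6, 2)
learned_prediction = model.predict([[6, 2]])[0]
print(rule_prediction, learned_prediction)
```

Nobody told the model where to draw the line between spam and legitimate messages. It worked that out from the six labeled rows.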
There are different categories of machine learning algorithms, each of which can solve specific problems.
Supervised learning
Supervised learning is suitable for problems where you want to go from input data to outcomes. The common trait of all supervised learning problems is that there’s a ground truth against which you can test your model, such as labeled images or historical sales data.
Supervised learning models can solve regression or classification problems. Regression models predict quantities (such as the number of items sold or the price of a stock), while classification models determine the category of the input data (such as cat/dog/fish/bird or fraud/not fraud).
Image classification, face detection, stock price prediction, and sales forecasting are examples of problems supervised learning can solve.
Some popular supervised learning algorithms include linear and logistic regression, support vector machines, decision trees, and artificial neural networks.
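The difference between regression and classification is easy to see in code. The sketch below uses Scikit-learn's `LinearRegression` and `LogisticRegression` on tiny invented datasets, so the numbers carry no real-world meaning.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a quantity.
# Invented data: advertising spend -> items sold (exactly 10 * spend + 2).
ad_spend = [[1.0], [2.0], [3.0], [4.0]]
items_sold = [12.0, 22.0, 32.0, 42.0]

regressor = LinearRegression()
regressor.fit(ad_spend, items_sold)
predicted_sales = regressor.predict([[5.0]])[0]  # a number, close to 52

# Classification: predict a category.
# Invented data: transaction amount (in thousands) -> 0 = not fraud, 1 = fraud.
amounts = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
is_fraud = [0, 0, 0, 1, 1, 1]

classifier = LogisticRegression()
classifier.fit(amounts, is_fraud)
predicted_class = classifier.predict([[8.5]])[0]  # a label, not a quantity

print(predicted_sales, predicted_class)
```

Both models are trained the same way, with `fit` on inputs and known outcomes. What changes is the kind of output: a continuous value in the first case, a class label in the second.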
Unsupervised learning
Unsupervised learning is suitable for problems where you have data but no labeled outcomes, and you’re looking for patterns instead. For instance, you might want to group your customers into segments based on their similarities. This is called clustering. Or you might want to detect malicious network traffic that deviates from the normal activity in your enterprise. This is called anomaly detection, another unsupervised learning task. Unsupervised learning is also useful for dimensionality reduction, a technique that simplifies machine learning tasks by compressing the data into fewer features and discarding irrelevant or redundant ones.
Some popular unsupervised learning algorithms include K-means clustering and principal component analysis (PCA).
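As a rough illustration of both algorithms, the sketch below clusters a handful of invented customers with K-means and then compresses the same data to a single feature with PCA. The customer figures and the choice of two clusters are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Invented customer data: [annual_spend, visits_per_month]. No labels are given.
customers = np.array([
    [100, 1], [120, 2], [110, 1],      # low spenders
    [900, 10], [950, 12], [920, 11],   # high spenders
], dtype=float)

# Clustering: group similar customers into two segments.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)

# Dimensionality reduction: compress two features into one.
pca = PCA(n_components=1)
reduced = pca.fit_transform(customers)

print(segments)        # cluster index per customer (numbering is arbitrary)
print(reduced.shape)   # one column instead of two
```

The cluster numbers themselves mean nothing. What matters is which customers end up in the same group.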
Reinforcement learning
Reinforcement learning (RL) is a branch of machine learning in which an intelligent agent tries to achieve a goal by interacting with its environment. Reinforcement learning involves actions, states, and rewards. An untrained RL agent starts by taking random actions. Each action changes the state of the environment. If the agent finds itself in the desired state, it receives a reward. The agent tries to find sequences of actions and states that produce the most rewards.
Reinforcement learning is used in recommendation systems, robotics, and game-playing bots such as Google DeepMind’s AlphaGo and AlphaStar.
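The loop of actions, states, and rewards can be sketched with tabular Q-learning, one common RL algorithm (systems such as AlphaGo use far more elaborate methods). The environment here is a made-up five-cell corridor in which the agent starts at the left end and is rewarded for reaching the right end. All names and hyperparameters are illustrative assumptions.

```python
import random

N_STATES = 5           # cells 0..4; cell 4 is the goal
GOAL = N_STATES - 1
ACTIONS = [-1, +1]     # move left or move right

def step(state, action):
    """Apply an action and return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][a] estimates the long-term reward of taking action a in state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Explore at random sometimes (and on ties); otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward reward + future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy for each non-goal cell: the action with the higher estimate.
policy = [ACTIONS[0] if left > right else ACTIONS[1]
          for left, right in q_table[:GOAL]]
print(policy)
```

The agent starts out moving at random, but after enough episodes its value estimates favor moving right in every cell, which is the shortest path to the reward.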
Setting Up the Python Environment
In this post, we’ll focus on supervised learning, because it’s the most popular branch of machine learning and its results are easier to evaluate. We will be using Python, because it has many features and libraries that support machine learning applications. But the general concepts can be applied to any programming language that has similar libraries.
(In case you’re new to Python, freeCodeCamp has a great crash course that will get you started with the basics.)
One of the Python libraries often used for data science and machine learning is Scikit-learn, which provides implementations of popular machine learning algorithms. Scikit-learn is not part of the base Python installation and you must install it manually.
Many Linux distributions and macOS versions come with Python preinstalled. To install the Scikit-learn library, type the following command in a terminal window:
pip install scikit-learn
Or, if the pip command on your system isn’t tied to Python 3, run it through the Python 3 interpreter:
python3 -m pip install scikit-learn
On Microsoft Windows, you must install Python first. You can get the installer of the latest version of Python 3 for Windows from the official website. After installing Python, type the following command in a command-line window:
python -m pip install scikit-learn
Alternatively, you can install the Anaconda distribution, which includes an independent installation of Python 3 along with Scikit-learn and many other libraries used for data science and machine learning, such as NumPy, SciPy, and Matplotlib. You can find the installation instructions for the free Individual Edition of Anaconda on its official website.
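Whichever route you choose, a quick way to check that the installation worked is to import the library from Python and print its version. Note that the package is installed as scikit-learn but imported as `sklearn`.

```python
# If this import fails, Scikit-learn is not installed for this interpreter.
import sklearn

installed_version = sklearn.__version__
print("Scikit-learn version:", installed_version)
```

If you see an `ImportError`, make sure you ran pip with the same Python interpreter you're using to run the script.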
Step 1: Define the Problem
The first step in every machine learning project is knowing what problem you want to solve. Defining the problem will help you determine the kind of data you need to gather and give you an idea of the kind of machine learning algorithm you’ll need to use.
In our case, we want to create a model that predicts the species of a flower based on the length and width of its petals and sepals.
This is a supervised classification problem. We’ll need to gather a list of measurements of different specimens of flowers and their corresponding species. Then we’ll use this data to train and test a machine learning model that can map measurements to species.
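To give a feel for what such data looks like, here is a sketch that assumes the Iris dataset bundled with Scikit-learn, which happens to match this description (the text above doesn't name a specific dataset, so treat this choice as an assumption). It loads the measurements and species labels, then sets aside part of the data for testing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris dataset stands in for "a list of measurements".
iris = load_iris()
X, y = iris.data, iris.target   # X: measurements, y: species as 0, 1, or 2

print(iris.feature_names)       # sepal and petal length and width, in cm
print(list(iris.target_names))  # the species names behind the numeric labels

# Hold out 20% of the specimens to test the model on data it hasn't seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```

The 80/20 split and the fixed `random_state` are illustrative choices, and other ratios are common.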
Continue reading A Primer on Machine Learning with Python on SitePoint.