Takes in data which may be raw or cleaned into features; mostly incomplete information and predicts its category. The task of a classification algorithm is to classify data in a discrete way - this or that? example pear or apple? An example could be an algorithm that takes in data as features like color, size and weight and gives the output as pear, apple and blueberry.
Regression predicts a value aka it is predictive of a continuous value. On the other hand classification is like a boolean value. Lets say you are given a bunch of data, like the altitude of an airplane, weather conditions and some other bunch of data and then you are asked to guess its speed. In classification you may put the speed into categories like slow, kinda fast and crazy fast but in this case a number like 300km/h would be more helpful when predicted by the algorithm.
- Logistic Regression
- Naive Bayes
- K-Nearest Neighbors(KNN)
- Markov Decision Process(MDP)
- Decision Trees/ Random Forest
- K-Means Clustering
- Support Vector Machines
A. Supervised Learning - Supervision :
The Model learns from training examples, provided by humans. In the fruit classification algorithm, the model is trained by giving images of the fruits and we have a human look at the picture and tell the algorithm that this a pear/apple/blueberry. Mostly used for discrimination(recognition) tasks. The training data, utilized to fine-tune the model, and the test data, employed to gauge its performance, are usually divided in an 80%/20% ratio. This division conceptually resembles our educational experience, wherein mathematics textbooks encompass both solved and unsolved problems, aiding our comprehension and application of mathematical concepts.
B. Unsupervised Learning :
Model is learning from data that is unlabeled without any human input other than how the algorithm is designed and the type of data the algorithm is exposed to. Mostly used for generative (imagination) tasks. No split between training and testing data.
C. Reinforcement Learning :
Here, The model learns the best strategy using a scenario given by humans. We are giving the model a lot of information even beyond the actual data we are even going to tell it some of the possible outcomes and some of the rules of the road. We are not labeling the data as the right or wrong choices but there will be some human input. Scenario here is : What action can you, the algorithm take as an actor in this scenario and how is the environment going to respond to this action. Mostly used for decision making tasks. (eg: robotics) Reinforcement learning is a type of Semi-supervised learning where you are giving the model some human guidance but also allowing it to learn from its own.
How many variables do you have?(age, height,…)
- More Features: the model is more optimized - consider a Support Vector Machine.
- Less Features: the model is less optimized - consider a Decision Tree.
- Point to note: Can you reduce the number of features to simplify your model?(Feature Selection)
- Some features have more predictive value than others - example if you want to predict tomorrow’s weather - an important feature to take in would be today’s weather and not something like what you ate for cereal today :)
Is your data linear?
- Linear: Consider a Support Vector Machine.
- Non Linear: Consider a Decision Tree.
How much data do you have, and how fast is your computer?
- Faster Training: Consider a logistic regression.
- Slower Training: Consider a random forest.
- More training = more accuracy
How much flexibility do you want in your training?
- Less Parameters: Consider Logistic Regression(4)
- More Parameters: Consider K-means Clustering(8)
- Split data into train/validation/ test to examine the parameter space.