Supervised vs unsupervised
Supervised learning
Supervised learning tasks tackle problems that have labeled data.
Intuition : It can be thought of a teacher who corrects a multiple choice exam. At the end you will get your average result as well as the correct answer to any of the questions.
Supervised learning can be further separated into two broad type of problems:
- Classification: here the output variable $y$ is categorical. We are basically trying to assign one or multiple classes to an observation. Example: is it a cat or not ?
- Regression: here the output variable $y$ is continuous. Example : how tall is this person ?
Unsupervised learning
Unsupervised learning tasks consist in finding structure in unlabeled data without a specific desired outcome.
Unsupervised learning can be further separated into multiple subtasks (the separation is not as clear as in the supervised setting):
- Clustering: can you find cluster in the data?
- Clustering: can you find cluster in the data ?
- Density estimation: what is the underlying probability distribution that gave rise to the data?
- Dimensionality reduction how to best compress the data?
- Outlier detection which data-point are outliers?
Due to the lack of ground truth labels, it difficult to measure the performance of such methods, but such methods are extremely important due to the amount of accessible unlabeled data.