An outlier is an observation in a dataset that appears to be inconsistent with the remainder of that set of data. Anomaly detection can be defined as a process that will detect such outliers. Anomaly detection can be categorized into the following types based on the percentage of labelled data:
- Supervised anomaly detection is characterized by the following:
- Labels available for both normal data and anomalies
- Similar to rare class mining/imbalanced classification
- Unsupervised anomaly detection (outlier detection):
- No labels; training set = normal + abnormal data,
- Assumption: anomalies are very rare
- Semi-supervised anomaly detection (novelty detection):
- Only normal data available to train
- The algorithm learns on normal data only