For this use case, we will ingest the SMS Spam Collection dataset, which was made available by the Federal University of São Carlos, Brazil.
The link to the dataset is as follows: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.
The dataset includes 425 SMS spam messages collected from the Grumbletext website, a UK site where users manually report spam text messages. In addition to the spam text messages, 3,375 benign SMS messages that were randomly chosen from the National University of Singapore SMS Corpus (NSC) have been added to the dataset. Another 450 benign SMS messages were collected from Caroline Tagg's PhD thesis, available at http://etheses.bham.ac.uk/253/1/Tagg09PhD.pdf.
The dataset is divided into training and testing data, and the tf–idf (term frequency–inverse document frequency) method is used for featurization.
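As a rough illustration of the split and featurization steps, the following is a minimal sketch using only the Python standard library. It makes several assumptions: the six sample messages are made up for the example and are not rows from the dataset, the helper names (`parse`, `train_test_split`, `fit_idf`, `tfidf`) are hypothetical, and the smoothed idf formula is one common variant, not necessarily the one used in this use case.

```python
import math
import random
from collections import Counter

# Hypothetical sample in a "label<TAB>message" layout; not real dataset rows.
RAW = (
    "ham\tare we still meeting for lunch today\n"
    "spam\twin a free prize now call this number\n"
    "ham\tcall me when you get home\n"
    "spam\tfree entry claim your prize today\n"
    "ham\tsee you at home tonight\n"
    "spam\turgent call now to claim free cash"
)

def parse(raw):
    """Turn each line into a (label, list of lowercase tokens) pair."""
    rows = []
    for line in raw.splitlines():
        label, text = line.split("\t", 1)
        rows.append((label, text.lower().split()))
    return rows

def train_test_split(rows, test_fraction=0.33, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_idf(docs):
    """Compute a smoothed idf per term: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1
            for term, count in df.items()}

def tfidf(tokens, idf):
    """Weight each known term by relative frequency times its idf."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

rows = parse(RAW)
train, test = train_test_split(rows)

# The idf table is fit on the training messages only, then reused for test.
idf = fit_idf([tokens for _, tokens in train])
train_vectors = [tfidf(tokens, idf) for _, tokens in train]
test_vectors = [tfidf(tokens, idf) for _, tokens in test]
```

Fitting the idf weights on the training split alone keeps information from the test messages out of the features. Terms that appear only in test messages are dropped in this sketch.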
The dataset looks as follows:
