Spam Detector

This model attempts to identify whether the provided text is spam. It only works with sentences up to 19 words long, because its input is a fixed list of 20 numbers and one slot is reserved for a start-of-sentence marker.

This was built using the tutorial provided by Google Developers which can be found here.


How it Works

Step 1: Tokenization

This model works by separating sentences into their individual words. These separated words are referred to as tokens. For example, for this provided sentence:

"The quick brown fox jumped over the lazy dog"

A list of tokens is produced that looks like this: [the, quick, brown, fox, jumped, over, the, lazy, dog].
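As a rough illustration of this step, here is a minimal Python sketch. The function name `tokenize` and the exact splitting rule (lowercase, then keep runs of letters, digits, and apostrophes) are assumptions made for the example; the model's real tokenizer may treat punctuation differently.

```python
import re

def tokenize(sentence: str) -> list:
    """Lowercase the sentence and split it into word tokens.

    Assumption: a token is a run of letters, digits, or apostrophes.
    The actual model's tokenizer may use a different rule.
    """
    return re.findall(r"[a-z0-9']+", sentence.lower())

tokens = tokenize("The quick brown fox jumped over the lazy dog")
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
```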

Step 2: Mapping to a Tensor

The model has its own dictionary of known words. Because models operate on numbers rather than text, each word in the dictionary is mapped to a number. Each token is then looked up in the dictionary, producing a list of numbers that the model can use.

This particular model uses "0" to fill empty space in the list (to ensure it's always 20 elements long), "1" to signify the start of a sentence, and "2" for words not found in its dictionary.

The list that would get passed to the model for "The quick brown fox jumped over the lazy dog" would be:

[1, 54, 212, 2, 2, 821, 2, 54, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Fun fact: These lists are known as tensors, which is how the popular library TensorFlow got its name.
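The mapping described in this step can be sketched in Python as follows. The dictionary `VOCAB` is hypothetical: it contains only the three entries implied by the example list above (the = 54, quick = 212, jumped = 821), and the function name `encode` is likewise an assumption made for illustration. The special values 0, 1, and 2 and the length of 20 come from the description above.

```python
PAD, START, UNKNOWN = 0, 1, 2   # special values described above
MAX_LEN = 20                    # the model always receives 20 numbers

# Hypothetical dictionary: only the entries implied by the example.
VOCAB = {"the": 54, "quick": 212, "jumped": 821}

def encode(tokens, vocab=VOCAB, max_len=MAX_LEN):
    """Turn a list of tokens into a fixed-length list of numbers."""
    if len(tokens) > max_len - 1:
        # One slot is reserved for the start marker.
        raise ValueError(f"at most {max_len - 1} words are supported")
    # Start marker, then one number per token (2 for unknown words).
    ids = [START] + [vocab.get(token, UNKNOWN) for token in tokens]
    # Fill the remaining space with 0 so the list is always max_len long.
    return ids + [PAD] * (max_len - len(ids))

tokens = ["the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]
print(encode(tokens))
# [1, 54, 212, 2, 2, 821, 2, 54, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Sentences longer than 19 words are rejected here rather than truncated; which of the two the real model does is not stated, so this is a design choice of the sketch.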

Step 3: Prediction

The model assigns each number a weight that represents how likely its corresponding word is to be used in a sentence that is spam. The more words with a high spam probability there are in the provided sentence, the higher the probability that the model predicts the sentence is spam.
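To make the idea concrete, here is a deliberately simplified Python sketch. It is not the model's actual architecture, which is not described here: it assumes one weight per number, adds the weights up, and squashes the total into a probability with a sigmoid. The names `WEIGHTS`, `BIAS`, and `spam_probability` and all the weight values are made up for illustration.

```python
import math

PAD = 0  # padding carries no information and is skipped

# Hypothetical weights: positive values push toward "spam",
# negative values push away from it. All numbers are invented.
WEIGHTS = {1: 0.0, 2: 0.1, 54: -0.2, 212: 1.5, 821: 0.3}
BIAS = -1.0

def spam_probability(ids, weights=WEIGHTS, bias=BIAS):
    """Sum the weights of the non-padding numbers and map the
    total to a value between 0 and 1 with a sigmoid."""
    score = bias + sum(weights.get(i, 0.0) for i in ids if i != PAD)
    return 1.0 / (1.0 + math.exp(-score))

low = spam_probability([1, 54, 2] + [PAD] * 17)
high = spam_probability([1, 54, 212] + [PAD] * 17)
print(low < high)  # swapping in a high-weight word raises the probability
```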