Classification

Classification is one of the classical problems in machine learning. Data under consideration could belong to one class or another, for example, if the images provided are data, they could be pictures of cats or dogs. Thus, the classes, in this case, are cats and dogs. Classification means identifying the label or class of the objects under consideration. Classification falls under the umbrella of supervised machine learning. In classification problems, a training dataset is provided that has features or inputs and their corresponding outputs or labels. Using this training dataset, a model is trained; in other words, the parameters of the model are computed. The trained model is then used on new data to find its correct labels.

Classification problems can be of two types: binary class or multiclass. Binary class means that the data is to be classified into two distinct and discrete labels; for example, the patient has cancer or the patient does not have cancer, and the images are of cats or dogs and so on. Multiclass means that the data is to be classified among multiple classes, for example, an email classification problem will divide emails into social media emails, work-related emails, personal emails, family-related emails, spam emails, shopping offer emails, and so on. Another example would be of pictures of digits; each picture could be labeled between 0 and 9, depending on what digit the picture represents. In this chapter, we will look at examples of both kinds of classification.

The most popular method for classification is logistic regression. Logistic regression is a probabilistic and linear classifier. The probability that the vector of input features belongs to a specific class can be described mathematically by the following equation: 

In the preceding equation, the following applies:

  • Y represents the output
  • i represents one of the classes
  • x represents the inputs
  • w represents the weights
  • b represents the biases
  • z represents the regression equation 
  • ϕ represents the smoothing function (or model, in our case)

The ϕ(z) function represents the probability that x belongs to class i when w and b are given. Thus, the model has to be trained to maximize the value of this probability.