In this article, we will use PyTorch to build an image classifier that distinguishes between dogs and cats. The classifier is a convolutional neural network (CNN).


Image classification is a common computer vision task in which an image is assigned to one of several classes. Convolutional neural networks (CNNs) are a popular class of deep learning models that have been very successful at this task.

Distinguishing between dogs and cats is a classic binary image classification problem, which makes it a good first project for a CNN.


We will use the Dogs vs. Cats dataset from Kaggle. This dataset contains 25,000 images of dogs and cats, split evenly between the two classes.
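The data-loading code later in this article reads images with torchvision's ImageFolder, which expects one subfolder per class. As a rough sketch of how the files could be arranged that way, the function below copies images into train/<class> and val/<class> folders with a random hold-out. It assumes the files are named like cat.0.jpg and dog.0.jpg; the function name split_into_folders, the 20% validation fraction, and the seed are illustrative choices rather than part of the original workflow.

```python
import os
import random
import shutil


def split_into_folders(src_dir, dst_dir, val_fraction=0.2, seed=0):
    """Copy images named '<class>.<id>.jpg' into dst_dir/{train,val}/<class>/.

    Returns a dict mapping (split, class) to the number of files copied.
    """
    # Group file names by the class prefix before the first dot (assumed naming)
    by_class = {}
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith('.jpg'):
            continue
        label = name.split('.')[0]
        by_class.setdefault(label, []).append(name)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for label, names in by_class.items():
        rng.shuffle(names)
        n_val = int(len(names) * val_fraction)
        splits = {'val': names[:n_val], 'train': names[n_val:]}
        for split, split_names in splits.items():
            out_dir = os.path.join(dst_dir, split, label)
            os.makedirs(out_dir, exist_ok=True)
            for name in split_names:
                shutil.copy(os.path.join(src_dir, name),
                            os.path.join(out_dir, name))
            counts[(split, label)] = len(split_names)
    return counts
```

Calling split_into_folders on the folder that holds the extracted images, with the current directory as dst_dir, would produce the train and val folders that the loading code expects.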

To prepare the data for our CNN, we will first resize the images to a fixed size (224×224), normalize the pixel values, and split the data into training and validation sets.

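import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    # Convert the PIL image to a float tensor in [0, 1]; Normalize requires a tensor
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Each folder must contain one subfolder per class, e.g. train/cat and train/dog
train_data = ImageFolder('train', transform=transform)
val_data = ImageFolder('val', transform=transform)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
val_loader = DataLoader(val_data, batch_size=32, shuffle=False)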

This code resizes each image to 224×224, normalizes every channel with a mean and standard deviation of 0.5 (which maps pixel values from [0, 1] to [-1, 1]), and loads the training and validation images from the train and val folders. Both loaders use a batch size of 32; only the training loader shuffles its data.
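To make the normalization step concrete: Normalize subtracts the per-channel mean and divides by the per-channel standard deviation. The plain-Python sketch below applies that same arithmetic to a few sample values, assuming the pixels have already been scaled to [0, 1]. The helper name normalize is illustrative and is not a torchvision function.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel formula used by Normalize: (value - mean) / std."""
    return (value - mean) / std


# Darkest, mid-grey, and brightest pixel values on the [0, 1] scale
samples = [0.0, 0.5, 1.0]
normalized = [normalize(v) for v in samples]
print(normalized)  # [-1.0, 0.0, 1.0]
```

With these constants the result is a fixed rescaling to [-1, 1]. The data only ends up with zero mean and unit variance if the dataset's actual per-channel mean and standard deviation are passed to Normalize instead.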


We will use a CNN architecture consisting of three convolutional layers, each followed by a max pooling layer, and then two fully connected layers.

import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(128 * 28 * 28, 512)
        self.fc2 = nn.Linear(512, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = F.relu(self.conv3(x))
        x = self.pool3(x)
        x = x.view(-1, 128 * 28 * 28)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

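This code defines a CNN by subclassing PyTorch's nn.Module. The architecture consists of three convolutional layers, each followed by a ReLU activation and max pooling, and then two fully connected layers. Each pooling layer halves the spatial size, so a 224×224 input shrinks to 28×28 after the third pool, which is why fc1 expects 128 * 28 * 28 inputs. The forward method defines the forward pass and returns two raw scores (logits), one per class.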
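The input size of fc1 follows from the layer settings, and it can be checked with a few lines of arithmetic. The sketch below uses the standard output-size formula for convolution and pooling layers without dilation; the helper name conv_out is illustrative.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
sizes = [size]
for _ in range(3):
    # 3x3 convolution with stride 1 and padding 1 keeps the size unchanged
    size = conv_out(size, kernel=3, stride=1, padding=1)
    # 2x2 max pooling with stride 2 halves the size
    size = conv_out(size, kernel=2, stride=2, padding=0)
    sizes.append(size)

flattened = 128 * size * size  # 128 channels from conv3
print(sizes)      # [224, 112, 56, 28]
print(flattened)  # 100352
```

If the input resolution or the number of pooling layers changes, this number changes too and the first fully connected layer has to be resized to match. With 100,352 inputs and 512 outputs, fc1 alone holds roughly 51 million weights, so it dominates the model's parameter count.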


We will train the model using the cross-entropy loss function and the Adam optimizer.

import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CNN().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
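        inputs, labels = inputs.to(device), labels.to(device)

        # Clear gradients left over from the previous batch
        optimizer.zero_grad()

        # Forward pass, then backpropagate and update the weights
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()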

        running_loss += loss.item()
        if i % 100 == 99:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

This code trains the model for 10 epochs and prints the average loss over every 100 batches. It uses the GPU if one is available to speed up training.
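The loss values printed during training are easier to interpret with the formula in hand. nn.CrossEntropyLoss takes the raw logits, applies a softmax internally, and by default averages the negative log-probability of the correct class over the batch. The sketch below reproduces the per-sample computation in plain Python; the logit values and the class order (cat = 0, dog = 1) are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical logits for the classes (cat=0, dog=1)
confident_right = cross_entropy([4.0, 0.0], target=0)  # small loss
uninformed = cross_entropy([0.0, 0.0], target=0)       # equal scores
confident_wrong = cross_entropy([4.0, 0.0], target=1)  # large loss
print(round(uninformed, 3))  # 0.693
```

A loss near 0.693 (the natural log of 2) therefore means the model is doing no better than guessing between the two classes, which is a useful reference point when reading the training log.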


After training, we will evaluate the model on the validation set.

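model.eval()  # switch to evaluation mode
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for data in val_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        # The predicted class is the index of the largest output score
        _, predicted = torch.max(outputs, 1)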
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the %d validation images: %d %%' % (len(val_data), 100 * correct / total))

This code will evaluate the model on the validation set and print the accuracy.
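The evaluation loop treats the index of the larger of the two output scores as the predicted class and counts how often it matches the label. As a small illustration of that logic, the plain-Python sketch below computes the accuracy for four hypothetical outputs; the helper name accuracy and the sample values are made up for the example.

```python
def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = 0
    for logits, label in zip(batch_logits, labels):
        # Index of the largest score (argmax)
        predicted = max(range(len(logits)), key=lambda k: logits[k])
        correct += int(predicted == label)
    return correct / len(labels)


# Hypothetical outputs for four images; columns are (class 0, class 1)
batch_logits = [[2.1, -0.3], [0.2, 1.5], [-1.0, 0.4], [0.9, 0.8]]
labels = [0, 1, 0, 0]
print(accuracy(batch_logits, labels))  # 0.75
```

Because the dataset is balanced between the two classes, an accuracy of about 50% corresponds to random guessing.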


In this article, we used PyTorch to build a CNN that distinguishes between images of dogs and cats. We first prepared the data by resizing the images and normalizing the pixel values. We then defined the CNN architecture and trained the model using the cross-entropy loss function and the Adam optimizer. Finally, we evaluated the model by measuring its classification accuracy on the validation set.


  1. What is the Dogs vs. Cats dataset?

The Dogs vs. Cats dataset is a Kaggle image classification dataset containing 25,000 labeled images, split evenly between dogs and cats.

  2. What is a convolutional neural network?

A convolutional neural network is a deep learning model that applies learned filters across an image to extract features, which makes it well suited to image classification.

  3. What is the cross-entropy loss function?

The cross-entropy loss measures how far the predicted class probabilities are from the true labels. It is small when the model assigns high probability to the correct class.

  4. What is the Adam optimizer?

The Adam optimizer is a gradient-based optimization algorithm that adapts the step size for each parameter and is widely used to train deep learning models.

  5. Can this code be used to classify other types of images?

Yes. Change the dataset, set the output size of the final layer (fc2) to the number of classes, and adjust the rest of the architecture if needed.
