
Behind the Pixels: The Math Driving Image Editing

Discover how linear algebra shapes the tools you use every day—and why it’s more fascinating than you think


Hi there! 👋 I'm Aditya Chaturvedi, a passionate software engineer who loves solving problems, building creative solutions, and sharing knowledge with others. I started this blog as a way to document my journey in technology, explore exciting ideas, and connect with like-minded individuals. Whether it's coding tips, mathematics, AI, or musings on the ever-evolving tech landscape, you'll find it all here.

Introduction

Have you ever edited a photo on your phone? Cropped, rotated, or stretched an image to make it perfect? Behind these seemingly simple tools lies the elegant branch of mathematics called linear algebra. Whether you are tilting a picture for better alignment, skewing it for a creative effect, or rotating it to the perfect angle, you are unknowingly relying on mathematical transformations.

In this blog, we will:

  1. Explore common image editing operations (rotate, crop, resize, etc.) that we use every day.

  2. Show how Python libraries such as Pillow perform these operations with built-in functions.

  3. Peel back the layers to reveal how these tools work under the hood using linear algebra, and then implement one of them from scratch.

By the end, I hope you will develop an appreciation for how mathematics powers your favorite photo-editing tools and gain the knowledge to code your own basic editor!

Understanding common operations

Let’s start by listing some common transformations you’ve likely used in apps:

Rotation

This is probably the most obvious one. Rotation changes the orientation of an image by turning it by 90°, 180°, or a custom angle.

Rotation is a common editing operation. For example, you may photograph a document upside down and want to turn it the right way up.

Scaling

Scaling changes the size of an image. Scaling both axes by the same factor enlarges or shrinks the image while preserving its proportions; scaling them by different factors stretches or squishes it. Either way, it is a key operation in image editing.

An infamous example of scaling I can think of is when a passport application or DMV form forces you to provide your photograph in specific dimensions. More commonly, digital zoom (e.g., zooming into an image after you take it) scales a portion of the image.

Technically, what you see in the GIF above is scaling combined with cropping (the app shows only a portion of the image), but you get the idea.

Reflection

Reflection is a transformation that creates a mirror image of an object or image across a specified axis. It flips the image either horizontally, vertically, or diagonally, depending on the axis of reflection. This operation is often used to reverse the orientation of an image or create symmetrical effects.

Shear

A shear transformation skews an image along one axis (horizontal or vertical): points slide along that axis by an amount proportional to their coordinate on the other axis, which stays fixed. This creates a distorted, parallelogram-like effect. A horizontal shear shifts pixels horizontally and leaves each pixel's vertical coordinate unchanged; a vertical shear shifts pixels vertically and leaves the horizontal coordinate unchanged. It is also possible to apply more than one shear at the same time.

Shear is an uncommon but useful operation. It can help fix perspective issues in photographs. For example, if you take a picture of a tall building from an angle, the building may appear to "lean." A combination of horizontal and vertical shear can adjust the image so the building looks straight and properly aligned. You might also add a motion effect to an object by applying a horizontal shear.

I could not find this operation in Google Photos, so here is a reference from Canva.

Here is another illustration of how this operation helps with perspective correction. The original image on the left is tilted, and the sheared one is on the right. Notice the extra blacked-out area at the bottom left of the right image, which inadvertently reveals the degree of shear.

Cropping

Cropping is probably the best-understood operation: it keeps a rectangular portion of the image and removes the rest. I will not go into much detail here.

Spoiler alert: Cropping is not a linear transformation. It simply deletes the pixels outside the chosen region, but I mention it here for completeness.
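To make the contrast with the matrix-based operations concrete, here is a minimal NumPy sketch, assuming the image is already loaded as an array of shape (height, width) or (height, width, channels). Cropping is just array slicing; no matrix multiplication is involved. The helper name `crop` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def crop(image: np.ndarray, top: int, left: int, height: int, width: int) -> np.ndarray:
    """Keep the rectangle whose top-left pixel is (top, left); hypothetical helper."""
    # Rows are the first axis and columns the second; any channel axis is kept as-is.
    return image[top:top + height, left:left + width]


if __name__ == "__main__":
    img = np.arange(16).reshape(4, 4)  # a tiny 4x4 "grayscale image"
    print(crop(img, top=1, left=1, height=2, width=2))
```

On the 4×4 example, the crop returns the central 2×2 block `[[5, 6], [9, 10]]`; every other pixel is simply discarded.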

These transformations seem intuitive, but underneath, they’re all about modifying the grid of pixel coordinates. Let's see how.

Linear transformations

Let's understand how these operations can be defined as linear transformations. For simplicity, let's pick rotation. Rotating an image 90˚ clockwise can be achieved by rotating each point in the image 90˚ clockwise.

To illustrate this, we first need to establish a coordinate system for the image, in which each point is represented as a vector in 2D space. A coordinate system is described by its basis vectors: every vector in the space is a combination of them. For simplicity, you can think of the basis vectors as pointing along the x-axis and the y-axis. The standard choice is the unit vectors (1, 0) and (0, 1). Basis vectors do not have to be of unit length in general (they only need to be linearly independent), but these two keep the arithmetic simple.

We intentionally place the origin at the center of rotation, because that point stays fixed during the rotation (a linear transformation always maps the origin to itself). This simplifies the process and keeps the transformation consistent.

Now let's look at the rotation example and notice what it does to the basis vectors: (1, 0) becomes (0, -1) and (0, 1) becomes (1, 0).

In linear algebra terms, we can represent this transformation as a matrix.

$$A = \begin{bmatrix} 0& 1\\ -1& 0 \end{bmatrix}$$

A careful reader will quickly notice that the first column is simply the new position of the x basis vector (called i-hat, \(\hat{i}\), in mathematical nomenclature) and the second column is the new position of the y basis vector (called j-hat, \(\hat{j}\)). This is the secret behind linear transformations.

A linear transformation is completely described by the new positions of the basis vectors. Because every vector in the space is a combination of the two basis vectors, we do not need to define the change for each individual vector; we only need to specify where the two basis vectors land.
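Spelled out for an arbitrary vector, this is a one-line derivation. Writing \(\vec{v} = x\hat{i} + y\hat{j}\) and using the linearity of matrix multiplication with the rotation matrix A from above:

$$A\begin{bmatrix} x \\ y \end{bmatrix} = x\,A\hat{i} + y\,A\hat{j} = x\begin{bmatrix} 0 \\ -1 \end{bmatrix} + y\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} y \\ -x \end{bmatrix}$$

So the two columns of A are enough to transform any point: a 90˚ clockwise rotation sends (x, y) to (y, -x).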

Applying this transformation to the image simply means multiplying the transformation matrix by every point (vector) in the image, which gives the new position of each vector.

Here is an example:

$$\vec{v_o} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$$

$$\vec{v_n} = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$$
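The same calculation can be checked in NumPy with a few lines (a quick sketch; `A` is just the matrix from above). It also confirms that multiplying A by the basis vectors returns its columns.

```python
import numpy as np

# 90° clockwise rotation matrix from above
A = np.array([[0, 1],
              [-1, 0]])

v_o = np.array([2, 2])
v_n = A @ v_o  # matrix-vector product
print(v_n)     # [ 2 -2]

# A sends i-hat and j-hat to its first and second columns
i_hat, j_hat = np.array([1, 0]), np.array([0, 1])
print(A @ i_hat, A @ j_hat)  # [ 0 -1] [1 0]
```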

You can verify this result intuitively: (2, 2) lies in the first quadrant, and a 90° clockwise turn moves it to (2, -2) in the fourth quadrant. To apply the rotation to every point in a single operation, we can represent the image as a matrix as well. Consider a square image of 128 × 128 pixels in which each pixel is a vector (x, y). Because the center of rotation is the center of the image, the integer coordinates \(x, y \in \mathbb{Z}\) fall in the following ranges:

$$x \in [-64, 64) \qquad y \in [-64, 64)$$

The goal of the linear transformation is to find the new position of each pixel. Hence, the full transformation can be represented as:

$$\text{NewCoordinates} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} -64 & -64 & \cdots & -64 & \cdots & 63 \\ -64 & -63 & \cdots & 63 & \cdots & 63 \end{bmatrix}$$

where the image matrix has shape (2, 128²), with one column per pixel. Moving each pixel from its old location to its new location rotates the image.
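A minimal NumPy sketch of this idea: build the (2, 128²) coordinate matrix with `np.meshgrid`, then transform every pixel with one matrix multiplication. The column ordering (x varying slowest) matches the matrix written above; the variable names are illustrative.

```python
import numpy as np

A = np.array([[0, 1], [-1, 0]])  # 90° clockwise rotation

# Integer coordinates in [-64, 64) along each axis, centered on the rotation center
xs, ys = np.meshgrid(np.arange(-64, 64), np.arange(-64, 64), indexing="ij")

# Stack into a (2, 128**2) matrix: one column per pixel, x varying slowest
coords = np.stack([xs.ravel(), ys.ravel()])
print(coords.shape)  # (2, 16384)

# One matrix multiplication transforms every pixel at once
new_coords = A @ coords
print(coords[:, :2].T.tolist())      # [[-64, -64], [-64, -63]]
print(new_coords[:, :2].T.tolist())  # [[-64, 64], [-63, 64]]
```

Notice that x = -64 is sent to y = 64, which lies outside [-64, 64). With an even width there is no pixel exactly at the center, so a real implementation has to shift the result by one (or measure coordinates from pixel midpoints) to keep every pixel in bounds.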

Reflection

Reflection is as easy as rotation. For a square image, we can reflect along the x-axis, the y-axis, or either of the two diagonals. As in the rotation example, we can describe this transformation simply by tracking the new locations of the basis vectors.

$$\begin{align} Reflect_X = \begin{bmatrix} 1& 0\newline 0& -1 \end{bmatrix} \newline \newline Reflect_Y = \begin{bmatrix} -1& 0\newline 0& 1 \end{bmatrix} \end{align}$$
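The diagonal reflections follow the same recipe of tracking where (1, 0) and (0, 1) land: reflection across y = x swaps the two coordinates, and reflection across y = -x swaps and negates them. A short NumPy sketch applying all four reflections to the point (3, 1):

```python
import numpy as np

# Reflection matrices: each column is where i-hat or j-hat lands
reflect_x = np.array([[1, 0], [0, -1]])      # across the x-axis: (x, y) -> (x, -y)
reflect_y = np.array([[-1, 0], [0, 1]])      # across the y-axis: (x, y) -> (-x, y)
reflect_diag = np.array([[0, 1], [1, 0]])    # across y = x:      (x, y) -> (y, x)
reflect_anti = np.array([[0, -1], [-1, 0]])  # across y = -x:     (x, y) -> (-y, -x)

p = np.array([3, 1])
for name, m in [("x-axis", reflect_x), ("y-axis", reflect_y),
                ("y = x", reflect_diag), ("y = -x", reflect_anti)]:
    print(name, (m @ p).tolist())
```

Reflecting twice across the same line returns every point to where it started, so each of these matrices is its own inverse.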

Scaling

Scaling can be applied along one or both axes, and it can either stretch or squish the image. Here is the general form of the scaling matrix:

$$\begin{align} ScaleFn = \begin{bmatrix} s_x& 0\newline 0& s_y \end{bmatrix} \end{align}$$

where \(s_x\) and \(s_y\) are the scaling factors along the x- and y-axes. To stretch an image by a factor of 2 in the x direction:

$$\begin{align} Scale2X = \begin{bmatrix} 2& 0\newline 0& 1 \end{bmatrix} \end{align}$$

We can multiply the location of every pixel by this matrix to find its new position. However, this causes an interesting problem. Because we stretched the image 2× along the x-axis, the new image is 256 × 128 instead of 128 × 128, an increase of 16,384 pixels. Also, every x coordinate is multiplied by 2, so no pixel lands on an odd x coordinate anymore:

Old    =>  New
(1, 0) =>  (2, 0)
(2, 0) =>  (4, 0)
(3, 0) =>  (6, 0)
?      =>  (1, 0)
?      =>  (3, 0)

A simple fix is to duplicate or drop pixels:

  • Scaling up (Enlarging): Pixels are repeated or interpolated to fill the new, larger grid. Smoother techniques include bilinear and bicubic interpolation.

  • Scaling down (Shrinking): Pixels are sampled (e.g., skipping some rows/columns), potentially losing some detail.
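One common way to avoid the holes is to work backwards: for every pixel in the output grid, divide its coordinates by the scale factors to find the input pixel it comes from. The sketch below uses this inverse mapping with simple nearest-neighbor sampling (flooring), which repeats pixels when enlarging and skips rows or columns when shrinking. The function name `scale_nearest` is hypothetical, and Pillow's BICUBIC resampling used later is a smoother alternative.

```python
import numpy as np


def scale_nearest(image: np.ndarray, scale_x: float, scale_y: float) -> np.ndarray:
    """Nearest-neighbor scaling by inverse mapping (hypothetical helper)."""
    height, width = image.shape[:2]
    new_height = int(round(height * scale_y))
    new_width = int(round(width * scale_x))

    # Inverse of the scaling matrix: source coordinate = output coordinate / scale.
    # Flooring to an int picks the source pixel whose cell contains that coordinate.
    src_rows = np.minimum((np.arange(new_height) / scale_y).astype(int), height - 1)
    src_cols = np.minimum((np.arange(new_width) / scale_x).astype(int), width - 1)

    # Broadcasting the row and column indices gathers a (new_height, new_width[, C]) array
    return image[src_rows[:, None], src_cols[None, :]]


if __name__ == "__main__":
    img = np.array([[1, 2], [3, 4]])
    print(scale_nearest(img, 2, 1).tolist())  # [[1, 1, 2, 2], [3, 3, 4, 4]]
```

Every output pixel gets a value, so the odd-numbered columns from the table above are filled by repeating their left neighbor.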

Shear

Following the same pattern, shear can be applied along either axis. A shear along the x-axis (horizontal shear) preserves the y coordinate of every point and shifts its x coordinate in proportion to y: (x, y) becomes (x + ky, y). The amount of shift is set by the shear factor k (which always reminds me of the TV show Fear Factor).

$$\begin{align} Shear_X = \begin{bmatrix} 1& k\newline 0& 1 \end{bmatrix} \newline \newline Shear_Y = \begin{bmatrix} 1& 0\newline k& 1 \end{bmatrix} \end{align}$$

Shear also changes the dimensions of the image. A tool may enlarge the canvas to retain all the pixels or drop the pixels that fall out of bounds. As with scaling, shearing also requires interpolating or sampling pixels.
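To see how much the canvas grows, we can apply the shear matrix to the four corners of the image: a linear map sends the rectangle to the parallelogram spanned by the transformed corners. A minimal NumPy sketch, placing the origin at one corner of the image; the helper name `shear_bounding_box` is hypothetical.

```python
import numpy as np


def shear_bounding_box(width: float, height: float, k: float, axis: str = "x"):
    """Return the sheared corners and the canvas size needed to hold them (hypothetical helper)."""
    shear = np.array([[1, k], [0, 1]]) if axis == "x" else np.array([[1, 0], [k, 1]])
    # Corners of the image as columns: (0, 0), (w, 0), (w, h), (0, h)
    corners = np.array([[0, width, width, 0],
                        [0, 0, height, height]], dtype=float)
    sheared = shear @ corners
    # Extent of the parallelogram along each axis: (new_width, new_height)
    size = sheared.max(axis=1) - sheared.min(axis=1)
    return sheared, size


if __name__ == "__main__":
    corners, size = shear_bounding_box(128, 128, k=0.5, axis="x")
    print(corners.T.tolist())  # [[0.0, 0.0], [128.0, 0.0], [192.0, 128.0], [64.0, 128.0]]
    print(size.tolist())       # [192.0, 128.0]
```

For a horizontal shear the width grows from w to w + |k|·h while the height stays the same, which is exactly the extra area a tool either keeps or crops away.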

Talk is cheap. Show me the code

Let's dig into implementing these transformations in Python and build the backend for our super-basic image editor. First, we will reproduce them using a library such as Pillow; then, we will implement one of the functions from scratch.

Pillow implementation

Pillow's Image module provides the transformations we need. It offers high-level methods such as rotate() and resize(), as well as a lower-level Image.transform(...) method that applies an affine transformation (among other supported types) given the coefficients of its transformation matrix. Let's start by writing all the code.

# conda install pillow
from PIL import Image


def shear_image(image_path, shear_factor, axis, output_path):
    """
    Applies a shear transformation to an image.

    :param image_path: Path to the input image file.
    :param shear_factor: Shear factor (a float, positive or negative).
    :param axis: 'horizontal' or 'vertical' to specify the direction of shear.
    :param output_path: Path to save the sheared image.
    :return: The sheared image as a PIL Image object.
    """
    # Open the input image
    image = Image.open(image_path)

    # Determine the shear matrix
    if axis == "horizontal":
        shear_matrix = (1, shear_factor, 0, 0, 1, 0)  # Horizontal shear
    elif axis == "vertical":
        shear_matrix = (1, 0, 0, shear_factor, 1, 0)  # Vertical shear
    else:
        raise ValueError("Axis must be 'horizontal' or 'vertical'.")

    # Apply the shear transformation
    sheared_image = image.transform(
        image.size,
        Image.Transform.AFFINE,
        shear_matrix,
        resample=Image.Resampling.BICUBIC,
    )

    # Save the sheared image
    sheared_image.save(output_path)
    print("Success..")

    return sheared_image

def scale_image(image_path, scale_x, scale_y, output_path):
    """
    Scales an image using Pillow.

    :param image_path: Path to the input image file.
    :param scale_x: Scaling factor along the x-axis (width).
    :param scale_y: Scaling factor along the y-axis (height).
    :param output_path: Path to save the scaled image.
    :return: The scaled image as a PIL Image object.
    """
    # Open the input image
    image = Image.open(image_path)

    # Get the original dimensions
    original_width, original_height = image.size

    # Calculate the new dimensions
    new_width = int(original_width * scale_x)
    new_height = int(original_height * scale_y)

    # Resize the image
    scaled_image = image.resize((new_width, new_height), resample=Image.Resampling.BICUBIC)

    scaled_image.save(output_path)
    print("Success...")

    return scaled_image

def rotate_image(image_path, angle, output_path, expand=True):
    """
    Rotates an image by a specified angle using Pillow.

    :param image_path: Path to the input image file.
    :param angle: The angle (in degrees) to rotate the image. Positive values rotate counterclockwise.
    :param output_path: Path to save the rotated image.
    :param expand: Whether to expand the output image to fit the entire rotated image.
    :return: The rotated image as a PIL Image object.
    """
    # Open the input image
    image = Image.open(image_path)

    # Rotate the image
    rotated_image = image.rotate(
        angle, resample=Image.Resampling.BICUBIC, expand=expand
    )

    rotated_image.save(output_path)
    print("Success...")

    return rotated_image


if __name__ == "__main__":
    sheared_image = shear_image(
        image_path="mona_lisa.jpeg",  # Replace with your image file path
        shear_factor=-0.1,  # Shear factor (positive or negative)
        axis="horizontal",  # Shear direction: 'horizontal' or 'vertical'
        output_path="mona_lisa_sheared.jpg",
    )

    scale_image(
        image_path="mona_lisa.jpeg",  # Replace with the path to your image
        scale_x=2.0,                 # Scale factor for width
        scale_y=1,                   # Scale factor for height
        output_path="mona_lisa_scaled.jpg"
    )

    rotate_image(
        image_path="mona_lisa2.jpg",  # Replace with the path to your image
        angle=45,  # Angle of rotation in degrees (e.g., 45° counterclockwise)
        output_path="mona_lisa2_rotate.jpg",  # Path to save the rotated image
        expand=True,  # Expand the canvas to fit the rotated image
    )

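Note that the shear matrix passed to Pillow in shear_image has six values, whereas the matrices discussed above have four. The two extra values (c and f in Pillow's (a, b, c, d, e, f) tuple) encode a translation, i.e., shifting the entire image to a different position. Pillow also applies these coefficients in reverse: for each pixel (x, y) of the output image, it samples the input image at (ax + by + c, dx + ey + f), measured from the top-left corner. The tuple therefore describes the inverse of the visual transformation.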
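Because Pillow's AFFINE data maps each output pixel back to an input pixel, one way to reuse the 2×2 matrices from earlier is to invert them and add an offset so the transformation happens about the image center. A minimal sketch, assuming the matrix is written in Pillow's pixel coordinates (x to the right, y downward) and the output keeps the input size; the helper name `affine_data_about_center` is hypothetical.

```python
import numpy as np


def affine_data_about_center(matrix, size):
    """Build Pillow's (a, b, c, d, e, f) tuple for a forward 2x2 matrix
    applied about the image center (hypothetical helper)."""
    width, height = size
    center = np.array([width / 2, height / 2])
    # Pillow samples the input at (a*x + b*y + c, d*x + e*y + f) for each output (x, y),
    # so the linear part must be the inverse of the forward matrix.
    inv = np.linalg.inv(np.asarray(matrix, dtype=float))
    # Forward: out = M (in - center) + center  =>  in = M^-1 out + (center - M^-1 center)
    offset = center - inv @ center
    (a, b), (d, e) = inv
    return tuple(float(v) for v in (a, b, offset[0], d, e, offset[1]))


if __name__ == "__main__":
    # Stretch 2x horizontally about the center of a 100x50 image
    print(affine_data_about_center([[2, 0], [0, 1]], (100, 50)))
```

The returned tuple can then be passed to `image.transform(image.size, Image.Transform.AFFINE, data, resample=Image.Resampling.BICUBIC)`. Pixels pushed outside the original canvas are cropped, as in shear_image above.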

By now, you can see how Pillow handles each of these operations in roughly a single line of code. The next time you get stuck on an image-editing task (which often happens to Mac users), don't go searching for the right software to install; simply write a small Pillow script.

Get your hands dirty implementation

This is the most exciting part of the blog. Let's implement one of the transformations without relying on an image library's built-in transform functions. For simplicity, I will demonstrate a 90˚ clockwise rotation, using NumPy for the matrix operations.

import numpy as np
import matplotlib.image as mpimg
import imageio


def rotate_image_90_clockwise(image_path, output_path):
    """
    Rotates an image by 90 degrees clockwise using a mathematical transformation matrix.

    :param image_path: Path to the input image file.
    :param output_path: Path to save the rotated image.
    """
    # Read the input image
    image = mpimg.imread(image_path)

    # Convert image to numpy array and get its dimensions
    # (grayscale images have no channel axis, so read only the first two)
    data = np.asarray(image)
    height, width = data.shape[:2]

    # Print the data type and shape for debugging
    print(f"Image data type: {data.dtype}")
    print(f"Original image shape: {data.shape}")

    # Initialize an array for the rotated image: rows and columns swap, any
    # channel axis is kept, and the dtype matches the input (PNGs load as floats)
    rotated_image = np.zeros((width, height) + data.shape[2:], dtype=data.dtype)

    # Define the rotation matrix for 90 degrees clockwise
    rotation_matrix = np.array([[0, 1], [-1, 0]])

    # Calculate the center of the image (row, column)
    center_x, center_y = height // 2, width // 2

    # Apply the rotation using the transformation matrix
    for x in range(-center_x, height - center_x):
        for y in range(-center_y, width - center_y):
            original_point = np.array([[x], [y]])
            rotated_point = np.matmul(rotation_matrix, original_point)
            new_x = rotated_point[0][0] + center_y
            # Offset by (height - 1 - center_x) rather than center_x: for even
            # heights this keeps the new column index in [0, height) instead of
            # [1, height], so no row is dropped and no column is left black
            new_y = rotated_point[1][0] + (height - 1 - center_x)
            if 0 <= new_x < width and 0 <= new_y < height:
                rotated_image[new_x, new_y] = data[x + center_x, y + center_y]

    # Save the rotated image to the specified output path
    imageio.imwrite(output_path, rotated_image)

    print(f"Rotated image saved to {output_path}")
    return rotated_image


if __name__ == "__main__":
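    rotate_image_90_clockwise("img/mona_lisa.jpg", "img/mona_lisa_rotated_raw.jpg")

  • Because the center of rotation is the image center, we shift row indices into the range [-center_x, height - center_x) and column indices into [-center_y, width - center_y) before multiplying by the matrix, and shift them back afterwards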

  • For matrix multiplication operation we use np.matmul(..)

  • The code works for both RGB and grayscale images.
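The per-pixel loop is easy to follow but slow. The same idea can be applied to all pixels at once by building the (2, N) coordinate matrix described earlier and multiplying it by the rotation matrix in a single call. A sketch, assuming the image is already a NumPy array; here coordinates are measured from the exact center ((h - 1)/2, (w - 1)/2), which sidesteps the off-by-one that integer centers cause for even sizes. The function name is hypothetical, and the result is checked against NumPy's own `np.rot90`.

```python
import numpy as np


def rotate_90_clockwise_vectorized(data: np.ndarray) -> np.ndarray:
    """Rotate an (h, w) or (h, w, C) array 90° clockwise via one matmul (hypothetical helper)."""
    h, w = data.shape[:2]
    A = np.array([[0, 1], [-1, 0]])  # same rotation matrix as the loop version

    # (2, h*w) matrix of coordinates centered on the middle of the image
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2, (w - 1) / 2
    coords = np.stack([rows.ravel() - cy, cols.ravel() - cx])

    # Rotate every coordinate at once, then shift into the (w, h) output grid
    new = A @ coords
    new_rows = np.rint(new[0] + (w - 1) / 2).astype(int)
    new_cols = np.rint(new[1] + (h - 1) / 2).astype(int)

    out = np.zeros((w, h) + data.shape[2:], dtype=data.dtype)
    out[new_rows, new_cols] = data[rows.ravel(), cols.ravel()]
    return out


if __name__ == "__main__":
    img = np.arange(12).reshape(3, 4)
    print(np.array_equal(rotate_90_clockwise_vectorized(img), np.rot90(img, k=-1)))  # True
```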

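With this, I have illustrated how we can apply linear algebra to perform a rotation without using an image library's transform functions. A nice property of this implementation is that the matrix is the only part that encodes the transformation: in principle, we can plug in a different matrix and reuse the loop, as long as the offsets added after the multiplication (and the output shape) match where that matrix sends the pixels. Arbitrary angles also produce non-integer coordinates, which require rounding or interpolation.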

Conclusion

In today's AI world, everyone talks about linear algebra as a must-know topic for understanding deep learning and other algorithms. However, I believe the real appreciation of linear algebra is in understanding the power of linear transformations. I've understood linear algebra for a long time, but I was inspired to write this blog after watching some 3blue1brown videos, which I highly recommend. These videos helped me see linear algebra as more than just a tool for solving equations. The goal of this blog is to help you appreciate linear algebra so you'll want to explore it further and find answers to any parts that are unclear to you.

Acknowledgement

  • Thanks to 3blue1brown for the amazing videos. Millions of people fall in love with mathematics because of you, and I am just another fan.

  • Thanks to my dear friend Rahul Ranjan, an excellent creative writer and software engineer, for reviewing the draft and providing corrections.