
Dynamic computation graphs are a cornerstone of modern deep learning frameworks, such as PyTorch. They allow for on-the-fly construction of the computational graph as operations are executed. This is different from static computation graphs, where the graph is defined and compiled before execution.
What sets dynamic computation graphs apart is their flexibility. In a dynamic environment, you can define, change, and execute nodes as needed while the program is running. This is particularly useful for models that involve conditional execution, loops, and recursive functions.
For example, consider the following simple PyTorch code that creates a dynamic computation graph:
import torch

# Create tensors.
x = torch.tensor(1., requires_grad=True)
y = torch.tensor(2., requires_grad=True)

# Perform operations.
z = x * y

# Calculate gradients.
z.backward()

print(x.grad)  # Output: tensor(2.)
print(y.grad)  # Output: tensor(1.)
In the code above, the graph is built step by step. First, two tensors x and y are created with requires_grad set to True so that their gradients are tracked. Then the multiplication that produces z is recorded as a node in the graph at the moment it executes. When z.backward() is called, PyTorch traverses this recorded graph in reverse to compute x.grad and y.grad.
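To see that the graph really is assembled as the operations run, you can inspect the grad_fn attribute that every non-leaf result carries. The small snippet below is only for illustration and is not part of the example above:

import torch

x = torch.tensor(1., requires_grad=True)
y = torch.tensor(2., requires_grad=True)
z = x * y

# z carries the graph node recorded for the multiplication that produced it
print(z.grad_fn)                 # e.g. <MulBackward0 object at 0x...>
# The node links back to the AccumulateGrad nodes of the leaf tensors x and y
print(z.grad_fn.next_functions)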
Understanding torch.autograd.Function
torch.autograd.Function is at the heart of this dynamic graph construction in PyTorch. It’s a base class for all operations that support automatic differentiation. Understanding how torch.autograd.Function works is important for any PyTorch user, especially those looking to create custom operations.
Each subclass of torch.autograd.Function defines two primary static methods: forward() and backward(). The forward() method performs the actual computation: it takes the inputs, applies the operation, and returns the output. The backward() method is responsible for the gradients: it receives the gradient of the loss with respect to the output and applies the chain rule to compute the gradients with respect to the inputs.
import torch
from torch.autograd import Function
class MyMultiply(Function):
@staticmethod
def forward(ctx, a, b):
# ctx is a context object that can be used to stash information
# for backward computation
ctx.save_for_backward(a, b)
return a * b
@staticmethod
def backward(ctx, grad_output):
# Retrieve stored data
a, b = ctx.saved_tensors
# Apply the chain rule to get the gradient of the loss w.r.t. each input
grad_a = grad_output * b
grad_b = grad_output * a
return grad_a, grad_b
# To apply our custom function
a = torch.tensor(1., requires_grad=True)
b = torch.tensor(2., requires_grad=True)
output = MyMultiply.apply(a, b)
output.backward()
print(a.grad) # Output: tensor(2.)
print(b.grad) # Output: tensor(1.)
The context object, ctx, is used inside the forward() and backward() methods to store information that is needed to compute gradients. The ctx.save_for_backward() method saves any tensors that will be needed in the backward() pass. In the example above, both inputs a and b are saved, since each is needed to compute the other's gradient during the backward pass.
Creating Custom Functions with torch.autograd.Function
Creating custom functions with torch.autograd.Function is a powerful feature of PyTorch that enables users to define their own forward and backward passes. This can be particularly useful when dealing with operations that are not included in the standard PyTorch library, or when optimizing specific parts of a model. The process for creating a custom function involves subclassing torch.autograd.Function and implementing the forward() and backward() static methods.
Let’s see an example of how to create a custom activation function using torch.autograd.Function:
import torch
from torch.autograd import Function
class CustomReLU(Function):
@staticmethod
def forward(ctx, input):
ctx.save_for_backward(input)
return input.clamp(min=0)
@staticmethod
def backward(ctx, grad_output):
input, = ctx.saved_tensors
grad_input = grad_output.clone()
grad_input[input < 0] = 0
return grad_input
# To apply the custom ReLU function
input_tensor = torch.tensor([-2, -1, 0, 1, 2], dtype=torch.float32, requires_grad=True)
output = CustomReLU.apply(input_tensor)
output.backward(torch.ones_like(input_tensor))
print(input_tensor.grad) # Output: tensor([0., 0., 0., 1., 1.])
In the example above, CustomReLU is a custom implementation of the ReLU activation function. The forward() method computes the ReLU of the input tensor, while the backward() method propagates the incoming gradient back to the input. Notice that during the backward pass we clone grad_output and zero the gradient wherever the input is negative, matching the derivative of ReLU (1 for positive inputs, 0 for negative inputs).
Another aspect to consider when creating custom functions with torch.autograd.Function is that they should handle different kinds of inputs, such as tensors with different shapes or on different devices (CPU or GPU). It is also important to think about the numerical stability of the forward and backward methods, especially when dealing with very small or very large numbers.
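As an illustrative sketch of both points (StableLogSigmoid below is a hypothetical example, not a function from the PyTorch library), the following custom function computes log(sigmoid(x)) with a forward pass that avoids overflow for large magnitudes and a backward pass built only from elementwise tensor operations, so it behaves the same for any shape and on CPU or GPU:

import torch
import torch.nn.functional as F
from torch.autograd import Function

class StableLogSigmoid(Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        # softplus(-x) stays finite for large |x|, unlike log(1 + exp(-x)) computed naively
        return -F.softplus(-input)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # d/dx log(sigmoid(x)) = sigmoid(-x); elementwise ops work for any shape or device
        return grad_output * torch.sigmoid(-input)

# Works the same for tensors of any shape, including extreme values
x = torch.linspace(-50, 50, 7, requires_grad=True)
y = StableLogSigmoid.apply(x)
y.sum().backward()
print(x.grad)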
Utilizing Dynamic Computation Graphs in PyTorch
Utilizing the dynamic computation graph in PyTorch is straightforward once you understand how the torch.autograd.Function works. The beauty of PyTorch’s dynamic computation graph lies in its ability to handle complex neural network architectures that have conditional constructs and loops.
Consider the following example where we build a simple RNN from scratch using PyTorch’s dynamic computation graph:
import torch
from torch.autograd import Function
class MyRNNCell(Function):
    @staticmethod
    def forward(ctx, x, hx, wx, wh, b):
        h_next = torch.tanh(x @ wx + hx @ wh + b)
        # Save the inputs and the output; h_next is needed for the tanh derivative
        ctx.save_for_backward(x, hx, wx, wh, h_next)
        return h_next

    @staticmethod
    def backward(ctx, grad_h_next):
        x, hx, wx, wh, h_next = ctx.saved_tensors
        # Backpropagate through tanh first: d tanh(z)/dz = 1 - tanh(z)^2
        grad_z = grad_h_next * (1 - h_next ** 2)
        grad_x = grad_z @ wx.t()
        grad_hx = grad_z @ wh.t()
        grad_wx = x.t() @ grad_z
        grad_wh = hx.t() @ grad_z
        grad_b = grad_z.sum(0)
        return grad_x, grad_hx, grad_wx, grad_wh, grad_b

# Define the parameters
wx = torch.randn((3, 3), requires_grad=True)
wh = torch.randn((3, 3), requires_grad=True)
b = torch.randn(3, requires_grad=True)

# An input sequence of length 5 (batch size 1, feature size 3)
xs = [torch.randn(1, 3, requires_grad=True) for _ in range(5)]
hx = torch.zeros(1, 3, requires_grad=True)

# Forward pass for each time step
for i in range(len(xs)):
    hx = MyRNNCell.apply(xs[i], hx, wx, wh, b)

# Backward pass
hx.backward(torch.ones_like(hx))
print(wx.grad)  # Gradient for weight wx
print(wh.grad)  # Gradient for weight wh
print(b.grad)   # Gradient for bias b
In the above example, we define a custom RNN cell by subclassing torch.autograd.Function. The forward() method computes the next hidden state, and the backward() method propagates the gradient through the tanh nonlinearity and computes the gradients with respect to each of the inputs. The cell is then used in a loop to process an input sequence, demonstrating how dynamic computation graphs handle sequences of varying lengths, which makes them well suited to tasks such as time-series prediction or language modeling.
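To make the variable-length point concrete, here is a quick sketch that reuses MyRNNCell and the parameters wx, wh, and b defined above; the sequence lengths (3, 7, 2) are arbitrary choices for illustration:

# Sequences of different lengths reuse the same parameters; the graph is simply
# unrolled to whatever length each sequence happens to have at runtime.
for seq_len in (3, 7, 2):
    wx.grad = wh.grad = b.grad = None        # reset gradients between runs
    hx = torch.zeros(1, 3)
    for x in [torch.randn(1, 3) for _ in range(seq_len)]:
        hx = MyRNNCell.apply(x, hx, wx, wh, b)
    hx.sum().backward()                      # the unrolled graph has exactly seq_len cells
    print(seq_len, wx.grad.norm())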
It is also worth mentioning that when you are working with dynamic computation graphs, you can easily integrate control flow statements like if-else conditions or for and while loops within your model’s architecture. That’s not as straightforward when working with static computation graphs, as they require the graph to be defined beforehand.
Here’s an example that uses a condition within the computation graph:
import torch
from torch.autograd import Function

class ConditionalComputation(Function):
@staticmethod
def forward(ctx, x):
ctx.save_for_backward(x)
if x.sum() > 0:
output = x * 2
else:
output = x / 2
return output
@staticmethod
def backward(ctx, grad_output):
x, = ctx.saved_tensors
if x.sum() > 0:
grad_input = grad_output * 2
else:
grad_input = grad_output / 2
return grad_input
x = torch.tensor([-1., 1., -1., 1.], requires_grad=True)
output = ConditionalComputation.apply(x)
output.backward(torch.ones_like(x))
print(x.grad)  # Output: tensor([0.5000, 0.5000, 0.5000, 0.5000]), because x.sum() is 0 and the else branch runs
In this scenario, we’ve defined a custom function that performs different computations based on the sum of the input tensor. This ability to integrate Pythonic control flow into the computation graph is a strong advantage of PyTorch’s dynamic graphs, providing flexibility and ease of use for researchers and developers alike.
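The same freedom extends beyond custom Functions. As a minimal, hypothetical sketch (DataDependentDepth is not from the example above), an ordinary Python loop and if statement can sit directly inside a module's forward pass, and autograd records whichever operations actually run on each call:

import torch
import torch.nn as nn

class DataDependentDepth(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # Apply the layer up to three times, stopping early once the activation norm grows
        for _ in range(3):
            if x.norm() >= 1.0:
                break
            x = torch.relu(self.linear(x))
        return x

model = DataDependentDepth()
x = torch.randn(1, 4, requires_grad=True)
out = model(x)
out.sum().backward()  # gradients flow through exactly the operations that executed
print(x.grad.shape)   # torch.Size([1, 4])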
Advanced Techniques for Working with torch.autograd.Function
When working with torch.autograd.Function, it is crucial to understand how to optimize and extend its capabilities. Here are some advanced techniques that can further enhance your work with PyTorch’s dynamic computation graphs.
One advanced technique is supporting double backward, which is needed when you want to compute higher-order derivatives (for example, for gradient penalties). For a custom Function, this works as long as the backward() method is itself written with differentiable PyTorch operations: autograd can then build a graph of the backward pass and differentiate through it again.
import torch
from torch.autograd import Function, gradcheck, gradgradcheck

class MyDoubleBackwardFn(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # This backward uses only differentiable tensor ops, so autograd can
        # differentiate through it again when higher-order gradients are requested
        return 3 * x ** 2 * grad_output

# Check both the first- and second-order gradients numerically
x = torch.randn(1, requires_grad=True, dtype=torch.double)
print(gradcheck(MyDoubleBackwardFn.apply, x, eps=1e-6, atol=1e-4))      # True if the first derivative is correct
print(gradgradcheck(MyDoubleBackwardFn.apply, x, eps=1e-6, atol=1e-4))  # True if the second derivative is correct
Another technique is checkpointing, which helps save memory during training. Checkpointing works by trading compute for memory—it recomputes intermediate forward passes during the backward pass to save memory. PyTorch provides a torch.utils.checkpoint utility to implement this technique easily.
import torch
from torch.autograd import Function
from torch.utils.checkpoint import checkpoint

class ExpensiveOperation(Function):
    @staticmethod
    def forward(ctx, x):
        # Stand-in for an operation that is expensive in terms of memory
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return 2 * x * grad_output

x = torch.randn(1, requires_grad=True)

# Using checkpointing to save memory; use_reentrant=False is the mode recommended in recent releases
y = checkpoint(ExpensiveOperation.apply, x, use_reentrant=False)
y.backward()
print(x.grad)  # Same gradient as without checkpointing: 2 * x
For custom functions that take non-tensor inputs (such as a plain Python number or flag), the value can be stashed directly on the ctx object inside forward(). Since no gradient flows to a non-tensor argument, backward() must return None in the corresponding position.
import torch
from torch.autograd import Function

class FunctionWithNonTensorInput(Function):
    @staticmethod
    def forward(ctx, x, power):
        ctx.save_for_backward(x)
        ctx.power = power  # non-tensor input stashed directly on ctx
        return x ** power

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # Return None for the non-tensor input `power`, which receives no gradient
        return grad_output * ctx.power * x ** (ctx.power - 1), None

x = torch.tensor([2.], requires_grad=True)
output = FunctionWithNonTensorInput.apply(x, 3)
output.backward()
print(x.grad)  # Output: tensor([12.]), since the derivative of x^3 is 3*x^2
Lastly, when a custom function is used frequently, it is convenient to wrap its apply call in a small helper function, or simply alias it, so that call sites read like built-in operations and the function is easy to reuse; this is the pattern the PyTorch documentation itself suggests for custom Functions.
import torch

# Wrap (or simply alias) the CustomReLU Function defined earlier so it reads like a built-in op
def custom_relu(x):
    return CustomReLU.apply(x)

custom_relu_alias = CustomReLU.apply  # an even shorter alias also works

x = torch.tensor([-2, -1, 0, 1, 2], dtype=torch.float32, requires_grad=True)
output = custom_relu(x)
output.backward(torch.ones_like(x))
print(x.grad)  # tensor([0., 0., 0., 1., 1.])

