Saving and Loading Models with torch.save and torch.load

In machine learning and deep learning, models are trained on large datasets to learn patterns and make predictions on new data. Once a model is trained, it is crucial to be able to save the model’s state so that it can be reused or shared without having to retrain it from scratch. This process of saving a model’s state is known as model serialization. In PyTorch, serialization is primarily achieved using the torch.save and torch.load functions.

Model serialization is not just about saving the weights of a trained model. It also involves saving the model’s architecture, hyperparameters, and training details that might be necessary for future inference or continued training. Serialization ensures the model can be loaded at a later time or on a different machine with the exact same state it was in when saved.

PyTorch serializes models using Python's built-in pickle module, a Python-specific protocol for serializing and de-serializing object structures. When you save a model in PyTorch using torch.save, the function uses pickle by default to serialize the object to a file (recent PyTorch versions wrap the result in a zip-based archive format). Similarly, torch.load uses pickle to de-serialize the file back into a PyTorch model object.
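
To make the mechanics concrete, here is a minimal round trip; torch.save accepts any picklable object, so a plain tensor serves as a small demonstration:

import torch

# torch.save accepts any picklable object; a plain tensor works as a demo
x = torch.randn(3, 4)
torch.save(x, 'tensor.pth')

# Round trip: the loaded tensor matches the original exactly
y = torch.load('tensor.pth')
print(torch.equal(x, y))  # True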

It is important to note that pickle-based serialization is not inherently secure: deserializing a file from an untrusted source can execute arbitrary code. The PyTorch documentation advises caution when loading models from untrusted sources, and recent versions of torch.load support a weights_only argument that restricts unpickling to tensors and other safe types.
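
As a sketch of the safer pattern, the weights_only argument (available in recent PyTorch releases) rejects arbitrary pickled objects:

# Safer loading: only tensors and other allowed types are unpickled.
# This works for state_dicts; a full pickled model object would be rejected.
state_dict = torch.load('model_state_dict.pth', weights_only=True)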

Understanding model serialization is fundamental for any PyTorch user looking to save their model’s progress, share it with others, or deploy it to production. In the following sections, we will delve into how to save models using torch.save, how to load them with torch.load, and best practices to keep in mind during this process.

Saving Models with torch.save

When saving a model using torch.save, you have the option to save the entire model using Python's pickle module, or just the model's state_dict. The state_dict is a Python dictionary object that maps each layer to its parameter tensor. Saving only the state_dict is generally recommended: you re-instantiate the model architecture in code and load the state_dict into it, which is more modular, more robust to refactoring, and lends itself to fine-tuning or transfer learning (for example, loading a subset of weights into a related architecture).
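
To see what a state_dict actually contains, you can iterate over it; each key is a parameter's dotted name and each value is its tensor:

# Inspect the state_dict: parameter names mapped to tensor shapes
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))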

To save a model’s state_dict, you can use the following code:

torch.save(model.state_dict(), 'model_state_dict.pth')

Alternatively, if you want to save the entire model, you can pass the model object directly:

torch.save(model, 'model.pth')

It is also possible to save more than just the model's state_dict. For instance, you may want to save the optimizer's state_dict, the epoch you stopped at, the most recent loss or accuracy, and so on. This can be useful for resuming training or analyzing the training process later. You can do this by passing a dictionary to torch.save:

torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, 'model_checkpoint.pth')

When saving a model in PyTorch, it is crucial to understand that the serialized file is not standalone: the original model class definition must be importable when the model is loaded back in. This applies to full-model saves in particular, since pickle stores a reference to the class rather than the class itself. Therefore, it's common practice to keep the model class definition in a module that is available to both the saving and the loading code.
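
If you need an artifact that can be loaded without the Python class definition, TorchScript is one option; the sketch below assumes the model is scriptable:

# Export a self-contained TorchScript archive
scripted = torch.jit.script(model)
scripted.save('model_scripted.pt')

# Later, torch.jit.load works without the original class definition
loaded = torch.jit.load('model_scripted.pt')
loaded.eval()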

Overall, torch.save is a versatile function that allows you to save your PyTorch models in a way that best suits your application. Whether you’re saving the full model or just the state_dict, it’s a simple and effective way to serialize your models for later use.

Loading Models with torch.load

Loading models in PyTorch is straightforward with the torch.load function. This function loads a serialized model or state_dict that was previously saved with torch.save. When loading a model's state_dict, you need to initialize the model architecture first, and then load the state_dict into this model. Here's an example of how to load a model's state_dict:

# Initialize the model
model = MyModel()

# Load the state_dict
model.load_state_dict(torch.load('model_state_dict.pth'))

# Set the model to evaluation mode
model.eval()

It is important to call model.eval() if you’re loading the model for inference, as this sets the model to evaluation mode, affecting layers like dropout and batch normalization that behave differently during training and inference.
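
For inference it is also common to wrap the forward pass in torch.no_grad(), which disables gradient tracking and reduces memory use; a minimal sketch with an illustrative input shape:

# Forward pass without building the autograd graph
with torch.no_grad():
    example_input = torch.randn(1, 10)  # hypothetical input shape for MyModel
    output = model(example_input)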

If you saved the entire model object, you can load it back without needing to initialize the model architecture, as long as the model class is importable where you load it. Note that in recent PyTorch releases, where torch.load defaults to weights-only loading, loading a full model object may require passing weights_only=False:

# Load the entire model
model = torch.load('model.pth')

# Set the model to evaluation mode
model.eval()

When loading a checkpoint that includes more than just the model’s state_dict, such as the optimizer’s state_dict and other training metadata, you can load the file as a dictionary and access its contents:

# Load the checkpoint
checkpoint = torch.load('model_checkpoint.pth')

# Initialize the model and optimizer exactly as they were configured for training
model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # same optimizer class and hyperparameters as in training

# Load the model and optimizer state_dicts
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

# Load other training metadata if necessary
epoch = checkpoint['epoch']
loss = checkpoint['loss']
...

It is crucial to match the model architecture and optimizer to those used when the checkpoint was created. If there’s a mismatch, the state_dicts won’t align and you’ll encounter errors when attempting to load them.
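
When the architectures match only partially, say after replacing a classification head, load_state_dict accepts a strict=False flag that loads the parameters it can and reports the rest:

# Non-strict loading returns the keys that could not be matched
result = model.load_state_dict(checkpoint['model_state_dict'], strict=False)
print('Missing keys:', result.missing_keys)
print('Unexpected keys:', result.unexpected_keys)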

One of the best practices when using torch.load is to pass the map_location argument, which lets you map the saved tensors to a different device than the one the model was saved on. This is particularly useful when loading a model saved on a GPU machine while you're working on a CPU-only machine.

# Load the state_dict with map_location
model.load_state_dict(torch.load('model_state_dict.pth', map_location=torch.device('cpu')))

Using torch.load with the map_location argument ensures that the tensors are loaded onto the specified device, enabling seamless device-agnostic model loading.
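
map_location accepts a device string, a torch.device, or a mapping function; the two cases below cover the most common situations:

# Map all tensors to CPU
checkpoint = torch.load('model_checkpoint.pth', map_location='cpu')

# Map to GPU when available, otherwise fall back to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
checkpoint = torch.load('model_checkpoint.pth', map_location=device)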

In summary, torch.load is a powerful function that provides flexibility in loading models for various purposes like inference, continued training, or model analysis. By understanding how to properly load models and checkpoints, you can ensure that your serialized PyTorch models are ready to be utilized whenever needed.

Best Practices for Model Saving and Loading

General Best Practices

When working with model serialization in PyTorch, it’s essential to adhere to some best practices to ensure that your models are saved and loaded correctly and efficiently. Here are a few tips to keep in mind:

  • Always make sure that the model architecture is defined consistently between saving and loading. Any changes in the model’s class definition can prevent the state_dict from being loaded correctly.
  • Keep track of versions of your model definitions and training scripts. This is very important when revisiting models after some time or sharing them with others. Version control can help in replicating the environment in which the model was trained and serialized.
  • Document the model’s architecture, training process, and any special instructions required for loading the model. That’s especially important when sharing models with others or deploying them in different environments.
  • Always use the map_location argument when loading models, particularly when moving between different devices (e.g., from GPU to CPU).
  • Be cautious when loading models from untrusted sources. Deserialize only the models whose source you trust to avoid potential security risks.
  • Save checkpoints regularly during training. This can help in recovering from any unexpected interruptions and also allows you to analyze different stages of the training process; a minimal loop is sketched just below.
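
A minimal sketch of such periodic checkpointing inside a training loop; the train_one_epoch helper and the five-epoch interval are assumptions for illustration:

for epoch in range(num_epochs):
    loss = train_one_epoch(model, optimizer, train_loader)  # hypothetical helper
    if epoch % 5 == 0:  # arbitrary save interval
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
        }, f'checkpoint_epoch_{epoch}.pth')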

Saving and Loading Best Practices in Code

Implementing these best practices in code can often mean the difference between a smooth and a frustrating experience with model serialization. Here are some code snippets that illustrate these practices:

import torch
import torch.nn as nn

# Consistent model definition: a minimal illustrative architecture
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = MyModel()

# Save the model's state_dict
torch.save(model.state_dict(), 'model_state_dict_v1.pth')

# Load the model's state_dict
model = MyModel()  # Ensure the model architecture is the same
model.load_state_dict(torch.load('model_state_dict_v1.pth', map_location='cpu'))

# Documenting the save with additional metadata
torch.save({
    'model_version': '1.0.0',
    'architecture': 'MyModel',
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    # Additional information
}, 'model_with_metadata.pth')

# Loading the model with metadata
checkpoint = torch.load('model_with_metadata.pth', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
print(f"Loaded model version {checkpoint['model_version']} with architecture {checkpoint['architecture']}")

By integrating these best practices into your workflow, you can ensure that your models are saved and loaded in a way that’s robust, secure, and adaptable to future changes in your project or deployment environment.

Examples and Use Cases

Let’s look at some practical examples and use cases where saving and loading models with torch.save and torch.load are essential.

  • When using transfer learning, you might start with a pre-trained model and fine-tune it on your dataset. After fine-tuning, you will want to save the modified model for future use. Here's how you can save the fine-tuned model's state_dict:

    torch.save(fine_tuned_model.state_dict(), 'fine_tuned_model_state_dict.pth')

  • During training, it's good practice to save checkpoints at regular intervals so you can resume training from a known point if needed; a fuller resume-training sketch follows this list. A checkpoint typically includes not just the model's state_dict, but also the optimizer's state, the current epoch, and the loss:

    torch.save({
        'epoch': current_epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': training_loss,
        ...
    }, 'model_checkpoint_epoch_{}.pth'.format(current_epoch))

  • After training, you'll want to evaluate your model on a test dataset. You can load the saved state_dict and run the model in evaluation mode to get the performance metrics:

    # Load the state_dict for evaluation
    model.load_state_dict(torch.load('model_state_dict.pth'))
    model.eval()

    # Perform evaluation with the model
    test_loss, test_accuracy = evaluate_model(model, test_loader)

  • If you need to share your trained model with others, saving the entire model object can be convenient: the recipients can load it directly without re-implementing the loading logic, although the class definition must still be importable on their side:

    # Save the entire model
    torch.save(model, 'shared_model.pth')

    # Someone else can load the model directly
    other_users_model = torch.load('shared_model.pth')
    other_users_model.eval()  # Don't forget to set to evaluation mode!

  • When deploying models to production, you might save the model's state_dict along with other necessary information, such as class indices and pre-processing parameters, in a single file for easier management:

    torch.save({
        'model_state_dict': model.state_dict(),
        'class_to_idx': dataset.class_to_idx,
        'preprocessing': {
            'mean': [0.485, 0.456, 0.406],
            'std': [0.229, 0.224, 0.225]
        }
    }, 'deployable_model.pth')

    # Load the model along with metadata for deployment
    deployment_bundle = torch.load('deployable_model.pth')
    model.load_state_dict(deployment_bundle['model_state_dict'])

These examples illustrate the versatility and importance of torch.save and torch.load in various stages of a machine learning project, from experimentation to production. By effectively using these functions, you can ensure that your models are preserved and can be readily accessed or shared for further analysis, evaluation, or deployment.
