Catalyst 2021 – Accelerated PyTorch 2.0

Sergey Kolesnikov
Catalyst Team
Apr 19, 2021 · 9 min read


Over the last decade, progress in Deep Learning has produced a variety of projects and frameworks. One of the most popular among researchers is PyTorch: thanks to its purely pythonic execution model and great low-level design, it has gathered a lot of attention from the research community. Nevertheless, with great power comes great responsibility: working at such a low level, users are likely to introduce bugs during their research and development process. Meanwhile, Deep Learning methods are now used almost everywhere, from on-site e-commerce recommendations to healthcare treatment and credit scoring, meaning that such… “bugs” could have serious consequences.

For the last three years, Catalyst-Team and collaborators have been working on Catalyst — a high-level PyTorch framework for Deep Learning Research and Development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop. You get metrics, model checkpointing, advanced logging, and distributed training support without boilerplate code and low-level bugs.

In this post, I would like to share our vision on high-level Deep Learning framework API and show current development progress on several examples. All this work is done purely in an open-source way, without any side investments. If you are interested in such open-source Ecosystem development, you could support our initiative or write directly to team@catalyst-team.com for collaboration.

Let’s look at the typical Deep Learning R&D process and try to understand how we could make it better. From my perspective, Deep Learning is about creating some artifact (like a NN model) that effectively transfers your data into some machine-readable vector space. This space should give you an easy way to group or separate your data points and to measure the difference from your target distribution with some predefined metrics. While we have achieved enormous progress in creating different architectures and complex pipelines to transfer data into vector spaces, a few things usually stay the same: the overall pipeline logic, the metrics, and the hardware accelerator wrappers.

Speaking about the first one, I mean the SGD train loop, which usually looks like this:
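In pseudocode, the invariant skeleton looks something like this sketch (the names here are purely illustrative):

```python
# the invariant skeleton of a Deep Learning experiment:
# stages -> epochs -> loaders -> batches
for stage in experiment.stages:            # e.g. pretraining, then finetuning
    for epoch in range(stage.num_epochs):
        for loader in stage.loaders:       # e.g. train, then valid
            for batch in loader:
                handle_batch(batch)        # forward, loss, backward, step
```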

In such a case, you have some Deep Learning experiment that defines the number of stages and epochs in your run, the data and components for each stage, and the logic for handling one batch to train the model. Usually, you could quickly write something like:

PyTorch train loop example
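For illustration, a minimal version of such a loop could look like this sketch (the data and model are toy placeholders):

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# toy data and model; any dataset and architecture would do
X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32)

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for features, targets in loader:
        optimizer.zero_grad()
        logits = model(features)
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
```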

This example shows how great PyTorch is: the code is very simple and easy to follow. Nevertheless, it becomes complicated when your project grows larger: large-project-1 or large-project-2. The same trait that gives you the power of low-level tuning and human-readable code strikes back when there is too much human-readable code.

There is no problem if you work alone on only one project. Still, it becomes quite painful to work in a team on a fully custom pipeline: you have to understand and test a lot of code carefully. Another case is working on many projects simultaneously; it becomes even more interesting if they share a lot in common (say, several classification/segmentation Kaggle competitions at the same time). After a few months of such work, you will definitely want to create a toolset to help you in your career.

From a historical perspective, development progress is fueled by the creation of new abstractions that encapsulate complicated things in simple interfaces, allowing them to be reused much more quickly and easily. And here come the high-level frameworks: even PyTorch is a high-level framework on top of matrix multiplications.

But before we dive deep into Catalyst’s design and principles, I would like to add quick notes on two other things that are the same for any deep learning research: metrics and infrastructure helpers.

From my perspective, metrics are an essential foundation for Deep Learning research: we cannot create new methods if we cannot measure their performance correctly. Thus, creating a unified metrics API with tests, docs, and examples could be a significant contribution to the community.

Finally, with the rise of new hardware accelerators for Deep Learning experiments, it has also become crucial to give researchers simple APIs for operating complex distributed setups and training pipelines. Even more importantly, we must keep an eye on metric computation during distributed training and verify its correctness.

Catalyst

To solve all the challenges above, we created Catalyst — a PyTorch framework for Deep Learning R&D focused on rapid experimentation, reproducibility, and codebase reuse. It comprises a few helpful abstractions:

Runner

Starting from the beginning, the Runner is an abstraction that holds all the logic of your deep learning experiment: the data you are using, the model you are training, the batch handling logic, and everything about the metrics and monitoring systems in use.

Runner abstract code
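As a condensed sketch (the event names follow Catalyst’s Runner hooks, but the body is simplified for illustration):

```python
class IRunner:
    """Holds the whole experiment: data, model, batch handling, metrics, loggers."""

    def __init__(self, stages, loaders, num_epochs):
        self.stages, self.loaders, self.num_epochs = stages, loaders, num_epochs

    # event hooks, dispatched to Callbacks in the real implementation
    def on_experiment_start(self): pass
    def on_stage_start(self): pass
    def on_epoch_start(self): pass
    def on_loader_start(self): pass
    def on_batch_start(self): pass
    def on_batch_end(self): pass
    def on_loader_end(self): pass
    def on_epoch_end(self): pass
    def on_stage_end(self): pass
    def on_experiment_end(self): pass

    def handle_batch(self, batch):
        """User-defined logic for a single batch: forward pass, loss, metrics."""
        raise NotImplementedError

    def run(self):
        # the same stages -> epochs -> loaders -> batches skeleton,
        # with event hooks along the way
        self.on_experiment_start()
        for stage in self.stages:
            self.on_stage_start()
            for epoch in range(self.num_epochs):
                self.on_epoch_start()
                for loader in self.loaders:
                    self.on_loader_start()
                    for batch in loader:
                        self.on_batch_start()
                        self.handle_batch(batch)
                        self.on_batch_end()
                    self.on_loader_end()
                self.on_epoch_end()
            self.on_stage_end()
        self.on_experiment_end()
```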

The Runner plays the most crucial role: it connects all the other abstractions and brings the whole experiment logic together in one place. Most importantly, it does not force you to use Catalyst-only primitives. It gives you a flexible way to choose how much of the high-level API you want to take from the framework.

For example, you could use Catalyst as a bare for-loop wrapper around native PyTorch code, rely on runner.train for a fully-featured pipeline, or go fully declarative with the Config API.

Finally, the Runner architecture does not depend on PyTorch, which opens the door to adaptations for TensorFlow 2 or JAX.
Supported Runners are listed under the Runner API section.

Engine

The Engine is the main force behind the Runner. It defines the logic of hardware communication and the use of different deep learning techniques, such as distributed or mixed-precision training.

Engine abstract code
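A condensed sketch of the idea (the method set follows Catalyst’s Engine, with simplified signatures):

```python
class IEngine:
    """Encapsulates hardware and precision details on behalf of the Runner."""

    def init_components(self, model_fn, criterion_fn, optimizer_fn, scheduler_fn):
        """Creates the components on the right device(s) and wraps them if
        needed, e.g. in DistributedDataParallel or a mixed-precision scaler."""
        raise NotImplementedError

    def zero_grad(self, loss, model, optimizer):
        """optimizer.zero_grad(), possibly per parameter group."""
        raise NotImplementedError

    def backward_loss(self, loss, model, optimizer):
        """loss.backward() in the vanilla case;
        scaler.scale(loss).backward() under AMP."""
        raise NotImplementedError

    def optimizer_step(self, loss, model, optimizer):
        """optimizer.step() in the vanilla case;
        scaler.step(optimizer) under AMP."""
        raise NotImplementedError
```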

Thanks to the Engines’ design, it’s straightforward to adapt your pipeline to different hardware accelerators. For example, you could easily switch between a PyTorch distributed setup, a Nvidia-Apex setup, or an AMP distributed setup. We are also working on support for other hardware accelerators like DeepSpeed, Horovod, or TPU.
You can watch the Engines’ development progress under the Engine API section.

Callback

The Callback is an abstraction that helps you customize the logic during your run. Once again, you could do everything natively with PyTorch and use Catalyst as a for-loop wrapper. However, thanks to the callbacks, it's much easier to reuse typical deep learning extensions like metrics or augmentation tricks. For example, it's much more convenient to define the required metrics with them: see the ML – multiclass classification and ML – RecSys examples.

The Callback API mirrors the main for-loops of our train-loop abstraction:

Callback abstract code
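A condensed sketch: a no-op base class with one hook per for-loop event, which concrete callbacks override as needed.

```python
class ICallback:
    """Hooks into the Runner's events to extend or customize a run."""

    def on_experiment_start(self, runner): pass
    def on_stage_start(self, runner): pass
    def on_epoch_start(self, runner): pass
    def on_loader_start(self, runner): pass   # e.g. reset a metric
    def on_batch_start(self, runner): pass    # e.g. apply an augmentation trick
    def on_batch_end(self, runner): pass      # e.g. update a batch-level metric
    def on_loader_end(self, runner): pass     # e.g. compute and log a loader-level metric
    def on_epoch_end(self, runner): pass      # e.g. checkpoint the model
    def on_stage_end(self, runner): pass
    def on_experiment_end(self, runner): pass
```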

You can find all supported callbacks under the Callback API section.

Metric

Speaking of reusable deep learning components, Catalyst also provides the Metric abstraction for convenient metric computation during an experiment run. Its API is quite simple:

Metric abstraction code
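Condensed, it comes down to three methods (a sketch; the real base class has a few more helpers):

```python
class IMetric:
    """Accumulates statistics per batch and computes the final value."""

    def reset(self) -> None:
        """Clears the accumulated statistics (e.g. at loader start)."""
        raise NotImplementedError

    def update(self, *args, **kwargs):
        """Updates the internal statistics with the current batch."""
        raise NotImplementedError

    def compute(self):
        """Computes the final metric value from the accumulated statistics."""
        raise NotImplementedError
```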

You can find all supported metrics under the Metric API section.

The Catalyst Metric API provides default update and compute methods to support per-batch statistic accumulation and final computation during training. All metrics also support update and compute key-value extensions for convenient usage during the run: they give you the flexibility to store any number of metrics or aggregations you want, with a simple communication protocol to use for their logging.
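For example, a toy mean-value metric under this API could look like the following sketch; the key-value variant returns a dict instead of a scalar, which keeps the logging protocol uniform:

```python
class MeanMetric(IMetric):
    """Toy metric: a running mean over batches."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def reset(self):
        self.total, self.count = 0.0, 0

    def update(self, value: float, num_samples: int) -> float:
        self.total += value * num_samples
        self.count += num_samples
        return value  # batch-level value

    def compute(self) -> float:
        return self.total / max(self.count, 1)  # loader-level value

    # the key-value extension: dicts instead of scalars,
    # so loggers can consume any number of metrics uniformly
    def compute_key_value(self) -> dict:
        return {"mean": self.compute()}
```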

Logger

Finally, speaking about logging: with the latest Catalyst release, 21.xx, we have united monitoring system support under one abstraction:

Logger abstract code
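A condensed sketch of the abstraction (signatures simplified for illustration):

```python
class ILogger:
    """A unified interface over monitoring systems (Tensorboard, MLFlow, ...)."""

    def log_metrics(self, metrics: dict, scope: str, runner) -> None:
        """Logs a dict of metrics at batch/loader/epoch/experiment scope."""
        pass

    def log_image(self, tag: str, image, runner) -> None:
        pass

    def log_hparams(self, hparams: dict, runner) -> None:
        pass

    def flush_log(self) -> None:
        pass

    def close_log(self) -> None:
        pass
```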

With such a simple API, we already provide integrations for the Tensorboard and MLFlow monitoring systems. More advanced loggers for Neptune and Wandb, with artifact and hyperparameter storage, are in development thanks to joint collaborations between our teams.
All currently supported loggers can be found under the Logger API section.

Examples

Combining all abstractions together, it’s straightforward to write complex deep learning pipelines in a compact but user-friendly way.

PyTorch way — for-loop decomposition with Catalyst

Before the Python API examples, I would like to mention that all Catalyst abstractions are fully compatible with native PyTorch and can be used as a simple for-loop wrapper to structure your code better.

CustomRunner — PyTorch for-loop decomposition
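A sketch of this style, assuming the 21.xx dl.Runner API (handle_batch, batch_metrics, is_train_loader) and toy data in place of a real dataset:

```python
import torch
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# toy data in place of a real dataset
X, y = torch.randn(256, 784), torch.randint(0, 10, (256,))
loaders = {"train": DataLoader(TensorDataset(X, y), batch_size=32)}

model = nn.Linear(784, 10)
optimizer = optim.Adam(model.parameters(), lr=2e-3)

class CustomRunner(dl.Runner):
    def handle_batch(self, batch):
        # the batch logic stays plain PyTorch
        features, targets = batch
        logits = self.model(features)
        loss = F.cross_entropy(logits, targets)
        self.batch_metrics.update({"loss": loss})
        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

runner = CustomRunner()
runner.train(model=model, optimizer=optimizer, loaders=loaders, num_epochs=3, verbose=True)
```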

Python API — user-friendly Deep Learning R&D

Linear Regression
Hyperparameters optimization with Optuna
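As a sketch combining both ideas, here is a tiny linear regression trained with runner.train and tuned with Optuna; names like SupervisedRunner and runner.loader_metrics follow the 21.xx API, but treat the details as assumptions:

```python
import optuna
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# toy linear regression data
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
loaders = {"train": DataLoader(TensorDataset(X, y), batch_size=32)}

def objective(trial):
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)  # hyperparameter to tune
    model = nn.Linear(10, 1)
    runner = dl.SupervisedRunner()
    runner.train(
        model=model,
        criterion=nn.MSELoss(),
        optimizer=optim.Adam(model.parameters(), lr=lr),
        loaders=loaders,
        num_epochs=3,
        verbose=False,
    )
    return runner.loader_metrics["loss"]  # last recorded loader loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```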

All the above examples help you write fully compatible PyTorch code without any external mixins. No custom modules or datasets are required: everything works natively with the PyTorch codebase, while Catalyst ties it together in a more readable and reproducible way.

For more advanced examples, like GANs, VAE, or multistage runs (another unique feature of the Catalyst), please follow our minimal examples section.

The Catalyst Python API supports various user-friendly tricks, like overfit, fp16, ddp, and more, to make it easy for you to debug and speed up your R&D. To read more about all these features, please follow our .train documentation. A small example for your interest:

full-featured MNIST example in only 60 lines of code
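A condensed reconstruction of that example, using torchvision’s MNIST for self-containedness; the runner.train flags shown in comments are the tricks mentioned above, with names as in the docs:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from catalyst import dl

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.Adam(model.parameters(), lr=2e-3)

loaders = {
    "train": DataLoader(
        datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor()),
        batch_size=32),
    "valid": DataLoader(
        datasets.MNIST(".", train=False, download=True, transform=transforms.ToTensor()),
        batch_size=32),
}

runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss")
runner.train(
    model=model,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
    # the user-friendly tricks mentioned above:
    # overfit=True,  # overfit a single batch to debug the pipeline
    # fp16=True,     # mixed-precision training
    # ddp=True,      # distributed data-parallel training
)
```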

Config API — from research to production

Last but not least, Catalyst supports two advanced APIs for convenient production-friendly deep learning R&D. With the Config API and the Hydra API, Deep Learning R&D becomes fully reproducible thanks to YAML-based hyperparameter storage.

Config API

Config API examples can be found here. As you can see, the Config API fully mirrors the Runner specification in a YAML-based way, allowing you to change any part of your experiment without any code changes at all.
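To give a flavor, such a config might look roughly like the following; this is an illustrative sketch rather than the exact schema, so please refer to the linked examples for the real field names:

```yaml
model:
  _target_: SimpleNet          # a model registered in your codebase

runner:
  _target_: SupervisedRunner

stages:
  train:
    num_epochs: 10
    loaders:
      batch_size: 32
    criterion:
      _target_: CrossEntropyLoss
    optimizer:
      _target_: Adam
      lr: 0.001
    callbacks:
      accuracy:
        _target_: AccuracyCallback
        input_key: logits
        target_key: targets
```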

Thanks to such hyperparameter storage, it's also very easy to run hyperparameter optimization with catalyst-dl tune. You can find an example of catalyst-dl tune under the Config API minimal examples section. Once again, you could tune any part of your experiment by changing only a few lines in your YAML file. That's it, it's that simple.

Over the last 3 years, we have done an enormous amount of work to accelerate Deep Learning R&D in a purely open-source ecosystem way, thanks to our team and contributors. In this post, we have covered the core framework design principles and a few minimal examples, so you can speed up your Deep Learning with Catalyst and make it fully reproducible.

If you are interested in Catalyst use cases:

If you are interested in Catalyst development:

If you are motivated by our open-source Catalyst ecosystem vision, you could support our initiative or write directly to team@catalyst-team.com for collaboration.
