
Commit

Merge branch 'main' into fix_ci_after_pr_557
cyanguwa authored Jan 19, 2024
2 parents 5157a4a + b4b8ae7 commit d4727cf
Showing 2 changed files with 2 additions and 2 deletions.
docs/examples/advanced_optimizations.ipynb (1 addition, 1 deletion)
@@ -199,7 +199,7 @@
"\n",
"</div>\n",
"\n",
"PyTorch's autograd functionality assumes that a model parameter and its corresponding gradient have the same data type. However, while low-precision data types like FP8 are sufficient for evaluating a neural network's forward and backward passes, the optimization step typically requires full FP32 precision to avoid signficant learning degradation. In addition, Tensor Cores on Hopper GPUs have the option to accumulate matrix products directly into FP32, resulting in better numerical accuracy and avoiding the need for a separate casting kernel. Thus, Transformer Engine provides an option to directly generate FP32 gradients for weight tensors. The FP32 gradients are not output to the parameter's `grad` tensor, but rather to a `main_grad` tensor that must be initialized before the backward pass."
"PyTorch's autograd functionality assumes that a model parameter and its corresponding gradient have the same data type. However, while low-precision data types like FP8 are sufficient for evaluating a neural network's forward and backward passes, the optimization step typically requires full FP32 precision to avoid significant learning degradation. In addition, Tensor Cores on Hopper GPUs have the option to accumulate matrix products directly into FP32, resulting in better numerical accuracy and avoiding the need for a separate casting kernel. Thus, Transformer Engine provides an option to directly generate FP32 gradients for weight tensors. The FP32 gradients are not output to the parameter's `grad` tensor, but rather to a `main_grad` tensor that must be initialized before the backward pass."
]
},
{
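The notebook paragraph above describes initializing a `main_grad` tensor before the backward pass. As a rough illustration only, here is a minimal PyTorch sketch of that workflow; it is not part of this commit, and it assumes that `transformer_engine.pytorch` provides a `te.Linear` module accepting a `fuse_wgrad_accumulation` option (the exact argument name and defaults are assumptions and may differ between Transformer Engine versions).

```python
# Minimal sketch of the main_grad workflow described above (assumptions noted in the lead-in).
import torch
import transformer_engine.pytorch as te

# Assumed flag: fuse_wgrad_accumulation enables FP32 weight-gradient accumulation.
layer = te.Linear(1024, 1024, fuse_wgrad_accumulation=True).cuda()

# Initialize an FP32 main_grad buffer for each parameter before the backward pass,
# as the notebook text requires; gradients accumulate here instead of in param.grad.
for param in layer.parameters():
    param.main_grad = torch.zeros_like(param, dtype=torch.float32)

x = torch.randn(32, 1024, device="cuda")
layer(x).sum().backward()

# An FP32-aware optimizer would then read param.main_grad rather than param.grad.
```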
docs/examples/quickstart.ipynb (1 addition, 1 deletion)
@@ -9,7 +9,7 @@
"\n",
"## Overview\n",
"\n",
"Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, providing better performance with lower memory utilization in both training and inference. It provides support for 8-bit floating point (FP8) precision on Hopper GPUs, implements a collection of highly optimized building blocks for popular Transformer architectures, and exposes an automatic-mixed-precision-like API that can be used seamlessy with your PyTorch code. It also includes a framework-agnostic C++ API that can be integrated with other deep learning libraries to enable FP8 support for Transformers.\n",
"Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, providing better performance with lower memory utilization in both training and inference. It provides support for 8-bit floating point (FP8) precision on Hopper GPUs, implements a collection of highly optimized building blocks for popular Transformer architectures, and exposes an automatic-mixed-precision-like API that can be used seamlessly with your PyTorch code. It also includes a framework-agnostic C++ API that can be integrated with other deep learning libraries to enable FP8 support for Transformers.\n",
"\n",
"## Let's build a Transformer layer!\n",
"\n",
