Internal change

PiperOrigin-RevId: 527383364
Change-Id: I547671eaad4f979d0503a61b4e846c56dd7b2f01
google · Apr 26, 2023 · ac3b374
1 parent 18392cf
Showing 4 changed files with 560 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -44,9 +44,8 @@ and closing the gap between simulation and the real world.
 Explore Brax easily and quickly through a series of colab notebooks:
 
 * [Brax Basics](https://colab.research.google.com/github/google/brax/blob/main/notebooks/basics.ipynb) introduces the Brax API, and shows how to simulate basic physics primitives.
-* [Brax Training](https://colab.research.google.com/github/google/brax/blob/main/notebooks/training.ipynb)
-introduces the Brax v2 API, and shows how to train a policy with the
-generalized backend.
+* [Brax Training](https://colab.research.google.com/github/google/brax/blob/main/notebooks/training.ipynb) introduces Brax's training algorithms, and lets you train your own policies directly within the colab. It also demonstrates loading and saving policies.
+* [Brax Training with PyTorch on GPU](https://colab.research.google.com/github/google/brax/blob/main/notebooks/training_torch.ipynb) demonstrates how Brax can be used in other ML frameworks for fast training, in this case PyTorch.
 
 ## Using Brax Locally
 

diff --git a/brax/training/agents/es/train.py b/brax/training/agents/es/train.py
@@ -200,7 +200,7 @@ def compute_delta(
     Returns:
 
     """
-    # NOTE - -> len(weights) * perturbation_std" is
+    # NOTE: The trick "len(weights) -> len(weights) * perturbation_std" is
     # equivalent to tuning the l2_coef.
     weights = jnp.reshape(weights, ([population_size] + [1] * (noise.ndim - 1)))
     delta = jnp.sum(noise * weights, axis=0) / population_size
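
Reviewer note: the two context lines above reshape `weights` so it broadcasts across the trailing axes of `noise`, then average the weighted noise over the population. A minimal standalone sketch of that step, assuming a single parameter array rather than a full parameter tree; the function name `weighted_noise_delta` and the sample values are hypothetical and not part of this commit:

```python
import jax.numpy as jnp


def weighted_noise_delta(noise, weights):
    """Average of noise samples, each scaled by its per-member weight.

    noise:   array of shape (population_size, *param_shape)
    weights: array of shape (population_size,)
    """
    population_size = noise.shape[0]
    # Reshape weights to (population_size, 1, ..., 1) so that they
    # broadcast over every trailing parameter axis of `noise`.
    weights = jnp.reshape(weights, [population_size] + [1] * (noise.ndim - 1))
    # Sum over the population axis and normalize by the population size.
    return jnp.sum(noise * weights, axis=0) / population_size


# Hypothetical example: 3 population members, 2 parameters each.
noise = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
weights = jnp.array([1.0, 0.0, -1.0])
delta = weighted_noise_delta(noise, weights)
```

Here `delta` has the shape of a single parameter array, `(2,)`, because the population axis has been summed out.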

diff --git a/brax/v1/experimental/composer/agent_utils.py b/brax/v1/experimental/composer/agent_utils.py
@@ -33,7 +33,7 @@
        e.g. equivalent to agent1=(..., action_agents=('agent1',), ...)
 
 agent_groups currently defines which rewards/actions belong to which agent.
-observation is the same among all agents (TODO -.
+observation is the same among all agents (TODO: add optionality).
 """
 
 from collections import OrderedDict as odict