Implement RL model for basic SubjuGator tasks #1325
I met with Mohana, Will, Daniel, and Keith about our future steps. I have concluded that we should use Stable Baselines3 and implement PPO through it, and that would serve as the way we train and evaluate the model. The next goal is to read through the research papers outlined in Notion and come up with what we should feed into the model. I have outlined the following for Keith to provide. This is the information that we, for now, think we should be sending to the PPO algorithm:
- Possible actions for the robot: this can be anything that the robot does, like moving forward, backward, sideways, etc.
- States for the robot: I need information about what the robot is measuring.
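As a rough illustration of how those two items map onto what Stable Baselines3's PPO actually consumes (the dimensions and bounds below are placeholders, not decisions we have made), the actions become an `action_space` and the measured states become an `observation_space`:

```python
import numpy as np
from gymnasium import spaces  # older Stable Baselines3 versions use `gym` instead

# "Possible actions for the robot": e.g. normalized surge, sway, heave, and yaw commands
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

# "States for the robot": whatever the robot measures (pose, velocities, depth, ...)
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)
```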
Over this week, I met with Danush, Mohana, and others to discuss the idea of using reinforcement learning algorithms to train SubjuGator as an alternative to writing missions manually. As Danush mentioned in his comment, we have decided that PPO would be the most suitable training algorithm for us to use, as opposed to TRPO or GRPO. Although the PPO algorithm is abstracted away by Stable Baselines3, the best first step for us is to start reading through the research papers on the Notion so that we can understand how it works, and I have started on the PPO paper this week. Also, this is just my input, but I think it might be best to start by integrating the model with something simple like the ROS 2 turtlesim as a test run, before trying to integrate it with something as complex as SubjuGator.
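For anyone else starting the paper, the main objective PPO maximizes is the clipped surrogate, where $r_t(\theta)$ is the probability ratio between the new and old policies and $\hat{A}_t$ is the advantage estimate:

$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$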
During this week, I was also able to meet with the team and was introduced to the idea of using a modified version of the PPO reinforcement learning algorithm. To gain background on the algorithm and on reinforcement learning in general, I began by reading the PPO research paper and found how each of the requirements (robot states/behaviors) that Danush mentioned fits into and affects the algorithm. Furthermore, I was able to understand the concept of the adaptive KL penalty coefficient, which is something we talked about implementing with SubjuGator. I also agree with Will that we can test this with turtlesim first, as it will be easier to implement, and we can then migrate the software to the sub. Furthermore, I am not sure whether we have to build the model completely from scratch or whether there is open-source software we can build on and optimize, but if we are going to try to optimize it in the future, we could benefit from starting to think about areas we can optimize.
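For reference, the adaptive KL penalty variant described in the paper optimizes the surrogate objective minus a KL term,

$$
L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}\,\hat{A}_t \;-\; \beta\, \mathrm{KL}\!\left[\pi_{\theta_{\text{old}}}(\cdot \mid s_t),\; \pi_\theta(\cdot \mid s_t)\right]\right]
$$

and after each policy update the coefficient $\beta$ is adapted: if the measured KL divergence falls below the target $d_{\text{targ}}/1.5$, $\beta$ is halved; if it exceeds $d_{\text{targ}} \times 1.5$, $\beta$ is doubled.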
This week, I met with Danush and Mohana and made some more progress on planning the implementation of our RL algorithm. With Danush, I continued reviewing the PPO paper, annotating it and breaking down each equation into its component parts to understand it. I read over the stable-baselines3 documentation and example code with Mohana, and it seems that it's possible for us to start training RL agents without fully understanding the PPO algorithm, since the library already has it implemented. However, when we get to writing the paper, we will need a solid understanding of the algorithm anyway, so in my opinion it would be better to get a full grasp of it now, since that should also make implementing it easier. Besides that, we also found a framework for integrating ROS with a reinforcement learning environment at https://github.com/ncbdrck/realros. It uses ROS Noetic with Ubuntu 20.04, however, so it might need some adapting.
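To make that point concrete, training with the library looks roughly like the quickstart in the stable-baselines3 documentation; the environment below is just CartPole as a stand-in until we have our own, and the saved filename is hypothetical:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder environment until our own Gym environment for the sub exists.
env = gym.make("CartPole-v1")

# PPO is already implemented; we only choose the policy network and hyperparameters.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
model.save("ppo_test")  # hypothetical filename

# Evaluate using the vectorized wrapper SB3 puts around the env.
vec_env = model.get_env()
obs = vec_env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
```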
We have identified a plan. We are applying this to the goal of circling a buoy without touching it.
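As a sketch only (none of these terms, weights, or function names are decided; they just illustrate the kind of signal PPO would need for this task), a shaped reward for "circle the buoy without touching it" could combine distance to the desired orbit, angular progress around the buoy, and a collision penalty:

```python
import numpy as np

def circling_reward(sub_pos, buoy_pos, prev_angle, angle, collision, target_radius=2.0):
    """Hypothetical reward shaping for circling a buoy without touching it."""
    dist = np.linalg.norm(sub_pos - buoy_pos)
    # Reward staying near the desired orbit radius.
    radius_term = -abs(dist - target_radius)
    # Reward angular progress around the buoy (difference wrapped to [-pi, pi]).
    progress = np.arctan2(np.sin(angle - prev_angle), np.cos(angle - prev_angle))
    # Large penalty for touching the buoy.
    collision_term = -100.0 if collision else 0.0
    return radius_term + 5.0 * progress + collision_term
```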
I met with Danush and Mohana this week, and we created the plan that Danush outlined in his previous message. We have made a new repository at https://github.com/danushsingla/RL_MIL, in which we are planning to execute the previously mentioned plan. To add to his plan, we are abandoning the idea of using turtlesim to start testing the PPO model and are going straight to Gazebo. I have gotten started on the first step of the plan (making a basic subscriber topic). I've written the Python file for it, but I still need to finish setting up a workspace and package for it in the new repo. Also, we should probably move this issue to the mil2 repository.
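For anyone following along, the basic subscriber is roughly the standard rclpy pattern below; the topic name and message type here are placeholders rather than the exact ones used in the repo:

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


class CameraSubscriber(Node):
    def __init__(self):
        super().__init__("camera_subscriber")
        # Topic name is a placeholder; the sub's actual camera topic may differ.
        self.subscription = self.create_subscription(
            Image, "/camera/image_raw", self.listener_callback, 10
        )

    def listener_callback(self, msg):
        self.get_logger().info(f"Received image {msg.width}x{msg.height}")


def main():
    rclpy.init()
    node = CameraSubscriber()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```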
I met with Danush and William to discuss our plan for implementing the RL model. This upcoming week, I plan to look through the Gym environment files and understand how to make our own custom environment. I will be writing a subscriber script that can subscribe to the camera topics and eventually feed the data through a convolutional neural network to filter it and use it for our environment.
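A possible skeleton for that custom environment (the image size, ROS plumbing, and reward below are placeholders, not decisions we have made) would expose the camera frames as an image observation space, which also lets SB3's CnnPolicy handle the convolutional part if we go that route:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class CameraEnv(gym.Env):
    """Hypothetical environment skeleton built around camera observations."""

    def __init__(self, get_image, send_command):
        super().__init__()
        self._get_image = get_image          # callable returning the latest camera frame
        self._send_command = send_command    # callable sending a velocity command
        # Image observations (height, width, channels are placeholders).
        self.observation_space = spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)
        # Continuous thrust/velocity commands (dimension is a placeholder).
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self._get_image(), {}

    def step(self, action):
        self._send_command(action)
        obs = self._get_image()
        reward = 0.0  # task-specific reward (e.g. buoy circling) goes here
        return obs, reward, False, False, {}
```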
I met with Mohana and Will this week as we start getting deeper into the programming. We have had some issues setting up the workflow, as well as an issue getting the subscriber to work. It seems the problem might be with the publisher rather than on our end. This is something we will have to discuss with the rest of the team.
I met with Mohana and Danush this week, and when we met in person we couldn't get the subscriber to work, or, for that matter, get anything published to the image topic itself. After some debugging on my own, I realized that the Gazebo simulation starts out automatically paused (oops). After verifying that the subscriber was receiving the image from the topic once the simulation was unpaused, I added OpenCV code that periodically displays the original subscribed image, along with a simple max pooling algorithm that extracts key features from the image so it runs faster when passed into the RL model. Next week we should decide whether we need to use a CNN or just the max pooling we have now, and we should also start trying to build the Gym environment. All the changes are visible at https://github.com/danushsingla/RL_MIL.
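For reference, the max pooling step is essentially block-wise downsampling; a minimal NumPy version (the kernel size and exact image handling in the repo may differ) looks like:

```python
import numpy as np

def max_pool(image: np.ndarray, k: int = 4) -> np.ndarray:
    """Downsample an H x W x C image by taking the max over k x k blocks."""
    h, w, c = image.shape
    h, w = h - h % k, w - w % k                      # crop so dimensions divide evenly
    blocks = image[:h, :w].reshape(h // k, k, w // k, k, c)
    return blocks.max(axis=(1, 3))                   # shape: (h // k, w // k, c)
```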
I was able to meet up with William and Danush again this week and try to get a basic subscriber to the image topic up and running. Initially, when I ran the simulation on my computer, I realized that it wasn't set up properly, and I had to rerun install.sh after pulling from the mil2 GitHub. Next week, I can start working on building the Gym environment so that we have a framework ready when we get the information from the max pooling algorithm.
What needs to change?
We need to implement a basic RL algorithm for SubjuGator using the ROS 2 simulation so that the robot can automatically learn how to perform tasks.
How would this task be tested?