
Implement RL model for basic SubjuGator tasks #1325

Open
danushsingla opened this issue Feb 9, 2025 · 10 comments

@danushsingla

What needs to change?

We need to implement a basic RL algorithm for SubjuGator using the ROS2 simulation so that the robot can automatically learn how to perform tasks.

How would this task be tested?

  1. Download the Python dependencies
  2. Run a stable_baselines3 script that connects to the ROS2 simulation (a rough sketch is below)
  3. Analyze stats about training/evaluation
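
For illustration, a minimal Stable-Baselines3 PPO script of the kind described in step 2 might look like the sketch below; the environment ID is just a placeholder standing in for a future ROS2-backed Gym environment.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Placeholder environment; the real setup would register a custom Gym env
# that talks to the ROS2 simulation instead of Pendulum-v1.
env = gym.make("Pendulum-v1")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)

# Step 3: analyze training/evaluation stats.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```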
@danushsingla
Author

I met with Mohana, Will, Daniel, and Keith about our future steps.

I have concluded that we should use Stable-Baselines3 and implement PPO through it; that would serve as the way we train and evaluate the model. The next goal is to read through the research papers outlined in Notion and figure out what we should feed into the model.

I have outlined the following for Keith to provide. This is the information we currently think we should be sending to the PPO algorithm.

Possible actions for the robot

- This can be anything the robot does, like moving forward, backward, sideways, etc.
- If a movement is continuous (like different thrust levels for moving forward), please state that.

States for the robot

- I need information about what the robot is measuring.
- This can be something as simple as speed or the distance from the target.
- I essentially need to represent the entire ROS2 world as a matrix of numbers.
- For these states, I also need possible ranges for each value. If the robot has a top speed of 100 mph, state that along with its minimum value (which is negative if it can go backwards).
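
To make this concrete, here is a rough sketch of how those actions and states would eventually map onto Gym spaces; every dimension and bound below is a made-up placeholder until we have the real ranges.

```python
import numpy as np
from gymnasium import spaces

# Continuous actions, e.g. normalized thrust commands in surge, sway, heave, and yaw.
# The 4 dimensions and the [-1, 1] bounds are placeholders, not real SubjuGator limits.
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

# Observations: each entry needs the min/max described above.
# Example placeholders: speed in m/s and distance to the target in m.
observation_space = spaces.Box(
    low=np.array([-2.0, 0.0], dtype=np.float32),   # min speed (negative = reverse), min distance
    high=np.array([2.0, 50.0], dtype=np.float32),  # max speed, max distance
)
```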

@danushsingla danushsingla self-assigned this Feb 9, 2025
@willzoo willzoo self-assigned this Feb 9, 2025
@mohana-pamidi mohana-pamidi self-assigned this Feb 9, 2025
@willzoo

willzoo commented Feb 10, 2025

Over this week, I met with Danush, Mohana, and others to discuss the idea of using reinforcement learning algorithms to train SubjuGator as an alternative to writing missions manually. As Danush mentioned in his comment, we have decided on PPO as the training algorithm that would be the most optimal for us to use, as opposed to TRPO or GRPO. Although the PPO algorithm is abstracted away by Stable-Baselines3, the best first step for us is to read through the research papers on the Notion so that we can get an understanding of how it works, and I have started on the PPO paper this week. Also, this is just my input, but I think it might be best to start by integrating the model with something simple like the ROS2 turtlesim as a test run, before trying to integrate it with something as complex as SubjuGator.

@mohana-pamidi

During this week, I was also able to meet with the team and was introduced to the idea of using a modified version of the PPO reinforcement learning algorithm. To gain background on the algorithm and reinforcement learning in general, I began by reading the PPO research paper and found how each of the requirements (robot states/behaviors) that Danush mentioned fits into and affects the algorithm. I also came to understand the concept of the adaptive KL penalty coefficient, which is something we talked about implementing with SubjuGator. I agree with Will that we can test this with turtlesim first, since it will be easier to implement and we can then migrate the software to the sub. Furthermore, I am not sure whether we have to build the model completely from scratch or whether there is open-source software we can build from and optimize, but if we are going to try to optimize it in the future somehow, we could benefit from starting to think of areas we can optimize.
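
For reference, the adaptive KL penalty rule from the PPO paper adjusts the coefficient based on how far the measured KL divergence strays from a target value; a minimal sketch (the function and variable names are mine, not from any of our code):

```python
def update_kl_coefficient(beta: float, measured_kl: float, target_kl: float) -> float:
    """Adaptive KL penalty update from the PPO paper (Schulman et al., 2017).

    beta is the penalty coefficient in the KL-penalized objective; measured_kl is the
    mean KL divergence between the old and new policies after an update; target_kl is
    the desired divergence d_targ.
    """
    if measured_kl < target_kl / 1.5:
        beta /= 2.0   # policy changed too little -> relax the penalty
    elif measured_kl > target_kl * 1.5:
        beta *= 2.0   # policy changed too much -> tighten the penalty
    return beta
```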

@willzoo

willzoo commented Feb 17, 2025

This week, I met with Danush and Mohana and made some more progress on planning the implementation of our RL algorithm. With Danush, I continued reviewing the PPO paper, annotating it and breaking down each equation into its component parts to understand it. I read over the stable-baselines3 documentation and example code with Mohana, and it seems that it's possible for us to start training RL agents without actually understanding the PPO algorithm, since the library already has it implemented. However, when we get to writing the paper, we will need a solid understanding of the algorithm anyway, so it would be better, in my opinion, to get a full grasp of it now, since that should also make implementing it easier. Besides that, we also found a framework for integrating ROS with a reinforcement learning environment at https://github.com/ncbdrck/realros. It uses ROS Noetic with Ubuntu 20.04, however, so it might need some adapting.
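
For anyone following along with the equation breakdown, the central equation from the PPO paper is the clipped surrogate objective, where $\hat{A}_t$ is the advantage estimate, $\epsilon$ is the clipping parameter (0.2 in the paper), and $r_t(\theta)$ is the probability ratio between the new and old policies:

$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$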

@danushsingla
Author

We have identified a plan:

  1. Create a basic subscriber for the camera topic to get camera data
  2. Condense that camera data using convolutions
  3. Create the Gym environment (gym.make or a custom env) and configure it for the camera data, actions, and reward/punishment (a rough skeleton is sketched below)
  4. Run some tests and see what works and what doesn't

We are applying this to the goal of circling a buoy without touching it.
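
For step 3, a very rough sketch of what a custom Gym environment skeleton for the buoy task could look like (class name, shapes, bounds, and the reward are all placeholders, and the ROS2 hooks are stubbed out):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class BuoyCircleEnv(gym.Env):
    """Sketch of a custom environment for the circle-the-buoy task.

    The observation is a pooled camera frame; the action is a continuous
    thrust command. Everything here is illustrative, not final.
    """

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(0, 255, shape=(120, 160, 3), dtype=np.uint8)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self._get_camera_observation(), {}

    def step(self, action):
        self._send_thrust_command(action)   # publish to the ROS2 sim (not implemented here)
        obs = self._get_camera_observation()
        reward = self._compute_reward()     # e.g. progress around the buoy, penalty on contact
        terminated = False                  # e.g. True on collision with the buoy
        truncated = False                   # e.g. True after a time limit
        return obs, reward, terminated, truncated, {}

    # The helpers below are stubs; real versions would use the camera subscriber
    # and thruster interfaces from the ROS2 workspace.
    def _get_camera_observation(self):
        return np.zeros(self.observation_space.shape, dtype=np.uint8)

    def _send_thrust_command(self, action):
        pass

    def _compute_reward(self):
        return 0.0
```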

@willzoo

willzoo commented Feb 24, 2025

I met with Danush and Mohana this week, and we created the plan that Danush outlined in his previous message. We have made a new repository at https://github.com/danushsingla/RL_MIL, in which we plan to carry out that plan. To add to it, we are abandoning the idea of using turtlesim to start testing the PPO model and are going straight to Gazebo. I have gotten started on the first step (making a basic camera subscriber): I've written the Python file for it, but I still need to finish setting up a workspace and package for it in the new repo. Also, we should probably move this issue to the mil2 repository.
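
For context, a rough sketch of what a minimal ROS2 camera subscriber like that could look like (the node and topic names below are placeholders, not necessarily what is in the repo):

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge


class CameraSubscriber(Node):
    """Subscribes to a camera topic and converts frames to OpenCV images."""

    def __init__(self):
        super().__init__("camera_subscriber")
        self.bridge = CvBridge()
        # Topic name is a placeholder; the real sim topic may differ.
        self.subscription = self.create_subscription(
            Image, "/camera/image_raw", self.image_callback, 10
        )

    def image_callback(self, msg: Image):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        self.get_logger().info(f"Received frame with shape {frame.shape}")


def main():
    rclpy.init()
    node = CameraSubscriber()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```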

@mohana-pamidi

I met with Danush and William to discuss our plan for implementing the RL model. This upcoming week, I plan to look through the Gym environment files and understand how to make our own custom environment. I will be writing a subscriber script that subscribes to the camera topics and eventually passes the data through a convolutional neural network to filter it for use in our environment.

@danushsingla
Author

Met with Mohana and Will this week as we start getting deeper into programming. We have had some issues setting up the workflow, and another issue with getting the subscriber to work. It seems that there might be an issue with the publisher, not on our end. This is something we will have to discuss with the rest of the team.

@willzoo

willzoo commented Mar 3, 2025

I met with Mohana and Danush this week, and when we met in person we couldn't get the subscriber to work, or, for that matter, the image topic itself to have anything published to it. After some debugging on my own, I realized that the Gazebo simulation starts out automatically paused (oops). After verifying that the subscriber was receiving the image from the topic once the simulation was unpaused, I added OpenCV code that periodically displays the original subscribed image, plus a simple max pooling algorithm that extracts key features from the image so it can run faster when passed into the RL model. Next week we should decide whether we need a CNN or just the max pooling we have now, and we should also start trying to build the gym environment. All the changes are visible at https://github.com/danushsingla/RL_MIL.
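
For reference, a minimal sketch of the kind of max pooling downsample described above (the pool size and frame shape are illustrative, not the exact values in the repo):

```python
import numpy as np

def max_pool(image: np.ndarray, pool: int = 4) -> np.ndarray:
    """Downsample an image by taking the max over non-overlapping pool x pool blocks.

    image: H x W x C array (e.g. a BGR frame from cv_bridge).
    pool: block size; larger values shrink the observation more aggressively.
    """
    h, w, c = image.shape
    h_trim, w_trim = h - h % pool, w - w % pool   # drop edge pixels that don't fill a block
    blocks = image[:h_trim, :w_trim].reshape(h_trim // pool, pool, w_trim // pool, pool, c)
    return blocks.max(axis=(1, 3))

# Example: a 480x640 camera frame pooled with pool=4 becomes 120x160.
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
small = max_pool(frame, pool=4)
print(small.shape)  # (120, 160, 3)
```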

@mohana-pamidi

mohana-pamidi commented Mar 3, 2025

I was able to meet up with William and Danush again this week and try to get a basic subscriber to the image topic up and running. Initially, when I ran the simulation on my computer, I realized that it wasn't set up properly and I had to rerun install.sh after pulling from the mil2 GitHub. Next week, I can start working on building the OpenAI Gym environment so that we have a framework ready once we get the information from the max pooling algorithm.
