Implement RL model for basic SubjuGator tasks #1325
I met with Mohana, Will, Daniel, and Keith about our future steps. I have concluded that we should use Stable Baselines3 and implement PPO through it, and that would serve as the way we train and evaluate the model. The next goal is to read through the research papers outlined in Notion and come up with what we should feed into the model. I have outlined the following for Keith to provide. This is the information that we, for now, think we should be sending to the PPO algorithm:
- Possible actions for the robot: this can be anything that the robot does, like moving forward, backward, sideways, etc.
- States for the robot: I need information about what the robot is measuring.
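As a rough illustration of how those two items map onto what Stable Baselines3's PPO actually consumes (the dimensions and bounds below are placeholders, not decisions we have made), the actions become an `action_space` and the measured states become an `observation_space`:

```python
import numpy as np
from gymnasium import spaces  # older Stable Baselines3 versions use `gym` instead

# "Possible actions for the robot": e.g. normalized surge, sway, heave, and yaw commands
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

# "States for the robot": whatever the robot measures (pose, velocities, depth, ...)
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)
```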
Over this week, I met with Danush, Mohana, and others to discuss the idea of using reinforcement learning algorithms to train SubjuGator as an alternative to writing missions manually. As Danush mentioned in his comment, we have decided that PPO would be the most suitable training algorithm for us to use, as opposed to TRPO or GRPO. Although the PPO algorithm is abstracted away by Stable Baselines3, the best first step for us is to start reading through the research papers on the Notion so that we can understand how it works, and I have started on the PPO paper this week. Also, this is just my input, but I think it might be best to start by integrating the model with something simple like the ROS 2 turtlesim as a test run, before trying to integrate it with something as complex as SubjuGator.
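For anyone else starting the paper, the main objective PPO maximizes is the clipped surrogate, where $r_t(\theta)$ is the probability ratio between the new and old policies and $\hat{A}_t$ is the advantage estimate:

$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$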
During this week, I was also able to meet with the team and was introduced to the idea of using a modified version of the PPO reinforcement learning algorithm. To gain background on the algorithm and on reinforcement learning in general, I began by reading the PPO research paper and found how each of the requirements (robot states/behaviors) that Danush mentioned fits into and affects the algorithm. Furthermore, I was able to understand the concept of the adaptive KL penalty coefficient, which is something we talked about implementing with SubjuGator. I also agree with Will that we can test this with turtlesim first, as it will be easier to implement, and we can then migrate the software to the sub. Furthermore, I am not sure whether we have to build the model completely from scratch or whether there is open-source software we can build on and optimize, but if we are going to try to optimize it in the future, we could benefit from starting to think about areas we can optimize.
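For reference, the adaptive KL penalty variant described in the paper optimizes the surrogate objective minus a KL term,

$$
L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}\,\hat{A}_t \;-\; \beta\, \mathrm{KL}\!\left[\pi_{\theta_{\text{old}}}(\cdot \mid s_t),\; \pi_\theta(\cdot \mid s_t)\right]\right]
$$

and after each policy update the coefficient $\beta$ is adapted: if the measured KL divergence falls below the target $d_{\text{targ}}/1.5$, $\beta$ is halved; if it exceeds $d_{\text{targ}} \times 1.5$, $\beta$ is doubled.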
This week, I met with Danush and Mohana and made some more progress on planning the implementation of our RL algorithm. With Danush, I continued reviewing the PPO paper, annotating it and breaking down each equation into its component parts to understand it. I read over the stable-baselines3 documentation and example code with Mohana, and it seems that it's possible for us to start training RL agents without fully understanding the PPO algorithm, since the library already has it implemented. However, when we get to writing the paper, we will need a solid understanding of the algorithm anyway, so in my opinion it would be better to get a full grasp of it now, since that should also make implementing it easier. Besides that, we also found a framework for integrating ROS with a reinforcement learning environment at https://github.com/ncbdrck/realros. It uses ROS Noetic with Ubuntu 20.04, however, so it might need some adapting.
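To make that point concrete, training with the library looks roughly like the quickstart in the stable-baselines3 documentation; the environment below is just CartPole as a stand-in until we have our own, and the saved filename is hypothetical:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder environment until our own Gym environment for the sub exists.
env = gym.make("CartPole-v1")

# PPO is already implemented; we only choose the policy network and hyperparameters.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
model.save("ppo_test")  # hypothetical filename

# Evaluate using the vectorized wrapper SB3 puts around the env.
vec_env = model.get_env()
obs = vec_env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
```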
We have identified a plan. We are applying this to the goal of circling a buoy without touching it.
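As a sketch only (none of these terms, weights, or function names are decided; they just illustrate the kind of signal PPO would need for this task), a shaped reward for "circle the buoy without touching it" could combine distance to the desired orbit, angular progress around the buoy, and a collision penalty:

```python
import numpy as np

def circling_reward(sub_pos, buoy_pos, prev_angle, angle, collision, target_radius=2.0):
    """Hypothetical reward shaping for circling a buoy without touching it."""
    dist = np.linalg.norm(sub_pos - buoy_pos)
    # Reward staying near the desired orbit radius.
    radius_term = -abs(dist - target_radius)
    # Reward angular progress around the buoy (difference wrapped to [-pi, pi]).
    progress = np.arctan2(np.sin(angle - prev_angle), np.cos(angle - prev_angle))
    # Large penalty for touching the buoy.
    collision_term = -100.0 if collision else 0.0
    return radius_term + 5.0 * progress + collision_term
```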
I met with Danush and Mohana this week, and we created the plan that Danush outlined in his previous message. We have made a new repository at https://github.com/danushsingla/RL_MIL, in which we are planning to execute the previously mentioned plan. To add to his plan, we are abandoning the idea of using turtlesim to start testing the PPO model and are going straight to Gazebo. I have gotten started on the first step of the plan (making a basic subscriber topic). I've written the Python file for it, but I still need to finish setting up a workspace and package for it in the new repo. Also, we should probably move this issue to the mil2 repository.
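For anyone following along, the basic subscriber is roughly the standard rclpy pattern below; the topic name and message type here are placeholders rather than the exact ones used in the repo:

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


class CameraSubscriber(Node):
    def __init__(self):
        super().__init__("camera_subscriber")
        # Topic name is a placeholder; the sub's actual camera topic may differ.
        self.subscription = self.create_subscription(
            Image, "/camera/image_raw", self.listener_callback, 10
        )

    def listener_callback(self, msg):
        self.get_logger().info(f"Received image {msg.width}x{msg.height}")


def main():
    rclpy.init()
    node = CameraSubscriber()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```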
I met with Danush and William to discuss our plan for implementing the RL model. This upcoming week, I plan to look through the Gym environment files and understand how to make our own custom environment. I will be writing a subscriber script that can subscribe to the camera topics and eventually feed the data through a convolutional neural network to filter it and use it for our environment.
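A possible skeleton for that custom environment (the image size, ROS plumbing, and reward below are placeholders, not decisions we have made) would expose the camera frames as an image observation space, which also lets SB3's CnnPolicy handle the convolutional part if we go that route:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class CameraEnv(gym.Env):
    """Hypothetical environment skeleton built around camera observations."""

    def __init__(self, get_image, send_command):
        super().__init__()
        self._get_image = get_image          # callable returning the latest camera frame
        self._send_command = send_command    # callable sending a velocity command
        # Image observations (height, width, channels are placeholders).
        self.observation_space = spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)
        # Continuous thrust/velocity commands (dimension is a placeholder).
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self._get_image(), {}

    def step(self, action):
        self._send_command(action)
        obs = self._get_image()
        reward = 0.0  # task-specific reward (e.g. buoy circling) goes here
        return obs, reward, False, False, {}
```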
I met with Mohana and Will this week as we start getting deeper into the programming. We have had some issues setting up the workflow, as well as an issue getting the subscriber to work. It seems the problem might be with the publisher rather than on our end. This is something we will have to discuss with the rest of the team.
I met with Mohana and Danush this week, and when we met in person we couldn't get the subscriber to work, or, for that matter, get anything published to the image topic itself. After some debugging on my own, I realized that the Gazebo simulation starts out automatically paused (oops). After verifying that the subscriber was receiving the image from the topic once the simulation was unpaused, I added OpenCV code that periodically displays the original subscribed image, along with a simple max pooling algorithm that extracts key features from the image so it runs faster when passed into the RL model. Next week we should decide whether we need to use a CNN or just the max pooling we have now, and we should also start trying to build the Gym environment. All the changes are visible at https://github.com/danushsingla/RL_MIL.
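For reference, the max pooling step is essentially block-wise downsampling; a minimal NumPy version (the kernel size and exact image handling in the repo may differ) looks like:

```python
import numpy as np

def max_pool(image: np.ndarray, k: int = 4) -> np.ndarray:
    """Downsample an H x W x C image by taking the max over k x k blocks."""
    h, w, c = image.shape
    h, w = h - h % k, w - w % k                      # crop so dimensions divide evenly
    blocks = image[:h, :w].reshape(h // k, k, w // k, k, c)
    return blocks.max(axis=(1, 3))                   # shape: (h // k, w // k, c)
```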
I was able to meet up with William and Danush again this week and try to get a basic subscriber to the image topic up and running. Initially, when I ran the simulation on my computer, I realized that it wasn't set up properly, and I had to rerun install.sh after pulling from the mil2 GitHub. Next week, I can start working on building the Gym environment so that we have a framework ready when we get the information from the max pooling algorithm.
What needs to change?
We need to implement a basic RL algorithm for SubjuGator using the ROS 2 simulation so that the robot can automatically learn how to perform tasks.
How would this task be tested?