Bad performance on experiment reproduction #51

Open
JasonLiu324 opened this issue Sep 29, 2024 · 2 comments

Comments

JasonLiu324 commented Sep 29, 2024

Hi, I have successfully run the whole project and tested it on several Isaac Gym tasks, such as FrankaCabinet and Humanoid, but the experiment results are not as good as I expected. What might be the reason?

My workstation environment is:
Ubuntu 22.04
RTX 4080 GPU with 12 GB of VRAM
16 GB of RAM

And the command lines I have used are:
python eureka.py env=FrankaCabinet sample=5 iteration=5 model_name=gpt-4
python eureka.py env=Anymal sample=5 iteration=5 model_name=gpt-4

The final success rate is only approximately 0.1. Could this be related to the number of samples? My workstation can only run 5 samples in parallel due to limited GPU memory.
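
For reference, Eureka keeps the best-performing reward candidate out of the N samples it generates per iteration, so a smaller sample budget shrinks the pool the reflection step can choose from. Below is a minimal back-of-the-envelope sketch of that effect, assuming each generated candidate is independently "usable" with some probability p (the p=0.15 used here is purely illustrative, not measured from this repo):

```python
# Illustration only: if each sampled reward function is independently
# "usable" with probability p, the chance that at least one of the N
# candidates in an iteration is usable is 1 - (1 - p)**N.
def chance_of_usable_candidate(p: float, n: int) -> float:
    return 1.0 - (1.0 - p) ** n

# N=5 matches the commands above; N=16 is the sample count I believe the
# paper/README uses by default. p=0.15 is an arbitrary illustrative value.
for n in (5, 16):
    print(f"N={n:2d}: P(>=1 usable candidate) = {chance_of_usable_candidate(0.15, n):.2f}")
```

Under these assumed numbers that works out to roughly 0.56 for N=5 versus roughly 0.93 for N=16, so a small sample budget alone could plausibly account for weaker results, independent of any bug.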

JasonLiu324 (Author) commented:

And the weird thing is that the reward reflection is almost identical across iterations of the run:
Iteration 0: User Content:
We trained a RL policy using the provided reward function code and tracked the values of the individual components in the reward function as well as global policy metrics such as success rates and episode lengths after every 300 epochs and the maximum, mean, minimum values encountered:
distance_reward: ['0.79', '0.95', '0.91', '0.90', '0.90', '0.80', '0.86', '0.92', '0.92', '0.88'], Max: 0.98, Mean: 0.89, Min: 0.76
door_open_reward: ['0.00', '0.08', '0.18', '0.28', '0.29', '0.18', '0.15', '0.00', '0.16', '0.00'], Max: 0.32, Mean: 0.13, Min: 0.00
task_score: ['0.00', '0.00', '0.00', '0.02', '0.01', '0.03', '0.01', '0.00', '0.02', '0.00'], Max: 0.11, Mean: 0.01, Min: 0.00
episode_lengths: ['499.00', '359.18', '500.00', '495.78', '496.36', '493.24', '492.34', '499.69', '500.00', '500.00'], Max: 500.00, Mean: 490.73, Min: 230.97

Iteration 1: User Content:
We trained a RL policy using the provided reward function code and tracked the values of the individual components in the reward function as well as global policy metrics such as success rates and episode lengths after every 300 epochs and the maximum, mean, minimum values encountered:
distance_reward: ['0.79', '0.95', '0.91', '0.90', '0.90', '0.80', '0.86', '0.92', '0.92', '0.88'], Max: 0.98, Mean: 0.89, Min: 0.76
door_open_reward: ['0.00', '0.08', '0.18', '0.28', '0.29', '0.18', '0.15', '0.00', '0.16', '0.00'], Max: 0.32, Mean: 0.13, Min: 0.00
task_score: ['0.00', '0.00', '0.00', '0.02', '0.01', '0.03', '0.01', '0.00', '0.02', '0.00'], Max: 0.11, Mean: 0.01, Min: 0.00
episode_lengths: ['499.00', '359.18', '500.00', '495.78', '496.36', '493.24', '492.34', '499.69', '500.00', '500.00'], Max: 500.00, Mean: 490.73, Min: 230.97

Iteration 2: User Content:
We trained a RL policy using the provided reward function code and tracked the values of the individual components in the reward function as well as global policy metrics such as success rates and episode lengths after every 300 epochs and the maximum, mean, minimum values encountered:
distance_reward: ['0.79', '0.95', '0.91', '0.90', '0.90', '0.80', '0.86', '0.92', '0.92', '0.88'], Max: 0.98, Mean: 0.89, Min: 0.76
door_open_reward: ['0.00', '0.08', '0.18', '0.28', '0.29', '0.18', '0.15', '0.00', '0.16', '0.00'], Max: 0.32, Mean: 0.13, Min: 0.00
task_score: ['0.00', '0.00', '0.00', '0.02', '0.01', '0.03', '0.01', '0.00', '0.02', '0.00'], Max: 0.11, Mean: 0.01, Min: 0.00
episode_lengths: ['499.00', '359.18', '500.00', '495.78', '496.36', '493.24', '492.34', '499.69', '500.00', '500.00'], Max: 500.00, Mean: 490.73, Min: 230.97

Iteration 3: User Content:
We trained a RL policy using the provided reward function code and tracked the values of the individual components in the reward function as well as global policy metrics such as success rates and episode lengths after every 300 epochs and the maximum, mean, minimum values encountered:
distance_reward: ['0.79', '0.95', '0.91', '0.90', '0.90', '0.80', '0.86', '0.92', '0.92', '0.88'], Max: 0.98, Mean: 0.89, Min: 0.76
door_open_reward: ['0.00', '0.08', '0.18', '0.28', '0.29', '0.18', '0.15', '0.00', '0.16', '0.00'], Max: 0.32, Mean: 0.13, Min: 0.00
task_score: ['0.00', '0.00', '0.00', '0.02', '0.01', '0.03', '0.01', '0.00', '0.02', '0.00'], Max: 0.11, Mean: 0.01, Min: 0.00
episode_lengths: ['499.00', '359.18', '500.00', '495.78', '496.36', '493.24', '492.34', '499.69', '500.00', '500.00'], Max: 500.00, Mean: 490.73, Min: 230.97

The values are exactly the same across all four iterations, so I think something must be wrong with the training process.
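
One quick way to tell whether the identical reflections come from stale feedback being reused (rather than four genuinely identical training runs) is to hash the per-iteration log/feedback text that the run writes to disk. A minimal sketch, assuming a hypothetical layout where each iteration's feedback ends up as a text file somewhere under an outputs/ directory (the directory and file pattern below are placeholders, not the project's actual paths):

```python
# Sketch: hash each iteration's saved feedback/log text to check whether the
# files are byte-identical (suggesting stale feedback is being reused) or
# merely look similar. Paths and the *.txt pattern are placeholders.
import hashlib
from pathlib import Path

output_dir = Path("outputs")  # placeholder for the run's output directory
for log_file in sorted(output_dir.rglob("*.txt")):
    digest = hashlib.sha256(log_file.read_bytes()).hexdigest()[:12]
    print(f"{digest}  {log_file}")
```

Byte-identical digests across iterations would point at the feedback-collection step (for example, the same training log being read every iteration), whereas differing files that still quote identical values would point at how the reflection prompt is assembled.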

Abc123bit commented:

Atom light value from 0 to 1
