
How to eliminate "invalid actions" #19

Open
epaulz-vt opened this issue Mar 30, 2021 · 3 comments
@epaulz-vt

Hello,

Not sure if this repo is active, but I am interested in using your environment for a research project. I have built my own simple Deep Q-network to train on the ATC environment. It mostly works, except that I frequently get the messages "Warning invalid action: 400 for index: 0" and "Warning invalid action: 57000 for index: 1", and I can't figure out how to resolve this.

It seems as though my agent is staying near its initial starting point and will only move up/down/left, never to the right. It does not seem to be learning past a certain point, and I wonder whether these "invalid actions" are the cause.

Any assistance would be much appreciated.

Eric

@fvalka
Owner

fvalka commented Mar 30, 2021

Hello Eric,

The repo isn't very active, but I'm still around.

Sounds like you're trying to perform actions which are outside of the action space.

If you are using the continuous action space with normalization (the default), everything should be normalized to between -1 and 1.

See the action space definition here:

self.action_space = gym.spaces.Box(low=np.array([-1, -1, -1]),
                                   high=np.array([1, 1, 1]))
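
So any action passed to step() needs to be a length-3 array inside those bounds. For example, a quick sketch using plain Gym spaces (not code from this repo, just to show the idea):

import numpy as np
import gym

# The environment's normalized continuous action space, as defined above.
action_space = gym.spaces.Box(low=np.array([-1, -1, -1], dtype=np.float32),
                              high=np.array([1, 1, 1], dtype=np.float32))

# A raw (v, h, phi) output from the network can fall outside the bounds;
# clip it into the space before passing it to env.step().
raw_action = np.array([0.2, -0.7, 1.5], dtype=np.float32)
safe_action = np.clip(raw_action, action_space.low, action_space.high)

print(action_space.contains(raw_action))   # False -> out of bounds, i.e. an invalid action
print(action_space.contains(safe_action))  # True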

Hope that helped.

All the best
Fabian

@epaulz-vt
Author

Thank you for your response. I have managed to move past the invalid action issue. However, I am having a hard time understanding how to properly interact with the action space of this environment from my custom DeepQ network... let me explain.

When training on an environment like CartPole or LunarLander, the "action space" is a set of scalar values (say 0-4), one of which is selected and then gets interpreted, and perhaps translated, by the environment in some way. When I use that approach here, it seems that each "action" is a tuple of 3 separate actions (v, h, phi). When I try to choose a scalar action, I get an error because the environment expects to be able to index my action. However, my attempts to modify my model to select and store actions as tuples do not seem to be working.
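
To illustrate what I mean, here is a rough sketch with plain Gym spaces (not this repo's code):

import numpy as np
import gym

# CartPole-style discrete space: one action is a single scalar index.
discrete = gym.spaces.Discrete(4)
print(discrete.sample())   # e.g. 2

# This environment's continuous space: one action is a length-3 vector (v, h, phi).
box = gym.spaces.Box(low=np.array([-1, -1, -1], dtype=np.float32),
                     high=np.array([1, 1, 1], dtype=np.float32))
print(box.sample())        # e.g. [ 0.13 -0.62  0.98]
print(box.shape)           # (3,)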

Do you perhaps have any examples of training a model other than those from 'baselines' so that I could get a better idea of how to interact with this environment? I am very interested in getting this working.

@epaulz-vt
Author

I suppose a simpler way to explain my dilemma is that I don't quite understand how to interact with the continuous action space (I am still fairly new to machine learning). I see that there seems to be a way to switch the environment to a discrete action space. However, no matter which mode it's in, when I try to inspect the action space with "num_outputs = env.action_space.n", it keeps telling me that 'Box' and 'MultiDiscrete' don't have an 'n' attribute.
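
For what it's worth, this little check (plain Gym, nothing repo-specific) shows what I'm running into: only Discrete spaces have an 'n' attribute, while Box and MultiDiscrete expose 'shape' and 'nvec' instead:

import gym

def num_outputs(space):
    # Only Discrete spaces (CartPole, LunarLander) expose .n
    if isinstance(space, gym.spaces.Discrete):
        return space.n
    # MultiDiscrete spaces expose .nvec, one entry per sub-action
    if isinstance(space, gym.spaces.MultiDiscrete):
        return space.nvec
    # Box spaces have no .n; the network output size comes from .shape instead
    if isinstance(space, gym.spaces.Box):
        return space.shape[0]
    raise TypeError(f"Unsupported space: {space}")

print(num_outputs(gym.spaces.Discrete(4)))                      # 4
print(num_outputs(gym.spaces.MultiDiscrete([5, 5, 3])))         # [5 5 3]
print(num_outputs(gym.spaces.Box(low=-1, high=1, shape=(3,))))  # 3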
