Reproducing the paper result of doom_deadly_corridor in VizDoom #296
Hi @sjaelee25, I think you might be dealing with two separate issues here.
A good test of whether your agent can learn anything better is to look at reward_max. It looks like, under the initial random policy, the agent almost never sees rewards higher than what it eventually gets. RL can't learn a behavior it's never able to observe. Why there's a discrepancy between the current version of the codebase and the 2020 version used for this experiment, I can't say; 1000 different things were changed between then and now. But I remember this scenario always had very high variance.
I also recommend playing the scenario by hand to see what kinds of rewards are achievable with human-level play. This will give you a better idea of what's possible and what the agent needs to do.
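The reward_max check suggested above boils down to one question: does random exploration ever produce returns close to what a good policy (or a human player) achieves? A minimal sketch, assuming you have already collected a list of episodic returns from random-policy rollouts; `reward_coverage`, `RewardCoverage` and `target_return` are hypothetical names for illustration, not part of Sample Factory.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardCoverage:
    max_return: float        # best episodic return seen (analogous to reward_max)
    frac_at_or_above: float  # fraction of episodes reaching the target return


def reward_coverage(random_returns: Sequence[float], target_return: float) -> RewardCoverage:
    """Summarize how often random-policy episodes reach a given return.

    `target_return` is a hypothetical threshold, e.g. the return of a
    trained agent or of a human playing the scenario by hand.
    """
    if not random_returns:
        raise ValueError("need at least one episode return")
    hits = sum(1 for r in random_returns if r >= target_return)
    return RewardCoverage(
        max_return=max(random_returns),
        frac_at_or_above=hits / len(random_returns),
    )


if __name__ == "__main__":
    # Illustrative numbers only, not real Deadly Corridor returns.
    returns = [-5.0, 0.0, 12.5, -3.0, 40.0, 2.0]
    cov = reward_coverage(returns, target_return=100.0)
    print(cov.max_return, cov.frac_at_or_above)
```

If `frac_at_or_above` is essentially zero, the agent has almost no chance of observing the behavior it is supposed to learn, which matches the "RL can't learn a behavior it's never able to observe" point.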
Thank you for your reply and advice! As you mentioned, the reward scale for … While other tasks are also being considered, I would like to demonstrate the advantages of my proposed method using deadly_corridor. I apologize for asking about code that was developed over 3 years ago, but I am also curious about the reward shaping (not scaling) that was used. Thank you again!
I don’t think there was any special reward shaping function. You can check the codebase release version for icml2020 to make sure. This is an exploration task, and it’s just very sensitive to initial conditions. Most likely your agent hits an early local minimum and is unable to improve.
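Because the outcome is described as very sensitive to initial conditions, a single run says little; comparing several runs that differ only in seed gives a clearer picture. A minimal sketch, assuming you have the final average return of each run; `summarize_seeds` and `success_threshold` are hypothetical names chosen for illustration.

```python
import statistics
from typing import Dict, Sequence


def summarize_seeds(final_returns: Sequence[float], success_threshold: float) -> Dict[str, float]:
    """Aggregate final returns from independent runs that differ only in seed.

    `success_threshold` is a hypothetical cut-off separating runs that
    escaped an early local minimum from runs that did not.
    """
    n = len(final_returns)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(final_returns)
    stdev = statistics.stdev(final_returns)  # sample standard deviation
    return {
        "mean": mean,
        "stdev": stdev,
        "stderr": stdev / n ** 0.5,  # standard error of the mean
        "min": min(final_returns),
        "max": max(final_returns),
        "success_rate": sum(r >= success_threshold for r in final_returns) / n,
    }
```

A bimodal spread (some seeds stuck at low returns, others much higher) would be consistent with the local-minimum explanation, whereas uniformly low returns across seeds would point more toward a setup difference.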
Okay, thank you for your reply!
I have tried to reproduce the experimental result for Deadly Corridor from the VizDoom basic tasks.
The command is taken from the documentation: https://www.samplefactory.dev/09-environment-integrations/vizdoom/#reproducing-paper-results
python -m sf_examples.vizdoom.train_vizdoom --train_for_env_steps=500000000 --algo=APPO --env=doom_deadly_corridor --env_frameskip=4 --use_rnn=True --num_workers=36 --num_envs_per_worker=8 --num_policies=1 --batch_size=2048 --wide_aspect_ratio=False --experiment=doom_basic_envs
However, the performance I get looks like this:
while the paper result is as follows:
Could you check this issue? Thanks!
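Given the high variance discussed in this thread, one option is to rerun the documented command with several fixed seeds. A minimal launcher sketch, assuming Sample Factory's `--seed` option (verify with `--help` on your installed version); the per-seed experiment names are a hypothetical choice so that runs do not overwrite each other, replacing the documented `--experiment=doom_basic_envs`.

```python
import shlex
import sys
from typing import List

# Arguments copied from the documented Deadly Corridor command.
BASE_ARGS = [
    "--train_for_env_steps=500000000",
    "--algo=APPO",
    "--env=doom_deadly_corridor",
    "--env_frameskip=4",
    "--use_rnn=True",
    "--num_workers=36",
    "--num_envs_per_worker=8",
    "--num_policies=1",
    "--batch_size=2048",
    "--wide_aspect_ratio=False",
]


def build_command(seed: int, experiment_prefix: str = "doom_deadly_corridor") -> List[str]:
    """Return the argv for one training run with a fixed seed.

    Assumes the `--seed` option; each run gets its own (hypothetical)
    experiment name so results go to separate directories.
    """
    return [
        sys.executable, "-m", "sf_examples.vizdoom.train_vizdoom",
        *BASE_ARGS,
        f"--seed={seed}",
        f"--experiment={experiment_prefix}_seed{seed}",
    ]


if __name__ == "__main__":
    for seed in range(3):
        # Only prints the commands; pass each list to subprocess.run(cmd, check=True) to launch.
        print(shlex.join(build_command(seed)))
```

The script only prints the commands because each run uses 36 workers for 500M environment steps; launching them one at a time (or on separate machines) is usually more practical than starting them all at once.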