Enhancement/train batch function #107
Conversation
What changed:
- Custom train_batch method in the VPG model; this generates a batch of data at each time step
- The experience source is no longer initialized with a device; instead, the correct device is passed to the step() method in the train_batch function
- Moved experience methods from rl.common to datamodules
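The idea behind the change can be sketched as follows. This is an illustrative simplification, not the exact bolts API: the class and attribute names (`BatchGenerator`, `source`, `batch_size`) are stand-ins, and the key point is that the device is supplied to `step()` per call instead of being fixed when the source is constructed.

```python
class BatchGenerator:
    """Illustrative sketch (names are not the exact bolts API): the model
    generates a fresh batch of experience inside train_batch, and the
    device is passed to the source's step() on each call instead of being
    fixed when the source is constructed."""

    def __init__(self, source, batch_size, device):
        self.source = source          # experience source with a step(device) method
        self.batch_size = batch_size  # number of experiences per batch
        self.device = device          # e.g. torch.device("cpu"); any object works here

    def train_batch(self):
        """Yield one experience per environment step."""
        for _ in range(self.batch_size):
            yield self.source.step(self.device)
```

Because `train_batch` is a generator, each training step pulls fresh experience from the environment rather than reading from a pre-collected buffer.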
Hello @djbyrne! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-07-11 13:10:37 UTC
Codecov Report
@@ Coverage Diff @@
## master #107 +/- ##
==========================================
+ Coverage 91.91% 91.98% +0.07%
==========================================
Files 77 78 +1
Lines 3944 4018 +74
==========================================
+ Hits 3625 3696 +71
- Misses 319 322 +3
Continue to review full report at Codecov.
Co-authored-by: Jirka Borovec <[email protected]>
…/djbyrne/pytorch-lightning-bolts into enhancement/train_batch_function
Thanks a lot! While this already looks really good and clean, I also added some questions.
return experience, reward, done

def run_episode(self, device: torch.device) -> float:
Is "episode" a common RL term for this? Intuitively I would have called it a sequence...
It depends on the task. Most tasks are episodic in some form and have a termination state denoting the end of the episode. This function was originally used for carrying out a validation episode and is useful for that.
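The discussion above can be made concrete with a minimal sketch of what an RL *episode* is; `env` and `policy` here are generic stand-ins, not the bolts classes:

```python
def run_episode(env, policy):
    """Illustrative sketch of an RL *episode* (env and policy are
    stand-ins, not the bolts classes): interact with the environment
    until it reports a terminal state, accumulating the total reward."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```

The terminal `done` flag is what makes the task episodic; a continuing (non-episodic) task would have no such termination signal.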
return reward, final_state, done

class EpisodicExperienceStream(ExperienceSource, IterableDataset):
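For context, the idea behind an episodic experience stream can be sketched without the torch `IterableDataset` base class (this is a simplified illustration, not the bolts implementation): iterating the stream yields one complete episode at a time.

```python
class EpisodicStream:
    """Simplified sketch of the idea behind EpisodicExperienceStream,
    without the torch IterableDataset base class: each iteration yields
    one complete *episode*, i.e. the experiences up to a terminal state."""

    def __init__(self, env, policy, max_episodes):
        self.env, self.policy, self.max_episodes = env, policy, max_episodes

    def __iter__(self):
        for _ in range(self.max_episodes):
            episode, state, done = [], self.env.reset(), False
            while not done:
                action = self.policy(state)
                state, reward, done = self.env.step(action)
                episode.append((state, action, reward, done))
            yield episode
```

In the real class, subclassing `IterableDataset` lets a PyTorch `DataLoader` consume the stream directly.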
Same question about wording with episodic
pl_bolts/models/rl/__init__.py
@@ -5,4 +5,4 @@
 from pl_bolts.models.rl.noisy_dqn_model import NoisyDQN
 from pl_bolts.models.rl.per_dqn_model import PERDQN
 from pl_bolts.models.rl.reinforce_model import Reinforce
-from pl_bolts.models.rl.vanilla_policy_gradient_model import PolicyGradient
+# from pl_bolts.models.rl.vanilla_policy_gradient_model import PolicyGradient
Why did you change this one?
I meant to raise an issue about this: some of these imports in the inits are raising errors in my runs. I intended to look into exactly why it was happening. Will update this.
self.reward_list = []
for _ in range(100):
-    self.reward_list.append(0)
+    self.reward_list.append(torch.tensor(0))
maybe add a device here?
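The reviewer's suggestion would look roughly like this; the helper name `make_reward_buffer` is illustrative, not from the PR:

```python
import torch

def make_reward_buffer(n=100, device=torch.device("cpu")):
    """Hedged sketch of the reviewer's suggestion (function name is
    illustrative): create the zero reward tensors directly on the
    target device rather than defaulting to the CPU."""
    return [torch.tensor(0.0, device=device) for _ in range(n)]
```

Creating the tensors on the right device up front avoids a later host-to-device copy when the rewards are used in training.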
Co-authored-by: Justus Schock <[email protected]>
From @djbyrne: there are currently two things blocking the latest PR for the RL bolts.
…/djbyrne/pytorch-lightning-bolts into enhancement/train_batch_function
Before submitting
What does this PR do?
This relates to Lightning-AI/pytorch-lightning#2453. Although this is a PL issue, further discussion showed that it can be handled with the current implementation of PL. This PR shows a proof of concept outlining a cleaner interface for online batch generation for RL and unsupervised learning.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
👍